Problem
When add_many encounters a duplicate key (intra-batch or cross-batch within the same index), the C++ executor partially adds keys before the memory_order_relaxed stop flag propagates to other threads. The Python caller receives RuntimeError: Duplicate keys not allowed in high-level wrappers, but keys processed before the stop are physically committed. This leaves the index in an inconsistent state — the caller has no way to know which keys were added.
The current behavior helps no caller: it is a partial commit reported as an error, with no indication of which part was committed.
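The failure mode can be illustrated with a small single-threaded toy model. This is only a sketch: `ToyIndex` and `add_many_fail_fast` are hypothetical names, not part of usearch, and the real executor is multithreaded, so the set of committed keys there also depends on thread scheduling.

```python
class ToyIndex:
    """Hypothetical stand-in for a unique-key index; not the usearch API."""

    def __init__(self):
        self.keys = set()

    def add_many_fail_fast(self, batch):
        # Mimics the behavior described above: keys are committed one by one,
        # and the first duplicate raises without rolling anything back.
        for key in batch:
            if key in self.keys:
                raise RuntimeError("Duplicate keys not allowed in high-level wrappers")
            self.keys.add(key)


index = ToyIndex()
try:
    index.add_many_fail_fast([1, 2, 3, 2, 4])
except RuntimeError:
    pass  # the caller only sees the error

# The call failed, yet the keys before the duplicate stay committed
# and the key after it (4) was never added.
committed = sorted(index.keys)
```

In this toy the caller could reconstruct the committed set from the batch order. With several threads working on the batch, that is no longer possible.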
Affected code (index_dense.hpp:2085-2086):
if (!multi() && config().enable_key_lookups && contains(key))
    return add_result_t{}.failed("Duplicate keys not allowed in high-level wrappers");
Downstream impact
In iscc-usearch, a batch add() containing one duplicate key fails as a whole, yet some of its keys are silently committed. As a result, the index size() overcounts, the bloom filters go out of sync, and the dirty counter is wrong. See iscc/iscc-usearch#21.
Proposal
Hard-code a silent skip for duplicate keys instead of erroring:
if (!multi() && config().enable_key_lookups && contains(key))
    return add_result_t{}; // silent no-op, executor continues
The existing contains(key) check already runs on every key — this just changes the branch outcome from "error that corrupts state" to "skip and continue." No additional overhead, no config flag needed.
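A minimal sketch of the intended skip-and-continue semantics, modeled on a plain Python set rather than on the actual C++ index. The function name `add_many_skip_duplicates` and its returned count are assumptions made for illustration; the proposal itself does not specify a return value.

```python
def add_many_skip_duplicates(existing, batch):
    """Add every key in `batch` to the set `existing`, skipping duplicates.

    Returns the number of keys actually added (an illustrative extra,
    not something the proposal defines).
    """
    added = 0
    for key in batch:
        if key in existing:
            continue  # silent no-op, keep processing the rest of the batch
        existing.add(key)
        added += 1
    return added


keys = {10}  # key 10 is already in the index (cross-batch duplicate)
added = add_many_skip_duplicates(keys, [1, 2, 10, 2, 3])
```

Under these semantics the whole batch is always processed, so the final key set is simply the union of the old keys and the batch keys, whichever duplicate is encountered first.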
Callers who need to detect duplicates can check contains() before calling add(). Callers who want multi-value-per-key already use multi=True.
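A caller-side pre-check could look roughly like the following. It assumes only that the index object exposes a `contains(key)` method, as referenced above; `filter_new_keys` and `FakeIndex` are hypothetical helpers written for this sketch, with `FakeIndex` standing in for a real index so the example is self-contained.

```python
def filter_new_keys(index, keys, vectors):
    """Return the (keys, vectors) pairs that are not duplicates.

    Drops keys already present in `index` (cross-batch duplicates) and
    repeated keys within the batch (intra-batch), keeping the first one.
    """
    seen = set()
    new_keys, new_vectors = [], []
    for key, vector in zip(keys, vectors):
        if key in seen or index.contains(key):
            continue
        seen.add(key)
        new_keys.append(key)
        new_vectors.append(vector)
    return new_keys, new_vectors


class FakeIndex:
    """Hypothetical stand-in that only implements contains()."""

    def __init__(self, keys):
        self._keys = set(keys)

    def contains(self, key):
        return key in self._keys


index = FakeIndex([7])
new_keys, new_vectors = filter_new_keys(index, [7, 8, 8, 9], ["a", "b", "c", "d"])
```

The caller can then compare `new_keys` with the original batch to learn which keys were duplicates, and pass only the filtered pairs to add().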
Scope
- index_dense.hpp: one-line change at the duplicate-detection branch
- Tests: batch add with intra-batch and cross-batch duplicates verifying silent skip