Skip duplicate keys in add() instead of erroring #6

@titusz

Description

Problem

When add_many encounters a duplicate key (intra-batch or cross-batch within the same index), the C++ executor keeps adding keys until the memory_order_relaxed stop flag propagates to the other threads. The Python caller receives RuntimeError: Duplicate keys not allowed in high-level wrappers, but the keys processed before the stop are physically committed. This leaves the index in an inconsistent state: the caller has no way to know which keys were added.

The current behavior is not useful to anyone — it's a partial commit disguised as an error.
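
To make the failure mode concrete, here is a minimal single-threaded sketch of the behavior described above. ToyIndex and its add_many are hypothetical stand-ins, not the usearch API, and the real executor is multi-threaded, so which keys land before the stop is not deterministic there. The sketch only shows the shape of the problem: an exception is raised, yet part of the batch stays committed.

```python
class ToyIndex:
    """Hypothetical stand-in for the index; not the usearch API."""

    def __init__(self):
        self.keys = set()

    def add_many(self, keys):
        # Mimics the reported behavior: each key is committed as soon as it
        # is processed, and a duplicate aborts the batch without rollback.
        for key in keys:
            if key in self.keys:
                raise RuntimeError("Duplicate keys not allowed in high-level wrappers")
            self.keys.add(key)


index = ToyIndex()
try:
    index.add_many([1, 2, 3, 2, 4])  # the second 2 is an intra-batch duplicate
except RuntimeError as error:
    print(f"add_many failed: {error}")

# The caller saw an error, yet keys processed before the duplicate remain.
print(sorted(index.keys))
```

In this toy model the keys before the duplicate are committed and the key after it is not; the caller only learns that the call failed.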

Affected code (index_dense.hpp:2085-2086):

if (!multi() && config().enable_key_lookups && contains(key))
    return add_result_t{}.failed("Duplicate keys not allowed in high-level wrappers");

Downstream impact

In iscc-usearch, a batch add() with one duplicate key fails as a whole, yet some of its keys are silently committed. The index size() overcounts, bloom filters go out of sync, and the dirty counter is wrong. See iscc/iscc-usearch#21.

Proposal

Hard-code a silent skip for duplicate keys instead of erroring:

if (!multi() && config().enable_key_lookups && contains(key))
    return add_result_t{};  // silent no-op, executor continues

The existing contains(key) check already runs on every key — this just changes the branch outcome from "error that corrupts state" to "skip and continue." No additional overhead, no config flag needed.

Callers who need to detect duplicates can check contains() before calling add(). Callers who want multi-value-per-key already use multi=True.
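
A caller-side duplicate check along these lines might look like the sketch below. split_new_keys and ToyIndex are hypothetical names introduced for illustration; the helper assumes only that the index exposes a contains(key) method, and it also tracks keys already seen in the same batch so intra-batch duplicates are reported too.

```python
def split_new_keys(index, keys):
    """Partition keys into (new, duplicates) using only index.contains(key)."""
    seen = set()  # keys accepted earlier in this same batch
    new, duplicates = [], []
    for key in keys:
        if key in seen or index.contains(key):
            duplicates.append(key)
        else:
            seen.add(key)
            new.append(key)
    return new, duplicates


class ToyIndex:
    """Hypothetical stand-in exposing contains() and add(); not the usearch API."""

    def __init__(self):
        self._keys = set()

    def contains(self, key):
        return key in self._keys

    def add(self, key):
        self._keys.add(key)


index = ToyIndex()
index.add(10)  # committed by an earlier batch

new, duplicates = split_new_keys(index, [10, 11, 12, 11])
for key in new:
    index.add(key)

print(new, duplicates)
```

Note that a check followed by an add is not atomic, so this pattern only gives a reliable answer when a single writer is adding to the index at a time.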

Scope

  • index_dense.hpp: one-line change at the duplicate-detection branch
  • Tests: batch add with intra-batch and cross-batch duplicates verifying silent skip
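
The two test cases could be sketched roughly as follows. SkipIndex is a hypothetical model of the proposed skip semantics, not the usearch API; in a real test suite it would be replaced by the patched index. The assertions deliberately check only size and key presence, since the issue does not specify which vector survives an intra-batch duplicate.

```python
class SkipIndex:
    """Hypothetical model of the proposed behavior; not the usearch API."""

    def __init__(self):
        self._vectors = {}

    def add(self, keys, vectors):
        for key, vector in zip(keys, vectors):
            if key in self._vectors:
                continue  # proposed behavior: silent no-op, batch continues
            self._vectors[key] = vector

    def __contains__(self, key):
        return key in self._vectors

    def __len__(self):
        return len(self._vectors)


def test_intra_batch_duplicate_is_skipped():
    index = SkipIndex()
    index.add([1, 2, 2, 3], ["a", "b", "c", "d"])  # must not raise
    assert len(index) == 3
    assert all(key in index for key in (1, 2, 3))


def test_cross_batch_duplicate_is_skipped():
    index = SkipIndex()
    index.add([1, 2], ["a", "b"])
    index.add([2, 3, 4], ["c", "d", "e"])  # key 2 already exists
    assert len(index) == 4
    assert all(key in index for key in (1, 2, 3, 4))


test_intra_batch_duplicate_is_skipped()
test_cross_batch_duplicate_is_skipped()
```

Both cases assert that keys following the duplicate are still added, which is the property the current erroring behavior does not guarantee.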
