Concurrency hardening for BfTree parallel workloads #1158

Merged
JordanMaples merged 8 commits into main from jordanmaples/bftree_parallelism on Jun 30, 2026
@JordanMaples

@JordanMaples JordanMaples commented Jun 12, 2026

Contributor

Problem

NeighborProvider::append_vector performs a read-modify-write cycle (read existing neighbors → append → write back) without synchronization. Under concurrent mutation, the last writer wins and silently drops the other's edges. Stress testing shows edge loss under multi-thread contention on a single vertex.
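The lost-update pattern described above can be reproduced deterministically by interleaving two writers by hand. This is only an illustrative sketch: `NeighborStore`, `read`, `write`, and `interleaved_appends` are hypothetical names, not the crate's API.

```rust
/// Hypothetical stand-in for a stored neighbor list (not the real provider).
pub struct NeighborStore {
    list: Vec<u32>,
}

impl NeighborStore {
    /// Returns a copy of the stored neighbor list.
    pub fn read(&self) -> Vec<u32> {
        self.list.clone()
    }

    /// Overwrites the stored neighbor list.
    pub fn write(&mut self, list: Vec<u32>) {
        self.list = list;
    }
}

/// Simulates two writers whose read-modify-write cycles overlap.
pub fn interleaved_appends() -> Vec<u32> {
    let mut store = NeighborStore { list: vec![1, 2] };

    // Both writers read the same snapshot before either writes back.
    let mut writer_a = store.read();
    let mut writer_b = store.read();

    // Each writer appends its own edge to its private copy.
    writer_a.push(10);
    writer_b.push(20);

    // Writer A writes back, then writer B overwrites it: edge 10 is lost.
    store.write(writer_a);
    store.write(writer_b);

    store.read()
}
```

The result contains edge 20 but not edge 10, which is the silent edge loss the stress tests detect.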

Additionally, the dual-store set_element wrote quant before full-precision, meaning if the full-precision write failed after quant succeeded, it would leave a quantized ghost with no backing data. And delete did not clean up the quant store entry.

Solution

Striped Mutex table for per-vertex synchronization

  • StripedLocks: a fixed-size table of Mutex<()> stripes (diskann-bftree/src/locks.rs).
  • Stripe count is derived from hardware parallelism: (cpus * 4).next_power_of_two().max(64) — constant memory regardless of dataset size, suitable for billion-scale workloads.
  • Vertex IDs map to stripes via Fibonacci multiply-shift hashing (floor(2^64 / phi)), keeping false contention negligible.
  • Each mutating op acquires exactly one stripe lock for the target vertex, does its work, and drops the guard — no nested locking, so deadlock is structurally impossible.
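As a rough illustration of the bullets above, a striped lock table might look like the following. This is a minimal sketch under stated assumptions, not the contents of locks.rs: the field names, `with_cpus`, `stripe_index`, and the poison handling are all hypothetical. Only the stripe-count formula and the Fibonacci multiplier come from the description.

```rust
use std::sync::{Mutex, MutexGuard};
use std::thread;

/// floor(2^64 / phi), the Fibonacci hashing multiplier.
const FIB_MULT: u64 = 0x9E37_79B9_7F4A_7C15;

/// Hypothetical sketch of a fixed-size striped lock table.
pub struct StripedLocks {
    stripes: Vec<Mutex<()>>,
    /// 64 - log2(stripe count); keeps only the top bits of the hash.
    shift: u32,
}

impl StripedLocks {
    /// Sizes the table from the host's available parallelism.
    pub fn new() -> Self {
        let cpus = thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1);
        Self::with_cpus(cpus)
    }

    /// Assumed helper so the sizing rule can be exercised with a fixed CPU count.
    pub fn with_cpus(cpus: usize) -> Self {
        let count = (cpus * 4).next_power_of_two().max(64);
        let stripes = (0..count).map(|_| Mutex::new(())).collect();
        // count is a power of two and at least 64, so shift is in 1..=58.
        let shift = 64 - count.trailing_zeros();
        Self { stripes, shift }
    }

    pub fn stripe_count(&self) -> usize {
        self.stripes.len()
    }

    /// Multiply-shift hash: multiply by the constant, keep the top bits.
    pub fn stripe_index(&self, vertex_id: usize) -> usize {
        ((vertex_id as u64).wrapping_mul(FIB_MULT) >> self.shift) as usize
    }

    /// Acquires the single stripe guarding `vertex_id`.
    pub fn lock(&self, vertex_id: usize) -> MutexGuard<'_, ()> {
        self.stripes[self.stripe_index(vertex_id)]
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner())
    }
}
```

A mutating operation in this sketch would call `lock(vertex_id)`, do its read-modify-write, and let the guard drop at the end of the scope, so no two stripes are ever held at once.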

Why striped instead of per-vertex locks?

BfTree's value proposition is spilling to disk for datasets exceeding RAM. At 1B vectors, per-vertex locks would consume gigabytes of in-memory locks — defeating the purpose. Striped locks keep memory constant at the cost of rare false contention.
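The memory argument is simple arithmetic. The sketch below assumes 8 bytes per lock, which is the figure implied by the first commit message on this PR (~128 KB for 16384 stripes, ~8 GB at 1B vectors); the real size of a lock is platform-dependent, and `lock_table_bytes` is a hypothetical helper.

```rust
/// Total bytes consumed by `lock_count` locks of `bytes_per_lock` each.
/// The per-lock size is supplied by the caller because it varies by platform.
pub fn lock_table_bytes(lock_count: u64, bytes_per_lock: u64) -> u64 {
    lock_count * bytes_per_lock
}

/// Striped table: 16384 stripes at an assumed 8 bytes each.
pub fn striped_bytes() -> u64 {
    lock_table_bytes(16_384, 8)
}

/// Per-vertex locks at one billion vectors, same assumed lock size.
pub fn per_vertex_bytes() -> u64 {
    lock_table_bytes(1_000_000_000, 8)
}
```

Under that assumption the striped table costs 131,072 bytes (128 KiB) while per-vertex locks cost 8,000,000,000 bytes, and only the latter grows with the dataset.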

Reads are lock-free

get_neighbors and get_vector acquire no stripe lock. They rely on bf-tree's internal thread safety and the fact that DiskANN search is approximate by design: a reader may briefly observe a partially-updated neighbor list, which affects recall momentarily rather than correctness. Writers (set_element, delete, set_neighbors/write_neighbors, append_vector/write_append) serialize on the stripe.

Dual-store write ordering

  • Full-precision is now written first in set_element.
  • Failure mode is benign: vector reachable in full-precision but not yet in quantized search (rather than a quantized ghost with no backing data).
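The effect of the ordering can be shown with an in-memory stand-in for the two stores. Everything here is hypothetical (`DualStore`, `WriteError`, the `fail_quant` test hook, and the toy quantizer); it only demonstrates why a failed second write leaves a benign state.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum WriteError {
    QuantFailed,
}

/// Hypothetical in-memory stand-in for the full-precision and quant stores.
#[derive(Default)]
pub struct DualStore {
    pub full: HashMap<u32, Vec<f32>>,
    pub quant: HashMap<u32, Vec<u8>>,
    /// Test hook that simulates a quant-store write failure.
    pub fail_quant: bool,
}

impl DualStore {
    pub fn set_element(&mut self, id: u32, vector: &[f32]) -> Result<(), WriteError> {
        // Step 1: write the authoritative full-precision vector first.
        self.full.insert(id, vector.to_vec());

        // Step 2: write the derived quantized vector second.
        if self.fail_quant {
            // The vector stays reachable in full precision; no ghost is left.
            return Err(WriteError::QuantFailed);
        }
        // Toy quantizer (an assumption): map [0, 1] onto 0..=255.
        let q = vector
            .iter()
            .map(|x| (x.clamp(0.0, 1.0) * 255.0) as u8)
            .collect();
        self.quant.insert(id, q);
        Ok(())
    }
}
```

If the order were reversed, a failure between the two steps would leave an entry in `quant` with nothing in `full`, which is the ghost the PR description refers to.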

Quant store cleanup on delete

  • New DeleteQuant trait: real impl for QuantVectorProvider (deletes the quant entry), no-op for NoStore.
  • delete now removes both full-precision and quant entries under the stripe lock. Neighbor-topology cleanup remains in the upper DiskANNIndex layer (drop_adj_list).
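One way to picture the trait-based cleanup is the sketch below. Only the names `DeleteQuant` and `NoStore` come from the PR; the method name `delete_quant`, the `QuantStore` and `Provider` types, and the `&mut self` signatures are assumptions for illustration, and the stripe lock is omitted.

```rust
use std::collections::HashMap;

/// Abstraction over "remove the quantized entry, if there is a quant store".
pub trait DeleteQuant {
    fn delete_quant(&mut self, id: u32);
}

/// No quant store configured: deletion is a no-op.
pub struct NoStore;

impl DeleteQuant for NoStore {
    fn delete_quant(&mut self, _id: u32) {}
}

/// Hypothetical quant store backed by a map.
#[derive(Default)]
pub struct QuantStore {
    pub entries: HashMap<u32, Vec<u8>>,
}

impl DeleteQuant for QuantStore {
    fn delete_quant(&mut self, id: u32) {
        self.entries.remove(&id);
    }
}

/// Hypothetical provider that is generic over its quant store.
pub struct Provider<Q: DeleteQuant> {
    pub full: HashMap<u32, Vec<f32>>,
    pub quant: Q,
}

impl<Q: DeleteQuant> Provider<Q> {
    /// Removes the full-precision entry and, if present, the quant entry.
    pub fn delete(&mut self, id: u32) {
        self.full.remove(&id);
        self.quant.delete_quant(id);
    }
}
```

Because the bound is on the trait, a single generic `delete` covers both configurations without branching on whether a quant store exists.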

Testing

Three new concurrent stress tests in diskann-bftree/src/neighbors.rs:

  • test_concurrent_append_no_lost_edges — 8 threads × 10 edges to the same vertex; asserts every edge survives.
  • test_concurrent_append_independent_vertices — 8 threads appending to disjoint vertex ranges; asserts total edge count.
  • test_concurrent_read_write_consistency — mixed readers + writers on the same vertex; validates no torn reads.
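The shape of the first test can be approximated with plain threads and a single mutex. This is a simplified sketch, not the test in neighbors.rs: `concurrent_append` is a hypothetical function, a `Mutex<Vec<u32>>` stands in for the provider, and std threads replace whatever runtime the real tests use.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` writers that each append `edges_per_thread` distinct edge
/// IDs to one shared neighbor list, holding the lock across the whole
/// read-modify-write cycle. Returns the final list.
pub fn concurrent_append(threads: u32, edges_per_thread: u32) -> Vec<u32> {
    let neighbors = Arc::new(Mutex::new(Vec::<u32>::new()));

    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let neighbors = Arc::clone(&neighbors);
            thread::spawn(move || {
                for e in 0..edges_per_thread {
                    let mut guard = neighbors.lock().unwrap();
                    let mut list = guard.clone(); // read
                    list.push(t * edges_per_thread + e); // append a unique ID
                    *guard = list; // write back
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let result = neighbors.lock().unwrap().clone();
    result
}
```

With 8 threads and 10 edges each, the check is that exactly 80 distinct edge IDs survive; dropping the lock between the read and the write back is what would allow the count to fall short.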

Copilot AI left a comment

Contributor

Pull request overview

This PR hardens diskann-bftree for concurrent workloads by synchronizing neighbor-list mutations (preventing lost updates), correcting dual-store (full + quant) lifecycle behavior, and adding concurrent stress tests to validate the new behavior.

Changes:

  • Add striped per-vertex RwLock synchronization to NeighborProvider (read locks for reads, write locks for mutations) and introduce concurrent stress tests.
  • Adjust hard-delete behavior to also delete quantized vectors when present via a small abstraction over the quant store.
  • Reorder dual-store writes in set_element to write full-precision first, then quantized.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| diskann-bftree/src/neighbors.rs | Adds striped locking around neighbor-list read/modify/write and adds concurrent stress tests. |
| diskann-bftree/src/provider.rs | Adds quant-store cleanup on delete via DeleteQuant and reorders full/quant write sequencing in set_element. |
| diskann-bftree/src/quant.rs | Adds quant-store delete support used by hard-delete cleanup. |

Comment thread diskann-bftree/src/provider.rs
Comment thread diskann-bftree/src/neighbors.rs Outdated
Comment thread diskann-bftree/src/provider.rs
Comment thread diskann-bftree/src/provider.rs
@harsha-simhadri

Contributor

Jordan, what is the perf impact of this change? Is ingestion speed the same as before? thanks

@JordanMaples JordanMaples force-pushed the jordanmaples/bftree_parallelism branch 2 times, most recently from 04a1dd8 to c94cdcc Compare June 15, 2026 15:44
@codecov-commenter

codecov-commenter commented Jun 15, 2026


Codecov Report

❌ Patch coverage is 98.99329% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.85%. Comparing base (53257e9) to head (b8ca306).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann-bftree/src/provider.rs | 88.46% | 3 Missing ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1158      +/-   ##
==========================================
+ Coverage   89.79%   90.85%   +1.06%     
==========================================
  Files         488      489       +1     
  Lines       93312    93575     +263     
==========================================
+ Hits        83791    85022    +1231     
+ Misses       9521     8553     -968     
| Flag | Coverage Δ |
| --- | --- |
| miri | 90.85% <98.99%> (+1.06%) ⬆️ |
| unittests | 90.82% <98.99%> (+1.36%) ⬆️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| diskann-bftree/src/lib.rs | 59.09% <ø> (ø) |
| diskann-bftree/src/locks.rs | 100.00% <100.00%> (ø) |
| diskann-bftree/src/neighbors.rs | 96.67% <100.00%> (+2.58%) ⬆️ |
| diskann-bftree/src/quant.rs | 89.12% <100.00%> (+0.18%) ⬆️ |
| diskann-bftree/src/provider.rs | 90.45% <88.46%> (-0.11%) ⬇️ |

... and 40 files with indirect coverage changes

Comment thread diskann-bftree/src/neighbors.rs Outdated
Comment thread diskann-bftree/src/provider.rs
@JordanMaples

Contributor Author

@copilot give it another review.

Copilot AI commented Jun 16, 2026

Contributor

@copilot give it another review.

I re-reviewed and found one remaining concurrency issue: beam expansion paths were reading neighbors without taking the striped read lock. I fixed both call sites in commit 562c341 by acquiring self.provider.locks.read(n.into_usize()) before get_neighbors.

Validation run:

  • cargo test -p diskann-bftree --profile ci
  • cargo clippy -p diskann-bftree --all-targets --config 'build.rustflags=["-Dwarnings"]' -- -D warnings
  • Copilot code review ✅ (no additional comments)
  • CodeQL scan timed out in the validation tool (no rerun per tool guidance).

@JordanMaples JordanMaples force-pushed the jordanmaples/bftree_parallelism branch from 562c341 to 535d925 Compare June 17, 2026 16:23
@JordanMaples

JordanMaples commented Jun 17, 2026

Contributor Author

@harsha-simhadri After chatting a bit with Mark and changing the approach to locking in the provider, I ran a few more wiki100k streaming benchmarks on my devbox instance. It looks like the perf overhead is negligible.

Comment thread diskann-bftree/src/neighbors.rs Outdated
Comment thread diskann-bftree/src/provider.rs
Comment thread diskann-bftree/src/provider.rs

@harsha-simhadri harsha-simhadri left a comment

Contributor

Left some comments inline. Ack on perf numbers.

@JordanMaples JordanMaples force-pushed the jordanmaples/bftree_parallelism branch from 566d5ee to 412b5ce Compare June 24, 2026 16:37
@JordanMaples JordanMaples enabled auto-merge (squash) June 25, 2026 15:17
JordanMaples and others added 6 commits June 26, 2026 10:56
Add striped RwLock to NeighborProvider to eliminate the TOCTOU race in
append_vector's read-modify-write cycle. Under 8-thread contention on
the same vertex, the unprotected path loses 11-51% of edges.

Striped locks (16384 stripes, Fibonacci multiply-shift hash):
- Constant ~128 KB memory regardless of dataset size (vs ~8 GB at 1B
  vectors for per-vertex locks)
- RwLock allows concurrent readers during search; only writers serialize
- Read lock on get_neighbors, write lock on set/append/delete
- Internal get_neighbors_unlocked avoids deadlock in append_vector

Dual-store write ordering (SetElement for QuantVectorProvider):
- Write full-precision (authoritative) before quant
- If quant write fails, worst case is a vector missing from quantized
  search but still reachable in full-precision (benign)

Hard-delete cleanup (Delete trait):
- Added DeleteQuant trait; QuantVectorProvider deletes its entry,
  NoStore is a no-op
- Single generic Delete impl with Q: DeleteQuant bound
- Neighbor adjacency cleanup handled by upper DiskANNIndex layer

Tests:
- 3 concurrent stress tests proving the TOCTOU fix
- Verified 10/10 failures without locks, 10/10 passes with locks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Move StripedLocks from NeighborProvider to BfTreeProvider so that
delete and set_element operations are synchronized under the same
lock table as neighbor mutations. This eliminates a potential race
between concurrent delete and set_element on the same vertex ID.

- Extract StripedLocks into its own module (locks.rs)
- BfTreeProvider owns Arc<StripedLocks>, shared with NeighborAccessor
- Dynamic stripe count based on available_parallelism (4x cores, min 64)
- delete() and set_element() now acquire write locks per vertex

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Addresses review feedback requesting documentation of lock
acquisition count, ordering, deadlock freedom, and read behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JordanMaples JordanMaples force-pushed the jordanmaples/bftree_parallelism branch from 412b5ce to fd863e5 Compare June 26, 2026 17:59
Comment thread diskann-bftree/src/neighbors.rs Outdated
Comment thread diskann-bftree/src/neighbors.rs Outdated
JordanMaples and others added 2 commits June 29, 2026 13:04
Address review feedback on the concurrent neighbor-access tests:

- test_concurrent_append_no_lost_edges: repeat the scenario over 100
  outer iterations and scale the writer count to the host's available
  parallelism (with a floor) so lost-update regressions surface
  reliably; size max_degree to the workload.
- test_concurrent_read_write_consistency: add an upper-bound check that
  every observed neighbor ID is within the range writers can produce
  (catching torn reads), wrap the read/write phase in an outer repeat
  loop, and host-scale writers/readers.
- test_concurrent_append_independent_vertices: host-scale writers and
  leave the multi_thread runtime unsized.

Factor out shared test helpers (stress_thread_count, new_provider,
new_shared_provider) and apply them consistently across the bftree
neighbor tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The reader task asserted iterations > 0, but nothing guaranteed a reader was
polled before the writers finished and set done. Under thread oversubscription
a reader could first run after done was already true, read zero times, and trip
the assertion — a scheduling artifact, not a data-consistency defect.

Restructure the reader as a do-while so it validates the invariants at least
once before observing done. Reading the fully-written state is still a valid,
consistent observation, so all consistency checks remain meaningful.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
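
The do-while restructuring described in the last commit message can be sketched in Rust with a `loop` that checks the flag only after the body has run. This is an assumed shape, not the test's actual code: `read_until_done` and the `check` closure are hypothetical, and the real reader runs as an async task.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Runs `check` at least once, then repeats until `done` is observed as true.
/// Returns the number of iterations performed, which is always >= 1.
pub fn read_until_done(done: &AtomicBool, mut check: impl FnMut()) -> usize {
    let mut iterations = 0;
    loop {
        // Validate the invariants before looking at the flag, so a reader
        // that is first scheduled after the writers finish still reads once.
        check();
        iterations += 1;

        if done.load(Ordering::Acquire) {
            break;
        }
    }
    iterations
}
```

Even if `done` is already set when the reader starts, the body executes once, so an `iterations > 0` assertion no longer depends on scheduling.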
@JordanMaples JordanMaples merged commit 7718073 into main Jun 30, 2026
22 checks passed
@JordanMaples JordanMaples deleted the jordanmaples/bftree_parallelism branch June 30, 2026 16:01