Concurrency hardening for BfTree parallel workloads by JordanMaples · Pull Request #1158 · microsoft/DiskANN

JordanMaples · 2026-06-12T21:09:27Z

Problem

NeighborProvider::append_vector performs a read-modify-write cycle (read existing neighbors → append → write back) without synchronization. Under concurrent mutation, the last writer wins and silently drops the edges appended by the other writers. Stress testing shows edge loss when multiple threads contend on a single vertex (11-51% of edges lost under 8-thread contention on the unprotected path).

Additionally, the dual-store set_element wrote the quantized vector before the full-precision one, so if the full-precision write failed after the quant write succeeded, it left a quantized ghost with no backing data. Finally, delete did not clean up the quant store entry.

Solution

Striped `Mutex` table for per-vertex synchronization

StripedLocks: a fixed-size table of Mutex<()> stripes (diskann-bftree/src/locks.rs).
Stripe count is derived from hardware parallelism: (cpus * 4).next_power_of_two().max(64) — constant memory regardless of dataset size, suitable for billion-scale workloads.
Vertex IDs map to stripes via Fibonacci multiply-shift hashing (floor(2^64 / phi)), keeping false contention negligible.
Each mutating op acquires exactly one stripe lock for the target vertex, does its work, and drops the guard — no nested locking, so deadlock is structurally impossible.
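A minimal sketch of such a stripe table, assuming `std::sync::Mutex` stripes and `u64` vertex IDs. The type and method names (`StripedLocks::with_stripes`, `stripe_of`, `lock`) are hypothetical and may not match `diskann-bftree/src/locks.rs`; only the sizing rule and the floor(2^64 / phi) multiplier come from the description above.

```rust
use std::sync::{Mutex, MutexGuard};
use std::thread;

/// floor(2^64 / phi), the 64-bit Fibonacci hashing multiplier.
const FIB_MULT: u64 = 0x9E37_79B9_7F4A_7C15;

/// Hypothetical striped lock table (illustrative names, not the crate's API).
pub struct StripedLocks {
    stripes: Box<[Mutex<()>]>,
    /// log2 of the stripe count; the hash keeps this many top bits.
    bits: u32,
}

impl StripedLocks {
    /// Sizes the table as (cpus * 4).next_power_of_two().max(64).
    pub fn new() -> Self {
        let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
        Self::with_stripes((cpus * 4).next_power_of_two().max(64))
    }

    /// `n` must be a power of two >= 2 so the shift below stays in range.
    pub fn with_stripes(n: usize) -> Self {
        assert!(n.is_power_of_two() && n >= 2);
        let stripes: Box<[Mutex<()>]> = (0..n).map(|_| Mutex::new(())).collect();
        Self { stripes, bits: n.trailing_zeros() }
    }

    pub fn len(&self) -> usize {
        self.stripes.len()
    }

    /// Fibonacci multiply-shift: top `bits` bits of id * FIB_MULT (mod 2^64).
    pub fn stripe_of(&self, id: u64) -> usize {
        (id.wrapping_mul(FIB_MULT) >> (64 - self.bits)) as usize
    }

    /// Acquires the one stripe covering `id`; dropping the guard releases it.
    /// A poisoned stripe is recovered, since it guards no data of its own.
    pub fn lock(&self, id: u64) -> MutexGuard<'_, ()> {
        self.stripes[self.stripe_of(id)]
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner())
    }
}
```

Keeping the top bits of the product, rather than taking `id % n`, is what lets the multiply-shift hash spread runs of sequential vertex IDs across different stripes.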

Why striped instead of per-vertex locks?

BfTree's value proposition is spilling to disk for datasets that exceed RAM. At 1B vectors, per-vertex locks alone would consume gigabytes of memory (roughly 8 GB), defeating the purpose. Striped locks keep lock memory constant at the cost of rare false contention between vertices that hash to the same stripe.
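For scale, a rough calculation assuming about 8 bytes per lock (the real size depends on the platform and lock type), using the 1B-vector and 16384-stripe figures from the commit history plus a hypothetical 64-core host for the current sizing rule:

```math
\begin{aligned}
\text{per-vertex, } 10^{9} \text{ vectors:} &\quad 10^{9} \times 8\ \text{B} = 8 \times 10^{9}\ \text{B} \approx 8\ \text{GB} \\
\text{striped, } 16384 \text{ stripes:} &\quad 16384 \times 8\ \text{B} = 131072\ \text{B} = 128\ \text{KiB} \\
\text{striped, 64 cores:} &\quad (64 \times 4) \times 8\ \text{B} = 2048\ \text{B} = 2\ \text{KiB}
\end{aligned}
```

The striped figures do not depend on the number of vectors, only on the stripe count.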

Reads take no stripe lock

get_neighbors and get_vector acquire no stripe lock. They rely on bf-tree's internal thread safety and the fact that DiskANN search is approximate by design: a reader may briefly observe a partially-updated neighbor list, which affects recall momentarily rather than correctness. Writers (set_element, delete, set_neighbors/write_neighbors, append_vector/write_append) serialize on the stripe.
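To make the reader/writer split concrete, here is a minimal sketch under stated assumptions: `Store` is a hypothetical stand-in for bf-tree in which each individual get or put is atomic, and stripes are picked with `v % 64` instead of the Fibonacci hash for brevity. Only the writer holds a stripe lock across its read-modify-write; `get_neighbors` takes none. The driver mirrors the shape of test_concurrent_append_no_lost_edges (8 threads × 10 edges to one vertex) but is not that test.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Stand-in for bf-tree (assumption): each get or put is atomic on its own,
/// but a get followed by a put is not.
#[derive(Default)]
struct Store(RwLock<HashMap<u64, Vec<u64>>>);

impl Store {
    fn get(&self, key: u64) -> Vec<u64> {
        self.0.read().unwrap().get(&key).cloned().unwrap_or_default()
    }
    fn put(&self, key: u64, value: Vec<u64>) {
        self.0.write().unwrap().insert(key, value);
    }
}

/// Hypothetical provider; stripes use `v % 64` here instead of the
/// Fibonacci hash, purely for brevity.
struct Provider {
    store: Store,
    stripes: Vec<Mutex<()>>,
}

impl Provider {
    fn new() -> Self {
        let stripes = (0..64).map(|_| Mutex::new(())).collect();
        Provider { store: Store::default(), stripes }
    }

    /// Reader: takes no stripe lock and relies on the store's own locking.
    fn get_neighbors(&self, v: u64) -> Vec<u64> {
        self.store.get(v)
    }

    /// Writer: the whole read-modify-write runs under one stripe guard, so
    /// concurrent appends to the same vertex cannot overwrite each other.
    fn append_vector(&self, v: u64, neighbor: u64) {
        let _guard = self.stripes[(v % 64) as usize].lock().unwrap();
        let mut list = self.store.get(v);
        list.push(neighbor);
        self.store.put(v, list);
    }
}

/// 8 writer threads each append 10 distinct edges to vertex 0; returns the
/// number of edges that survive.
fn surviving_edges() -> usize {
    let provider = Arc::new(Provider::new());
    let writers: Vec<_> = (0..8u64)
        .map(|t| {
            let p = Arc::clone(&provider);
            thread::spawn(move || {
                for i in 0..10 {
                    p.append_vector(0, t * 10 + i + 1);
                }
            })
        })
        .collect();
    for w in writers {
        w.join().unwrap();
    }
    provider.get_neighbors(0).len()
}
```

With the guard removed from `append_vector`, two writers can both read the same list and the later `put` discards the earlier one's edge, which is exactly the lost-update pattern described under Problem.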

Dual-store write ordering

Full-precision is now written first in set_element.
Failure mode is benign: vector reachable in full-precision but not yet in quantized search (rather than a quantized ghost with no backing data).
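The ordering argument can be sketched as follows. `DualStore`, `WriteError`, the `fail_*` injection flags, and the toy 8-bit quantizer are all hypothetical; the point is only that the `?` on the full-precision write returns before any quant write happens.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
enum WriteError {
    Full,
    Quant,
}

/// Hypothetical dual store; the `fail_*` flags inject write failures.
#[derive(Default)]
struct DualStore {
    full: Mutex<HashMap<u32, Vec<f32>>>,
    quant: Mutex<HashMap<u32, Vec<u8>>>,
    fail_full: bool,
    fail_quant: bool,
}

impl DualStore {
    fn write_full(&self, id: u32, v: &[f32]) -> Result<(), WriteError> {
        if self.fail_full {
            return Err(WriteError::Full);
        }
        self.full.lock().unwrap().insert(id, v.to_vec());
        Ok(())
    }

    fn write_quant(&self, id: u32, v: &[f32]) -> Result<(), WriteError> {
        if self.fail_quant {
            return Err(WriteError::Quant);
        }
        // Toy 8-bit quantization of values in [0, 1]; not the crate's codec.
        let codes: Vec<u8> = v.iter().map(|&x| (x.clamp(0.0, 1.0) * 255.0) as u8).collect();
        self.quant.lock().unwrap().insert(id, codes);
        Ok(())
    }

    /// Full precision first: if it fails, `?` returns before the quant write,
    /// so no quantized entry can exist without backing data. If the quant
    /// write fails, the vector is still reachable in full precision.
    fn set_element(&self, id: u32, v: &[f32]) -> Result<(), WriteError> {
        self.write_full(id, v)?;
        self.write_quant(id, v)
    }
}
```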

Quant store cleanup on delete

New DeleteQuant trait: real impl for QuantVectorProvider (deletes the quant entry), no-op for NoStore.
delete now removes both full-precision and quant entries under the stripe lock. Neighbor-topology cleanup remains in the upper DiskANNIndex layer (drop_adj_list).
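A sketch of how a `DeleteQuant` bound lets one generic `delete` serve both configurations. The trait name and the `QuantVectorProvider`/`NoStore` roles come from the description; the method name `delete_quant`, the map-backed stores, and the `Provider` layout are assumptions for illustration.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical shape of the trait; the real signature may differ.
trait DeleteQuant {
    fn delete_quant(&self, id: u32);
}

/// No quant store configured: nothing to clean up.
struct NoStore;

impl DeleteQuant for NoStore {
    fn delete_quant(&self, _id: u32) {}
}

/// Quant store backed by a map in this sketch.
#[derive(Default)]
struct QuantVectorProvider {
    codes: Mutex<HashMap<u32, Vec<u8>>>,
}

impl DeleteQuant for QuantVectorProvider {
    fn delete_quant(&self, id: u32) {
        self.codes.lock().unwrap().remove(&id);
    }
}

/// One generic delete for both configurations via the `Q: DeleteQuant` bound.
struct Provider<Q: DeleteQuant> {
    full: Mutex<HashMap<u32, Vec<f32>>>,
    quant: Q,
    stripes: Vec<Mutex<()>>,
}

impl<Q: DeleteQuant> Provider<Q> {
    fn delete(&self, id: u32) {
        // Both entries are removed under the same single stripe guard.
        // Neighbor-topology cleanup is left to the upper index layer.
        let _guard = self.stripes[id as usize % self.stripes.len()].lock().unwrap();
        self.full.lock().unwrap().remove(&id);
        self.quant.delete_quant(id);
    }
}
```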

Testing

Three new concurrent stress tests in diskann-bftree/src/neighbors.rs:

test_concurrent_append_no_lost_edges — 8 threads × 10 edges to the same vertex; asserts every edge survives.
test_concurrent_append_independent_vertices — 8 threads appending to disjoint vertex ranges; asserts total edge count.
test_concurrent_read_write_consistency — mixed readers + writers on the same vertex; validates no torn reads.

Copilot

Pull request overview

This PR hardens diskann-bftree for concurrent workloads by synchronizing neighbor-list mutations (preventing lost updates), correcting dual-store (full + quant) lifecycle behavior, and adding concurrent stress tests to validate the new behavior.

Changes:

Add striped per-vertex RwLock synchronization to NeighborProvider (read locks for reads, write locks for mutations) and introduce concurrent stress tests.
Adjust hard-delete behavior to also delete quantized vectors when present via a small abstraction over the quant store.
Reorder dual-store writes in set_element to write full-precision first, then quantized.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
diskann-bftree/src/neighbors.rs	Adds striped locking around neighbor-list read/modify/write and adds concurrent stress tests.
diskann-bftree/src/provider.rs	Adds quant-store cleanup on delete via `DeleteQuant` and reorders full/quant write sequencing in `set_element`.
diskann-bftree/src/quant.rs	Adds quant-store delete support used by hard-delete cleanup.


harsha-simhadri · 2026-06-13T01:06:49Z

Jordan, what is the perf impact of this change? Is ingestion speed the same as before? thanks

codecov-commenter · 2026-06-15T15:58:55Z

Codecov Report

❌ Patch coverage is 98.99329% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.85%. Comparing base (53257e9) to head (b8ca306).

Files with missing lines	Patch %	Lines
diskann-bftree/src/provider.rs	88.46%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1158      +/-   ##
==========================================
+ Coverage   89.79%   90.85%   +1.06%     
==========================================
  Files         488      489       +1     
  Lines       93312    93575     +263     
==========================================
+ Hits        83791    85022    +1231     
+ Misses       9521     8553     -968

Flag	Coverage Δ
miri	`90.85% <98.99%> (+1.06%)`	⬆️
unittests	`90.82% <98.99%> (+1.36%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
diskann-bftree/src/lib.rs	`59.09% <ø> (ø)`
diskann-bftree/src/locks.rs	`100.00% <100.00%> (ø)`
diskann-bftree/src/neighbors.rs	`96.67% <100.00%> (+2.58%)`	⬆️
diskann-bftree/src/quant.rs	`89.12% <100.00%> (+0.18%)`	⬆️
diskann-bftree/src/provider.rs	`90.45% <88.46%> (-0.11%)`	⬇️

... and 40 files with indirect coverage changes


JordanMaples · 2026-06-16T20:24:37Z

@copilot give it another review.

Copilot · 2026-06-16T20:38:17Z


I re-reviewed and found one remaining concurrency issue: beam expansion paths were reading neighbors without taking the striped read lock. I fixed both call sites in commit 562c341 by acquiring self.provider.locks.read(n.into_usize()) before get_neighbors.

Validation run:

cargo test -p diskann-bftree --profile ci ✅
cargo clippy -p diskann-bftree --all-targets --config 'build.rustflags=["-Dwarnings"]' -- -D warnings ✅
Copilot code review ✅ (no additional comments)
CodeQL scan timed out in the validation tool (no rerun per tool guidance).

JordanMaples · 2026-06-17T18:12:21Z

@harsha-simhadri, after chatting a bit with Mark and changing the approach to locking in the provider, I ran a few more wiki100k streaming benchmarks on my devbox instance. It looks like the perf overhead is negligible.

harsha-simhadri

Left some comments inline. Ack on perf numbers.


Add striped RwLock to NeighborProvider to eliminate the TOCTOU race in append_vector's read-modify-write cycle. Under 8-thread contention on the same vertex, the unprotected path loses 11-51% of edges.

Striped locks (16384 stripes, Fibonacci multiply-shift hash):
- Constant ~128 KB memory regardless of dataset size (vs ~8 GB at 1B vectors for per-vertex locks)
- RwLock allows concurrent readers during search; only writers serialize
- Read lock on get_neighbors, write lock on set/append/delete
- Internal get_neighbors_unlocked avoids deadlock in append_vector

Dual-store write ordering (SetElement for QuantVectorProvider):
- Write full-precision (authoritative) before quant
- If quant write fails, worst case is a vector missing from quantized search but still reachable in full-precision (benign)

Hard-delete cleanup (Delete trait):
- Added DeleteQuant trait; QuantVectorProvider deletes its entry, NoStore is a no-op
- Single generic Delete impl with Q: DeleteQuant bound
- Neighbor adjacency cleanup handled by upper DiskANNIndex layer

Tests:
- 3 concurrent stress tests proving the TOCTOU fix
- Verified 10/10 failures without locks, 10/10 passes with locks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
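Why an internal `get_neighbors_unlocked` is needed in this RwLock revision: std's `RwLock` is not re-entrant, so a writer that calls the public, read-locking getter would try to read-lock a stripe it already holds for writing. A minimal single-stripe sketch follows; `Provider` and its `data` field are hypothetical stand-ins, and later commits in this PR replaced the RwLock with a Mutex and dropped read locking altogether.

```rust
use std::sync::{Mutex, RwLock, TryLockError};

/// Hypothetical single-stripe provider; `data` stands in for the store.
struct Provider {
    stripe: RwLock<()>,
    data: Mutex<Vec<u32>>,
}

impl Provider {
    /// Public read path: takes the stripe's read lock.
    fn get_neighbors(&self) -> Vec<u32> {
        let _r = self.stripe.read().unwrap();
        self.get_neighbors_unlocked()
    }

    /// Internal read path: the caller must already hold the stripe.
    fn get_neighbors_unlocked(&self) -> Vec<u32> {
        self.data.lock().unwrap().clone()
    }

    /// Holds the write lock for the whole read-modify-write. Calling
    /// `get_neighbors` here would read-lock a stripe this thread already
    /// holds for writing, which std's RwLock does not support (it may
    /// deadlock or panic).
    fn append_vector(&self, n: u32) {
        let _w = self.stripe.write().unwrap();
        let mut list = self.get_neighbors_unlocked();
        list.push(n);
        *self.data.lock().unwrap() = list;
    }
}

/// Shows the re-entry is refused: while this thread holds the write lock,
/// a non-blocking read attempt reports WouldBlock instead of succeeding.
fn reentry_is_refused(p: &Provider) -> bool {
    let _w = p.stripe.write().unwrap();
    matches!(p.stripe.try_read(), Err(TryLockError::WouldBlock))
}
```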

Move StripedLocks from NeighborProvider to BfTreeProvider so that delete and set_element operations are synchronized under the same lock table as neighbor mutations. This eliminates a potential race between concurrent delete and set_element on the same vertex ID.
- Extract StripedLocks into its own module (locks.rs)
- BfTreeProvider owns Arc<StripedLocks>, shared with NeighborAccessor
- Dynamic stripe count based on available_parallelism (4x cores, min 64)
- delete() and set_element() now acquire write locks per vertex

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Addresses review feedback requesting documentation of lock acquisition count, ordering, deadlock freedom, and read behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Address review feedback on the concurrent neighbor-access tests:
- test_concurrent_append_no_lost_edges: repeat the scenario over 100 outer iterations and scale the writer count to the host's available parallelism (with a floor) so lost-update regressions surface reliably; size max_degree to the workload.
- test_concurrent_read_write_consistency: add an upper-bound check that every observed neighbor ID is within the range writers can produce (catching torn reads), wrap the read/write phase in an outer repeat loop, and host-scale writers/readers.
- test_concurrent_append_independent_vertices: host-scale writers and leave the multi_thread runtime unsized.

Factor out shared test helpers (stress_thread_count, new_provider, new_shared_provider) and apply them consistently across the bftree neighbor tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
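A sketch of the two test ingredients this commit describes. `stress_thread_count` is named in the commit, but its body and the floor value of 4 here are assumptions; `ids_in_range` is a hypothetical helper that assumes writers produce neighbor IDs `1..=writers * per_writer`.

```rust
use std::thread;

/// Assumed floor; the commit does not state the real value.
const MIN_STRESS_THREADS: usize = 4;

/// Writer/reader count scaled to host parallelism, with a floor so small CI
/// machines still generate contention.
fn stress_thread_count() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
        .max(MIN_STRESS_THREADS)
}

/// Upper-bound check for torn reads, assuming writer `t` appends IDs
/// t * per_writer + 1 ..= (t + 1) * per_writer, so every valid ID lies in
/// 1..=writers * per_writer.
fn ids_in_range(observed: &[u32], writers: u32, per_writer: u32) -> bool {
    let max_id = writers * per_writer;
    observed.iter().all(|id| (1..=max_id).contains(id))
}
```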

The reader task asserted iterations > 0, but nothing guaranteed a reader was polled before the writers finished and set done. Under thread oversubscription a reader could first run after done was already true, read zero times, and trip the assertion — a scheduling artifact, not a data-consistency defect. Restructure the reader as a do-while so it validates the invariants at least once before observing done. Reading the fully-written state is still a valid, consistent observation, so all consistency checks remain meaningful. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
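The do-while restructuring can be sketched with std threads and an `AtomicBool` (the actual test runs on an async multi-thread runtime; `reader` and `validate` are hypothetical names). Because the checks run before `done` is consulted, the loop body executes at least once even if the writers finished first.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Do-while reader: validate first, then check `done`, so the consistency
/// checks run at least once even if every writer finished before this
/// reader was first scheduled. Returns the number of iterations.
fn reader(done: &AtomicBool, mut validate: impl FnMut()) -> usize {
    let mut iterations = 0;
    loop {
        validate();
        iterations += 1;
        if done.load(Ordering::Acquire) {
            break;
        }
    }
    iterations
}
```

Pairing the `Acquire` load with a `Release` store of `done` on the writer side means that once the reader sees `done`, the writers' earlier updates are visible to its final validation pass.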

JordanMaples requested review from a team and Copilot June 12, 2026 21:09

Copilot started reviewing on behalf of JordanMaples June 12, 2026 21:09

Copilot AI reviewed Jun 12, 2026


Comment thread diskann-bftree/src/provider.rs

Comment thread diskann-bftree/src/neighbors.rs Outdated

Comment thread diskann-bftree/src/provider.rs

Comment thread diskann-bftree/src/provider.rs

JordanMaples force-pushed the jordanmaples/bftree_parallelism branch 2 times, most recently from 04a1dd8 to c94cdcc June 15, 2026 15:44

hildebrandmw reviewed Jun 16, 2026


Comment thread diskann-bftree/src/neighbors.rs Outdated

Comment thread diskann-bftree/src/provider.rs

Copilot started work on behalf of JordanMaples June 16, 2026 20:25

Copilot finished work on behalf of JordanMaples June 16, 2026 20:38

JordanMaples force-pushed the jordanmaples/bftree_parallelism branch from 562c341 to 535d925 June 17, 2026 16:23

hildebrandmw approved these changes Jun 17, 2026


Comment thread diskann-bftree/src/neighbors.rs Outdated

harsha-simhadri reviewed Jun 22, 2026


Comment thread diskann-bftree/src/provider.rs

harsha-simhadri reviewed Jun 22, 2026


Comment thread diskann-bftree/src/provider.rs

harsha-simhadri reviewed Jun 22, 2026


JordanMaples force-pushed the jordanmaples/bftree_parallelism branch from 566d5ee to 412b5ce June 24, 2026 16:37

JordanMaples enabled auto-merge (squash) June 25, 2026 15:17

JordanMaples and others added 6 commits June 26, 2026 10:56

remove locks for reads to rely on bf-tree's internal locks

8ca406a

switching to mutex as we are not explicitly locking for reads

45e301c

arc to ref

df4dc32

Document locking protocol for BfTreeProvider

fd863e5


JordanMaples force-pushed the jordanmaples/bftree_parallelism branch from 412b5ce to fd863e5 June 26, 2026 17:59

metajack approved these changes Jun 29, 2026


harsha-simhadri reviewed Jun 29, 2026


Comment thread diskann-bftree/src/neighbors.rs Outdated

harsha-simhadri reviewed Jun 29, 2026


Comment thread diskann-bftree/src/neighbors.rs Outdated

JordanMaples and others added 2 commits June 29, 2026 13:04

hildebrandmw approved these changes Jun 30, 2026


JordanMaples merged commit 7718073 into main Jun 30, 2026
22 checks passed

JordanMaples deleted the jordanmaples/bftree_parallelism branch June 30, 2026 16:01


7 participants
