
Remove try_into_vector_id conversion in prune. #1133

Merged
hildebrandmw merged 3 commits into main from mhildebr/prune on Jun 5, 2026


@hildebrandmw
Contributor

Pruning used a somewhat sneaky trick: it reused the output neighbor buffer as temporary scratch space for indices during prune. While this worked, it relied on usize values being convertible to VectorId, which is dubious because (1) we are moving away from VectorIds being convertible to primitive values and (2) the conversion gives no clear guarantee about when it will succeed.

This PR fixes that by using a dedicated scratch to store these indices.
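As a rough illustration of the idea (not the code in this PR), the sketch below keeps candidate positions in a dedicated scratch vector and writes only real ids to the output buffer, so no usize-to-id conversion is needed. Everything here is an assumption made for the example: the opaque `VectorId` newtype, the `Candidate` struct, the 1-D distances, and the simplified occlusion rule.

```rust
/// Hypothetical opaque id: deliberately has no conversion from `usize`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct VectorId(u64);

/// Hypothetical candidate: an id plus a 1-D coordinate standing in for a vector.
#[derive(Clone, Copy)]
struct Candidate {
    id: VectorId,
    position: f32,
}

/// Simplified occlusion-based prune (an assumption, not the library's algorithm).
/// `candidates` must be sorted by distance to `query`.
/// `scratch` holds positions into `candidates`; `output` only ever holds ids.
fn prune(
    query: f32,
    candidates: &[Candidate],
    alpha: f32,
    max_degree: usize,
    scratch: &mut Vec<usize>,
    output: &mut Vec<VectorId>,
) {
    scratch.clear();
    output.clear();
    for (i, c) in candidates.iter().enumerate() {
        if scratch.len() == max_degree {
            break;
        }
        let to_query = (c.position - query).abs();
        // A candidate is occluded if some already selected candidate is
        // sufficiently closer to it than the query is.
        let occluded = scratch.iter().any(|&j| {
            let to_selected = (c.position - candidates[j].position).abs();
            alpha * to_selected <= to_query
        });
        if !occluded {
            scratch.push(i);
        }
    }
    // Positions are translated to ids only at the very end.
    output.extend(scratch.iter().map(|&i| candidates[i].id));
}

fn main() {
    let candidates = [
        Candidate { id: VectorId(10), position: 1.0 },
        Candidate { id: VectorId(11), position: 1.5 },
        Candidate { id: VectorId(12), position: -2.0 },
    ];
    let (mut scratch, mut output) = (Vec::new(), Vec::new());
    prune(0.0, &candidates, 1.2, 2, &mut scratch, &mut output);
    println!("{:?}", output);
}
```

The output buffer never stores a position disguised as an id, which is the property the dedicated scratch is meant to provide.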

While I originally introduced a third Vec in prune::{Scratch, Context}, I realized that all the auxiliary variables could be collapsed into a single State. Note that we get the extra 2 bytes for neighbor basically for free: even without it, padding would bring the size of State to 8 bytes.
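The padding argument can be checked with a small layout sketch. The field types below are assumptions (an f32 occlude factor plus two u16 values); the actual definition in prune.rs may differ. `#[repr(C)]` is used only to make the layout deterministic for the example.

```rust
use std::mem::size_of;

/// Hypothetical layout without the `neighbor` field (field types are assumed).
#[allow(dead_code)]
#[repr(C)]
struct StateWithoutNeighbor {
    occlude_factor: f32, // 4 bytes, 4-byte alignment
    last_checked: u16,   // 2 bytes, followed by 2 bytes of tail padding
}

/// Hypothetical layout with `neighbor` (field types are assumed).
#[allow(dead_code)]
#[repr(C)]
struct State {
    occlude_factor: f32, // 4 bytes
    last_checked: u16,   // 2 bytes
    neighbor: u16,       // 2 bytes, occupying what was padding above
}

fn main() {
    // Both structs round up to a multiple of the 4-byte alignment of `f32`.
    assert_eq!(size_of::<StateWithoutNeighbor>(), 8);
    assert_eq!(size_of::<State>(), 8);
    println!("both layouts occupy 8 bytes");
}
```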

This also fixes a potential bug: the max occlusion size could be set larger than u16::MAX and then be silently truncated via _ as u16. diskann::graph::Config now enforces a u16::MAX bound. That bound far exceeds any practical setting for this value, so the change is unlikely to result in any breakage.
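To make the truncation hazard concrete, here is a minimal sketch contrasting an `as u16` cast with a checked conversion. The helper `to_max_occlusion_size` is hypothetical and is not the function used in diskann::graph::Config; it only shows one way a u16 bound could be enforced at configuration time.

```rust
use std::num::NonZeroU16;

/// Hypothetical validation helper: rejects zero and anything above `u16::MAX`
/// instead of wrapping silently.
fn to_max_occlusion_size(value: usize) -> Option<NonZeroU16> {
    u16::try_from(value).ok().and_then(NonZeroU16::new)
}

fn main() {
    let too_large: usize = 70_000;

    // `as` keeps only the low 16 bits: 70_000 - 65_536 = 4_464.
    assert_eq!(too_large as u16, 4_464);

    // The checked path reports the problem instead.
    assert_eq!(to_max_occlusion_size(too_large), None);
    assert_eq!(to_max_occlusion_size(0), None);
    assert_eq!(to_max_occlusion_size(750).map(NonZeroU16::get), Some(750));
}
```

Storing the validated value as a NonZeroU16 means later code can use it without any narrowing cast.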

@hildebrandmw hildebrandmw requested review from a team and Copilot June 4, 2026 22:28
@hildebrandmw hildebrandmw changed the title Remove try_from_vector_id conversion in prune. Remove try_into_vector_id conversion in prune. Jun 4, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the pruning scratch-space layout to remove reliance on converting candidate indices into VectorId during pruning, replacing the separate occlude_factor / last_checked scratch vectors with a unified per-position State. It also tightens max_occlusion_size to a u16-bounded configuration to avoid silent truncation.

Changes:

  • Replace prune scratch bookkeeping (occlude_factor, last_checked) with a single State vector and update pruning logic accordingly.
  • Enforce max_occlusion_size <= u16::MAX at the configuration level and update tests/defaults.
  • Adjust config conversion helpers (macro → traits/functions) and update experimental config accessors.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| diskann/src/graph/internal/prune.rs | Consolidates prune scratch state into a State struct and updates Scratch/Context accordingly. |
| diskann/src/graph/index.rs | Updates pruning implementation to use states and removes the try_into_vector_id conversion path. |
| diskann/src/graph/config/mod.rs | Changes max_occlusion_size storage to NonZeroU16, adds generic non-zero parsing helpers, and updates tests. |
| diskann/src/graph/config/experimental.rs | Updates to new ToNonZeroUsize helper usage. |
| diskann/src/graph/config/defaults.rs | Changes default MAX_OCCLUSION_SIZE type to NonZeroU16. |


Comment thread diskann/src/graph/index.rs Outdated
Comment thread diskann/src/graph/index.rs
Comment thread diskann/src/graph/index.rs
Comment thread diskann/src/graph/index.rs
Comment thread diskann/src/graph/internal/prune.rs Outdated
Comment thread diskann/src/graph/config/mod.rs
Comment thread diskann/src/graph/config/defaults.rs
@codecov-commenter

codecov-commenter commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 96.84211% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.40%. Comparing base (6168ef0) to head (31fdca3).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann/src/graph/index.rs | 93.47% | 3 Missing ⚠️ |
Additional details and impacted files


@@           Coverage Diff           @@
##             main    #1133   +/-   ##
=======================================
  Coverage   89.40%   89.40%           
=======================================
  Files         485      485           
  Lines       92079    92107   +28     
=======================================
+ Hits        82324    82351   +27     
- Misses       9755     9756    +1     
| Flag | Coverage Δ |
| --- | --- |
| miri | 89.40% <96.84%> (+<0.01%) ⬆️ |
| unittests | 89.06% <96.84%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
| --- | --- |
| diskann/src/graph/config/experimental.rs | 100.00% <100.00%> (ø) |
| diskann/src/graph/config/mod.rs | 98.14% <100.00%> (-0.10%) ⬇️ |
| diskann/src/graph/internal/prune.rs | 62.50% <100.00%> (-1.79%) ⬇️ |
| diskann/src/graph/index.rs | 96.05% <93.47%> (-0.10%) ⬇️ |

... and 1 file with indirect coverage changes


@hildebrandmw hildebrandmw merged commit af939ef into main Jun 5, 2026
23 checks passed
@hildebrandmw hildebrandmw deleted the mhildebr/prune branch June 5, 2026 16:29
