RFC: integrate PiPNN as an alternative graph-index build algorithm #1049
SeliMeli wants to merge 4 commits into
Pull request overview
Adds an RFC proposing PiPNN as an opt-in, feature-gated alternative to Vamana for DiskANN disk-index graph construction, keeping disk format and search API unchanged.
Changes:
- Introduces RFC 01049 detailing PiPNN's algorithm, integration plan, and two-stage rollout
- Specifies a `BuildAlgorithm` selector design and feature-gating strategy (`pipnn`)
- Documents benchmark results and Stage-1 milestones gating potential Stage-2 deprecation of Vamana full rebuilds
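The selector and feature gate summarized in this list might look roughly like the sketch below. It is an illustration only, not the RFC's actual definition: `BuildOptions` and `select_builder` are hypothetical names, and the only details taken from the PR text are the `BuildAlgorithm` name, Vamana as the default, and the `pipnn` Cargo feature.

```rust
/// Hypothetical sketch of a build-algorithm selector (not the RFC's real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum BuildAlgorithm {
    /// Default variant: build sites that never set the selector keep using Vamana.
    #[default]
    Vamana,
    /// Only compiled in when the crate is built with `--features pipnn`.
    #[cfg(feature = "pipnn")]
    PiPNN,
}

/// Hypothetical build options; `BuildOptions::default()` selects Vamana.
#[derive(Debug, Clone, Default)]
pub struct BuildOptions {
    pub algorithm: BuildAlgorithm,
}

/// Dispatches on the selector and returns the name of the builder that would run.
/// A real integration would call into the corresponding graph builder instead.
pub fn select_builder(opts: &BuildOptions) -> &'static str {
    match opts.algorithm {
        BuildAlgorithm::Vamana => "vamana",
        // This arm exists only when the `pipnn` feature is enabled.
        #[cfg(feature = "pipnn")]
        BuildAlgorithm::PiPNN => "pipnn",
    }
}
```

Because the PiPNN variant is compiled out without the feature, the `match` stays exhaustive in both configurations, and code that relies on `Default` sees no behavior change.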
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1049 +/- ##
=======================================
Coverage 90.60% 90.60%
=======================================
Files 461 461
Lines 85494 85494
=======================================
+ Hits 77462 77465 +3
+ Misses 8032 8029 -3
Adds an RFC proposing PiPNN (arXiv:2602.21247) as a second graph-index build algorithm for DiskANN's disk index. Integration is two-stage: Stage 1 lands PiPNN behind a build-algorithm selector with Vamana as default; Stage 2 (conditional on Stage 1 milestones) retires Vamana's full-rebuild path while keeping it for incremental inserts via the hybrid update model.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per rfcs/README.md step 4: rename from 00000-short-title.md to NNNNN-short-title.md using the zero-padded PR number (microsoft#1049).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tones
- Add new M1 for in-memory build/search parity with Vamana (PiPNN today only feeds into DiskIndexWriter; a path that populates a DiskANNIndex directly for in-mem-only consumers is missing).
- Renumber M1-M7 → M2-M8.
- Convert each milestone's plain-text paragraph into bullet lists (Scope / Validation / etc.) for readability per RFC reviewer feedback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Explicitly document feature-gated deserialization behavior: configs with "algorithm": "PiPNN" fail at parse time in non-pipnn binaries with a serde unknown-variant error. This is not a backward-compatibility regression; configs without build_algorithm parse identically across feature combinations.
- Add explanation for why the disk-edges path is no slower than one-shot despite extra I/O (smaller working set; sequential append spills overlap with compute).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
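The feature-gated parse behavior described in the last commit message can be illustrated with a std-only sketch. The RFC relies on serde for this; the hand-rolled `FromStr` parser below merely stands in for it. `UnknownVariant` and `parse_algorithm` are hypothetical names, and treating a missing field as Vamana is an assumption based on Vamana being the stated default.

```rust
use std::str::FromStr;

/// Hypothetical selector; the PiPNN variant exists only with `--features pipnn`.
#[derive(Debug, PartialEq, Eq)]
pub enum BuildAlgorithm {
    Vamana,
    #[cfg(feature = "pipnn")]
    PiPNN,
}

/// Stand-in for serde's unknown-variant error (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownVariant(pub String);

impl FromStr for BuildAlgorithm {
    type Err = UnknownVariant;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Vamana" => Ok(BuildAlgorithm::Vamana),
            // Without the feature this arm is compiled out, so "PiPNN"
            // falls through to the unknown-variant error below.
            #[cfg(feature = "pipnn")]
            "PiPNN" => Ok(BuildAlgorithm::PiPNN),
            other => Err(UnknownVariant(other.to_string())),
        }
    }
}

/// Parses an optional `algorithm` value from a config.
/// Assumption: an absent field means Vamana in every feature combination.
pub fn parse_algorithm(field: Option<&str>) -> Result<BuildAlgorithm, UnknownVariant> {
    match field {
        None => Ok(BuildAlgorithm::Vamana),
        Some(name) => name.parse(),
    }
}
```

In a binary built without the `pipnn` feature, `parse_algorithm(Some("PiPNN"))` returns an error at parse time, while configs that omit the field behave the same with or without the feature.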
Summary
This RFC proposes adding PiPNN (Pick-in-Partitions Nearest Neighbors, arXiv:2602.21247) as a second graph-construction algorithm for DiskANN's disk index, alongside the existing Vamana builder.
PiPNN produces a graph that is byte-compatible with Vamana's disk format and usable through the same search API, with up to 6.3× lower build time on our measured workloads (Enron 10M, BigANN 10M). Vamana remains the default and the only algorithm supported for incremental inserts; PiPNN is the proposed faster path for full rebuilds.
Two-stage integration
`BuildAlgorithm` selector with `Vamana` as the default. PiPNN is opt-in via a `pipnn` Cargo feature. Existing build sites see no behavior change. Stage 1 defines explicit milestones (M0–M7) gating Stage 2 readiness.
Highlights
`build_ram_limit_gb` knob, bringing PiPNN's peak RSS to or below Vamana's at a configurable build-time cost.
Reviewers
Please read the full RFC for trade-offs, benchmark tables, and milestone definitions.
🤖 Generated with Claude Code