Introduce diskann-record crate to support serialization + deserialization of DiskANN indexes #1188
…oadable] traits [THIS PR] + impls for structs [TODO]
⚠️ Not ready to approve
Sidecar artifact path handling allows unsafe/incorrect path shapes (including potential directory escape on save) and needs validation hardening before merging.
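The kind of hardening this finding asks for could look roughly like the sketch below. It is an illustration only: `resolve_sidecar` is a hypothetical helper, not a function from `diskann-record`, and it assumes sidecar names are meant to be relative paths under the save directory. It accepts only plain path segments and rejects `..`, a leading `.`, absolute paths, and drive prefixes.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not the crate's API): resolve a sidecar artifact name
/// against the save directory, rejecting anything that could leave it.
pub fn resolve_sidecar(root: &Path, name: &str) -> Result<PathBuf, String> {
    if name.is_empty() {
        return Err("sidecar name is empty".to_string());
    }
    let rel = Path::new(name);
    for component in rel.components() {
        match component {
            // Plain segments such as `vectors.bin` or `graph/adj.bin`.
            Component::Normal(_) => {}
            // `..`, a leading `.`, a root `/`, or a Windows prefix.
            other => {
                return Err(format!("unsafe component {:?} in {:?}", other, name));
            }
        }
    }
    Ok(root.join(rel))
}
```

Checking components rather than searching the string for `".."` avoids false positives on names like `a..b`. It does not cover symlinks inside the save directory, which would need a separate check.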
Pull request overview
Introduces a new diskann-record crate that defines a versioned JSON-manifest + sidecar-artifact framework for persisting DiskANN-related structures, including a Save/Load trait surface, wire-level value model, and basic round-trip tests. This is positioned as the foundational crate for the follow-up PR that will implement these traits for real index types.
Changes:
- Added `save` + `load` modules with `Save`/`Load` and `Saveable`/`Loadable` traits, plus `save_fields!`/`load_fields!` macros.
- Implemented wire types (`Value`, `Record`, `Handle`), schema `Version`, and a lossless `Number` container for manifest numeric values.
- Integrated the new crate into the workspace (members + workspace dependency) and added initial unit tests validating round-trips and handle escape rejection.
File summaries
| File | Description |
|---|---|
| diskann-record/src/lib.rs | Crate-level API/docs, reserved-key policy, 64-bit platform assertion, and end-to-end tests. |
| diskann-record/src/version.rs | Defines Version and its string serialization/deserialization form. |
| diskann-record/src/number.rs | Adds Number wire type and safe narrowing conversions. |
| diskann-record/src/save/mod.rs | Save-side traits, entry point, macros, and primitive Saveable impls. |
| diskann-record/src/save/context.rs | Save-side context and sidecar writer + manifest finalization logic. |
| diskann-record/src/save/error.rs | Save-side error wrapper. |
| diskann-record/src/save/value.rs | Wire-level Value/Record/Handle representations and serde behavior. |
| diskann-record/src/load/mod.rs | Load-side traits, entry point, macros, and primitive Loadable impls. |
| diskann-record/src/load/context.rs | Load-side context/object/array APIs and sidecar reader. |
| diskann-record/src/load/error.rs | Load-side error type and recoverable-vs-critical classification. |
| diskann-record/Cargo.toml | New crate manifest and dependencies. |
| Cargo.toml | Adds diskann-record to the workspace and workspace dependencies. |
| Cargo.lock | Records the new workspace package entry. |
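To make the "lossless `Number` with safe narrowing conversions" row concrete, here is a minimal sketch of the idea. The enum layout and the method names `to_u32` and `to_i64` are assumptions for illustration. The real type in `number.rs` may be shaped differently. The point is that narrowing returns an error instead of truncating or wrapping.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the manifest number type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Number {
    Unsigned(u64),
    Signed(i64),
    Float(f64),
}

impl Number {
    /// Narrow to `u32`, failing instead of truncating or wrapping.
    pub fn to_u32(self) -> Result<u32, String> {
        match self {
            Number::Unsigned(v) => u32::try_from(v).map_err(|e| e.to_string()),
            Number::Signed(v) => u32::try_from(v).map_err(|e| e.to_string()),
            Number::Float(v) => Err(format!("{} is a float, not an integer", v)),
        }
    }

    /// Narrow to `i64`; an unsigned value above `i64::MAX` is an error.
    pub fn to_i64(self) -> Result<i64, String> {
        match self {
            Number::Unsigned(v) => i64::try_from(v).map_err(|e| e.to_string()),
            Number::Signed(v) => Ok(v),
            Number::Float(v) => Err(format!("{} is a float, not an integer", v)),
        }
    }
}
```

Keeping unsigned, signed, and float values in separate variants is what makes the container lossless in this sketch: a `u64` above 2^53 is never routed through an `f64`.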
Copilot's findings
- Files reviewed: 12/13 changed files
- Comments generated: 4
Codecov Report
❌ Your patch status has failed because the patch coverage (87.69%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
| Coverage Diff | main | #1188 | +/- |
|---|---|---|---|
| Coverage | 89.46% | 89.78% | +0.31% |
| Files | 487 | 501 | +14 |
| Lines | 92170 | 95331 | +3161 |
| Hits | 82460 | 85590 | +3130 |
| Misses | 9710 | 9741 | +31 |
hildebrandmw
left a comment
Thanks Suhas - I have one big architectural comment about allowing pluggable backend contexts. I think it will address many of the concerns about how heavy this is as a dependency, as well as support for VFS, so it is probably worth doing. Happy to help out if needed.
… and load paths now use ; feature gated serde and disk impls
…ite a Vec<struct> where each struct might produce a file with the same key
@hildebrandmw Requesting review for the following changes:
hildebrandmw
left a comment
Thanks Suhas - one more round on the SaveContext/LoadContext traits. I think it would be helpful to implement a purely in-memory backend to see how it interacts with the readers/writers. Such a backend can work directly with Value and avoid pulling in serde entirely.
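A rough sketch of what such an in-memory backend could look like follows. Everything here is hypothetical: the `Value` enum is a cut-down stand-in for the crate's wire type, and the `write_artifact`/`read_artifact` names and signatures are assumptions rather than the PR's actual trait surface. It only shows that a backend can hand back `Value` handles and keep artifact bytes in a map, with no serde involved.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified wire value; the crate's `Value` has more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    U64(u64),
    Text(String),
    /// Refers to a sidecar artifact by key.
    Handle(String),
    Record(BTreeMap<String, Value>),
}

/// Hypothetical backend trait: where sidecar bytes go is up to the backend.
pub trait SaveContext {
    fn write_artifact(&mut self, key: &str, bytes: &[u8]) -> Result<Value, String>;
}

/// In-memory backend: artifacts live in a map and the manifest stays a `Value`.
#[derive(Default)]
pub struct MemoryContext {
    artifacts: BTreeMap<String, Vec<u8>>,
}

impl SaveContext for MemoryContext {
    fn write_artifact(&mut self, key: &str, bytes: &[u8]) -> Result<Value, String> {
        // Refuse to silently overwrite an artifact saved under the same key.
        if self.artifacts.contains_key(key) {
            return Err(format!("duplicate artifact key {:?}", key));
        }
        self.artifacts.insert(key.to_string(), bytes.to_vec());
        Ok(Value::Handle(key.to_string()))
    }
}

impl MemoryContext {
    /// Load side: resolve a handle back to the bytes written for it.
    pub fn read_artifact(&self, handle: &Value) -> Result<&[u8], String> {
        match handle {
            Value::Handle(key) => self
                .artifacts
                .get(key)
                .map(|bytes| bytes.as_slice())
                .ok_or_else(|| format!("missing artifact {:?}", key)),
            other => Err(format!("expected a handle, found {:?}", other)),
        }
    }
}
```

Rejecting duplicate keys is one possible policy for the case where several elements of a `Vec<struct>` each want to write an artifact under the same key. Deriving a unique key per element would be another.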
…ls into it now; added in-memory ONLY variant of SaveContext and LoadContext --> moved to backend/; added an enum Backend to choose between Disk*Context and InMemory*Context; moved Disk*Context to backend/
…ed WriterInner impls for DiskWriter and MemoryWriter; renamed InMemoryContext -> MemoryContext (same for InMemorySaveContext)
hildebrandmw
left a comment
Thanks Suhas, this is coming together. I love how light-weight it is getting when the disk backend and serde are excluded. I have a few higher-level comments. Mostly about testing and fortifying the unhappy paths in addition to the happy paths.
harsha-simhadri
left a comment
couple of comments inline. will read backend code once you have final updates
…it NaN,+inf,-inf handling for f64
…aveContext if hint is invalid
harsha-simhadri
left a comment
lgtm, but please see which of the comments in #1079 still apply here. There are quite a few unresolved comments from Jordan and Alex there that seem relevant.
This PR is part 1/2 of #1079 and only introduces the `diskann-record` crate. See #1079 for sample output when the `save::Save` and `load::Load` traits from this crate are implemented for a simple in-memory index.
Reference Issues/PRs
Part 1/2 of #1079. Also see #737.
What does this implement/fix? Briefly explain your changes.
Introduces a new crate, `diskann-record`, to support serialization + deserialization for DiskANN indexes. It provides a small, backend-agnostic framework for persisting structured Rust values as a versioned manifest plus side-car binary artifacts, and reloading them later.
`diskann_record` is intended to be a new foundational crate, so it only depends on `anyhow` (`serde` is feature-gated and only used for on-disk formats).
Summary of changes:
- Traits (`diskann_record::save::Save` and `diskann_record::load::Load`), along with two macros, `load_fields` and `save_fields`, for simple, plain `struct`s.
- `Version` that allows for durable indexes: `load_legacy` explicitly supports loading serialized representations of older versions of the same struct.
- `Disk` backend that writes artifacts to individual files, and a `Memory` backend that simply serializes to a `Vec<u8>`. The `Disk` backend depends on `serde`, whereas `Memory` does not.
- `diskann-record` is generic enough to allow saving with one backend and loading with another. It boils down to how `SaveContext`/`LoadContext` are implemented for each backend. The exception is the `Memory` backend: it pipes the output of its `SaveContext` directly into its `LoadContext`, so the data never leaves in-process memory.
Any other comments?
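For readers who want a picture of the versioned-load idea before opening #1079, here is a minimal sketch. It is not the crate's API. The `Load` trait shape, the `(major, minor)` form of `Version`, the flat `&[(&str, u64)]` field list, and the `GraphConfig` struct with its field names and default are all assumptions made up for illustration. It shows a loader that dispatches to `load` when the stored version matches the current one and to `load_legacy` otherwise.

```rust
/// Hypothetical schema version as a (major, minor) pair.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
}

/// Hypothetical load trait: `load` handles the current layout and
/// `load_legacy` is the explicit hook for older layouts.
pub trait Load: Sized {
    const VERSION: Version;
    fn load(fields: &[(&str, u64)]) -> Result<Self, String>;
    fn load_legacy(version: Version, _fields: &[(&str, u64)]) -> Result<Self, String> {
        Err(format!("unsupported version {:?}", version))
    }
}

/// Dispatch on the version recorded next to the stored fields.
pub fn load_versioned<T: Load>(version: Version, fields: &[(&str, u64)]) -> Result<T, String> {
    if version == T::VERSION {
        T::load(fields)
    } else {
        T::load_legacy(version, fields)
    }
}

/// Look up a field by name in the flat field list.
fn get(fields: &[(&str, u64)], key: &str) -> Result<u64, String> {
    fields
        .iter()
        .find(|entry| entry.0 == key)
        .map(|entry| entry.1)
        .ok_or_else(|| format!("missing field {:?}", key))
}

/// Hypothetical struct whose layout changed between 0.1 and 0.2.
#[derive(Debug, PartialEq)]
pub struct GraphConfig {
    pub max_degree: u64,
    pub search_list_size: u64,
}

const LEGACY_V0_1: Version = Version { major: 0, minor: 1 };

impl Load for GraphConfig {
    const VERSION: Version = Version { major: 0, minor: 2 };

    fn load(fields: &[(&str, u64)]) -> Result<Self, String> {
        Ok(GraphConfig {
            max_degree: get(fields, "max_degree")?,
            search_list_size: get(fields, "search_list_size")?,
        })
    }

    fn load_legacy(version: Version, fields: &[(&str, u64)]) -> Result<Self, String> {
        if version == LEGACY_V0_1 {
            // In this made-up history, 0.1 named the field `degree` and had
            // no search list size, so a default of 100 is filled in.
            Ok(GraphConfig {
                max_degree: get(fields, "degree")?,
                search_list_size: 100,
            })
        } else {
            Err(format!("unsupported version {:?}", version))
        }
    }
}
```

With this shape, an index written at 0.1 stays loadable after the struct changes, and an unknown version fails loudly instead of being misread.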