
Introduce diskann-record crate to support serialization + deserialization of DiskANN indexes #1188

Merged
suhasjs merged 23 commits into main from users/suhasja/saveload-core on Jun 30, 2026

Conversation

suhasjs commented Jun 18, 2026

Contributor

This PR is part 1/2 of #1079 and only introduces the diskann-record crate. See #1079 for sample output when the save::Save and load::Load traits from this crate are implemented for a simple in-memory index.

  • Does this PR have a descriptive title that could go in our release notes?
  • Does this PR add any new dependencies?
  • Does this PR modify any existing APIs?
  • Is the change to the API backwards compatible?
  • Should this result in any changes to our documentation, either updating existing docs or adding new ones?

Reference Issues/PRs

Part 1/2 of #1079. Also see #737.

What does this implement/fix? Briefly explain your changes.

Introduces a new crate, diskann-record, to support serialization + deserialization for DiskANN indexes. It provides a small, backend-agnostic framework for persisting structured Rust values as a versioned manifest plus side-car binary artifacts, and reloading them later. diskann_record is intended to be a new foundational crate, so it only depends on anyhow (serde is feature-gated and only used for on-disk formats).

Summary of changes:

  • Introduces two new traits (diskann_record::save::Save and diskann_record::load::Load), along with two macros load_fields and save_fields for simple, plain structs.
  • Records carry a Version, which allows for durable indexes: load_legacy explicitly supports loading serialized representations of older versions of the same struct.
  • The manifest is JSON-compatible and carries a list of artifact filenames (if serialized using the Disk backend).
  • Implements two custom backends: a Disk backend that writes artifacts to individual files, and a Memory backend that simply serializes to a Vec<u8>. The Disk backend depends on serde, whereas the Memory backend does not.
  • diskann-record is generic enough to allow saving with one backend and loading with another. This boils down to how SaveContext/LoadContext are implemented for each backend. The exception is the Memory backend: it pipes the output of its SaveContext directly into its LoadContext, so the data never leaves in-process memory.
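
To make the versioning point above concrete, here is a minimal sketch of how a save/load trait pair with a legacy-load hook could look. It is not the crate's actual API: `Versioned`, `Record`, `load_any`, the string-valued fields, and the `IndexConfig` struct with its 1.0 and 2.0 layouts are all hypothetical names invented for illustration. Only the idea of a versioned record and a `load_legacy` path comes from the description.

```rust
use std::collections::BTreeMap;
use std::str::FromStr;

/// Hypothetical schema version carried by every record.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version { major: u32, minor: u32 }

/// Hypothetical manifest record: a version plus named, string-encoded fields.
#[derive(Debug, Clone, PartialEq)]
struct Record { version: Version, fields: BTreeMap<String, String> }

/// Hypothetical helper trait holding the current schema version of a type.
trait Versioned { const VERSION: Version; }

/// Hypothetical save-side trait.
trait Save: Versioned { fn save(&self) -> Record; }

/// Hypothetical load-side trait with an explicit hook for older layouts.
trait Load: Versioned + Sized {
    fn load(record: &Record) -> Result<Self, String>;
    fn load_legacy(record: &Record) -> Result<Self, String>;

    /// Dispatch on the stored version: current layout or legacy path.
    fn load_any(record: &Record) -> Result<Self, String> {
        if record.version == Self::VERSION { Self::load(record) } else { Self::load_legacy(record) }
    }
}

/// Fetch and parse one named field, reporting missing or malformed values.
fn field<T: FromStr>(record: &Record, key: &str) -> Result<T, String> {
    let raw = record.fields.get(key).ok_or_else(|| format!("missing field `{}`", key))?;
    raw.parse::<T>().map_err(|_| format!("field `{}` failed to parse", key))
}

/// Example struct: layout 1.0 stored only `dim`, layout 2.0 adds `max_degree`.
#[derive(Debug, PartialEq)]
struct IndexConfig { dim: usize, max_degree: usize }

impl Versioned for IndexConfig {
    const VERSION: Version = Version { major: 2, minor: 0 };
}

impl Save for IndexConfig {
    fn save(&self) -> Record {
        let mut fields = BTreeMap::new();
        fields.insert("dim".to_string(), self.dim.to_string());
        fields.insert("max_degree".to_string(), self.max_degree.to_string());
        Record { version: Self::VERSION, fields }
    }
}

impl Load for IndexConfig {
    fn load(record: &Record) -> Result<Self, String> {
        Ok(IndexConfig { dim: field(record, "dim")?, max_degree: field(record, "max_degree")? })
    }

    fn load_legacy(record: &Record) -> Result<Self, String> {
        // Only the 1.0 layout is known; anything else is rejected.
        if record.version != (Version { major: 1, minor: 0 }) {
            return Err(format!("unsupported version {:?}", record.version));
        }
        // Assumed default for the field that 1.0 did not store.
        Ok(IndexConfig { dim: field(record, "dim")?, max_degree: 64 })
    }
}

fn main() {
    // Round trip at the current version.
    let current = IndexConfig { dim: 128, max_degree: 32 };
    let restored = IndexConfig::load_any(&current.save()).unwrap();
    assert_eq!(restored, current);

    // A record written with the older 1.0 layout still loads.
    let mut fields = BTreeMap::new();
    fields.insert("dim".to_string(), "96".to_string());
    let old = Record { version: Version { major: 1, minor: 0 }, fields };
    println!("{:?}", IndexConfig::load_any(&old).unwrap());
}
```

Keeping the legacy path in a separate method means the current-version loader stays simple, and every supported older layout is an explicit, testable branch.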

Any other comments?

…oadable] traits [THIS PR] + impls for structs [TODO]
suhasjs self-assigned this Jun 18, 2026
suhasjs requested review from a team and Copilot June 18, 2026 17:52
suhasjs added the enhancement (New feature or request) and rust (Pull requests that update rust code) labels Jun 18, 2026
suhasjs moved this to Done in DiskANN backlog Jun 18, 2026

Copilot AI left a comment

Contributor


⚠️ Not ready to approve

Sidecar artifact path handling allows unsafe/incorrect path shapes (including potential directory escape on save) and needs validation hardening before merging.
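
One way to harden this kind of path handling is to accept only a single plain file name and reject everything else before joining it onto the output directory. The sketch below uses only `std::path`. The function names `validate_artifact_name` and `artifact_path` are hypothetical, and this is not necessarily the validation the PR ended up adopting.

```rust
use std::path::{Component, Path, PathBuf};

/// Accept only a single plain file name: no separators, no `.` or `..`,
/// no absolute paths. (Hypothetical helper, for illustration only.)
fn validate_artifact_name(name: &str) -> Result<&str, String> {
    if name.is_empty() {
        return Err("artifact name is empty".to_string());
    }
    // Reject both separator styles regardless of the host platform.
    if name.contains('/') || name.contains('\\') {
        return Err(format!("artifact name `{}` contains a path separator", name));
    }
    // Exactly one component, and it must be a normal file name.
    let mut components = Path::new(name).components();
    match (components.next(), components.next()) {
        (Some(Component::Normal(_)), None) => Ok(name),
        _ => Err(format!("artifact name `{}` is not a plain file name", name)),
    }
}

/// Join a validated artifact name onto the output directory.
fn artifact_path(root: &Path, name: &str) -> Result<PathBuf, String> {
    Ok(root.join(validate_artifact_name(name)?))
}

fn main() {
    let root = Path::new("index-out");
    for name in ["0-graph.bin", "../secret", "/etc/passwd", "..", "a/b"].iter() {
        match artifact_path(root, name) {
            Ok(path) => println!("accepted {:?} -> {:?}", name, path),
            Err(err) => println!("rejected {:?}: {}", name, err),
        }
    }
}
```

Validating the name on both the save and the load side keeps a hand-edited manifest from pointing outside the index directory.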

Pull request overview

Introduces a new diskann-record crate that defines a versioned JSON-manifest + sidecar-artifact framework for persisting DiskANN-related structures, including a Save/Load trait surface, wire-level value model, and basic round-trip tests. This is positioned as the foundational crate for the follow-up PR that will implement these traits for real index types.

Changes:

  • Added save + load modules with Save/Load and Saveable/Loadable traits, plus save_fields! / load_fields! macros.
  • Implemented wire types (Value, Record, Handle), schema Version, and a lossless Number container for manifest numeric values.
  • Integrated the new crate into the workspace (members + workspace dependency) and added initial unit tests validating round-trips and handle escape rejection.
File summaries

| File | Description |
| --- | --- |
| diskann-record/src/lib.rs | Crate-level API/docs, reserved-key policy, 64-bit platform assertion, and end-to-end tests. |
| diskann-record/src/version.rs | Defines Version and its string serialization/deserialization form. |
| diskann-record/src/number.rs | Adds Number wire type and safe narrowing conversions. |
| diskann-record/src/save/mod.rs | Save-side traits, entry point, macros, and primitive Saveable impls. |
| diskann-record/src/save/context.rs | Save-side context and sidecar writer + manifest finalization logic. |
| diskann-record/src/save/error.rs | Save-side error wrapper. |
| diskann-record/src/save/value.rs | Wire-level Value/Record/Handle representations and serde behavior. |
| diskann-record/src/load/mod.rs | Load-side traits, entry point, macros, and primitive Loadable impls. |
| diskann-record/src/load/context.rs | Load-side context/object/array APIs and sidecar reader. |
| diskann-record/src/load/error.rs | Load-side error type and recoverable-vs-critical classification. |
| diskann-record/Cargo.toml | New crate manifest and dependencies. |
| Cargo.toml | Adds diskann-record to the workspace and workspace dependencies. |
| Cargo.lock | Records the new workspace package entry. |
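
As an illustration of what a lossless numeric container with safe narrowing conversions can mean in practice, here is a small sketch. The `Number` enum below and its variants are assumptions made for this example, not the definition in diskann-record/src/number.rs. The idea is to keep integers and floats in separate variants so that a value such as `u64::MAX` is never rounded through `f64`, and to narrow only through checked conversions.

```rust
use std::convert::TryFrom;

/// Hypothetical wire number: integers and floats are kept apart so that
/// large integers are never rounded through a floating-point value.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Number {
    U64(u64),
    I64(i64),
    F64(f64),
}

impl TryFrom<Number> for u32 {
    type Error = String;

    /// Narrow to u32, failing instead of truncating or rounding.
    fn try_from(n: Number) -> Result<u32, String> {
        match n {
            Number::U64(v) => u32::try_from(v).map_err(|_| format!("{} does not fit in u32", v)),
            Number::I64(v) => u32::try_from(v).map_err(|_| format!("{} does not fit in u32", v)),
            Number::F64(v) => Err(format!("{} is a float; refusing a lossy conversion", v)),
        }
    }
}

impl TryFrom<Number> for i64 {
    type Error = String;

    /// Narrow to i64; unsigned values above i64::MAX are rejected.
    fn try_from(n: Number) -> Result<i64, String> {
        match n {
            Number::U64(v) => i64::try_from(v).map_err(|_| format!("{} does not fit in i64", v)),
            Number::I64(v) => Ok(v),
            Number::F64(v) => Err(format!("{} is a float; refusing a lossy conversion", v)),
        }
    }
}

fn main() {
    println!("{:?}", u32::try_from(Number::U64(7)));
    println!("{:?}", u32::try_from(Number::U64(u64::MAX)));
    println!("{:?}", i64::try_from(Number::I64(-5)));
    println!("{:?}", u32::try_from(Number::F64(1.5)));
}
```

Returning an error on every out-of-range or float input lets a loader surface a corrupted manifest instead of silently wrapping a value.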

Copilot's findings

  • Files reviewed: 12/13 changed files
  • Comments generated: 4


Comment thread diskann-record/src/save/context.rs Outdated
Comment thread diskann-record/src/save/context.rs Outdated
Comment thread diskann-record/src/load/context.rs Outdated
Comment thread diskann-record/src/save/mod.rs
codecov-commenter commented Jun 18, 2026

Codecov Report

❌ Patch coverage is 87.69932% with 216 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.78%. Comparing base (3aa44ac) to head (c2e83b1).
⚠️ Report is 13 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann-record/src/lib.rs | 87.63% | 45 Missing ⚠️ |
| diskann-record/src/load/context.rs | 75.84% | 43 Missing ⚠️ |
| diskann-record/src/load/error.rs | 65.06% | 29 Missing ⚠️ |
| diskann-record/src/value.rs | 87.55% | 28 Missing ⚠️ |
| diskann-record/src/save/context.rs | 58.82% | 21 Missing ⚠️ |
| diskann-record/src/backend/disk.rs | 95.36% | 17 Missing ⚠️ |
| diskann-record/src/save/mod.rs | 75.40% | 15 Missing ⚠️ |
| diskann-record/src/number.rs | 92.12% | 10 Missing ⚠️ |
| diskann-record/src/backend/memory.rs | 98.25% | 3 Missing ⚠️ |
| diskann-record/src/version.rs | 93.47% | 3 Missing ⚠️ |

... and 1 more

❌ Your patch status has failed because the patch coverage (87.69%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #1188      +/-   ##
==========================================
+ Coverage   89.46%   89.78%   +0.31%     
==========================================
  Files         487      501      +14     
  Lines       92170    95331    +3161     
==========================================
+ Hits        82460    85590    +3130     
- Misses       9710     9741      +31     
| Flag | Coverage Δ | |
| --- | --- | --- |
| miri | 89.78% <87.69%> (+0.31%) | ⬆️ |
| unittests | 89.44% <87.69%> (+0.32%) | ⬆️ |


| Files with missing lines | Coverage Δ |
| --- | --- |
| diskann-record/src/save/error.rs | 100.00% <100.00%> (ø) |
| diskann-record/src/load/mod.rs | 96.55% <96.55%> (ø) |
| diskann-record/src/backend/memory.rs | 98.25% <98.25%> (ø) |
| diskann-record/src/version.rs | 93.47% <93.47%> (ø) |
| diskann-record/src/number.rs | 92.12% <92.12%> (ø) |
| diskann-record/src/save/mod.rs | 75.40% <75.40%> (ø) |
| diskann-record/src/backend/disk.rs | 95.36% <95.36%> (ø) |
| diskann-record/src/save/context.rs | 58.82% <58.82%> (ø) |
| diskann-record/src/value.rs | 87.55% <87.55%> (ø) |
| diskann-record/src/load/error.rs | 65.06% <65.06%> (ø) |

... and 2 more

... and 32 files with indirect coverage changes


hildebrandmw left a comment

Contributor

Thanks Suhas - I have one big architectural comment about allowing pluggable backend contexts. I think it will address many of the concerns about how heavy this is as a dependency, as well as support for VFS, so it is probably worth doing. Happy to help out if needed.

Comment thread diskann-record/src/save/context.rs Outdated
Comment thread diskann-record/src/value.rs
suhasjs commented Jun 22, 2026

Contributor Author

@hildebrandmw Requesting review for the following changes:

  • Moved value.rs up a level to reflect the shared usage between load/save paths
  • Trait objects to enable VFS pluggability (with a DiskContext impl for disk-based serialization, feature gated)
  • Changed SaveContext::write to take Option<&str> to treat the input key as a hint. Filename is now INTEGER-{key}, where INTEGER is just the number of artifacts written so far. Creating a random value would pull in rand, and I didn't want to add that dependency.
  • Improved test coverage
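
The counter-based naming scheme described in the list above can be sketched as follows. `ArtifactNamer`, `next_name`, and `sanitize` are hypothetical names. The fallback to a bare counter when no hint is given and the replacement of unusual characters are assumptions that the comment does not spell out.

```rust
/// Hypothetical name generator: prefixes each key hint with the number of
/// artifacts written so far, so names stay unique without a random source.
struct ArtifactNamer {
    written: usize,
}

impl ArtifactNamer {
    fn new() -> Self {
        ArtifactNamer { written: 0 }
    }

    /// Produce `INTEGER-{key}` when a usable hint is given.
    /// Assumption: with no hint, the bare counter is used as the name.
    fn next_name(&mut self, hint: Option<&str>) -> String {
        let index = self.written;
        self.written += 1;
        let key = hint.map(sanitize).unwrap_or_default();
        if key.is_empty() {
            index.to_string()
        } else {
            format!("{}-{}", index, key)
        }
    }
}

/// Assumption: characters outside a conservative set are replaced with `_`
/// so that a hint can never introduce a path separator.
fn sanitize(hint: &str) -> String {
    hint.chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '.' || c == '_' || c == '-' { c } else { '_' })
        .collect()
}

fn main() {
    let mut namer = ArtifactNamer::new();
    println!("{}", namer.next_name(Some("graph")));
    println!("{}", namer.next_name(Some("graph")));
    println!("{}", namer.next_name(None));
}
```

Because the counter only grows, two artifacts saved under the same key hint still get distinct filenames, which is the property a random suffix would otherwise provide.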

hildebrandmw left a comment

Contributor

Thanks Suhas - one more round on the SaveContext/LoadContext traits. I think it would be helpful to implement a purely in-memory backend to see how it interacts with the reader/writers. Such a backend can work directly with Value and avoid pulling in serde entirely.
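
A rough sketch of what such a purely in-memory backend might look like is shown here. The `Value` variants, `MemorySaveContext`, `MemoryLoadContext`, and their methods are all hypothetical and much simpler than whatever the crate defines. The point is only that the manifest can stay a plain Rust value and the artifacts a map of byte buffers, with no serialization library involved.

```rust
use std::collections::HashMap;

/// Hypothetical wire value; `Handle` refers to a side-car artifact by name.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Text(String),
    Object(Vec<(String, Value)>),
    Handle(String),
}

/// Hypothetical save context that keeps artifacts in process memory.
#[derive(Default)]
struct MemorySaveContext {
    artifacts: HashMap<String, Vec<u8>>,
}

impl MemorySaveContext {
    /// Store the bytes and return a handle to embed in the manifest.
    fn write(&mut self, key: &str, bytes: &[u8]) -> Value {
        let name = format!("{}-{}", self.artifacts.len(), key);
        self.artifacts.insert(name.clone(), bytes.to_vec());
        Value::Handle(name)
    }

    /// Hand the manifest and artifacts straight to the load side.
    fn finish(self, manifest: Value) -> MemoryLoadContext {
        MemoryLoadContext { manifest, artifacts: self.artifacts }
    }
}

/// Hypothetical load context fed directly by the save context.
struct MemoryLoadContext {
    manifest: Value,
    artifacts: HashMap<String, Vec<u8>>,
}

impl MemoryLoadContext {
    fn manifest(&self) -> &Value {
        &self.manifest
    }

    /// Resolve a handle back to its bytes, failing on anything else.
    fn read(&self, handle: &Value) -> Result<&[u8], String> {
        match handle {
            Value::Handle(name) => self
                .artifacts
                .get(name)
                .map(|bytes| bytes.as_slice())
                .ok_or_else(|| format!("unknown artifact `{}`", name)),
            other => Err(format!("expected a handle, found {:?}", other)),
        }
    }
}

fn main() {
    let mut save = MemorySaveContext::default();
    let handle = save.write("vectors", &[1u8, 2, 3, 4]);
    let manifest = Value::Object(vec![
        ("name".to_string(), Value::Text("demo".to_string())),
        ("dim".to_string(), Value::Integer(4)),
        ("vectors".to_string(), handle.clone()),
    ]);
    let load = save.finish(manifest);
    println!("{:?}", load.manifest());
    println!("{:?}", load.read(&handle));
}
```

Writing a backend like this is a cheap way to check that the context traits do not accidentally assume a filesystem or a particular encoding.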

Comment thread diskann-record/src/load/context.rs Outdated
Comment thread diskann-record/src/load/context.rs Outdated
Comment thread diskann-record/src/version.rs Outdated
Comment thread diskann-record/src/save/context.rs Outdated
Comment thread diskann-record/src/save/context.rs Outdated
Comment thread diskann-record/src/save/context.rs Outdated
Comment thread diskann-record/src/save/context.rs Outdated
suhasjs added 5 commits June 23, 2026 14:20
…ls into it now; added in-memory ONLY variant of SaveContext and LoadContext --> moved to backend/; added an enum Backend to choose between Disk*Context and InMemory*Context; moved Disk*Context to backend/
…ed WriterInner impls for DiskWriter and MemoryWriter; renamed InMemoryContext -> MemoryContext (same for InMemorySaveContext)

hildebrandmw left a comment

Contributor

Thanks Suhas, this is coming together. I love how lightweight it is getting when the disk backend and serde are excluded. I have a few higher-level comments, mostly about testing and fortifying the unhappy paths in addition to the happy paths.

Comment thread diskann-record/src/save/context.rs Outdated
Comment thread diskann-record/src/value.rs Outdated
Comment thread diskann-record/src/number.rs
Comment thread diskann-record/src/backend/memory.rs Outdated
Comment thread diskann-record/src/backend/memory.rs Outdated
Comment thread diskann-record/src/backend/disk.rs
Comment thread diskann-record/src/load/mod.rs
Comment thread diskann-record/README.md Outdated
Comment thread diskann-record/README.md
Comment thread diskann-record/src/value.rs

harsha-simhadri left a comment

Contributor

couple of comments inline. will read backend code once you have final updates

harsha-simhadri left a comment

Contributor

lgtm, but please see which of the comments in #1079 still apply here. There are quite a few unresolved comments from Jordan and Alex there that seem relevant.

JordanMaples left a comment

Contributor

lgtm

suhasjs merged commit 84fed73 into main Jun 30, 2026
23 checks passed
suhasjs deleted the users/suhasja/saveload-core branch June 30, 2026 22:01

Labels

enhancement (New feature or request), rust (Pull requests that update rust code)

Projects

Status: Done

6 participants