Revise README for DiskANN3 #1046

Merged: harsha-simhadri merged 13 commits into main from harshasi/update_readme on May 13, 2026

@harsha-simhadri

Updated README to reflect changes in DiskANN3 and added details about the Provider API and getting started guide.

Corrected formatting and improved clarity in the README.

Copilot AI left a comment

Pull request overview

Updates the top-level README to describe DiskANN3 (Rust main branch) and its Provider API, plus a short “Getting Started” section pointing to benchmarking and provider integration entry points.

Changes:

  • Replaces the prior badge/paper-heavy intro with a DiskANN3 overview and feature list.
  • Adds Provider API context and a “Getting Started” section with links to benchmarks and the provider contract.
  • Moves badges and paper links into the “Legacy C++ Code” section.

Updated the README to clarify the DiskANN3 library's purpose and usage, including changes to the description of the API and algorithmic features.
Updated project name and description in README.
@codecov-commenter

codecov-commenter commented May 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.64%. Comparing base (3d3ed4c) to head (e9c0139).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1046      +/-   ##
==========================================
+ Coverage   90.60%   90.64%   +0.04%     
==========================================
  Files         461      461              
  Lines       85494    85920     +426     
==========================================
+ Hits        77462    77884     +422     
- Misses       8032     8036       +4     
Flag Coverage Δ
miri 90.64% <ø> (+0.04%) ⬆️
unittests 90.60% <ø> (+0.04%) ⬆️

see 25 files with indirect coverage changes

harsha-simhadri changed the title from "Revise README for DiskANN3 updates" to "Revise README for DiskANN3" on May 10, 2026
harsha-simhadri enabled auto-merge (squash) on May 10, 2026 23:01
harsha-simhadri merged commit 3fb79a7 into main on May 13, 2026
22 checks passed
harsha-simhadri deleted the harshasi/update_readme branch on May 13, 2026 22:10
arkrishn94 added a commit that referenced this pull request May 28, 2026
# DiskANN v0.53.0 Release Notes

## Breaking Changes

The changes below were summarized by AI and reviewed by a human.

### Paged search overhauled — channel-based API
([#1078](#1078))

`PagedSearchState` and its `'static`-bound pause/resume model have been
replaced with an async, channel-based interface. The recommended way to
drive paged search is now via a `tokio::sync::mpsc` channel, with the
searcher embedded in an otherwise-`'static` future. See the [rendered
RFC](https://github.com/microsoft/DiskANN/blob/main/rfcs/01078-paged-search.md)
for the new shape. Callers wired against `PagedSearchState` must migrate
to the channel API.

Callers that use paged search through `wrapped_async::DiskANNIndex` and
know that their inner futures will never suspend can use the new
`wrapped_async::DiskANNIndex::paged_search_no_await`, which runs paged
searches with minimal synchronization overhead.
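
The channel-driven shape described above can be illustrated with a minimal sketch. This is not the DiskANN API: it uses the standard library's `std::sync::mpsc::sync_channel` and a thread in place of `tokio::sync::mpsc` and an async task, and the names `ranked_ids` and `spawn_paged_search` are hypothetical. The point is only the ownership pattern: the producer owns everything it needs (so it is `'static`) and hands out one page at a time over a bounded channel.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a searcher: ranks toy 1-D points by distance to `query`.
fn ranked_ids(data: &[f32], query: f32) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..data.len()).collect();
    ids.sort_by(|&a, &b| (data[a] - query).abs().total_cmp(&(data[b] - query).abs()));
    ids
}

/// Spawns a producer that owns its inputs and sends result pages over a bounded channel.
/// The caller pulls pages from the returned receiver at its own pace.
fn spawn_paged_search(
    data: Vec<f32>,
    query: f32,
    page_size: usize,
) -> mpsc::Receiver<Vec<usize>> {
    assert!(page_size > 0, "page_size must be positive");
    // Bounded channel: the producer can run at most one buffered page ahead.
    let (tx, rx) = mpsc::sync_channel(1);
    thread::spawn(move || {
        let ranked = ranked_ids(&data, query);
        for page in ranked.chunks(page_size) {
            // A send error means the receiver was dropped, so stop producing.
            if tx.send(page.to_vec()).is_err() {
                break;
            }
        }
    });
    rx
}
```

A caller would loop on `rx.recv()` (or `rx.iter()`) and drop the receiver to stop the search early. The RFC linked above remains the reference for the real interface.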

### `DiskANNIndex::flat_search` removed
([#1076](#1076))

`DiskANNIndex::flat_search` and the `IdIterator` trait have been removed
from the `diskann` crate. Equivalent functionality lives on the new
inherent method `DiskIndexSearcher::flat_search` in `diskann-disk`. This
unblocks the experimental directions in #1067 and #983.

```rust
// Before
diskann_index.flat_search(query, ...)?;

// After
disk_index_searcher.flat_search(query, ...).await?;
```

### `DiskIndexSearcher::flat_search` now batched
([#1097](#1097))

The new `DiskIndexSearcher::flat_search` uses the bulk `pq_distances`
path instead of one-vector-at-a-time `Accessor::build_query_computer` +
`evaluate_similarity`. Downstream behavior is equivalent but tighter
resource bounds apply.
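
As a generic illustration of why a bulk path can be cheaper than scoring one vector at a time, the sketch below builds a per-query lookup table once and then scores a whole batch of PQ codes against it. This is the textbook asymmetric-distance technique, not the code from #1097; `build_table`, `score_codes`, and the data layout are assumptions made for the example.

```rust
/// Builds table[chunk][pivot] = squared L2 distance between the query's
/// slice for that chunk and the given pivot. `pivots[chunk][pivot]` holds
/// the pivot coordinates; chunks are laid out back to back in the query.
fn build_table(query: &[f32], pivots: &[Vec<Vec<f32>>]) -> Vec<Vec<f32>> {
    let mut offset = 0;
    let mut table: Vec<Vec<f32>> = Vec::with_capacity(pivots.len());
    for chunk_pivots in pivots {
        let width = chunk_pivots[0].len();
        let q = &query[offset..offset + width];
        let row: Vec<f32> = chunk_pivots
            .iter()
            .map(|p| p.iter().zip(q).map(|(a, b)| (a - b) * (a - b)).sum::<f32>())
            .collect();
        table.push(row);
        offset += width;
    }
    table
}

/// Scores a batch of PQ codes against one table. `codes` is row-major with
/// one byte per chunk, so each distance is a sum of table lookups.
fn score_codes(table: &[Vec<f32>], codes: &[u8]) -> Vec<f32> {
    let n_chunks = table.len();
    codes
        .chunks_exact(n_chunks)
        .map(|code| {
            code.iter()
                .zip(table)
                .map(|(&c, row)| row[c as usize])
                .sum::<f32>()
        })
        .collect()
}
```

The table is computed once per query, so the per-vector cost drops to one lookup and one addition per chunk.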

### `centroid` removed from PQ interfaces
([#1010](#1010))

The dataset-centroid argument has been removed from `FixedChunkPQTable`
construction, `populate`, and most other PQ APIs. The shift only ever
worked for L2 distance and was silently ignored for inner-product /
cosine, so passing it was a footgun. When an L2 shift is required, fold
it into the PQ pivots instead (the library now does this internally).

```rust
// Before
let table = FixedChunkPQTable::new(.., centroid, ..);

// After — drop the centroid argument
let table = FixedChunkPQTable::new(.., ..);
```
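
Why an L2 shift can be folded into the pivots: for a query `q`, centroid `c`, and a pivot `p` trained on centered data, `||(q - c) - p||^2 = ||q - (p + c)||^2`, so adding the centroid to each pivot once gives the same distances without passing `c` around. The sketch below checks that identity on a small example. The function names are hypothetical, and how the library applied the shift before this change is an assumption.

```rust
/// Squared L2 distance between two equal-length slices.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Assumed "before" shape: shift the query by the centroid, then compare
/// against a pivot that was trained on centered data.
fn dist_with_centroid(query: &[f32], centroid: &[f32], pivot: &[f32]) -> f32 {
    let shifted: Vec<f32> = query.iter().zip(centroid).map(|(q, c)| q - c).collect();
    l2_sq(&shifted, pivot)
}

/// "After" shape: add the centroid into the pivot once, up front.
fn fold_centroid(pivot: &[f32], centroid: &[f32]) -> Vec<f32> {
    pivot.iter().zip(centroid).map(|(p, c)| p + c).collect()
}
```

With `query = [1.0, 2.0]`, `centroid = [0.5, 0.5]`, and `pivot = [0.25, -0.25]`, both `dist_with_centroid(&query, &centroid, &pivot)` and `l2_sq(&query, &fold_centroid(&pivot, &centroid))` come out to the same value. With arbitrary floating-point inputs the two can differ by rounding error.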

### Flat search interface
([#983](#983))

A new `flat` module in `diskann` adds a provider-agnostic brute-force
search surface, mirroring the shape of graph search. Backends implement
a single trait, `DistancesUnordered<C>` (in `flat/strategy.rs`), which
fuses iteration and distance computation, allowing any backend
(in-memory, quantized, disk, remote) to plug into a shared algorithm.
See the [rendered
RFC](https://github.com/microsoft/DiskANN/blob/main/rfcs/00983-flat-search.md).
This is additive but is the new canonical surface — direct ad-hoc
flat-search call sites should migrate.
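
To make "fuses iteration and distance computation" concrete, here is a minimal sketch of the idea. The trait below is hypothetical and is not the actual `DistancesUnordered<C>` signature; `VisitDistances`, `InMemory`, and `flat_top_k` are names invented for the example. The backend walks its own storage and reports `(id, distance)` pairs in any order, and a shared routine keeps the `k` best.

```rust
/// Hypothetical trait (not DiskANN's real signature): the backend iterates
/// its own storage and reports (id, distance) pairs in arbitrary order.
trait VisitDistances {
    fn visit_distances(&self, query: &[f32], visit: &mut dyn FnMut(u32, f32));
}

/// Toy in-memory backend holding row-major vectors of a fixed dimension.
struct InMemory {
    dim: usize,
    data: Vec<f32>,
}

impl VisitDistances for InMemory {
    fn visit_distances(&self, query: &[f32], visit: &mut dyn FnMut(u32, f32)) {
        for (id, row) in self.data.chunks_exact(self.dim).enumerate() {
            // Squared L2 distance, computed while iterating.
            let d: f32 = row.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
            visit(id as u32, d);
        }
    }
}

/// Shared brute-force algorithm: works with any backend implementing the trait.
/// Re-sorting on every visit keeps the sketch short; a heap would be faster.
fn flat_top_k<B: VisitDistances>(backend: &B, query: &[f32], k: usize) -> Vec<(u32, f32)> {
    let mut best: Vec<(u32, f32)> = Vec::new();
    backend.visit_distances(query, &mut |id: u32, d: f32| {
        best.push((id, d));
        best.sort_by(|a, b| a.1.total_cmp(&b.1));
        best.truncate(k);
    });
    best
}
```

A quantized or disk-backed backend would implement the same trait with its own iteration and distance code, and `flat_top_k` would not change.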

### `bf_tree` extracted into `diskann-bftree` crate
([#1020](#1020))

The bf_tree provider has been moved out of `diskann-providers`
(previously at
`diskann-providers/src/model/graph/provider/async_/bf_tree/`) into a new
standalone `diskann-bftree` crate. Along with the move:

- Switched from PQ to spherical quantization.
- Dropped dependencies on `DeletionCheck`, `AsDeletionCheck`, and
`RemoveDeletedIdsAndCopy`.
- Simplified generics.

Consumers must update their `Cargo.toml` to depend on `diskann-bftree`
and update import paths.

### `direct_distance_impl` and `inner_product_raw` re-exposed
([#1081](#1081))

`direct_distance_impl` (free function) and
`FixedChunkPQTable::inner_product_raw` are `pub` again after being
made private in #1044. They were restored to unblock a downstream user.
This is not a breaking change in the usual sense: it restores previously
available API surface.

### MinMax `recompress` takes a grid-scale parameter
([#1109](#1109))

The MinMax `recompress` API now accepts a grid-scale parameter. 

## New Features

- SIMD-optimized L2-squared norm ([#1107](#1107))
- Significantly faster bitmap computation ([#1099](#1099))
  - Large speedup on the bitmap construction path used by filtered search.
- LLVM IR bloat regression check in CI ([#1083](#1083))
  - CI now flags regressions in generated LLVM IR size, helping catch
    unintended monomorphization blow-ups.
- Recall computation fix for under-k groundtruth ([#1069](#1069))
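
The notes do not say how #1069 handles groundtruth lists shorter than k. One common convention, shown here purely as an assumption, is to divide by `min(k, groundtruth.len())` so that a query with fewer than k true neighbors can still reach a recall of 1.0. The function name and signature are hypothetical, and result ids are assumed to be unique.

```rust
use std::collections::HashSet;

/// recall@k with the denominator clamped to the number of groundtruth
/// entries actually available. Returns 0.0 when there is nothing to match.
fn recall_at_k(results: &[u32], groundtruth: &[u32], k: usize) -> f64 {
    let denom = k.min(groundtruth.len());
    if denom == 0 {
        return 0.0;
    }
    let gt: HashSet<u32> = groundtruth.iter().take(k).copied().collect();
    let hits = results
        .iter()
        .take(k)
        .copied()
        .filter(|id| gt.contains(id))
        .count();
    hits as f64 / denom as f64
}
```

Dividing by k instead would cap the recall of a query with three true neighbors at 0.6 for k = 5, even when all three are returned.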

## Merged PRs

* Revise README for DiskANN3 by @harsha-simhadri in
#1046
* [CI] Try to fix publishing step by @hildebrandmw in
#1057
* [benchmark] Remove `DispatchRule` by @hildebrandmw in
#1064
* [benchmark] Automatic Input Registration by @hildebrandmw in
#1066
* Remove centroid from most PQ interfaces by @hildebrandmw in
#1010
* [diskann/disk] Remove `flat_search` from `DiskANNIndex` by
@hildebrandmw in #1076
* macos build and miri check to nightly by @harsha-simhadri in
#1058
* [API] Make some methods public again by @hildebrandmw in
#1081
* [benchmark] Simply `Inputs` more by @hildebrandmw in
#1077
* Turn on stack protection for the diskann-garnet NuGet build by
@jackmoffitt in #1082
* Fix options for diskann-garnet nuget pipeline by @jackmoffitt in
#1091
* [CI] add LLVM IR bloat regression check by @arazumov in
#1083
* Bump openssl from 0.10.79 to 0.10.80 by @dependabot[bot] in
#1093
* [Disk CI benchmarks] Use 1ES.Pool=diskann-github by @arazumov in
#869
* Fix recall computation for fewer than k groundtruth results by
@magdalendobson in #1069
* bf_tree migration away from diskann-providers by @JordanMaples in
#1020
* [RFC/diskann] Overhaul paged search by @hildebrandmw in
#1078
* Remove unsafe code from compute_vec_l2sq by @arazumov in
#1094
* Remove direct accessor call in `diskann-garnet` by @hildebrandmw in
#1098
* Refactor `DiskIndexSearcher::flat_search` to use batching by
@hildebrandmw in #1097
* [flat index] Flat Search Interface by @arkrishn94 in
#983
* migrating multi-hop tests from diskann-providers to diskann by
@JordanMaples in #928
* Significantly speed up bitmap computation by @magdalendobson in
#1099
* `compute_vecs_l2sq`: Replace scalar L2 Squared norm with
SIMD-optimized FastL2NormSquared by @arazumov in
#1107
* [minmax] Add grid scaling to recompress API by @arkrishn94 in
#1109

**Full Changelog**:
v0.52.0...v0.53.0
7 participants