[RFC/diskann] Overhaul paged search#1078
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1078 +/- ##
==========================================
+ Coverage 89.46% 90.57% +1.10%
==========================================
Files 473 474 +1
Lines 89653 89740 +87
==========================================
+ Hits 80212 81278 +1066
+ Misses 9441 8462 -979
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR overhauls DiskANN’s paged (iterative) search API to remove the SearchState<..., ExtraState: 'static> pattern and instead return a lifetime-bound PagedSearch<'a, ...> handle, enabling non-'static query computers/strategies and reducing trait-bound complexity. It also updates downstream wrappers/tests and adds an RFC documenting a channel-based pattern for crossing tokio::spawn/FFI boundaries.
Changes:
- Remove the
'staticbound fromBuildQueryComputer::QueryComputer. - Replace the
start_paged_search/next_search_resultsAPI withDiskANNIndex::paged_search{_with_init_ids}returning aPagedSearchhandle withnext_page. - Update
diskann-providerssync wrapper + test cases, and add an RFC describing the new model and migration guidance.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| rfcs/01078-paged-search.md | RFC describing the motivation, new API shape, and a channel-based spawned-task usage pattern. |
| diskann/src/provider.rs | Drops 'static from BuildQueryComputer::QueryComputer to allow borrowed query computers. |
| diskann/src/graph/test/cases/paged_search.rs | Updates paged-search tests to use PagedSearch::next_page. |
| diskann/src/graph/search/scratch.rs | Gates SearchScratch::search_l() behind #[cfg(test)]. |
| diskann/src/graph/search/paged.rs | Introduces the new PagedSearch handle implementation and paging logic. |
| diskann/src/graph/search/mod.rs | Wires the new paged module and re-exports PagedSearch. |
| diskann/src/graph/index.rs | Removes old SearchState/paged-search API and adds paged_search{_with_init_ids} constructors returning PagedSearch. |
| diskann-providers/src/index/wrapped_async.rs | Updates synchronous wrapper to return a blocking PagedSearch wrapper around the async handle. |
| diskann-providers/src/index/diskann_async.rs | Updates async provider tests/helpers to use PagedSearch::next_page. |
Comments suppressed due to low confidence (2)
diskann-providers/src/index/wrapped_async.rs:356
- These synchronous wrapper methods still require
S: SearchStrategy<DP, T> + 'static, but the underlyingDiskANNIndex::paged_searchno longer needs'static. Keeping this bound unnecessarily restricts callers from using non-'staticstrategies (the main goal of this RFC). Consider dropping the+ 'staticbound here as well.
pub fn paged_search<'a, S, T>(
&'a self,
strategy: S,
context: &'a DP::Context,
query: T,
l_value: usize,
) -> ANNResult<PagedSearch<'a, DP, S, T>>
where
S: SearchStrategy<DP, T> + 'static,
T: Copy + Send + 'a,
{
diskann/src/graph/index.rs:2211
computed_resultis initialized withvec![Neighbor::default(); l_value]andnext_result_indexis set tol_valueto represent an empty cache. SincePagedSearch::next_pagenow returns an ownedVec, you can avoid the O(l_value) initialization cost by usingVec::with_capacity(l_value)(orVec::new()) and startingnext_result_indexat 0.
ANNResult::Ok(PagedSearch {
index: self,
context,
scratch,
computed_result: vec![Neighbor::default(); l_value],
next_result_index: l_value,
search_param_l: l_value,
strategy,
computer,
_query: std::marker::PhantomData,
})
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Paged search has been causing all kinds of issues for our code base and is actively getting in the way of simplifications in #1067 due to interactions with the
PagedSearchState. The TLDR of the issue is thatPagedSearchStaterequires types to be'staticand introduces the need to "pause" and "resume" search state in a way that is complex to describe in trait bounds.Since our code is already async, we can lean into that and use the usual Rust machinery to embed non-
'staticpaged searcher inside an otherwise'staticfuture. The recommended way to now interact with paged search is via channels.Rendered RFC
API Migration Guide
index.start_paged_search(s, ctx, q, l).awaitindex.paged_search(s, ctx, q, l).awaitindex.next_search_results(ctx, &mut state, k, &mut buf).awaitsearch.next_page(k).awaitSearchState<Id, (S, C)>PagedSearch<'a, DP, S, T>PagedSearchState<DP, S, C>PagedSearch<'a, DP, S, T>page.is_empty()If existing code embedded the
SearchStatein some'staticcontainer, that is no longer viable because of the borrow. Instead, channels can be used for this communication:If code was already explicitly using a
.awaitloop withSearchState, then minimal changes should be needed.For Users of Paged Search via
wrapped_asyncUsers of paged search via
wrapped_async::DiskANNIndexthat know their inner futures will never suspend can use the newwrapped_async::DiskANNIndex::paged_search_no_await. This will use the new API transparently viawrapped_async::noawait::PagedSearchand efficiently run paged searches with minimal synchronization overhead.This should only be used if the implementation of
Accessor,BuildQueryComputer,SearchExt,DataProvider, andExpandBeamare known to never yield and always complete withPoll::Ready.