feat: support segmented btree indices#6605
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
@claude review |
| let (page_lookup_file, standalone_partition_page_file) = | ||
| match store.open_index_file(BTREE_LOOKUP_NAME).await { | ||
| Ok(page_lookup_file) => (page_lookup_file, None), | ||
| Err(original_err) => { | ||
| let files = store.list_files_with_sizes().await?; | ||
| let Some((lookup_file, page_file)) = find_single_partition_files(&files)? | ||
| else { | ||
| return Err(original_err); | ||
| }; | ||
| ( | ||
| store.open_index_file(lookup_file).await?, | ||
| Some(page_file.to_string()), | ||
| ) | ||
| } | ||
| }; |
There was a problem hiding this comment.
🟡 BTreeIndex::load falls back to loading standalone part_* segment files whenever open_index_file(BTREE_LOOKUP_NAME) returns any Err, not just NotFound. For the normal segmented-btree case where the merged file was never written this is correct, but if the merged page_lookup.lance exists yet fails to open for a transient IO, permission, version-conflict, or corruption reason — and exactly one orphan part_*_page_lookup.lance/part_*_page_data.lance pair happens to coexist — the loader will silently return an index backed by the stale partition pair and drop the original error. Narrowing the fallback to err.is_not_found() (see Error::NotFound in rust/lance-core/src/error.rs) would surface the real failure at essentially zero cost.
Extended reasoning...
What the bug is
In rust/lance-index/src/scalar/btree.rs around lines 1219-1233, BTreeIndex::load has a new fallback:
let (page_lookup_file, standalone_partition_page_file) =
match store.open_index_file(BTREE_LOOKUP_NAME).await {
Ok(page_lookup_file) => (page_lookup_file, None),
Err(original_err) => {
let files = store.list_files_with_sizes().await?;
let Some((lookup_file, page_file)) = find_single_partition_files(&files)?
else {
return Err(original_err);
};
...
}
};The Err(original_err) arm catches any error from open_index_file, not just a missing-file error. Error::NotFound is an explicit enum variant (rust/lance-core/src/error.rs:155), and open_index_file in rust/lance-index/src/scalar/lance_format.rs:272 can also return transient IO errors, permission errors, VersionConflict that falls back to v1 and then fails, and corrupt-file errors.
How it manifests
If the merged page_lookup.lance physically exists but fails to open for a non-NotFound reason, and find_single_partition_files finds exactly one part_*_page_lookup.lance/part_*_page_data.lance pair with matching partition IDs, the loader silently constructs a BTreeIndex backed by the stale/orphan partition pair and discards original_err. Subsequent queries return results from the wrong set of rows rather than surfacing the real error.
Step-by-step proof
- A merge operation writes
page_lookup.lancebut the write is interrupted or the underlying storage corrupts the footer. - Per
cleanup_partition_files(lines ~2231 / ~2305), cleanup logs and continues on failure. If a merge-cleanup race leaves one orphanpart_0_page_lookup.lance/part_0_page_data.lancepair behind (e.g., the merge source set had one partition, or cleanup partially succeeded for the other partitions), the fallback preconditionlookup_files.len() == 1 && page_files.len() == 1is met. store.open_index_file(BTREE_LOOKUP_NAME).awaitreturnsErr(corruption_err)(not NotFound).- The match arm takes the
Errbranch, lists files, finds the single orphan pair, opens the partition lookup file successfully, and constructs aBTreeIndexfrom the stale pair. corruption_erris never propagated; the caller seesOk(index)and queries return data from the stale partition.
Addressing the refutation
The refutation is largely correct that the trigger scenario is narrow, and I agree the severity is low. A few points to refine:
- The analogy to
inverted/index.rs:652-725is fair, but "consistent with prior code" does not make the fallback correct — if the same pattern is used elsewhere, those call sites share the same latent gap. - "Cleanup failure would leave N files, not 1" is true for a multi-partition merge, but segments are written per-fragment and can legitimately have a single partition per segment. In that case, the merged file has been written (normal segmented-btree flow still goes through
merge_metadata_files), and any corruption of that merged file will coexist with the one-partition orphan scenario. - "Loading from partition files when they exist is correct behavior" is only true when the partition files are the current source of truth. Once a merged file has been written, the partition files become stale — they may reflect an earlier version of the index.
How to fix
Narrow the match to NotFound-like errors:
Err(original_err) if original_err.is_not_found() => { ... }
Err(other) => return Err(other),(or add an is_not_found() helper on Error that checks matches!(self, Error::NotFound { .. }) and any equivalent object-store variants).
Impact
Low. Requires a corrupted/partial merged file and exactly one orphan part_* pair surviving cleanup. All three independent verifiers agreed on "nit". The fix is a one-line change that removes a silent-failure path at no cost.
There was a problem hiding this comment.
@beinan could you help address this? This is a valid error
There was a problem hiding this comment.
Fixed in ca0d6b191.
I narrowed the fallback so we only try the standalone part_* BTREE files for true missing-lookup cases, while preserving other open failures. In this code path the missing lookup can surface either as Error::NotFound or as Error::IO wrapping an object-store NotFound, so the final check handles both shapes and still returns all other errors unchanged.
Reran the focused BTREE coverage after the fix:
cargo test -p lance test_open_named_scalar_index_uses_all_btree_segments -- --nocapturecargo test -p lance test_btree_segment_search_is_exact_across_fragments -- --nocapture
Both passed locally.
jackye1995
left a comment
There was a problem hiding this comment.
sorry I missed the update notification... looks good to me!
Summary
Add segmented BTREE support on top of the logical scalar index segment path that landed for zonemap.
This teaches BTREE loading to open committed standalone
part_*segment artifacts when a mergedpage_lookup.lanceis not present, and adds coverage that logical BTREE segments load together, return exact results across fragments, and drive scanner prefix pruning.Testing