perf(index): fast-path total rows and cache index_statistics output #6582
Open
justinrmiller wants to merge 2 commits into lance-format:main from
Conversation
`Dataset::index_statistics` called `count_rows(None)` on every invocation purely to compute `num_unindexed_rows = total - indexed`. On large or legacy datasets `count_rows` fans out per-fragment and can open fragment data files, making it the dominant cost. In addition, repeat calls at the same manifest version recomputed the full JSON from scratch even though the answer is deterministic.

P1: introduce `manifest_total_rows`, which sums `Fragment::num_rows()` from the manifest. If every fragment has `physical_rows`, the total is resolved in memory. If any is missing, fall through to the existing `count_rows(None)` path — no correctness regression for legacy datasets.

P2: wrap `index_statistics` with a `DSIndexCache` get/insert keyed on `(manifest_version, index_name)`. The cache already scopes entries by dataset URI; manifest version is monotonic and bumps on every operation that can change the result (append, delete, compact, index create/optimize/drop), so invalidation is automatic — no write path ever needs to touch the cache.

Both changes are additive. Failed calls don't populate the cache. No public API or manifest format change.

Benchmarks (10M rows, 10K fragments, local SSD, Apple Silicon):

- `count_rows_baseline`: 1.82 ms
- `legacy_cold` (pre-P1/P2): 5.33 ms
- `cold` (P1 only): 3.56 ms (-33% / -1.77 ms)
- `cached` (P1 + P2): 21.7 µs (~245x vs `legacy_cold`)

P1 savings scale linearly with fragment count (~200 ns/fragment in-memory; orders of magnitude more on legacy formats or slow object stores where `count_rows` reads data files). P2 collapses repeat calls to a moka lookup plus a string clone.

Tests added:

- `test_index_statistics_row_counts_match_count_rows`
- `test_manifest_total_rows_missing_metadata_returns_none`
- `test_index_statistics_cache_hit_avoids_io`
- `test_index_statistics_cache_invalidates_on_manifest_bump`
- `test_index_statistics_cache_distinguishes_index_names`

Benchmark added:

- `benches/index_stats.rs` (legacy vs cold vs cached, parity check)
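The P1 fast path can be sketched as follows. This is a minimal stand-alone sketch, not Lance's actual code: `Fragment` here is a simplified stand-in for the manifest fragment type, kept only to show the all-or-nothing fallback behavior (any fragment missing `physical_rows` forces the `count_rows(None)` path).

```rust
/// Hypothetical stand-in for a manifest fragment; only the field
/// relevant to the fast path is shown.
struct Fragment {
    physical_rows: Option<usize>, // absent on legacy-format fragments
}

impl Fragment {
    fn num_rows(&self) -> Option<usize> {
        self.physical_rows
    }
}

/// Sum per-fragment row counts from the manifest. Some(total) only when
/// every fragment carries the metadata; None tells the caller to fall
/// through to the existing `count_rows(None)` scan.
fn manifest_total_rows(fragments: &[Fragment]) -> Option<usize> {
    // Summing an iterator of Option<usize> yields None if any item is None.
    fragments.iter().map(|f| f.num_rows()).sum()
}

fn main() {
    let modern = [
        Fragment { physical_rows: Some(40) },
        Fragment { physical_rows: Some(2) },
    ];
    let legacy = [
        Fragment { physical_rows: Some(40) },
        Fragment { physical_rows: None },
    ];
    assert_eq!(manifest_total_rows(&modern), Some(42)); // resolved in memory
    assert_eq!(manifest_total_rows(&legacy), None); // count_rows fallback
    println!("ok");
}
```

The `Option`-summing idiom is what makes the fallback exact: the fast path never guesses a partial total, so legacy datasets see byte-identical behavior.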
## Summary

Two optimizations to `Dataset::index_statistics` that together turn a ~5 ms call into ~22 µs for the common repeat-call pattern on a 10 M-row / 10 K-fragment dataset, with exact-behavior fallback for legacy datasets.

### P1 — skip `count_rows(None)` when the manifest can answer

`gather_fragment_statistics` was calling `Dataset::count_rows(None)` purely to derive `num_unindexed_rows = total - indexed`. On modern datasets that's an in-memory fan-out; on legacy fragments without `physical_rows`/`writer_version` it opens the first data file of every fragment from object storage. New `manifest_total_rows` helper sums `Fragment::num_rows()` in memory and falls through to the existing `count_rows` path only when any fragment is missing that metadata.

### P2 — cache the JSON keyed on manifest version

`index_statistics` is a pure function of `(dataset URI, manifest version, index name)`. The `DSIndexCache` already scopes entries by dataset URI; manifest version bumps on every operation that can change the answer (append, delete, compact, index create/optimize/drop), so keying on it gives automatic invalidation with no write-path coupling. New `IndexStatisticsKey` alongside the existing `IndexMetadataKey`/`ScalarIndexDetailsKey`; `index_statistics` wraps the existing dispatch with a get/insert.

Errors don't poison the cache. No manifest format change, no public API change.
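The keying and invalidation logic boils down to the sketch below. These are illustrative stand-ins, not Lance's actual types: the real `DSIndexCache` is a moka cache scoped by dataset URI; a plain `HashMap` is used here just to show why a version bump invalidates for free and why failed calls stay out of the cache.

```rust
use std::collections::HashMap;

/// Hypothetical key mirroring the (manifest_version, index_name) scheme.
#[derive(Hash, PartialEq, Eq)]
struct IndexStatisticsKey {
    manifest_version: u64,
    index_name: String,
}

#[derive(Default)]
struct StatsCache {
    entries: HashMap<IndexStatisticsKey, String>, // cached JSON blobs
}

impl StatsCache {
    /// Return the cached JSON for this key, or run `compute` and cache
    /// its result. Errors propagate without being cached.
    fn get_or_compute(
        &mut self,
        manifest_version: u64,
        index_name: &str,
        compute: impl FnOnce() -> Result<String, String>,
    ) -> Result<String, String> {
        let key = IndexStatisticsKey {
            manifest_version,
            index_name: index_name.to_string(),
        };
        if let Some(json) = self.entries.get(&key) {
            return Ok(json.clone()); // repeat call: lookup + string clone
        }
        let json = compute()?; // failed calls don't populate the cache
        self.entries.insert(key, json.clone());
        Ok(json)
    }
}

fn main() {
    let mut cache = StatsCache::default();
    let mut computes = 0;

    // Two calls at the same manifest version: one compute, one hit.
    for _ in 0..2 {
        let _ = cache.get_or_compute(7, "btree_idx", || {
            computes += 1;
            Ok(r#"{"num_indexed_rows":10}"#.to_string())
        });
    }
    assert_eq!(computes, 1);

    // An append bumps the manifest version; the stale entry is never read.
    let _ = cache.get_or_compute(8, "btree_idx", || {
        computes += 1;
        Ok(r#"{"num_indexed_rows":10,"num_unindexed_rows":5}"#.to_string())
    });
    assert_eq!(computes, 2);
    println!("ok");
}
```

Because every mutating operation bumps the manifest version, stale entries are simply never looked up again; they age out of the size-bounded cache rather than needing explicit eviction on the write path.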
## Benchmarks

New `benches/index_stats.rs`. Runs four paths in a single process on a shared fixture:

- `count_rows_baseline` — cost of the `count_rows(None)` fan-out alone.
- `legacy_cold` — pre-P1/P2 behavior, exercised via `#[doc(hidden)] bench_legacy_index_statistics`.
- `cold` — current behavior, fresh session per iter.
- `cached` — current behavior, warm cache.

A startup parity check asserts the legacy and current paths produce identical counters (`num_indexed_rows`, `num_unindexed_rows`, `num_indexed_fragments`, `num_unindexed_fragments`, `num_indices`) before measurement begins. BTree index on `Int32`, local SSD, Apple Silicon, `--sample-size 10 --measurement-time 2s`.

### Scaling across fixture size
(Table of `count_rows_baseline`, `legacy_cold`, `cold` (P1), and `cached` (P1 + P2) timings across fixture sizes.)

Observations:
- `count_rows` cost scales linearly with fragment count (~180–200 ns/fragment in-memory). On legacy-format fragments or slow object stores it grows by orders of magnitude — each fragment may require a `HEAD` plus a range read to recover `physical_rows`.
- `cold` lands close to `legacy_cold − count_rows_baseline`, which is the theoretical floor for P1.

## Test plan
- `cargo test -p lance --lib -- index::tests::` — 68 passing, including 5 new:
  - `test_index_statistics_row_counts_match_count_rows` (multi-fragment + partial index + deletes + append; asserts `indexed + unindexed == count_rows(None)`)
  - `test_manifest_total_rows_missing_metadata_returns_none` (forces the `count_rows` fallback)
  - `test_index_statistics_cache_hit_avoids_io` (second call does zero reads per `io_tracker`)
  - `test_index_statistics_cache_invalidates_on_manifest_bump` (append + `optimize_indices` both return fresh numbers)
  - `test_index_statistics_cache_distinguishes_index_names` (two indices on the same dataset don't collide)
- `cargo clippy --all --tests --benches -- -D warnings`
- `cargo fmt --all -- --check`
- `cargo bench -p lance --bench index_stats` — parity check passes, numbers above.

## Not in this change
Left for follow-up, in rough priority order:
- `indexed_fragments` (currently O(deltas × fragments) with `Fragment` clones). Cheap rewrite, ~5–10× on many-delta indices.
- `load_statistics` for BTree / Inverted / BloomFilter / LabelList — only Bitmap has it today; others fall back to `open_generic_index`, which materializes the full index file. 20–50× potential on cold-cache calls, medium-effort per-plugin work.
- `num_indexed_rows` on `IndexMetadata` — manifest format change; probably unnecessary after P1 + P2.
- `index_stats`; #4483 added a Phalanx-side timeout as mitigation.