perf: fix O(N·K) slow row-id lookup on stable-row-id datasets #6716

Merged
westonpace merged 2 commits into lance-format:main from hamersaw:perf/stable-row-id-take-rows
May 11, 2026
Conversation

@hamersaw (Contributor) commented May 8, 2026

Dataset._take_rows could be very slow on datasets with stable row IDs enabled and any fragment containing deletions. RowIdIndex::get ends in U64Segment::get(pos), and for RangeWithHoles / RangeWithBitmap this was implemented as self.iter().nth(i) — an O(i) walk through the filtered iterator. Called once per input id in TakeBuilder::get_row_addrs, the total cost was O(N · K). At 1000 ids against a 4M-row dataset with 8 interior deletions, the take ran in 67s; the same workload now completes in ~7ms.

RangeWithHoles::get now binary-searches the hole-count function (which is monotone in holes[k] - k) for O(log H) lookups. RangeWithBitmap::get walks bytes with count_ones until it finds the byte holding the i-th set bit, then trailing_zeros within that byte — O(K/64). RangeWithHoles::position swaps take_while().count() for binary_search, also O(log H).
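The hole-count trick can be sketched as follows. This is an illustrative standalone function, not the actual `U64Segment` API: `get_with_holes`, its signature, and the hole representation (sorted, unique offsets relative to the range start) are assumptions for the sketch.

```rust
/// Illustrative sketch of the O(log H) lookup: `start..=end` is the
/// contiguous range, `holes` the sorted, unique deleted offsets relative to
/// `start`, and `i` the logical index among surviving values.
fn get_with_holes(start: u64, end: u64, holes: &[u32], i: usize) -> Option<u64> {
    let live = (end - start + 1) as usize - holes.len();
    if i >= live {
        return None;
    }
    // holes[k] - k = number of survivors strictly before hole k. It is
    // monotone non-decreasing because the holes are sorted and unique, so we
    // can binary search for how many holes sit at or before logical index i.
    let (mut lo, mut hi) = (0usize, holes.len());
    while lo < hi {
        let mid = (lo + hi) / 2;
        if holes[mid] as usize - mid <= i {
            lo = mid + 1; // hole `mid` precedes the i-th survivor: skip it
        } else {
            hi = mid;
        }
    }
    // `lo` holes precede the answer, so the physical offset is i + lo.
    Some(start + (i + lo) as u64)
}

fn main() {
    // Range 0..=9 with offsets 2 and 5 deleted; survivors: 0,1,3,4,6,7,8,9.
    assert_eq!(get_with_holes(0, 9, &[2, 5], 2), Some(3));
    assert_eq!(get_with_holes(0, 9, &[2, 5], 7), Some(9));
    assert_eq!(get_with_holes(0, 9, &[2, 5], 8), None);
    println!("ok");
}
```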

Adds RowIdIndex::get_many that sorts a working copy of the input and sweeps the chunk iterator once, amortizing the per-id RangeInclusiveMap walk from O(N · log F) to O(F + N). TakeBuilder::get_row_addrs now routes through it.
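The sorted-sweep amortization can be sketched like this. For simplicity the sketch models the chunk index as a sorted slice of `(row-id range, base address)` pairs rather than the real `RangeInclusiveMap`; the function name and types are illustrative only.

```rust
use std::ops::RangeInclusive;

/// Illustrative sketch of a batched lookup: sort a working copy of the input
/// ids and sweep the sorted chunk list once, so the chunk walk costs O(F)
/// total instead of a per-id search.
fn get_many(chunks: &[(RangeInclusive<u64>, u64)], ids: &[u64]) -> Vec<Option<u64>> {
    // Pair each id with its original position so results can be scattered
    // back into input order after the sorted sweep.
    let mut sorted: Vec<(u64, usize)> =
        ids.iter().copied().enumerate().map(|(pos, id)| (id, pos)).collect();
    sorted.sort_unstable();

    let mut out = vec![None; ids.len()]; // unmatched ids stay None
    let mut chunk_iter = chunks.iter();
    let mut cur = chunk_iter.next();
    for (id, pos) in sorted {
        // Advance past chunks that end before this id; the iterator never
        // rewinds because the ids are visited in ascending order.
        while let Some((range, _)) = cur {
            if *range.end() < id { cur = chunk_iter.next(); } else { break; }
        }
        match cur {
            Some((range, base)) if range.contains(&id) => {
                out[pos] = Some(*base + (id - *range.start()));
            }
            Some(_) => {} // id falls in a gap between chunks
            None => break, // chunks exhausted; remaining outputs stay None
        }
    }
    out
}

fn main() {
    let chunks = vec![(0..=4u64, 100u64), (10..=14u64, 200u64)];
    assert_eq!(
        get_many(&chunks, &[12, 3, 7, 0]),
        vec![Some(202), Some(103), None, Some(100)]
    );
    println!("ok");
}
```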


Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@claude (Bot) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.


codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 91.17647% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance-table/src/rowids/index.rs | 88.88% | 3 Missing ⚠️ |
| rust/lance-table/src/rowids/segment.rs | 91.89% | 3 Missing ⚠️ |


Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@westonpace (Member) commented

@claude review once


@claude (Bot) left a comment


No bugs found — algorithm is correct and the description's math holds up (the binary search uses the monotone holes[k] - k, and the bitmap select1 relies on new_full/From<&[bool]> both zero-padding bits past len, which I verified).


Overview

Three files modified, ~140 net lines:

  • rust/lance-table/src/rowids/segment.rs: rewrites RangeWithHoles::get (O(N)→O(log H) via binary search on the monotone function holes[k]-k), RangeWithBitmap::get (O(N)→O(K/64) via byte popcount + trailing_zeros), and RangeWithHoles::position (O(N)→O(log H)).
  • rust/lance-table/src/rowids/index.rs: adds RowIdIndex::get_many — sorts a working copy of input ids and sweeps the RangeInclusiveMap chunk iterator once.
  • rust/lance/src/dataset/take.rs: routes TakeBuilder::get_row_addrs through get_many.
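The byte-walk select1 in the first bullet can be sketched as follows. The function name, the raw `&[u8]` bitmap, and the assumption that bits past the logical length are zero-padded are all illustrative stand-ins for the real segment code.

```rust
/// Illustrative sketch of select1: return the physical bit offset of the
/// i-th (0-based) set bit by popcounting whole bytes, then locating the bit
/// inside the final byte with trailing_zeros.
fn select1(bytes: &[u8], mut i: u32) -> Option<u32> {
    for (byte_idx, &b) in bytes.iter().enumerate() {
        let ones = b.count_ones();
        if i < ones {
            // The answer is inside this byte: clear the lowest set bit i
            // times, then trailing_zeros of the remainder is its offset.
            let mut b = b;
            for _ in 0..i {
                b &= b - 1; // drop the lowest set bit
            }
            return Some(byte_idx as u32 * 8 + b.trailing_zeros());
        }
        i -= ones; // skip all set bits in this byte
    }
    None
}

fn main() {
    // Set bits at physical offsets 0, 2, and 15.
    let bytes = [0b0000_0101u8, 0b1000_0000u8];
    assert_eq!(select1(&bytes, 0), Some(0));
    assert_eq!(select1(&bytes, 1), Some(2));
    assert_eq!(select1(&bytes, 2), Some(15));
    assert_eq!(select1(&bytes, 3), None);
    println!("ok");
}
```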

The motivating bug: Dataset._take_rows on stable-row-id datasets with any deletions degraded from ms to ~67s for 1000 ids on 4M rows.

Security risks

None. This is purely an algorithmic performance fix in internal row-id resolution. No new I/O, no new external input parsing, no auth/crypto/permissions. The new code does not introduce panics — the holes.get(mid).unwrap() is bounded by hi <= holes.len() and mid < hi.

Level of scrutiny

Higher-than-minimum. This is not a mechanical change — it is a non-trivial algorithmic rewrite of code on the hot path for take_rows, with subtle correctness invariants:

  • RangeWithHoles::get correctness depends on holes[k] - k being monotone non-decreasing (true because holes are sorted and unique).
  • RangeWithBitmap::get correctness depends on bytes past bitmap.len() being zero-padded — verified true for new_full (explicitly zeros tail bits) and From<&[bool]> (starts from new_empty which is all-zero).
  • get_many correctness depends on the input being sorted before the sweep, with the else { break } path safe because out is pre-initialized to None.

I worked through these by hand and they hold, but a human maintainer familiar with the rowid layer should still confirm — especially because this code path silently affects all stable-row-id reads.

Other factors

Test coverage is strong: existing proptest in rowids/index.rs, comprehensive test_segments covering all variants of get/position/iteration round-trip, and a new perf-regression test (test_large_range_segments_no_deletions) that would fail if the optimization is undone. Codecov flags 6 uncovered lines (mostly in error/edge paths). The reviewer (westonpace) explicitly asked for a review pass, which also suggests this should not skip a human.

@westonpace (Member) left a comment


Awesome, thanks for the fix. I think my only question is whether we should also close #6486 now?

@westonpace westonpace merged commit e12f421 into lance-format:main May 11, 2026
31 checks passed