fix: critical fixes for 1.0.0-rc3 #5421

jackye1995 · 2025-12-05T19:35:47Z

Patch to release/v1.0

This also include related PRs to making the related benchmark and CI work

The btree and bitmap index are important and critical features but lacking a good micro-benchmark for the search (we do have some macro-benchmarks). This PR adds micro benchmarks for btree and bitmap search and covers a variety of scenarios. * Floats (fixed-width data) vs. Strings (variable width data) * Equality vs. Range * Small result set vs. Large result set * High cardinality data vs. Low cardinality data

lance-format@82df343 parallelizes bitmap loading but it introduced a clone of the bitmap index. Since this was a clone of the index itself and not just an arc clone it ended up being a very expensive deep clone, especially at scale. It also led to unbounded memory usage since all clones needed to live at the same time and this could cause the search to crash. Fortunately, these clones are not needed, so we can simply remove them.

chatgpt-codex-connector · 2025-12-05T19:35:51Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Part of lancedb/lancedb#2771, continuing lancedb/lancedb#2568. `lance-namespace-impls` uses `lance` which pulled in default features (aws,gcp,azure,oss), which pulled `aws-*` crates into `lancedb` even when `lancedb` was used with `default-features = false`. I don't have tests at hand, but it would probably be good to run `cargo tree -p lance -e no-build -e no-dev --no-default-features -i aws-config` in CI (maybe for each crate?) to avoid adding any dependency back by default in a future PR. --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com>

This PR will remove cached wheels before building to address CI broken like: ```shell Processing ./target/wheels/pylance-1.1.0b1-cp39-abi3-manylinux_2_28_aarch64.whl Processing ./target/wheels/pylance-1.1.0b2-cp39-abi3-manylinux_2_28_aarch64.whl (from pylance==1.1.0b2) ERROR: Cannot install pylance 1.1.0b1 (from /runner/_work/lance/lance/python/target/wheels/pylance-1.1.0b1-cp39-abi3-manylinux_2_28_aarch64.whl) and pylance 1.1.0b2 (from /runner/_work/lance/lance/python/target/wheels/pylance-1.1.0b2-cp39-abi3-manylinux_2_28_aarch64.whl) because these package versions have conflicting dependencies. The conflict is caused by: The user requested pylance 1.1.0b1 (from /runner/_work/lance/lance/python/target/wheels/pylance-1.1.0b1-cp39-abi3-manylinux_2_28_aarch64.whl) The user requested pylance 1.1.0b2 (from /runner/_work/lance/lance/python/target/wheels/pylance-1.1.0b2-cp39-abi3-manylinux_2_28_aarch64.whl) Additionally, some packages in these conflicts have no matching distributions available for your environment: pylance To fix this you could try to: 1. loosen the range of package versions you've specified 2. remove package versions to allow pip to attempt to solve the dependency conflict ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts ``` --- **Parts of this PR were drafted with assistance from Codex (with `gpt-5.1-codex-max`) and fully reviewed and edited by me. I take full responsibility for all changes.** Signed-off-by: Xuanwo <github@xuanwo.io>

I've noticed that debug / info logging has been flooded lately with things like... ``` [2025-11-18T14:12:05Z INFO lance::events] target="lance::execution" type="plan_run" plan_summary="Projection(OneShot)" output_rows=2 iops=0 requests=0 bytes_read=0 indices_loaded=0 parts_loaded=0 index_comparisons=0 [2025-11-18T14:12:05Z INFO lance::events] target="lance::execution" type="plan_run" plan_summary="Projection(OneShot)" output_rows=2 iops=0 requests=0 bytes_read=0 indices_loaded=0 parts_loaded=0 index_comparisons=0 [2025-11-18T14:12:05Z INFO lance::events] target="lance::execution" type="plan_run" plan_summary="Projection(OneShot)" output_rows=2 iops=0 requests=0 bytes_read=0 indices_loaded=0 parts_loaded=0 index_comparisons=0 [2025-11-18T14:12:05Z INFO lance::events] target="lance::execution" type="plan_run" plan_summary="Projection(OneShot)" output_rows=2 iops=0 requests=0 bytes_read=0 indices_loaded=0 parts_loaded=0 index_comparisons=0 [2025-11-18T14:12:05Z INFO lance::events] target="lance::execution" type="plan_run" plan_summary="Projection(OneShot)" output_rows=2 iops=0 requests=0 bytes_read=0 indices_loaded=0 parts_loaded=0 index_comparisons=0 ``` This is because we create and execute a dummy plan to apply projection. This "dummy plan execution" is overly heavyweight on its own and should be optimized (see lance-format#5069). However, this short-term patch will at least remove the extraneous logging

If we log `self` then it will log the object store which can be a lot of info (especially if it is a memory object store)

This adds in-memory datasets to the random access benchmarks and adds a parallel version of the benchmark. In addition we instrument the project_batch function which currently is too heavyweight and it is useful to track.

codecov · 2025-12-05T20:01:32Z

Codecov Report

❌ Patch coverage is 72.72727% with 3 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance-datafusion/src/exec.rs	66.66%	3 Missing ⚠️

📢 Thoughts on this report? Let us know!

…ance-format#5392) When `enable_stable_row_ids=true`, `take_blobs_by_indices` was failing with "index out of bounds" for rows in fragment 1+. The bug was caused by passing row addresses to `blob::take_blobs` which expected row IDs. Root cause: - `take_blobs_by_indices` converts indices to row addresses - It passed these addresses to `take_blobs` which calls `take_builder` - `TakeBuilder.get_row_addrs()` looked up the values in the row ID index - For fragment 0: addresses (0,1,2) matched row IDs (0,1,2) by accident - For fragment 1+: addresses (4294967296+) didn't match any row IDs - This caused empty results and missing `_rowaddr` column → panic Fix: - Add `take_blobs_by_addresses()` that uses `TakeBuilder::try_new_from_addresses` to bypass the row ID index lookup - Update `take_blobs_by_indices` to call the new function - Add defensive fix in `do_take_rows` to include `_rowaddr` column in empty batches 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude <noreply@anthropic.com>

westonpace added 2 commits December 5, 2025 11:27

github-actions bot added bug Something isn't working python labels Dec 5, 2025

wjones127 approved these changes Dec 5, 2025

View reviewed changes

jackye1995 changed the title ~~fix: remove expensive clone in bitmap search~~ fix: critical fixes for 1.0.0-rc3 Dec 5, 2025

valkum and others added 6 commits December 5, 2025 11:55

perf: do not instrument self in multipart upload (lance-format#5416)

6025f41

If we log `self` then it will log the object store which can be a lot of info (especially if it is a memory object store)

test: extend random access benchmarks (lance-format#5417)

8d28f7b

This adds in-memory datasets to the random access benchmarks and adds a parallel version of the benchmark. In addition we instrument the project_batch function which currently is too heavyweight and it is useful to track.

ci: bump up timeout for benchmark regression job (lance-format#5420)

3707e42

jackye1995 force-pushed the release-1.0-patch-btree branch from 86e3653 to 06bb00e Compare December 5, 2025 20:03

jackye1995 force-pushed the release-1.0-patch-btree branch from 06bb00e to 9023ad4 Compare December 5, 2025 20:20

jackye1995 merged commit 0140adf into lance-format:release/v1.0 Dec 5, 2025
30 of 32 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: critical fixes for 1.0.0-rc3 #5421

fix: critical fixes for 1.0.0-rc3 #5421

Uh oh!

jackye1995 commented Dec 5, 2025

Uh oh!

chatgpt-codex-connector bot commented Dec 5, 2025

Uh oh!

codecov bot commented Dec 5, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

fix: critical fixes for 1.0.0-rc3 #5421

fix: critical fixes for 1.0.0-rc3 #5421

Uh oh!

Conversation

jackye1995 commented Dec 5, 2025

Uh oh!

chatgpt-codex-connector bot commented Dec 5, 2025

Uh oh!

codecov bot commented Dec 5, 2025

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants