-
Notifications
You must be signed in to change notification settings - Fork 513
fix: critical fixes for 1.0.0-rc3 #5421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
jackye1995
merged 9 commits into
lance-format:release/v1.0
from
jackye1995:release-1.0-patch-btree
Dec 5, 2025
Merged
fix: critical fixes for 1.0.0-rc3 #5421
jackye1995
merged 9 commits into
lance-format:release/v1.0
from
jackye1995:release-1.0-patch-btree
Dec 5, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The btree and bitmap index are important and critical features but lacking a good micro-benchmark for the search (we do have some macro-benchmarks). This PR adds micro benchmarks for btree and bitmap search and covers a variety of scenarios. * Floats (fixed-width data) vs. Strings (variable width data) * Equality vs. Range * Small result set vs. Large result set * High cardinality data vs. Low cardinality data
lance-format@82df343 parallelizes bitmap loading but it introduced a clone of the bitmap index. Since this was a clone of the index itself and not just an arc clone it ended up being a very expensive deep clone, especially at scale. It also led to unbounded memory usage since all clones needed to live at the same time and this could cause the search to crash. Fortunately, these clones are not needed, so we can simply remove them.
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
wjones127
approved these changes
Dec 5, 2025
Part of lancedb/lancedb#2771, continuing lancedb/lancedb#2568. `lance-namespace-impls` uses `lance` which pulled in default features (aws,gcp,azure,oss), which pulled `aws-*` crates into `lancedb` even when `lancedb` was used with `default-features = false`. I don't have tests at hand, but it would probably be good to run `cargo tree -p lance -e no-build -e no-dev --no-default-features -i aws-config` in CI (maybe for each crate?) to avoid adding any dependency back by default in a future PR. --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
This PR will remove cached wheels before building to address CI broken like: ```shell Processing ./target/wheels/pylance-1.1.0b1-cp39-abi3-manylinux_2_28_aarch64.whl Processing ./target/wheels/pylance-1.1.0b2-cp39-abi3-manylinux_2_28_aarch64.whl (from pylance==1.1.0b2) ERROR: Cannot install pylance 1.1.0b1 (from /runner/_work/lance/lance/python/target/wheels/pylance-1.1.0b1-cp39-abi3-manylinux_2_28_aarch64.whl) and pylance 1.1.0b2 (from /runner/_work/lance/lance/python/target/wheels/pylance-1.1.0b2-cp39-abi3-manylinux_2_28_aarch64.whl) because these package versions have conflicting dependencies. The conflict is caused by: The user requested pylance 1.1.0b1 (from /runner/_work/lance/lance/python/target/wheels/pylance-1.1.0b1-cp39-abi3-manylinux_2_28_aarch64.whl) The user requested pylance 1.1.0b2 (from /runner/_work/lance/lance/python/target/wheels/pylance-1.1.0b2-cp39-abi3-manylinux_2_28_aarch64.whl) Additionally, some packages in these conflicts have no matching distributions available for your environment: pylance To fix this you could try to: 1. loosen the range of package versions you've specified 2. remove package versions to allow pip to attempt to solve the dependency conflict ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts ``` --- **Parts of this PR were drafted with assistance from Codex (with `gpt-5.1-codex-max`) and fully reviewed and edited by me. I take full responsibility for all changes.** Signed-off-by: Xuanwo <github@xuanwo.io>
I've noticed that debug / info logging has been flooded lately with things like... ``` [2025-11-18T14:12:05Z INFO lance::events] target="lance::execution" type="plan_run" plan_summary="Projection(OneShot)" output_rows=2 iops=0 requests=0 bytes_read=0 indices_loaded=0 parts_loaded=0 index_comparisons=0 [2025-11-18T14:12:05Z INFO lance::events] target="lance::execution" type="plan_run" plan_summary="Projection(OneShot)" output_rows=2 iops=0 requests=0 bytes_read=0 indices_loaded=0 parts_loaded=0 index_comparisons=0 [2025-11-18T14:12:05Z INFO lance::events] target="lance::execution" type="plan_run" plan_summary="Projection(OneShot)" output_rows=2 iops=0 requests=0 bytes_read=0 indices_loaded=0 parts_loaded=0 index_comparisons=0 [2025-11-18T14:12:05Z INFO lance::events] target="lance::execution" type="plan_run" plan_summary="Projection(OneShot)" output_rows=2 iops=0 requests=0 bytes_read=0 indices_loaded=0 parts_loaded=0 index_comparisons=0 [2025-11-18T14:12:05Z INFO lance::events] target="lance::execution" type="plan_run" plan_summary="Projection(OneShot)" output_rows=2 iops=0 requests=0 bytes_read=0 indices_loaded=0 parts_loaded=0 index_comparisons=0 ``` This is because we create and execute a dummy plan to apply projection. This "dummy plan execution" is overly heavyweight on its own and should be optimized (see lance-format#5069). However, this short-term patch will at least remove the extraneous logging
If we log `self` then it will log the object store which can be a lot of info (especially if it is a memory object store)
This adds in-memory datasets to the random access benchmarks and adds a parallel version of the benchmark. In addition we instrument the project_batch function which currently is too heavyweight and it is useful to track.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
86e3653 to
06bb00e
Compare
…ance-format#5392) When `enable_stable_row_ids=true`, `take_blobs_by_indices` was failing with "index out of bounds" for rows in fragment 1+. The bug was caused by passing row addresses to `blob::take_blobs` which expected row IDs. Root cause: - `take_blobs_by_indices` converts indices to row addresses - It passed these addresses to `take_blobs` which calls `take_builder` - `TakeBuilder.get_row_addrs()` looked up the values in the row ID index - For fragment 0: addresses (0,1,2) matched row IDs (0,1,2) by accident - For fragment 1+: addresses (4294967296+) didn't match any row IDs - This caused empty results and missing `_rowaddr` column → panic Fix: - Add `take_blobs_by_addresses()` that uses `TakeBuilder::try_new_from_addresses` to bypass the row ID index lookup - Update `take_blobs_by_indices` to call the new function - Add defensive fix in `do_take_rows` to include `_rowaddr` column in empty batches 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude <noreply@anthropic.com>
06bb00e to
9023ad4
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Patch to release/v1.0
This also include related PRs to making the related benchmark and CI work