@Xuanwo
Collaborator

@Xuanwo Xuanwo commented Jan 26, 2026

Cherry-pick from main:

Notes:

  • Resolved a cherry-pick conflict in python/src/dataset.rs to keep PyObject-based argument handling for the release branch.
  • Regenerated python/Cargo.lock and java/lance-jni/Cargo.lock after resolving lock conflicts.

Verification:

  • cargo test --manifest-path python/Cargo.toml

Xuanwo and others added 2 commits January 27, 2026 01:46
This PR exposes the blob handling APIs to Python so that users can
scan all blobs as binary.

---

**Parts of this PR were drafted with assistance from Codex (with
`gpt-5.2`) and fully reviewed and edited by me. I take full
responsibility for all changes.**

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
This PR adds blob handling support for fragment scanners.

---

**Parts of this PR were drafted with assistance from Codex (with
`gpt-5.2`) and fully reviewed and edited by me. I take full
responsibility for all changes.**
@github-actions
Contributor

Code Review Summary

This cherry-pick backport looks good to merge. The changes correctly expose the blob_handling APIs to Python for both dataset and fragment scanners.

Observations

No P0/P1 issues found. The implementation:

  1. Correctly uses the existing BlobHandling enum from lance-core
  2. Has proper input validation in both Python and Rust layers
  3. Includes tests for both dataset-level and fragment-level scanning

Minor Note (informational, not blocking)

The docstring for blob_handling in dataset.py:799 has a typo: "blobs_descriptions" vs "blobs_description" (singular vs plural). The code accepts "blobs_descriptions" (plural) which matches the Rust enum variant name. This is consistent but worth noting for documentation clarity.

LGTM for merge. ✓

…#5737)

Fixing hanging Spark test:
lance-format/lance-spark#177

Without this fix, the additional Azure and GCP dependencies in the
credential vending features pulled in native-tls, which requires
OpenSSL 1.1 to be installed. When it is not present, Spark hangs: the
Rust execution fails silently in a separate thread, and a semaphore
then keeps waiting for a result that never arrives.

Why it fails silently rather than propagating the error back needs a
separate fix in Spark, but that should not block the release of 2.0.0.
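
For readers unfamiliar with this failure mode, here is a minimal Python sketch of the general pattern. It is not the Spark or Lance code: `run_in_worker` and its parameters are hypothetical names chosen for illustration. A caller blocks on a semaphore until a worker thread signals completion; if the worker dies without signalling, the caller waits forever. The sketch avoids that by always releasing the semaphore and handing the captured exception back to the caller.

```python
import threading


def run_in_worker(task, timeout=1.0):
    """Run `task` in a separate thread and wait on a semaphore for its result.

    Hypothetical helper for illustration only.
    """
    done = threading.Semaphore(0)
    outcome = {}

    def worker():
        try:
            outcome["value"] = task()
        except Exception as exc:
            # Capture the failure instead of letting the thread die silently.
            outcome["error"] = exc
        finally:
            # Always signal, even on failure, so the waiter is never stranded.
            done.release()

    threading.Thread(target=worker, daemon=True).start()

    # A bounded wait turns a potential hang into a visible error.
    if not done.acquire(timeout=timeout):
        raise TimeoutError("worker never signalled completion")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Without the `finally` block, a task that raises would leave the caller blocked in `acquire`, which matches the hang described above.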
@github-actions github-actions bot added the java label Jan 26, 2026
Fix typo in Python.
@Xuanwo Xuanwo changed the title from "chore(release): backport blob handling (#5790, #5801)" to "chore(release): backport credential vending removal and blob handling (#5737, #5790, #5801, #5796)" Jan 26, 2026
@jackye1995
Contributor

known test failures due to pandas 3, merging

@jackye1995 jackye1995 merged commit 8636326 into release/v1.0 Jan 26, 2026
12 of 20 checks passed
@jackye1995 jackye1995 deleted the cherry-pick-5790-5801-release-v1.0 branch January 26, 2026 18:34
3 participants