Skip to content

feat!: make dataset object store access base-aware#6647

Merged
jackye1995 merged 1 commit intolance-format:mainfrom
jackye1995:jack/multi-base-caching
May 1, 2026
Merged

feat!: make dataset object store access base-aware#6647
jackye1995 merged 1 commit intolance-format:mainfrom
jackye1995:jack/multi-base-caching

Conversation

@jackye1995
Copy link
Copy Markdown
Contributor

@jackye1995 jackye1995 commented Apr 29, 2026

Make dataset object-store access explicit for multi-base datasets. The public Dataset::object_store accessor now takes an Option<u32> base id: pass None for the primary dataset store, or Some(base_id) for an additional base referenced by dataset metadata. This is a breaking change from the previous zero-argument accessor.

Adds Dataset::with_object_store_wrappers(...) to clone datasets with wrappers propagated to the primary object store, refs store, primary store params, and every base store params entry. When a fragment, deletion file, or index references a base id, Lance resolves both the correct path location and the corresponding wrapped object store, so wrappers such as caching and instrumentation apply to multi-base reads without changing page cache keys.

Also fixes base-aware read paths that resolved a base_id path but still used the primary dataset object store. Data-file metadata creation, deletion file reads, scalar/vector index opens, index detail inference, index stats, and index migration now resolve the correct base object store before reading.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added the enhancement New feature or request label Apr 29, 2026
@cmccabe
Copy link
Copy Markdown
Contributor

cmccabe commented Apr 29, 2026

I'm concerned that every use of Dataset.object_store could be a potential bug, since that only represents the primary object store and not all object stores.

Should we do a change where we turn that field into a Map<base_index, Arc<ObjectStore>> ?

@jackye1995 jackye1995 force-pushed the jack/multi-base-caching branch 2 times, most recently from 3a65cd0 to c732b59 Compare April 29, 2026 23:43
@jackye1995 jackye1995 changed the title feat: propagate object store wrappers to base stores feat!: make dataset object store access base-aware Apr 29, 2026
Copy link
Copy Markdown
Contributor

@LuQQiu LuQQiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-base table is a newly introduce feature, kind of don't want it to break all existing users who is using single base table. Also agree with what @cmccabe mentioned, we don't want it to be sliently incorrect.
Would prefer to keep dataset.object_store() but error out if the dataset is multibase dataset

@jackye1995 jackye1995 force-pushed the jack/multi-base-caching branch from 4b0a195 to 5dc1480 Compare April 30, 2026 21:47
@jackye1995 jackye1995 force-pushed the jack/multi-base-caching branch from 5dc1480 to 34e14cc Compare April 30, 2026 23:27
@jackye1995 jackye1995 merged commit 456198c into lance-format:main May 1, 2026
30 checks passed
wombatu-kun pushed a commit to wombatu-kun/lance that referenced this pull request May 4, 2026
After rebasing onto main, our test added in
"feat: route partial-schema merge_insert through the v2 write path"
no longer compiled because main lance-format#6647 turned the public
`Dataset::object_store()` getter into an async, base-aware method
returning `Result<Arc<ObjectStore>>`. The other call site of
`read_transaction_file` in the same module already uses the
`pub(crate) object_store` field directly via `.as_ref()`; mirror that
pattern here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants