feat: add DataFile.create helper for building DataFile metadata #6427
Merged
westonpace merged 3 commits into lance-format:main on Apr 8, 2026
…lance files

Adds a convenience method `DataFile.create(dataset, path)` that reads a lance file's metadata and automatically determines field IDs, column indices, file version, and file size, eliminating the need for manual DataFile construction when performing DataReplacement operations.

Closes lance-format#6413

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the core logic from the Python binding into a public async method on Dataset so Rust users can also construct DataFile metadata from existing lance files. The Python binding is now a thin wrapper.

Also refactors data_file_dir to reuse the new data_file_dir_for_base helper, removing duplicated base path resolution logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
wjones127 approved these changes Apr 7, 2026
Comment on lines +1747 to +1749

    let file = scheduler
        .open_file(&filepath, &CachedFileSize::unknown())
        .await?;
nitpick: you just read the file size, so you should be able to pass it here:
Suggested change
    - let file = scheduler
    -     .open_file(&filepath, &CachedFileSize::unknown())
    -     .await?;
    + let file = scheduler
    +     .open_file(&filepath, &CachedFileSize::new(file_size))
    +     .await?;
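
To illustrate why the suggestion can save work, here is a minimal Python sketch of the size-hint pattern. Every name in it (`CachedFileSize` as a Python class, `CountingStore`, `open_file`) is a hypothetical stand-in written for this example, not the Lance API, and it assumes that opening a file with an unknown size makes the opener look the size up itself.

```python
class CachedFileSize:
    """Hypothetical Python analogue of a size hint: a known size or None."""

    def __init__(self, size=None):
        self.size = size

    @classmethod
    def unknown(cls):
        return cls(None)


class CountingStore:
    """Toy in-memory object store that counts how often a size is looked up."""

    def __init__(self, objects):
        self.objects = objects
        self.size_lookups = 0

    def size_of(self, path):
        self.size_lookups += 1
        return len(self.objects[path])


def open_file(store, path, cached_size):
    # Ask the store for the size only when the caller did not supply one.
    if cached_size.size is not None:
        size = cached_size.size
    else:
        size = store.size_of(path)
    return {"path": path, "size": size}


store = CountingStore({"data/new.lance": b"x" * 128})

# The caller has already read the size once, so it can pass the value along.
file_size = store.size_of("data/new.lance")
open_file(store, "data/new.lance", CachedFileSize(file_size))
lookups_with_hint = store.size_lookups

# Without the hint, opening the same file triggers a second lookup.
open_file(store, "data/new.lance", CachedFileSize.unknown())
lookups_without_hint = store.size_lookups - lookups_with_hint
```

In this toy model the hinted open adds no lookups beyond the caller's own, while the unhinted open adds one more; whether the real scheduler behaves the same way is an assumption of the sketch.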
Summary
- Adds `DataFile.create(dataset, path, *, base_id=None)` classmethod that reads a lance file's metadata and automatically constructs a `DataFile` with correct field IDs, column indices, file version, and file size
- Eliminates the need for manual `DataFile` construction when performing `DataReplacement` operations

Closes #6413
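
As a rough illustration of the bookkeeping the helper takes over, the following Python sketch derives field IDs and column indices for a flat schema. It is not the Lance implementation: `DataFileSketch` and `build_data_file` are hypothetical names, the dataset schema is modelled as a plain name-to-field-ID dict, and the sketch assumes each file column's index is simply its position in the file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFileSketch:
    """Hypothetical stand-in for DataFile metadata; not the Lance class."""

    path: str
    fields: list
    column_indices: list
    file_size_bytes: int


def build_data_file(dataset_field_ids, file_columns, path, file_size_bytes):
    """Map each column in the file to its dataset field ID and file position."""
    fields = []
    column_indices = []
    for index, name in enumerate(file_columns):
        if name not in dataset_field_ids:
            # Mirrors the "unknown column" case listed in the test plan.
            raise ValueError(f"column {name!r} is not in the dataset schema")
        fields.append(dataset_field_ids[name])
        column_indices.append(index)
    return DataFileSketch(path, fields, column_indices, file_size_bytes)


# Assumed dataset schema: three flat columns with field IDs 0, 1, 2.
dataset_field_ids = {"id": 0, "vector": 1, "label": 2}

# A file holding a single column from the multi-column dataset.
subset = build_data_file(dataset_field_ids, ["label"], "data/new.lance", 1024)
```

Doing this by hand is where mistakes creep in, since the field ID (`2` for `label` here) and the column's position in the file (`0`) are easy to confuse; reading both from the file and the dataset removes that step.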
Test plan
- `test_data_file_create_basic`: verifies fields, column_indices, version, file_size for a two-column file
- `test_data_file_create_subset_columns`: single column from a multi-column dataset
- `test_data_file_create_end_to_end`: full DataReplacement round-trip using the new helper
- `test_data_file_create_unknown_column`: error on column not in dataset schema
- `test_table_ops.py` tests still pass

🤖 Generated with Claude Code