feat: add lance_dataset_drop_columns for metadata-only column removal #42
Open
LuciferYang wants to merge 1 commit into
First of three PRs covering the schema-evolution roadmap entry. Exposes upstream's `drop_columns` — a metadata-only manifest commit that removes the named columns from the schema without rewriting data files.
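To make "metadata-only" concrete, here is a minimal toy model in Rust. It is not Lance's implementation: `ToyDataset`, `Manifest`, the error strings, and the `data/0.lance` path are all hypothetical names chosen for illustration. The sketch assumes a manifest is just a version number, a column list, and a shared list of data files, and shows a drop producing a new manifest while the file list is reused untouched.

```rust
use std::sync::Arc;

/// Toy manifest (hypothetical): a schema plus the data files it points at.
#[derive(Debug)]
struct Manifest {
    version: u64,
    columns: Vec<String>,
    /// Shared between versions; a drop never rewrites or copies it.
    data_files: Arc<Vec<String>>,
}

/// Toy dataset handle (hypothetical) holding the current manifest.
struct ToyDataset {
    manifest: Arc<Manifest>,
}

impl ToyDataset {
    fn new(columns: &[&str], data_files: &[&str]) -> Self {
        ToyDataset {
            manifest: Arc::new(Manifest {
                version: 1,
                columns: columns.iter().map(|c| c.to_string()).collect(),
                data_files: Arc::new(data_files.iter().map(|f| f.to_string()).collect()),
            }),
        }
    }

    /// A reader's view: a cheap clone of the current manifest pointer.
    fn snapshot(&self) -> Arc<Manifest> {
        Arc::clone(&self.manifest)
    }

    /// Commit a new manifest without the named columns.
    /// On any error the current manifest (and version) is left as-is.
    fn drop_columns(&mut self, names: &[&str]) -> Result<(), String> {
        for name in names {
            if !self.manifest.columns.iter().any(|c| c == name) {
                return Err(format!("unknown column: {name}"));
            }
        }
        let remaining: Vec<String> = self
            .manifest
            .columns
            .iter()
            .filter(|c| !names.contains(&c.as_str()))
            .cloned()
            .collect();
        if remaining.is_empty() {
            return Err("cannot drop every column".to_string());
        }
        // Swap in a new manifest; earlier snapshots still point at the old one.
        self.manifest = Arc::new(Manifest {
            version: self.manifest.version + 1,
            columns: remaining,
            data_files: Arc::clone(&self.manifest.data_files),
        });
        Ok(())
    }
}
```

A snapshot taken before the drop keeps all of its columns, the post-drop manifest shares the same file list by pointer, and a rejected drop leaves the version unchanged.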
The macOS arm64 leg of |
jja725 pushed a commit that referenced this pull request on May 23, 2026
## Summary

The macOS arm64 consumer-smoke-test job has been failing on `main` since #24 with a long list of unresolved `_IO*` symbols (`_IOObjectRelease`, `_IOServiceMatching`, `_IOHIDEventSystemClientCreate`, `_IORegistryEntryCreateCFProperty`, …). Sample run: https://github.com/lance-format/lance-c/actions/runs/26272649710

Root cause is plumbing, not the consumer example: `sysinfo` (pulled in transitively via the lance crates) calls IOKit on macOS for disk enumeration, CPU frequency, and thermal sensors, and `objc2_io_kit` declares the binding. Cargo's `rustc-link-lib=framework=IOKit` is honored when this repo builds, but a downstream consumer linking against the installed `liblance_c.a` via `find_package(LanceC)` (or pkg-config) only sees the frameworks we declare in our config files, and IOKit was missing.

Add `-framework IOKit` next to the existing `CoreFoundation` / `Security` / `SystemConfiguration` entries in all three mirroring places:

- `CMakeLists.txt`: build-tree `LanceC_platform_deps` interface library
- `cmake/LanceCConfig.cmake.in`: installed `find_package(LanceC)` consumers
- `CMakeLists.txt`: pkg-config `Libs.private`

## Verification

Same `cmake --install` → `examples/cmake-consumer` build path the CI runs, on arm64 macOS (15.0 SDK, AppleClang 17):

```
$ cmake --install build --prefix _install
$ cmake -S examples/cmake-consumer -B consumer-build -DCMAKE_PREFIX_PATH="$PWD/_install"
$ cmake --build consumer-build
…
[100%] Built target consumer
$ consumer-build/consumer
usage: consumer <dataset_uri>
$ echo $?
2
```

Before the patch the same sequence dies at link with `Undefined symbols for architecture arm64`. After it, the link succeeds and the binary exits 2 (usage error) as the CI step expects.

## After this lands

Unblocks the consumer-smoke macOS leg for every open PR; #42 (schema-evolution drop_columns) hits this exact failure on its CI run.
## Summary

First of three PRs against #41 (schema evolution). Exposes upstream's `drop_columns`, a metadata-only manifest commit that removes the named columns from the schema without rewriting any data files. Materializing the projection is left to a later `_compact_files` (and a future cleanup operation, once exposed, removes the old version's files).

Mutates the dataset in place under an exclusive write lock; scanners already in flight keep their pre-drop snapshot view via the existing Arc clone-on-write, same as `_delete` / `_update` / `_compact_files`.

## Surface

Inputs are validated up front with per-index error messages so the precise cause is observable from `lance_last_error_message()`. NULL handle, NULL pointer array, zero count, NULL or empty-string entries, and non-UTF-8 names all return `LANCE_ERR_INVALID_ARGUMENT`; upstream's own rejections (unknown column, attempt to drop every column) map to the same code.

The C++ wrapper takes `const std::vector<std::string>&` and follows the `update` / `merge_insert` sibling convention: it passes `col_ptrs.data()` unconditionally. An empty vector flows through the Rust-side `num_columns == 0` guard so the error message says "num_columns must be > 0" rather than the misleading "columns must not be NULL".

## Tests

Eleven new Rust integration tests covering single-drop, multi-drop, version bump, data preservation (downcasts the surviving Arrow columns and checks the actual values, not just shape), and the full rejection surface (NULL dataset / NULL array / zero count / NULL entry / empty-string entry / unknown column / drop-all). C and C++ smoke tests snapshot `ArrowSchema.n_children` pre/post drop, exercise the drop-last-column rejection path, and verify the version is unchanged when a drop fails. `cargo test` and `cargo test --test compile_and_run_test -- --ignored` both green.

## Follow-ups

- `lance_dataset_alter_columns`: rename / nullability / type change
- `lance_dataset_add_columns`: SQL expressions / AllNulls / ArrowArrayStream

The README roadmap entry stays unticked until all three ship.
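For reviewers who want to picture the up-front validation described under Surface, a rough sketch follows. It is not the code in this PR: `collect_column_names` is a hypothetical helper, the per-index message wording is made up, and errors are plain `String`s rather than `LANCE_ERR_INVALID_ARGUMENT` plus a last-error message. Only the two quoted messages ("num_columns must be > 0" and "columns must not be NULL") come from the description above. Checking the count before the pointer is an assumption, chosen so that an empty C++ vector (whose `data()` may be null) reports the count message.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical helper: turn a C array of NUL-terminated strings into owned
/// Rust names, reporting the index of the first bad entry.
///
/// Safety: if `columns` is non-null it must point at `num_columns` readable
/// pointers, each either null or a valid NUL-terminated string.
unsafe fn collect_column_names(
    columns: *const *const c_char,
    num_columns: usize,
) -> Result<Vec<String>, String> {
    // Assumed order: count first, so (null, 0) yields the count message.
    if num_columns == 0 {
        return Err("num_columns must be > 0".to_string());
    }
    if columns.is_null() {
        return Err("columns must not be NULL".to_string());
    }

    let mut names = Vec::with_capacity(num_columns);
    for i in 0..num_columns {
        // Read the i-th pointer out of the caller's array.
        let ptr = unsafe { *columns.add(i) };
        if ptr.is_null() {
            return Err(format!("columns[{i}] must not be NULL"));
        }
        // Reject names that are not valid UTF-8.
        let name = unsafe { CStr::from_ptr(ptr) }
            .to_str()
            .map_err(|_| format!("columns[{i}] is not valid UTF-8"))?;
        if name.is_empty() {
            return Err(format!("columns[{i}] must not be empty"));
        }
        names.push(name.to_owned());
    }
    Ok(names)
}
```

Copying the names into owned `String`s before doing any work means nothing borrows from the caller's buffers once validation is done, and every rejection happens before the dataset is touched.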