feat: generalize storage_options for GCS/Azure and any object_store backend #46
Merged
beinan merged 1 commit into lance-format:main on Apr 22, 2026
…ackend

Closes lance-format#45.

The Rust+PyO3 layer already forwarded an arbitrary `storage_options` dict through to lance's `DatasetBuilder`, but the Python constructor only exposed ergonomic shortcuts for AWS keys, which made it look like GCS/Azure weren't supported and blocked work on a tiered memory layout on `gs://` URIs.

This change:

* Promotes `storage_options` as the canonical, backend-agnostic way to configure remote stores (aligned with `lance` and `lance-graph`).
* Keeps the AWS-specific kwargs (`aws_access_key_id`, `region`, ...) as backwards-compatible shortcuts that now emit a single `DeprecationWarning` pointing callers at `storage_options`.
* Adds unit tests for the storage_options merge / pass-through logic.
* Adds an opt-in real-GCS integration test gated on `LANCE_CONTEXT_GCS_BUCKET` plus one of `LANCE_CONTEXT_GCS_SERVICE_ACCOUNT_KEY` / `GOOGLE_APPLICATION_CREDENTIALS` / `LANCE_CONTEXT_GCS_ENDPOINT`.
* Switches the existing S3 integration test to the canonical `storage_options=` path and adds a companion back-compat test that asserts the AWS kwargs still work and emit the deprecation warning.
* Fixes the moto.server invocation for moto >= 5 (dropped the positional service argument) and pulls in `moto[s3,server]` so flask is available for the subprocess; without this the S3 suite was being silently skipped on modern environments.
* Updates README with GCS and Azure examples and marks lance-format#14 / lance-format#45 as done in the roadmap.

No behavior change for local or S3 callers that pass `storage_options=`.

Made-with: Cursor
beinan approved these changes on Apr 22, 2026
Summary
Closes #45.
The Rust + PyO3 layer already forwards an arbitrary `storage_options` dict through to lance's `DatasetBuilder::with_storage_options`, but the Python constructor only exposed ergonomic shortcuts for AWS keys, which made it look like GCS/Azure weren't supported and blocked tiered-memory work on `gs://` URIs.

This PR:

* Promotes `storage_options` as the canonical, backend-agnostic way to configure remote stores, aligned with how `lance` and `lance-graph` handle backends.
* Keeps `aws_access_key_id`, `aws_secret_access_key`, `aws_session_token`, `region`, `endpoint_url`, `allow_http` as backwards-compatible shortcuts that now emit a single `DeprecationWarning` pointing callers at `storage_options`.
* Adds `python/tests/test_storage_options.py` with 8 unit tests covering merge / pass-through / precedence / deprecation semantics.
* Adds `python/tests/test_gcs_persistence.py`, an opt-in real-GCS integration test gated on `LANCE_CONTEXT_GCS_BUCKET` + one of `LANCE_CONTEXT_GCS_SERVICE_ACCOUNT_KEY` / `GOOGLE_APPLICATION_CREDENTIALS` / `LANCE_CONTEXT_GCS_ENDPOINT` (for `fake-gcs-server`-style emulators).

No behavior change for local or S3 callers that already pass `storage_options=`.
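For reviewers who want a mental model of the merge / deprecation behaviour listed above, here is a minimal sketch. It is not the code from this PR: the helper name `merge_storage_options`, the target key names (`aws_region`, `aws_endpoint`, `aws_allow_http`), and the rule that an explicit `storage_options` entry wins over a legacy kwarg are all assumptions made for illustration.

```python
import warnings

# Hypothetical mapping from the legacy AWS kwargs to object_store-style keys.
# The key names actually used by the constructor are not shown in this PR.
_AWS_KWARG_TO_KEY = {
    "aws_access_key_id": "aws_access_key_id",
    "aws_secret_access_key": "aws_secret_access_key",
    "aws_session_token": "aws_session_token",
    "region": "aws_region",
    "endpoint_url": "aws_endpoint",
    "allow_http": "aws_allow_http",
}


def merge_storage_options(storage_options=None, **aws_kwargs):
    """Fold legacy AWS kwargs into a single storage_options dict (sketch)."""
    unknown = set(aws_kwargs) - set(_AWS_KWARG_TO_KEY)
    if unknown:
        raise TypeError(f"unexpected keyword arguments: {sorted(unknown)}")

    # Only kwargs that were actually supplied count as "used".
    used = [name for name, value in aws_kwargs.items() if value is not None]
    if used:
        # One warning per call, no matter how many legacy kwargs were passed.
        warnings.warn(
            f"{', '.join(sorted(used))} are deprecated; "
            "pass storage_options={...} instead",
            DeprecationWarning,
            stacklevel=2,
        )

    merged = {}
    for name in used:
        value = aws_kwargs[name]
        if isinstance(value, bool):
            value = "true" if value else "false"
        merged[_AWS_KWARG_TO_KEY[name]] = str(value)

    # Assumption: explicit storage_options entries take precedence.
    merged.update({k: str(v) for k, v in (storage_options or {}).items()})
    return merged
```

Under these assumptions, a caller passing only `storage_options=` never sees a warning, and the dict is forwarded unchanged apart from values being stringified.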
Notes
An emulator-based GCS test was explored with `gcp-storage-emulator` but has two compat issues outside this repo's scope: (1) it imports `fs` (pyfilesystem2) which calls `pkg_resources.declare_namespace`, broken under setuptools >= 81 (Python 3.12+); (2) its JSON responses fail OpenDAL's deserializer with `invalid type: null, expected a string`. The opt-in integration test works against `fake-gcs-server` (Go binary) and real GCS, which is what the UP+GCS tiered-memory rollout will actually use.
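The opt-in gating for the GCS integration test can be pictured roughly as follows. This is a sketch only: the helper name `gcs_test_gate`, its return shape, and the order in which the credential variables are checked are assumptions, and the real test file may gate differently (for example with a pytest skip marker).

```python
import os

# Environment variables named in the PR description; at least one of these
# must be set in addition to the bucket for the test to run.
_CREDENTIAL_VARS = (
    "LANCE_CONTEXT_GCS_SERVICE_ACCOUNT_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "LANCE_CONTEXT_GCS_ENDPOINT",
)


def gcs_test_gate(env=None):
    """Return gating info if the GCS test should run, else None (sketch)."""
    env = os.environ if env is None else env

    bucket = env.get("LANCE_CONTEXT_GCS_BUCKET")
    if not bucket:
        return None  # no bucket configured: the test stays skipped

    # Assumption: the first credential variable that is set wins.
    for name in _CREDENTIAL_VARS:
        if env.get(name):
            return {"bucket": bucket, "credential_var": name}

    return None  # bucket set, but no credentials or emulator endpoint
```

A test module could call this once at import time and skip when it returns `None`, so the suite stays green on machines without GCS access.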
Test plan
Made with Cursor