Skip to content

feat: ✨ Allow writing to Zarr via xarray_tensorstore._TensorStoreAdapter#10

Merged
copybara-service[bot] merged 17 commits intogoogle:mainfrom
rhoadesScholar:main
Nov 6, 2024
Merged

feat: ✨ Allow writing to Zarr via xarray_tensorstore._TensorStoreAdapter#10
copybara-service[bot] merged 17 commits intogoogle:mainfrom
rhoadesScholar:main

Conversation

@rhoadesScholar
Copy link
Copy Markdown
Contributor

This commit adds a setitem method to xarray_tensorstore._TensorStoreAdapter, allowing users to write data to a Zarr opened via this adapter.

Should close Issue #5

This commit adds a __setitem__ method to xarray_tensorstore._TensorStoreAdapter, allowing users to write data to a Zarr opened via this adapter.

Should close Issue google#5
@google-cla
Copy link
Copy Markdown

google-cla Bot commented Oct 30, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@rhoadesScholar
Copy link
Copy Markdown
Contributor Author

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Done. Ready or review @shoyer

@shoyer
Copy link
Copy Markdown
Contributor

shoyer commented Nov 4, 2024

Hi Jeff, thanks for looking into this!

Have you tested that this actually works for writing files in Xarray? I'd love to see an example of how that works.

Also, I think you may need to revert edits in the test file -- it looks like this is just an older version of the test suite, without any new additions for this functionality?

Attempted to match original formatting.
@rhoadesScholar
Copy link
Copy Markdown
Contributor Author

rhoadesScholar commented Nov 5, 2024

Have you tested that this actually works for writing files in Xarray? I'd love to see an example of how that works.

Yup. I'm currently using this version of xarray-tensorstore under the hood for reading/writing large imaging datasets. It's a bit involved for an example, I know, but here is where it is executed in my codebase. In this case, self.array is the Xarray DataArray with the xarray-tensorstore backend.

That said, I haven't tried it for other use cases, such as the type of data you have in the minimal usage example in the README.

@rhoadesScholar
Copy link
Copy Markdown
Contributor Author

Also, I think you may need to revert edits in the test file -- it looks like this is just an older version of the test suite, without any new additions for this functionality?

Done. 👍🏼 Thank you!

@shoyer
Copy link
Copy Markdown
Contributor

shoyer commented Nov 5, 2024

Yup. I'm currently using this version of xarray-tensorstore under the hood for reading/writing large imaging datasets. It's a bit involved for an example, I know, but here is where it is executed in my codebase. In this case, self.array is the Xarray DataArray with the xarray-tensorstore backend.

OK, thanks for sharing!

For context, this is a little different from how other storage systems work in Xarray, which do not support modifying an existing Zarr array opened in an Xarray object. Typically, we separate reading and writing into two separate methods, e.g.,

  1. xarray.open_zarr() opens a Zarr store for reading
  2. Dataset.to_zarr() writes an xarray.Dataset to a new Zarr file

There are a few detailed reasons why this this is the case for Xarray, but as far as I can tell, none of them apply to Xarray-TensoreStore. So I think it would be fine to add this functionality here.

That said, because this differs from the default in Xarray, I would suggest making this new behavior (mutability) something that requires an explicit opt-in.

The suggestion would be to add a new mode argument to xarray_tensorstore.open_zarr(), supporting a few options copied from Zarr-Python's open_group:

  • mode='r' read only (must exist)
  • mode='r+' read/write (must exist)

mode='r' would be the default behavior.

Comment thread xarray_tensorstore.py Outdated
Comment thread xarray_tensorstore_test.py Outdated
Comment thread xarray_tensorstore_test.py
@rhoadesScholar
Copy link
Copy Markdown
Contributor Author

rhoadesScholar commented Nov 5, 2024

I think I found an issue with vector indexing in Xarray as a whole, compared to Numpy indexing. Do you happen to know if this is expected?

Here's a minimal example:

import numpy as np
import xarray

key = (np.array([0, 1]), np.array([0, 1]), slice(None))
source_data = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
source = xarray.DataArray(
    source_data,
    dims=("x", "y", "z"),
    name="baz",
)
xarray.testing.assert_equal(source_data[key], source[key])

@shoyer
Copy link
Copy Markdown
Contributor

shoyer commented Nov 5, 2024

I think found an issue with vector indexing in Xarray as a whole, compared to Numpy indexing. Do you happen to know if this is expected?

Yes, this is intentional. Xarray's vectorized rules match keys up with array dimensions:
https://docs.xarray.dev/en/latest/user-guide/indexing.html#vectorized-indexing

Comment thread xarray_tensorstore.py
@rhoadesScholar
Copy link
Copy Markdown
Contributor Author

Sorry for the messy formatting commits...

Comment thread xarray_tensorstore.py Outdated
Comment thread xarray_tensorstore.py Outdated
Comment thread xarray_tensorstore_test.py Outdated
Comment thread xarray_tensorstore_test.py Outdated
Comment thread xarray_tensorstore.py Outdated
@shoyer shoyer removed the pull ready label Nov 6, 2024
@rhoadesScholar
Copy link
Copy Markdown
Contributor Author

Sorry. Forgot to update the test. Should be good now.

Comment thread xarray_tensorstore.py
@copybara-service copybara-service Bot merged commit debcafa into google:main Nov 6, 2024
@shoyer
Copy link
Copy Markdown
Contributor

shoyer commented Nov 6, 2024

thanks!

@rhoadesScholar
Copy link
Copy Markdown
Contributor Author

Thank you!

@shoyer
Copy link
Copy Markdown
Contributor

shoyer commented Nov 8, 2024

This is now available as part of the 0.1.5 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants