Expose `memory` argument for "netcdf4" engine #6956
Thanks @ianliu you should be able to do this with |
@dcherian xarray/xarray/backends/netCDF4_.py Lines 534 to 552 in 434f9e8
Ah, thanks. I always forget that. Can you add a test, please? Somewhere in this class would be the right place, I think: xarray/xarray/tests/test_backends.py, line 1225 in 434f9e8
@dcherian I've added a unit test, but outside the class. Is this a problem? I see on Xarray's contributing page that Xarray is transitioning to more functional tests.
Hmm, the test doesn't pass like this, it is complaining that |
Force-pushed from 2033902 to a107fb6
@dcherian can you review my changes? I think the PR is done now.
I've just committed a change that also allows one to save a netcdf file to memory, like so:
ds = xr.Dataset({"v": xr.DataArray(data=range(10))})
buf = ds.to_netcdf(engine="netcdf4")
assert xr.open_dataset("", engine="netcdf4", memory=buf).equals(ds)
@@ -221,7 +221,7 @@ def close(self, needs_lock=True):
             default = None
             file = self._cache.pop(self._key, default)
             if file is not None:
-                file.close()
+                return file.close()
Is this needed?
According to netCDF4's API, closing an in-memory dataset returns its binary data. If you look at the diff chunk above this one, I use the return value of close() to get the bytes.
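To illustrate why the one-line change matters, here is a minimal, standard-library-only sketch. `InMemoryFile` and `CachingManager` are hypothetical stand-ins invented for this example (they are not xarray's or netCDF4's classes); the point is only that the manager's `close()` has to pass along whatever the wrapped file's `close()` returns, otherwise the bytes are lost.

```python
import io


class InMemoryFile:
    """Hypothetical stand-in for a dataset whose close() returns its bytes."""

    def __init__(self):
        self._buffer = io.BytesIO()

    def write(self, data: bytes) -> None:
        self._buffer.write(data)

    def close(self) -> bytes:
        # Hand the accumulated bytes back to the caller on close.
        return self._buffer.getvalue()


class CachingManager:
    """Hypothetical, simplified file manager that keeps files in a cache dict."""

    def __init__(self, key, file, cache):
        self._key = key
        self._cache = cache
        self._cache[key] = file

    def close(self):
        file = self._cache.pop(self._key, None)
        if file is not None:
            # Propagate the file's return value instead of discarding it.
            return file.close()
        return None


cache = {}
f = InMemoryFile()
manager = CachingManager("dummy.nc", f, cache)
f.write(b"payload")

data = manager.close()    # bytes returned by the wrapped file
second = manager.close()  # nothing left in the cache, so None
```

Without the `return`, `data` would always be `None`, and the caller would have no way to retrieve the in-memory contents.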
reupping this question.
@ianliu - are you interested in finishing up this PR? From my perspective, this seems like a useful feature that would be nice to get into xarray.
Hi @jhamman! I'm willing to finish it, but I don't know what's really missing. I guess I'll rebase onto main and fix the failing tests. But I would like a review of the second commit, since it makes more profound changes to the internal APIs, such as returning a value from Also, the logic on the |
Just a few comments, including a suggestion for how to fix the failing test.
    buf = ds.close()
    with open_dataset("dummy.nc", engine="netcdf4", memory=buf) as ds:
        assert all(ds.x == x)
Re the test failures below in test_engine, I think you can remove the pytest.raises for the case without a filename now.
Force-pushed from 9ee6abf to 8a4b3be
@jhamman sorry for the delay. So, the The return statement is there because the netCDF4 API returns the byte array upon closing the dataset. See the API docs here: https://unidata.github.io/netcdf4-python/#in-memory-diskless-datasets
@jhamman I think this PR is OK now, what do you think? |
LGTM.
Some type hints would be nice to have :)
This commit exposes the `memory` argument for the "netcdf4" engine, allowing one to create a netcdf dataset from a memory buffer, like so:

```python
buffer: bytes = ...
ds = xr.open_dataset("", engine="netcdf4", memory=buffer)
```

This commit also adds a unit test for the added feature.
The mypy error is real, but unfortunately it comes from a design error on our side, which is out of scope for this PR.
so simply add a |
Adds support to save a netcdf to memory using the "netcdf4" engine:

```python
ds = xr.Dataset({"v": xr.DataArray(data=range(10))})
buf = ds.to_netcdf(engine="netcdf4")
assert xr.open_dataset("", engine="netcdf4", memory=buf).equals(ds)
```