Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose memory argument for "netcdf4" engine #6956

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

ianliu
Copy link

@ianliu ianliu commented Aug 26, 2022

This commit exposes the memory argument for the "netcdf4" engine,
allowing one to create a netcdf dataset from a memory buffer, like so:

buffer: bytes = ...
ds = xr.open_dataset("", engine="netcdf4", memory=buffer)

@dcherian
Copy link
Contributor

Thanks @ianliu you should be able to do this with xr.open_dataset(..., backend_kwargs={"memory": buffer})

@ianliu
Copy link
Author

ianliu commented Aug 26, 2022

@dcherian backend_kwargs doesn't work either, because the NetCDF4BackendEntrypoint.open_dataset function doesn't accept a memory argument:

def open_dataset(
self,
filename_or_obj,
mask_and_scale=True,
decode_times=True,
concat_characters=True,
decode_coords=True,
drop_variables=None,
use_cftime=None,
decode_timedelta=None,
group=None,
mode="r",
format="NETCDF4",
clobber=True,
diskless=False,
persist=False,
lock=None,
autoclose=False,
):

@dcherian
Copy link
Contributor

Ah thanks. I always forget that.

Can you add a test please? Somewhere in this class would be the right place I think

class NetCDF4Base(CFEncodedBase):

@ianliu
Copy link
Author

ianliu commented Aug 26, 2022

@dcherian I've added a unit test, but outside the class. Is this a problem? I see in the Xarray's contributing page that Xarray is transitioning to more functional tests.

@ianliu
Copy link
Author

ianliu commented Aug 26, 2022

Hmm, the test doesn't pass like this, it is complaining that nc4 is not defined. I will put the test inside the class.

@ianliu ianliu force-pushed the main branch 2 times, most recently from 2033902 to a107fb6 Compare August 26, 2022 17:37
@ianliu
Copy link
Author

ianliu commented Aug 26, 2022

@dcherian can you review my changes? I think the PR is done now.

@ianliu
Copy link
Author

ianliu commented Aug 30, 2022

I've just committed a change that also allows one to save a netcdf file to memory, like so:

ds = xr.Dataset({ "v": xr.DataArray(data=range(10)) })
buf = ds.to_netcdf(engine="netcdf4")
assert xr.open_dataset("", engine="netcdf4", memory=buf).equals(ds)

@@ -221,7 +221,7 @@ def close(self, needs_lock=True):
default = None
file = self._cache.pop(self._key, default)
if file is not None:
file.close()
return file.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to netCDF4's API, an in-memory dataset binary data is returned upon closing it. If you look at the diff-chunk over this one, I use the return value of close() to get the bytes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reupping this question.

@jhamman
Copy link
Member

jhamman commented Jan 31, 2023

@ianliu - are you interested in finishing up this PR? From my perspective, this seems like a useful feature that would be nice to get in to xarray.

@ianliu
Copy link
Author

ianliu commented Jan 31, 2023

Hi @jhamman ! I'm willing to finish it, but don't know whats really missing. I guess I'll rebase with main and fix the failing tests.

But I would like a review on the second commit, since it adds more profound changes to the internal API's, such as returning a value from CachingFileManager.close function. This was needed because NetCDF4 returns the bytearray upon close.

Also, the logic on the to_netcdf function is starting to get a little bit convoluted, would be nice to get a review there as well.

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, including a suggestion for how to fix the failing test.

@@ -221,7 +221,7 @@ def close(self, needs_lock=True):
default = None
file = self._cache.pop(self._key, default)
if file is not None:
file.close()
return file.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reupping this question.

xarray/backends/api.py Outdated Show resolved Hide resolved
xarray/backends/netCDF4_.py Outdated Show resolved Hide resolved
buf = ds.close()
with open_dataset("dummy.nc", engine="netcdf4", memory=buf) as ds:
assert all(ds.x == x)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re the test failures below in test_engine, I think you can remove the pytest.raises for the case without a filename now.

@ianliu
Copy link
Author

ianliu commented Feb 12, 2023

@jhamman sorry for the delay. So, the return that you commented "reupping this question", I did respond the first code review but forgot to submit my answer... I will respond here as well, because I find all these code reviews confusing.

The return statement is there because the netCDF4 API returns the byte array upon closing the dataset. See the API docs here: https://unidata.github.io/netcdf4-python/#in-memory-diskless-datasets

@ianliu
Copy link
Author

ianliu commented Feb 27, 2023

@jhamman I think this PR is OK now, what do you think?

@headtr1ck headtr1ck requested a review from jhamman March 10, 2023 20:45
Copy link
Collaborator

@headtr1ck headtr1ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Some type hints would be nice to have :)

xarray/backends/netCDF4_.py Outdated Show resolved Hide resolved
xarray/backends/netCDF4_.py Show resolved Hide resolved
xarray/tests/test_backends.py Outdated Show resolved Hide resolved
xarray/tests/test_backends.py Outdated Show resolved Hide resolved
This commit exposes the `memory` argument for the "netcdf4" engine,
allowing one to create a netcdf dataset from a memory buffer, like so:

```python
buffer: bytes = ...
ds = xr.open_dataset("", engine="netcdf4", memory=buffer)
```

This commit also adds a unit test for the added feature.
@headtr1ck
Copy link
Collaborator

The error of mypy is real, but unfortunately this is a design error on our side, which is out of scope for this PR.
see the mypy message:

xarray/backends/netCDF4_.py:563: error: Signature of "open_dataset" incompatible with supertype "BackendEntrypoint" [override]

so simply add a # type: ignore[override] at the end of this line.

Adds support to save a netcdf to memory using the "netcdf4" engine:

```python
ds = xr.Dataset({ "v": xr.DataArray(data=range(10)) })
buf = ds.to_netcdf(engine="netcdf4")
assert xr.open_dataset("", engine="netcdf4", memory=buf).equals(ds)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose "memory" argument to the "netcdf4" engine
6 participants