Cross-platform in-memory serialization of netcdf4 (like the current scipy-based dumps) #23

Open
ebrevdo opened this issue Feb 26, 2014 · 11 comments

Comments

@ebrevdo
Contributor

ebrevdo commented Feb 26, 2014

It would be nice to create in-memory netCDF4 objects. This is difficult with the netCDF4 library, which requires a filename (possibly one that it can mmap, but probably not, based on its opendap documentation).

One solution is to call os.mkfifo (on *nix) or its Windows equivalent (if the library is available), using tempfile.mktemp as the path, and pass that path to the netCDF4 object. dumps() would then amount to calling sync and close, reading the bytes from the pipe, and deleting the pipe.

We may actually be able to use the same functionality in reverse for creating a netCDF4 object from a StringIO.
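
The FIFO idea above can be sketched with the standard library alone. This is a hypothetical illustration, not code from the project: `dumps_via_fifo`, `write_func`, and `fake_writer` are made-up names, it assumes a POSIX system where `os.mkfifo` is available, and it can only work if the writer streams its output sequentially without seeking. It creates the pipe inside a `tempfile.mkdtemp` directory rather than using `tempfile.mktemp`, to avoid the race on the temporary name.

```python
import os
import tempfile
import threading


def dumps_via_fifo(write_func):
    """Return the bytes that write_func(path) writes to a named pipe.

    Hypothetical helper: write_func stands in for "create the file at
    this path, then sync and close it". POSIX only (uses os.mkfifo).
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "fifo")
    os.mkfifo(path)
    try:
        # The writer must run concurrently: opening a FIFO for writing
        # blocks until another party opens it for reading.
        writer = threading.Thread(target=write_func, args=(path,))
        writer.start()
        with open(path, "rb") as pipe:
            data = pipe.read()  # returns once the writer closes its end
        writer.join()
    finally:
        os.remove(path)
        os.rmdir(tmpdir)
    return data


def fake_writer(path):
    # Stand-in for a library that only accepts a filename.
    with open(path, "wb") as f:
        f.write(b"CDF\x01 not a real netCDF file")


payload = dumps_via_fifo(fake_writer)
```

A writer that needs to seek within the file, or that reopens it, would fail or hang on a pipe, so this sketch says nothing about whether the netCDF4 library itself can be driven this way.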

@ebrevdo ebrevdo self-assigned this Feb 26, 2014
@shoyer
Member

shoyer commented Feb 26, 2014

Another option is to add an HDF5 backend with pytables. @ToddSmall has a demo script somewhere that shows how you can pass around in-memory HDF5 objects between processes.

@akleeman
Contributor

A similar option would be to use in-memory HDF5 objects with pytables, which Todd Small found a way to do:

Writing to a string:

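import tables

# Create an HDF5 file that lives only in memory (no backing file on disk).
h5_file = tables.open_file("in-memory", title=my_title, mode="w",
                           driver="H5FD_CORE", driver_core_backing_store=0)
... [add variables] ...
# Copy the whole in-memory file into a bytes object.
image = h5_file.get_file_image()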

Reading from a string:

h5_file = tables.open_file("in-memory", mode="r", driver="H5FD_CORE",
                               driver_core_image=image,
                               driver_core_backing_store=0)

@ebrevdo
Contributor Author

ebrevdo commented Feb 26, 2014

Looks like this may be the only option. Based on my tests, netCDF4 is strongly antithetical to any kind of streams/piped buffers. If we go the hdf5 route, we'd have to reimplement the CDM/netcdf4 on top of hdf5, no?

@shoyer
Member

shoyer commented Feb 26, 2014

HDF5 supports homogeneous n-dimensional arrays and metadata, which in
principle should be all we need. Actually, under the covers netCDF4 is
HDF5. But yes, we would have to do some work to reinvent this.


@shoyer
Member

shoyer commented Apr 8, 2015

Just wrote a little library to do netCDF4 via h5py: https://github.com/shoyer/h5netcdf

Unfortunately h5py still can't do in-memory file images (h5py/h5py#552). But it does give an alternative way to read/write netCDF4 without going via the Unidata libraries. There is experimental support for engine='h5netcdf' in my dask PR: #381

pytables was not a viable option because it can't read or write HDF5 dimension scales, which are necessary for dimensions in netCDF4 files.

@max-sixty
Collaborator

In an effort to reduce the issue backlog, I'll close this, but please reopen if you disagree.

@shoyer
Member

shoyer commented Jan 15, 2019

This is actually finally possible to support now with h5py, which as of the latest release supports reading/writing to file-like objects in Python.
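
As a minimal sketch of what file-like support in h5py looks like, assuming h5py 2.9 or later and NumPy are installed: the dataset name and attribute below are illustrative only, and the result is a plain HDF5 file, not a netCDF4 file (it has no dimension scales).

```python
import io

import h5py
import numpy as np

# Write an HDF5 file into an in-memory buffer instead of a path.
buffer = io.BytesIO()
with h5py.File(buffer, "w") as f:
    f.create_dataset("temperature", data=np.arange(6.0).reshape(2, 3))
    f["temperature"].attrs["units"] = "K"

serialized = buffer.getvalue()  # plain bytes, ready to send elsewhere

# Read the file back from the bytes alone.
with h5py.File(io.BytesIO(serialized), "r") as f:
    restored = f["temperature"][...]
    units = f["temperature"].attrs["units"]
```

Going from here to a dumps()-style function would still need a layer that writes netCDF4 conventions on top of h5py, which is the role h5netcdf plays.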

@stale

stale bot commented Dec 20, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

@stale stale bot added the stale label Dec 20, 2020
@pikulmar

Still an issue, as far as I can tell. Possibly duplicated in #3372.

@stale stale bot removed the stale label Feb 10, 2021
keewis pushed a commit to keewis/xarray that referenced this issue Jan 17, 2024
@jhamman
Member

jhamman commented Jul 30, 2024

This must be one of the oldest active issues in the project!

Curious if folks have thoughts on the current state here? Netcdf4-python has some support for in-memory datasets, but we are not exposing it in xarray (as far as I can tell). How about h5netcdf?

@kmuehlbauer
Contributor

There is effort in #6956 to enable this for netcdf4.

This is possible for h5py/h5netcdf, too. See https://docs.h5py.org/en/stable/high/file.html#file-drivers. xarray just has to enable this via the kwarg driver="core". Not sure how much effort this will take.
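
For reference, a rough sketch of the core driver at the h5py level, assuming h5py and NumPy are installed. It shows plain h5py only; how xarray or h5netcdf would forward these keyword arguments is not shown and is an open question here. The file name and dataset name are illustrative.

```python
import io

import h5py
import numpy as np

# The file name is only a label here: with backing_store=False nothing
# is written to disk when the file is closed.
f = h5py.File("in-memory.h5", "w", driver="core", backing_store=False)
f.create_dataset("x", data=np.array([1, 2, 3]))
f.flush()  # make sure everything is in the file image before copying it
image = f.id.get_file_image()  # bytes copy of the in-memory file
f.close()

# The image is an ordinary HDF5 file, so it can be reopened from a buffer.
with h5py.File(io.BytesIO(image), "r") as g:
    values = g["x"][...].tolist()
```

The low-level `get_file_image()` call plays the same role as the pytables method of the same name shown earlier in the thread.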
