
Cross-platform in-memory serialization of netcdf4 (like the current scipy-based dumps) #23

Open
ebrevdo opened this issue Feb 26, 2014 · 9 comments

Comments

@ebrevdo
Contributor

ebrevdo commented Feb 26, 2014

It would be nice to create in-memory netCDF4 objects. This is difficult with the netCDF4 library, which requires a filename (possibly one that it can mmap, but probably not, based on its opendap documentation).

One solution is to call os.mkfifo (on *nix) or its Windows equivalent (if available), using tempfile.mktemp to generate the path, and pass that path to the netCDF4 object. dumps() is then equivalent to calling sync and close, reading from the pipe, and deleting the FIFO.

We may actually be able to use the same functionality in reverse for creating a netCDF4 object from a StringIO.
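A minimal, POSIX-only sketch of the proposed FIFO round-trip (a plain bytes payload stands in for the serialized dataset, since netCDF4 itself is not exercised here):

```python
import os
import tempfile
import threading

def dumps_via_fifo(payload: bytes) -> bytes:
    """Round-trip bytes through a named pipe, as proposed above."""
    path = tempfile.mktemp()  # FIFO path, per the suggestion above
    os.mkfifo(path)
    try:
        # The writer must run concurrently: opening a FIFO for writing
        # blocks until a reader opens the other end.
        def writer():
            with open(path, "wb") as f:
                f.write(payload)  # stands in for sync() + close() on the dataset

        t = threading.Thread(target=writer)
        t.start()
        with open(path, "rb") as f:
            data = f.read()  # reads until the writer closes its end
        t.join()
    finally:
        os.remove(path)
    return data

print(dumps_via_fifo(b"example netCDF bytes"))  # prints b'example netCDF bytes'
```

The same mechanics, reversed (a writer thread feeding the FIFO while the library opens it for reading), would cover the loads direction.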

@ebrevdo ebrevdo self-assigned this Feb 26, 2014
@shoyer
Member

shoyer commented Feb 26, 2014

Another option is to add an HDF5 backend with pytables. @ToddSmall has a demo script somewhere that shows how you can pass around in-memory HDF5 objects between processes.

@akleeman
Contributor

Another similar option would be to use in-memory HDF5 files via pytables, an approach Todd Small found:

Writing to a string:

h5_file = tables.open_file("in-memory", title=my_title, mode="w",
                           driver="H5FD_CORE", driver_core_backing_store=0)
... [add variables] ...
image = h5_file.get_file_image()

Reading from a string:

h5_file = tables.open_file("in-memory", mode="r", driver="H5FD_CORE",
                           driver_core_image=image,
                           driver_core_backing_store=0)

@ebrevdo
Contributor Author

ebrevdo commented Feb 26, 2014

Looks like this may be the only option. Based on my tests, netCDF4 is strongly antithetical to any kind of streams/piped buffers. If we go the hdf5 route, we'd have to reimplement the CDM/netcdf4 on top of hdf5, no?

@shoyer
Member

shoyer commented Feb 26, 2014

HDF5 supports homogeneous n-dimensional arrays and metadata, which in principle should be all we need. Actually, under the covers netCDF4 is HDF5. But yes, we would have to do some work to reinvent this.


@shoyer
Member

shoyer commented Apr 8, 2015

Just wrote a little library to do netCDF4 via h5py: https://github.com/shoyer/h5netcdf

Unfortunately h5py still can't do in-memory file images (h5py/h5py#552). But it does give an alternative way to read/write netCDF4 without going via the Unidata libraries. There is experimental support for engine='h5netcdf' in my dask PR: #381

pytables was not a viable option because it can't read or write HDF5 dimension scales, which are necessary for dimensions in netCDF4 files.

@max-sixty
Collaborator

In an effort to reduce the issue backlog, I'll close this, but please reopen if you disagree.

@shoyer
Member

shoyer commented Jan 15, 2019

This is finally possible to support with h5py, which as of the latest release supports reading and writing file-like objects in Python.
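A short sketch of what this enables, assuming h5py >= 2.9 is installed; io.BytesIO plays the role of the in-memory "file", and the dataset name is illustrative:

```python
import io

import h5py
import numpy as np

# Write an HDF5 file to an in-memory buffer instead of a filesystem path.
buf = io.BytesIO()
with h5py.File(buf, "w") as f:
    f.create_dataset("temperature", data=np.arange(6.0).reshape(2, 3))

raw = buf.getvalue()  # the serialized file as bytes (a dumps() analogue)

# Read it back from bytes without touching the filesystem.
with h5py.File(io.BytesIO(raw), "r") as f:
    print(f["temperature"].shape)  # prints (2, 3)
```

Since netCDF4 files are HDF5 files underneath, the same mechanism lets h5netcdf-style backends serialize datasets fully in memory.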

@stale

stale bot commented Dec 20, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Dec 20, 2020
@pikulmar

Still an issue, as far as I can tell. Possibly duplicated in #3372.

@stale stale bot removed the stale label Feb 10, 2021
keewis pushed a commit to keewis/xarray that referenced this issue Jan 17, 2024