support CORE initialization from in-memory images #552

Closed
dnelson86 opened this Issue Mar 17, 2015 · 12 comments

Contributor

dnelson86 commented Mar 17, 2015

Hi,

This is a feature request to support operations on in-memory HDF5 file images,
in particular the ability to initialize the CORE driver from an in-memory buffer:

H5Pset_file_image(fapl_id, buf, buf_len);

These features require HDF5 1.8.9 or later.

See page 24 of: http://www.hdfgroup.org/HDF5/doc/Advanced/FileImageOperations/HDF5FileImageOperations.pdf

Thanks!

Member

andrewcollette commented Mar 17, 2015

No objection from me! A community-supplied PR would be necessary, though.

Contributor

dnelson86 commented Mar 27, 2015

gheber commented Mar 28, 2015

+1 from me. I just thought about that very function the other day playing w/ PySpark. How can I help?

shoyer commented Apr 8, 2015

I am also very interested in this feature, but I'm not familiar enough with C to write the PR myself.

Note that pytables has support for this, which might be a helpful starting point:
https://pytables.github.io/cookbook/inmemory_hdf5_files.html#memory-images-of-hdf5-files

That final example seems to be missing the argument driver_core_image=image:
pydata/xarray#23 (comment)

Contributor

nevion commented May 1, 2015

I had this ready some time ago, but haven't had time to address what we wanted to do for the future:
#460

Been using it roughly every day since so it's definitely working for me.

What we wanted to do, though, was proper buffer handling, which required some changes to the callbacks the HDF Group uses. In my experience to date, it is quite the crapshoot getting them to accept patches, and I haven't had time to even make them.

gheber commented May 1, 2015

What kind of changes would you like to see? I'd be happy to shepherd things on this end.

Contributor

nevion commented May 1, 2015

@gheber I can expand more concretely if needed, but basically we need to be able to control the lifetime of a Python C buffer object and have that be part of the H5LT functions/structs. It is not sufficient to assume we can pass the file image alone: we need to hold a reference count on the Python object to make sure the pointer that the buffer object/file image works with stays valid. Copying the image works for some people, but people using HDF5 are often using significant chunks (close to 100%) of RAM, so copying is frequently not an option for many, myself included.

So this means more callbacks and pointers need to be added to the H5LT.c file image functions/structs or we need to duplicate most of those lines to work with python. This problem will happen exactly the same for Java, Ruby, Matlab (provided they updated), R, octave, et al so it would be good to make an easily pluggable H5LT.c so we don't need to copy and paste it N times to get a working efficient solution.
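
As a rough pure-Python illustration of the lifetime concern (not the H5LT change itself): the buffer protocol lets a consumer pin the exporter's memory without copying it. The helper name `resize_is_blocked` is made up for this sketch.

```python
def resize_is_blocked(buf: bytearray) -> bool:
    """Return True if an outstanding memoryview prevents `buf` from resizing.

    A memoryview is a zero-copy consumer of `buf`. While it is alive, the
    exporter must keep its memory in place, which is the guarantee a C
    library holding a raw pointer into a Python buffer would also need.
    """
    view = memoryview(buf)  # zero-copy export; pins the underlying memory
    try:
        buf.extend(b"x")  # a resize could move the memory under the view
    except BufferError:
        return True
    finally:
        view.release()  # drop the export so the buffer can be resized again
    return False
```

Once the view is released the bytearray can grow again, which is the kind of acquire/release housekeeping the callbacks above would have to perform.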

Contributor

nevion commented Jul 29, 2015

bump - @gheber - anything? This is a whole lot less work if the H5LT has callbacks for some external reference counting/housekeeping...

FYI, I just rebased my image support topic fork on 2.5.0 (https://github.com/nevion/h5py). If you need image support, this is the way to get it for now, but I wish we'd hurry up and make everything proper in H5LT so we can do this the long-term way upstream.

rcatwood commented Jan 7, 2016

This sounds useful to me for managing operations on 'very large' 3- or 4-dimensional images, but I am not sure I understand it. Would it transparently use mmap, as in C, so that a process manipulating a subsection of the image would map the file but read only the part actually operated on? That would avoid filling up memory unnecessarily when 2 or 4 processes on each compute node operate on different parts of the same file.

Contributor

nevion commented Jan 7, 2016

@rcatwood Nope, just regular old buffers/arrays. Of course python can make an mmapped buffer work without HDF explicitly knowing about it.
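
A minimal sketch of that last point, using only the standard library: an `mmap` object supports the buffer protocol, so any consumer that accepts a buffer can work on a mapped file without knowing it is memory-mapped. The function name `read_slice` is hypothetical, and the file must be non-empty for the mapping to succeed.

```python
import mmap


def read_slice(path, start, stop):
    """Return bytes [start:stop) of a file through a memory-mapped buffer."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded lazily by the OS.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # memoryview gives zero-copy access to the mapping.
            with memoryview(mm) as view:
                with view[start:stop] as window:
                    # Only this final step copies, and only the slice.
                    return bytes(window)
```

The views are released before the mapping is closed, because closing an `mmap` while buffer exports are still alive raises an error.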

shoyer commented Dec 20, 2018

I think we can close this thanks to #1061 (available in the 2.9.0 release), which adds support for reading/writing Python file-like objects.
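
For reference, a minimal sketch of the file-like route, assuming h5py 2.9.0 or later is installed. The dataset name `"x"` and the helper names are made up for illustration.

```python
import io

import h5py


def write_image(values):
    """Build an HDF5 file entirely in memory and return its raw bytes."""
    buf = io.BytesIO()
    with h5py.File(buf, "w") as f:
        f.create_dataset("x", data=values)  # "x" is a hypothetical name
    return buf.getvalue()


def read_image(image):
    """Open raw HDF5 bytes through a file-like object, with no file on disk."""
    with h5py.File(io.BytesIO(image), "r") as f:
        return f["x"][:].tolist()
```

Here `read_image(write_image([1, 2, 3]))` round-trips the data without touching the filesystem. Note that wrapping existing bytes in `io.BytesIO` makes a copy, so this does not by itself address the zero-copy concern raised earlier in the thread.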

tacaswell added this to the 2.9.0 milestone Dec 20, 2018

Member

tacaswell commented Dec 20, 2018

I think this is more directly addressed by #1075 but in either case, this can be closed.
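
A hedged sketch of the low-level route that mirrors the `H5Pset_file_image` call from the original request. It assumes an h5py build that exposes `PropFAID.set_file_image` and `FileID.get_file_image`; whether that is exactly what #1075 provides is not stated in this thread. File names, the dataset name, and helper names are placeholders.

```python
import h5py


def make_image():
    """Create a small HDF5 file in memory and return its image as bytes."""
    # The core driver with backing_store=False never writes to disk,
    # so the file name is only a label.
    with h5py.File("scratch-image.h5", "w", driver="core",
                   backing_store=False) as f:
        f.create_dataset("x", data=[1, 2, 3])  # hypothetical dataset
        f.flush()
        return f.id.get_file_image()


def open_image(image):
    """Open an HDF5 file image with the CORE driver, read-only."""
    fapl = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
    fapl.set_fapl_core(backing_store=False)
    fapl.set_file_image(image)  # the image is copied into the property list
    fid = h5py.h5f.open(b"in-memory.h5", h5py.h5f.ACC_RDONLY, fapl=fapl)
    return h5py.File(fid)
```

Usage would look like `with open_image(make_image()) as f: print(f["x"][:])`. Because the image is copied into the property list, this covers initialization from a buffer but not the no-copy lifetime management discussed above.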

tacaswell closed this Dec 20, 2018