Add collective metadata functions to the low level API #2224
I'd like to add the following functions to the low level API:
For context: I'm using h5py on an HPC cluster to process simulation outputs stored as HDF5. The code is distributed over multiple compute nodes, each with 128 cores. To make use of all the CPU cores, I run Python using mpi4py with one process per core. I'm using collective I/O to read the input simulation data and write out the results.
This puts quite a load on the Lustre parallel file system, and I think it's probably because every process accesses the files independently for metadata operations. I'm hoping that can be alleviated by having HDF5 do all file access in collective mode so that only a few processes per node need to access the file system.
For my use case I just need to put the whole file in collective metadata mode. To do that I've added get/set_all_coll_metadata_ops() and get/set_coll_metadata_write() methods to h5p.PropFAID. The HDF5 documentation says that H5Pset_all_coll_metadata_ops() can also be called on group, dataset, datatype, link, or attribute access property lists. Of those I think h5py only exposes link and dataset access property lists so I also added get/set_all_coll_metadata_ops() to h5p.PropLAID and h5p.PropDAID.
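As a rough sketch of how the new methods could be used to put a whole file in collective metadata mode: the helper names `enable_collective_metadata` and `open_collective` are made up for illustration, and the code assumes a parallel (MPI-enabled) build of h5py that includes the methods from this PR.

```python
def enable_collective_metadata(fapl, reads=True, writes=True):
    """Request collective metadata reads and writes on a file access property list.

    `fapl` is assumed to be an h5py.h5p.PropFAID exposing the methods
    described in this PR. Returns the two settings as read back from it.
    """
    fapl.set_all_coll_metadata_ops(reads)
    fapl.set_coll_metadata_write(writes)
    return fapl.get_all_coll_metadata_ops(), fapl.get_coll_metadata_write()


def open_collective(filename):
    """Open `filename` read-only on MPI.COMM_WORLD in collective metadata mode.

    Hypothetical helper: imports are local so the module can be loaded
    on machines without mpi4py or a parallel h5py build.
    """
    from mpi4py import MPI
    import h5py
    from h5py import h5f, h5p

    # Low-level file access property list using the MPI-IO driver
    fapl = h5p.create(h5p.FILE_ACCESS)
    fapl.set_fapl_mpio(MPI.COMM_WORLD, MPI.INFO_NULL)
    enable_collective_metadata(fapl)

    # Open through the low-level API, then wrap in the high-level File object
    fid = h5f.open(filename.encode(), h5f.ACC_RDONLY, fapl=fapl)
    return h5py.File(fid)
```

Every rank would need to call `open_collective()` with the same file name, since opening a file with the MPI-IO driver is itself a collective operation.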