New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement MultiManager and low level H5Dread_multi call #2351
base: master
Are you sure you want to change the base?
Conversation
h5py/h5d.pyx
Outdated
IF HDF5_VERSION >= (1, 14, 0): | ||
@with_phil | ||
def read_multi(count, list dataset_ids, list mspace_ids, list fspace_ids, | ||
list type_ids, list bufs not None, PropID dxpl=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These parameters are lists instead of numpy arrays to avoid an issue with the destination buffers.
Specifically, the individual destination buffer for each dset read should be a numpy array. If bufs
were itself also a numpy array containing these arrays, then it would copy those arrays instead of pointing to the same memory, and the original destination arrays the user created wouldn't get populated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that makes sense.
The Python function probably shouldn't take count
as the first parameter - it's necessary in C to tell you how many items you can read after a pointer, but lists have a length.
But if that's the case, I'd probably do a check up front that the lists are all the same length, and fail with a clear error if not.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2351 +/- ##
==========================================
+ Coverage 89.53% 89.80% +0.26%
==========================================
Files 17 17
Lines 2380 2570 +190
==========================================
+ Hits 2131 2308 +177
- Misses 249 262 +13 ☔ View full report in Codecov by Sentry. |
As |
h5py/tests/test_h5d_direct_chunk.py
Outdated
dataset = filehandle.create_dataset("data", shape, | ||
dtype=dt, data=data_in) | ||
|
||
self.assertTrue(numpy.array_equal(data_in, dataset[...])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.assertTrue(numpy.array_equal(data_in, dataset[...])) | |
numpy.testing.assert_array_equal(data_in, dataset[...]) |
And throughout. Specific assert methods like this can give you more useful information if the thing they're checking fails, whereas assertTrue
can only tell you it was not true. E.g.
AssertionError:
Arrays are not equal
(shapes (5,), (6,) mismatch)
x: array([0, 1, 2, 3, 4])
y: array([0, 1, 2, 3, 4, 5])
h5py/tests/test_h5d_direct_chunk.py
Outdated
|
||
@ut.skipIf(h5py.version.hdf5_version_tuple < (1, 14, 0), | ||
"read_multi requires HDF5 >= 1.14.0") | ||
class TestReadMulti(TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in the file test_h5d_direct_chunk.py
, the rest of which is all about the direct chunk read/write methods. I think this file is small enough that we may as well just rename it to test_h5d.py
.
@@ -151,6 +151,63 @@ def open(ObjectID loc not None, char* name, PropID dapl=None): | |||
""" | |||
return DatasetID(H5Dopen(loc.id, name, pdefault(dapl))) | |||
|
|||
IF HDF5_VERSION >= (1, 14, 0): | |||
@with_phil | |||
def rw_multi(list dataset_ids, list mspace_ids, list fspace_ids, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be broken up into read_multi
and write_multi
for readability, but they would be identical except for one parameter passed to dset_rw_multi
.
I have an implementation of a high-level |
Also expanded test cases
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
This implements the function
read_multi
inh5d.pyx
, in order to call HDF5'sH5Dread_multi
. Because it is meant to work on multiple datasets, I figured it made more sense not to implement it as a method onDatasetID
.To avoid duplicating type conversion handling, I modified
dset_rw
to handle both the single and multi dataset cases.After review, I plan to implement a similar function for
H5Dwrite_multi
, and then high-level calls for both, perhaps through aMultiManager
class similar toAttributeManager
.Addresses #2298