Add H5Pget_meta_block_size and H5Pset_meta_block_size support #2106

Merged: 14 commits merged into h5py:master on Jun 23, 2022

Conversation

mkitti
Contributor

@mkitti mkitti commented May 29, 2022

Here we add support for H5Pget_meta_block_size and H5Pset_meta_block_size.

h5py.File gets a new meta_block_size keyword argument that defaults to None.
h5py.File also gets a new property meta_block_size.

This feature sets the minimum metadata block size that the HDF5 library will allocate. It can be used to regularize the size of the header and the spacing between datasets. A large metadata block size can consolidate the metadata into a single large header; a small one can minimize the space needed for the HDF5 header.

Note that this sets a file access property, so the value is not stored in the file.

In [1]: import h5py, os

In [2]: with h5py.File(
   ...:     "test.h5","w",
   ...: ) as h5f:
   ...:     h5f["test"] = 1
   ...:     print(h5f["test"].id.get_offset())
   ...: print(os.path.getsize("test.h5"))
2048
2056

In [4]: with h5py.File(
   ...:     "test.h5","w",
   ...:     meta_block_size=4096
   ...: ) as h5f:
   ...:     h5f["test"] = 1
   ...:     print(h5f["test"].id.get_offset())
   ...: print(os.path.getsize("test.h5"))
4096
4104

In [6]: with h5py.File(
   ...:     "test.h5","w",
   ...:     libver="v108",
   ...:     meta_block_size=512
   ...: ) as h5f:
   ...:     h5f["test"] = 1
   ...:     print(h5f["test"].id.get_offset())
   ...: print(os.path.getsize("test.h5"))
512
520

@codecov

codecov bot commented May 29, 2022

Codecov Report

Merging #2106 (dd48d41) into master (859788e) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #2106      +/-   ##
==========================================
+ Coverage   90.00%   90.02%   +0.02%     
==========================================
  Files          17       17              
  Lines        2350     2357       +7     
==========================================
+ Hits         2115     2122       +7     
  Misses        235      235              
Impacted Files Coverage Δ
h5py/_hl/files.py 90.63% <100.00%> (+0.25%) ⬆️

@takluyver
Member

Thanks, this looks good.

I spotted that in the docs you seem to use a mixture of 'metadata' and 'meta data' - let's make this consistent. 'Metadata' is the more familiar form for me, and this is also how it's spelled in the HDF5 docs we link to.

h5py/tests/test_file2.py (outdated)
@mkitti
Contributor Author

mkitti commented Jun 15, 2022

I spotted that in the docs you seem to use a mixture of 'metadata' and 'meta data' - let's make this consistent. 'Metadata' is the more familiar form for me, and this is also how it's spelled in the HDF5 docs we link to.

Done in 94fd41e and e35d2be

@mkitti
Contributor Author

mkitti commented Jun 15, 2022

I'm not sure what to make of the Travis error.

@mkitti
Contributor Author

mkitti commented Jun 18, 2022

@takluyver Thanks for the previous review. The docs have been fixed. Is there anything else to do here?

self.mktemp(), 'w',
meta_block_size=meta_block_size
) as f:
self.assertTrue(f)
Member

This is probably superfluous.

Contributor Author

Removed in ae9b8d4

) as f:
self.assertTrue(f)
f["test"] = 5
self.assertTrue(f.meta_block_size) == meta_block_size
Member

Suggested change
- self.assertTrue(f.meta_block_size) == meta_block_size
+ self.assertEqual(f.meta_block_size, meta_block_size)

And likewise for the other assertions, otherwise they're not really asserting what they look like.

Contributor Author

Good point. I fixed this in ae9b8d4

@takluyver
Member

I'm not sure what happened on Travis either, but I re-ran the job and it succeeded this time.

Contributor Author

@mkitti mkitti left a comment

Maybe assertGreaterEqual is better for the test for HDF5 1.8 compat

h5py/tests/test_file2.py (outdated)
Comment on lines 297 to 301
    @pytest.mark.skipif(h5py.version.hdf5_version_tuple < (1, 10, 2),
                        reason="HDF5 header became smaller in version v1.8")
    def test_file_create_with_meta_block_size_libver(self):
        meta_block_size = 512
        libver = "v108"
Member

I might get rid of this test entirely - h5py tests shouldn't depend on the precise details of how HDF5 allocates space in its files from version to version. I think the test above with assertGreaterEqual is sufficient.

Contributor Author

This option exists to manipulate how HDF5 allocates space in the file. That's the point.

Contributor Author

The default meta_block_size is 2048 bytes.

Thus we need two tests here:

  1. One greater than the default meta_block_size: 4096 bytes
  2. One less than the default meta_block_size: 512 bytes

I have removed version restrictions on both tests. They both now use assertGreaterEqual so that both tests will pass on HDF5 1.8. I added an upper bound on the 512 byte test to differentiate it from the situation when the default meta_block_size is used.

@mkitti
Contributor Author

mkitti commented Jun 22, 2022

I modified the test so that both tests now apply to HDF5 1.8. They both now use assertGreaterEqual.

I added comments detailing that we can expect equality for these tests for HDF5 1.10 and greater.

I have kept both tests in because they test values that are greater than and less than the default meta_block_size of 2048 as documented in https://portal.hdfgroup.org/display/HDF5/H5P_SET_META_BLOCK_SIZE .

I added comments detailing the default size in the tests.

@takluyver
Member

Thanks @mkitti !

@takluyver takluyver merged commit 6ffb48b into h5py:master Jun 23, 2022
@takluyver takluyver added this to the 3.8 milestone Jun 23, 2022