Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation gives wrong default arguments for require_dataset #1994

Closed
mcharlou opened this issue Nov 4, 2021 · 7 comments · Fixed by #2108
Closed

Documentation gives wrong default arguments for require_dataset #1994

mcharlou opened this issue Nov 4, 2021 · 7 comments · Fixed by #2108

Comments

@mcharlou
Copy link

mcharlou commented Nov 4, 2021

The API documentation suggests that arguments shape and dtype are optional (both defaulting to None) for Group.require_dataset, while they are in fact positional (see h5py/_hl/hroup.py:214).

However it would be a very useful feature to be able to require a dataset from data input instead of shape and dtype (similarly to Group.create_dataset).

  • h5py version: 3.5.0
@takluyver
Copy link
Member

I think the downside of allowing grp.require_dataset('xyz', data=arr) is the ambiguity: do you write or check things if the dataset already exists? Are you requiring:

  1. that a dataset exists and contains this data
  2. that a dataset exists - and use the array as a default if creating it
  3. that a dataset exists matching the shape & dtype of the array - and use the array as a default if creating it.

Groups also support the standard Python mapping API, so you've already got grp['xyz'] = arr and grp.setdefault('xyz', arr) as alternatives for options 1 & 2. It seems reasonable to have somewhat more verbose code for option 3.

I agree that the docs should be corrected, though. Do you want to make a PR?

@takluyver
Copy link
Member

Although admittedly grp['xyz'] = arr is not always ideal, because if the dataset already exists it will create a new dataset in the file rather than writing into the existing one. HDF5 doesn't automatically reclaim unused space, so that might make the file bigger than it needs to be.

@mcharlou
Copy link
Author

mcharlou commented Nov 5, 2021

Maybe a look into other languages HDF5 APIs would provide some insight?

I'm actually in the exact case you mention, where I want to write new data into a possibly existing dataset, overwriting if the dataset already exists. grp['xyz'] = arr would work but since this might occur several times I wouldn't want the file to grow bigger every time. Of course I can always check manually for existence, but I think it would make sense to have it as a default behavior for Group.require_dataset with data argument.

I'll let someone who's more familiar than me with the documentation system take care of the PR, I just created the Issue to report the problem I stumbled across.

@mcharlou
Copy link
Author

mcharlou commented Nov 5, 2021

After testing, grp['xyz'] = arr fails if the dataset already exists:

Traceback (most recent call last):
  File "/root/.vscode-server/extensions/ms-python.python-2021.10.1365161279/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_vars.py", line 420, in evaluate_expression
    compiled = compile_as_eval(expression)
  File "/root/.vscode-server/extensions/ms-python.python-2021.10.1365161279/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_vars.py", line 374, in compile_as_eval
    return compile(_expression_to_evaluate(expression), '<string>', 'eval')
  File "<string>", line 1
    root['test'] = 12.
                 ^
SyntaxError: invalid syntax

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/h5py/_hl/group.py", line 432, in __setitem__
    h5o.link(ds.id, self.id, name, lcpl=lcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 202, in h5py.h5o.link
OSError: Unable to create link (name already exists)

I'm not sure whether this behavior is expected.

  • Python version: 3.8.12
  • h5py version: 3.5.0
  • OS: Debian 11 in Docker (python:3.8-slim-bullseye)

@tacaswell
Copy link
Member

Where are the docs wrong? The docstring looks correct to me.

I am with @takluyver , adding a data kwarg to require_dataset is too ambiguous (and I have other issues with this method where kwargs are sometimes silently ignored: #897).

@mcharlou
Copy link
Author

mcharlou commented Nov 5, 2021

The docstring itself is fine, but the doc has shape=None and dtype=None as default arguments (here).

You're probably right about the data kwarg, but it's forwarded to Group.create_dataset along with other kwargs if the dataset has to be created anyway.

@tacaswell
Copy link
Member

Odd, we are not using autodoc and the have a copy of the docstring in the rst: https://github.com/h5py/h5py/blob/master/docs/high/group.rst#L385-L399

Changing to auto-doc is a big project, but the quick fix is to edit the rst to remove the default arguments. Looks like this has been wrong for ~forever.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants