API: raise if kwds are ignored in require_dataset #897

Closed · wants to merge 1 commit

Conversation

@tacaswell (Member)

This raises if user kwds would be silently ignored.

This was prompted by a mailing list question about automatic resizing, where I noticed that the new data is simply ignored!

A better solution may be to check all of the kwargs?

A less disruptive change would be to put a prominent warning in the docstring.
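For context, a minimal sketch of the silent-ignore behaviour in question (the file name demo.h5 is just for illustration):

```python
import numpy as np
import h5py

with h5py.File("demo.h5", "w") as f:
    f.create_dataset("x", data=np.arange(5, dtype="i8"))

with h5py.File("demo.h5", "a") as f:
    # "x" already exists, so require_dataset only validates shape and
    # dtype, returns the existing dataset, and silently drops the data
    # keyword -- nothing is written.
    dset = f.require_dataset("x", shape=(5,), dtype="i8",
                             data=np.zeros(5, dtype="i8"))
    print(dset[:])  # -> [0 1 2 3 4]; the zeros were never written
```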

@aragilar (Member) commented Jun 20, 2017

I feel this should be a warning rather than an exception, as I'm not sure why someone would use require_dataset with kwargs unless they had some expectation that the dataset existed (otherwise, why not just check whether the dataset exists with the correct settings, and create it with create_dataset if necessary?).
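For comparison, a sketch of the explicit check-and-create pattern alluded to above (the dataset name and settings are illustrative, not from the PR):

```python
import numpy as np
import h5py

with h5py.File("demo.h5", "a") as f:
    if "x" in f:
        dset = f["x"]
        # Validate only the settings this application actually cares about.
        if dset.shape != (5,) or dset.dtype != np.dtype("i8"):
            raise TypeError("existing dataset 'x' has incompatible settings")
    else:
        dset = f.create_dataset("x", shape=(5,), dtype="i8",
                                compression="gzip")
```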

@vasole (Contributor) commented Jun 20, 2017

@aragilar

Perhaps as a way to make sure the existing dataset is compatible with the one intended to be used/extended/overwritten?

I have to acknowledge I did not understand the idea behind the PR until reading your post. If the idea is to make sure the required dataset is compatible with what was requested, and therefore with what the user expects, then I think raising an exception makes perfect sense.

@vasole (Contributor) commented Jun 20, 2017

BTW, what is the behavior of HDF5 itself?

@aragilar (Member) commented Jun 20, 2017

@vasole require_dataset is an h5py thing only, so there is no guidance from HDF5.

If require_dataset had just been introduced, then I could see there being an intent to validate the properties of the dataset, which we could explicitly document (and so avoid confusion). The problem is that require_dataset has been part of h5py for at least 6 years and seems to lack an example in the docs, so it's quite likely this change could break code (whether that code is behaving correctly or not is the question). We could do the validation for the kwargs, which would be consistent with the part of the docstring which says "Raises TypeError if an incompatible object already exists". But what about the data keyword? Under the old system it's ignored; should we overwrite (which would result in data loss), always raise an error (consistent with the idea of validating the properties of the dataset), or raise an error (or warn) only if the dataset exists?

I feel that a warning is a change which avoids breaking anything while also highlighting potential problems, while we decide what the correct behaviour (and intent) should be.
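A sketch of what that warning could look like; warn_if_kwds_ignored is a hypothetical helper written for illustration, not actual h5py code:

```python
import warnings

def warn_if_kwds_ignored(group, name, **kwds):
    """Hypothetical helper: warn when extra kwds would be silently
    dropped because the dataset already exists."""
    if name in group and kwds:
        warnings.warn(
            "dataset %r already exists; keyword arguments %s will be "
            "ignored" % (name, sorted(kwds)),
            UserWarning,
            stacklevel=2,
        )
```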

@tacaswell tacaswell added this to the 2.8 milestone Aug 20, 2017
@tacaswell tacaswell modified the milestones: 2.8, 2.9 Feb 11, 2018
@tacaswell tacaswell modified the milestones: 2.9, 2.10 Oct 16, 2018
@tacaswell (Member, Author)

Punting to next release.

@takluyver (Member)

I don't think this makes sense as-is. If you know the dataset shouldn't already exist, you'd use create_dataset. You use require_dataset if the dataset may or may not exist. It would be annoying to have parameters which may cause an error depending on state that the code can't be expected to know.

Perhaps instead we could do some combination of:

  • More checks that an existing dataset matches specified parameters (e.g. maxshape, chunks); a sketch of such a check follows this list
  • Deprecate parameters which it doesn't make sense to check, or which can't efficiently be checked (data?)
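A sketch of what the first kind of check could look like; check_existing is a hypothetical helper, not part of h5py, though maxshape and chunks are real Dataset attributes:

```python
def check_existing(dset, maxshape=None, chunks=None):
    """Hypothetical validation against an existing h5py Dataset."""
    if maxshape is not None and dset.maxshape != maxshape:
        raise TypeError("maxshape mismatch: existing %s, requested %s"
                        % (dset.maxshape, maxshape))
    if chunks is not None and dset.chunks != chunks:
        raise TypeError("chunks mismatch: existing %s, requested %s"
                        % (dset.chunks, chunks))
```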

@takluyver (Member)

I'm dropping the 2.10 milestone for this, as it hasn't been changed for a long time, and I think there are issues with it as it stands.

@takluyver takluyver removed this from the 2.10 milestone Apr 25, 2019
@takluyver (Member)

Do we want to make any backward-incompatible changes to require_dataset for 3.0? E.g. checking more parameters against the existing dataset?

I'm in favour of leaving it as is. Shape and dtype are the fundamental properties of a dataset. If we start checking more properties of existing datasets, I don't think there's any good reason not to check all specified properties, and that sounds like it would be hard to get right - e.g. if you specify gzip compression with the default level, and the existing dataset has a different compression level, should it throw an error? I'd say leave it up to application code to check the specific things it cares about.
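For instance, application code can already assert the one property it depends on; compression is a real Dataset attribute, though the policy here is just an example:

```python
import h5py

with h5py.File("demo.h5", "a") as f:
    dset = f.require_dataset("x", shape=(5,), dtype="i8")
    # Check only what this application relies on, rather than having
    # require_dataset guess which properties matter.
    if dset.compression != "gzip":
        raise ValueError("expected gzip compression, got %r"
                         % dset.compression)
```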

@tacaswell (Member, Author)

Coming back to this, I suspect the right path forward is to start checking the extra kwargs, seeing whether they match the settings on the dset, and raising if strict==True.

That would still leave the case that started me down this path 5 years ago unresolved (and with a bit more hindsight, being able to provide a fill dataset that is non-constant may be useful).
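A rough sketch of that idea as a wrapper; require_dataset_strict and its attribute-matching policy are hypothetical, not something h5py provides:

```python
def require_dataset_strict(group, name, shape, dtype, **kwds):
    """Hypothetical: validate extra kwds against an existing dataset
    instead of silently ignoring them."""
    if name not in group:
        return group.create_dataset(name, shape=shape, dtype=dtype, **kwds)
    dset = group.require_dataset(name, shape=shape, dtype=dtype)
    for key, value in kwds.items():
        # compression, compression_opts, chunks, maxshape, fillvalue, ...
        # are all readable attributes on an existing Dataset; data is not,
        # which is exactly the unresolved case mentioned above.
        if getattr(dset, key, None) != value:
            raise TypeError("%s mismatch: existing %r, requested %r"
                            % (key, getattr(dset, key, None), value))
    return dset
```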

@takluyver (Member) commented Dec 15, 2021

I'm inclined to say it's not worth it - it's not clear to me precisely what it's useful to check (e.g. issue #2018, and my last comment), and this PR has hung around for years without generating any enthusiasm. None of the links from other issues are 'this PR would solve the problem you describe'. Maybe we just accept that require_dataset is a kind of scruffy, awkward bit of the API, make sure it's clearly documented and let it be. 🤷

If you do still want to check more inputs, a new strict=True keyword argument would be the only way to do it without breaking the API. (Edit: OK, that specific interface is not the only way. But any new checks should be opt in, so we'd still be maintaining the existing code path.)

@tacaswell (Member, Author)

Yeah, I think I'm going to give up on this one 😞

I agree that any changes that would make me happy would be quite backwards incompatible and the work required to make them opt-in is just not worth it.

@tacaswell tacaswell closed this Dec 15, 2021
@tacaswell tacaswell deleted the mnt_raise_on_ignored_kwds branch December 15, 2021 13:24