Fix dset.iter_chunks() with selection #2381

Delengowski · 2024-02-16T18:25:43Z

Found a solution that works with other test cases.

The way I saw the problem, we would calculate a chunk_slice that was not a subset of the requested slice. So I added a check that the calculated chunk slice is a subset of the requested slice.
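As an illustration only, a subset check of this kind could look like the sketch below. The helper name is_nonempty_subslice is hypothetical and is not taken from the PR; it assumes unit-step slices with explicit, non-negative start and stop values.

```python
def is_nonempty_subslice(inner: slice, outer: slice) -> bool:
    """Return True if `inner` is non-empty and lies entirely within `outer`.

    Assumes both slices have explicit integer start/stop and a step of 1 (or None).
    """
    return outer.start <= inner.start < inner.stop <= outer.stop


# A chunk slice inside the requested slice passes the check.
print(is_nonempty_subslice(slice(2, 4), slice(2, 5)))
# A chunk slice that starts before the requested slice does not.
print(is_nonempty_subslice(slice(0, 2), slice(2, 5)))
# An empty slice is rejected as well.
print(is_nonempty_subslice(slice(2, 2), slice(2, 5)))
```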

Delengowski · 2024-02-16T18:26:23Z

Resolves #2341

codecov · 2024-02-16T18:48:06Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.53%. Comparing base (6b512e5) to head (c537b8c).
Report is 17 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2381   +/-   ##
=======================================
  Coverage   89.53%   89.53%           
=======================================
  Files          17       17           
  Lines        2380     2380           
=======================================
  Hits         2131     2131           
  Misses        249      249

aragilar · 2024-02-17T00:05:01Z

Looks like codecov had an issue uploading, rerunning the failing jobs.

Delengowski · 2024-02-18T13:01:34Z

This adds a small runtime hit when there is no selection. I'm wondering whether, if source_sel is passed as None, we should keep that information and skip the check entirely. This all assumes my check is the correct way to go about it and that there isn't a simpler solution.

import h5py
import numpy
a = numpy.arange(25).reshape(5, 5)
h5f = h5py.File('iterchunks.h5', 'w')
d = h5f.create_dataset('test1', data=a, chunks=(2, 2))

This branch

In [22]: %timeit -n 1000000 -r 7 [x for x in d.iter_chunks()]
37.3 µs ± 1.25 µs per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Master

In [10]: %timeit -n 1000000 -r 7 [x for x in d.iter_chunks()]
32 µs ± 265 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
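For scale, the two timings above work out to roughly a 17% slowdown in the no-selection case. A quick check, using only the mean values quoted above (the variable names are illustrative):

```python
# Mean per-loop times reported by %timeit above, in microseconds.
branch_us = 37.3
master_us = 32.0

# Relative slowdown of this branch compared with master.
slowdown = (branch_us - master_us) / master_us
print(f"{slowdown:.1%}")
```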

takluyver · 2024-03-13T14:38:18Z

h5py/_hl/dataset.py

                # we still have room to extend along this dimensions
-                return tuple(slices)
+                break

            if dim > 0:
                # reset to the start and continue iterating with higher dimension
                self._chunk_index[dim] = 0


Looking at the code in master at present, this is the bit I'd be looking to change. When it hits the end of the selection in one dimension, this is telling it to go back to the start in that dimension - but the overall start of the dimension rather than the start of the selection. So I would try something like this:

Suggested change

-                self._chunk_index[dim] = 0
+                self._chunk_index[dim] = s.start // self._layout[dim]

If that works, I suspect it might fix it without the added complexity elsewhere in the class. In particular, I'm suspicious of the line self._chunk_index[dim] = 1 you've added a few lines above. I suspect this hardcoded 1 works for the particular case you've added as a test, but will still be wrong in other cases.
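To see why the reset value matters, here is a simplified, pure-Python model of an odometer-style chunk iterator. It is a sketch under stated assumptions, not the code in h5py/_hl/dataset.py: the function name chunk_slices and the reset_to_selection_start flag are invented for illustration, and it assumes rank >= 1 and unit-step slices with explicit bounds.

```python
def chunk_slices(chunks, sel, reset_to_selection_start=True):
    """Yield one tuple of slices per chunk visited while walking `sel`.

    chunks: chunk shape, e.g. (2, 2)
    sel:    one slice per dimension, with explicit start/stop and step 1
    reset_to_selection_start: hypothetical switch; False mimics resetting
        a dimension's chunk index to 0 instead of the selection's first chunk.
    """
    rank = len(chunks)
    # Start at the first chunk that overlaps the selection in each dimension.
    index = [s.start // c for s, c in zip(sel, chunks)]

    while index[0] * chunks[0] < sel[0].stop:
        # Clip the current chunk to the selection boundaries.
        yield tuple(
            slice(max(i * c, s.start), min((i + 1) * c, s.stop), 1)
            for i, c, s in zip(index, chunks, sel)
        )
        # Advance like an odometer, last dimension fastest.
        for dim in reversed(range(rank)):
            index[dim] += 1
            if index[dim] * chunks[dim] < sel[dim].stop:
                break  # still room along this dimension
            if dim > 0:
                # Carry into the next dimension and reset this one.
                if reset_to_selection_start:
                    index[dim] = sel[dim].start // chunks[dim]
                else:
                    index[dim] = 0


sel = (slice(1, 3), slice(2, 5))
fixed = list(chunk_slices((2, 2), sel))
buggy = list(chunk_slices((2, 2), sel, reset_to_selection_start=False))
print(len(fixed), len(buggy))
```

With chunks of (2, 2) and the selection (slice(1, 3), slice(2, 5)), resetting to 0 yields an extra item whose second slice is empty (slice(2, 2, 1)), while resetting to the selection's first chunk yields only the four regions that overlap the selection.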

Delengowski · 2024-03-14

I will get to this tomorrow. I hope it's simpler, because what I was doing felt complicated, but it was the only solution I could come up with.

Delengowski · 2024-03-16

Running git checkout master -- h5py/_hl/dataset.py and then swapping in that line does indeed make the additional test case pass.

This is good, because I was uncomfortable with the slowness I introduced in the common case.

Delengowski · 2024-03-16T13:07:42Z

Current test failures seem unrelated

https://dev.azure.com/h5pyappveyor/h5py/_build/results?buildId=1917&view=logs&j=fb5a0786-7759-5eb0-8063-f3845f8c552c&t=efe2eb3b-410d-520c-c622-b15b2e00631e&l=334

takluyver · 2024-03-22T13:25:27Z

Thanks! Yes, I think the failures are where it's trying to test with numpy 2, which is in beta now, so nothing to do with this. #2386 is working on that.

takluyver · 2024-03-22T13:36:53Z

Close/reopen to re-run tests with that PR merged.

Adds test case for bug report and resolves

78c16b6

[pre-commit.ci] auto fixes from pre-commit.com hooks

fbd33f8

for more information, see https://pre-commit.ci

takluyver changed the title ~~Adds test case for bug report and resolves~~ Fix dset.iter_chunks() with selection Mar 13, 2024

takluyver reviewed Mar 13, 2024

Delengowski and others added 5 commits March 16, 2024 08:36

Reverts changes and adds much simpler suggestion

88b3b12

Resolves merge conflict

87f8d06

[pre-commit.ci] auto fixes from pre-commit.com hooks

c7c2697

for more information, see https://pre-commit.ci

Reverts formatting mishaps from merge

b416b2e

Merge branch 'bad_slices_Dataset_iter_chunks' of github.com:Delengows…

c537b8c

…ki/h5py into bad_slices_Dataset_iter_chunks

takluyver closed this Mar 22, 2024

takluyver reopened this Mar 22, 2024

takluyver merged commit 060a0d0 into h5py:master Mar 24, 2024
42 checks passed

takluyver added this to the 3.11 milestone Apr 5, 2024

takluyver added the bug label Apr 5, 2024

takluyver mentioned this pull request Apr 5, 2024

Possible bogus items from Dataset.iter_chunks() #2341

Closed
