Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Lengths of empty regular slices #1568

Merged
merged 24 commits into from Aug 15, 2022

Conversation

ioanaif
Copy link
Collaborator

@ioanaif ioanaif commented Jul 26, 2022

@codecov
Copy link

codecov bot commented Jul 26, 2022

Codecov Report

Merging #1568 (d47b7e8) into main (9e17f29) will increase coverage by 0.00%.
The diff coverage is 31.42%.

Impacted Files Coverage Δ
src/awkward/_v2/_connect/cuda/__init__.py 0.00% <0.00%> (ø)
src/awkward/_v2/_connect/numexpr.py 88.40% <0.00%> (ø)
src/awkward/_v2/_connect/pyarrow.py 88.46% <0.00%> (ø)
src/awkward/_v2/contents/bytemaskedarray.py 88.82% <0.00%> (ø)
src/awkward/_v2/contents/indexedarray.py 73.83% <0.00%> (ø)
src/awkward/_v2/contents/indexedoptionarray.py 89.14% <0.00%> (ø)
src/awkward/_v2/contents/listoffsetarray.py 81.85% <0.00%> (ø)
src/awkward/_v2/contents/unionarray.py 86.27% <0.00%> (ø)
src/awkward/_v2/highlevel.py 71.01% <ø> (+0.24%) ⬆️
src/awkward/_v2/numba.py 93.47% <0.00%> (ø)
... and 11 more

@ioanaif ioanaif marked this pull request as ready for review July 27, 2022 13:44
@ioanaif ioanaif requested a review from jpivarski July 27, 2022 14:07
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look like a general solution: if there are two sliced dimensions, should there be two RegularArrays? What if the sliced dimension is not first? Why is start is None or stop is None the condition for determining whether there should be a RegularArray? For positive step, start is None is equivalent to start == 0 and stop is None is equivalent to stop == len(self).

I totally believe that a RegularArray wrapper is needed here, but not under these conditions. Maybe always? Maybe if there's at least one slice somewhere in the tuple? Maybe the number of RegularArrays depends on the number of slices, and their sizes depend on which dimensions they're at?

The way to find out is to see what NumPy does: fortunately this test doesn't depend on dimensions being irregular. You can make a NumPy array with a lot of dimensions and try slices on it with various combinations of : and [0, 1, 2], and [] to see what happens to the output shape.

@ioanaif
Copy link
Collaborator Author

ioanaif commented Jul 28, 2022

What if the sliced dimension is not first?

Good point, I assumed in an empty slice that it would go first.

Why is start is None or stop is None the condition for determining whether there should be a RegularArray?

This was the pattern I found from the tests, the zeros_length needs to be adjusted only when a slice with this properties was passed

The way to find out is to see what NumPy does: fortunately this test doesn't depend on dimensions being irregular. You can make a NumPy array with a lot of dimensions and try slices on it with various combinations of : and [0, 1, 2], and [] to see what happens to the output shape.

I noticed this behaviour is only consistent when slice.start/slice.stop is None, here are some of the NumPy tests I did:

>>> d = np.arange(3 * 3 * 2).reshape(3,3,2)
>>> e = ak._v2.contents.NumpyArray(d)
>>> c1 = np.array([], np.int64)
>>> ak._v2.to_list(d[[2],[1],c1]) == ak._v2.to_list(e[[2],[1],c1]) == []
True
>>> d = np.arange(1 * 3 * 3 * 2).reshape(1,3,3,2)
>>> e = ak._v2.contents.NumpyArray(d)
>>> ak._v2.to_list(d[c1,c1]) == ak._v2.to_list(e[c1,c1]) == []
True
>>> d = np.arange(2 * 3).reshape(2, 3)
>>> e = ak._v2.contents.NumpyArray(d)
>>> ak._v2.to_list(d[:,[]]) == ak._v2.to_list(b[:,[]]) == [[], []]
True 
>>> ak._v2.to_list(d[1:,[]]) == ak._v2.to_list(e[1:,[]]) == [[]]
True
>>> ak._v2.to_list(d[:-2,[]]) == ak._v2.to_list(e[:-2,[]]) == []
True
>>> ak._v2.to_list(d[:-1,[]]) == ak._v2.to_list(e[:-1,[]]) == [[]]
True

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so this isn't a general solution to the problem.

Although this test case works (in your unit tests):

>>> ak._v2.to_regular(ak._v2.Array([[1, 2, 3], [4, 5, 6]]), axis=1)[:, []]
<Array [[], []] type='2 * 0 * 2 * 3 * int64'>

There's nothing special about the slice's start and stop being None. For instance, the first dimension has length 2, so : is completely equivalent to 0:2, and yet:

>>> ak._v2.to_regular(ak._v2.Array([[1, 2, 3], [4, 5, 6]]), axis=1)[0:2, []]
<Array [] type='0 * 3 * int64'>

That's easily fixed by removing the if-statement as suggested below.

But here's a deeper issue (pun intended): I can make the example 3-dimensional, rather than 2, by putting the whole thing into a length-1 list, replacing axis=1 with axis=2, and putting another : in the slice:

>>> ak._v2.to_regular(ak._v2.Array([[[1, 2, 3], [4, 5, 6]]]), axis=2)[:, :, []]
<Array [[]] type='1 * var * 3 * int64'>

But it's not returning

>>> np.array([[[1, 2, 3], [4, 5, 6]]])[:, :, []].tolist()
[[[], []]]

as it should.

It has something to do with the nested RegularArrays. If I make the Awkward Array out of RegularArrays, it doesn't work, but if I make it out of a (multidimensional) NumpyArray, it does (although that may just be because we pass it off to NumPy, which is correct by definition).

>>> ak._v2.from_numpy(np.array([[[1, 2, 3], [4, 5, 6]]]), regulararray=True)[:, :, []]
<Array [[]] type='1 * 0 * 1 * 2 * 3 * int64'>
>>> ak._v2.from_numpy(np.array([[[1, 2, 3], [4, 5, 6]]]), regulararray=False)[:, :, []]
<Array [[[], []]] type='1 * 2 * 0 * int64'>

I thought maybe I would figure it out, but no—I poked around with it, but I don't understand why the original solution worked for the test example. Do you see what's going on here?

src/awkward/_v2/contents/content.py Outdated Show resolved Hide resolved
@jpivarski jpivarski linked an issue Aug 9, 2022 that may be closed by this pull request
ioanaif and others added 2 commits August 12, 2022 16:15
Co-authored-by: Jim Pivarski <jpivarski@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lengths of empty regular slices (in v2)
3 participants