Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ak.min changes index of empty nested lists #982

Closed
fleble opened this issue Jul 7, 2021 · 6 comments · Fixed by #997
Closed

ak.min changes index of empty nested lists #982

fleble opened this issue Jul 7, 2021 · 6 comments · Fixed by #997
Labels
bug The problem described is something that must be fixed

Comments

@fleble
Copy link

fleble commented Jul 7, 2021

I wonder if the following is a bug or a feature:

>>> import awkward as ak
>>> ak.__version__
'1.3.0'
>>> ak_array = ak.Array([[[1, 2, 3], [], [4, 3, 2]], [[4, 5, 6], [], [2, 3, 4]]])
>>> ak.min(ak_array, axis=0)
<Array [[1, 2, 3], [2, 3, 2], []] type='3 * var * ?int64'>

The empty nested list is now in last index.
This works if the array is masked though:

>>> ak_array = ak.mask(ak_array, ak.count(ak_array, axis=2)>0)
>>> ak.min(ak_array, axis=0)
<Array [[1, 2, 3], [], [2, 3, 2]] type='3 * var * ?int64'>

Is this a bug in ak.min or should one always mask the ak array for such kind of operations?

@fleble fleble added the bug (unverified) The problem described would be a bug, but needs to be triaged label Jul 7, 2021
@agoose77
Copy link
Collaborator

agoose77 commented Jul 7, 2021

Hi @fleble, thanks for filing a report!

From reading the documentation, I would expect that for the unmasked case, you should see the same result as the masked case. Semantically, the empty lists are still lists, and so the dimension exists for the reduction. So, it doesn't look like the intended behaviour to me. It appears as though empty arrays are behaving like they are missing entirely. I don't anticipate this behaviour, but perhaps my expectations are wrong.

If this is a bug, then it might be an issue with the kernel for the output offset calculations e.g. https://github.com/scikit-hep/awkward-1.0/blob/a70a535a818d54c6dcdb60711dff09a283441541/src/cpu-kernels/awkward_ListOffsetArray_reduce_nonlocal_outstartsstops_64.cpp but I don't know enough about this layer to be able to confidently say anything. @ianna or @jpivarski will know whether this is intended behaviour.

@ianna
Copy link
Collaborator

ianna commented Jul 7, 2021

@fleble - thanks for reporting this! I'd expect the latter output, because when axis=0, the following array min

[
 [
  [1, 2, 3], [], [4, 3, 2]
 ],
 [ 
  [4, 5, 6], [], [2, 3, 4]
 ]
]

I think, it should be:

  [1, 2, 3], [], [2, 3, 2]

@jpivarski
Copy link
Member

I think you're right. I think it should not be moving that empty list.

Here's a similar example in NumPy:

>>> np_array = np.array([
...     [[1, 2, 3], [9, 9, 9], [4, 3, 2]],
...     [[4, 5, 6], [9, 9, 9], [2, 3, 4]]
... ])
>>> np.min(np_array, axis=0)
array([[1, 2, 3],
       [9, 9, 9],
       [2, 3, 2]])
>>> np_array = np.array([
...     [[1, 2, 3], [0, 0, 0], [4, 3, 2]],
...     [[4, 5, 6], [0, 0, 0], [2, 3, 4]]
... ])
>>> np.min(np_array, axis=0)
array([[1, 2, 3],
       [0, 0, 0],
       [2, 3, 2]])

And here's an example where the list is not empty, but has one element:

>>> ak_array = ak.Array([
...     [[1, 2, 3], [9], [4, 3, 2]],
...     [[4, 5, 6], [9], [2, 3, 4]]
... ])
>>> ak.min(ak_array, axis=0)
<Array [[1, 2, 3], [9], [2, 3, 2]] type='3 * var * ?int64'>

Actually, that last one clinches it: the length-0 list should not be moving, as a length-1 list does not.

@agoose77 agoose77 added bug The problem described is something that must be fixed and removed bug (unverified) The problem described would be a bug, but needs to be triaged labels Jul 7, 2021
@fleble
Copy link
Author

fleble commented Jul 8, 2021

Thank you for the fast replies! Glad to help by reporting this bug and thank you for the development of this package!
I guess you will be posting here when the bug is fixed?

@ianna
Copy link
Collaborator

ianna commented Jul 8, 2021

Thank you for the fast replies! Glad to help by reporting this bug and thank you for the development of this package!
I guess you will be posting here when the bug is fixed?

Yes, this issue will be linked to a PR addressing it. Thanks again!

@jpivarski
Copy link
Member

This was super-tricky and got at the heart of how nonlocal reducers (i.e. ak.min or any other reducer with axis != -1) work. It slipped through the previous tests because the empty lists have to be aligned with one another—usually less symmetric tests are better, but this case is only triggered if all of the empty inner lists are combined into a resultant empty list. My old tests were so asymmetric that the empty lists never aligned like this, and a quantity that I needed to put the empty lists in the right place in the output is computed from the maximum of an old case that I had handled and this one, so it only becomes relevant when one exceeds the other. I wasn't expecting this rabbit hole to lead so deep!

Should be fixed now, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The problem described is something that must be fixed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants