-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jagged index array raises FIXME #1022
Comments
At a glance, it looks like this is being triggered from |
The history of that exception was that I was reviewing some of @nsmith-'s corrections to jagged slicing, and I came upon this function that didn't look to me like it could ever be reached. (I couldn't construct an example that would trigger it.) I thought it might be dead code, but instead of removing it, I put this FIXME error message in, so that an example that triggers it would be reported and we can use that example to understand how that code path is reached. If it were Python, we could just put an exception at that point and look at the stack trace. Since it's C++, we'll have to do a bit more work, but it's still possible. Thanks! |
I will look into debugging this with Valgrind / gdb; I've not touched any of the compilation code yet, so it will be interesting to see whether CLion can handle the CMake project. |
This code is slated for conversion into Python (which is why I have Python stack traces in mind), but if this is a common enough case—that I somehow wasn't able to make up—then waiting for v2 may be too long. If it depends crucially on the "list-of-IndexedArray-of-X" and not, for instance, on "list-of-IndexedOptionArray-of-X", then that point in the code could simply |
I can also trigger this with a more conventional layout: y = ak.Array([
[
[1,2,3,4],
[5,6,7,8],
]
])
y = ak.to_regular(y,axis=-1) Could you help me to understand what this particular overload is supposed to do? By name, |
SliceVarNewAxis had to be added later, as an alternate form of SliceNewAxis. They're very similar, made to distinguish between |
SliceNewAxis and SliceVarNewAxis shouldn't be relevant because none of these slices involve any But there's definitely something murky here. >>> y = ak.to_regular(ak.Array([[[1, 2, 3, 4], [5, 6, 7, 8]]]), axis=-1)
>>> y
<Array [[[1, 2, 3, 4], [5, 6, 7, 8]]] type='1 * var * 4 * int64'>
>>> y.layout
<ListOffsetArray64>
<offsets><Index64 i="[0 2]" offset="0" length="2" at="0x56448d7a6940"/></offsets>
<content><RegularArray size="4">
<content><NumpyArray format="l" shape="8" data="1 2 3 4 5 6 7 8" at="0x56448d7aa960"/></content>
</RegularArray></content>
</ListOffsetArray64> and >>> t = ak.argmax(y, axis=-1, keepdims=True)
>>> t
<Array [[[3], [3]]] type='1 * var * 1 * ?int64'>
>>> t.layout
<ListOffsetArray64>
<offsets><Index64 i="[0 2]" offset="0" length="2" at="0x56448d7a9b30"/></offsets>
<content><RegularArray size="1">
<content><ByteMaskedArray valid_when="false">
<mask><Index8 i="[0 0]" offset="0" length="2" at="0x56448d7a9bb0"/></mask>
<content><NumpyArray format="l" shape="2" data="3 3" at="0x56448d7a9ab0"/></content>
</ByteMaskedArray></content>
</RegularArray></content>
</ListOffsetArray64> First of all, this is the simplest and most natural example that triggers the bug: >>> y[t]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jpivarski/irishep/awkward-1.0/awkward/highlevel.py", line 996, in __getitem__
tmp = ak._util.wrap(self.layout[where], self._behavior)
RuntimeError: FIXME ListArrayOf<T>::SliceVarNewAxis. 2021-02-10 Was this left over from development? If so, it's not getting tested. If anyone out there encounters this error, please report it so that we can properly validate this code path and include it in the tests. https://github.com/scikit-hep/awkward-1.0/issues/new?assignees=&labels=bug+%28unverified%29&template=bug-report.md&title= Is it the fact that >>> t2 = ak.fill_none(t, 999, axis=-1)
>>> t2
<Array [[[3], [3]]] type='1 * var * 1 * int64'>
>>> t2.layout
<ListOffsetArray64>
<offsets><Index64 i="[0 2]" offset="0" length="2" at="0x56448d7a9b30"/></offsets>
<content><RegularArray size="1">
<content><NumpyArray format="l" shape="2" data="3 3" at="0x56448d7ad660"/></content>
</RegularArray></content>
</ListOffsetArray64>
>>> y[t2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jpivarski/irishep/awkward-1.0/awkward/highlevel.py", line 996, in __getitem__
tmp = ak._util.wrap(self.layout[where], self._behavior)
RuntimeError: FIXME ListArrayOf<T>::SliceVarNewAxis. 2021-02-10 Was this left over from development? If so, it's not getting tested. If anyone out there encounters this error, please report it so that we can properly validate this code path and include it in the tests. https://github.com/scikit-hep/awkward-1.0/issues/new?assignees=&labels=bug+%28unverified%29&template=bug-report.md&title= Does it depend on this depth of nesting? Yes. >>> y[0]
<Array [[1, 2, 3, 4], [5, 6, 7, 8]] type='2 * 4 * int64'>
>>> t[0]
<Array [[3], [3]] type='2 * 1 * ?int64'>
>>> y[0][t[0]]
<Array [[4, 4, 4, 4], [8, 8, 8, 8]] type='2 * var * ?int64'> But—murkiness—it's not okay if the slicer ( >>> y[0][t2[0]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jpivarski/irishep/awkward-1.0/awkward/highlevel.py", line 996, in __getitem__
tmp = ak._util.wrap(self.layout[where], self._behavior)
ValueError: in RegularArray attempting to get 3, index out of range
(https://github.com/scikit-hep/awkward-1.0/blob/1.4.0/src/cpu-kernels/awkward_RegularArray_getitem_next_array_regularize.cpp#L19) Oh, but I guess that's because this has become a strictly regular array problem, and it's deferring to NumPy rules. >>> y0 = ak.to_numpy(y[0])
>>> t20 = ak.to_numpy(t2[0])
>>> y0
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
>>> t20
array([[3],
[3]])
>>> y0[t20]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 3 is out of bounds for axis 0 with size 2 That's the design—the potentially missing values makes the slicer be interpreted as a nested slice—but a bit counter-intuitive because it looks like a small change from the case in which the values are not formally nullable. But NumPy does not ever do nested slices; each dimension has to be a separate array in a tuple, and any multidimensionalness in the slicer is taken to reshape the output. Both of those considerations presuppose rectilinear data. So this "murkiness," at least, is understood. It's a poor fit to advanced NumPy slicing features and the idea of variable-length data. But the FIXME error message should not be encountered. |
Yes, it seems that this code path is indeed coming from
|
The line that introduces the This is called from As to the interpretation of what this means ... I'm not quite up to speed there yet. |
So, as per Gitter, I determined that I.e. >>> array = ak.Array(
... [[[1.1, 2.2, 3.3], [3, 4], [4.4, 5.5], [6.6, 7.7], [8, 6], [7.7, 8.8, 9.9]]]
... )
>>> array
<Array [... 7.7], [8, 6], [7.7, 8.8, 9.9]]] type='1 * var * var * float64'>
>>> ix = ak.Array([[0, -1]])
>>> ix
<Array [[0, -1]] type='1 * var * int64'>
>>> array[ix]
<Array [... 1.1, 2.2, 3.3], [7.7, 8.8, 9.9]]] type='1 * var * var * float64'>
>>> array[ix[np.newaxis, ...]]
<Array [... [6.6, 7.7], [8, 6], [7.7, 9.9]]] type='1 * var * var * float64'> Clearly, This is being raised for my test case because Given my current understanding, I think we actually want to remove this exception altogether - isn't this method required? |
If removing the exception altogether returns the right result, then that's the best way to resolve this issue. The FIXME was added because I wasn't sure whether the whole function could ever be reached. If it can be reached and it does the right thing, then the function should stay and the exception should be removed. If it can be reached and it does the wrong thing, then it will unfortunately require more scrutiny. |
In disabling this exception, the result fails with ValueError: cannot fit jagged slice with length 1 into ListArray64 of size 2
(https://github.com/scikit-hep/awkward-1.0/blob/1.4.0/src/libawkward/array/ListArray.cpp#L1827) I think the problem is in the translation from the Given these two examples: def test_list_jagged_t():
y = ak.Array(
ak.layout.ListOffsetArray64(
ak.layout.Index64(np.r_[0, 2]),
ak.layout.ListArray64(
ak.layout.Index64(np.r_[0, 4]),
ak.layout.Index64(np.r_[4, 8]),
ak.layout.NumpyArray(
np.array([0, 1, 2, -1, 0, 1, 2, 3])
)
)
)
)
t = ak.argmax(y, axis=-1, keepdims=True, mask_identity=False)
print(y[t])
def test_list_regular_t():
y = ak.Array(
ak.layout.ListOffsetArray64(
ak.layout.Index64(np.r_[0, 2]),
ak.layout.ListArray64(
ak.layout.Index64(np.r_[0, 4]),
ak.layout.Index64(np.r_[4, 8]),
ak.layout.NumpyArray(
np.array([0, 1, 2, -1, 0, 1, 2, 3])
)
)
)
)
t = ak.to_regular(ak.argmax(y, axis=-1, keepdims=True, mask_identity=False), axis=-1)
print(y[t]) the [jagged([0, 2], jagged([0, 1, 2], array([2, 3])))] conversely, for [jagged([0, 2], newaxis(array([2, 3])))]
jagged([0, 2], array([2, 2])) which would be equivalent to an initial [jagged([0, 2], jagged([0, 2], array([2, 2])))] Given that I am not fully up to speed with the complexity here, I have made some guesses that might be wrong. I have observed that
For this inner We are just taking the offsets off this parent array, whereas we need to use the offsets of the child. At this point, I think my lack of knowledge about the design at play means I am unable to make any informed decision as to how to proceed! This feature is technically not documented anywhere, so it's partially hidden behind a compile-time feature flag (namely, a big loud exception)! As such, you probably could leave this until v2. In the mean-time, I'll ensure that any jagged lookups don't have regular 1-dims. |
@agoose77 I've been investigating this issue. Normally, the mixed variable and regular slicer that you made would be declared invalid: all-variable dimensions triggers Awkward advanced indexing and all-regular triggers NumPy advanced indexing; a mixture would be confusing, so it's not allowed. However, it sneaks through because length-1 regular arrays in a slice are interpreted as SliceVarNewAxis: https://github.com/scikit-hep/awkward-1.0/pull/694/files#diff-a63810de74c2520ec41382cece2d156993c47ba9eb69772ce6b10a8262536e22 The use of the word "newaxis" is a little misleading; this is not representing a This feature was added in a rush to prepare a tutorial that never happened (https://github.com/jpivarski-talks/2021-02-09-propose-scipy2021-tutorial/blob/main/prep/million-song.ipynb). It was the most minimal way I could see to add features that were necessary to do the analysis in that tutorial. The idea was that this is replicating a NumPy feature—boradcasting length-1 dimensions in slices—in Awkward advanced indexing. But Awkward advanced indexing fits a nested structure in the slicer to the nested structure that you're slicing, whereas NumPy advanced indexing slices each dimension by a different array, and all of those arrays in the tuple are broadcasted. NumPy advanced indexing is truly broadcasting because there are multiple arrays in the slicer; Awkward advanced indexing has only one array in the slicer, so it's not really broadcasting. Treating a length-1 dimension differently from any other length makes this rule hard to predict. The idea was that you'd get the length-1 dimension from a In the test suite, the rule is only triggered in the tests that were added to check it. The rule was never advertised (after all, that tutorial was never presented), and it is unlikely to have made its way into any analyses other than yours, since it's rather easy to trigger the FIXME (which is much older, and may yet be unreachable without the new SliceVarNewAxis rule). Slices that are enabled by the rule can still be performed without it—for instance, this test using the rule: array = ak.Array(
[
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19], [20, 21, 22, 23, 24], [25, 26, 27, 28, 29]],
]
)
slicer = ak.Array([[3, 4], [0, 1, 2, 3]])
assert array[slicer[:, np.newaxis]].tolist() == [
[[3, 4], [8, 9], [13, 14]],
[[15, 16, 17, 18], [20, 21, 22, 23], [25, 26, 27, 28]],
] can be replaced by the following, without the new rule: assert array[[[[3, 4], [3, 4], [3, 4]], [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]]].tolist() == [
[[3, 4], [8, 9], [13, 14]],
[[15, 16, 17, 18], [20, 21, 22, 23], [25, 26, 27, 28]],
] Your use-case is also possible, but only if the slicer is all-variable (in keeping with the rule to avoid confusion between Awkward advanced indexing and NumPy advanced indexing): >>> y = ak.Array([[[1, 2, 3, 4], [5, 6, 7, 8]]])
>>> t = ak.argmax(y, axis=-1, keepdims=True)
>>> y[t]
<Array [[[4], [8]]] type='1 * var * var * ?int64'> ( So I think I'm going to remove it, which reverts only PR: #694. |
I will be removing it: #1027 |
Version of Awkward Array
1.4.0
Description and code to reproduce
The following code
raises this traceback
The text was updated successfully, but these errors were encountered: