New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MemoryError: std::bad_alloc from masked slicing #723
Comments
This was a case of the wrong error message. Jagged slices need to "fit" the corresponding array at all levels except the last. That was assumed in some parts of the code and failed to be enforced in others. I don't know exactly what was happening to raise the memory error, but it was likely reading off the end of an array to decide how big to make the next array. I've instituted checks for array length at each level of >>> a = ak.layout.NumpyArray(np.arange(122))
>>> idx = ak.layout.Index64([0, 2, 4, 6, 8, 10, 12])
>>> a = ak.layout.ListOffsetArray64(idx, a)
>>> idx = ak.layout.Index64([0, -1, 1, 2, -1, 3, 4, 5])
>>> a = ak.layout.IndexedOptionArray64(idx, a)
>>> a = ak.Array(a)
>>> a
<Array [[0, 1], None, [2, ... 8, 9], [10, 11]] type='8 * option[var * int64]'>
>>> # not allowed
>>> a[[[0], None]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jpivarski/irishep/awkward-1.0/awkward/highlevel.py", line 1007, in __getitem__
return ak._util.wrap(self._layout[where], self._behavior)
ValueError: cannot fit jagged slice with length 2 into IndexedOptionArray64 of size 8
(https://github.com/scikit-hep/awkward-1.0/blob/1.1.1/src/libawkward/array/IndexedArray.cpp#L2770)
>>> # allowed
>>> a[[[0], None, [], [], [], [], [], []]]
<Array [[0], None, [], ... None, [], [], []] type='8 * option[var * int64]'> |
I think the check if the slice fits now is interfering with some valid slices. For example this doesn't work a[ak.argsort(a)]
|
Actually, it's a different bug that fixing the above reveals. Starting with the same >>> a = ak.layout.NumpyArray(np.arange(122))
>>> idx = ak.layout.Index64([0, 2, 4, 6, 8, 10, 12])
>>> a = ak.layout.ListOffsetArray64(idx, a)
>>> idx = ak.layout.Index64([0, -1, 1, 2, -1, 3, 4, 5])
>>> a = ak.layout.IndexedOptionArray64(idx, a)
>>> a = ak.Array(a) The >>> ak.argsort(a).layout
<IndexedOptionArray64>
<index><Index64 i="[0 -1 1 2 -1 3 4 5]" offset="0" length="8" at="0x56159d603d50"/></index>
<content><IndexedOptionArray64>
<index><Index64 i="[0 1 2 3 4 5 -1 -1]" offset="0" length="8" at="0x56159d5f7b10"/></index>
<content><ListOffsetArray64>
<offsets><Index64 i="[0 2 4 6 8 10 12]" offset="0" length="7" at="0x56159d5454c0"/></offsets>
<content><NumpyArray format="l" shape="12" data="0 1 0 1 0 ... 1 0 1 0 1" at="0x56159d5efd90"/></content>
</ListOffsetArray64></content>
</IndexedOptionArray64></content>
</IndexedOptionArray64> The >>> ak.argsort(a).layout.simplify()
<IndexedOptionArray64>
<index><Index64 i="[0 -1 1 2 -1 3 4 5]" offset="0" length="8" at="0x56159d603d50"/></index>
<content><ListOffsetArray64>
<offsets><Index64 i="[0 2 4 6 8 10 12]" offset="0" length="7" at="0x56159d5454c0"/></offsets>
<content><NumpyArray format="l" shape="12" data="0 1 0 1 0 ... 1 0 1 0 1" at="0x56159d606920"/></content>
</ListOffsetArray64></content>
</IndexedOptionArray64> This doesn't change the meaning of the result (by definition): >>> ak.to_list(ak.argsort(a))
[[0, 1], None, [0, 1], [0, 1], None, [0, 1], [0, 1], [0, 1]]
>>> ak.to_list(ak.argsort(a).layout.simplify())
[[0, 1], None, [0, 1], [0, 1], None, [0, 1], [0, 1], [0, 1]] but the slicing logic assumes that we have simplified IndexedArrays: >>> a[ak.Array(ak.argsort(a).layout.simplify())]
<Array [[0, 1], None, [2, ... 8, 9], [10, 11]] type='8 * option[var * int64]'> I should probably also add an exception message whenever a slice is constructed with >>> # ak._ext._slice_tostring is my back-door to see what kind of slice object is made from arrays
>>> print(ak._ext._slice_tostring(ak.argsort(a)))
[missing([0, -1, 1, 2, -1, 3, 4, 5], missing([0, 1, 2, 3, 4, 5], jagged([0, 2, 4, 6, 8, 10, 12], array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]))))]
>>> print(ak._ext._slice_tostring(ak.Array(ak.argsort(a).layout.simplify())))
[missing([0, -1, 1, 2, -1, 3, 4, 5], jagged([0, 2, 4, 6, 8, 10, 12], array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1])))] I should probably also add a validity-check for IndexedArrays inside IndexedArrays, option-type inside option-type, and union-type inside union-type. All of these have to be allowed in intermediate calculations (so I can't refuse them in the constructor, as I probably can for >>> # This should change.
>>> ak.is_valid(ak.argsort(a))
True |
I implemented all of those things in PR #729. |
The last line causes a
MemoryError: std::bad_alloc
to be raised. Occasionally (randomly) it raises aValueError
instead reading something likein ListArray64 attempting to get 875985249, jagged slice's offsets extend beyond its content
.Tested with awkward version 1.1.1 and current git main.
The text was updated successfully, but these errors were encountered: