Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemoryError: std::bad_alloc from masked slicing #723

Closed
jrueb opened this issue Feb 10, 2021 · 4 comments · Fixed by #725
Closed

MemoryError: std::bad_alloc from masked slicing #723

jrueb opened this issue Feb 10, 2021 · 4 comments · Fixed by #725
Labels
bug The problem described is something that must be fixed

Comments

@jrueb
Copy link
Contributor

jrueb commented Feb 10, 2021

import awkward as ak
import numpy as np

a = ak.layout.NumpyArray(np.empty(122))
idx = ak.layout.Index64([0, 2, 4, 6, 8, 10, 12])
a = ak.layout.ListOffsetArray64(idx, a)
idx = ak.layout.Index64([0, -1, 1, 2, -1, 3, 4, 5])
a = ak.layout.IndexedOptionArray64(idx, a)
a = ak.Array(a)
a[[[0], None]]

The last line causes a MemoryError: std::bad_alloc to be raised. Occasionally (randomly) it raises a ValueError instead reading something like in ListArray64 attempting to get 875985249, jagged slice's offsets extend beyond its content.

Tested with awkward version 1.1.1 and current git main.

@jrueb jrueb added the bug (unverified) The problem described would be a bug, but needs to be triaged label Feb 10, 2021
@jpivarski jpivarski added bug The problem described is something that must be fixed and removed bug (unverified) The problem described would be a bug, but needs to be triaged labels Feb 10, 2021
@jpivarski jpivarski linked a pull request Feb 10, 2021 that will close this issue
@jpivarski
Copy link
Member

This was a case of the wrong error message. Jagged slices need to "fit" the corresponding array at all levels except the last. That was assumed in some parts of the code and failed to be enforced in others. I don't know exactly what was happening to raise the memory error, but it was likely reading off the end of an array to decide how big to make the next array.

I've instituted checks for array length at each level of Content::getitem_next_jagged and homogenized the error messages.

>>> a = ak.layout.NumpyArray(np.arange(122))
>>> idx = ak.layout.Index64([0, 2, 4, 6, 8, 10, 12])
>>> a = ak.layout.ListOffsetArray64(idx, a)
>>> idx = ak.layout.Index64([0, -1, 1, 2, -1, 3, 4, 5])
>>> a = ak.layout.IndexedOptionArray64(idx, a)
>>> a = ak.Array(a)
>>> a
<Array [[0, 1], None, [2, ... 8, 9], [10, 11]] type='8 * option[var * int64]'>

>>> # not allowed
>>> a[[[0], None]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward-1.0/awkward/highlevel.py", line 1007, in __getitem__
    return ak._util.wrap(self._layout[where], self._behavior)
ValueError: cannot fit jagged slice with length 2 into IndexedOptionArray64 of size 8

(https://github.com/scikit-hep/awkward-1.0/blob/1.1.1/src/libawkward/array/IndexedArray.cpp#L2770)

>>> # allowed
>>> a[[[0], None, [], [], [], [], [], []]]
<Array [[0], None, [], ... None, [], [], []] type='8 * option[var * int64]'>

Imminent fixes/features automation moved this from C++ to Done Feb 10, 2021
@jrueb
Copy link
Contributor Author

jrueb commented Feb 11, 2021

I think the check if the slice fits now is interfering with some valid slices. For example this doesn't work

a[ak.argsort(a)]

cannot fit jagged slice with length 6 into IndexedOptionArray64 of size 8. It seems it's not counting masked rows.

@jpivarski
Copy link
Member

Actually, it's a different bug that fixing the above reveals.

Starting with the same a (but using arange instead of empty, so that the numbers are recognizable and repeatable):

>>> a = ak.layout.NumpyArray(np.arange(122))
>>> idx = ak.layout.Index64([0, 2, 4, 6, 8, 10, 12])
>>> a = ak.layout.ListOffsetArray64(idx, a)
>>> idx = ak.layout.Index64([0, -1, 1, 2, -1, 3, 4, 5])
>>> a = ak.layout.IndexedOptionArray64(idx, a)
>>> a = ak.Array(a)

The ak.argsort(a) is producing an IndexedArray of an IndexedArray (both are IndexedOptionArrays, in particular):

>>> ak.argsort(a).layout
<IndexedOptionArray64>
    <index><Index64 i="[0 -1 1 2 -1 3 4 5]" offset="0" length="8" at="0x56159d603d50"/></index>
    <content><IndexedOptionArray64>
        <index><Index64 i="[0 1 2 3 4 5 -1 -1]" offset="0" length="8" at="0x56159d5f7b10"/></index>
        <content><ListOffsetArray64>
            <offsets><Index64 i="[0 2 4 6 8 10 12]" offset="0" length="7" at="0x56159d5454c0"/></offsets>
            <content><NumpyArray format="l" shape="12" data="0 1 0 1 0 ... 1 0 1 0 1" at="0x56159d5efd90"/></content>
        </ListOffsetArray64></content>
    </IndexedOptionArray64></content>
</IndexedOptionArray64>

The ak.argsort operation ought to be calling simplify on its output before returning, if it's an option-type or indexed type.

>>> ak.argsort(a).layout.simplify()
<IndexedOptionArray64>
    <index><Index64 i="[0 -1 1 2 -1 3 4 5]" offset="0" length="8" at="0x56159d603d50"/></index>
    <content><ListOffsetArray64>
        <offsets><Index64 i="[0 2 4 6 8 10 12]" offset="0" length="7" at="0x56159d5454c0"/></offsets>
        <content><NumpyArray format="l" shape="12" data="0 1 0 1 0 ... 1 0 1 0 1" at="0x56159d606920"/></content>
    </ListOffsetArray64></content>
</IndexedOptionArray64>

This doesn't change the meaning of the result (by definition):

>>> ak.to_list(ak.argsort(a))
[[0, 1], None, [0, 1], [0, 1], None, [0, 1], [0, 1], [0, 1]]
>>> ak.to_list(ak.argsort(a).layout.simplify())
[[0, 1], None, [0, 1], [0, 1], None, [0, 1], [0, 1], [0, 1]]

but the slicing logic assumes that we have simplified IndexedArrays:

>>> a[ak.Array(ak.argsort(a).layout.simplify())]
<Array [[0, 1], None, [2, ... 8, 9], [10, 11]] type='8 * option[var * int64]'>

I should probably also add an exception message whenever a slice is constructed with SliceMissing inside of SliceMissing:

>>> # ak._ext._slice_tostring is my back-door to see what kind of slice object is made from arrays
>>> print(ak._ext._slice_tostring(ak.argsort(a)))
[missing([0, -1, 1, 2, -1, 3, 4, 5], missing([0, 1, 2, 3, 4, 5], jagged([0, 2, 4, 6, 8, 10, 12], array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]))))]
>>> print(ak._ext._slice_tostring(ak.Array(ak.argsort(a).layout.simplify())))
[missing([0, -1, 1, 2, -1, 3, 4, 5], jagged([0, 2, 4, 6, 8, 10, 12], array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1])))]

I should probably also add a validity-check for IndexedArrays inside IndexedArrays, option-type inside option-type, and union-type inside union-type. All of these have to be allowed in intermediate calculations (so I can't refuse them in the constructor, as I probably can for SliceMissing), but they shouldn't be considered valid for analysis.

>>> # This should change.
>>> ak.is_valid(ak.argsort(a))
True

@jpivarski
Copy link
Member

I implemented all of those things in PR #729.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The problem described is something that must be fixed
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

2 participants