Unintuitive slicing behaviour when slicing with Arrays #370
Comments
Indexing can be legitimately confusing (also true of NumPy). I'll break this down to what I think you're asking.
First, the array of records can always be projected onto a field, so instead of
array = ak.Array([
[{"x": 1.1, "y": [1]}],
[{"x": 2.2, "y": [11, 12]}],
[{"x": 3.3, "y": [21, 22, 23]}],
#[], # cannot slice this by index (if empty, you'll just have to pass an empty list in the slice)
[{"x": 3.3, "y": [31, 32, 33]}],
[{"x": 4.4, "y": [41, 42, 43, 44]}],
[{"x": 5.5, "y": [51, 52, 53, 54, 55]}]
])
which has type 6 * var * {"x": float64, "y": var * int64}, we could talk about
array["y"] # or array.y
which is
[[[1]], [[11, 12]], [[21, 22, 23]], [[31, 32, 33]], [[41, 42, 43, 44]], [[51, 52, 53, 54, 55]]]
with type 6 * var * var * int64. You can certainly do
>>> array["y", [[0], [0], [1], [1], [2], [2]]]
<Array [[[[1]]], [[[1, ... [[[21, 22, 23]]]] type='6 * 1 * var * var * int64'>
because each of the elements of the slice has length 1, just like
>>> array.y[0, 0]
<Array [1] type='1 * int64'> # has an element 0
>>> array.y[1, 0]
<Array [11, 12] type='2 * int64'> # has an element 0
>>> array.y[2, 0]
<Array [21, 22, 23] type='3 * int64'> # has an element 1
>>> array.y[3, 0]
<Array [31, 32, 33] type='3 * int64'> # has an element 1
>>> array.y[4, 0]
<Array [41, 42, 43, 44] type='4 * int64'> # has an element 2
>>> array.y[5, 0]
<Array [51, 52, 53, 54, 55] type='5 * int64'> # has an element 2
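>>> array.y[6, 0]
and so that's why it works. ak.singletons has nothing to do with it: it's used to convert missing values (None) into empty lists and present values into length-1 lists.
Recent versions of NumPy provide a hint about why the NumPy mask does not work:
>>> mask = np.array([
... [True],
... [True, True],
... [False, True, True],
... [False, True, False],
... [False, False, True, True],
... [False, False, True, False, False]])
raises a warning about creating an ndarray from ragged nested sequences.
This is a NumPy array of Python objects (dtype=object), not a jagged boolean array. The mask should be an Awkward Array instead:
>>> mask = ak.Array([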
... [True],
... [True, True],
... [False, True, True],
... [False, True, False],
... [False, False, True, True],
... [False, False, True, False, False]])
>>> mask
<Array [[True], [True, ... True, False, False]] type='6 * var * bool'>
but it also needs the length-1 structure of array.y's middle dimension:
>>> mask = ak.Array([
... [[True]],
... [[True, True]],
... [[False, True, True]],
... [[False, True, False]],
... [[False, False, True, True]],
... [[False, False, True, False, False]]])
>>> mask
<Array [[[True]], ... True, False, False]]] type='6 * var * var * bool'>
>>> array.y
<Array [[[1]], [[11, ... [51, 52, 53, 54, 55]]] type='6 * var * var * int64'>
>>> ak.num(mask, axis=2)
<Array [[1], [2], [3], [3], [4], [5]] type='6 * var * int64'>
>>> ak.num(array.y, axis=2)
<Array [[1], [2], [3], [3], [4], [5]] type='6 * var * int64'>
Okay; they line up: now we're ready to go!
>>> array.y[mask]
<Array [[[1]], [[11, 12], ... 43, 44]], [[53]]] type='6 * var * var * int64'>
About making a slice option that can be different at each level (e.g. slice list 1 with 0:0, list 2 with 1:2, list 3 with 0:2), that's an interesting idea, something that becomes useful in the context of ragged arrays that you wouldn't have with rectilinear arrays. Right now, that sort of thing can be done by opening up the layout:
>>> original = array.y.layout
>>> original
<ListOffsetArray64>
<offsets><Index64 i="[0 1 2 3 4 5 6]" offset="0" length="7" at="0x55f65db71150"/></offsets>
<content><ListOffsetArray64>
<offsets><Index64 i="[0 1 3 6 9 13 18]" offset="0" length="7" at="0x55f65db75170"/></offsets>
<content><NumpyArray format="l" shape="18" data="1 11 12 21 22 ... 51 52 53 54 55" at="0x55f65d654e60"/></content>
</ListOffsetArray64></content>
</ListOffsetArray64>
>>> starts = np.asarray(original.content.starts)
>>> stops = np.asarray(original.content.stops)
>>> starts, stops
(array([ 0, 1, 3, 6, 9, 13], dtype=int64),
array([ 1, 3, 6, 9, 13, 18], dtype=int64))
Slicing with a different start and stop for each list then means adjusting these arrays:
>>> starts = starts + [0, 0, 1, 1, 2, 2]
>>> stops = stops - [0, 0, 1, 1, 2, 2]
>>> starts, stops
(array([ 0, 1, 4, 7, 11, 15], dtype=int64), array([ 1, 3, 5, 8, 11, 16], dtype=int64))
>>> modified = ak.layout.ListOffsetArray64(
... original.offsets,
... ak.layout.ListArray64(
... ak.layout.Index64(starts),
... ak.layout.Index64(stops),
... original.content.content))
>>> modified
<ListOffsetArray64>
<offsets><Index64 i="[0 1 2 3 4 5 6]" offset="0" length="7" at="0x55f65db71150"/></offsets>
<content><ListArray64>
<starts><Index64 i="[0 1 4 7 11 15]" offset="0" length="6" at="0x55f65db704d0"/></starts>
<stops><Index64 i="[1 3 5 8 11 16]" offset="0" length="6" at="0x55f65db5b0b0"/></stops>
<content><NumpyArray format="l" shape="18" data="1 11 12 21 22 ... 51 52 53 54 55" at="0x55f65d654e60"/></content>
</ListArray64></content>
</ListOffsetArray64>
>>> ak.Array(modified)
<Array [[[1]], [[11, 12]], ... [[]], [[53]]] type='6 * var * var * int64'>
>>> ak.Array(modified).tolist()
[[[1]], [[11, 12]], [[22]], [[32]], [[]], [[53]]]
And that's probably how a variable starts:stops would be implemented. But if the indexing is tricky, this is tricky-squared. It's pretty easy to make an array that's internally inconsistent (check with ak.is_valid and ak.validity_error).
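The starts/stops idea above can be followed without any Awkward internals. The block below is a minimal plain-Python sketch, not how Awkward implements it: `trim`, `is_consistent` and `materialize` are hypothetical helper names introduced only for illustration, and `is_consistent` is a simplified stand-in for the kind of check ak.is_valid performs, under the assumption that each list is a contiguous window into one flat content buffer.

```python
# Flat content and list boundaries, mirroring the inner ListOffsetArray64 shown above.
content = [1, 11, 12, 21, 22, 23, 31, 32, 33, 41, 42, 43, 44, 51, 52, 53, 54, 55]
offsets = [0, 1, 3, 6, 9, 13, 18]
starts = offsets[:-1]  # where each list begins in `content`
stops = offsets[1:]    # where each list ends (exclusive)


def trim(starts, stops, head, tail):
    """Drop head[i] items from the front and tail[i] items from the back of list i."""
    new_starts = [s + h for s, h in zip(starts, head)]
    new_stops = [e - t for e, t in zip(stops, tail)]
    return new_starts, new_stops


def is_consistent(starts, stops, content_length):
    """Hypothetical sanity check: every window must be ordered and inside the buffer."""
    return len(starts) == len(stops) and all(
        0 <= s <= e <= content_length for s, e in zip(starts, stops)
    )


def materialize(content, starts, stops):
    """Build the nested lists that the starts/stops pairs describe."""
    return [content[s:e] for s, e in zip(starts, stops)]


new_starts, new_stops = trim(starts, stops, [0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2])
if not is_consistent(new_starts, new_stops, len(content)):
    raise ValueError("starts/stops do not describe valid windows into content")

result = materialize(content, new_starts, new_stops)
print(result)  # [[1], [11, 12], [22], [32], [], [53]]
```

Trimming too much (for example, two items from each end of a three-item list) makes a start exceed its stop, which is exactly the kind of internal inconsistency the comment warns about.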
Great, thank you very much for that swift clarification. I was particularly looking for the second explanation (it is my overall motivation for using Awkward, as I deal with that sort of task a lot in the context of jagged arrays). You can close this issue. Maybe the first part could be part of the quickstart. Let me know if I can help fill the doc stubs with content.
starts = np.asarray(original.content.starts)
Throws an error with the same
Thanks for the offer! The stubs are there because I have to finish other projects (Uproot 4), which also need documentation. Awkward is half-there in that it has all the reference docs, and the ones in the Python API include examples. If you submit a documentation issue with the examples you'd like to see fill the stub, I'll enter them into the stub. I don't have it set up as a wiki (whenever I do make a wiki, no one edits it!), in part because evaluating the JupyterBook is part of the build (to ensure that tutorial examples are not broken), which gives me a chance to edit. But suggested text definitely bumps it up in priority: if you write it, I'll post it.
That's where you're getting into trickiness-squared. The different node types in a layout have different properties: NumpyArray represents rectilinear data, like NumPy, which has no need of |
Just realized that your suggested array slicing won't result in what I meant by index slicing.
This will result in selecting entire nested lists by a collection of indexes. For simplicity, let's reconsider:
>>> a = ak.from_iter([[1.1, 2.2, 3.3], [], [4.4, 5.5], [6.6, 7.7, 8.8], [9.9]])
>>> a[[0, 3], [True, False, True]]
<Array [1.1, 8.8] type='2 * float64'>
where we rearrange and select within nested same-size lists (it is rather uncommon to assume a rectangular set of non-jagged arrays).
I intended to do something like this (multidimensional index slicing):
>>> idx = [0,1,2]
>>> a[[0,1,3], idx]
<Array [[1.1], [], [8.8]] type='3 * var * float64'>
# or generally over entire array
idxs = [0,1,0,2,1]
# assert a.shape[0] == len(idxs)
>>> a[list(range(a.shape[0])), idxs]
<Array [[1.1], [], [4.4], [8.8], []] type='5 * var * float64'>
or perhaps like so (treat slice array of type
a[ak.layout.Index64(idxs)]
<Array [[1.1], [], [4.4], [8.8], []] type='5 * var * float64'>
The only way I see to do this is by your suggested second approach where I set all
You could use ak.pad_none to make each inner list have at least the right number of elements:
>>> ak.pad_none(a, 3)
<Array [[1.1, 2.2, 3.3], ... [9.9, None, None]] type='5 * var * ?float64'>
Then it would be legal to ask for
>>> ak.pad_none(a, 3)[[0, 1, 3], [0, 1, 2]]
<Array [1.1, None, 8.8] type='3 * ?float64'>
The NumPy-like thing to do when given advanced arrays in two dimensions is to "iterate over them as one" and return the elements that match: a single-depth list, as above. In your examples, it looks like you want nested lists, with an empty list wherever no element exists. ak.singletons does that conversion:
>>> ak.singletons(ak.pad_none(a, 3)[[0, 1, 3], [0, 1, 2]])
<Array [[1.1], [], [8.8]] type='3 * var * float64'>
Your second example would then look like this:
>>> ak.singletons(ak.pad_none(a, 3)[range(len(a)), [0, 1, 0, 2, 1]])
<Array [[1.1], [], [4.4], [8.8], []] type='5 * var * float64'>
though if it was big, you wouldn't want to do a Python range; np.arange avoids that:
>>> ak.singletons(ak.pad_none(a, 3)[np.arange(len(a)), [0, 1, 0, 2, 1]])
<Array [[1.1], [], [4.4], [8.8], []] type='5 * var * float64'>
I should warn you to stay away from
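For checking results on small inputs, the per-list selection discussed in this exchange can be written as a plain-Python reference. This is only a sketch of the intended semantics, not a replacement for the vectorized ak.pad_none / ak.singletons expression: `pick_per_list` is a hypothetical name, and treating negative indexes as out of range is an assumption made here for simplicity.

```python
def pick_per_list(lists, indexes):
    """For each list, return [element at its index], or [] if the index is out of range.

    Assumption for this sketch: negative indexes count as out of range.
    """
    if len(lists) != len(indexes):
        raise ValueError("need exactly one index per list")
    return [
        [lst[i]] if 0 <= i < len(lst) else []
        for lst, i in zip(lists, indexes)
    ]


a = [[1.1, 2.2, 3.3], [], [4.4, 5.5], [6.6, 7.7, 8.8], [9.9]]

# First example: rows 0, 1, 3 paired with inner indexes 0, 1, 2.
rows = [0, 1, 3]
picked = pick_per_list([a[r] for r in rows], [0, 1, 2])
print(picked)  # [[1.1], [], [8.8]]

# Second example: one inner index for every row of the array.
whole = pick_per_list(a, [0, 1, 0, 2, 1])
print(whole)  # [[1.1], [], [4.4], [8.8], []]
```

A Python loop like this is slow on large arrays, which is why the vectorized form is preferable in practice; the reference version is mainly useful in tests.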
Not sure if bug or feature.
Slicing by an array of indices would be very handy but currently fails (or is unreliable).
Minimal working example taken (mostly based on the README.md):
Maybe I am missing something here.
Eventually it would be nice to achieve a slice from startIndices to endIndices without creating boolean arrays of the entire length or a Numba for loop.
Fails with
ValueError: arrays used as an index must be a (native-endian) integer or boolean