New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

fix: unflatten should accept non-packed `counts` #2097

Merged

agoose77 merged 5 commits into main from agoose77/fix-unflatten

Jan 10, 2023

Collaborator

agoose77 commented Jan 9, 2023 •

edited

Loading

Fixes ak.unflatten error message doesn't indicate axis out of boudns #2059
Replaces use of np.ma with our own option handling
Fixes loss of recursion

agoose77 added 4 commits

January 9, 2023 23:34


          fix: unflatten should accept non-packed counts

91e3627


          refactor: drop use of np.ma

e1023f7


          fix: list types all possess starts/stops, not offsets

6fee02c


          test: validate important "weird" unflatten cases

3a2c903

agoose77 requested a review from jpivarski

January 9, 2023 23:45

agoose77 commented

View reviewed changes

src/awkward/_util.py

Comment on lines +904 to +906

+                              numpy.array_equal(left.starts, right.starts)
+                              and numpy.array_equal(left.stops, right.stops)
+                              and visitor(left.content, right.content)

Collaborator Author

agoose77 Jan 9, 2023 •

edited

Loading

I'd thought that offsets was the shared interface (and it would not work well for ListArray), but of course that's nonsense; it's starts/stops. This change to the test helper fixes it.

agoose77 commented

View reviewed changes

src/awkward/operations/ak_unflatten.py

Comment on lines +87 to +89

+                  layout = ak.operations.to_layout(
+                      array, allow_record=False, allow_other=False
+                  ).to_packed()

Collaborator Author

agoose77 Jan 9, 2023

Ultimately we need to know that at any depth, the counts computed from our externally-visible structure aligns with the current content. Therefore, we need most of the guarantees of to_packed(). This could be done in the recursion if we wanted better performance for the non axis=-1 cases, because we don't need the leaf to be contiguous, or indeed care about any nodes after the unflatten depth.

agoose77 commented

View reviewed changes

src/awkward/operations/ak_unflatten.py

Comment on lines +96 to +98

+                      counts = ak.operations.to_layout(
+                          counts, allow_record=False, allow_other=False
+                      ).to_packed()

Collaborator Author

agoose77 Jan 9, 2023 •

edited

Loading

counts should be a trivial layout, so this is reasonable to pack (we don't need contiguousness, but we do need a non-indexed layout).

agoose77 commented

View reviewed changes

src/awkward/operations/ak_unflatten.py

+                      counts = ak.operations.to_layout(
+                          counts, allow_record=False, allow_other=False
+                      ).to_packed()
+                      if counts.is_option and (counts.content.is_numpy or counts.content.is_unknown):

Collaborator Author

agoose77 Jan 9, 2023

@jpivarski's fix

agoose77 commented

View reviewed changes

src/awkward/operations/ak_unflatten.py

Comment on lines +101 to +103

+                          counts = backend.nplike.to_rectilinear(
+                              ak.operations.fill_none(counts, 0, axis=-1, highlevel=False)
+                          )

Collaborator Author

agoose77 Jan 9, 2023 •

edited

Loading

This replaces the use of to_numpy, meaning we won't incur a needless device transfer under CUDA (IIRC).

agoose77 commented

View reviewed changes

src/awkward/operations/ak_unflatten.py

Comment on lines +120 to +122

+                      current_offsets = backend.index_nplike.empty(len(counts) + 1, np.int64)
+                      current_offsets[0] = 0
+                      backend.index_nplike.cumsum(counts, out=current_offsets[1:])

Collaborator Author

agoose77 Jan 9, 2023

Use nonlocal for readability (nonlocal appears later)

agoose77 commented

View reviewed changes

src/awkward/operations/ak_unflatten.py Outdated

-                              return layout
-                      out = transform(layout, depth=1, axis=axis)
+                      out = ak._do.recursively_apply(apply, layout)

Collaborator Author

agoose77 Jan 9, 2023 •

edited

Loading

Fix recursion. In v1, we had a transform_child_layouts that basically does what continuation does now (apart from the fact that it accepted a new transform function). Here, we restore the recursive behaviour


          fix: recursively_apply takes layout first

1500d17

codecov bot commented Jan 10, 2023

Codecov Report

Merging #2097 (1500d17) into main (42404f2) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted Files	Coverage Δ
src/awkward/_util.py	`81.64% <100.00%> (ø)`
src/awkward/operations/ak_unflatten.py	`95.31% <100.00%> (+1.28%)`	⬆️

agoose77 temporarily deployed to docs-preview

January 10, 2023 00:10

— with

GitHub Actions Inactive

jpivarski approved these changes

View reviewed changes

Member

jpivarski left a comment

This is interesting: using to_rectilinear instead of project. Since the counts needs to be rectilinear (and not necessarily NumPy), that makes sense.

Nice work!

agoose77 merged commit f88fa8e into main

agoose77 deleted the agoose77/fix-unflatten branch

January 10, 2023 08:08

agoose77 mentioned this pull request

unflatten raises counts must be an integer or a one-dimensional array of integers on a 1D awkward-array (worked in 1.x) #2071

Closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment