Refactor: cleanup reducer #1365

agoose77 · 2022-03-10T18:35:16Z

ListOffsetArray and the Indexed[Option]Array types have a non-negligible amount of boilerplate from v1 that we can eliminate. In addition, there is a reasonable overlap between the reduce, [arg]sort, and unique pathways that we can move into preparatory functions.

I've changed the logic here quite substantially (in the case of IndexedOptionArray), so it would be good to validate that the test suite is covering these changes.

jpivarski · 2022-03-10T18:44:38Z

Looking good so far. A definition of _prepare_next wasn't included (so I don't see how the tests could pass). Should the name of that helper function be so generic? (I saw what you said on Gitter about naming things!) Is it just for reducers and operations that handle axis in the same way as reducers but don't change the number of dimensions (e.g. sort, cumsum, ...)?

On naming things, "reducers" and "rearrangers"? "Rearrange" sounds like it's only permutations/sorting, which cumsum is not, but it's in the same spirit.
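
To make the proposed split concrete, here is a minimal sketch on plain nested Python lists (not Awkward layouts). The function names are hypothetical and only illustrate the distinction: a reducer removes the dimension it operates on, while a rearranger (sort, cumsum) keeps the list lengths intact.

```python
from itertools import accumulate


def reduce_sum(lists):
    # Reducer: each inner list collapses to one value, so a dimension is lost.
    return [sum(inner) for inner in lists]


def rearrange_sort(lists):
    # Rearranger: each inner list keeps its length; only the order changes.
    return [sorted(inner) for inner in lists]


def rearrange_cumsum(lists):
    # Not a permutation, but the same spirit: same lengths, same depth.
    return [list(accumulate(inner)) for inner in lists]


data = [[3, 1, 2], [], [5, 4]]
print(reduce_sum(data))        # [6, 0, 9]
print(rearrange_sort(data))    # [[1, 2, 3], [], [4, 5]]
print(rearrange_cumsum(data))  # [[3, 4, 6], [], [5, 9]]
```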

codecov · 2022-03-10T19:03:01Z

Codecov Report

Merging #1365 (9d289b8) into main (b2fd2be) will decrease coverage by 0.73%.
The diff coverage is 52.51%.

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/awkward/_v2/_connect/cling.py | `0.00% <0.00%> (ø)` | |
| src/awkward/_v2/_lookup.py | `97.50% <0.00%> (ø)` | |
| src/awkward/_v2/_prettyprint.py | `66.09% <0.00%> (+2.29%)` | ⬆️ |
| src/awkward/_v2/_typetracer.py | `69.14% <0.00%> (ø)` | |
| src/awkward/_v2/identifier.py | `55.69% <0.00%> (ø)` | |
| src/awkward/_v2/operations/convert/ak_from_jax.py | `75.00% <0.00%> (ø)` | |
| src/awkward/_v2/operations/convert/ak_to_jax.py | `75.00% <0.00%> (ø)` | |
| src/awkward/_v2/operations/io/ak_from_parquet.py | `75.00% <0.00%> (ø)` | |
| src/awkward/_v2/operations/io/ak_to_parquet.py | `75.00% <0.00%> (ø)` | |
| src/awkward/_v2/operations/structure/ak_firsts.py | `75.00% <0.00%> (ø)` | |
... and 144 more

agoose77 · 2022-03-10T19:30:13Z

> Looking good so far. A definition of _prepare_next wasn't included

Weird, I wonder why we can't see it from the UI? It's here: https://github.com/scikit-hep/awkward-1.0/blob/bf25bee5a2e8b66c41cfb7a5d8b9681acd5c5fdd/src/awkward/_v2/contents/indexedoptionarray.py#L1172-L1214

agoose77 · 2022-03-10T19:31:03Z

At this stage, I'm mainly just pulling things here and there to see what's fundamental vs a variation. It's tricky to fully refactor this because a lot of the SLOC are just kernel calls which are bulky but "primitive"

agoose77 · 2022-03-10T19:39:22Z

> naming things, "reducers" and "rearrangers"? "Rearrange" sounds like it's only permutations/sorting, which cumsum is not, but it's in the same spirit.

I like this name!

agoose77 · 2022-03-12T18:23:52Z

OK, I've addressed IndexedOptionArray. I think we can really only factor out the preparatory logic into its own function. The rest of the routines are either sufficiently bespoke or terse that it doesn't add much to make a new function.

I have not replicated the original logic. It looks like we had some branches that would never be visited (e.g., anything in the rearranger routines would only return a non-IndexedOptionArray if depth != 1, because we explicitly have out = IndexedOptionArray(...)). These changes pass the test suite, but reducer logic is non-trivial, so I would not be surprised if somewhere I've missed something. Now that I understand this logic a bit better, I think it should be easier to extend to other layout types.
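
As a rough illustration of what "factoring out the preparatory logic" can look like, the sketch below uses plain Python lists to stand in for an option-type index, where a negative entry marks a missing value. The helper name `prepare_non_null` and its return values are assumptions made for this example; they are not the signature of the helper added in this PR.

```python
def prepare_non_null(index):
    # Hypothetical shared preparation step: find which content positions
    # are actually present and count how many entries are missing.
    nextcarry = [i for i in index if i >= 0]
    numnull = len(index) - len(nextcarry)
    return nextcarry, numnull


def sum_option(content, index):
    # Reducer pathway: only needs the non-missing positions.
    nextcarry, _ = prepare_non_null(index)
    return sum(content[i] for i in nextcarry)


def sort_option(content, index):
    # Rearranger pathway: same preparation, then bespoke logic.
    # In this sketch the missing values are appended at the end.
    nextcarry, numnull = prepare_non_null(index)
    return sorted(content[i] for i in nextcarry) + [None] * numnull


content = [10.0, 30.0, 20.0]
index = [0, -1, 2, 1, -1]
print(sum_option(content, index))   # 60.0
print(sort_option(content, index))  # [10.0, 20.0, 30.0, None, None]
```

Only the preparation is shared here; what each pathway does afterwards stays in its own function, which matches the observation that the remaining code is too bespoke or too terse to be worth extracting.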

jpivarski

That's a lot of changes, and it's hard to tell from this altitude that they're right.

This is one of those times when you wish you had 100% test coverage. :(

I scanned through it a second time, more carefully, and noted the things that bothered me most, but in each case, I could convince myself that what you've written is correct.

Thanks a lot—these are some deep and detailed edits!

jpivarski · 2022-03-17T19:18:24Z

src/awkward/_v2/contents/indexedarray.py

            if isinstance(out, ak._v2.contents.RegularArray):
                out = out.toListOffsetArray64(True)

-            elif isinstance(out, ak._v2.contents.ListOffsetArray):
+            # If the result of `_reduce_next` is a list, and we're not applying at this
+            # depth, then it will have offsets given by the boundaries in parents.
+            # This means that we need to look at the _contents_ to which the `outindex`
+            # belongs to add the new index
+            if isinstance(out, ak._v2.contents.ListOffsetArray):


How are you sure that this needs to be an if and not an elif?

We have no tests that satisfy the if isinstance(out, ak._v2.contents.RegularArray) predicate. (I put an Exception in there and ran the tests: none of them failed.)

Aha: it's what happened in v1:

https://github.com/scikit-hep/awkward-1.0/blob/34094aa26e498210047428b2ecde6d0b1aa1fc7f/src/libawkward/array/IndexedArray.cpp#L2260-L2265

Although it would be better to have tests, conformance with v1 is a winning argument.

Also, it's what we have in IndexedOptionArray, and that's very similar to the IndexedArray case. In fact, the IndexedArray might be superfluous, since they had to share an implementation in v1, when IndexedOptionArray was a templated variation on IndexedArray. So this might be a historical artifact.
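
The practical difference between `elif` and `if` here can be sketched with two stand-in classes. Everything below is hypothetical (the class names, the `to_list_offset` method, the `steps` bookkeeping) and is not the Awkward API; it only shows that when the first branch converts `out` into the type the second branch tests for, `elif` skips the second step while `if` runs it.

```python
class ListOffsetSketch:
    # Stand-in for a list type described by offsets into flat items.
    def __init__(self, offsets, items):
        self.offsets, self.items = offsets, items


class RegularSketch:
    # Stand-in for a list type with a fixed, positive size per entry.
    def __init__(self, items, size):
        self.items, self.size = items, size

    def to_list_offset(self):
        offsets = list(range(0, len(self.items) + 1, self.size))
        return ListOffsetSketch(offsets, self.items)


def finish_with_elif(out):
    steps = []
    if isinstance(out, RegularSketch):
        out = out.to_list_offset()
        steps.append("converted")
    elif isinstance(out, ListOffsetSketch):
        steps.append("reindexed")  # skipped for inputs that were just converted
    return steps


def finish_with_if(out):
    steps = []
    if isinstance(out, RegularSketch):
        out = out.to_list_offset()
        steps.append("converted")
    if isinstance(out, ListOffsetSketch):
        steps.append("reindexed")  # also runs after a conversion
    return steps


regular = RegularSketch([1, 2, 3, 4], size=2)
print(finish_with_elif(regular))  # ['converted']
print(finish_with_if(regular))    # ['converted', 'reindexed']
```

For an input that is already a `ListOffsetSketch`, both variants behave identically, which is consistent with the existing tests not distinguishing them.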

jpivarski · 2022-03-17T19:19:42Z

src/awkward/_v2/contents/indexedarray.py

            if isinstance(unique, ak._v2.contents.RegularArray):
                unique = unique.toListOffsetArray64(True)

-            elif isinstance(unique, ak._v2.contents.ListOffsetArray):
+            if isinstance(unique, ak._v2.contents.ListOffsetArray):


I had the same question about this as the previous one, but I can see that they're symmetric bits of code.

jpivarski · 2022-03-17T19:25:32Z

src/awkward/_v2/contents/listoffsetarray.py

-                nextshifts = ak._v2.index.Index64.empty(nextlen, self._nplike)
-                nummissing = ak._v2.index.Index64.empty(maxcount[0], self._nplike)
+                nextshifts = ak._v2.index.Index64.empty(nextcarry.length, self._nplike)
+                nummissing = ak._v2.index.Index64.empty(maxcount, self._nplike)


I was wondering about how maxcount lost its [0], but now I see that _rearrange_prepare_next returns the scalar value, rather than the one-element array.

This might come up again if maxcount is ever a TypeTracerArray. But the beauty of this solution is that we'd only have to fix it in one place (in the _rearrange_prepare_next function) if we do have to change it.
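
A small sketch of why returning the scalar from the helper localises any future change. The names `prepare_next_sketch` and `allocate` are hypothetical, and plain lists stand in for index buffers; the one-element list plays the role of the one-element array that previously had to be unwrapped with `[0]` by every caller.

```python
def prepare_next_sketch(counts):
    # Stand-in for a kernel that writes its result into a one-element buffer.
    maxcount_buffer = [max(counts, default=0)]
    # Unwrap once, here, so callers never need to write maxcount[0].
    return int(maxcount_buffer[0])


def allocate(length):
    # Stand-in for allocating an empty index of the given length.
    return [0] * length


maxcount = prepare_next_sketch([2, 0, 3])
nummissing = allocate(maxcount)
print(maxcount, len(nummissing))  # 3 3
```

If the unwrapping ever has to change (for example, to cope with a value that is not a concrete number), only the body of the helper would need to be touched.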

agoose77 · 2022-03-17T21:24:54Z

> That's a lot of changes, and it's hard to tell from this altitude that they're right.
>
> This is one of those times when you wish you had 100% test coverage. :(
>
> I scanned through it a second time, more carefully, and noted the things that bothered me most, but in each case, I could convince myself that what you've written is correct.
>
> Thanks a lot—these are some deep and detailed edits!

Thanks Jim. I noticed a couple of docs typos while reading the diff again (it's amazing what coming back to something can do for perspective), but I can fix those up in a later PR.

I agree that this is not something that we can grok from the diffs alone. I did make some big (albeit reasoned) changes here, so I hope we've not introduced any new bugs. Still, the tests passing suggests we might be OK, and we can be suspicious of this PR as a default position if anything does come up!

agoose77 added 5 commits March 9, 2022 19:25

Fix: fix docstring line

c562ce5

Merge remote-tracking branch 'origin/main'

c2ddda2

Merge remote-tracking branch 'origin/main'

dac849a

Merge remote-tracking branch 'origin/main'

418d655

WIP: move common routines into their own methods

bf25bee

Refactor: add _prepare_nextshifts

ac6a1ba

agoose77 added 18 commits March 10, 2022 20:09

Refactor: rename _prepare_next

5836739

Refactor: remove parents_length

3a63f39

Fix: check correct content for mergeable

d2480af

This must have been mistyped during v1-v2 :)

Docs: explain null_merged

cb38e96

Refactor: move asserts to first instantiation of index

0dfde85

Fix: don't reference undefined local

503540c

Chore: remove incorrect comment

78fe120

Docs: update comment

d888109

Docs: more work explaining the merge step

3b44761

Refactor: move _rearrange methods above argsort

d0c27cb

Refactor: move assertions + definitions

ede8087

Refactor: clarify logic around inject_nones

cb3758c

Refactor: further clarify logic around inject_nones

b9c297d

Refactor: further clarify logic around inject_nones

c2673e3

Refactor: don't test nonnull in argsort for inject_nones

aa19b18

AFAICT this is *just* an optimisation (it won't functionally break anything to remove this).

Refactor: simplify return code-paths for rearranger & reducers

1d2614c

Fix: return in else case

94e85a5

Refactor: change naming / branch order

b22388c

agoose77 added 4 commits March 12, 2022 18:33

Refactor: restore error message, and drop test for IndexedOptionArray

caff519

This was added in the Python refactor, but doesn't exist in the C++ impl, and I can't see why we added it.

Fix: use if for non exclusive branch

9abcd8f

Add docs for reduction pathways.

Fix: change elif to if for non exclusive case

e2696f3

Refactor: move preparation code into _rearrange_prepare_next

72d787b

agoose77 marked this pull request as ready for review March 12, 2022 19:55

Refactor: restore numnull optimisation & simplify return switch

673a0c2

agoose77 mentioned this pull request Mar 12, 2022

Feat: add cumsum #1362

Closed

agoose77 requested a review from jpivarski March 17, 2022 15:37

jpivarski approved these changes Mar 17, 2022

View reviewed changes

Merge branch 'main' into refactor-cleanup-reducer

9d289b8

jpivarski enabled auto-merge (squash) March 17, 2022 19:29

jpivarski merged commit 8feb7a7 into main Mar 17, 2022

jpivarski deleted the refactor-cleanup-reducer branch March 17, 2022 20:06
