Refactor: cleanup reducer #1365

Merged
jpivarski merged 30 commits into main from refactor-cleanup-reducer
Mar 17, 2022
Conversation

agoose77
Collaborator

@agoose77 agoose77 commented Mar 10, 2022

ListOffsetArray and the Indexed[Option]Array types have a non-negligible amount of boilerplate from v1 that we can eliminate. In addition, there is a reasonable overlap between the reduce, [arg]sort, and unique pathways that we can move into preparatory functions.

I've changed the logic here quite substantially (in the case of IndexedOptionArray), so it would be good to validate that the test suite is covering these changes.

@jpivarski
Member

Looking good so far. A definition of _prepare_next wasn't included (so I don't see how the tests could pass). Should the name of that helper function be so generic? (I saw what you said on Gitter about naming things!) Is it just for reducers and operations that handle axis in the same way as reducers but don't change the number of dimensions (e.g. sort, cumsum, ...)?

On naming things, "reducers" and "rearrangers"? "Rearrange" sounds like it's only permutations/sorting, which cumsum is not, but it's in the same spirit.
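To make the reducer/rearranger distinction concrete, here is a minimal plain-Python sketch. It is not Awkward's implementation: `group_by_parents`, `reduce_sum`, `rearrange_sort`, and `rearrange_cumsum` are hypothetical names, and `parents`/`outlength` are assumed to mean "index of the list each value belongs to" and "number of lists". The shared grouping step stands in for the kind of preparatory logic that could be factored out.

```python
from itertools import accumulate


def group_by_parents(values, parents, outlength):
    """Hypothetical shared preparatory step: bucket values by parent index."""
    groups = [[] for _ in range(outlength)]
    for value, parent in zip(values, parents):
        groups[parent].append(value)
    return groups


def reduce_sum(values, parents, outlength):
    # Reducer: one output value per list, so the list dimension disappears.
    return [sum(g) for g in group_by_parents(values, parents, outlength)]


def rearrange_sort(values, parents, outlength):
    # Rearranger: same number of dimensions, values permuted within each list.
    return [sorted(g) for g in group_by_parents(values, parents, outlength)]


def rearrange_cumsum(values, parents, outlength):
    # Also keeps the number of dimensions, although it is not a permutation.
    return [list(accumulate(g)) for g in group_by_parents(values, parents, outlength)]


values = [3, 1, 2, 5, 4]
parents = [0, 0, 0, 2, 2]  # list 1 is empty

print(reduce_sum(values, parents, 3))        # [6, 0, 9]
print(rearrange_sort(values, parents, 3))    # [[1, 2, 3], [], [4, 5]]
print(rearrange_cumsum(values, parents, 3))  # [[3, 4, 6], [], [5, 9]]
```

Both kinds of operation handle the axis the same way (via the grouping step); they differ only in whether the output keeps the list dimension.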

@codecov

codecov bot commented Mar 10, 2022

Codecov Report

Merging #1365 (9d289b8) into main (b2fd2be) will decrease coverage by 0.73%.
The diff coverage is 52.51%.

| Impacted Files | Coverage Δ |
|---|---|
| src/awkward/_v2/_connect/cling.py | 0.00% <0.00%> (ø) |
| src/awkward/_v2/_lookup.py | 97.50% <0.00%> (ø) |
| src/awkward/_v2/_prettyprint.py | 66.09% <0.00%> (+2.29%) ⬆️ |
| src/awkward/_v2/_typetracer.py | 69.14% <0.00%> (ø) |
| src/awkward/_v2/identifier.py | 55.69% <0.00%> (ø) |
| src/awkward/_v2/operations/convert/ak_from_jax.py | 75.00% <0.00%> (ø) |
| src/awkward/_v2/operations/convert/ak_to_jax.py | 75.00% <0.00%> (ø) |
| src/awkward/_v2/operations/io/ak_from_parquet.py | 75.00% <0.00%> (ø) |
| src/awkward/_v2/operations/io/ak_to_parquet.py | 75.00% <0.00%> (ø) |
| src/awkward/_v2/operations/structure/ak_firsts.py | 75.00% <0.00%> (ø) |

... and 144 more

@agoose77
Collaborator Author

Looking good so far. A definition of _prepare_next wasn't included

Weird, I wonder why we can't see it from the UI? It's here: https://github.com/scikit-hep/awkward-1.0/blob/bf25bee5a2e8b66c41cfb7a5d8b9681acd5c5fdd/src/awkward/_v2/contents/indexedoptionarray.py#L1172-L1214

@agoose77
Collaborator Author

At this stage, I'm mainly just pulling things here and there to see what's fundamental vs. a variation. It's tricky to fully refactor this because a lot of the SLOC are just kernel calls, which are bulky but "primitive".

@agoose77
Collaborator Author

naming things, "reducers" and "rearrangers"? "Rearrange" sounds like it's only permutations/sorting, which cumsum is not, but it's in the same spirit.

I like this name!

@agoose77
Collaborator Author

OK, I've addressed IndexedOptionArray. I think we can really only factor out the preparatory logic into its own function. The rest of the routines are either sufficiently bespoke or terse that it doesn't add much to make a new function.

I have not replicated the original logic. It looks like we had some branches that would never be visited (e.g., anything in the rearranger routines would only return a non-IndexedOptionArray if depth != 1, because we explicitly have out = IndexedOptionArray(...)). These changes pass the test suite, but reducer logic is non-trivial, so I would not be surprised if I've missed something somewhere. Now that I understand this logic a bit better, I think it should be easier to extend to other layout types.

@agoose77 agoose77 marked this pull request as ready for review March 12, 2022 19:55
@agoose77 agoose77 mentioned this pull request Mar 12, 2022
@agoose77 agoose77 requested a review from jpivarski March 17, 2022 15:37
Member

@jpivarski jpivarski left a comment

That's a lot of changes, and it's hard to tell from this altitude that they're right.

This is one of those times when you wish you had 100% test coverage. :(

I scanned through it a second time, more carefully, and noted the things that bothered me most, but in each case, I could convince myself that what you've written is correct.

Thanks a lot—these are some deep and detailed edits!

Comment on lines 1006 to +1013
  if isinstance(out, ak._v2.contents.RegularArray):
      out = out.toListOffsetArray64(True)

- elif isinstance(out, ak._v2.contents.ListOffsetArray):
+ # If the result of `_reduce_next` is a list, and we're not applying at this
+ # depth, then it will have offsets given by the boundaries in parents.
+ # This means that we need to look at the _contents_ to which the `outindex`
+ # belongs to add the new index
+ if isinstance(out, ak._v2.contents.ListOffsetArray):
Member

How are you sure that this needs to be an if and not an elif?

We have no tests that satisfy the if isinstance(out, ak._v2.contents.RegularArray) predicate. (I put an Exception in there and ran the tests: none of them failed.)

Aha: it's what happened in v1:

https://github.com/scikit-hep/awkward-1.0/blob/34094aa26e498210047428b2ecde6d0b1aa1fc7f/src/libawkward/array/IndexedArray.cpp#L2260-L2265

Although it would be better to have tests, conformance with v1 is a winning argument.
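The practical difference between `if` and `elif` here can be sketched with stand-in classes. These are not the real `ak._v2.contents` classes, and `to_list_offset_array`, `with_elif`, and `with_if` are hypothetical names used only for illustration. With `elif`, an input that starts as a regular array is converted but then skips the list branch; with `if`, the converted result falls through into it.

```python
class ListOffsetArray:
    """Stand-in for a variable-length list layout."""


class RegularArray:
    """Stand-in for a fixed-size list layout."""

    def to_list_offset_array(self):
        # Hypothetical conversion, playing the role of toListOffsetArray64(True).
        return ListOffsetArray()


def with_elif(out):
    handled_as_list = False
    if isinstance(out, RegularArray):
        out = out.to_list_offset_array()
    elif isinstance(out, ListOffsetArray):
        # Skipped whenever the first branch ran, even though `out` is now a list.
        handled_as_list = True
    return handled_as_list


def with_if(out):
    handled_as_list = False
    if isinstance(out, RegularArray):
        out = out.to_list_offset_array()
    if isinstance(out, ListOffsetArray):
        # Runs for original lists and for freshly converted regular arrays.
        handled_as_list = True
    return handled_as_list


print(with_elif(RegularArray()))     # False
print(with_if(RegularArray()))       # True
print(with_elif(ListOffsetArray()))  # True
print(with_if(ListOffsetArray()))    # True
```

The two versions only diverge for inputs that take the first branch, which is exactly the case no test currently exercises.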

Member

Also, it's what we have in IndexedOptionArray, and that's very similar to the IndexedArray case. In fact, the IndexedArray might be superfluous, since they had to share an implementation in v1, when IndexedOptionArray was a templated variation on IndexedArray. So this might be a historical artifact.
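The code comment in the snippet under review says that a list result has offsets given by the boundaries in parents. A rough plain-Python illustration of that relationship follows; it assumes that `parents[i]` is the index of the list that element `i` belongs to and that `parents` is non-decreasing. `offsets_from_parents` is a hypothetical helper, not a function from Awkward.

```python
def offsets_from_parents(parents, outlength):
    """Derive list offsets from a non-decreasing parents array (illustrative only)."""
    # Count how many elements fall into each output list.
    counts = [0] * outlength
    for parent in parents:
        counts[parent] += 1

    # Offsets are the running total of those counts, starting at zero.
    offsets = [0]
    for count in counts:
        offsets.append(offsets[-1] + count)
    return offsets


parents = [0, 0, 1, 1, 1, 3]  # list 2 has no elements
print(offsets_from_parents(parents, 4))  # [0, 2, 5, 5, 6]
```

Each consecutive pair of offsets delimits one list, so a position where the parent value changes becomes a boundary, and an empty list shows up as a repeated offset.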

Comment on lines 821 to +824
  if isinstance(unique, ak._v2.contents.RegularArray):
      unique = unique.toListOffsetArray64(True)

- elif isinstance(unique, ak._v2.contents.ListOffsetArray):
+ if isinstance(unique, ak._v2.contents.ListOffsetArray):
Member

@jpivarski jpivarski Mar 17, 2022

I had the same question about this as the previous one, but I can see that they're symmetric bits of code.

Comment on lines -1788 to +1511
- nextshifts = ak._v2.index.Index64.empty(nextlen, self._nplike)
- nummissing = ak._v2.index.Index64.empty(maxcount[0], self._nplike)
+ nextshifts = ak._v2.index.Index64.empty(nextcarry.length, self._nplike)
+ nummissing = ak._v2.index.Index64.empty(maxcount, self._nplike)
Member

I was wondering about how maxcount lost its [0], but now I see that _rearrange_prepare_next returns the scalar value, rather than the one-element array.

This might come up again if maxcount is ever a TypeTracerArray. But the beauty of this solution is that we'd only have to fix it in one place (in the _rearrange_prepare_next function) if we do have to change it.
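The "fix it in one place" point can be sketched as follows. This is a simplified stand-in, not the real `_rearrange_prepare_next`: the name `prepare_next`, its `counts` argument, and the list-based buffers are assumptions made for illustration. The helper unwraps the one-element buffer once, so callers only ever see a scalar.

```python
def prepare_next(counts):
    """Hypothetical preparatory helper returning (nextcarry, maxcount)."""
    # Kernels typically write a result like this into a one-element buffer.
    maxcount_buffer = [max(counts, default=0)]
    nextcarry = list(range(sum(counts)))

    # Unwrap here, once. If the buffer type ever changes, only this line moves.
    return nextcarry, maxcount_buffer[0]


nextcarry, maxcount = prepare_next([2, 0, 3])

# Callers use the scalar directly, with no `[0]` at each call site.
nextshifts = [0] * len(nextcarry)
nummissing = [0] * maxcount

print(maxcount)         # 3
print(len(nextshifts))  # 5
```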

@jpivarski jpivarski enabled auto-merge (squash) March 17, 2022 19:29
@jpivarski jpivarski merged commit 8feb7a7 into main Mar 17, 2022
@jpivarski jpivarski deleted the refactor-cleanup-reducer branch March 17, 2022 20:06
@agoose77
Collaborator Author

That's a lot of changes, and it's hard to tell from this altitude that they're right.

This is one of those times when you wish you had 100% test coverage. :(

I scanned through it a second time, more carefully, and noted the things that bothered me most, but in each case, I could convince myself that what you've written is correct.

Thanks a lot—these are some deep and detailed edits!

Thanks Jim. I noticed a couple of docs typos while reading the diff again (it's amazing what coming back to something can do for perspective), but I can fix those up in a later PR.

I agree that this is not something that we can grok from the diff alone. I did make some big (albeit reasoned) changes here, so I hope we've not introduced any new bugs. Still, the passing tests suggest we might be OK, and we can treat this PR as the default suspect if anything does come up!

2 participants