-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor: cleanup reducer #1365
Conversation
Looking good so far. A definition of On naming things, "reducers" and "rearrangers"? "Rearrange" sounds like it's only permutations/sorting, which cumsum is not, but it's in the same spirit. |
Codecov Report
|
Weird, I wonder why we can't see it from the UI? It's here: https://github.com/scikit-hep/awkward-1.0/blob/bf25bee5a2e8b66c41cfb7a5d8b9681acd5c5fdd/src/awkward/_v2/contents/indexedoptionarray.py#L1172-L1214 |
At this stage, I'm mainly just pulling things here and there to see what's fundamental vs a variation. It's tricky to fully refactor this because a lot of the SLOC are just kernel calls which are bulky but "primitive" |
I like this name! |
This must have been mistyped during v1-v2 :)
AFAICT this is *just* an optimisation (it won't functionally break anything to remove this).
OK, I've addressed I have not replicated the original logic. It looks like we had some branches that would never be visited (e.g., anything in the rearranger routines would only a non- |
This was added in the Python refactor, but doesn't exist in the C++ impl, and I can't see why we added it.
Add docs for reduction pathways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a lot of changes, and it's hard to tell from this altitude that they're right.
This is one of those times when you wish you had 100% test coverage. :(
I scanned through it a second time, more carefully, and noted the things that bothered me most, but in each case, I could convince myself that what you've written is correct.
Thanks a lot—these are some deep and detailed edits!
if isinstance(out, ak._v2.contents.RegularArray): | ||
out = out.toListOffsetArray64(True) | ||
|
||
elif isinstance(out, ak._v2.contents.ListOffsetArray): | ||
# If the result of `_reduce_next` is a list, and we're not applying at this | ||
# depth, then it will have offsets given by the boundaries in parents. | ||
# This means that we need to look at the _contents_ to which the `outindex` | ||
# belongs to add the new index | ||
if isinstance(out, ak._v2.contents.ListOffsetArray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are you sure that this needs to be an if
and not an elif
?
We have no tests that satisfy the if isinstance(out, ak._v2.contents.RegularArray)
predicate. (I put an Exception in there and ran the tests: none of them failed.)
Aha: it's what happened in v1:
Although it would be better to have tests, conformance with v1 is a winning argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it's what we have in IndexedOptionArray, and that's very similar to the IndexedArray case. In fact, the IndexedArray might be superfluous, since they had to share an implementation in v1, when IndexedOptionArray was a templated variation on IndexedArray. So this might be a historical artifact.
if isinstance(unique, ak._v2.contents.RegularArray): | ||
unique = unique.toListOffsetArray64(True) | ||
|
||
elif isinstance(unique, ak._v2.contents.ListOffsetArray): | ||
if isinstance(unique, ak._v2.contents.ListOffsetArray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same question about this as the previous one, but I can see that they're symmetric bits of code.
nextshifts = ak._v2.index.Index64.empty(nextlen, self._nplike) | ||
nummissing = ak._v2.index.Index64.empty(maxcount[0], self._nplike) | ||
nextshifts = ak._v2.index.Index64.empty(nextcarry.length, self._nplike) | ||
nummissing = ak._v2.index.Index64.empty(maxcount, self._nplike) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering about how maxcount
lost its [0]
, but now I see that _rearrange_prepare_next
returns the scalar value, rather than the one-element array.
This might come up again if maxcount
is ever a TypeTracerArray. But the beauty of this solution is that we'd only have to fix it in one place (in the _rearrange_prepare_next
function) if we do have to change it.
Thanks Jim. I noticed a couple of docs typos while reading the diff again (it's amazing what coming back to something can do for perspective), but I can fix those up in a later PR. I agree that this is not something that we can grep by the diffs alone. I did make some (albeit reasoned) big changes here, so I hope we've not introduced any new bugs. Still, the tests passing suggests we might be OK, and we can be suspicious of this PR as a default position if anything does come up! |
ListOffsetArray
and theIndexed[Option]Array
types have a non-negligible amount of boilerplate from v1 that we can eliminate. In addition, there is a reasonable overlap between thereduce
,[arg]sort
, andunique
pathways that we can move into preparatory functions.I've changed the logic here quite substantially (in the case of
IndexedOptionArray
), so it would be good to validate that the test suite is covering these changes.