Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: don't project records during broadcasting; push index down #2524

Merged
merged 3 commits into from
Jun 14, 2023

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Jun 14, 2023

The broadcast_and_apply logic currently project()s IndexedArrays during recursion. This is more expensive than necessary; caller-provided callback may exit early, avoiding the need for a projection. Instead, we can just push the indexed node into the contents of the RecordArray.

For dask-awkward the existing behaviour has more serious consequence; a much greater number of contents are touched during typetracer time. This PR changes the best-cast behavior of operations that encounter an IndexedArray of RecordArray during broadcasting (e.g. with_field after an advanced index) from touching everything in the record to touching only the offsets above the record. In practice, the presence of option-types within the record array will lead to additional touching of those buffers.

@agoose77 agoose77 temporarily deployed to docs-preview June 14, 2023 16:33 — with GitHub Actions Inactive
@codecov
Copy link

codecov bot commented Jun 14, 2023

Codecov Report

Merging #2524 (f342fb4) into main (1e65496) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/_broadcasting.py 93.59% <100.00%> (ø)
src/awkward/contents/indexedarray.py 78.85% <100.00%> (+0.19%) ⬆️

@agoose77 agoose77 temporarily deployed to docs-preview June 14, 2023 16:46 — with GitHub Actions Inactive
@agoose77 agoose77 requested a review from jpivarski June 14, 2023 16:53
@agoose77 agoose77 marked this pull request as ready for review June 14, 2023 16:53
@agoose77 agoose77 temporarily deployed to docs-preview June 14, 2023 17:02 — with GitHub Actions Inactive
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does it! This is the entirety of the 1-point plan (discussed on Slack).

I thought it would have to be another top-level rule beside broadcast_any_indexed, but you're right that it doesn't. Some IndexedArrays may be wrapping RecordArrays and others may not be.

Since this passes the buck to a level that might need to deal with "any indexed" again, it might seem like this could lead to infinite recursion, but it can't: you don't have infinitely many nested RecordArrays to swap them with. At some point, you get to an IndexedArray that is not containing a RecordArray, and at that point, need to project().

So this looks like a complete solution to the problem. We'll see, on a case-by-case basis, if there's any option-type data that is still getting touched because of the IndexedArray.simplified call, but it's at least a step forward, a net reduction in projecting.

@agoose77 agoose77 merged commit 23dc2ae into main Jun 14, 2023
37 checks passed
@agoose77 agoose77 deleted the agoose77/broadcasting-project-record branch June 14, 2023 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants