fix: don't project records during broadcasting; push index down #2524
The `broadcast_and_apply` logic currently `project()`s `IndexedArray`s during recursion. This is more expensive than necessary: a caller-provided callback may exit early, avoiding the need for a projection. Instead, we can just push the indexed node into the contents of the `RecordArray`.

For `dask-awkward`, the existing behaviour has a more serious consequence: a much greater number of contents are touched during typetracer time. This PR changes the best-case behavior of operations that encounter an `IndexedArray` of `RecordArray` during broadcasting (e.g. `with_field` after an advanced index) from touching everything in the record to touching only the offsets above the record. In practice, the presence of option-types within the record array will lead to additional touching of those buffers.
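
To illustrate the difference in buffer touching, here is a minimal pure-Python sketch. It is a toy model, not the Awkward Array implementation: `Buffer`, `Indexed`, `project_record`, and `push_index_down` are hypothetical names invented for this example, and the `touched` flag only loosely mimics what typetracer reporting records.

```python
class Buffer:
    """Toy leaf buffer that records whether its data has been read."""

    def __init__(self, data):
        self._data = list(data)
        self.touched = False

    def take(self, index):
        # Reading the data marks the buffer as touched.
        self.touched = True
        return [self._data[i] for i in index]


class Indexed:
    """Toy lazy view: the index is applied only when project() is called."""

    def __init__(self, index, content):
        self.index = list(index)
        self.content = content

    def project(self):
        return self.content.take(self.index)


def project_record(index, fields):
    """Eager strategy: apply the index to every field up front."""
    return {name: buf.take(index) for name, buf in fields.items()}


def push_index_down(index, fields):
    """Lazy strategy: wrap each field in its own indexed view."""
    return {name: Indexed(index, buf) for name, buf in fields.items()}


index = [2, 0]

# Eager projection reads every field, even ones the caller never uses.
eager_fields = {"x": Buffer([1, 2, 3]), "y": Buffer([1.1, 2.2, 3.3])}
project_record(index, eager_fields)
eager_touched = sorted(n for n, b in eager_fields.items() if b.touched)

# Pushing the index down defers the read until a field is actually needed.
lazy_fields = {"x": Buffer([1, 2, 3]), "y": Buffer([1.1, 2.2, 3.3])}
pushed = push_index_down(index, lazy_fields)
x_values = pushed["x"].project()
lazy_touched = sorted(n for n, b in lazy_fields.items() if b.touched)

print(eager_touched)  # ['x', 'y']
print(lazy_touched)   # ['x']
print(x_values)       # [3, 1]
```

In this toy model, the eager strategy touches both `x` and `y`, while the pushed-down index touches only the field that is actually projected. The real change operates on Awkward layout nodes rather than dictionaries, and option-types inside the record may still cause extra buffers to be touched, as noted above.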