feat: add enforce_concatenated_form #2860

Merged
merged 10 commits into from
Dec 5, 2023

agoose77
Collaborator

@agoose77 agoose77 commented Nov 30, 2023

Tip

This PR makes the following changes:

  1. Merging categoricals is now an error; see #2853 (Categoricals should merge subject to particular rules).
  2. _mergemany no longer enforces a backend.
  3. More bugs in merging are fixed.
  4. A new function, ak.operations.ak_concatenate.enforce_concatenated_form, enforces a concatenated form.

dask-awkward needs to be able to compute a concatenation operation without actually concatenating buffers. @jpivarski and I discussed the various ways we could go about this:

  1. Concatenate some typetracer arrays and use enforce_type to produce partitions of the correct type.
  2. Concatenate each partition with a length-zero array of the final type.
  3. Concatenate some typetracer arrays and introduce a new enforce_form to produce partitions of the correct form.
  4. Add a special path to _mergemany that doesn't touch buffers if they have known length-zero.
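
To make option 2 concrete, here is a minimal toy sketch in plain Python. It does not use Awkward Array: `Column`, `merged_dtype`, `concatenate`, and `align_partition` are hypothetical names invented for this illustration, and the promotion table is an assumption standing in for the real merging rules. It only shows the idea that concatenating a partition with a length-zero array of the final type yields a partition of that type with its original length.

```python
from dataclasses import dataclass

# Hypothetical promotion order standing in for the real merging rules.
_RANK = {"bool": 0, "int64": 1, "float64": 2}
_CAST = {"bool": bool, "int64": int, "float64": float}


@dataclass(frozen=True)
class Column:
    """Toy partition: a dtype name plus its values."""
    dtype: str
    data: tuple


def merged_dtype(dtypes):
    """Return the widest dtype among the inputs."""
    return max(dtypes, key=_RANK.__getitem__)


def concatenate(columns):
    """General-purpose concatenation: promote, then copy every value."""
    dtype = merged_dtype([c.dtype for c in columns])
    cast = _CAST[dtype]
    return Column(dtype, tuple(cast(x) for c in columns for x in c.data))


def align_partition(partition, final_dtype):
    """Option 2: concatenate with a length-zero column of the final type."""
    return concatenate([partition, Column(final_dtype, ())])


partitions = [Column("int64", (1, 2, 3)), Column("float64", (0.5,))]
# The final type is derived from dtypes alone, without touching any data.
final_dtype = merged_dtype([p.dtype for p in partitions])
aligned = [align_partition(p, final_dtype) for p in partitions]
print([(p.dtype, p.data) for p in aligned])
```

In this toy, every partition still passes through the general `concatenate` routine, which copies each value even though the second input is empty.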

Each of these solutions has drawbacks. In particular, modifying existing code to satisfy two constraints (performing an optimal merge in a convenient amount of code, and having a length-zero path) is more difficult than writing a dedicated function that only handles the "don't touch buffers" case. As such, this PR instead adds a new function that acts like enforce_form, but under the assumption that the form was built from ak.concatenate((layout, ...), axis=0), where layout is the content being enforced to that form. This allows us to make many simplifying assumptions based on the implementation of _mergemany.
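
The "dedicated function" idea can be sketched with a toy model in plain Python. This is not the code added by this PR and it does not call Awkward Array: `Leaf`, `Unmasked`, `SingleTagUnion`, and `enforce_concatenated` are hypothetical names, and the tuple encoding of forms is an assumption made for the illustration. The point it tries to show is that, if the target form is known to come from a concatenation that included this layout, the layout can be wrapped to match the target without reading or copying its buffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    """Toy leaf layout: a form name and a data buffer."""
    form: str
    buffer: list


@dataclass(frozen=True)
class Unmasked:
    """Option-type wrapper that needs no mask buffer."""
    content: object


@dataclass(frozen=True)
class SingleTagUnion:
    """Union in which every element uses the same alternative."""
    alternatives: tuple
    tag: int
    content: object


def enforce_concatenated(layout, target):
    """Wrap `layout` to match `target` without touching its buffer.

    `target` is a form name, ("option", form), or ("union", (form, ...)).
    """
    if isinstance(target, str):
        if not (isinstance(layout, Leaf) and layout.form == target):
            raise TypeError(f"{target!r} cannot come from this layout")
        return layout
    kind, arg = target
    if kind == "option":
        return Unmasked(enforce_concatenated(layout, arg))
    if kind == "union":
        # Pick the first alternative the layout can be enforced to.
        for tag, alternative in enumerate(arg):
            try:
                content = enforce_concatenated(layout, alternative)
            except TypeError:
                continue
            return SingleTagUnion(arg, tag, content)
        raise TypeError("no union alternative matches this layout")
    raise ValueError(f"unknown form kind: {kind!r}")


ints = Leaf("int64", [1, 2, 3])
as_union = enforce_concatenated(ints, ("union", ("string", "int64")))
as_option = enforce_concatenated(ints, ("option", "int64"))
```

Because the function only has to handle target forms that a concatenation could have produced, every branch is a pure wrapping step, and a target that could not have come from the layout is rejected with an error.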

@agoose77 agoose77 force-pushed the agoose77/feat-enforce-mergeable-form branch from 248444a to 99f61b9 Compare November 30, 2023 13:50

codecov bot commented Nov 30, 2023

Codecov Report

Merging #2860 (6cab4b0) into main (123fa09) will increase coverage by 0.05%.
The diff coverage is 77.88%.

Additional details and impacted files
Files Coverage Δ
src/awkward/contents/bitmaskedarray.py 70.81% <100.00%> (+1.53%) ⬆️
src/awkward/contents/bytemaskedarray.py 89.75% <100.00%> (+0.73%) ⬆️
src/awkward/contents/content.py 74.81% <100.00%> (-0.28%) ⬇️
src/awkward/contents/listoffsetarray.py 82.86% <100.00%> (+0.13%) ⬆️
src/awkward/contents/numpyarray.py 91.18% <100.00%> (ø)
src/awkward/contents/regulararray.py 87.43% <100.00%> (+0.18%) ⬆️
src/awkward/contents/unmaskedarray.py 74.60% <100.00%> (+0.39%) ⬆️
src/awkward/contents/listarray.py 91.78% <83.33%> (+1.06%) ⬆️
src/awkward/contents/unionarray.py 85.22% <75.00%> (-0.25%) ⬇️
src/awkward/contents/indexedarray.py 80.95% <82.35%> (+2.05%) ⬆️
... and 2 more

... and 1 file with indirect coverage changes

Member

@jpivarski jpivarski left a comment

Nice! I noticed all of the typetracer tests that were added to tests/test_0449_merge_many_arrays_in_one_pass.py, too.

We talked about this in our meeting, and the code looks good to me, so this is ready to merge.

src/awkward/contents/content.py (outdated, resolved)
src/awkward/contents/indexedarray.py (resolved)
src/awkward/contents/indexedoptionarray.py (resolved)
Co-authored-by: Jim Pivarski <jpivarski@users.noreply.github.com>
@agoose77 agoose77 merged commit b2cb026 into main Dec 5, 2023
38 checks passed
@agoose77 agoose77 deleted the agoose77/feat-enforce-mergeable-form branch December 5, 2023 19:08