Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: unflatten should accept non-packed counts #2097

Merged
merged 5 commits into from
Jan 10, 2023
Merged

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Jan 9, 2023

Comment on lines +904 to +906
numpy.array_equal(left.starts, right.starts)
and numpy.array_equal(left.stops, right.stops)
and visitor(left.content, right.content)
Copy link
Collaborator Author

@agoose77 agoose77 Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd thought that offsets was the shared interface (and it would not work well for ListArray), but of course that's nonsense; it's starts/stops. This change to the test helper fixes it.

Comment on lines +87 to +89
layout = ak.operations.to_layout(
array, allow_record=False, allow_other=False
).to_packed()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately we need to know that at any depth, the counts computed from our externally-visible structure aligns with the current content. Therefore, we need most of the guarantees of to_packed(). This could be done in the recursion if we wanted better performance for the non axis=-1 cases, because we don't need the leaf to be contiguous, or indeed care about any nodes after the unflatten depth.

Comment on lines +96 to +98
counts = ak.operations.to_layout(
counts, allow_record=False, allow_other=False
).to_packed()
Copy link
Collaborator Author

@agoose77 agoose77 Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

counts should be a trivial layout, so this is reasonable to pack (we don't need contiguousness, but we do need a non-indexed layout).

counts = ak.operations.to_layout(
counts, allow_record=False, allow_other=False
).to_packed()
if counts.is_option and (counts.content.is_numpy or counts.content.is_unknown):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpivarski's fix

Comment on lines +101 to +103
counts = backend.nplike.to_rectilinear(
ak.operations.fill_none(counts, 0, axis=-1, highlevel=False)
)
Copy link
Collaborator Author

@agoose77 agoose77 Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replaces the use of to_numpy, meaning we won't incur a needless device transfer under CUDA (IIRC).

Comment on lines +120 to +122
current_offsets = backend.index_nplike.empty(len(counts) + 1, np.int64)
current_offsets[0] = 0
backend.index_nplike.cumsum(counts, out=current_offsets[1:])
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use nonlocal for readability (nonlocal appears later)

return layout

out = transform(layout, depth=1, axis=axis)
out = ak._do.recursively_apply(apply, layout)
Copy link
Collaborator Author

@agoose77 agoose77 Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix recursion. In v1, we had a transform_child_layouts that basically does what continuation does now (apart from the fact that it accepted a new transform function). Here, we restore the recursive behaviour

@codecov
Copy link

codecov bot commented Jan 10, 2023

Codecov Report

Merging #2097 (1500d17) into main (42404f2) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/_util.py 81.64% <100.00%> (ø)
src/awkward/operations/ak_unflatten.py 95.31% <100.00%> (+1.28%) ⬆️

@agoose77 agoose77 temporarily deployed to docs-preview January 10, 2023 00:10 — with GitHub Actions Inactive
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting: using to_rectilinear instead of project. Since the counts needs to be rectilinear (and not necessarily NumPy), that makes sense.

Nice work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ak.unflatten error message doesn't indicate axis out of boudns
2 participants