feat: support record reducer overloads #2458
This seems simple enough; build a list over the However, doing this with existing high-level primitives may not be trivial, e.g. an implementation of

import awkward as ak

def _my_record_max(array):
    key = array["0"]
    j = ak.from_regular(
        ak.argmax(key, keepdims=True, mask_identity=False, axis=-1)
    )
    return array[j][..., 0]

ak.behavior[(ak.max, "my_record")] = _my_record_max

This will fail if we have an empty sublist and we don't mask the reducer. One of the constraints that we are operating under is the desire not to have third party code using our The above can be extended to support empty sublists:

def _option_is_trivial(array):
    any_is_none = ak.any(ak.is_none(array, axis=0), axis=None)
    return not (ak.typetracer.is_unknown_scalar(any_is_none) or any_is_none)

def _my_record_max(array):
    key = array["0"]
    j = ak.from_regular(
        ak.argmax(key, keepdims=True, mask_identity=True, axis=-1)
    )
    out = ak.to_layout(
        array[j][..., 0]
    )
    assert out.is_option
    if _option_is_trivial(out):
        out = out.to_IndexedOptionArray64()
        return ak.contents.IndexedArray(
            out.index,
            out.content
        )
    else:
        # identity_element is a placeholder for the reducer's identity record; it is not defined here
        return ak.fill_none(out, identity_element)

but where there are any empty sublists, it will involve a copy of I think the proper API therefore includes the

def _option_is_trivial(array):
    any_is_none = ak.any(ak.is_none(array, axis=0), axis=None)
    return not (ak.typetracer.is_unknown_scalar(any_is_none) or any_is_none)

def _my_record_max(array: ak.Array, mask_identity: bool):
    key = array["0"]
    j = ak.from_regular(ak.argmax(key, keepdims=True, mask_identity=True, axis=-1))
    out = ak.to_layout(
        array[j][..., 0]
    )
    if mask_identity:
        return out
    # Avoid content `_carry` for options with empty masks
    elif _option_is_trivial(out):
        out = out.to_IndexedOptionArray64()
        return ak.contents.IndexedArray(
            out.index,
            out.content
        )
    # _carry the content (and fill missing values)
    else:
        return ak.fill_none(out, identity_element)

Meanwhile,

def _my_record_argmax(array: ak.Array, mask_identity: bool):
    return ak.argmax(array["0"], keepdims=False, mask_identity=False, axis=-1)
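The behaviour discussed above can be modelled without Awkward Array. The following is a minimal plain-Python sketch, not the implementation in this PR: `record_max`, its `identity` parameter, and the tuple-based records are hypothetical names chosen for illustration. It shows why an overload needs to know about `mask_identity`: an empty sublist has no maximum, so the result is either a missing value or an identity element.

```python
from typing import Optional, Sequence, Tuple

Record = Tuple[float, str]  # hypothetical record: (key, payload)


def record_max(
    sublists: Sequence[Sequence[Record]],
    mask_identity: bool,
    identity: Record = (float("-inf"), ""),  # assumed identity for this sketch
) -> list:
    """Per-sublist maximum of records, ordered by their first field."""
    out = []
    for sublist in sublists:
        if len(sublist) > 0:
            # Plays the role of argmax over the key, then selecting that record
            j = max(range(len(sublist)), key=lambda i: sublist[i][0])
            out.append(sublist[j])
        elif mask_identity:
            out.append(None)  # empty sublist becomes a missing value
        else:
            out.append(identity)  # empty sublist becomes the identity record
    return out


data = [[(1.0, "a"), (3.0, "b")], [], [(2.0, "c")]]
masked = record_max(data, mask_identity=True)
filled = record_max(data, mask_identity=False)
```

In this model the two branches for empty sublists correspond to returning the option-type result directly versus filling it with an identity element.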
This is a good argument for the API requiring users to write functions that include a `mask_identity` argument. (Since these are functions users write, it's much harder to add required arguments later!) I don't suppose it needs a `keepdims` argument, since keeping a dimension should always be done in a regular way; behavior developers shouldn't have control over that.
In principle, the behavior developers might want to raise an exception on `mask_identity=False`.
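As a rough illustration of that last point, here is a plain-Python sketch (it does not use Awkward Array, and `strict_record_min` is a hypothetical name) of an overload that refuses to invent an identity record and therefore raises when `mask_identity=False`:

```python
def strict_record_min(sublists, mask_identity):
    """Per-sublist minimum of (key, payload) records, keyed on the first field.

    Hypothetical overload: there is no natural identity record here, so the
    unmasked case is rejected instead of being filled with a made-up value.
    """
    if not mask_identity:
        raise ValueError(
            "this record reducer has no identity; call it with mask_identity=True"
        )
    return [
        min(sublist, key=lambda record: record[0]) if len(sublist) > 0 else None
        for sublist in sublists
    ]


result = strict_record_min([[(2, "a"), (1, "b")], []], mask_identity=True)
```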
Although you have unit tests for the sum-of-vectors case, it could be good to test-implement it in Vector (here) to be sure that the API is what we need. (Notice all of the `__numba_typer__`/`__numba_lower__` behaviors defined in this file: the API in Awkward and its use in Vector were developed in tandem.)
I agree. In truth, I'm currently thinking about how vector should support this - the dispatch machinery is fairly long. I suspect it will simply require writing the same code!
@jpivarski inspired by the existing (and more complex) binary addition, I took a stab at sum here. For vector, I can't initially think of any reducers that we'd want to implement that do not have identities. Can you think of any? In the event that there are none, the idea here is that we can re-use the sum implementation for numpy-backed vectors too (by
My implementation here implicitly assumes that records are atoms, i.e. one cannot reduce deeper than the record itself. This was motivated by
In Vector (and elsewhere), the only reducer I can think of wanting to override at all is Two reducers that can be implemented for Vectors:
That's right. We can enshrine that as a rule.
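To make the "records are atoms" rule concrete for the sum-of-vectors case mentioned in this thread, here is a plain-Python sketch. It does not use Awkward Array or Vector; `Vec2` and `sum_vectors` are hypothetical names. The reduction stops at the record: each sublist of vectors collapses to a single vector, and the zero vector serves as the identity for empty sublists.

```python
from typing import NamedTuple


class Vec2(NamedTuple):
    """Hypothetical 2D vector record; treated as an atom by the reducer."""
    x: float
    y: float


def sum_vectors(sublists, mask_identity):
    """Sum each sublist of Vec2 records into one Vec2 (never into the fields)."""
    out = []
    for sublist in sublists:
        if len(sublist) == 0 and mask_identity:
            out.append(None)  # masked: an empty sublist has no sum
        else:
            # For an empty sublist this yields Vec2(0, 0), the identity of addition
            out.append(Vec2(sum(v.x for v in sublist), sum(v.y for v in sublist)))
    return out


vectors = [[Vec2(1.0, 2.0), Vec2(3.0, 4.0)], []]
unmasked = sum_vectors(vectors, mask_identity=False)
masked = sum_vectors(vectors, mask_identity=True)
```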
def _apply_record_reducer(
    reducer, layout: RecordArray, mask: bool, offsets: ak.index.Index, behavior
) -> Content:
    # Build a 1D list over these contents
    array = wrap_layout(ak.contents.ListOffsetArray(offsets, layout), behavior=behavior)
    # Perform the reduction
    return ak.to_layout(reducer(array, mask))
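The two steps in this function (wrap the flat records in a list using offsets, then hand that list to the user's reducer) can be sketched in plain Python. This is only a model under simplifying assumptions: offsets are a list of integers, records are arbitrary Python objects, and `apply_record_reducer` and `count_records` are hypothetical names rather than Awkward Array functions.

```python
def apply_record_reducer(reducer, records, mask, offsets):
    """Model of the dispatch: partition flat records by offsets, then reduce."""
    # Build a 1D list of sublists over the flat records;
    # offsets[i]:offsets[i + 1] delimits sublist i
    sublists = [
        records[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])
    ]
    # Perform the reduction with the user-supplied function
    return reducer(sublists, mask)


def count_records(sublists, mask_identity):
    """Trivial hypothetical reducer: number of records in each sublist."""
    return [len(sublist) for sublist in sublists]


counts = apply_record_reducer(
    count_records, ["a", "b", "c"], mask=False, offsets=[0, 2, 2, 3]
)
```

Here the repeated offset `2, 2` produces an empty sublist, which is the case that motivates passing the mask flag through to the reducer.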
I don't yet know where to put this. It feels a bit odd creating a high-level array inside code written in a `Content` class definition, so I've moved it to a top-level function, much like the other reducer overloads are defined.
Fixes #1423
`ak.behavior` documentation to use MyST Markdown