
feat: support record reducer overloads #2458

Merged: 12 commits into main from agoose77/feat-record-reducer, May 19, 2023
Conversation

@agoose77 (Collaborator) commented May 12, 2023

Fixes #1423

  • Implement non-positional reducer support
  • Implement positional reducer support
  • Changed the ak.behavior documentation to use MyST Markdown

@agoose77 force-pushed the agoose77/feat-record-reducer branch from 885d882 to 2f2f9f7 on May 12, 2023 13:57
@agoose77 (Collaborator, Author) commented May 12, 2023

This seems simple enough: build a list over the RecordArray, which the custom reducer then reduces along axis=-1. However, the open question is how to handle masking/identity. The simplest interface is to always require the custom reducer to produce an identity element for empty sublists, which the dispatcher can then mask out. We currently do this for our normal reduction pathways.

However, doing this with existing high-level primitives may not be trivial, e.g. an implementation of ak.max for a record structure that looks at the first slot:

import awkward as ak


def _my_record_max(array):
    # Reduce according to the record's first slot (tuple field "0"):
    # the position of the maximum key selects the whole record.
    key = array["0"]
    j = ak.from_regular(
        ak.argmax(key, keepdims=True, mask_identity=False, axis=-1)
    )
    return array[j][..., 0]


ak.behavior[(ak.max, "my_record")] = _my_record_max

This will fail if we have an empty sublist and we don't mask the reducer.

One of the constraints that we are operating under is the desire not to have third party code using our nplike abstraction. Whilst for NumPy, CuPy, and JAX that should be possible (just type-check the buffers), typetracer would prove more fiddly (incidentally, this does add a vote in favour of making our typetracer nplike L2).

The above can be extended to support empty sublists:

def _option_is_trivial(array):
    # True only if we know concretely that no entry is missing
    any_is_none = ak.any(ak.is_none(array, axis=0), axis=None)
    return not (ak.typetracer.is_unknown_scalar(any_is_none) or any_is_none)


def _my_record_max(array):
    key = array["0"]
    j = ak.from_regular(
        ak.argmax(key, keepdims=True, mask_identity=True, axis=-1)
    )
    out = ak.to_layout(
        array[j][..., 0]
    )
    assert out.is_option

    if _option_is_trivial(out):
        # Drop the (all-valid) option type without copying the content
        out = out.to_IndexedOptionArray64()
        return ak.contents.IndexedArray(
            out.index,
            out.content
        )
    else:
        # identity_element is a placeholder for the reducer's identity value
        return ak.fill_none(out, identity_element)

but where there are any empty sublists, it will involve a copy of out in ak.fill_none even if the output is subsequently masked.

I think the proper API therefore includes the mask_identity flag. Here's an attempt to write a simple max that doesn't copy the RecordArray unless it has no choice, without using nplike.

def _option_is_trivial(array):
    any_is_none = ak.any(ak.is_none(array, axis=0), axis=None)
    return not (ak.typetracer.is_unknown_scalar(any_is_none) or any_is_none)


def _my_record_max(array: ak.Array, mask_identity: bool):
    key = array["0"]
    j = ak.from_regular(ak.argmax(key, keepdims=True, mask_identity=True, axis=-1))
    out = ak.to_layout(
        array[j][..., 0]
    )

    if mask_identity:
        return out
    # Avoid content `_carry` for options with empty masks
    elif _option_is_trivial(out):
        out = out.to_IndexedOptionArray64()
        return ak.contents.IndexedArray(
            out.index,
            out.content
        )
    # _carry the content (and fill missing values)
    else:
        return ak.fill_none(out, identity_element)

Meanwhile, argmax is trivial!

def _my_record_argmax(array: ak.Array, mask_identity: bool):
    return ak.argmax(array["0"], keepdims=False, mask_identity=False, axis=-1)
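
For completeness, a minimal sketch of how these overloads might be registered and exercised once this lands (the record name "my_record", the ragged input, and the placeholder identity_element in _my_record_max are illustrative only):

import awkward as ak

# Register the sketched overloads for records named "my_record"; the
# behavior key pairs the reducer with the record name.
ak.behavior[(ak.max, "my_record")] = _my_record_max
ak.behavior[(ak.argmax, "my_record")] = _my_record_argmax

# Zipping a sequence of arrays builds tuple-like records (fields "0", "1");
# naming them "my_record" makes the behavior apply.
array = ak.zip(
    [
        ak.Array([[1, 3, 2], [], [5]]),
        ak.Array([[10, 30, 20], [], [50]]),
    ],
    with_name="my_record",
)

# With this PR, reducing over the inner lists dispatches to the overloads.
result = ak.max(array, axis=-1)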

@agoose77 changed the title from "wip: support record reducer overloads" to "fea: support record reducer overloads" on May 14, 2023
@agoose77 changed the title from "fea: support record reducer overloads" to "feat: support record reducer overloads" on May 14, 2023
codecov bot commented May 14, 2023

Codecov Report

Merging #2458 (872de48) into main (4468cf5) will decrease coverage by 0.01%.
The diff coverage is 75.86%.

Additional details and impacted files:

Impacted Files                             Coverage Δ
src/awkward/operations/ak_with_name.py     100.00% <ø> (ø)
src/awkward/_do.py                          83.22% <50.00%> (ø)
src/awkward/contents/recordarray.py         84.68% <77.77%> (-0.44%) ⬇️

@agoose77 marked this pull request as ready for review May 15, 2023 08:49
@agoose77 requested a review from jpivarski May 15, 2023 08:49
@jpivarski (Member) left a comment

This is a good argument for the API requiring users to write functions that include a mask_identity argument. (Since these are functions users write, it's much harder to add required arguments later!) I don't suppose it needs a keepdims argument, since keeping a dimension should always be done in a regular way—behavior developers shouldn't have control over that.

In principle, the behavior developers might want to raise an exception on mask_identity=False.
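
For example, such a reducer could simply refuse (a rough sketch, not something this PR prescribes; the function name is made up):

def _my_identityless_reducer(array, mask_identity):
    # Hypothetical: a record reducer with no meaningful identity can refuse
    # to produce unmasked output for empty sublists.
    if not mask_identity:
        raise ValueError("this reducer has no identity; use mask_identity=True")
    ...  # the actual reduction would go here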

Although you have unit tests for the sum-of-vectors case, it could be good to test-implement it in Vector (here) to be sure that the API is what we need. (Notice all of the __numba_typer__/__numba_lower__ behaviors defined in this file: the API in Awkward and its use in Vector were developed in tandem.)

@agoose77 (Collaborator, Author) commented

I agree. In truth, I'm currently thinking about how vector should support this; the dispatch machinery is fairly long. I suspect it will simply require writing the same code!

@agoose77 (Collaborator, Author) commented May 15, 2023

@jpivarski: inspired by the existing (and more complex) binary addition, I took a stab at sum here.

For vector, I can't initially think of any reducers that we'd want to implement that do not have identities. Can you think of any?

In the event that there are none, the idea here is that we can re-use the sum implementation for numpy-backed vectors too (using moveaxis, reduce, moveaxis to prepare the array for reduction).
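
Roughly, I mean something like the following NumPy pattern (just a sketch; the helper name and the last-axis reducer are stand-ins, not vector's actual API):

import numpy as np


def _reduce_along_axis(array, axis, reduce_last):
    # Move the requested axis to the end so that a last-axis-only
    # implementation can be reused, then move the (kept) axis back.
    moved = np.moveaxis(array, axis, -1)
    reduced = reduce_last(moved)  # assumed to keep its reduced dimension
    return np.moveaxis(reduced, -1, axis)


# e.g. reusing a last-axis sum that keeps its reduced dimension
out = _reduce_along_axis(
    np.arange(24.0).reshape(2, 3, 4),
    axis=1,
    reduce_last=lambda a: a.sum(axis=-1, keepdims=True),
)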

@agoose77 (Collaborator, Author) commented

My implementation here implicitly assumes that records are atoms, i.e. one cannot reduce deeper than the record itself. This was motivated by RecordArray.purelist_depth, which returns 1 and justifies not passing axis to the custom reducer. I believe this holds, but weigh in if you think that's a misstep.
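
For reference, a small check of that property (assuming a plain record of numeric fields):

import awkward as ak

records = ak.Array([{"x": 1.0, "y": 2.0}, {"x": 3.0, "y": 4.0}])
# The outermost node is a RecordArray; per the above it reports depth 1,
# i.e. records are atoms and a reducer never descends into their fields.
print(records.layout.purelist_depth)  # expected: 1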

@jpivarski (Member) commented May 15, 2023

For vector, I can't initially think of any reducers that we'd want to implement that do not have identities. Can you think of any?

In Vector (and elsewhere), the only reducer I can think of wanting to override at all is sum. The any/all pair applies to booleans, which are not geometrical vectors, the min/max pair applies to objects with an ordering, and $n>1$-dimensional vectors do not have an ordering; same for argmin/argmax. I can't even see a reason to have prod (scalar product is not reducible with more than two arguments; vector product only has a definition for 3D vectors; and neither of these should be confused with *, which only combines a scalar with a vector).

Two reducers that can be implemented for Vectors: count and count_nonzero (where "nonzero" means "nonzero magnitude"). They have identities, though.
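
Such an overload might look roughly like this (a hypothetical sketch; the 2D field names "x"/"y" and the record name "Vector2D" are placeholders, not Vector's actual API):

import awkward as ak


def _vector2d_count_nonzero(array, mask_identity):
    # "Nonzero" meaning nonzero magnitude: count records whose squared
    # magnitude is nonzero along the innermost list axis.
    magnitude2 = array["x"] ** 2 + array["y"] ** 2
    return ak.count_nonzero(magnitude2, axis=-1, mask_identity=mask_identity)


ak.behavior[(ak.count_nonzero, "Vector2D")] = _vector2d_count_nonzero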

My implementation here implicitly assumes that records are atoms, i.e. one cannot reduce deeper than the record itself.

That's right. We can enshrine that as a rule.

@agoose77 force-pushed the agoose77/feat-record-reducer branch from 1914bdc to b454555 on May 16, 2023 20:36
Comment on lines +38 to +44
def _apply_record_reducer(
reducer, layout: RecordArray, mask: bool, offsets: ak.index.Index, behavior
) -> Content:
# Build a 1D list over these contents
array = wrap_layout(ak.contents.ListOffsetArray(offsets, layout), behavior=behavior)
# Perform the reduction
return ak.to_layout(reducer(array, mask))
@agoose77 (Collaborator, Author) commented:

I don't yet know where to put this. It feels a bit odd creating a highlevel array inside code written in a Content class definition, so I've moved it to a top-level function, much like the other reducer overloads are defined.

@agoose77 enabled auto-merge (squash) May 19, 2023 11:55
@agoose77 merged commit e674dcb into main May 19, 2023
@agoose77 deleted the agoose77/feat-record-reducer branch May 19, 2023 12:11
Merging this pull request may close the linked issue: Allow reducers to be overridden in v2.