Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reducers should preserve parameters #2544

Open
agoose77 opened this issue Jun 26, 2023 · 4 comments
Open

Reducers should preserve parameters #2544

agoose77 opened this issue Jun 26, 2023 · 4 comments
Labels
bug The problem described is something that must be fixed one-hour-fix It can probably be done in an hour

Comments

@agoose77
Copy link
Collaborator

Version of Awkward Array

main

Description and code to reproduce

ak.min(array, axis=-1, keepdims=True) should preserve the parameters of the original array.

@agoose77 agoose77 added bug (unverified) The problem described would be a bug, but needs to be triaged bug The problem described is something that must be fixed one-hour-fix It can probably be done in an hour and removed bug (unverified) The problem described would be a bug, but needs to be triaged labels Jun 26, 2023
@jpivarski
Copy link
Member

In the following, the lists have parameter hello: "there",

>>> original = ak.with_parameter([[1], [], [2, 3]], "hello", "there")
>>> original
<Array [[1], [], [2, 3]] type='3 * [var * int64, parameters={"hello": "ther...'>
>>> ak.sum(original, axis=-1, keepdims=True)
<Array [[1], [0], [5]] type='3 * 1 * int64'>

but the output array has lists without parameters.

In fact, the input lists were variable-length and the output lists are fixed-size with a length of 1. That's already a different type. When we keep parameters, it is because we want to keep the type unchanged, but that's already not what's happening in the above. How disruptive of a backward-incompatible change would it be to preserve the is_regular of the lists when keeping a list-type with keepdims=True?


Separately, the type (and therefore the parameters) of the reduced data should be maintained for some reducers. (Above the horizontal line, parameter-preservation is a question that is independent of which reducer we're talking about. In this section, it depends on which reducer we're talking about.)

In the following, the numbers have parameter hello: "there",

>>> original = ak.unflatten(ak.with_parameter([1, 2, 3], "hello", "there"), [1, 0, 2])
>>> original
<Array [[1], [], [2, 3]] type='3 * var * int64[parameters={"hello": "there"}]'>
>>> ak.sum(original, axis=-1)
<Array [1, 0, 5] type='3 * int64'>
>>> ak.sum(original, axis=-1, keepdims=True)
<Array [[1], [0], [5]] type='3 * 1 * int64'>

but the output array has numbers without parameters. It seems that sum ought to be a List[T] → T function, and thus preserve whatever parameters T has. This isn't the case for all reducers:

  • ak.all perhaps ought to keep any parameters on booleans, but shouldn't keep parameters if it converts numbers into booleans (False for anything equal to zero; True otherwise).
  • ak.any same argument.
  • ak.sum, ak.nansum ought to keep parameters.
  • ak.prod, ak.nanprod ought to keep parameters.
  • ak.max, ak.nanmax should keep parameters, and the case is stronger here than it is with the sum and product reducers.
  • ak.min, ak.nanmin same argument.
  • ak.argmax, ak.nanargmax absolutely should not keep parameters, since it returns an index type that is unrelated to whatever input it is given, List[T] → int64.
  • ak.argmin, ak.nanargmin same argument.
  • ak.count same argument. The count is a new integer type.
  • ak.count_nonzero same argument.

The same thing could be said of non-reducer statistics that reduce dimension:

  • ak.moment perhaps ought to keep parameters, for the same reasons that ak.sum does.
  • ak.mean, ak.nanmean same argument.
  • ak.var, ak.nanvar same argument.
  • ak.std, ak.nanstd same argument.
  • ak.covar has two input arrays and multiplies their elements. That should not preserve type.
  • ak.corr also divides by them and returns a unitless number. That should not preserve type.
  • ak.linear_fit same argument.
  • ak.ptp should keep parameters for the same reasons that ak.min and ak.max do.
  • ak.softmax should not, for the same reasons as ak.corr.

It quickly gets to be a complicated landscape. We want to come up with a simple rule for this, so that it's easy to predict what's going to happen (and whether some observed behavior is a bug or not).

The simplest rule would be that they should never preserve their parameters, which is the status quo. Maybe this isn't what one wants, but parameters can be reintroduced, and at least it's easy to know what's going to happen.

The next-simplest rule would be that only ak.max, ak.nanmax, ak.min, ak.nanmin and maybe ak.ptp preserve parameters. By asking for the maximum or minimum such-and-such, you expect to get a such-and-such back. (ak.ptp might get this for free, depending on how it is implemented.)

The next-to-next-simplest rule would be to also include ak.sum, ak.nansum, ak.prod, ak.nanprod, and maybe ak.moment, ak.mean, ak.nanmean, ak.var, ak.nanvar, ak.std, and ak.nanstd (which wouldn't get it for free; they'd have to keep the division by counts from dropping the parameters).

The "next-simplest rule" above looks the most reasonable to me, until we start talking about units. If we're propagating units, using Pint on all of the ufuncs and implementing it ourselves for the reducers (because we always have to implement special features on reducers; there's no avoiding it), then we'd have to do special things for sums versus products, and ak.covar, ak.linear_fit, and ak.softmax get very interesting.

@jpivarski jpivarski changed the title Reducers should preserve parameters` Reducers should preserve parameters Jun 26, 2023
@agoose77
Copy link
Collaborator Author

agoose77 commented Jun 26, 2023

When we keep parameters, it is because we want to keep the type unchanged,

I've not been operating under that impression hitherto. I've applied this rule for nominal parameters like __array__, but not for others.

How disruptive of a backward-incompatible change would it be to preserve the is_regular of the lists when keeping a list-type with keepdims=True?

I think we decided a policy not to do this (see #1943 (comment)).

It quickly gets to be a complicated landscape. We want to come up with a simple rule for this, so that it's easy to predict what's going to happen (and whether some observed behavior is a bug or not).

Yes 😞. Depending upon your view for non-nominal parameters, it could be that these should always be preserved, whilst nominal parameters are governed by the rules you outline above. My rules would be:

  1. Preserve non-nominal parameters in all cases
  2. Preserve nominal parameters for sum, min, max, prod, ptp, moment, mean, var, std, and their nan variants.

Unit handling does get complicated, but we already need to consider units for reducers in #2545. I'm hoping that the unit conversion will happen first, and thereafter units are just another parameter (that we don't want to lose!)

@jpivarski
Copy link
Member

I think we decided a policy not to do this (see #1943 (comment)).

Oh, right.


I'm neutral on whether __units__ is a special parameter, following the same propagation rules as __record__, or whether it's an ordinary parameter, following the same propagation rules as gobbletygook. As long as it's one or the other.

In the implementation, you might find that you need to make it special.


Out of the functions in your list, prod (and nanprod) is actually incompatible with units.

>>> array = ak.unflatten(ak.with_parameter([1, 2, 3, 4, 5], "__units__", "cm"), [3, 0, 2])
>>> array
<Array [[1, 2, 3], [], [4, 5]] type='3 * var * int64[parameters={"__units__...'>

>>> ak.prod(array, axis=-1)
<Array [6, 1, 20] type='3 * int64'>

Technically, the units of the first element of the output should be $\mbox{cm}^2$, the second element should be unitless, and the third element should be $\mbox{cm}^3$. But units have to be the same for a whole array, so that's out of the question.

I don't think any of the others are incompatible with units, and the statistical functions implemented by reducers may be able to inherit the correct unit propagation automatically once the reducers are done.

@agoose77
Copy link
Collaborator Author

Out of the functions in your list, prod (and nanprod) is actually incompatible with units.

Yep, good spot (I was thinking about the group properties of the numerics). I wonder what the mathematical analogue is for the units. pint is able to decide upon a unit for np.prod upon regular arrays, but we have the problem of ragged arrays (as you point out).

@jpivarski jpivarski added this to Unprioritized in Finalization Jan 19, 2024
@jpivarski jpivarski moved this from Unprioritized to P2 in Finalization Jan 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The problem described is something that must be fixed one-hour-fix It can probably be done in an hour
Projects
Development

No branches or pull requests

2 participants