Quantity.prod() result unit is incorrect with axis or where argument #867

jthielen · 2019-09-02T20:57:18Z

Right now, the .prod() method assumes the full size of the input is collapsed in the result. This gives incorrect results when the axis or where arguments are supplied, as seen below:

import pint

ureg = pint.UnitRegistry()

q = [[1, 2], [3, 4]] * ureg.m

print(q.prod())
print(q.prod(axis=0))
print(q.prod(where=[True, False]))

24 meter ** 4
[3 8] meter ** 4
3 meter ** 4

The unit on the first result, where the full array is collapsed to a scalar, is correct, but the other two results should have meter ** 2 as the unit.
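As a quick check of the expected exponents, one can count how many elements feed each output value. The sketch below is only an illustration with plain NumPy: `factor_count` is a hypothetical helper (not part of pint), and it assumes NumPy 1.17 or later, where `np.sum` accepts a `where` argument.

```python
import numpy as np


def factor_count(shape, axis=None, where=True):
    """Count how many input elements are multiplied into each output element.

    Hypothetical helper for illustration only; not part of pint.
    """
    ones = np.ones(shape, dtype=int)
    # Summing ones with the same axis/where arguments that prod would receive
    # yields the number of factors behind every output element.
    return np.sum(ones, axis=axis, where=np.asarray(where, dtype=bool))


shape = (2, 2)  # shape of q in the example above
print(factor_count(shape))                       # full reduction
print(factor_count(shape, axis=0))               # reduction along axis 0
print(factor_count(shape, where=[True, False]))  # masked full reduction
```

This prints 4, [2 2] and 2, which matches meter ** 4 for the first call and meter ** 2 for the other two.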

I don't have a good fix for this yet, since this is the same problem mentioned in #764 (comment). If anyone has any suggestions for a performant way to determine how many unit multiplications occur given both axis and where arguments, please do let me know! Otherwise, I'll try to figure something out and get a PR in for this or include it alongside a prod implementation for __array_function__.



965: Raise ValueError on ambiguous boolean conversion and add pending NumPy functions/ufuncs from issue tracker r=hgrecco a=jthielen

There were a few follow-up issues from #905 that were missing fairly simple implementations, so this PR takes care of them in preparation for the upcoming release. Part of this was resolving #866 by raising a `ValueError` due to ambiguity when casting Quantities with offset units to boolean, which is technically a breaking change (although I would argue that the previous behavior was incorrect as discussed in #866).

Also, @keewis, this implements `np.any` and `np.all`, which are two of the functions you had mentioned were needed by xarray. I think now the only one still missing is `np.prod` (#867) which will likely have to wait since there hasn't been a good solution yet.

- [x] Closes #419; Closes #470; Closes #807; Closes #866
- [x] Executed ``black -t py36 . && isort -rc . && flake8`` with no errors
- [x] The change is fully covered by automated unit tests
- [x] Documented in docs/ as appropriate
- [x] Added an entry to the CHANGES file

Co-authored-by: Jon Thielen <github@jont.cc>

keewis · 2020-04-16T10:59:46Z

couldn't we use the result of numpy to compute the units? And for where, this could be exponent = np.sum(where).

I'm not sure how to compute the units if both axis and where are specified, though. Is it even possible to (always) compute a common unit in that case, or should we drop the units / declare them as dimensionless?

I'm thinking of something like

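import numpy as np
import pint

ureg = pint.UnitRegistry()


def prod(quantity, axis=None, **kwargs):
    def compute_units(unit, in_size, out_size, axis, where):
        # axis and where together: no general rule yet, so fall back to dimensionless
        if axis is not None and where is not None:
            return ureg.dimensionless

        if where is not None:
            # count the selected elements after broadcasting where to the input shape
            exponent = int(np.sum(np.broadcast_to(where, quantity.shape)))
        else:
            # each output element is the product of in_size // out_size inputs
            exponent = in_size // out_size
        return unit ** exponent

    # pass axis by keyword only, so it cannot clash with positional arguments
    result = np.prod(quantity.magnitude, axis=axis, **kwargs)

    units = compute_units(
        quantity.units,
        quantity.size,
        result.size,
        axis,
        kwargs.get("where"),
    )

    return ureg.Quantity(result, units)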

(but the default value of where seems to be something different from None, so we might need to use that instead)
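One detail worth keeping in mind with the `np.sum(where)` idea: the mask is broadcast against the array before the reduction, so counting the entries of the mask as passed can undercount the factors. The snippet below is a small illustration using plain NumPy arrays in place of quantities; the variable names are made up for this example.

```python
import numpy as np

magnitude = np.array([[1, 2], [3, 4]])
where = np.array([True, False])

# Counting the mask as given sees a single True entry ...
naive_count = int(np.sum(where))

# ... but the mask is broadcast to the array's shape, selecting a whole column.
broadcast_count = int(np.sum(np.broadcast_to(where, magnitude.shape)))

product = np.prod(magnitude, where=where)

print(naive_count, broadcast_count, product)
```

Here the product is 1 * 3 = 3, built from two factors, so the broadcast count (2) is the one that matches the number of unit multiplications.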

jthielen · 2020-04-20T19:58:05Z

@keewis Good point that in the either/or case of axis and where, we can use NumPy functions to compute the result. It's probably better to move forward with that as a "good enough but not perfect solution", and only operate on dimensionless when both the axis and where arguments are provided.

Cases that I had in mind like the following are well-defined, but couldn't come up with a good way to handle in general:

np.prod(np.arange(6).reshape((2, 3)) * ureg.meter, axis=1, where=[True, False, True])

[ 0 15] meter ** 2

keewis · 2020-04-20T20:30:31Z

I guess we could try to filter the cases where this works. I've been trying something like

a = np.linspace(1, 2, 20).reshape(4, 5) * ureg.m
where = a < 1.3 * ureg.m

for which we obviously cannot compute a common unit.

I think with a reshaped where (i.e. new_where.size == where.size and new_where.ndim == a.ndim) and

axis = 1
exponent = np.sum(where, axis=axis)

we could use dimensionless or raise an exception if exponent is not a scalar and the non-zero elements have different values.

In my example (threshold 1.3), exponent would be

[5 1 0 0]

while with a threshold of 1.5 this would be

[5 5 0 0]

So the former would be dimensionless and the latter meter ** 5
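The filtering idea can be sketched roughly as follows. This is only an illustration on bare magnitudes, not the code that ended up in pint: `common_exponent` is a hypothetical name, returning `None` stands in for "no common unit", and treating an all-False mask as exponent 0 is an assumption made for this sketch.

```python
import numpy as np


def common_exponent(where, shape, axis):
    """Return the single unit exponent shared by all outputs, or None.

    Hypothetical helper for illustration only; not part of pint.
    """
    # Number of selected factors behind each output element.
    counts = np.atleast_1d(np.sum(np.broadcast_to(where, shape), axis=axis))
    # Outputs with no selected factors are ignored when looking for a common value.
    nonzero = np.unique(counts[counts != 0])
    if nonzero.size == 0:
        return 0  # assumption: nothing selected, so no unit is multiplied in
    if nonzero.size == 1:
        return int(nonzero[0])
    return None  # different exponents per output: no common unit


magnitudes = np.linspace(1, 2, 20).reshape(4, 5)

print(common_exponent(magnitudes < 1.3, magnitudes.shape, axis=1))
print(common_exponent(magnitudes < 1.5, magnitudes.shape, axis=1))
```

With the threshold of 1.3 the counts are [5 1 0 0], so the helper returns None; with 1.5 they are [5 5 0 0] and it returns 5.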

jthielen · 2020-04-21T17:33:57Z

@keewis That almost all sounds great. I would disagree though that the former example should return units of dimensionless...I think it should error instead. It would be rather surprising (to me at least) to have a function multiplying things with units of meters, only to get dimensionless out. Rather, having a warning that says something to the effect of "A common unit could not be computed from non-dimensionless input. Try again with dimensionless input." would make more sense to me.

keewis · 2020-04-21T17:51:32Z

I wasn't sure if returning dimensionless or raising would be better. The implementation of np.cumprod first tries to convert to dimensionless, so doing the same in prod would be consistent.

jthielen · 2020-04-21T17:57:19Z

> I wasn't sure if returning dimensionless or raising would be better. The implementation of np.cumprod first tries to convert to dimensionless, so doing the same in prod would be consistent.

Indeed, it tries to convert to dimensionless, but unless I terribly messed something up in the implementation, that will raise a DimensionalityError when you have something like meters, not silently change units. It should only have an effect for things like percent and radians.

keewis · 2020-04-21T18:06:36Z

what I wanted to suggest was that we replicate that behavior (sorry if that was ambiguous): if we can't determine a common unit and the input unit is incompatible to dimensionless, raise a DimensionalityError

Edit: maybe we should do that before the (potentially expensive) computation of the result

jthielen · 2020-04-21T18:29:15Z

> what I wanted to suggest was that we replicate that behavior (sorry if that was ambiguous): if we can't determine a common unit and the input unit is incompatible to dimensionless, raise a DimensionalityError
>
> Edit: maybe we should do that before the (potentially expensive) computation of the result

Ah, got it, sorry for the confusion. Doing that check ahead of time does seem like a good idea.

jthielen · 2020-04-23T21:57:48Z

I think this issue should be reopened, since #1087 didn't really resolve it directly. #1087 was great in that it took care of implementing np.prod() under __array_function__ with axis xor where specified, but this issue also referred to Quantity.prod(), which was not changed, and the case where both axis and where are specified is not yet covered. If it would be better to open a new issue (or issues) for those more specific points instead, let me know.

1120: revise the unit computation for np.prod r=hgrecco a=keewis

#1087 left out the case where `axis` and `where` are specified and also used the size of the result to compute the output unit if only `axis` was specified. This changes that to use `a.shape[axis]` instead and also implements the support for both `axis` and `where` by broadcasting `where` against the array, applying `np.sum` along `axis` and using the only one unique value (`0` doesn't count) as an exponent. In case there's more than that, it will try to cast to `dimensionless`.

I'm not quite sure if using `np.broadcast_arrays` is the best way to get the exponents, though.

Edit: **Todo**: make the error message easier to understand

- [x] Closes #867
- [x] Executed ``black -t py36 . && isort -rc . && flake8`` with no errors
- [x] The change is fully covered by automated unit tests
- [ ] Documented in docs/ as appropriate
- [ ] Added an entry to the CHANGES file

Co-authored-by: Keewis <keewis@posteo.de>
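For a single integer `axis`, the size ratio and `a.shape[axis]` agree on any non-empty array. One case where they differ is an empty result, shown below with plain NumPy. This is only an illustration of the arithmetic and is not claimed to be the motivation behind the PR.

```python
import numpy as np

a = np.ones((0, 3))
axis = 1

result = np.prod(a, axis=axis)  # shape (0,), so result.size == 0

# Exponent from the size ratio: undefined when the result is empty.
try:
    exponent_from_sizes = a.size // result.size
except ZeroDivisionError:
    exponent_from_sizes = None

# Exponent from the length of the reduced axis: still well defined.
exponent_from_shape = a.shape[axis]

print(exponent_from_sizes, exponent_from_shape)
```

The size-based exponent cannot be computed here, while the shape-based one is 3.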

jthielen mentioned this issue Sep 2, 2019

tests for arrays with units pydata/xarray#3238

Merged

16 tasks

jthielen mentioned this issue Dec 5, 2019

NEP-18 Compatibility #905

Merged

hgrecco added the numpy Numpy related bug/enhancement label Dec 17, 2019

hgrecco mentioned this issue Dec 19, 2019

array function implementation for np.cumprod / np.nancumprod fails with axis #939

Closed

jthielen mentioned this issue Dec 30, 2019

Raise ValueError on ambiguous boolean conversion and add pending NumPy functions/ufuncs from issue tracker #965

Merged

5 tasks

keewis mentioned this issue Apr 23, 2020

Implement prod #1087

Merged

5 tasks

bors bot closed this as completed in 5189aa0 Apr 23, 2020

bors bot closed this as completed in #1087 Apr 23, 2020

This was referenced Jun 17, 2020

synchronize Quantity.prod with np.prod(q) #1118

Closed

revise the unit computation for np.prod #1120

Merged
