Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quantity.prod() result unit is incorrect with axis or where argument #867

Closed
jthielen opened this issue Sep 2, 2019 · 9 comments · Fixed by #1087 or #1120
Closed

Quantity.prod() result unit is incorrect with axis or where argument #867

jthielen opened this issue Sep 2, 2019 · 9 comments · Fixed by #1087 or #1120
Labels
numpy Numpy related bug/enhancement

Comments

@jthielen
Copy link
Contributor

jthielen commented Sep 2, 2019

Right now, the .prod() method assumes the full size of the input is collapsed in the result. This gives incorrect results when the axis or where arguments are supplied, as seen below:

import pint

ureg = pint.UnitRegistry()

q = [[1, 2], [3, 4]] * ureg.m

print(q.prod())
print(q.prod(axis=0))
print(q.prod(where=[True, False]))
24 meter ** 4
[3 8] meter ** 4
3 meter ** 4

The unit on the first result where the full array is collapsed to a scalar is correct, but the other two results should have meter ** 2 as the unit.

I don't have a good fix for this yet, since this is the same problem mentioned in #764 (comment). If anyone has any suggestions for a performant way to determine how many unit multiplications occur given both axis and where arguments, please do let me know! Otherwise, I'll try to figure something out and get a PR in for this or include it alongside a prod implementation for __array_function__.

@hgrecco hgrecco added the numpy Numpy related bug/enhancement label Dec 17, 2019
bors bot added a commit that referenced this issue Dec 30, 2019
965: Raise ValueError on ambiguous boolean conversion and add pending NumPy functions/ufuncs from issue tracker r=hgrecco a=jthielen

There were a few follow-up issues from #905 that were missing fairly simple implementations, so this PR takes care of them in preparation for the upcoming release.

Part of this was resolving #866 by raising a `ValueError` due to ambiguity when casting Quantities with offset units to boolean, which is technically a breaking change (although I would argue that the previous behavior was incorrect as discussed in #866).

Also, @keewis, this implements `np.any` and `np.all`, which are two of the functions you had mentioned were needed by xarray. I think now the only one still missing is `np.prod` (#867) which will likely have to wait since there hasn't been a good solution yet.

- [x] Closes #419; Closes #470; Closes #807; Closes #866
- [x] Executed ``black -t py36 . && isort -rc . && flake8`` with no errors
- [x] The change is fully covered by automated unit tests
- [x] Documented in docs/ as appropriate
- [x] Added an entry to the CHANGES file


Co-authored-by: Jon Thielen <github@jont.cc>
@keewis
Copy link
Contributor

keewis commented Apr 16, 2020

couldn't we use the result of numpy to compute the units? And for where, this could be exponent = np.sum(where).

I'm not sure how to compute the units if both axis and where are specified, though. Is it even possible to (always) compute a common unit in that case, or should we drop the units / declare them as dimensionless?

I'm thinking of something like

def prod(quantity, axis=None, *args, **kwargs):
    def compute_units(unit, in_size, out_size, axis, where):
        if axis is not None and where is not None:
            return ureg.dimensionless

        if where is not None:
            exponent = np.sum(where)
        else:
            exponent = in_size // out_size
        return unit ** exponent

    result = np.prod(quantity.magnitude, axis=axis, *args, **kwargs)
    
    units = compute_units(
        quantity.units,
        quantity.size,
        result.size,
        axis,
        kwargs.get("where"),
    )

    return ureg.Quantity(result, units)

(but the default value of where seems to be something different from None, so we might need to use that instead)

@jthielen
Copy link
Contributor Author

@keewis Good point that in the either/or case of axis and where, we can use NumPy functions to compute the result. It's probably better to move forward with that as a "good enough but not perfect solution", and only operate on dimensionless when both the axis and where arguments are provided.

Cases that I had in mind like the following are well-defined, but couldn't come up with a good way to handle in general:

np.prod(np.arange(6).reshape((2, 3)) * ureg.meter, axis=1, where=[True, False, True])
[ 0 15] meter ** 2

@keewis
Copy link
Contributor

keewis commented Apr 20, 2020

I guess we could try to filter the cases where this works. I've been trying something like

a = np.linspace(1, 2, 20).reshape(4, 5) * ureg.m
where = a < 1.3

for which we obviously cannot compute a common unit.

I think with a reshaped where (i.e. new_where.size == where.size and new_where.ndim == a.ndim) and

axis = 1
exponent = np.sum(where, axis=axis)

we could use dimensionless or raise an exception if exponent is not a scalar and the non-zero elements have different values.

In my example (threshold 1.3), exponent would be

[5 1 0 0]

while with a threshold of 1.5 this would be

[5 5 0 0]

So the former would be dimensionless and the latter meter ** 5

@jthielen
Copy link
Contributor Author

@keewis That almost all sounds great. I would disagree though that the former example should return units of dimensionless...I think it should error instead. It would be rather suprising (to me at least) to have a function multiplying things with units of meters, only to get dimensionless out. Rather, having a warning that says something to the effect of "A common unit could not be computed from non-dimensionless input. Try again with dimensionless input." would make more sense to me.

@keewis
Copy link
Contributor

keewis commented Apr 21, 2020

I wasn't sure if returning dimensionless or raising would be better. The implementation of np.cumprod first tries to convert to dimensionless, so doing the same in prod would be consistent.

@jthielen
Copy link
Contributor Author

I wasn't sure if returning dimensionless or raising would be better. The implementation of np.cumprod first tries to convert to dimensionless, so doing the same in prod would be consistent.

Indeed, it tries to convert to dimensionless, but unless I terribly messed something up in the implementation, that will raise a DimensionalityError when you have something like meters, not silently change units. It should only has an effect for things like percent and radians.

@keewis
Copy link
Contributor

keewis commented Apr 21, 2020

what I wanted to suggest was that we replicate that behavior (sorry if that was ambiguous): if we can't determine a common unit and the input unit is incompatible to dimensionless, raise a DimensionalityError

Edit: maybe we should do that before the (potentially expensive) computation of the result

@jthielen
Copy link
Contributor Author

what I wanted to suggest was that we replicate that behavior (sorry if that was ambiguous): if we can't determine a common unit and the input unit is incompatible to dimensionless, raise a DimensionalityError

Edit: maybe we should do that before the (potentially expensive) computation of the result

Ah, got it, sorry for the confusion. Doing that check ahead of time does seem like a good idea.

@keewis keewis mentioned this issue Apr 23, 2020
5 tasks
@bors bors bot closed this as completed in 5189aa0 Apr 23, 2020
@bors bors bot closed this as completed in #1087 Apr 23, 2020
@jthielen
Copy link
Contributor Author

I think this issue should be reopened, since #1087 didn't really resolve it directly. #1087 was great in that it took care of implementing np.prod() under __array_function__ with axis xor where specified, but this issue also referred to Quantity.prod(), which was not changed, and also covering the case of axis and where specified is not yet covered. If instead it would be better to open a new issue (or issues) covering those more specific issues, let me know.

bors bot added a commit that referenced this issue Jun 18, 2020
1120: revise the unit computation for np.prod r=hgrecco a=keewis

#1087 left out the case where `axis` and `where` are specified and also used the size of the result to compute the output unit if only `axis` was specified. This changes that to use `a.shape[axis]` instead and also implements the support for both `axis` and `where` by broadcasting `where` against the array, applying `np.sum` along `axis` and using the only one unique value (`0` doesn't count) as an exponent. In case there's more than that, it will try to cast to `dimensionless`.

I'm not quite sure if using `np.broadcast_arrays` is the best way to get the exponents, though.

Edit: **Todo**: make the error message easier to understand

- [x] Closes #867
- [x] Executed ``black -t py36 . && isort -rc . && flake8`` with no errors
- [x] The change is fully covered by automated unit tests
- [ ] Documented in docs/ as appropriate
- [ ] Added an entry to the CHANGES file


Co-authored-by: Keewis <keewis@posteo.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
numpy Numpy related bug/enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants