
Split in two Quantity Classes for Scalar and Array objects #753

Open
hgrecco opened this issue Jan 13, 2019 · 10 comments

Comments

@hgrecco
Owner

hgrecco commented Jan 13, 2019

When Pint was in its infancy there were several options ahead of us to support numpy arrays:

  1. Just one type deriving from a np.array
    Good: Works with numpy.
    Bad: requires numpy for everything.
  2. One type deriving from np.array and another which does not
    Good: works with numpy without requiring it for non numpy operations
    Bad: makes things more complicated.
  3. Make an array of Scalar Quantity objects
    Good: easy to implement
    Bad: not very efficient
  4. Create our own dtype derived from a type which is not numpy related
    Good: works with numpy without requiring it for non numpy operations
    Bad: a custom dtype is difficult to get right and distribute. Fragile.
  5. (Current solution). One type which stores in the magnitude a numeric object (whatever it is)
    Good: works with numpy without requiring it for non numpy operations. Easy to implement
    Bad: Some numpy operations are not supported

The current solution has served us well (very well, I would say): we were able to enforce correctness for operations with units without requiring users to import a particular math package and without monkey patching. So there is a strong reason to keep things as they are. But there are a few reasons to move forward:

  1. The landscape of numpy-related packages is growing, and failures occurring within them during Pint-related operations are difficult to track and fix.
  2. Pandas is becoming the de facto standard for many data-related operations. We currently have provisional support, but we would like to make it easier to work with.

To recap: the reason some numpy operations are not supported is that numpy internally calls the function asanyarray multiple times, which copies the contents of the array but loses the type (Quantity), because Quantity is not an np.ndarray subclass. We even had a prototype of solution (2); you can see a very outdated version of it in the _npsubclass branch.
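To make the asanyarray problem concrete, here is a minimal sketch using a toy wrapper class (a hypothetical simplification, not pint's actual Quantity):

```python
import numpy as np

class Quantity:
    # Hypothetical, simplified stand-in for pint's Quantity:
    # a plain wrapper around a magnitude, NOT an np.ndarray subclass.
    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units

    def __array__(self, dtype=None, copy=None):
        # numpy falls back to this hook when coercing unknown objects;
        # it returns a bare ndarray, so the units are silently dropped.
        return np.asarray(self.magnitude, dtype=dtype)

q = Quantity(np.array([1.0, 2.0]), "m")
stripped = np.asanyarray(q)
print(type(stripped))  # <class 'numpy.ndarray'> -- the Quantity type is gone
```

Because `asanyarray` only preserves ndarray subclasses, any wrapper that is not one comes out the other side as a plain array.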

Please join the discussion.

@dopplershift
Contributor

The new __array_function__ support is probably relevant: https://www.numpy.org/neps/nep-0018-array-function-protocol.html

@andrewgsavage
Collaborator

Thanks for opening this!

I think the reasoning behind Pandas' ExtensionTypes is to get around some difficulties with numpy. One example is that int32 cannot store nan, so we get this behaviour in the _npsubclass branch:

Q_(np.array([1, 2]),"m").magnitude.dtype
dtype('int32')

Q_(np.array([1, np.nan]),"m").magnitude.dtype
dtype('float64')

This makes options 1 and 2 undesirable.
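The underlying dtype behaviour is plain NumPy and easy to reproduce without pint at all:

```python
import numpy as np

ints = np.array([1, 2])          # integer dtype (platform dependent: int64 or int32)
print(ints.dtype)

try:
    ints[0] = np.nan             # integer arrays cannot represent NaN
except ValueError as exc:
    print("rejected:", exc)

mixed = np.array([1, np.nan])    # a NaN in the input forces promotion to float
print(mixed.dtype)               # float64
```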

From a pint-pandas perspective, an issue I found was having a single class for both Scalars and sequences. It would be good to have QuantityScalar and QuantitySequence to inherit from Quantity.


The __array_function__ protocol looks promising. I followed the example implementation and was able to get concatenate to return a quantity:

@implements(concatenate)
def _(arrays, axis=0, out=None):
    out_units = arrays[0].units
    arrays = [q.to(out_units) for q in arrays]
    return ureg.Quantity(np.concatenate(arrays, axis=axis, out=out), out_units)

my_array = ureg.Quantity([1,2],"m")
my_array_mm = ureg.Quantity([1,2],"mm")
concatenate([my_array,my_array_mm]) 
[1.0 2.0 0.001 0.002] meter

While you'd usually get

np.concatenate([my_array,my_array_mm])
C:\Users\A\Anaconda3\envs\py36\lib\site-packages\pint\quantity.py:1377: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
array([1, 2, 1, 2])

Will need to get the latest numpy version to try it properly (so it's using np.concatenate instead of concatenate like in https://nbviewer.jupyter.org/gist/shoyer/1f0a308a06cd96df20879a1ddb8f0006 )
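For reference, here is a minimal, self-contained sketch of how the NEP 18 dispatch mechanism works with np.concatenate, using a toy Quantity class and an implements registry (both hypothetical simplifications, not pint's actual implementation; requires NumPy >= 1.17):

```python
import numpy as np

HANDLED_FUNCTIONS = {}

def implements(np_function):
    # Register a units-aware override for a given NumPy function.
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator

class Quantity:
    # Toy Quantity: magnitude + units string; same-unit operations only.
    def __init__(self, magnitude, units):
        self.magnitude = np.asarray(magnitude)
        self.units = units

    def __array_function__(self, func, types, args, kwargs):
        # NumPy routes calls like np.concatenate through this hook
        # whenever a Quantity appears in the arguments.
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

@implements(np.concatenate)
def _concatenate(arrays, axis=0):
    out_units = arrays[0].units
    # A real implementation would convert units first;
    # this sketch assumes they already match.
    mags = [a.magnitude for a in arrays]
    return Quantity(np.concatenate(mags, axis=axis), out_units)

a = Quantity([1, 2], "m")
b = Quantity([3, 4], "m")
c = np.concatenate([a, b])   # dispatches to _concatenate, units preserved
print(c.magnitude, c.units)  # [1 2 3 4] m
```

The key point is that np.concatenate itself is called by user code, yet the override runs and the result stays a Quantity, with no unit-stripping warning.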

@hgrecco
Owner Author

hgrecco commented Feb 6, 2019

@andrewgsavage Not sure if the code is right. Why not: [q.m_as(out_units) for q in arrays]?

But in any case, it looks promising. My proposal would be that, on one side:
1.- We create a numpy module
2.- We implement some functions like concatenate and others
3.- We write another test suite just for these that requires the latest numpy

On the other side, we start brainstorming how to do it with QuantityScalar and QuantitySequence.

@andrewgsavage
Collaborator

Yeah, that'd be better; surprised I didn't get a warning message.

Sounds good. Are you thinking of a separate module like pint-pandas?

wrt scalar/sequence, I was thinking something like this:
#764

@hgrecco
Owner Author

hgrecco commented Feb 13, 2019

Are you thinking of a separate module like pint-pandas

Not sure. It seems to me that 99% of the people who use pint for science will use numpy. I would just create a numpy module inside pint (unless there are good reasons not to do it).
Pandas, on the other hand, is more specific (although it is growing fast) and the API for extending it is unstable. So developing outside makes sense in my view.

I put some comments on #764

@shoyer

shoyer commented Sep 29, 2019

From a pint-pandas perspective, an issue I found was having a single class for both Scalars and sequences. It would be good to have QuantityScalar and QuantitySequence to inherit from Quantity.

Could you elaborate on how this would be helpful?

I don't understand the motivation for the separate scalar/sequence classes in pint.

NumPy does have separate scalars/sequences, but it is generally regarded as a design mistake. The main virtue of separate scalars is that they are immutable and hashable, so you can use them as dictionary keys, but the separate code paths add a significant amount of complexity, both internally in NumPy and externally for its users. NumPy does not even manage to keep the distinction straight: it will convert 0-dimensional arrays into scalars when used in many operations.
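That inconsistency is easy to demonstrate with plain NumPy, no pint involved:

```python
import numpy as np

x = np.array(3.0)        # a 0-dimensional array: ndim == 0, shape == ()
print(type(x), x.ndim)   # <class 'numpy.ndarray'> 0

y = x + 1                # many operations silently return a scalar instead
print(type(y))           # <class 'numpy.float64'> -- no longer an ndarray

z = np.array([1, 2])[0]  # plain indexing also yields a scalar
print(type(z))           # an integer scalar type (platform dependent)
```

Any design that splits scalar and array Quantity classes would have to decide, case by case, which side of this boundary each result lands on.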

@crusaderky
Contributor

I too am personally against the idea of splitting the classes.
I think that NEP18 is the way to go, and both xarray and dask.array are working very well with it e.g. in conjunction with sparse.

I'm also seriously concerned about numpy scalar arrays (ndim=0) - they should be considered carefully early in the design phase.

@jthielen
Copy link
Contributor

jthielen commented Oct 15, 2019

I'm sorry for the very delayed response in regards to this and the changes of #875...grad school can get busy sometimes. 😬

My previous agreement with the split quantity class change (e.g. #764 (comment), #875 (comment)) was mostly motivated by two things:

  1. Getting NEP 18 working with pint as soon as possible
  2. Making sure everything worked properly with MetPy

Point 1 was looking to be taken care of by #764 initially, or later by #875 with a follow-up PR, and I was able to test that point 2 held (especially in fixing #751 / #760 / #881). I was indifferent about the quantity class split itself.

When it comes to reviewing implementation details, though, I still have a lot to learn and I feel it's a bit beyond my level of experience. So, I'm very thankful for @crusaderky's and @shoyer's informative comments to clarify the problems with #875, and I defer to their judgement on the matter.

With all this in mind, however, I'm still antsy to see a working NEP 18 implementation in pint make a release soon (MetPy's planned v1.0 release at the end of 2019 needs improved xarray support in the current unit-aware calculations, and having NEP 18 support in both xarray and pint would be extremely helpful for this). Given the discussion here and in #875 (comment), and how the timeline has gone on so far, what would be the best way to move forward?

I'd be glad to still help where I can to get this moving along (in particular, I could help in fixing #751 and #760 and reimplementing __array_function__ based on what was done before once #880 is merged).

@crusaderky
Contributor

@jthielen glad to hear this is going to move again.
Are you a maintainer? @hgrecco seems to be almost completely absent from GitHub since August... :(

@jthielen
Contributor

@crusaderky No, I am just a downstream user who really wants to get pint working smoothly with xarray (my main contributions are to MetPy).

@cpascual Do you have any thoughts here? You are the one other merging reviewer I found in git log since bors has been in use.
