
Split in two Quantity Classes for Scalar and Array objects #753

Open
hgrecco opened this issue Jan 13, 2019 · 10 comments

Comments

@hgrecco
Owner

hgrecco commented Jan 13, 2019

When Pint was in its infancy there were several options ahead of us to support numpy arrays:

  1. Just one type deriving from a np.array
    Good: Works with numpy.
    Bad: requires numpy for everything.
  2. One type deriving from np.array and another which does not
    Good: works with numpy without requiring it for non numpy operations
    Bad: makes things more complicated.
  3. Make an array of Scalar Quantity objects
    Good: easy to implement
    Bad: not very efficient
  4. Create our own dtype derived from a type which is not numpy related
    Good: works with numpy without requiring it for non numpy operations
    Bad: a custom dtype is difficult to get right and distribute. Fragile.
  5. (Current solution). One type which stores in the magnitude a numeric object (whatever it is)
    Good: works with numpy without requiring it for non numpy operations. Easy to implement
    Bad: Some numpy operations are not supported

The current solution has served us well (very well, I would say): we were able to enforce correctness for operations with units without requiring users to import a particular math package and without monkey patching. So there is a strong reason to keep things as they are. But there are a few reasons to move forward:

  1. The landscape of numpy-related packages is growing, and failures occurring within them during Pint-related operations are difficult to track and fix.
  2. Pandas is becoming the de facto standard for many data-related operations. We currently have provisional support, but we would like to make it easier to work with.

To recap: the reason some numpy operations are not supported is that numpy internally calls the function asanyarray multiple times, which copies the contents of the array but loses the type (Quantity), because Quantity is not an np.ndarray subclass. We even had a prototype of solution (2); you can see a very outdated version of it in the _npsubclass branch.
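To make the asanyarray problem concrete, here is a minimal sketch using a toy wrapper class (a hypothetical simplification, not pint's actual Quantity):

```python
import numpy as np

class Quantity:
    # Hypothetical, simplified stand-in for pint's Quantity:
    # a plain wrapper around a magnitude, NOT an np.ndarray subclass.
    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units

    def __array__(self, dtype=None, copy=None):
        # numpy falls back to this hook when coercing unknown objects;
        # it returns a bare ndarray, so the units are silently dropped.
        return np.asarray(self.magnitude, dtype=dtype)

q = Quantity(np.array([1.0, 2.0]), "m")
stripped = np.asanyarray(q)
print(type(stripped))  # <class 'numpy.ndarray'> -- the Quantity type is gone
```

Because `asanyarray` only preserves ndarray subclasses, any wrapper that is not one comes out the other side as a plain array.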

Please join the discussion.

@dopplershift
Contributor

The new __array_function__ support is probably relevant: https://www.numpy.org/neps/nep-0018-array-function-protocol.html

@andrewgsavage
Collaborator

Thanks for opening this!

I think the reasoning behind Pandas' ExtensionTypes is to get around some difficulties with numpy. One example is that int32 cannot store nan, so we get this behaviour in the _npsubclass branch:

Q_(np.array([1, 2]),"m").magnitude.dtype
dtype('int32')

Q_(np.array([1, np.nan]),"m").magnitude.dtype
dtype('float64')

This makes options 1 and 2 undesirable.
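The underlying dtype behaviour is plain NumPy and easy to reproduce without pint at all:

```python
import numpy as np

ints = np.array([1, 2])          # integer dtype (platform dependent: int64 or int32)
print(ints.dtype)

try:
    ints[0] = np.nan             # integer arrays cannot represent NaN
except ValueError as exc:
    print("rejected:", exc)

mixed = np.array([1, np.nan])    # a NaN in the input forces promotion to float
print(mixed.dtype)               # float64
```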

From a pint-pandas perspective, an issue I found was having a single class for both Scalars and sequences. It would be good to have QuantityScalar and QuantitySequence to inherit from Quantity.


The __array_function__ protocol looks promising. I followed the example implementation and was able to get concatenate to return a quantity:

@implements(concatenate)
def _(arrays, axis=0, out=None):
    out_units = arrays[0].units
    arrays = [q.to(out_units) for q in arrays]
    return ureg.Quantity(np.concatenate(arrays, axis=axis, out=out), out_units)

my_array = ureg.Quantity([1,2],"m")
my_array_mm = ureg.Quantity([1,2],"mm")
concatenate([my_array,my_array_mm]) 
[1.0 2.0 0.001 0.002] meter

While you'd usually get

np.concatenate([my_array,my_array_mm])
C:\Users\A\Anaconda3\envs\py36\lib\site-packages\pint\quantity.py:1377: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
array([1, 2, 1, 2])

Will need to get the latest numpy version to try it properly (so it's using np.concatenate instead of concatenate like in https://nbviewer.jupyter.org/gist/shoyer/1f0a308a06cd96df20879a1ddb8f0006 )
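For reference, here is a minimal, self-contained sketch of how the NEP 18 dispatch mechanism works with np.concatenate, using a toy Quantity class and an implements registry (both hypothetical simplifications, not pint's actual implementation; requires NumPy >= 1.17):

```python
import numpy as np

HANDLED_FUNCTIONS = {}

def implements(np_function):
    # Register a units-aware override for a given NumPy function.
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator

class Quantity:
    # Toy Quantity: magnitude + units string; same-unit operations only.
    def __init__(self, magnitude, units):
        self.magnitude = np.asarray(magnitude)
        self.units = units

    def __array_function__(self, func, types, args, kwargs):
        # NumPy routes calls like np.concatenate through this hook
        # whenever a Quantity appears in the arguments.
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

@implements(np.concatenate)
def _concatenate(arrays, axis=0):
    out_units = arrays[0].units
    # A real implementation would convert units first;
    # this sketch assumes they already match.
    mags = [a.magnitude for a in arrays]
    return Quantity(np.concatenate(mags, axis=axis), out_units)

a = Quantity([1, 2], "m")
b = Quantity([3, 4], "m")
c = np.concatenate([a, b])   # dispatches to _concatenate, units preserved
print(c.magnitude, c.units)  # [1 2 3 4] m
```

The key point is that np.concatenate itself is called by user code, yet the override runs and the result stays a Quantity, with no unit-stripping warning.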

@hgrecco
Owner Author

hgrecco commented Feb 6, 2019

@andrewgsavage Not sure if the code is right. Why not: [q.m_as(out_units) for q in arrays]?

But in any case, it looks promising. My proposal would be that, on one side:
1.- We create a numpy module
2.- We implement some functions like concatenate and others
3.- We write another test suite just for these that requires the latest numpy

On the other side, we start brainstorming how to do it with QuantityScalar and QuantitySequence.

@andrewgsavage
Collaborator

Yeah, that'd be better; surprised I didn't get a warning message.

Sounds good. Are you thinking of a separate module like pint-pandas?

wrt scalar/sequence, I was thinking something like this:
#764

@hgrecco
Owner Author

hgrecco commented Feb 13, 2019

Are you thinking of a separate module like pint-pandas

Not sure. It seems to me that 99% of the people who use pint for science will use numpy. I would just create a numpy module inside pint (unless there are good reasons not to do it).
Pandas, on the other hand, is more specific (although it is growing fast) and the API for extending it is unstable. So developing outside makes sense in my view.

I put some comments on #764

@shoyer

shoyer commented Sep 29, 2019

From a pint-pandas perspective, an issue I found was having a single class for both Scalars and sequences. It would be good to have QuantityScalar and QuantitySequence to inherit from Quantity.

Could you elaborate on how this would be helpful?

I don't understand the motivation for the separate scalar/sequence classes in pint.

NumPy does have separate scalars/sequences, but it is generally regarded as a design mistake. The main virtue of separate scalars is that they are immutable and hashable, so you can use them as dictionary keys, but the separate code paths add a significant amount of complexity, both internally in NumPy and externally for its users. NumPy does not even manage to keep the distinction straight: it will convert 0-dimensional arrays into scalars when used in many operations.
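That inconsistency is easy to demonstrate with plain NumPy, no pint involved:

```python
import numpy as np

x = np.array(3.0)        # a 0-dimensional array: ndim == 0, shape == ()
print(type(x), x.ndim)   # <class 'numpy.ndarray'> 0

y = x + 1                # many operations silently return a scalar instead
print(type(y))           # <class 'numpy.float64'> -- no longer an ndarray

z = np.array([1, 2])[0]  # plain indexing also yields a scalar
print(type(z))           # an integer scalar type (platform dependent)
```

Any design that splits scalar and array Quantity classes would have to decide, case by case, which side of this boundary each result lands on.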

@crusaderky
Contributor

I too am personally against the idea of splitting the classes.
I think that NEP18 is the way to go, and both xarray and dask.array are working very well with it e.g. in conjunction with sparse.

I'm also seriously concerned about numpy scalar arrays (ndim=0) - they should be considered carefully early in the design phase.

@jthielen
Copy link
Contributor

jthielen commented Oct 15, 2019

I'm sorry for the very delayed response in regards to this and the changes of #875...grad school can get busy sometimes. 😬

My previous agreement with the split quantity class change (e.g. #764 (comment), #875 (comment)) was mostly motivated by two things:

  1. Getting NEP 18 working with pint as soon as possible
  2. Making sure everything worked properly with MetPy

Point 1 was looking to be taken care of by #764 initially, or later by #875 with a follow-up PR, and I was able to test that point 2 held (especially in fixing #751 / #760 / #881). I was indifferent about the quantity class split itself.

When it comes to reviewing implementation details, though, I still have a lot to learn and I feel it's a bit beyond my level of experience. So, I'm very thankful for @crusaderky's and @shoyer's informative comments to clarify the problems with #875, and I defer to their judgement on the matter.

With all this in mind, however, I'm still antsy to see a working NEP 18 implementation in pint make a release soon (MetPy's planned v1.0 release at the end of 2019 needs improved xarray support in the current unit-aware calculations, and having NEP 18 support in both xarray and pint would be extremely helpful for this). Given the discussion here and in #875 (comment), and how the timeline has gone on so far, what would be the best way to move forward?

I'd be glad to still help where I can to get this moving along (in particular, I could help in fixing #751 and #760 and reimplementing __array_function__ based on what was done before once #880 is merged).

@crusaderky
Contributor

@jthielen glad to hear this is going to move again.
Are you a maintainer? @hgrecco seems to be almost completely absent from GitHub since August... :(

@jthielen
Contributor

@crusaderky No, I am just a downstream user who really wants to get pint working smoothly with xarray (my main contributions are to MetPy).

@cpascual Do you have any thoughts here? You are the one other merging reviewer I found in git log since bors has been in use.
