Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add more mixins + tests. #10460

Closed
wants to merge 4 commits into from
Closed

Conversation

hameerabbasi
Copy link
Contributor

Fixes #10446

This is my first time contributing to Numpy, so I may not have checked every box (although I have skimmed the contributing guidelines).

Please feel free to poke and prod me to make this perfect.

@charris charris changed the title Add more mixins + tests. ENH: Add more mixins + tests. Jan 23, 2018
@njsmith
Copy link
Member

njsmith commented Jan 23, 2018

I still don't like this ad hoc split creation of multiple different mixins. It feels poorly motivated, like we're just doing it in case someone maybe wants to make a class less compatible with ndarray in some arbitrary way. Without concrete use cases, I'd rather see these combined into a single class. (We can always split it up later if someone finds a real use case.)


It is useful for writing classes that do not inherit from `numpy.ndarray`,
but that should support reductions like arrays as described in :ref:`A
Mechanism for Overriding Ufuncs <neps.ufunc-overrides>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really accurate - np.sum() and other functions will work just fine on any object defining __array_ufunc__ - all this is doing is adding alias methods that match ndarray.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I'll change that.

@mhvk
Copy link
Contributor

mhvk commented Jan 23, 2018

Agreed with @njsmith - I think we should have a clear use case. I certainly think in the absence of evidence to the contrary it makes more sense to somewhat expand NDOperatorsMixin by including mean and sum (which should generally be fine for any class that defines operators like __add__), but see little point in having separate mixins for reduce and accumulate. If we go this route at all, it should be something like NDArithmeticMethods or so. But really I think this needs further discussion in #10446; indeed, to avoid splitting the discussion too much, perhaps we should close this PR?

@hameerabbasi
Copy link
Contributor Author

hameerabbasi commented Jan 24, 2018

Here's a use case (I'm fine merging the mixins, it can be handled in __array_ufunc__ anyway.)

But, for example, in sparse arrays, accumulation leads to a dense array and won't be supported at first and reductions are supported (because reducing an all zero axis usually results in a zero.)

@hameerabbasi
Copy link
Contributor Author

hameerabbasi commented Jan 24, 2018

@mhvk The point was to move the discussion here. When there's code to be seen, it always helps the discussion along.

@hameerabbasi
Copy link
Contributor Author

I was wondering about one more thing, whether it would be possible (or good?) to make ndarray itself inherit from these mixins. This would reduce code duplication and automatically test the mixins through ndarray, and ensure consistency.

@hameerabbasi
Copy link
Contributor Author

Or maybe I can edit the implementations in numpy.core._methods itself.

@mhvk
Copy link
Contributor

mhvk commented Jan 25, 2018

@hameerabbasi - I like the idea of ndarray inheriting from the mixin and as a first step it certainly seems good to just use the functions from _methods directly where possible. This seems OK for all reduce/accumulate functions, but for mean and var one would have to remove the asanyarray -- or, I guess inversely, one could create the mixin class and then in _methods do

_amax = NDArrayArithmeticMethods.amax
_amin = ...
def _mean(...):
    arr = asanyarray(a)
    return NDArrayArithmeticMethods.mean(arr, ...)
.
.
.

@mhvk
Copy link
Contributor

mhvk commented Jan 25, 2018

Despite my liking above, I do feel what is really needed here is more discussion -- maybe even something like a NEP -- that lays out the levels of compatibility one likes to provide, including actual use cases, ideally with existing classes. I think there are a number:

  1. Shape compatibility (reshape, ravel, transpose, __getitem__, etc.); these need the duck-type array to provide shape (and possibly ndim and size, thought those can have defaults); we ShapedLikeNDArray for our time and coordinate classes (which have underlying ndarray).
  2. Operator compatibility (done)
  3. Arithmetic compatibility; here, but shouldn't this just be together with (2)?
  4. Concatenation and stacking

cc @shoyer

@shoyer
Copy link
Member

shoyer commented Jan 25, 2018

My use case for separating NDOperatorsMixin and mixins for reduction methods is that xarray could use the former but not the later: we have our own version of reduction methods with different arguments (including skipna for skipping NaN and dim for named dimensions).

So in general I like this approach but I agree with @mhvk that more discussion may be warranted first.

@njsmith
Copy link
Member

njsmith commented Jan 25, 2018

@shoyer If you subclass NDOperatorsMixin in xarray, wouldn't your extended-but-presumably-mostly-compatible methods just override the ones in the mixin?

Another random thought that I should have had before: should this be an ABC? Basically the same as a mixin, but with a little more language-supported structure (e.g. you can declare that an object should count as a subclass without actually inheriting methods).

@shoyer
Copy link
Member

shoyer commented Jan 25, 2018

If you subclass NDOperatorsMixin in xarray, wouldn't your extended-but-presumably-mostly-compatible methods just override the ones in the mixin?

Yes, that's mostly true. Though there's at least one extra method currently defined on the mixin that we don't implement (ptp(), which in my experience is rarely used).

Another random thought that I should have had before: should this be an ABC?

Yes, it would be nice to define ABCs for cases where the interface is not already clear (e.g., shape compatibility). I don't know if all of these need to be ABCs, though. Python doesn't have any ABC for supporting arithmetic operators, for example -- it's just a matter of if the class implements the right special methods.

@njsmith
Copy link
Member

njsmith commented Jan 26, 2018

Python doesn't have any ABC for supporting arithmetic operators, for example -- it's just a matter of if the class implements the right special methods.

https://docs.python.org/3.7/library/numbers.html ?

IIUC an ABC does 4 things:

  • clearly documents which methods/attributes make up an interface (though in practice people can and do cheat when it makes sense, see for example the whole io hierarchy)

  • provides mixin helpers for implementing some of them in terms of the others

  • clearly documents which ones the mixin methods assume you will provide, and provides some runtime checking

  • gives the interface a name, which can be used statically by PEP 484 types, or dynamically to declare or check that a type implements the interface (ABC.register, isinstance). E.g. we can imagine an asduckarray that does something like:

    def asduckarray(obj):
        if isinstance(obj, DuckArray):
            return obj
        return asarray(obj)

I'm not sure this is the right way to write asduckarray (and please someone come up with a better name), but this constellation of features is interesting and meaningful beyond just the mixin helper part.

@shoyer
Copy link
Member

shoyer commented Jan 26, 2018

Python doesn't have any ABC for supporting arithmetic operators, for example -- it's just a matter of if the class implements the right special methods.

https://docs.python.org/3.7/library/numbers.html ?

Yes, almost, but numpy.ndarray doesn't implement it (issubclass(np.ndarray, numbers.Number) -> False).

Anyways, I agree. There are lots of nice reasons to like ABCs. It would be great to figure out a small hierarchy that would suffice for duck-typing core NumPy operations.

@eric-wieser
Copy link
Member

issubclass(np.ndarray, numbers.Number) -> False

Well, not all ndarrays contain numbers :)

@hameerabbasi
Copy link
Contributor Author

hameerabbasi commented Jan 26, 2018

I'm not sure this is the right way to write asduckarray

If we have just one mixin, I'm pretty sure it's the right way. If multiple, then maybe not. Another way could be hasattr(obj, '__array_ufunc__').

(and please someone come up with a better name)

How about asarraylike() or ascustomarray()? I was also thinking of repurposing asanyarray(), although I'm not sure how much this will break. The plus side is would be code would not need to be rewritten to support duck arrays. The down-side is if someone expects full ndarray compatibility, that would be broken.

and then in _methods do ... arr = asanyarray(a)

I assume asanyarray would be replaced by asarraylike here, otherwise we'd only be returning ndarray subclasses.

Edit: +1 for ABCs, although keep in mind that for Python 2, we would have to use ABCMeta

@hameerabbasi
Copy link
Contributor Author

Can someone point me to a short guide to how one would write a NEP? Maybe I could write one up for this purpose.

@shoyer
Copy link
Member

shoyer commented Feb 27, 2018

@hameerabbasi take a look at https://github.com/numpy/numpy/blob/master/doc/neps/nep-0000.rst. I think there was an intention to put this on a website somewhere but I'm not sure what the status of that is.

@hameerabbasi
Copy link
Contributor Author

On second thought, maybe a NEP is overkill? I'm not sure about this, though. I do feel a lot of people would have a say on the design.

@mattip
Copy link
Member

mattip commented Oct 16, 2018

What is the standing of this PR? On one level, It seems there is consensus to merge the various mixin classes together so this PR should be modified. Documentation of its use is also needed. On a higher level, are mixins a concept that has taken hold after their addition in 1.13? Should this somehow be unified with the idea of duck arrays now that NEP 22 has been accepted?

@hameerabbasi
Copy link
Contributor Author

If a consensus is reached I’d be happy to move this forward in any form.

@shoyer
Copy link
Member

shoyer commented Oct 16, 2018

I think writing a separate mixin, as is done here, is the right approach. There is at least one clear use-case for a separate mixin like this that adds reductions (xarray).

This goes under Principle 5: Make it easy to do the right thing.

There are two specific changes that I'd like to see here:

  1. Clearly document the interface. This mixin presumes that the class implements dtype and shape attributes that match NumPy. It also needs an __array_ufunc__ implementation that supports mutation, via the out argument (eventually, it would be nice for mutability to be optional). Is anything else required?
  2. If possible, reduce the level of code duplication with numpy.core._methods. Can _mean and _var there use these implementations, along the lines of what @mhvk suggested above ENH: Add more mixins + tests. #10460 (comment)?

@hameerabbasi
Copy link
Contributor Author

hameerabbasi commented Oct 17, 2018

Continuing on principle 5: The way I see it, there are three things to be decided:

  1. Whether to use ABCs. +1
  2. Whether to split finely: +1
  3. Whether to replace NumPy's own functions to reduce maintenance burden. +1

@njsmith @shoyer @mhvk Can I have your answers to these particular questions, and I'll move this PR in the way this discussion goes.

@hameerabbasi
Copy link
Contributor Author

Re: Making it easy to do the right thing: ABCs do that. Of course, I'll add docs for these various things. :-)

@hameerabbasi
Copy link
Contributor Author

hameerabbasi commented Oct 17, 2018

Re: How to split, I was considering splitting off things that require anything other than __array_ufunc__ into a different ABC, and accumulations into yet another. Of course, this is open to discussion and I may be over-doing things here but this is motivated by differing requirements for things like mean and ptp.

Re: Arbitrary reasons, there may be very good reasons someone isn't implementing something. For example, accumulations don't give a sparse array, there may be manpower required to implement everything at once and so we do it piece by piece, something may be immutable.

Re: Out argument, I can change the implementation, but that would make it less efficient. One thing I've used in pydata/sparse is faux-inplace-operations, to support operators etc. Basically, I make self a shallow copy of the result. It fits nicely into my end-goal of making these arrays interchangeable to an extent.

Base automatically changed from master to main March 4, 2021 02:04
@seberg
Copy link
Member

seberg commented Jun 29, 2022

@hameerabbasi I am going to close this for now. It seems to be stalled on the grounds that we are not sure whether it is a good idea to include. The main point for reconsideration might be to upstream things from pydata/sparse (I am not sure if this is what this is).

On the other hand, maybe it would also be good to just list pydata/sparse as an example for how it can be done somewhere?

Please reopen/comment if you disagree (or just open a new PR), but at this point it seems unlikely the PR would move anyway without a clean start.

@seberg seberg closed this Jun 29, 2022
@hameerabbasi
Copy link
Contributor Author

Thanks @seberg. I think duck-array helpers common to all libraries have their place, maybe it isn't in NumPy though. I think there was a discussion on that at one point.

@seberg
Copy link
Member

seberg commented Jun 29, 2022

Yeah, probably, I am not sure what the result of the discussion was though :). I do think it might be nice to rescue such duck-helpers somewhere. If even into an unpolished "informational" NEP, unfortunately it is one of those tasks that may be hard to find someone who has the cycles and will to do it :(.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add more mixins that do the right thing for duck arrays based on __array_ufunc__ and __array_function__
8 participants