Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Allow no conversion to scalar guarantee in ufunc and ufunc.outer #14489

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

seberg
Copy link
Member

@seberg seberg commented Sep 12, 2019

The current commit uses ... (Ellipsis) instead of a leave_wrapped singleton, which is short to write, but other than
that does not have much reason going for. So we should probably
change that.

xref gh-13105

@seberg seberg added 01 - Enhancement 54 - Needs decision 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. labels Sep 12, 2019
@eric-wieser
Copy link
Member

Would be great if we could add this for reductions too - we don't want to end up in a limbo where leave_wrapped / Ellipsis is accepted by some out arguments but not others.

@seberg
Copy link
Member Author

seberg commented Sep 13, 2019

Well, I am still not sure if I don't think that arr.sum() should return a scalar, while array_1d.sum(1) should not. I realize that most users will probably be confused by that. Although, that doesn't mean our singleton cannot mean keep/make array here even when we might want to change the normal behaviour at some point.

The change itself is going to be easy though, I think.

@eric-wieser
Copy link
Member

Although, that doesn't mean our singleton cannot mean keep/make array here even when we might want to change the normal behaviour at some point.

I'm not suggesting we change the default any time soon, just that we give users the option to leave things wrapped for reductions too.

@seberg
Copy link
Member Author

seberg commented Sep 13, 2019

Yeah, I was considering for a moment if ... should only ensure an array if axis!=None. But probably there is no point in that, since the main point of this is to make things easily predictable, and that is simplest if an array is guaranteed.

@seberg
Copy link
Member Author

seberg commented Sep 13, 2019

Hmmmpf, there is an issue. Implementors of __array_wrap__ (mainly/only memmap maybe), may create a scalar. The context does contain the Ellipsis, so subclasses could fix it up. But that is currently not true for reductions (which do not give a context).

The other option would be to guarantee an array (EDIT: read np.ndarray type) return (unless there is an override with __array_ufunc__.

@seberg
Copy link
Member Author

seberg commented Sep 13, 2019

@mhvk since I migrated a bit from the issue. just in case you are interested. Reduction is not an issue of course. The main issue is that __array_wrap__ of subclasses may still unpack the scalar, so the question would be if we should just not wrap at all...

@eric-wieser
Copy link
Member

The main issue is that array_wrap of subclasses may still unpack the scalar,

I think if this happens, we should just throw an exception - since __array_wrap__ is only called on array subclasses, it should be sufficient to check if the return value is still an ndarray subclass - if not, then the __array_wrap__ didn't comply with the requested out argument - something like raise NotImplementedError("Subclass does not appear to support `out=leave_wrapped`")

@hameerabbasi
Copy link
Contributor

I'd like to add this to triage review.

@seberg seberg added the triage review Issue/PR to be discussed at the next triage meeting label Feb 26, 2020
@seberg
Copy link
Member Author

seberg commented Feb 26, 2020

We could also use np.ndarray as "singleton" and enforce NumPy array output (no subclasses)?

@mattip mattip added triaged Issue/PR that was discussed in a triage meeting and removed 54 - Needs decision triage review Issue/PR to be discussed at the next triage meeting labels Feb 26, 2020
@eric-wieser
Copy link
Member

Now that's an interesting idea. I almost wonder if we want to allow passing any array subclass in there.

@seberg
Copy link
Member Author

seberg commented Feb 26, 2020

@eric-wieser well, we could initially only support ndarray. The thing I think that is missing is to figure out where we want to use this right now. I think it may be that almost(?) all places where we would be using this internally, we already make sure we are wokring with arrays.

But yes, I like the idea as such to use the type... The issue is that we have no control over what subclass.__array_wrap__ does, and if someone would pass in subclass we want to prevent getting a scalar back from __array_wrap__, which is tricky.

@mhvk
Copy link
Contributor

mhvk commented Feb 27, 2020

Passing in np.ndarray sounds OK. Right now, all it would do is prevent the call that turns array scalars into arrays, so adding other subclasses would introduce a bit of extra code, but I can see that it might be useful inside __array_ufunc__ implementations (replace any empty output with the class you want and no need for a .view after), though I'm not sure it helps clarity much. Maybe best to keep that for possible later follow-up?

@eric-wieser
Copy link
Member

So to be clear, the issue here is that __array_wrap__ has no way to distinguish the out=ndarray case from the out=None case - so we just skip __array_wrap__ entirely.

Does __array_ufunc__ receive the information that a scalar output is requested? Could you perhaps add a test for that?

@eric-wieser
Copy link
Member

Perhaps one option to get out of this issue: we change the signature to __array_wrap__(obj, ctx, unwrap_0d=True), and only pass unwrap_0d when it is False. That way, old code continues to work, but attempting to use out=np.ndarray either:

  • Fails loudly, forcing library authors to implement their half of it
  • Emits a warning, and then casts to ndarray

@eric-wieser eric-wieser added the triage review Issue/PR to be discussed at the next triage meeting label Jun 17, 2020
@eric-wieser
Copy link
Member

Marking for re-triage, just because this came up as a pain-point during #16273

@mhvk
Copy link
Contributor

mhvk commented Jun 17, 2020

I haven't really followed this in detail, but just looked at the PR and that certainly looks good.

I guess the choices are:

  1. To let things fall as they may - I'd expect most __array_wrap__ implementations would happen to do the right thing;
  2. To start passing on context so __array_wrap__ implementations can be adjusted;
  3. To error if out=np.ndarray and wrapping is requested.

I'm not quite sure where we are in the development cycle, but if it is a few months to the next release, one could just start with the present PR and see if any of the packages that test against -dev start to fail...

@eric-wieser
Copy link
Member

To let things fall as they may - I'd expect most __array_wrap__ implementations would happen to do the right thing

Notably, np.ma.array would not do the right thing - it would ignore the request to leave the scalar unwrapped, and unwrap it anyway.

@mhvk
Copy link
Contributor

mhvk commented Jun 17, 2020

np.ma.array we can of course adjust, but I see your point, that if we start using this internally in numpy functions, then we may break subclasses which wrap, explicitly cast to scalar, and now some numpy functions do not work with them any more. So, that suggests option (2), of passing on a context, raising a DeprecationWarning if __array_wrap__ returns a scalar, and either let things possibly fail or temporarily recast any numpy scalars returned by __array_wrap__ to an ndarray internally.

In this respect, perhaps the original Ellipses was the better choice after all, since then one can describe the functionaility as out=... meaning that the code will behave as if as if the call is replaced by ufunc.method(*args, **kwargs)[...]. The fact that internally it means that in a case where we exactly know what the answer will be and thus not do the cast to a scalar is then an implementation detail. And with wrapping, one could exactly do what is described. Of course, that could still cause failures, but perhaps less likely (and if the context is passed on, fairly easy to solve).

@eric-wieser
Copy link
Member

eric-wieser commented Jun 17, 2020

And with wrapping, one could exactly do what is described

I think the point you're missing is that __array_wrap__ is not passed np.ndarray or ... - it's passed the already-allocated and computed out array - so there's no way to know whether the caller was asking for array-preservation. Without augmenting the argument set to __array_wrap__, it is impossible for a subclass to behave.

@mhvk
Copy link
Contributor

mhvk commented Jun 17, 2020

Hmm, indeed. So I guess your solution makes more sense. I do wonder whether instead of creating a new keyword argument for what is a niche case instead always pass on a context, also for ufunc reductions, and include out in there. (Sorry if this was discussed; I read through the thread but may have missed it)

@seberg
Copy link
Member Author

seberg commented Jun 18, 2020

@eric-wieser triage marking is nice, but I need a bit more head-start to bring it up (and it would be best if you are around too). I still have to get to wrapping my head around the options, I guess the only way to support array-wrap is to add the argument. Which was why I think the PR basically used np.ndarray to indicate that subclasses are never wrapped, and why I agree that ... may be better if you indicate that subclasses (and maybe for __array_ufunc__ any array-like) is OK, but no scalar.

@seberg seberg removed the triage review Issue/PR to be discussed at the next triage meeting label Aug 25, 2020
@mattip
Copy link
Member

mattip commented Nov 12, 2020

@seberg is this meant to be part of the ufunc reworking for 1.20 or do we let it slide to a future release?

@seberg
Copy link
Member Author

seberg commented Nov 12, 2020

Let it slide, it might create ufunc rework merge conflicts, but is an unrelated feature. Should look at it and see if I can figure out how subclasses should be dealt with though :/.

Base automatically changed from master to main March 4, 2021 02:04
@charris
Copy link
Member

charris commented Apr 12, 2021

@seberg It looks like your recent work leads to a conflict.

@charris
Copy link
Member

charris commented Apr 21, 2021

@seberg Needs rebase.

@mattip mattip added this to In Progress in Ufunc refactor May 5, 2021
@seberg seberg removed this from In Progress in Ufunc refactor May 6, 2021
@charris
Copy link
Member

charris commented Feb 20, 2023

@seberg Still interested in this? Needs rebase.

@mhvk
Copy link
Contributor

mhvk commented Feb 20, 2023

It sounds like there is some serious thought about a 2.0. Maybe in that case we can just start to return array scalars instead of proper scalars? I.e., start to omit the cast to scalar? Apart from being mutable, the difference seems rather small.

@mhvk
Copy link
Contributor

mhvk commented Apr 15, 2023

Does numpy 2.0 include deprecating scalars in favour of array scalars? If so, this becomes somewhat moot, but I didn't see it on the roadmap

@seberg
Copy link
Member Author

seberg commented Apr 16, 2023

I don't want to aim for it. For me there are still quite a lot of unknowns and probably some exploration needed about how feasible it is.
(And yes, I tend to think that removing scalars everywhere may well leave a weird gap also, but then maybe I am just not bold enough about forcing change. I am sure there are things that would be great to change around scalars, I do dislike most things about them.)

@mhvk
Copy link
Contributor

mhvk commented Apr 16, 2023

In that case, it is perhaps worth revising this PR?

@seberg
Copy link
Member Author

seberg commented Apr 17, 2023

Yes, likely. Still haven't had the spark of the original idea how to deal with subclasses. We could also abuse subok=True instead of out=.

An alternative thing that would be a big change, but not as big as removing scalars one, would be this: https://github.com/seberg/numpy/pull/new/try-0d-preservation that I tried to explore a bit today.

The proposal would be to:

  • Ensure that 0-d arrays are preserved in ufuncs, np.add(0d_array, 2) will return a 0d array. (And all other functions, this only has ufuncs right now!)
  • Reductions would return scalars if axis=None.
  • We may or may not need a (internal) helper arr, wrap = asarray_and_wrap(obj) that returns a function to do the "decaying".

(The tail of that is not super trivial, quite a few test failures, scipy also relies on scalars in a few places for example, but it looks harmless). But run that branch with:

NPY_NUMPY_2_BEHAVIOR=1

to try. A fun little caveat: np.matmul(vector, vector) should be a scalar so is a bit hackish still using the old behavior due to that.

@mhvk
Copy link
Contributor

mhvk commented Oct 8, 2023

This would still be nice to have, though I now wonder if numpy 2.0 isn't a good opportunity to just change it -- really, if one gives arrays as input, one should get arrays as output, even if the arrays happen to have shape ()... It will simplify quite a bit of code...

@InessaPawson InessaPawson removed the triaged Issue/PR that was discussed in a triage meeting label Apr 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified.
Projects
Status: Pending author's response
Development

Successfully merging this pull request may close these issues.

None yet

7 participants