ENH: Provide a way to disable flattening of 0d arrays to scalars #13105

eric-wieser · 2019-03-11T01:48:36Z

If only given 0d inputs, even if they are of type ndarray, ufuncs will decay their output to a scalar via [()] (as noted in #4563, #5819). While we can't change this behavior now without creating signficant pain downstream, we could add a way to opt out of it.

#13100 raises a case where np.fix resorts to calling np.asanyarray(np.ceil(x, out=out)) in order to ensure that the result is an array. Unfortunately, this has a number of draw-backs:

It discards duck-types, like dask arrays
It changes the dtype of 0d object arrays containing array-likes

Proposed implementation:

Create a new np.leave_wrapped sentinel object that can be passed as an out argument
Add support in ufunc.__call__, np.take, ... for passing out=np.leave_wrapped meaning "do not call PyArray_Return", causing the result to never be a scalar
Expose PyArray_Return to python as np.core.unpack_scalar
Implement np.fix as:

def fix(x, out=None):
    if out is None:
        do_unwrap = True
        out = np.leave_wrapped
    else:
        do_unwrap = False
    res = nx.ceil(x, out=out)
    res = nx.floor(x, out=res, where=nx.greater_equal(x, 0, out=np.leave_wrapped))

    if do_unwrap:
        res = np.unpack_scalar(res)   # PyArray_Return
    return res

Original unpack_scalars=True proposal

Add a new unpack_scalars=True kwarg to ufunc.__call__, ufunc.reduce. When False, the current behavior of going through PyArray_Return is disabled. Alternative names:
- decay=True
- unpack_0d=True
- unpack_0d_ndarray=True (PyArray_Return already does not apply to subclasses)
Add a new np.unpack_scalar(arr) function to expose PyArray_Return to python code. This would not be overloadable with __array_function__, since existing uses of PyArray_Return are also not.

With these changes, the current implementation of np.fix would change from:

def fix(x, out=None):
    res = nx.asanyarray(nx.ceil(x, out=out))
    res = nx.floor(x, out=res, where=nx.greater_equal(x, 0))

    if out is None and type(res) is nx.ndarray:
        res = res[()]
    return res

to

def fix(x, out=None, *, unpack_scalars=True):
    res = nx.ceil(x, out=out, unpack_scalars=False)
    res = nx.floor(x, out=res, where=nx.greater_equal(x, 0, unpack_scalars=False), unpack_scalars=False)

    if unpack_scalars and out is None:
        res = np.unpack_scalar(res)
    return res

If needed, I could promote this to an NEP

The text was updated successfully, but these errors were encountered:

rgommers · 2019-03-11T02:22:01Z

This is a pretty ugly workaround for an inconsistency in the current design. An alternative would be to simply make it a breaking change at some point. Possibly in combination with doing something about array scalars?

eric-wieser · 2019-03-11T02:40:25Z

I agree it's a workaround, but it has some nice properties:

It provides a transition strategy - we could:
- Introduce the argument with a default of True
- Start warning if the argument is not passed explicitly
- Change the default to False
It allows us to fix the problems with functions like np.fix on duck-types sooner, rather than waiting until we have a coherent design for how to handle scalars.

seberg · 2019-03-11T09:11:01Z

At least the unpack_scalars name sounds much nicer to me ;) (I think it maybe should be singular, but nitting).

I agree it is very ugly, but unless we think we can pull off some incompatible break which fixes this sooner rather than later it may be a practical path. If you really want to be complete we would be adding kwargs all over the python API (possibly even adding inconsistencies with the C-Api).

That said, we have to weigh of the gain and cost a bit I suppose. If numpy would be the only user, I am not sure it is worth it, since I have some doubt it will help us much with the breaking change if we add a kwarg to half of the numpy API.

rgommers · 2019-03-11T16:03:08Z

since I have some doubt it will help us much with the breaking change if we add a kwarg to half of the numpy API.

more like, it makes it quite a bit harder to make a change later, since we then have keywords that we'd like to remove again but they now have users

If you really want to be complete we would be adding kwargs all over the python API (possibly even adding inconsistencies with the C-Api).

sounds like something to avoid ....

That said, we have to weigh of the gain and cost a bit I suppose

I think the gain for most end users is fairly minimal, however for libraries that implement array-like data structures or ndarray subclasses it would be very nice to fix this. A NEP to give more context here and discuss costs/benefits could be useful indeed.

eric-wieser · 2019-03-11T16:34:18Z

If you really want to be complete we would be adding kwargs all over the python API

One possible alternative would be to instead spell it as out=np.leave_wrapped, which would reduce the number of argument lists that need changing.

rgommers · 2019-03-11T17:28:54Z

It provides a transition strategy - we could:

that doesn't really work. in particular, warning if the argument is not passed in is not a good idea. you would be forcing every single user to specify this argument. that's way worse than a breaking change.

Start warning if the argument is not passed explicitly

or maybe you meant, only warn if the argument is not passed and a 0-d array is passed in? that's better but also tricky, because very few people will be testing with 0-d arrays in the first place.

One possible alternative would be to instead spell it as out=np.leave_wrapped, which would reduce the number of argument lists that need changing.

That would be preferred to adding new keywords everywhere probably. However, you then can't have both out=some_array and out=np.leave_wrapped. That's two not-often-used things combined, but could be an issue for duck arrays that need to pass on their own out keyword with out=out.

seberg · 2019-03-11T22:38:02Z

@rgommers if out is supplied, it is used. Thus, conversion to scalars does not happen anyway in that case (and things are well behaved in that sense). So, I do think that such an overloading could probably cover all cases, and is thus likely a better hack than a new kwarg everywhere (some functions might still need to have out added though!).

charris · 2019-03-11T22:38:51Z

Might add this to the breaking changes doc: https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release

mhvk · 2019-03-12T21:45:53Z

I'm not sure at all I like the new keyword. Perhaps also good to think why we may want it:

Here, the scalar fed into out is immutable; CPython solves a similar problem by turning cases like a += 1 into a = a+1 if a is immutable. Might this be a solution? (Probably more of a breaking change, but putting it out there...)
In other functions, we count on the output of ufuncs applied to arrays to also be arrays, and thus have attributes like .ndim, .shape, etc. This is partially solved by returning scalars which have these attributes (even though they don't really make sense for them), but it doesn't work for object arrays, where one just gets the object (see bug fix in BUG: ensure linspace works on object input. #13092).

Overall, I wonder if it is really such a breaking change to at least change the output to be an array scalar if all inputs were array scalars...

seberg · 2019-03-13T08:31:54Z

@mhvk, I might agree with preserving scalars that were given in (though there are rough edges around non-numpy scalars maybe), and I think that reductions with None should return scalars in a "minimal" scenario. Of course all of that is up for a lot of discussion, and the alternative would be that in the long run we want only arrays.
That, however, is at least more difficult in the sense that we have to add logic doing the actual preservation step (and in the case of reductions, we might even need something like Erics protocol suggestion, although possibly in a different form). Might be worth poking a bit though as an alternative. It still could (and in a few places certainly will) break code, although much less than removing this completely.

I used to like that idea, but becoming less certain nowadays. On the other hand we now have logic allowing non-numpy scalars to preserve themselves (using __array_ufunc__) if they wanted to now, so maybe it is getting less weird in that sense.

seberg · 2019-03-13T08:34:39Z

Sorry, addendum: I think Eric's idea of using a singleton to signal "please return arrays" should be able to replace all occurrences where out=scalar would help. I see two possibilities:

You may not actually use out otherwise, so not really a point in providing a dummy scalar.
You were already using out (and out must be an array), either there is no issue, or you had an issue in a previous step, in which case situation 1. occurred one step earlier.

mhvk · 2019-03-13T14:01:27Z

@seberg - thanks for reminding of @eric-wieser's second suggestion, of using a special singleton for out to prevent unwrapping. That idea indeed might work, and feels reasonably logical in that you then use out either to provide storage, or to tell something about the type of output you want.

About the special value: currently, the documentation states:
"""
The ‘out’ keyword argument is expected to be a tuple with one entry per output (which can be None for arrays to be allocated by the ufunc)
"""
So, strictly following the documentation the special value for avoiding unpacking a scalar could in fact be None, i.e., we'd leave the absence of out (or np._NoValue) as the way to signal one wants actual downgrading to a scalar. I think for ufuncs this almost certainly would work well; for ufunc-like functions with out arguments, though, one would have to replace out=None with out=np._NoValue (and downstream libraries might have problems, to the extent of course that they actually care whether anything is a numpy scalar or array scalar.

p.s. More for completeness than preference: Other arguments that could be extended/re-used are order and subok.

eric-wieser · 2019-03-13T15:21:00Z

That's a really good argument for using the out argument to indicate whether to unpack - it allows us to enable it per output for multi-output ufuncs.

mhvk · 2019-04-16T19:50:10Z

The further discussion in #13100 reminded me of this one: it seems we are converging here on the idea of a special value for out to indicate one always wants an array back even if it is a scalar. The main question would seem to be what that value would be...

eric-wieser · 2019-04-17T01:36:52Z

< moved to top-matter >

mhvk · 2019-04-17T18:58:47Z

Yes, and I think it one could even make a decorator that does the start and end... (the where part doesn't need its leave_wrapped so with the decorator, the code itself would be very clean).

eric-wieser · 2019-04-17T19:39:18Z

@mhvk: I thought about that, but I wasn't sure how to handle multiple input and/or output arguments

eric-wieser · 2019-04-17T19:42:07Z

the where part doesn't need its leave_wrapped

It might for object arrays, and it's definitely more efficient with it.

mhvk · 2019-04-17T19:56:50Z

I was thinking just about the case of a ufunc-like function that has a single out-argument. Indeed, perhaps we could have a @ufunc_like decorator for that...

mhvk · 2019-05-02T09:34:42Z

@seberg - I don't think the proposal here, of allowing out=leave_wrapped, needs a major release to happen - it is fully backward compatible. What would need a major release is to change the default meaning of out=None...

seberg · 2019-05-02T15:17:32Z

Woops, sorry, yeah, I went through the wiki and did not look careful enough that this issue discussed the workaround (in a sense) and not a change of behaviour.

seberg · 2019-09-12T04:47:20Z

It is a bit horrifying (and there is a lot of duplicate work going on in ufuncs that should be cleaned up :(. But I can make it work for __call__ and outer(not reductions) with 5 lines of code changed. My first thing was to use .../Ellipsis as a sentinel, I don't fully hate that to be honest.
For reduction the problem is a bit less extreme probably, so maybe we should just do that? I doubt it makes the general cleanup seriously harder...

seberg · 2019-09-12T04:47:44Z

(The reason for looking at it was that the fix for gh-14326 seemed way too ugly once more)

eric-wieser · 2019-09-12T05:03:46Z

I'm not sure I see a particularly good argument for using ... instead of a bespoke singleton.

seberg · 2019-09-12T05:07:29Z

I don't think there is, it was just lazyness. The only argument is that it fits to indexing in the sense that in indexing ... ensures an array/view return. But a singleton is probably better, already since it is more descriptive.

eric-wieser · 2019-09-12T05:16:28Z

I remember there were some objectiosn to singletons regarding np.never_copy (@rgommers?) - but in a world where third party libraries are going to start receiving np.leave_wrapped in __array_ufunc__, they'll have a much better time if the repr and doc of that object tells them what they need to do, than they would if they just get ... without explanation

seberg · 2019-09-12T05:31:04Z

Yeah, it is somewhat (maybe too) convenient to write. For third parties overriding with __array_ufunc__ it might be very confusing to suddenly see Ellipsis crop up. I opened gh-14489, but it does not cover reductions at the moment (not sure that matters), and is written with Ellipsis.

seberg · 2019-09-12T05:33:04Z

Considering how simple this is (to a part because the arguments are parsed multiple times...), I would not even mind to use a private sentinel, just to use it internally.

mhvk · 2019-09-12T13:16:05Z

Great to see an actual PR! On the sentinel, just to be sure it has been considered and rejected consciously, what about the idea of using an explicit None for "allocate array" (including 0d) and a new sentinel for keeping the behaviour of "allocate array but cast to scalar if 0d"? As I noted in #13105 (comment), this would make ufuncs consistent with the documentation.

seberg · 2019-09-12T15:33:07Z

@mhvk None is already accepted to be identical to not passing anything in, for multiple outputs you can also do (out_arr, None) to only use one out.

mhvk · 2019-09-12T16:11:29Z

@seberg - I realize that one can do None already, but the docstring states that in that case an array is allocated. Which for scalar output is not actually the case. My suggestion was to fix the inconsistency between docstring and code by fixing the code - but allowing an option to still decay to scalar for those that might need it (which could even be the default if no explicit out is provided).

I guess the main question is what the hoped-for end-state is. If it is to (eventually) return array scalars for 0-dim output, then perhaps my suggestion makes that easier.

Obviously, though, this presumes that very little code will rely on the output being an actual scalar rather than an array scalar. I think this may well be OK, since making it an array scalar does not limit anything (but does allow new things, like using it as an out in further ufuncs).

Anyway, just a thought. Your PR that at least allows one to ask for array scalars to be returned is a big improvement already.

seberg · 2019-09-12T16:36:32Z

Fair point, although I am slightly hesitant to change behaviour to the documentation, and am not sure if I like None and no input to behave differently, simply because that is not the typical pattern in python.

mhvk · 2019-09-12T16:56:18Z

Agreed. I guess options are:

Make code consistent with docs and never unwrap scalars.
As (1), but allow np.unwrap_scalar.
As (2), and make it default if nothing passed in (like np._NoValue)
Make docs consistent with code, but allow np.leave_wrapped.

seberg · 2019-09-12T17:04:55Z

Long term, the last option would be to preserve scalars if passed in. That is in parts weird, but especially in a world where you could already tack on __array_ufunc__ to the scalars to achieve the same, I personally still like it.

I hate the automatic unwrapping in general. I think it is just wrong and was tacked on in an attempt to speed things up, missing the point that scalars are only reasonably produced in very few places (such as reductions) where it is actually not unreasonable to return true scalars (not weird array scalars).

mhvk · 2019-09-12T17:30:43Z

Good point about scalars to scalars - preserving those seems logical but then one definitely needs a leave_wrapped argument for things like np.fix.

Also agreed that the automatic unwrapping is just wrong (for normal calls at least), which is partially why I'm trying to think how we can get to a state where that doesn't happen...

seberg · 2019-09-12T17:56:23Z

It is a general issue, about what np.asarray does, for UFuncs, my plan would be to make the order:

Discovery of dtype + dimension
Dispatch to UFuncImpl (meaning all dtypes are specified and fixed)
Do the actual array coercion and UFuncImpl execution.
You could think about things like scalar inputs -> scalar outputs.

And about actually changing anything... Well, right now I do not see how to get around doing a compatibility breaking change (at which point I suppose the new behaviour here may force array outputs, even if we decide against that for all scalar inputs). Maybe we should do a "lets see what happens" release at some point by making an env-variable which is only on during a prerelease at best.

eric-wieser · 2019-09-12T18:38:56Z

I would lean towards @mhvk's 4, since it gives us all the tools we need to remove internal hacks, without breaking "leaf" users (we will likely break __array_ufunc__ implementers in some places, but they're normally easy to find).

mhvk · 2023-10-10T14:13:58Z

Belatedly reviving the option to just not convert to scalars ever as a proposal for numpy 2.0 in #24897.

eric-wieser added the 23 - Wish List label Mar 11, 2019

eric-wieser mentioned this issue Mar 11, 2019

ENH: np.fix as ufunc #13100

Open

eric-wieser mentioned this issue Apr 15, 2019

np.average crashes for 1D decimal object array (now only without weights) #8696

Open

seberg added the 60 - Major release Issues that need or may be better addressed in a major release label May 2, 2019

seberg removed the 60 - Major release Issues that need or may be better addressed in a major release label May 2, 2019

shoyer added the 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. label May 10, 2019

eric-wieser mentioned this issue May 11, 2019

BUG: Fix np.ma.take on un-indexable scalars #9353

Closed

seberg mentioned this issue Sep 12, 2019

ENH: Allow no conversion to scalar guarantee in ufunc and ufunc.outer #14489

Open

mhvk mentioned this issue Oct 15, 2019

Allow longdouble values through input validation astropy/astropy#9368

Merged

eric-wieser mentioned this issue Nov 10, 2019

Numpy functions return a scalar when the input is a 0-dimensional array #6051

Open

seberg mentioned this issue Dec 24, 2019

ENH: Remove PyArray_Return logic converting 0-D arrays to scalars indiscriminately #15165

Closed

This was referenced Feb 17, 2020

Some functions raise an unhelpful error when applied to object array #15587

Open

nanmean with masked array gives readonly error only with floats #12283

Open

eric-wieser mentioned this issue Apr 24, 2020

WIP,ENH: Allow copy=np.never_copy for array creation and reshape #11897

Closed

5 tasks

eric-wieser mentioned this issue May 18, 2020

MAINT: Clean up the implementation of quantile #16284

Merged

eric-wieser mentioned this issue Oct 23, 2020

Inconsistent failure to do numeric operations on object-typed masked arrays #17612

Open

steppi mentioned this issue Jun 27, 2023

RFC: Ufunc return type for zero dimensional array input. scipy/scipy#18768

Open

mhvk mentioned this issue Oct 10, 2023

ENH: No longer auto-convert array scalars to numpy scalars in ufuncs (and elsewhere?) #24897

Open

ENH: Provide a way to disable flattening of 0d arrays to scalars #13105

ENH: Provide a way to disable flattening of 0d arrays to scalars #13105

Comments

eric-wieser commented Mar 11, 2019 • edited Loading

rgommers commented Mar 11, 2019

eric-wieser commented Mar 11, 2019

seberg commented Mar 11, 2019

rgommers commented Mar 11, 2019

eric-wieser commented Mar 11, 2019

rgommers commented Mar 11, 2019

seberg commented Mar 11, 2019

charris commented Mar 11, 2019

mhvk commented Mar 12, 2019

seberg commented Mar 13, 2019

seberg commented Mar 13, 2019

mhvk commented Mar 13, 2019

eric-wieser commented Mar 13, 2019

mhvk commented Apr 16, 2019

eric-wieser commented Apr 17, 2019 • edited Loading

mhvk commented Apr 17, 2019

eric-wieser commented Apr 17, 2019

eric-wieser commented Apr 17, 2019

mhvk commented Apr 17, 2019

mhvk commented May 2, 2019

seberg commented May 2, 2019

seberg commented Sep 12, 2019

seberg commented Sep 12, 2019

eric-wieser commented Sep 12, 2019

seberg commented Sep 12, 2019

eric-wieser commented Sep 12, 2019 • edited Loading

seberg commented Sep 12, 2019

seberg commented Sep 12, 2019

mhvk commented Sep 12, 2019

seberg commented Sep 12, 2019

mhvk commented Sep 12, 2019

seberg commented Sep 12, 2019

mhvk commented Sep 12, 2019

seberg commented Sep 12, 2019

mhvk commented Sep 12, 2019

seberg commented Sep 12, 2019

eric-wieser commented Sep 12, 2019

mhvk commented Oct 10, 2023

eric-wieser commented Mar 11, 2019 •

edited

Loading

eric-wieser commented Apr 17, 2019 •

edited

Loading

eric-wieser commented Sep 12, 2019 •

edited

Loading