Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Provide a way to disable flattening of 0d arrays to scalars #13105

Open
eric-wieser opened this issue Mar 11, 2019 · 38 comments
Open

ENH: Provide a way to disable flattening of 0d arrays to scalars #13105

eric-wieser opened this issue Mar 11, 2019 · 38 comments
Labels
23 - Wish List 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified.

Comments

@eric-wieser
Copy link
Member

eric-wieser commented Mar 11, 2019

If only given 0d inputs, even if they are of type ndarray, ufuncs will decay their output to a scalar via [()] (as noted in #4563, #5819). While we can't change this behavior now without creating signficant pain downstream, we could add a way to opt out of it.

#13100 raises a case where np.fix resorts to calling np.asanyarray(np.ceil(x, out=out)) in order to ensure that the result is an array. Unfortunately, this has a number of draw-backs:

  • It discards duck-types, like dask arrays
  • It changes the dtype of 0d object arrays containing array-likes

Proposed implementation:

  • Create a new np.leave_wrapped sentinel object that can be passed as an out argument
  • Add support in ufunc.__call__, np.take, ... for passing out=np.leave_wrapped meaning "do not call PyArray_Return", causing the result to never be a scalar
  • Expose PyArray_Return to python as np.core.unpack_scalar
  • Implement np.fix as:
def fix(x, out=None):
    if out is None:
        do_unwrap = True
        out = np.leave_wrapped
    else:
        do_unwrap = False
    res = nx.ceil(x, out=out)
    res = nx.floor(x, out=res, where=nx.greater_equal(x, 0, out=np.leave_wrapped))

    if do_unwrap:
        res = np.unpack_scalar(res)   # PyArray_Return
    return res

Original unpack_scalars=True proposal

  • Add a new unpack_scalars=True kwarg to ufunc.__call__, ufunc.reduce. When False, the current behavior of going through PyArray_Return is disabled. Alternative names:
    • decay=True
    • unpack_0d=True
    • unpack_0d_ndarray=True (PyArray_Return already does not apply to subclasses)
  • Add a new np.unpack_scalar(arr) function to expose PyArray_Return to python code. This would not be overloadable with __array_function__, since existing uses of PyArray_Return are also not.

With these changes, the current implementation of np.fix would change from:

def fix(x, out=None):
    res = nx.asanyarray(nx.ceil(x, out=out))
    res = nx.floor(x, out=res, where=nx.greater_equal(x, 0))

    if out is None and type(res) is nx.ndarray:
        res = res[()]
    return res

to

def fix(x, out=None, *, unpack_scalars=True):
    res = nx.ceil(x, out=out, unpack_scalars=False)
    res = nx.floor(x, out=res, where=nx.greater_equal(x, 0, unpack_scalars=False), unpack_scalars=False)

    if unpack_scalars and out is None:
        res = np.unpack_scalar(res)
    return res
If needed, I could promote this to an NEP
@rgommers
Copy link
Member

This is a pretty ugly workaround for an inconsistency in the current design. An alternative would be to simply make it a breaking change at some point. Possibly in combination with doing something about array scalars?

@eric-wieser
Copy link
Member Author

I agree it's a workaround, but it has some nice properties:

  • It provides a transition strategy - we could:
    • Introduce the argument with a default of True
    • Start warning if the argument is not passed explicitly
    • Change the default to False
  • It allows us to fix the problems with functions like np.fix on duck-types sooner, rather than waiting until we have a coherent design for how to handle scalars.

@seberg
Copy link
Member

seberg commented Mar 11, 2019

At least the unpack_scalars name sounds much nicer to me ;) (I think it maybe should be singular, but nitting).

I agree it is very ugly, but unless we think we can pull off some incompatible break which fixes this sooner rather than later it may be a practical path. If you really want to be complete we would be adding kwargs all over the python API (possibly even adding inconsistencies with the C-Api).

That said, we have to weigh of the gain and cost a bit I suppose. If numpy would be the only user, I am not sure it is worth it, since I have some doubt it will help us much with the breaking change if we add a kwarg to half of the numpy API.

@rgommers
Copy link
Member

since I have some doubt it will help us much with the breaking change if we add a kwarg to half of the numpy API.

more like, it makes it quite a bit harder to make a change later, since we then have keywords that we'd like to remove again but they now have users

If you really want to be complete we would be adding kwargs all over the python API (possibly even adding inconsistencies with the C-Api).

sounds like something to avoid ....

That said, we have to weigh of the gain and cost a bit I suppose

I think the gain for most end users is fairly minimal, however for libraries that implement array-like data structures or ndarray subclasses it would be very nice to fix this. A NEP to give more context here and discuss costs/benefits could be useful indeed.

@eric-wieser
Copy link
Member Author

If you really want to be complete we would be adding kwargs all over the python API

One possible alternative would be to instead spell it as out=np.leave_wrapped, which would reduce the number of argument lists that need changing.

@rgommers
Copy link
Member

It provides a transition strategy - we could:

that doesn't really work. in particular, warning if the argument is not passed in is not a good idea. you would be forcing every single user to specify this argument. that's way worse than a breaking change.

Start warning if the argument is not passed explicitly

or maybe you meant, only warn if the argument is not passed and a 0-d array is passed in? that's better but also tricky, because very few people will be testing with 0-d arrays in the first place.

One possible alternative would be to instead spell it as out=np.leave_wrapped, which would reduce the number of argument lists that need changing.

That would be preferred to adding new keywords everywhere probably. However, you then can't have both out=some_array and out=np.leave_wrapped. That's two not-often-used things combined, but could be an issue for duck arrays that need to pass on their own out keyword with out=out.

@seberg
Copy link
Member

seberg commented Mar 11, 2019

@rgommers if out is supplied, it is used. Thus, conversion to scalars does not happen anyway in that case (and things are well behaved in that sense). So, I do think that such an overloading could probably cover all cases, and is thus likely a better hack than a new kwarg everywhere (some functions might still need to have out added though!).

@charris
Copy link
Member

charris commented Mar 11, 2019

Might add this to the breaking changes doc: https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release

@mhvk
Copy link
Contributor

mhvk commented Mar 12, 2019

I'm not sure at all I like the new keyword. Perhaps also good to think why we may want it:

  1. Here, the scalar fed into out is immutable; CPython solves a similar problem by turning cases like a += 1 into a = a+1 if a is immutable. Might this be a solution? (Probably more of a breaking change, but putting it out there...)
  2. In other functions, we count on the output of ufuncs applied to arrays to also be arrays, and thus have attributes like .ndim, .shape, etc. This is partially solved by returning scalars which have these attributes (even though they don't really make sense for them), but it doesn't work for object arrays, where one just gets the object (see bug fix in BUG: ensure linspace works on object input. #13092).

Overall, I wonder if it is really such a breaking change to at least change the output to be an array scalar if all inputs were array scalars...

@seberg
Copy link
Member

seberg commented Mar 13, 2019

@mhvk, I might agree with preserving scalars that were given in (though there are rough edges around non-numpy scalars maybe), and I think that reductions with None should return scalars in a "minimal" scenario. Of course all of that is up for a lot of discussion, and the alternative would be that in the long run we want only arrays.
That, however, is at least more difficult in the sense that we have to add logic doing the actual preservation step (and in the case of reductions, we might even need something like Erics protocol suggestion, although possibly in a different form). Might be worth poking a bit though as an alternative. It still could (and in a few places certainly will) break code, although much less than removing this completely.

I used to like that idea, but becoming less certain nowadays. On the other hand we now have logic allowing non-numpy scalars to preserve themselves (using __array_ufunc__) if they wanted to now, so maybe it is getting less weird in that sense.

@seberg
Copy link
Member

seberg commented Mar 13, 2019

Sorry, addendum: I think Eric's idea of using a singleton to signal "please return arrays" should be able to replace all occurrences where out=scalar would help. I see two possibilities:

  1. You may not actually use out otherwise, so not really a point in providing a dummy scalar.
  2. You were already using out (and out must be an array), either there is no issue, or you had an issue in a previous step, in which case situation 1. occurred one step earlier.

@mhvk
Copy link
Contributor

mhvk commented Mar 13, 2019

@seberg - thanks for reminding of @eric-wieser's second suggestion, of using a special singleton for out to prevent unwrapping. That idea indeed might work, and feels reasonably logical in that you then use out either to provide storage, or to tell something about the type of output you want.

About the special value: currently, the documentation states:
"""
The ‘out’ keyword argument is expected to be a tuple with one entry per output (which can be None for arrays to be allocated by the ufunc)
"""
So, strictly following the documentation the special value for avoiding unpacking a scalar could in fact be None, i.e., we'd leave the absence of out (or np._NoValue) as the way to signal one wants actual downgrading to a scalar. I think for ufuncs this almost certainly would work well; for ufunc-like functions with out arguments, though, one would have to replace out=None with out=np._NoValue (and downstream libraries might have problems, to the extent of course that they actually care whether anything is a numpy scalar or array scalar.

p.s. More for completeness than preference: Other arguments that could be extended/re-used are order and subok.

@eric-wieser
Copy link
Member Author

That's a really good argument for using the out argument to indicate whether to unpack - it allows us to enable it per output for multi-output ufuncs.

@mhvk
Copy link
Contributor

mhvk commented Apr 16, 2019

The further discussion in #13100 reminded me of this one: it seems we are converging here on the idea of a special value for out to indicate one always wants an array back even if it is a scalar. The main question would seem to be what that value would be...

@eric-wieser
Copy link
Member Author

eric-wieser commented Apr 17, 2019

< moved to top-matter >

@mhvk
Copy link
Contributor

mhvk commented Apr 17, 2019

Yes, and I think it one could even make a decorator that does the start and end... (the where part doesn't need its leave_wrapped so with the decorator, the code itself would be very clean).

@eric-wieser
Copy link
Member Author

@mhvk: I thought about that, but I wasn't sure how to handle multiple input and/or output arguments

@eric-wieser
Copy link
Member Author

the where part doesn't need its leave_wrapped

It might for object arrays, and it's definitely more efficient with it.

@mhvk
Copy link
Contributor

mhvk commented Apr 17, 2019

I was thinking just about the case of a ufunc-like function that has a single out-argument. Indeed, perhaps we could have a @ufunc_like decorator for that...

@seberg seberg added the 60 - Major release Issues that need or may be better addressed in a major release label May 2, 2019
@mhvk
Copy link
Contributor

mhvk commented May 2, 2019

@seberg - I don't think the proposal here, of allowing out=leave_wrapped, needs a major release to happen - it is fully backward compatible. What would need a major release is to change the default meaning of out=None...

@seberg seberg removed the 60 - Major release Issues that need or may be better addressed in a major release label May 2, 2019
@seberg
Copy link
Member

seberg commented May 2, 2019

Woops, sorry, yeah, I went through the wiki and did not look careful enough that this issue discussed the workaround (in a sense) and not a change of behaviour.

@shoyer shoyer added the 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. label May 10, 2019
@seberg
Copy link
Member

seberg commented Sep 12, 2019

It is a bit horrifying (and there is a lot of duplicate work going on in ufuncs that should be cleaned up :(. But I can make it work for __call__ and outer(not reductions) with 5 lines of code changed. My first thing was to use .../Ellipsis as a sentinel, I don't fully hate that to be honest.
For reduction the problem is a bit less extreme probably, so maybe we should just do that? I doubt it makes the general cleanup seriously harder...

@seberg
Copy link
Member

seberg commented Sep 12, 2019

(The reason for looking at it was that the fix for gh-14326 seemed way too ugly once more)

@eric-wieser
Copy link
Member Author

I'm not sure I see a particularly good argument for using ... instead of a bespoke singleton.

@seberg
Copy link
Member

seberg commented Sep 12, 2019

I don't think there is, it was just lazyness. The only argument is that it fits to indexing in the sense that in indexing ... ensures an array/view return. But a singleton is probably better, already since it is more descriptive.

@eric-wieser
Copy link
Member Author

eric-wieser commented Sep 12, 2019

I remember there were some objectiosn to singletons regarding np.never_copy (@rgommers?) - but in a world where third party libraries are going to start receiving np.leave_wrapped in __array_ufunc__, they'll have a much better time if the repr and doc of that object tells them what they need to do, than they would if they just get ... without explanation

@seberg
Copy link
Member

seberg commented Sep 12, 2019

Yeah, it is somewhat (maybe too) convenient to write. For third parties overriding with __array_ufunc__ it might be very confusing to suddenly see Ellipsis crop up. I opened gh-14489, but it does not cover reductions at the moment (not sure that matters), and is written with Ellipsis.

@seberg
Copy link
Member

seberg commented Sep 12, 2019

Considering how simple this is (to a part because the arguments are parsed multiple times...), I would not even mind to use a private sentinel, just to use it internally.

@mhvk
Copy link
Contributor

mhvk commented Sep 12, 2019

Great to see an actual PR! On the sentinel, just to be sure it has been considered and rejected consciously, what about the idea of using an explicit None for "allocate array" (including 0d) and a new sentinel for keeping the behaviour of "allocate array but cast to scalar if 0d"? As I noted in #13105 (comment), this would make ufuncs consistent with the documentation.

@seberg
Copy link
Member

seberg commented Sep 12, 2019

@mhvk None is already accepted to be identical to not passing anything in, for multiple outputs you can also do (out_arr, None) to only use one out.

@mhvk
Copy link
Contributor

mhvk commented Sep 12, 2019

@seberg - I realize that one can do None already, but the docstring states that in that case an array is allocated. Which for scalar output is not actually the case. My suggestion was to fix the inconsistency between docstring and code by fixing the code - but allowing an option to still decay to scalar for those that might need it (which could even be the default if no explicit out is provided).

I guess the main question is what the hoped-for end-state is. If it is to (eventually) return array scalars for 0-dim output, then perhaps my suggestion makes that easier.

Obviously, though, this presumes that very little code will rely on the output being an actual scalar rather than an array scalar. I think this may well be OK, since making it an array scalar does not limit anything (but does allow new things, like using it as an out in further ufuncs).

Anyway, just a thought. Your PR that at least allows one to ask for array scalars to be returned is a big improvement already.

@seberg
Copy link
Member

seberg commented Sep 12, 2019

Fair point, although I am slightly hesitant to change behaviour to the documentation, and am not sure if I like None and no input to behave differently, simply because that is not the typical pattern in python.

@mhvk
Copy link
Contributor

mhvk commented Sep 12, 2019

Agreed. I guess options are:

  1. Make code consistent with docs and never unwrap scalars.
  2. As (1), but allow np.unwrap_scalar.
  3. As (2), and make it default if nothing passed in (like np._NoValue)
  4. Make docs consistent with code, but allow np.leave_wrapped.

@seberg
Copy link
Member

seberg commented Sep 12, 2019

Long term, the last option would be to preserve scalars if passed in. That is in parts weird, but especially in a world where you could already tack on __array_ufunc__ to the scalars to achieve the same, I personally still like it.

I hate the automatic unwrapping in general. I think it is just wrong and was tacked on in an attempt to speed things up, missing the point that scalars are only reasonably produced in very few places (such as reductions) where it is actually not unreasonable to return true scalars (not weird array scalars).

@mhvk
Copy link
Contributor

mhvk commented Sep 12, 2019

Good point about scalars to scalars - preserving those seems logical but then one definitely needs a leave_wrapped argument for things like np.fix.

Also agreed that the automatic unwrapping is just wrong (for normal calls at least), which is partially why I'm trying to think how we can get to a state where that doesn't happen...

@seberg
Copy link
Member

seberg commented Sep 12, 2019

It is a general issue, about what np.asarray does, for UFuncs, my plan would be to make the order:

  1. Discovery of dtype + dimension
  2. Dispatch to UFuncImpl (meaning all dtypes are specified and fixed)
  3. Do the actual array coercion and UFuncImpl execution.
  4. You could think about things like scalar inputs -> scalar outputs.

And about actually changing anything... Well, right now I do not see how to get around doing a compatibility breaking change (at which point I suppose the new behaviour here may force array outputs, even if we decide against that for all scalar inputs). Maybe we should do a "lets see what happens" release at some point by making an env-variable which is only on during a prerelease at best.

@eric-wieser
Copy link
Member Author

I would lean towards @mhvk's 4, since it gives us all the tools we need to remove internal hacks, without breaking "leaf" users (we will likely break __array_ufunc__ implementers in some places, but they're normally easy to find).

@mhvk
Copy link
Contributor

mhvk commented Oct 10, 2023

Belatedly reviving the option to just not convert to scalars ever as a proposal for numpy 2.0 in #24897.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
23 - Wish List 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified.
Projects
None yet
Development

No branches or pull requests

6 participants