Skip to content

Conversation

seberg
Copy link
Member

@seberg seberg commented Mar 24, 2025

As of now a WIP, because it would be nice to see what _methods.py. Happy if someone just picks it up also, thought I would show how trivial it is now.

The API change is to allow out=..., for multi-argument outputs it is currently out=(..., ...), although I guess we should allow short-handing that (nobody needs to pick it per argument).

I.e. all of these return arrays (and unlike the last time around it works with subclasses, although right now it passes ... as context so that may confuse them or __array_ufunc__ implementations.):

np.add(1, 2, out=...)
a = np.add.reduce(np.arange(10), out=...)

There is some discussion in gh-14489 about this. I still think ... is the best, because out=np.ndarray doesn't generalize for __array_ufunc__, and yes, I like ... because it also indicates an array return in indexing. (arr[i, ...] always returns an array.)


The ufunc part works (not sure if 100% complete), but the _methods.py changes do not (and change behavior subtlely, if possibly correctly). The actual change is trivial (after I had refactored this a while ago).

I think we should do this, but the fact that the _methods.py changes are not trivial makes me think there may be some idea missing.

It may be that the solution is for these functions to use conversion helper once at the start and then convert to scalars based on that at the end though (rather than dragging around subclasses).


CC @mdhaber you may be interested in this. We could probably just put this through (with a few new tests) without the _methods.py changes hashed out. (There must be a few other low-hanging places where this would simplify code, that are not as maze-like as _methods.py.)

@jorenham
Copy link
Member

I can type it, so I like it 👌🏻

@ngoldbaum
Copy link
Member

Thanks for working on this! I think having a way to opt in to this behavior is the first and necessary step to eventually getting rid of scalars. Worth experimenting with for sure.

@seberg
Copy link
Member Author

seberg commented Mar 24, 2025

Thanks for working on this! I think having a way to opt in to this behavior is the first and necessary step to eventually getting rid of scalars.

Well, I have all the code to ensure that 0-D arrays in returns 0-D arrays (and reductions return scalars for axis=None).
The problem with that isn't making it work with NumPy. The main problem by now is seeing what it means for downstream, because they would start returning arrays (if they use np.asarray() even where a NumPy ufunc would not).

@seberg
Copy link
Member Author

seberg commented Mar 24, 2025

But yes, I agree... No matter how we evolve, having out=... for now won't stand in the way and help transitioning (as well as being immediately helpful).

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see this revived! Some transcription errors, though! (see inline). Also, I'd tend to use ... everywhere it is allowed instead of Ellipsis.

I guess a more general question is whether it makes sense to have a special name for this (lust like np.new_axis is None), maybe np.keep_as_array. Although, while None has no obvious connection to adding an axis, ... has precedent for indexing (as you note), so on balance it seems fine as is. Obviously, do need to add documentation as well.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fine to start with the simple case. For documentation, I guess the main place is _add_newdocs.py for the various ufunc cases (starting around l.4900). And then the tests of course...

if (_set_full_args_out(1, out_obj, &full_args) < 0) {
if (out_obj == Py_Ellipsis) {
out_obj = NULL;
return_scalar = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you use NPY_FALSE below, maybe here too?

@@ -3487,6 +3487,7 @@ PyUFunc_GenericReduction(PyUFuncObject *ufunc,
*/
PyObject *otype_obj = NULL, *out_obj = NULL, *indices_obj = NULL;
PyObject *keepdims_obj = NULL, *wheremask_obj = NULL;
int return_scalar = 1; /* scalar return is disabled for out=... */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here also use NPY_TRUE, like below?

@seberg seberg marked this pull request as ready for review March 25, 2025 20:40
@seberg
Copy link
Member Author

seberg commented Mar 25, 2025

Going to mark this ready for review. It feels a bit light on documentation, so we can merge, but we may want to search if there is some other place to document this (or maybe broadcast it out through some other channels...).

@mdhaber
Copy link
Contributor

mdhaber commented Mar 26, 2025

Thanks @seberg. Happy to see movement in this direction!

seberg added 7 commits March 26, 2025 12:06
As of now a WIP, because it would be nice to see what `_methods.py`.

The API change is to allow `out=...`, for multi-argument outputs it
is currently `out=(..., ...)`, although I guess we should allow
short-handing that (nobody needs to pick it per argument).

There is some discussion in numpygh-14489 about this.  I still think `...`
is the best, because `out=np.ndarray` doesn't generalize for
`__array_ufunc__`, and yes, I like `...` because it also indicates an
array return in indexing.  (`arr[i, ...]` always returns an array.)

---

The ufunc part works (not sure if 100% complete), but the `_methods.py`
changes do not (and change behavior subtlely, if possibly correctly).
The actual change is trivial (after I had refactored this a while ago).

I think we should do this, but the fact that the `_methods.py` changes
are not trivial makes me think there may be some idea missing.

It may be that the solution is for these functions to use conversion
helper once at the start and then convert to scalars based on that at the
end though (rather than dragging around subclasses).
This also means we don't pass it out as a context, which is probably
safer.  (it is passed as part of `return_scalar=True` already)
Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
(in a sense, I care less about subclasses/arrays because I think
we jus tshouldn't return scalars there, but..._
@mhvk mhvk force-pushed the force-array-ufunc branch from 070b9dc to a717674 Compare March 26, 2025 16:06
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg - this looks all good, but I thought maybe the doc changes could be a bit better. I mention these inline, but after doing so, I felt that this was just as likely to lead to more delay. So, instead, I force-pushed the simpler change to your branch, and will open another one with a bit of docs follow-up. Hope that makes sense! With that, this one is all OK as far as I'm concerned.

Alternate array object(s) in which to put the result; if provided, it
must have a shape that the inputs broadcast to. A tuple of arrays
(possible only as a keyword argument) must have length equal to the
number of outputs; use None for uninitialized outputs to be
allocated by the ufunc.
If ``out=...`` is passed, the output is guaranteed not to be an array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete "not" after guaranteed!

Alternatively, maybe rewrite a little more so that the order is followed and that the text becomes more similar between the various methods:

        Location(s) into which the result(s) are stored.
        If not provided or None, new array(s) are created by the ufunc.
        If passed as a keyword argument, can be Ellipses (``out=...``) to
        ensure an array is returned even if the result is 0-dimensional,
        or a tuple with length equal to the number of outputs (where None
        can be used for allocation by the ufunc).

A location into which the result is stored. If not provided or None,
a freshly-allocated array is returned. For consistency with
``ufunc.__call__``, if given as a keyword, this may be wrapped in a
1-element tuple.
If ``out=...`` is passed, the output is guaranteed not to be an array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either minimal change as above, or

        Location into which the result is stored.
        If not provided or None, a freshly-allocated array is returned.
        If passed as a keyword argument, can be Ellipses (``out=...``) to
        ensure an array is returned even if the result is 0-dimensional
        (which is useful especially for object dtype), or a 1-element tuple
        (latter for consistency with ``ufunc.__call__``).

@@ -5276,6 +5290,7 @@
a freshly-allocated array is returned. For consistency with
``ufunc.__call__``, if given as a keyword, this may be wrapped in a
1-element tuple.
(``out=...`` is supported, but results are never scalar).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the above, maybe change to

        Location into which the result is stored.
        If not provided or None, a freshly-allocated array is returned.
        For consistency with ``ufunc.__call__``, if passed as a keyword
        argument, can be Ellipses (``out=...``, which has the same effect
        as None as an array is always returned), or a 1-element tuple.

@@ -5357,6 +5372,7 @@
a freshly-allocated array is returned. For consistency with
``ufunc.__call__``, if given as a keyword, this may be wrapped in a
1-element tuple.
(``out=...`` is supported, but results are never scalar).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And perhaps the same as last one here.

@seberg
Copy link
Member Author

seberg commented Mar 26, 2025

So, instead, I force-pushed the simpler change to your branch

Thanks! Nothing nicer than seeing pushes to your branch (in my book) :). I'll wait for that and bring it up in the meeting tonight, but maybe we can put it in very soon then.

@mhvk
Copy link
Contributor

mhvk commented Mar 26, 2025

@seberg - my follow-up PR is at seberg#51 - this is to your branch just so can look at it. If you think it is uncontroversial, do feel free to include it in your PR. But happy to do it as follow-up too.

@ngoldbaum
Copy link
Member

I think this feature probably deserves at least a paragraph somewhere on this page: https://numpy.org/doc/stable/user/basics.ufuncs.html

@seberg
Copy link
Member Author

seberg commented Mar 26, 2025

I think this feature probably deserves at least a paragraph somewhere on this page:

Yes, Marten added that in his PR to mine, just need to manage to merge it (the github API doesn't let me merge it into my branch though, right now...)

EDIT: Went through now, must have just been a temporary/current slowness in github. (I might do a pass, but I think what Marten wrote is also good as is.)

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some doc suggestions related to ... being visually (albeit not syntactically) ambiguous, particularly in a terminal without rst rendering.

Overall looks great, this is a neat new feature that I'm sure array API implementers might be interested in.

Did this get a mailing list ping? It probably deserves one.

Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
@seberg
Copy link
Member Author

seberg commented Mar 26, 2025

I'm sure array API implementers might be interested in.

Maybe interesting, but not obviously/immediately useful, I think. For that you would have to make that namespace actually use it, which is possible but not as long as it is the identical namespace.
(There is also always the worry that e.g. SciPy might actually want to return scalars to look like NumPy in some places... Although, my opinion there is that SciPy should maybe lead and avoid scalars, not NumPy)

@lucascolley
Copy link
Contributor

lucascolley commented Mar 26, 2025

I'm sure array API implementers might be interested in.

👍!

Maybe interesting, but not obviously/immediately useful, I think. For that you would have to make that namespace actually use it, which is possible but not as long as it is the identical namespace.

Relatively immediate I think. In SciPy and scikit-learn we aren't using the numpy namespace proper, everything goes through array_api_compat.numpy for code that has been converted to support the standard. So I think it would just need a PR to array-api-compat to map array_api_compat.numpy.multiply to numpy.multiply(..., out=...) etc.

There is also always the worry that e.g. SciPy might actually want to return scalars to look like NumPy in some places...

Yes, but we already have the pattern return x[()] if x.ndim == 0 else x about 50 times for places where we want to return scalars for this reason but may not end up using a ufunc (or things otherwise end up coerced back to 0D arrays). So I don't think this PR will make things tangibly worse, apart from maybe needing to use that pattern in some extra places. Our test suite should be ready for this, we have assertions which forbid 0D NumPy arrays at https://github.com/scipy/scipy/blob/main/scipy/_lib/_array_api_no_0d.py.

my opinion there is that SciPy should maybe lead and avoid scalars, not NumPy

We already do in some parts of SciPy, like special.factorial and ndimage (see https://github.com/scipy/scipy/blob/0f1fd4a7268b813fa2b844ca6038e4dfdf90084a/scipy/ndimage/_support_alternative_backends.py#L39-L41). But the general vibe I get from scipy/scipy#18768 is that it seems very difficult to make a full transition in SciPy without NumPy shifting on things like

np.array(0) * 2     # scalar, not 0d array
- np.array(0)       # scalar, not 0d-array

@seberg
Copy link
Member Author

seberg commented Mar 27, 2025

Right, if you do your compat namespace it is useful. I am fine with you doing, but it does basically cement that you cannot plan to ever get away from that.
To be fair, I don't mind that, I just feel if that is so, then maybe NumPy should export the namespace like that and drop the pretense that it is identical.

Or... I suppose maybe just live with the fact that there are two slightly different things that tend to be called by the same name.

@lucascolley
Copy link
Contributor

you cannot plan to ever get away from that.

But you said on the mailing list:

this doesn't preclude a general transition towards returning
scalars less aggressively (or never)

I understand that won't happen any time soon, but NumPy 3.0 perhaps??

@lucascolley
Copy link
Contributor

Relatively immediate I think

(Modulo difficulties with supporting this for older np versions)

Comment on lines +10 to +11
This spelling is borrowed from ``arr1d[0, ...]`` where the ``...``
also ensures a non-scalar return.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could also mention that arr0d[...] is a valid lvalue as in arr0d[...] = 3.14

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I don't really understand why arr[...] assignment is relevant in this context? (Besides any excuse for telling folks about ... is a good excuse ;).)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, my comment is not relevant.

From the observation that arr0d[()] is scalar and arr0d[...] is a 0D array I wrongly assumed that the first is not a valid lvalue...

But I just realized that arr0d.__setitem__((), 3.14) and arr0d.__setitem__(Ellipsis, 3.14) are prefectly equivalent, so please forget my comment.

@seberg
Copy link
Member Author

seberg commented Apr 1, 2025

Since there were no comments on the mailing list, and I don't think it's a huge change (with an approval), I'll plan to merge this in a few days unless there are comments.

@lucascolley
Copy link
Contributor

do you have any further thoughts on #28576 (comment) ? Of course not blocking for this PR.

@seberg
Copy link
Member Author

seberg commented Apr 1, 2025

This summarizes most of it: #28576 (comment), I have repeated my opinions on this a few times recently...

I understand that won't happen any time soon, but NumPy 3.0 perhaps??

Maybe, but I am personally not willing to look into getting rid of scalars entirely (just explaining my opinion here over and over is enough ;)).
I am simply not fully convinced it is the right/feasible:

  1. The pain we feel doesn't come from scalars (yes it causes implementation pain). It origiates from converting 0-D arrays to scalars so they sneaking up like ninjas.
  2. Scalars do have some advantages:
    • They are not containers (and shouldn't pretend to be). If they are, they may be very different containers (e.g. strings).
    • Scalars are e.g. immutable.
    • speed
  3. I am not convinced that the transition wouldn't be utterly horrible and devastating until someone tries around a bit, and I won't be the one to do it. For example, I could imagine that pandas would require a lot of work if you were to do it.

So, the thing I have already is a branch that stop converting to 0-D arrays while returning scalars where it makes sense to do so. (See other issue.)
I can also see DTypes deciding they don't have a scalar (effectively), or even a context manager to enable this or enable my other branch optionally if things are hard...
The problem with that branch isn't actually so much NumPy, it is more figuring out what SciPy needs to do to align itself (in whatever way it wants), or what we need to provide to allow that.

Apparently, I haven't yet pointed to there from this particular PR, but this discussion should be in: #24897 or similar issues.

@seberg
Copy link
Member Author

seberg commented Apr 2, 2025

OK, thanks everyone, I'll put this in also because I suspect it is relevant to gh-28624.

@seberg seberg merged commit 72e1458 into numpy:main Apr 2, 2025
72 checks passed
@seberg seberg deleted the force-array-ufunc branch April 2, 2025 12:20
MaanasArora pushed a commit to MaanasArora/numpy that referenced this pull request Apr 11, 2025
* API,ENH: Allow forcing an array result in ufuncs

This allows using `out=...` for ufuncs (including reductions) to enforce an array result.
With it, the result will never be converted to a scalar, but subclasses are preserved
normally.
(A subclass might still misbehave and convert to a scalar anyway, but it would be a bug there.)

There is some discussion in numpygh-14489 about this.  I still think `...`
is the best, because `out=np.ndarray` doesn't generalize for
`__array_ufunc__`, and yes, I like `...` because it also indicates an
array return in indexing.  (`arr[i, ...]` always returns an array.)

It uses this in a few places, although there are many more, e.g. `_methods.py`
should probably use it, but it is very subtle there.
(Also because some paths are probably subtly wrong for object dtype.)

---------

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
Co-authored-by: Marten H. van Kerkwijk <mhvk@astro.utoronto.ca>
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants