-
-
Notifications
You must be signed in to change notification settings - Fork 11.4k
API,ENH: Allow forcing an array result in ufuncs #28576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I can type it, so I like it 👌🏻 |
Thanks for working on this! I think having a way to opt in to this behavior is the first and necessary step to eventually getting rid of scalars. Worth experimenting with for sure. |
Well, I have all the code to ensure that 0-D arrays in returns 0-D arrays (and reductions return scalars for |
But yes, I agree... No matter how we evolve, having |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to see this revived! Some transcription errors, though! (see inline). Also, I'd tend to use ...
everywhere it is allowed instead of Ellipsis
.
I guess a more general question is whether it makes sense to have a special name for this (lust like np.new_axis is None
), maybe np.keep_as_array
. Although, while None
has no obvious connection to adding an axis, ...
has precedent for indexing (as you note), so on balance it seems fine as is. Obviously, do need to add documentation as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, fine to start with the simple case. For documentation, I guess the main place is _add_newdocs.py
for the various ufunc
cases (starting around l.4900). And then the tests of course...
numpy/_core/src/umath/ufunc_object.c
Outdated
if (_set_full_args_out(1, out_obj, &full_args) < 0) { | ||
if (out_obj == Py_Ellipsis) { | ||
out_obj = NULL; | ||
return_scalar = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you use NPY_FALSE
below, maybe here too?
numpy/_core/src/umath/ufunc_object.c
Outdated
@@ -3487,6 +3487,7 @@ PyUFunc_GenericReduction(PyUFuncObject *ufunc, | |||
*/ | |||
PyObject *otype_obj = NULL, *out_obj = NULL, *indices_obj = NULL; | |||
PyObject *keepdims_obj = NULL, *wheremask_obj = NULL; | |||
int return_scalar = 1; /* scalar return is disabled for out=... */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here also use NPY_TRUE
, like below?
Going to mark this ready for review. It feels a bit light on documentation, so we can merge, but we may want to search if there is some other place to document this (or maybe broadcast it out through some other channels...). |
Thanks @seberg. Happy to see movement in this direction! |
As of now a WIP, because it would be nice to see what `_methods.py`. The API change is to allow `out=...`, for multi-argument outputs it is currently `out=(..., ...)`, although I guess we should allow short-handing that (nobody needs to pick it per argument). There is some discussion in numpygh-14489 about this. I still think `...` is the best, because `out=np.ndarray` doesn't generalize for `__array_ufunc__`, and yes, I like `...` because it also indicates an array return in indexing. (`arr[i, ...]` always returns an array.) --- The ufunc part works (not sure if 100% complete), but the `_methods.py` changes do not (and change behavior subtlely, if possibly correctly). The actual change is trivial (after I had refactored this a while ago). I think we should do this, but the fact that the `_methods.py` changes are not trivial makes me think there may be some idea missing. It may be that the solution is for these functions to use conversion helper once at the start and then convert to scalars based on that at the end though (rather than dragging around subclasses).
This also means we don't pass it out as a context, which is probably safer. (it is passed as part of `return_scalar=True` already)
Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
(in a sense, I care less about subclasses/arrays because I think we jus tshouldn't return scalars there, but..._
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seberg - this looks all good, but I thought maybe the doc changes could be a bit better. I mention these inline, but after doing so, I felt that this was just as likely to lead to more delay. So, instead, I force-pushed the simpler change to your branch, and will open another one with a bit of docs follow-up. Hope that makes sense! With that, this one is all OK as far as I'm concerned.
numpy/_core/_add_newdocs.py
Outdated
Alternate array object(s) in which to put the result; if provided, it | ||
must have a shape that the inputs broadcast to. A tuple of arrays | ||
(possible only as a keyword argument) must have length equal to the | ||
number of outputs; use None for uninitialized outputs to be | ||
allocated by the ufunc. | ||
If ``out=...`` is passed, the output is guaranteed not to be an array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete "not" after guaranteed!
Alternatively, maybe rewrite a little more so that the order is followed and that the text becomes more similar between the various methods:
Location(s) into which the result(s) are stored.
If not provided or None, new array(s) are created by the ufunc.
If passed as a keyword argument, can be Ellipses (``out=...``) to
ensure an array is returned even if the result is 0-dimensional,
or a tuple with length equal to the number of outputs (where None
can be used for allocation by the ufunc).
numpy/_core/_add_newdocs.py
Outdated
A location into which the result is stored. If not provided or None, | ||
a freshly-allocated array is returned. For consistency with | ||
``ufunc.__call__``, if given as a keyword, this may be wrapped in a | ||
1-element tuple. | ||
If ``out=...`` is passed, the output is guaranteed not to be an array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either minimal change as above, or
Location into which the result is stored.
If not provided or None, a freshly-allocated array is returned.
If passed as a keyword argument, can be Ellipses (``out=...``) to
ensure an array is returned even if the result is 0-dimensional
(which is useful especially for object dtype), or a 1-element tuple
(latter for consistency with ``ufunc.__call__``).
numpy/_core/_add_newdocs.py
Outdated
@@ -5276,6 +5290,7 @@ | |||
a freshly-allocated array is returned. For consistency with | |||
``ufunc.__call__``, if given as a keyword, this may be wrapped in a | |||
1-element tuple. | |||
(``out=...`` is supported, but results are never scalar). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the above, maybe change to
Location into which the result is stored.
If not provided or None, a freshly-allocated array is returned.
For consistency with ``ufunc.__call__``, if passed as a keyword
argument, can be Ellipses (``out=...``, which has the same effect
as None as an array is always returned), or a 1-element tuple.
numpy/_core/_add_newdocs.py
Outdated
@@ -5357,6 +5372,7 @@ | |||
a freshly-allocated array is returned. For consistency with | |||
``ufunc.__call__``, if given as a keyword, this may be wrapped in a | |||
1-element tuple. | |||
(``out=...`` is supported, but results are never scalar). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And perhaps the same as last one here.
Thanks! Nothing nicer than seeing pushes to your branch (in my book) :). I'll wait for that and bring it up in the meeting tonight, but maybe we can put it in very soon then. |
I think this feature probably deserves at least a paragraph somewhere on this page: https://numpy.org/doc/stable/user/basics.ufuncs.html |
Yes, Marten added that in his PR to mine, just need to manage to merge it (the github API doesn't let me merge it into my branch though, right now...) EDIT: Went through now, must have just been a temporary/current slowness in github. (I might do a pass, but I think what Marten wrote is also good as is.) |
Ufunc array out doc followup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some doc suggestions related to ...
being visually (albeit not syntactically) ambiguous, particularly in a terminal without rst rendering.
Overall looks great, this is a neat new feature that I'm sure array API implementers might be interested in.
Did this get a mailing list ping? It probably deserves one.
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
Maybe interesting, but not obviously/immediately useful, I think. For that you would have to make that namespace actually use it, which is possible but not as long as it is the identical namespace. |
👍!
Relatively immediate I think. In SciPy and scikit-learn we aren't using the
Yes, but we already have the pattern
We already do in some parts of SciPy, like np.array(0) * 2 # scalar, not 0d array
- np.array(0) # scalar, not 0d-array |
Right, if you do your compat namespace it is useful. I am fine with you doing, but it does basically cement that you cannot plan to ever get away from that. Or... I suppose maybe just live with the fact that there are two slightly different things that tend to be called by the same name. |
But you said on the mailing list:
I understand that won't happen any time soon, but NumPy 3.0 perhaps?? |
(Modulo difficulties with supporting this for older np versions) |
This spelling is borrowed from ``arr1d[0, ...]`` where the ``...`` | ||
also ensures a non-scalar return. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could also mention that arr0d[...]
is a valid lvalue as in arr0d[...] = 3.14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I don't really understand why arr[...]
assignment is relevant in this context? (Besides any excuse for telling folks about ...
is a good excuse ;).)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, my comment is not relevant.
From the observation that arr0d[()]
is scalar and arr0d[...]
is a 0D array I wrongly assumed that the first is not a valid lvalue...
But I just realized that arr0d.__setitem__((), 3.14)
and arr0d.__setitem__(Ellipsis, 3.14)
are prefectly equivalent, so please forget my comment.
Since there were no comments on the mailing list, and I don't think it's a huge change (with an approval), I'll plan to merge this in a few days unless there are comments. |
do you have any further thoughts on #28576 (comment) ? Of course not blocking for this PR. |
This summarizes most of it: #28576 (comment), I have repeated my opinions on this a few times recently...
Maybe, but I am personally not willing to look into getting rid of scalars entirely (just explaining my opinion here over and over is enough ;)).
So, the thing I have already is a branch that stop converting to 0-D arrays while returning scalars where it makes sense to do so. (See other issue.) Apparently, I haven't yet pointed to there from this particular PR, but this discussion should be in: #24897 or similar issues. |
OK, thanks everyone, I'll put this in also because I suspect it is relevant to gh-28624. |
* API,ENH: Allow forcing an array result in ufuncs This allows using `out=...` for ufuncs (including reductions) to enforce an array result. With it, the result will never be converted to a scalar, but subclasses are preserved normally. (A subclass might still misbehave and convert to a scalar anyway, but it would be a bug there.) There is some discussion in numpygh-14489 about this. I still think `...` is the best, because `out=np.ndarray` doesn't generalize for `__array_ufunc__`, and yes, I like `...` because it also indicates an array return in indexing. (`arr[i, ...]` always returns an array.) It uses this in a few places, although there are many more, e.g. `_methods.py` should probably use it, but it is very subtle there. (Also because some paths are probably subtly wrong for object dtype.) --------- Signed-off-by: Sebastian Berg <sebastianb@nvidia.com> Co-authored-by: Marten H. van Kerkwijk <mhvk@astro.utoronto.ca> Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
As of now a WIP, because it would be nice to see what
_methods.py
. Happy if someone just picks it up also, thought I would show how trivial it is now.The API change is to allow
out=...
, for multi-argument outputs it is currentlyout=(..., ...)
, although I guess we should allow short-handing that (nobody needs to pick it per argument).I.e. all of these return arrays (and unlike the last time around it works with subclasses, although right now it passes
...
as context so that may confuse them or__array_ufunc__
implementations.):There is some discussion in gh-14489 about this. I still think
...
is the best, becauseout=np.ndarray
doesn't generalize for__array_ufunc__
, and yes, I like...
because it also indicates an array return in indexing. (arr[i, ...]
always returns an array.)The ufunc part works (not sure if 100% complete), but the
_methods.py
changes do not (and change behavior subtlely, if possibly correctly). The actual change is trivial (after I had refactored this a while ago).I think we should do this, but the fact that the
_methods.py
changes are not trivial makes me think there may be some idea missing.It may be that the solution is for these functions to use conversion helper once at the start and then convert to scalars based on that at the end though (rather than dragging around subclasses).
CC @mdhaber you may be interested in this. We could probably just put this through (with a few new tests) without the
_methods.py
changes hashed out. (There must be a few other low-hanging places where this would simplify code, that are not as maze-like as_methods.py
.)