-
-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add Array API 2023.12 version support #26724
Conversation
862c8ec
to
005e41e
Compare
Hi @seberg, This PR will add support for 2023.12 Array API standard that NumPy aims to implement.
With these items merged we will be 2023.12 compliant for 2.1.0 release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some in-line comments. Main ones are that any new or changed API should be properly documented, including a changelog entry.
For cumulative_sum
, arguably a better way would be to implement include_initial
- though that is obviously a lot more work, changing the implementation of accumulate
for the ufuncs. But it would imply one can actually include any initial.
numpy/_core/_methods.py
Outdated
@@ -98,8 +98,8 @@ def _count_reduce_items(arr, axis, keepdims=False, where=True): | |||
|
|||
def _clip(a, min=None, max=None, out=None, **kwargs): | |||
if min is None and max is None: | |||
raise ValueError("One of max or min must be given") | |||
|
|||
# TODO: is there a better way to return identity? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine.
But it implies a (small) API change, which perhaps is worth a changelog entry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then out
argument is ignored. But what if a user passed it? Throw an error?
I wonder about something like:
def identity(x, out=None):
if out is None:
return x
else:
# TODO: move x contents to out
...
return out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, you are right. And perhaps in general one would expect a copy, like for np.minimum
and np.maximum
. In that case, how about return np.positive(x, out=out)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.positive
will work here - thank you!
numpy/_core/fromnumeric.py
Outdated
|
||
|
||
@array_function_dispatch(_clip_dispatcher) | ||
def clip(a, a_min, a_max, out=None, **kwargs): | ||
def clip(a, a_min=_NotPassed, a_max=_NotPassed, out=None, *, min=_NotPassed, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just use np._NoValue
- that is meant for cases like these (and looks better in the documentation!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
numpy/_core/fromnumeric.py
Outdated
Array API compatible cumulative_sum | ||
""" | ||
if out is not None and include_initial: | ||
raise ValueError("not supported") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message should be a bit clearer about what is not supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, right, still WIP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, yes, should have seen that this is a draft!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, done!
numpy/_core/fromnumeric.py
Outdated
raise ValueError("not supported") | ||
|
||
if x.ndim >= 2 and axis is None: | ||
raise ValueError("requires axis arg") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, expand the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
numpy/_core/fromnumeric.py
Outdated
initial_shape = list(x.shape) | ||
initial_shape[axis] = 1 | ||
res = np.concat( | ||
[np.zeros(initial_shape, dtype=res.dtype), res], axis=axis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, seems a bit of a missed opportunity to actually allow initial
to be passed on too. But easy to adjust if that were to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Let's keep additive and multiplicative identity as initial
for cumulative_sum
and cumulative_prod
for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the plan is to update accumulate
, or add a separate exclusive scan method, I would include this behavior there. I see cumulative_sum
/cumsum
as a shorthand for the common sum.accumulate
, where more advanced cases should be accessed by using the ufunc methods.
And anyways, you can always do cumulative_sum(x, include_initial=True) + initial
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in this PR we can ship include_initial
in cumulative_sum
only to make it to 2.1.0
release with 2023.12 compatibility. Updating accumulate
ufunc to also accept include_initial
could be addressed separately as ufunc enhancement.
numpy/_core/fromnumeric.py
Outdated
def cumulative_sum(x, /, *, axis=None, dtype=None, out=None, | ||
include_initial=False): | ||
""" | ||
Array API compatible cumulative_sum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is presumably going to be exposed in the main namespace, this needs a proper docstring, including the differences with regular cumsum
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
@mhvk could you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, some comments inline, but a larger one on passing in out
and include_initial
: this is not too hard to do, with the following replacement,
def _cumulative_func(x, func, axis, dtype, out, include_initial):
x_ndim = ndim(x)
if axis is None:
if x_ndim >= 2:
raise ValueError("For arrays which have more than one dimension "
"``axis`` argument is required.")
axis = 0
if out is not None and include_initial:
item = [slice(None)] * x_ndim
item[axis] = slice(1, None)
func.accumulate(x, axis=axis, dtype=dtype, out=out[tuple(item)])
item[axis] = 0
out[tuple(item)] = func.identity
return out
res = func.accumulate(x, axis=axis, dtype=dtype, out=out)
if include_initial:
initial_shape = list(x.shape)
initial_shape[axis] = 1
res = np.concat(
[np.full_like(res, func.identity, shape=initial_shape), res],
axis=axis
)
return res
.github/workflows/linux.yml
Outdated
@@ -243,6 +243,7 @@ jobs: | |||
python -m pip install -r requirements/build_requirements.txt | |||
python -m pip install -r requirements/test_requirements.txt | |||
python -m pip install -r array-api-tests/requirements.txt | |||
python -m pip install --upgrade hypothesis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? Why not set a minimum version instead? (just coriosity, but the changes to the workflow seem unrelated to the PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In test_requirements.txt
there's hypothesis==6.81.1
. array-api-tests
suite requires newer version so I updated it to the latest hypothesis==6.104.1
.
@ngoldbaum I see that you updated the version a year ago. Is there a reason not to use >=
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mhvk I removed that --upgrade
line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At one point NumPy used a bot to keep the test dependencies regularly updated, but it caused a lot of noise on forks of NumPy due to a bug and we decided to turn that off. That means the test requirements are only updated as people notice they are out of date. Please feel free to update the hypothesis version in the test requirements and if it doesn't affect anything else in CI it should be fine.
We do want pinned versions though to make it less likely that an upstream version change will suddenly break our CI.
compatible alternatives for `numpy.cumsum` and `numpy.cumprod`. | ||
* `numpy.clip` now supports ``max`` and ``min`` keyword arguments which are meant | ||
to replace ``a_min`` and ``a_max``. Also, for ``np.clip(a)`` or ``np.clip(a, None, None)`` | ||
an identity will be returned instead of raising an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "a copy of the input array will be ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
numpy/_core/fromnumeric.py
Outdated
return (a, a_min, a_max) | ||
def _clip_dispatcher(a, a_min=None, a_max=None, out=None, *, min=None, | ||
max=None, **kwargs): | ||
return (a,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems wrong: there is no reason one couldn't dispatch on the minimum and maximum too (they can be arrays), so I think this should become return (a, a_min, a_max, min, max)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I made it (a, a_min, a_max, out, min, max)
as other functions here also put out
here. Done!
numpy/_core/fromnumeric.py
Outdated
@@ -2283,6 +2292,19 @@ def clip(a, a_min, a_max, out=None, **kwargs): | |||
array([3, 4, 2, 3, 4, 5, 6, 7, 8, 8]) | |||
|
|||
""" | |||
if a_min is np._NoValue and a_max is np._NoValue: | |||
a_min = None if min == np._NoValue else min |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace ==
with is
here and on the next line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
numpy/_core/fromnumeric.py
Outdated
raise ValueError("Passing ``out`` and ``include_initial=True`` " | ||
"at the same time is not supported.") | ||
|
||
x = asarray(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well use np.asanyarray(x)
here, so that for astropy I don't have to override these new functions!
Alternatively, omit this whole line and use if np.ndim(x) >= 2
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used np.atleast1d(x)
as for scalars func.accumulate
throws an error. Internally atleast1d
uses asanyarray
. Used np.ndim(x)
here also anyway.
numpy/_core/fromnumeric.py
Outdated
for more details. | ||
include_initial : bool, optional | ||
Boolean indicating whether to include the initial value (zeros) as | ||
the first value in the output. ``include_initial=True`` changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this constraint necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constraint removed.
numpy/_core/fromnumeric.py
Outdated
|
||
Examples | ||
-------- | ||
>>> a = np.array([1,2,3,4,5,6]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do insert spaces after the ,
in the examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
numpy/_core/fromnumeric.py
Outdated
1000000.0050000029 | ||
|
||
""" | ||
initial_func = np.zeros if include_initial else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With my suggestion, the full implementation becomes,
return _cumulative_func(x, um.add, axis, dtype, out, include_initial)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
numpy/_core/fromnumeric.py
Outdated
@@ -2643,6 +2665,197 @@ def all(a, axis=None, out=None, keepdims=np._NoValue, *, where=np._NoValue): | |||
keepdims=keepdims, where=where) | |||
|
|||
|
|||
def _cumulative_func(x, func, axis, dtype, out, initial_func): | |||
if out is not None and initial_func is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really necessary, since we know where the output is placed (see below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
numpy/_core/fromnumeric.py
Outdated
|
||
res = func(x, axis=axis, dtype=dtype, out=out) | ||
|
||
if initial_func: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would become if include_initial
with my suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A leftover...
numpy/_core/numeric.py
Outdated
@@ -2630,6 +2635,11 @@ def astype(x, dtype, /, *, copy = True): | |||
raise TypeError( | |||
f"Input should be a NumPy array. It is a {type(x)} instead." | |||
) | |||
if device not in ["cpu", None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: make it device is not None and device != "cpu"
to speed up the 99.999999% of the cases where the default is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
@mhvk Thank you for a thorough review! I applied all your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good but for a few very minor nits. Maybe squash commits while you are at it?
numpy/_core/fromnumeric.py
Outdated
@@ -2643,6 +2665,202 @@ def all(a, axis=None, out=None, keepdims=np._NoValue, *, where=np._NoValue): | |||
keepdims=keepdims, where=where) | |||
|
|||
|
|||
def _cumulative_func(x, func, axis, dtype, out, include_initial): | |||
x = np.atleast_1d(x) | |||
x_ndim = ndim(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch on atleast_1d
. But now using ndim
is no longer required, so I would just replace with x.ndim
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
@@ -0,0 +1,6 @@ | |||
* `numpy.cumulative_sum` and `numpy.cumulative_prod` were added as Array API | |||
compatible alternatives for `numpy.cumsum` and `numpy.cumprod`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it useful to mention the differences? I.e., that one cannot pass in an initial
, but can include_initial
in the result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added one line about include_initial
functionality. Done!
numpy/_core/fromnumeric.py
Outdated
|
||
Examples | ||
-------- | ||
>>> a = np.array([1,2,3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are at it, one more space after comma...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
numpy/_core/fromnumeric.py
Outdated
|
||
The cumulative product for each column (i.e., over the rows) of ``b``: | ||
|
||
>>> b = np.array([[1,2,3], [4,5,6]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Numpy recently merged support for the 2023.12 revision of the Array API: numpy/numpy#26724 This breaks two of our tests and I've chosen to skip those tests for now: 1. The first breakage was caused by differences in how numpy and JAX cast negative floats to `uint8`. Specifically `np.float32(-1).astype(np.uint8)` returns `np.uint8(255)` whereas `jnp.float32(-1).astype(jnp.uint8)` produces `Array(0, dtype=uint8)`. We don't make any promises about consistency with casting floats to ints, noting that this can even be backend dependent. I don't believe this failure is identifying any unexpected behavior, and we test many other dtypes properly so I'm not concerned about skipping this test. 2. The second failure was caused by the fact that the approach we took in google#20550 to support backwards compatibility and the Array API for `clip` differs from the one used in numpy/numpy#26724. Again, the behavior is consistent, but it produces a different signature. I've skipped checking `clip`'s signature, but we should revisit it once the `a_min` and `a_max` parameters have been removed from JAX. Fixes google#22251
Numpy recently merged support for the 2023.12 revision of the Array API: numpy/numpy#26724 This breaks two of our tests: 1. The first breakage was caused by differences in how numpy and JAX cast negative floats to `uint8`. Specifically `np.float32(-1).astype(np.uint8)` returns `np.uint8(255)` whereas `jnp.float32(-1).astype(jnp.uint8)` produces `Array(0, dtype=uint8)`. We don't make any promises about consistency with casting floats to ints, noting that this can even be backend dependent. To fix our test, we now only generate positive inputs when the output dtype is unsigned. 2. The second failure was caused by the fact that the approach we took in google#20550 to support backwards compatibility and the Array API for `clip` differs from the one used in numpy/numpy#26724. Again, the behavior is consistent, but it produces a different signature. I've skipped checking `clip`'s signature, but we should revisit it once the `a_min` and `a_max` parameters have been removed from JAX. Fixes google#22251
This PR upgrades Array API and
array-api-tests
suite to2023.12
version:np.cumulative_sum
andnp.cumulative_prod
.min
andmax
keyword arguments tonp.clip
. Allows passingNone
s for botha_min
anda_max
.device
keyword argument tonp.astype
.