BUG: np-coercible array-likes are not accepted with array API flag set #19118
Comments
https://scipy.github.io/devdocs/dev/api-dev/array_api.html currently says about the flag:
And that the standard
Is that what we want it to do? That is, without the flag, original behavior is maintained - we always use vanilla NumPy - and with the flag, we use
Maybe, but maybe it should be
Looking at the code, the refusal of array-likes seems intentional to me. However, the RFC is pretty clear:
But yes, if it is the case that some array-likes are not currently "converted to NumPy arrays", then this quotation seems to just be saying 'preserve the current behaviour'. The parts of
It was indeed an intentional behavior, as Matt describes. We had a few iterations on that during the first PR and decided that the new behavior, flag ON, should only accept standard arrays. Hence we fail on array-likes and don't coerce (coercing would also mean making an implicit decision in the GPU case). Oftentimes, we see that in tests the use of lists is just a convenience and not really motivated. (I am not opposed to any change, just explaining the background of the current state.)
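The "flag ON, only accept standard arrays" rule described above can be sketched roughly as follows. This is a minimal illustration, not SciPy's actual code; `strict_asarray` is a made-up name:

```python
import numpy as np

def strict_asarray(x):
    # Hypothetical sketch of the strict "flag ON" rule: accept objects that
    # are already arrays (NumPy ndarrays, or anything exposing the array
    # API's __array_namespace__), but refuse plain lists/tuples rather than
    # silently coercing them -- coercion would mean making an implicit
    # device/namespace decision on the user's behalf.
    if isinstance(x, np.ndarray) or hasattr(x, "__array_namespace__"):
        return x
    raise TypeError(
        f"{type(x).__name__!r} is not a supported array type "
        "when the array API flag is set"
    )
```

With this rule, `strict_asarray(np.array([1.0, 2.0]))` passes the input through unchanged, while `strict_asarray([1.0, 2.0])` raises a `TypeError`.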
Indeed. I could have spelled this out even more clearly in the RFC, I think (using the word "array-like"), but my intent was not to change how array-likes work today. I only intended for known-problematic cases like
Either this, or we may want to use
I think I missed that discussion during review of gh-18668, and I cannot find it there now. Maybe it's in one of the many resolved comments; those are pretty hard to search through.
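For contrast with the strict behaviour, the alternative being discussed (pass known array types through, and try to coerce everything else that NumPy can handle, refusing object-dtype results) might look roughly like the sketch below. `coerce_or_fail` is a hypothetical name; the real logic in SciPy's compatibility layer differs:

```python
import numpy as np

def coerce_or_fail(x):
    # Hypothetical sketch of the coercion fallback: known array types pass
    # through unchanged; everything else is handed to np.asarray, so that
    # np-coercible array-likes (lists, tuples, objects with __array__) are
    # accepted. Inputs that only coerce to object dtype are rejected.
    if isinstance(x, np.ndarray) or hasattr(x, "__array_namespace__"):
        return x
    arr = np.asarray(x)
    if arr.dtype == object:
        raise TypeError(
            f"cannot convert {type(x).__name__!r} to a numeric array"
        )
    return arr
```

Under this rule, `coerce_or_fail([1, 2, 3])` returns an integer ndarray, while a list of arbitrary Python objects still raises.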
While we're on this again, can we add parameters to
Those are two separate topics, which I think need qualitatively different solutions:
Object Arrays
Initially, that's fine. Eventually, I think there is value in making this public. It is time consuming to ensure that functions return accurate results for all valid inputs. For example, I just merged gh-18714 after dozens of similar PRs. Going forward, I think it is more efficient to use arbitrary precision arithmetic to help us measure and document the reliable domains of float64 implementations, and we'd allow users to rely on arbitrary precision arithmetic for values outside the documented domain. The reliable domains of float64 implementations can be expanded as time permits (e.g. when there are no bugs to fix) or when there are both interested contributors and reviewers.
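As a toy illustration of that idea (checking a float64 implementation against a higher-precision reference), the stdlib `decimal` module can stand in for a real arbitrary-precision library. The function choice here is illustrative, not from the thread: a naive `exp(x) - 1` loses accuracy for tiny `x`, and the high-precision reference quantifies how much:

```python
from decimal import Decimal, getcontext
import math

# exp(x) - 1 suffers catastrophic cancellation for tiny x in float64.
x = 1e-12
naive = math.exp(x) - 1.0   # cancellation-prone float64 formula
better = math.expm1(x)      # the well-conditioned float64 implementation

# 50-digit reference value computed in arbitrary precision.
getcontext().prec = 50
ref = Decimal(x).exp() - 1  # Decimal(x) is the exact binary value of x

rel_err_naive = abs((Decimal(naive) - ref) / ref)
rel_err_better = abs((Decimal(better) - ref) / ref)
assert rel_err_better < rel_err_naive  # the reference exposes the bad formula
```

The same pattern, applied systematically, could measure and document where a float64 implementation is reliable and where it is not.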
Yes. It would be similar to the Array API. For some functions, it would be possible to support arbitrary precision end to end; for others, it would not, and we either raise an error (better, I think) or continue in finite precision.
Masked Arrays
Use of masked arrays for implementation (e.g. of NaN policy) has led to a lot of reported bugs. It can be done correctly, but it is not recommended because it has proven tricky to do so.
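For a concrete example of the kind of computation involved, an `omit`-style NaN policy can be expressed either through a masked array or through plain boolean indexing. The sketch below (not SciPy code) shows the two routes giving the same mean:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0, 4.0])

# Masked-array route: mask out invalid entries, then reduce.
masked_mean = np.ma.masked_invalid(x).mean()

# Plain-ndarray route: drop NaNs with boolean indexing, then reduce.
plain_mean = x[~np.isnan(x)].mean()

assert np.isclose(masked_mean, plain_mean)  # both ignore the NaN
```

The boolean-indexing route is often easier to get right, which is part of why masked-array-based implementations have been a source of bugs.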
That's fine if we can still declare
That's an implementation rather than an API thing though. It looks to me like it'd be better to have a single implementation, and a separate set of APIs. In the
Agreed. The other option would be to drop it completely. I'm not sure performance is the rationale for having masked arrays though - it seems more like a functionality thing to me than a performance thing.
Okay, sounds like we do want a fix for this. Should I submit a PR of something like my draft but with
@lucascolley Let's continue with something like your draft to keep this moving.
OK, we can consider that in a separate thread once we start working on this again. Currently, there are a lot of
My comment about performance was about whether the implementation actually uses NumPy's masked array functionality to perform the calculations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8946529483563963)
n = 100000
x = rng.normal(size=n)
mask = rng.random(size=n) > 0.5
xm = np.ma.MaskedArray(x, mask)

%timeit stats.ttest_1samp(xm, 0)         # 2.82 ms ± 59.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit stats.mstats.ttest_1samp(xm, 0)  # 3.84 ms ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# for shorter slices, or if there are only a few masked elements, mstats can be a little faster
# but I suspect that consistent, correct functionality is more important than the speed
```

But yes, my assumption has been that if we retain the ability to work with masked arrays, we can't guarantee that these will be as fast as running the function on regular arrays.
Fully agreed. Using masked arrays causes a function to do extra work, so it's going to be slower in many cases. And we have to deal with suboptimal infrastructure from numpy here.
Sounds good to me.
Describe your issue.
From offline discussion with @rgommers, behaviour with `np`-coercible array-likes should not have regressed in #18668, but it has. One potential solution is to attempt to use `np.asarray` in `compliance_scipy` on types unrecognised by `array-api-compat`, a (definitely sub-optimal but I think correct) draft of which I have here: lucascolley@7ecfcc6.

Reproducing Code Example
Error message
SciPy/NumPy/Python version and system information