-
-
Notifications
You must be signed in to change notification settings - Fork 11.7k
PERF: allow creation of read-only array views of scalars #29876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
In ufuncs in particular, a bottleneck for dealing with scalars is turning them into arrays. This PR adds a short-cut for those classes in PyArray_FromAny, where if the array is not specifically requested to be writeable, a readable view of the scalar is returned. For python scalars, only float is supported; int and complex are more tricky (strings and bytes should presumably be possible, but I didn't add them yet). Timings: ``` a = np.float64(1.5) %timeit np.array(a, copy=False) # Comparing with copy=True on main. 167->110 ns for float64, 94 ns for float %timeit np.add(a, a) 512->431 ns for float64, a = 1.5 %timeit np.array(a, copy=False) # Comparing with copy=True on main. 167->110 ns for float64, 94 ns for float %timeit np.add(a, a) 526->414 ns ``` An annoyance is that quite a bit of code relies on `np.asanyarray(scalar)` to return a writeable array. As implemented, this is ensured by checking the `PyArray_FORCECAST` flag, which can be safely unset for `copy=False`. As a result, however, some of the error messages change in nature. This could be resolved with a new flag, if needed.
|
Hmmmm, returning a read-only view seems totally fine to me. But I am a bit surprised that it makes a 100ns difference. I am still curious if we can't do fast path for scalar array creation deeper, but I guess I should try that myself a bit. I am also worried that this very much oversimplifies things to the point of being incorrect. One thing might be to check that |
|
Prodding it a bit, |
|
@seberg - I think the problem is real, but I am not sure this is the best way forward - in particular, I don't quite like the hack I did to make The reason Note that I did do timings at various places, and the memory allocation is definitely a problem - though not so much the allocation itself, but rather getting
By replacing that part with a simple With my laptop in performance mode to try to make it more stable (so everything is faster), I think on second thought this may be the better route. The main annoyance is that one has to deal with different options in |
|
I was prodding it a bit myself @ngoldbaum once suggest But I got a bit side-tracked in
I wouldn't even try to use
Yes, we can't avoid a bit at least, but looking at the profile there is a silly amount spend in EDIT:
Hmmmm, more significant than I would have thought. Maybe a reason for merging allocations for small arrays. |
This comment was marked as off-topic.
This comment was marked as off-topic.
Where can I find this |
|
Hah, true it's a bit hidden, conda seems to have it https://github.com/mstange/samply |
|
Moving this to draft since I think it is not a bad idea but would definitely need to be "on request" -- and it is not clear there is a need for it if storing the data on the array instance works well enough. |
With gh-28576, it has become possible to ensure that when calling a ufunc, the output is guaranteed to be an array. This PR uses that to replace np.asanyarray calls in some functions. I'm not sure this is complete -- these were just parts of code for which test failed if np.anyarray(scalar) is made to return a read-only array (see gh-29876). I do think there are all small improvements, though.
In ufuncs in particular, a bottleneck for dealing with scalars is turning them into arrays. This PR adds a short-cut for those classes in PyArray_FromAny, where if the array is not specifically requested to be writeable, a readable view of the scalar is returned. For python scalars, only float is supported; int and complex are more tricky (strings and bytes should presumably be possible, but I didn't add them yet).
Timings:
An annoyance is that quite a bit of code relies on
np.asanyarray(scalar)to return a writeable array. As implemented, this is ensured by checking thePyArray_FORCECASTflag, which can be safely unset forcopy=False. As a result, however, some of the error messages change in nature. This could be resolved with a new flag, if needed. Alternatively, this whole machinery could just be moved toconvert_ufunc_arguments, since those arguably benefit most from this.