-
-
Notifications
You must be signed in to change notification settings - Fork 9.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: Generalize and shorten the ufunc "trivially iterable" path #18836
Conversation
(PyArray_FLAGS(arr2)&(NPY_ARRAY_C_CONTIGUOUS| \ | ||
NPY_ARRAY_F_CONTIGUOUS)) \ | ||
((PyArray_FLAGS(arr1) & (NPY_ARRAY_C_CONTIGUOUS| \ | ||
NPY_ARRAY_F_CONTIGUOUS)) == \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I will undo this... Its a bit tricky (correct, but its not really better as is, and the points in the code still using it should maybe just be modified).
The point is that if the arrays are 1-D, they don't have to be contiguous at all, so this check could be skipped entirely if both are 1-D. My new code does this, which makes things like:
arr_1d[::2] + arr1_d[1::2]
or any other such operation where one of the operands is 1-D strided much faster (mainly lower overhead, but that can be a factor of 2 for small additions). Making it ==
works for the above, but won't work for:
arr_1d[::2] + other_arr_1d_contiguous
since then the contiguity doesn't match... that is why I think I will undo it to avoid confusion.
7fe8248
to
f7179e4
Compare
This avoids the large macros designed for specific number of arguments and generalizes it to all ufuncs with a single output (Instead of only 1 and 2 operand input ufuncs.) That is not a big generalization, but the code actually shortens and while it is slightly more complex in umath itself, it avoids the fairly complex macros as well. The `==` fixes a case where the fast-path was previously not used even though it would be valid to use it. These are still used outside the ufunc machinery.
f7179e4
to
57b483d
Compare
) \ | ||
) && \ | ||
PyArray_EQUIVALENTLY_ITERABLE_OVERLAP_OK(arr1, arr2, arr1_read, arr2_read) && \ | ||
PyArray_EQUIVALENTLY_ITERABLE_OVERLAP_OK(arr1, arr3, arr1_read, arr3_read) && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never care about overlap in read-only operands, so these are no-ops. (in the case of nin=2, nout=1 operations, which is the only place where we used this).
Example of the previously missed trivial-iteration case:
Before the change:
EDIT: Other cases seem to be stable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a lot nicer - I recall being confused about the macros but here the code is quite simple and nice.
Py_INCREF(dtypes[nin]); | ||
op[nin] = (PyArrayObject *) PyArray_NewFromDescr(&PyArray_Type, | ||
dtypes[nin], operation_ndim, operation_shape, | ||
NULL, NULL, operation_order==NPY_ARRAY_F_CONTIGUOUS, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This maybe the trickiest logic in the whole PR. And maybe it should use an &
(the code in NewFromDescr
is weird about this). But the test suit unsurprisingly finds many errors if you just pass 0 always here, so its fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many details of ufuncs that I don't have a firm grasp of, but it is much easier to grok try_trivial_single_output_loop
than chunk it replaces 馃憤
Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
Previsouly we unnecessarily required contiguous inputs for 1-D "trivially iterable". That requirement was unnecessary, but also guarded against certain self-overlaps. This was not visible because a self-overlapping array is rarely aligned (but for complex this happens typically).
Thanks @seberg |
This avoids the large macros designed for specific number of
arguments and generalizes it to all ufuncs with a single output
(Instead of only 1 and 2 operand input ufuncs.)
That is not a big generalization, but the code actually shortens
and while it is slightly more complex in umath itself, it avoids
the fairly complex macros as well.
This code is very slightly useful to me, because I have to push the
innerloop
resolution down until afterfixed_strides
is defined. Which currently happens in two spots (for 1 or 2 input ufuncs). For a bit I thought there was more to it, so this fell out.I currently slightly prefer it though, and think it is probably easier to understand then
PyArray_TRIVIALLY_ITERABLE_TRIPLE
... So would like to continue my work based off this (and this being merged soon 馃).