Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Generalize and shorten the ufunc "trivially iterable" path #18836

Merged
merged 3 commits into from
May 6, 2021

Conversation

seberg
Copy link
Member

@seberg seberg commented Apr 22, 2021

This avoids the large macros designed for specific number of
arguments and generalizes it to all ufuncs with a single output
(Instead of only 1 and 2 operand input ufuncs.)

That is not a big generalization, but the code actually shortens
and while it is slightly more complex in umath itself, it avoids
the fairly complex macros as well.


This code is very slightly useful to me, because I have to push the innerloop resolution down until after fixed_strides is defined. Which currently happens in two spots (for 1 or 2 input ufuncs). For a bit I thought there was more to it, so this fell out.

I currently slightly prefer it though, and think it is probably easier to understand then PyArray_TRIVIALLY_ITERABLE_TRIPLE... So would like to continue my work based off this (and this being merged soon 馃).

(PyArray_FLAGS(arr2)&(NPY_ARRAY_C_CONTIGUOUS| \
NPY_ARRAY_F_CONTIGUOUS)) \
((PyArray_FLAGS(arr1) & (NPY_ARRAY_C_CONTIGUOUS| \
NPY_ARRAY_F_CONTIGUOUS)) == \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I will undo this... Its a bit tricky (correct, but its not really better as is, and the points in the code still using it should maybe just be modified).

The point is that if the arrays are 1-D, they don't have to be contiguous at all, so this check could be skipped entirely if both are 1-D. My new code does this, which makes things like:

arr_1d[::2] + arr1_d[1::2]

or any other such operation where one of the operands is 1-D strided much faster (mainly lower overhead, but that can be a factor of 2 for small additions). Making it == works for the above, but won't work for:

arr_1d[::2] + other_arr_1d_contiguous

since then the contiguity doesn't match... that is why I think I will undo it to avoid confusion.

This avoids the large macros designed for specific number of
arguments and generalizes it to all ufuncs with a single output
(Instead of only 1 and 2 operand input ufuncs.)

That is not a big generalization, but the code actually shortens
and while it is slightly more complex in umath itself, it avoids
the fairly complex macros as well.

The `==` fixes a case where the fast-path was previously not
used even though it would be valid to use it. These are still
used outside the ufunc machinery.
) \
) && \
PyArray_EQUIVALENTLY_ITERABLE_OVERLAP_OK(arr1, arr2, arr1_read, arr2_read) && \
PyArray_EQUIVALENTLY_ITERABLE_OVERLAP_OK(arr1, arr3, arr1_read, arr3_read) && \
Copy link
Member Author

@seberg seberg Apr 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never care about overlap in read-only operands, so these are no-ops. (in the case of nin=2, nout=1 operations, which is the only place where we used this).

@seberg
Copy link
Member Author

seberg commented Apr 22, 2021

Example of the previously missed trivial-iteration case:

In [1]: arr = np.arange(10)

In [2]: arr2 = np.arange(20)[::2]

In [3]: %timeit np.add(arr, arr2)
343 ns 卤 0.406 ns per loop (mean 卤 std. dev. of 7 runs, 1000000 loops each)

Before the change:

785 ns 卤 4.66 ns per loop (mean 卤 std. dev. of 7 runs, 1000000 loops each)

EDIT: Other cases seem to be stable.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a lot nicer - I recall being confused about the macros but here the code is quite simple and nice.

Py_INCREF(dtypes[nin]);
op[nin] = (PyArrayObject *) PyArray_NewFromDescr(&PyArray_Type,
dtypes[nin], operation_ndim, operation_shape,
NULL, NULL, operation_order==NPY_ARRAY_F_CONTIGUOUS, NULL);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maybe the trickiest logic in the whole PR. And maybe it should use an & (the code in NewFromDescr is weird about this). But the test suit unsurprisingly finds many errors if you just pass 0 always here, so its fine.

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many details of ufuncs that I don't have a firm grasp of, but it is much easier to grok try_trivial_single_output_loop than chunk it replaces 馃憤

numpy/core/src/umath/ufunc_object.c Outdated Show resolved Hide resolved
seberg and others added 2 commits April 23, 2021 08:31
Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
Previsouly we unnecessarily required contiguous inputs for
1-D "trivially iterable".  That requirement was unnecessary, but
also guarded against certain self-overlaps.

This was not visible because a self-overlapping array is rarely
aligned (but for complex this happens typically).
@mattip mattip added this to In Progress in Ufunc refactor May 5, 2021
@mattip mattip moved this from In Progress to Ready for final review in Ufunc refactor May 6, 2021
@mattip mattip merged commit 7311af8 into numpy:main May 6, 2021
@mattip
Copy link
Member

mattip commented May 6, 2021

Thanks @seberg

@mattip mattip removed this from Ready for final review in Ufunc refactor May 6, 2021
@seberg seberg deleted the refactor-ufunc-fastpath branch May 6, 2021 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants