-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial sort() by support and complex sort() change #16700
Conversation
Some initial performance comparisions In [5]: carr = np.arange(27, dtype=np.complex128)[::-1].reshape(3,3,3)
# New keysort() c function internally handling complex numbers
In [6]: %timeit carr.sort()
11.2 µs ± 749 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [7]: carr = np.arange(27, dtype=np.complex128)[::-1].reshape(3,3,3)
# Complex sorting with using lexsort+ take_ along_axis
In [8]: %timeit np.take_along_axis(carr, np.lexsort((carr.imag, carr.real,)), axis=0)
14.9 µs ± 724 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I like this approach much more. It could be interesting how it holds up with other use-cases, such as:
c = np.random.random(size=(1000, 1000)) * (1 - 1j)
%timeit np.sort(c, by=(c.real, c.imag), axis=0)
%timeit np.sort(c, by=(c.real, c.imag), axis=0)
goto fail; | ||
} | ||
rit = (PyArrayIterObject *) | ||
PyArray_IterAllButAxis((PyObject *)ret, &axis); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can delete rit
entirely. It is only actually used in the !needcopy
copy, since in the other path indbuffer
is used, and there is no reason to actually copy it back. You can use indbuffer
identically in that branch, since we do not actually return the index array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleting rit
includes deleting ret
entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was also thinking the same. Will get it done
|| PyArray_ISBYTESWAPPED(mps[j]) | ||
|| !(PyArray_FLAGS(mps[j]) & NPY_ARRAY_ALIGNED) | ||
|| (PyArray_STRIDES(mps[j])[axis] != (npy_intp)PyArray_DESCR(mps[j])->elsize) | ||
|| (selstride != sizeof(npy_intp)) ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems incorrect, self can have any datatype, so it should be selstride != PyArray_DESCR(self)->elsize
.
char *slicebuffer = PyDataMem_NEW(N*aelsize); | ||
_unaligned_strided_byte_copy(slicebuffer, aelsize, sit->dataptr, | ||
selstride, N, aelsize); | ||
npy_intp dims[1] = {N}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to admit, I am curious if we cannot just replace the below code with a manual memmove
, you have to move elements (elsize
), we know that every item ends up in exactly one other place, and we do not have to check the indices for out-of-bound values...
so a simple for loop with a memmove
inside could be just as well. If it turns out a bit slow for some (smaller) dtypes, we could take the approach we have in npy_fastputmask
to have the compiler generate different versions for the most common dtype sizes.
After doing that, we could experiment with an in-place algorithm, which seems like it should be possible as long as we also modify indbuffer
at the same time.
In any case, right now I think a simple for-loop with a memmove
inside is probably just as fast (no overhead and no index checking) or aster. And probably even slightly easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah will, look into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Do we want to allow the sorting keys to be broadcastable?
numpy/core/fromnumeric.py
Outdated
if axis is None: | ||
# flatten returns (1, N) for np.matrix, so always use the last axis | ||
a = asanyarray(a).flatten() | ||
axis = -1 | ||
else: | ||
a = asanyarray(a).copy(order="K") | ||
a.sort(axis=axis, kind=kind, order=order) | ||
|
||
if by is not None and not isinstance(by, tuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if by is not None and not isinstance(by, tuple): | |
if by is not None and not isinstance(by, (tuple, list)): |
To match block
.
PyArray_DIMS(mps[0]), | ||
PyArray_NDIM(mps[0])))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
PyArray_DIMS(mps[0]), | |
PyArray_NDIM(mps[0])))) { | |
PyArray_DIMS(mps[0]), | |
PyArray_NDIM(mps[0])))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of notes, and a question, do we want the sorting keys to be broadcastable to the array shape?
@hameerabbasi I am not sure, if we want it to be broadcast able, thoughts?? @seberg |
Broadcastable sounds nice, but unless lexsort has it (which I do not think), it also doesn't strike me as a vital feature for sorting? |
8e00f51
to
69aabd9
Compare
On the most recent commit. Performance numbers In [1]: import numpy as np
In [2]: c = np.random.random(size=(1000, 1000)) * (1 - 1j)
...: %timeit np.sort(c, by=(c.real, c.imag), axis=0)
65.6 ms ± 556 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [3]: c = np.random.random(size=(1000, 1000)) * (1 - 1j)
In [4]: %timeit np.take_along_axis(c, np.lexsort((c.imag, c.real,)), axis=0)
70 ms ± 617 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) |
iptr++; | ||
} | ||
_unaligned_strided_byte_copy(sit->dataptr, selstride, sortedbuffer, | ||
aelsize, N, aelsize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is not much faster, but I still like it a bit more, plus there is slightly more opportunity to speed it up if we ever want to, and less memory overhead.
Can't you do this in the for-loop directly? Maybe we could also make a small helper for it, since I think it should look identical in both paths of the needscopy
if/else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change on this
PyArray_DIMS(mps[0]), | ||
PyArray_NDIM(mps[0])))) { | ||
PyErr_SetString(PyExc_ValueError, | ||
"all keys need to be the same shape"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the shape of individual keys should be same as shape of the array we are sorting by ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes they should, IIUC. We're not broadcasting. #16700 (comment)
} | ||
} | ||
if (!PyArray_DESCR(mps[i])->f->argsort[NPY_STABLESORT] | ||
&& !PyArray_DESCR(mps[i])->f->compare) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this do stable sort even if other sort kinds provided and is that expected behavior ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a very good point, we have to use stable sort (except for the first sort I guess), otherwise the approach does not work. We should maybe give an error if by
is passed together with the sort-kind (or sort-kind other than "stable").
int elsize; | ||
int aelsize; | ||
int maxelsize; | ||
int object = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we choose a better name to describe what this int is supposed to do ?
* for backwards compatibility reasons. | ||
*/ | ||
if (nd == 0 && (axis == 0 || axis == -1)) { | ||
/* TODO: can we deprecate this? */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we want to deprecate this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was copied over from lexsort(?). 0-d arrays can't be reasonably sorted along an axis (except None
, which has ravel behavior), but we allow it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vrakesh this also means that I think we should just go through with the deprecation. Which I think just means deleting the first if
block so that check_and_adjust_axis
is always run. np.array(0).sort(axis=None)
fails, so it should fail here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is important I think, and should get a test-case.
selstride, N, aelsize); | ||
|
||
iptr = (npy_intp *)indbuffer; | ||
for(i=0;i<N;i++){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
for(i=0;i<N;i++){ | |
for (i = 0; i < N; i++) { |
PyDataMem_FREE(indbuffer); | ||
goto fail; | ||
} | ||
for(i=0;i<N;i++){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for(i=0;i<N;i++){ | |
for (i = 0; i < N; i++) { |
numpy/core/fromnumeric.py
Outdated
by: tuple or object, optional | ||
With a given array `a` sort the elements not by arithmetic | ||
order, but by any arbitrary order defined by tuple, there can | ||
be multiple keys in a tuple form , or just a single object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would help here to call out what order the keys are sorted in (similar to lexsort documentation). Since trying to follow lexsort it seems like to sort lexicograpphically with keys for complex numbers one should do arr.sort(by=(c.imag, c.real))
numpy/core/tests/test_multiarray.py
Outdated
msg = "Test sorting comple xnumbers by absolute value" | ||
carr = np.arange(4, dtype=np.complex128)[::-1].reshape(2, 2) | ||
carr_out = np.array([[2+0j, 3+0j], [0+0j, 1+0j]], dtype=np.complex128) | ||
carr.sort(axis=1, by=np.absolute(carr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe also add the carr.real, carr.imag test mentioned above. Also, would help to check for different shapes of keys and mismatch between keys and arr shape for error assertions.
{ | ||
PyArrayObject **mps; | ||
PyArrayIterObject **its; | ||
PyArrayIterObject *sit = NULL; | ||
npy_intp n, N, size, i, j; | ||
npy_intp astride, *iptr, selstride; | ||
int nd; | ||
int needcopy = 0; | ||
int elsize; | ||
int aelsize; | ||
int maxelsize; | ||
int object = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we push these declarations down-file to where they are first used, C99-style?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure on this one, every other function in the same file does not follow it, shouldn't be a problem changing, only question is uniformity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because other functions were written before the language allowed it. IMO we should write all new code in c99 style.
PyDataMem_FREE(indbuffer); | ||
goto fail; | ||
} | ||
swaps = malloc(NPY_LIKELY(n > 0) ? n * sizeof(int) : 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swaps = malloc(NPY_LIKELY(n > 0) ? n * sizeof(int) : 1); | |
swaps = malloc(NPY_LIKELY(n > 0) ? n * sizeof(*swaps) : 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the swaps variable can be removed, will do it.
} | ||
iptr = (npy_intp *)indbuffer; | ||
char *sortedbuffer = PyDataMem_NEW(N*aelsize); | ||
if(sortedbuffer == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if(sortedbuffer == NULL) { | |
if (sortedbuffer == NULL) { |
/* Out of memory during sorting or buffer creation */ | ||
PyErr_NoMemory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be addressed at the point those things happen, not here
@@ -350,6 +350,8 @@ | |||
'PyArray_ResolveWritebackIfCopy': (302,), | |||
'PyArray_SetWritebackIfCopyBase': (303,), | |||
# End 1.14 API | |||
'PyArray_KeySort': (304,), | |||
# End of trial API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be removed. We try to err on the side of not changing the C API if possible.
numpy/core/fromnumeric.py
Outdated
a.sort(axis=axis, kind=kind, order=order) | ||
|
||
if by is not None and not isinstance(by, (tuple, list)): | ||
by = (by,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for not doing this conversion. The argument is documented as as sequence.
#else | ||
#define PyArray_malloc malloc | ||
#define PyArray_free free | ||
#define PyArray_realloc realloc | ||
#define PyArray_calloc calloc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above comment
b114f5f
to
2396c7a
Compare
@seberg Got the in-place part done as well. Does this look good to go? |
break; | ||
} | ||
memmove(sit->dataptr +(current_idx*self_stride), sit->dataptr + (next_idx*self_stride), self_elsize); | ||
current_idx = next_idx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to refactor this into a small helper. Deduplicates the code/explains what is going on in this case (through the function name), and adds an obvious place to add a comment of whats going on, and maybe a reference for the approach (if you have).
I will look at the rest soon, sorry, please ping me if I forget...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will refactor it into a helper.
319fc3f
to
edaa2ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vrakesh sorry for taking so long to make another pass. I think this should be settling up with some smaller cleanups (mostly style nits), and removing the complex special path.
numpy/core/fromnumeric.py
Outdated
a.sort(axis=axis, kind=kind, order=order) | ||
|
||
if by is not None and not isinstance(by, (tuple, list)): | ||
raise TypeError("\'by\' is expected to be of a sequence type.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it should be unnecessary, because the check should live in ndarray.sort()
.
I am actually happy with this decision: I.e. force a tuple()
right now, and think about generalizing later. In which case I think we should go with explicitly tuple, and even disallow lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Sebastian, will address these changes
@@ -14,6 +14,7 @@ | |||
|
|||
#include "npy_pycompat.h" | |||
|
|||
#include "mapping.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "mapping.h" |
Probably not necessary anymore.
* Sorts 1D slice based on argsort indices | ||
* takes in raw bytes of data, indices array | ||
* Number of elements, and stride of data | ||
* element size and temporary storage for an element. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some style nitpicks, like a space before the *
. I like to try and do some doxygen style comments nowadays, but no need (style nitpicks also apply below, always space before and after the *
).
for(npy_intp i = 0; i < N; i++){ | ||
/*Backup current element, end of loop place it in correct location. | ||
*Stride here handles both aligned and non-aligned case | ||
*When stride is same as elsize we are operating on aligned data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You actually mean "contiguous" here and not aligned (although aligned is also true). I think you can actually just remove the stride part of the comment, stride always means the same thing in the NumPy code base.
* element size and temporary storage for an element. | ||
*/ | ||
void reorder_by_index(char *data, char *current, npy_intp *index, npy_intp N, npy_intp stride, npy_intp elsize) { | ||
// Reorder the indices and relevant values cyclically |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it helps without thinking too much about it, but maybe we can write: The following code reorders the data with respect to the index. The inner while
loop places a single element to the right place until it reaches a fully cycle (an already ordered element). Each element may be visited twice, but will be sorted on the first visit. The second visit finds it already sorted and immediately continues.
numpy/core/src/multiarray/methods.c
Outdated
return NULL; | ||
} | ||
// If keys are None and We have complex types use c.real, c.imag as keys for lexsort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still holds, this path should be removed, IMO (it can be replaced with a deprecation warning, but it is likely nicer to do that in a followup)
numpy/core/src/multiarray/methods.c
Outdated
sortkind = NPY_STABLESORT; | ||
} | ||
if (sortkind != NPY_STABLESORT) { | ||
PyErr_SetString(PyExc_ValueError, "When using by argument " \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: No need for the \
and I would probably just put the text on the next line with 8 space indentation.
numpy/core/tests/test_multiarray.py
Outdated
assert_raises(ValueError, carr.sort, by=(carr.imag[0],)) | ||
# Only stable sort test | ||
carr = np.array([[1+1j, 1-1j]], dtype=np.complex128) | ||
assert_raises(ValueError, carr.sort, kind='quicksort') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should be transformed into one using by=
, lets not modify the complex right now.
* for backwards compatibility reasons. | ||
*/ | ||
if (nd == 0 && (axis == 0 || axis == -1)) { | ||
/* TODO: can we deprecate this? */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is important I think, and should get a test-case.
numpy/core/src/multiarray/methods.c
Outdated
"sort-kind can be only \'stable\'."); | ||
return NULL; | ||
} | ||
if (!PyTuple_CheckExact(by) && !PyList_CheckExact(by)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to add a test for this choice, and I think if we limit so much (which I like), we might as well limit it to only tuples. Otherwise "sequence" would be arbitrary python sequences.
numpy/__init__.pyi
Outdated
@@ -1018,6 +1018,7 @@ def sort( | |||
axis: Optional[int] = ..., | |||
kind: Optional[_SortKind] = ..., | |||
order: Union[None, str, Sequence[str]] = ..., | |||
by: Union[None, Sequence[ndarray], ndarray] = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by: Union[None, Sequence[ndarray], ndarray] = ..., | |
by: Optional[Sequence[ArrayLike]] = ..., |
Currently only sequences (i.e. lists and tuples) of array-like objects are allowed, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I am thinking if we should just make it tuple
here. I don't mind making it sequences, but not sure there is much reason for not being strict (but I may be missing some earlier discussion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, will make the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going with tuple then Optional[Tuple[ArrayLike, ...]]
should do the trick.
edaa2ea
to
ab2214b
Compare
@seberg Addressed the changes, hopefully this is good to go |
I guess we should close this PR, unfortunately. We discussed this and maybe we should not block deprecating complex by having this new API. It was a nice idea, but likely complex sorting efficiently may not be important enough to avoid I don't hate an API like this though and this was a great push! But pushing this forward seems like a larger thing and probably not vital. |
This is the first step in addressing gh-15981
Related mailing list thread can be found here
As stated in the original thread, We need to start by having a sort() function for complex numbers that can do it based on keys, rather than plain arithmetic ordering.
There are two broad ways to approach a sorting function that supports keys (Not just for complex numbers).
key
kwarg to thesort()
(function and method). To support key based sorting on arrays.sortby(c_arr, key=(c_arr.real, c_arr.imag)
In this PR I have chosen approach 1 for the following reasons
Approach 1 means it is more easier to deal with both in-place method and the function. Since we can make the change in the c-sort function, have minimal change in the python layer. This I hope results, minimal impact on current code that handles complex sorting. One example within numpy is is
linalg
module'ssvd()
function.With approach 2 when we deprecate complex arithmetic ordering, existing methods using sort() for complex types, need to update their signature.
As it stands the PR does the following 3 things within the Python-C Array method implementation of sort
key
(When no key is passed) which mimics the current arithmethic ordering in Numpy .Py_LexSort
and generate indices.take_along_axis
via C call back and copy over the result to the original array (pseudo in-place).I am requesting feedback/help on implementing
take_along_axis
logic in C level in a in-place manner and the approach in general.This will further feed into
max()
andmin()
as well. Once we figure this out. Next step would be to deprecate arithmetic ordering for complex types (Which I think will be a PR on it's own)UPDATE:
The latest version uses A new Function PyArray_Keysort() to argsort a 1D slice of indices, and then use the indices to move contents of the same 1D slice