Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial sort() by support and complex sort() change #16700

Closed
wants to merge 6 commits into from

Conversation

vrakesh
Copy link
Member

@vrakesh vrakesh commented Jun 27, 2020

This is the first step in addressing gh-15981

Related mailing list thread can be found here

As stated in the original thread, We need to start by having a sort() function for complex numbers that can do it based on keys, rather than plain arithmetic ordering.

There are two broad ways to approach a sorting function that supports keys (Not just for complex numbers).

  1. Add a key kwarg to the sort() (function and method). To support key based sorting on arrays.
  2. Use a new function on the lines off sortby(c_arr, key=(c_arr.real, c_arr.imag)

In this PR I have chosen approach 1 for the following reasons

  1. Approach 1 means it is more easier to deal with both in-place method and the function. Since we can make the change in the c-sort function, have minimal change in the python layer. This I hope results, minimal impact on current code that handles complex sorting. One example within numpy is is linalg module's svd() function.

  2. With approach 2 when we deprecate complex arithmetic ordering, existing methods using sort() for complex types, need to update their signature.

As it stands the PR does the following 3 things within the Python-C Array method implementation of sort

  1. Checks for complex type- If array is of complex-type, it creates a default key(When no key is passed) which mimics the current arithmethic ordering in Numpy .
  2. Uses the keys to perform a Py_LexSort and generate indices.
  3. We perform the take_along_axis via C call back and copy over the result to the original array (pseudo in-place).

I am requesting feedback/help on implementing take_along_axis logic in C level in a in-place manner and the approach in general.

This will further feed into max() and min() as well. Once we figure this out. Next step would be to deprecate arithmetic ordering for complex types (Which I think will be a PR on it's own)

UPDATE:
The latest version uses A new Function PyArray_Keysort() to argsort a 1D slice of indices, and then use the indices to move contents of the same 1D slice

numpy/__init__.pyi Outdated Show resolved Hide resolved
@vrakesh
Copy link
Member Author

vrakesh commented Jul 15, 2020

Some initial performance comparisions

In [5]: carr = np.arange(27, dtype=np.complex128)[::-1].reshape(3,3,3)                                                                                        

# New keysort() c function internally handling complex numbers
In [6]: %timeit carr.sort()                                                                                                                                   
11.2 µs ± 749 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: carr = np.arange(27, dtype=np.complex128)[::-1].reshape(3,3,3)                                                                                        

# Complex sorting with using lexsort+ take_ along_axis
In [8]: %timeit np.take_along_axis(carr, np.lexsort((carr.imag, carr.real,)), axis=0)                                                                         
14.9 µs ± 724 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I like this approach much more. It could be interesting how it holds up with other use-cases, such as:

c = np.random.random(size=(1000, 1000)) * (1 - 1j)
%timeit np.sort(c, by=(c.real, c.imag), axis=0)
%timeit np.sort(c, by=(c.real, c.imag), axis=0)

goto fail;
}
rit = (PyArrayIterObject *)
PyArray_IterAllButAxis((PyObject *)ret, &axis);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can delete rit entirely. It is only actually used in the !needcopy copy, since in the other path indbuffer is used, and there is no reason to actually copy it back. You can use indbuffer identically in that branch, since we do not actually return the index array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleting rit includes deleting ret entirely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was also thinking the same. Will get it done

|| PyArray_ISBYTESWAPPED(mps[j])
|| !(PyArray_FLAGS(mps[j]) & NPY_ARRAY_ALIGNED)
|| (PyArray_STRIDES(mps[j])[axis] != (npy_intp)PyArray_DESCR(mps[j])->elsize)
|| (selstride != sizeof(npy_intp)) ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems incorrect, self can have any datatype, so it should be selstride != PyArray_DESCR(self)->elsize.

char *slicebuffer = PyDataMem_NEW(N*aelsize);
_unaligned_strided_byte_copy(slicebuffer, aelsize, sit->dataptr,
selstride, N, aelsize);
npy_intp dims[1] = {N};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit, I am curious if we cannot just replace the below code with a manual memmove, you have to move elements (elsize), we know that every item ends up in exactly one other place, and we do not have to check the indices for out-of-bound values...

so a simple for loop with a memmove inside could be just as well. If it turns out a bit slow for some (smaller) dtypes, we could take the approach we have in npy_fastputmask to have the compiler generate different versions for the most common dtype sizes.

After doing that, we could experiment with an in-place algorithm, which seems like it should be possible as long as we also modify indbuffer at the same time.

In any case, right now I think a simple for-loop with a memmove inside is probably just as fast (no overhead and no index checking) or aster. And probably even slightly easier to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah will, look into this.

Copy link
Contributor

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Do we want to allow the sorting keys to be broadcastable?

if axis is None:
# flatten returns (1, N) for np.matrix, so always use the last axis
a = asanyarray(a).flatten()
axis = -1
else:
a = asanyarray(a).copy(order="K")
a.sort(axis=axis, kind=kind, order=order)

if by is not None and not isinstance(by, tuple):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if by is not None and not isinstance(by, tuple):
if by is not None and not isinstance(by, (tuple, list)):

To match block.

Comment on lines +1811 to +1839
PyArray_DIMS(mps[0]),
PyArray_NDIM(mps[0])))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
PyArray_DIMS(mps[0]),
PyArray_NDIM(mps[0])))) {
PyArray_DIMS(mps[0]),
PyArray_NDIM(mps[0])))) {

Copy link
Contributor

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of notes, and a question, do we want the sorting keys to be broadcastable to the array shape?

@vrakesh
Copy link
Member Author

vrakesh commented Jul 18, 2020

@hameerabbasi I am not sure, if we want it to be broadcast able, thoughts?? @seberg

@seberg
Copy link
Member

seberg commented Jul 18, 2020

Broadcastable sounds nice, but unless lexsort has it (which I do not think), it also doesn't strike me as a vital feature for sorting?

@vrakesh vrakesh force-pushed the complex_types branch 3 times, most recently from 8e00f51 to 69aabd9 Compare July 22, 2020 18:39
@vrakesh
Copy link
Member Author

vrakesh commented Jul 22, 2020

On the most recent commit. Performance numbers

In [1]: import numpy as np                                                                                                                                                                                        

In [2]: c = np.random.random(size=(1000, 1000)) * (1 - 1j) 
   ...: %timeit np.sort(c, by=(c.real, c.imag), axis=0)                                                                                                                                                           
65.6 ms ± 556 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [3]: c = np.random.random(size=(1000, 1000)) * (1 - 1j)                                                                                                                                                        

In [4]: %timeit np.take_along_axis(c, np.lexsort((c.imag, c.real,)), axis=0)                                                                                                                                      
70 ms ± 617 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

iptr++;
}
_unaligned_strided_byte_copy(sit->dataptr, selstride, sortedbuffer,
aelsize, N, aelsize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is not much faster, but I still like it a bit more, plus there is slightly more opportunity to speed it up if we ever want to, and less memory overhead.

Can't you do this in the for-loop directly? Maybe we could also make a small helper for it, since I think it should look identical in both paths of the needscopy if/else.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the change on this

PyArray_DIMS(mps[0]),
PyArray_NDIM(mps[0])))) {
PyErr_SetString(PyExc_ValueError,
"all keys need to be the same shape");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the shape of individual keys should be same as shape of the array we are sorting by ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they should, IIUC. We're not broadcasting. #16700 (comment)

}
}
if (!PyArray_DESCR(mps[i])->f->argsort[NPY_STABLESORT]
&& !PyArray_DESCR(mps[i])->f->compare) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this do stable sort even if other sort kinds provided and is that expected behavior ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a very good point, we have to use stable sort (except for the first sort I guess), otherwise the approach does not work. We should maybe give an error if by is passed together with the sort-kind (or sort-kind other than "stable").

int elsize;
int aelsize;
int maxelsize;
int object = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we choose a better name to describe what this int is supposed to do ?

* for backwards compatibility reasons.
*/
if (nd == 0 && (axis == 0 || axis == -1)) {
/* TODO: can we deprecate this? */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we want to deprecate this ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was copied over from lexsort(?). 0-d arrays can't be reasonably sorted along an axis (except None, which has ravel behavior), but we allow it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vrakesh this also means that I think we should just go through with the deprecation. Which I think just means deleting the first if block so that check_and_adjust_axis is always run. np.array(0).sort(axis=None) fails, so it should fail here as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is important I think, and should get a test-case.

selstride, N, aelsize);

iptr = (npy_intp *)indbuffer;
for(i=0;i<N;i++){
Copy link
Member

@anirudh2290 anirudh2290 Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
for(i=0;i<N;i++){
for (i = 0; i < N; i++) {

PyDataMem_FREE(indbuffer);
goto fail;
}
for(i=0;i<N;i++){
Copy link
Member

@anirudh2290 anirudh2290 Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for(i=0;i<N;i++){
for (i = 0; i < N; i++) {

by: tuple or object, optional
With a given array `a` sort the elements not by arithmetic
order, but by any arbitrary order defined by tuple, there can
be multiple keys in a tuple form , or just a single object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would help here to call out what order the keys are sorted in (similar to lexsort documentation). Since trying to follow lexsort it seems like to sort lexicograpphically with keys for complex numbers one should do arr.sort(by=(c.imag, c.real))

msg = "Test sorting comple xnumbers by absolute value"
carr = np.arange(4, dtype=np.complex128)[::-1].reshape(2, 2)
carr_out = np.array([[2+0j, 3+0j], [0+0j, 1+0j]], dtype=np.complex128)
carr.sort(axis=1, by=np.absolute(carr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also add the carr.real, carr.imag test mentioned above. Also, would help to check for different shapes of keys and mismatch between keys and arr shape for error assertions.

Comment on lines 1762 to 1773
{
PyArrayObject **mps;
PyArrayIterObject **its;
PyArrayIterObject *sit = NULL;
npy_intp n, N, size, i, j;
npy_intp astride, *iptr, selstride;
int nd;
int needcopy = 0;
int elsize;
int aelsize;
int maxelsize;
int object = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we push these declarations down-file to where they are first used, C99-style?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure on this one, every other function in the same file does not follow it, shouldn't be a problem changing, only question is uniformity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because other functions were written before the language allowed it. IMO we should write all new code in c99 style.

PyDataMem_FREE(indbuffer);
goto fail;
}
swaps = malloc(NPY_LIKELY(n > 0) ? n * sizeof(int) : 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
swaps = malloc(NPY_LIKELY(n > 0) ? n * sizeof(int) : 1);
swaps = malloc(NPY_LIKELY(n > 0) ? n * sizeof(*swaps) : 1);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the swaps variable can be removed, will do it.

}
iptr = (npy_intp *)indbuffer;
char *sortedbuffer = PyDataMem_NEW(N*aelsize);
if(sortedbuffer == NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(sortedbuffer == NULL) {
if (sortedbuffer == NULL) {

Comment on lines 2010 to 2011
/* Out of memory during sorting or buffer creation */
PyErr_NoMemory();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be addressed at the point those things happen, not here

@@ -350,6 +350,8 @@
'PyArray_ResolveWritebackIfCopy': (302,),
'PyArray_SetWritebackIfCopyBase': (303,),
# End 1.14 API
'PyArray_KeySort': (304,),
# End of trial API
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be removed. We try to err on the side of not changing the C API if possible.

a.sort(axis=axis, kind=kind, order=order)

if by is not None and not isinstance(by, (tuple, list)):
by = (by,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for not doing this conversion. The argument is documented as as sequence.

numpy/core/include/numpy/ndarraytypes.h Show resolved Hide resolved
#else
#define PyArray_malloc malloc
#define PyArray_free free
#define PyArray_realloc realloc
#define PyArray_calloc calloc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above comment

@vrakesh vrakesh force-pushed the complex_types branch 2 times, most recently from b114f5f to 2396c7a Compare August 13, 2020 02:17
@vrakesh
Copy link
Member Author

vrakesh commented Aug 13, 2020

@seberg Got the in-place part done as well. Does this look good to go?

break;
}
memmove(sit->dataptr +(current_idx*self_stride), sit->dataptr + (next_idx*self_stride), self_elsize);
current_idx = next_idx;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to refactor this into a small helper. Deduplicates the code/explains what is going on in this case (through the function name), and adds an obvious place to add a comment of whats going on, and maybe a reference for the approach (if you have).

I will look at the rest soon, sorry, please ping me if I forget...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will refactor it into a helper.

@vrakesh vrakesh force-pushed the complex_types branch 5 times, most recently from 319fc3f to edaa2ea Compare August 14, 2020 16:42
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vrakesh sorry for taking so long to make another pass. I think this should be settling up with some smaller cleanups (mostly style nits), and removing the complex special path.

a.sort(axis=axis, kind=kind, order=order)

if by is not None and not isinstance(by, (tuple, list)):
raise TypeError("\'by\' is expected to be of a sequence type.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it should be unnecessary, because the check should live in ndarray.sort().

I am actually happy with this decision: I.e. force a tuple() right now, and think about generalizing later. In which case I think we should go with explicitly tuple, and even disallow lists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Sebastian, will address these changes

@@ -14,6 +14,7 @@

#include "npy_pycompat.h"

#include "mapping.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include "mapping.h"

Probably not necessary anymore.

* Sorts 1D slice based on argsort indices
* takes in raw bytes of data, indices array
* Number of elements, and stride of data
* element size and temporary storage for an element.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some style nitpicks, like a space before the *. I like to try and do some doxygen style comments nowadays, but no need (style nitpicks also apply below, always space before and after the *).

for(npy_intp i = 0; i < N; i++){
/*Backup current element, end of loop place it in correct location.
*Stride here handles both aligned and non-aligned case
*When stride is same as elsize we are operating on aligned data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You actually mean "contiguous" here and not aligned (although aligned is also true). I think you can actually just remove the stride part of the comment, stride always means the same thing in the NumPy code base.

* element size and temporary storage for an element.
*/
void reorder_by_index(char *data, char *current, npy_intp *index, npy_intp N, npy_intp stride, npy_intp elsize) {
// Reorder the indices and relevant values cyclically
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it helps without thinking too much about it, but maybe we can write: The following code reorders the data with respect to the index. The inner while loop places a single element to the right place until it reaches a fully cycle (an already ordered element). Each element may be visited twice, but will be sorted on the first visit. The second visit finds it already sorted and immediately continues.

return NULL;
}
// If keys are None and We have complex types use c.real, c.imag as keys for lexsort
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still holds, this path should be removed, IMO (it can be replaced with a deprecation warning, but it is likely nicer to do that in a followup)

sortkind = NPY_STABLESORT;
}
if (sortkind != NPY_STABLESORT) {
PyErr_SetString(PyExc_ValueError, "When using by argument " \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: No need for the \ and I would probably just put the text on the next line with 8 space indentation.

assert_raises(ValueError, carr.sort, by=(carr.imag[0],))
# Only stable sort test
carr = np.array([[1+1j, 1-1j]], dtype=np.complex128)
assert_raises(ValueError, carr.sort, kind='quicksort')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should be transformed into one using by=, lets not modify the complex right now.

* for backwards compatibility reasons.
*/
if (nd == 0 && (axis == 0 || axis == -1)) {
/* TODO: can we deprecate this? */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is important I think, and should get a test-case.

"sort-kind can be only \'stable\'.");
return NULL;
}
if (!PyTuple_CheckExact(by) && !PyList_CheckExact(by)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to add a test for this choice, and I think if we limit so much (which I like), we might as well limit it to only tuples. Otherwise "sequence" would be arbitrary python sequences.

@@ -1018,6 +1018,7 @@ def sort(
axis: Optional[int] = ...,
kind: Optional[_SortKind] = ...,
order: Union[None, str, Sequence[str]] = ...,
by: Union[None, Sequence[ndarray], ndarray] = ...,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
by: Union[None, Sequence[ndarray], ndarray] = ...,
by: Optional[Sequence[ArrayLike]] = ...,

Currently only sequences (i.e. lists and tuples) of array-like objects are allowed, correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I am thinking if we should just make it tuple here. I don't mind making it sequences, but not sure there is much reason for not being strict (but I may be missing some earlier discussion).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, will make the change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going with tuple then Optional[Tuple[ArrayLike, ...]] should do the trick.

@vrakesh
Copy link
Member Author

vrakesh commented Sep 5, 2020

@seberg Addressed the changes, hopefully this is good to go

Base automatically changed from master to main March 4, 2021 02:04
@InessaPawson InessaPawson added the triage review Issue/PR to be discussed at the next triage meeting label Dec 22, 2022
@seberg seberg added the 64 - Good Idea Inactive PR with a good start or idea. Consider studying it if you are working on a related issue. label Feb 8, 2023
@seberg
Copy link
Member

seberg commented Feb 8, 2023

I guess we should close this PR, unfortunately. We discussed this and maybe we should not block deprecating complex by having this new API. It was a nice idea, but likely complex sorting efficiently may not be important enough to avoid arr[np.lexsort(arr.real, arr.imag)].

I don't hate an API like this though and this was a great push! But pushing this forward seems like a larger thing and probably not vital.

@seberg seberg closed this Feb 8, 2023
@InessaPawson InessaPawson removed the triage review Issue/PR to be discussed at the next triage meeting label Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
03 - Maintenance 64 - Good Idea Inactive PR with a good start or idea. Consider studying it if you are working on a related issue. component: numpy._core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants