error in np.lexsort axis kwarg #10521

Open
ogauthe opened this issue Feb 4, 2018 · 7 comments

@ogauthe

ogauthe commented Feb 4, 2018

Hello,

I would like to sort an array in lexical order, by the first column and then the second. np.lexsort is the function to use, but it does not seem to handle the axis kwarg.

>>> import numpy as np
>>> a = np.array([[0,1],[1,0],[0,0],[0,-1],[0,1],[1,-1]])
>>> a.ndim
2
>>> np.lexsort(a.T,axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: axis(=1) out of bounds

Version info:
Debian, Python 3.5.
>>> np.__version__
'1.12.1'

The same error occurs with Anaconda's Python and NumPy.

The function np.argsort does not suffer from this problem, but is not what I want. In my case, I could use np.lexsort(a.T[::-1]) as a workaround.

@jaimefrio
Member

Not sure, but this may be related to #5782. Lexsort could use a serious overhaul...

@charris
Member

charris commented Feb 4, 2018

The documentation is confusing, indeed deceptive. The axis is for the case when each key is itself an array, in which case the axis is taken from that array. I think the logic in the relevant function, PyArray_LexSort, could use a close review. In your case, there are 6 keys stored in the 1D last axis. Now if you had passed keys in the shape `(6, 2, 2)`, the keys would be 2D and the axis would be valid.

In [1]: a = np.array([[0,1],[1,0],[0,0],[0,-1],[0,1],[1,-1]])

In [2]: lexsort(a[:,:,None], axis=1)
Out[2]: 
array([[0],
       [0]])

Not sure why the result has that shape, but frankly, I have no idea what the intended use of that keyword is :(
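For what it is worth, here is a minimal sketch of how the axis keyword appears to behave when each key really is a 2D array. The key shape `(2, 3, 4)`, the random seed, and the variable names are assumptions made up for this illustration; it only checks that the result agrees with running a plain 1D lexsort on each slice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two keys, each a 2D array of shape (3, 4); a small value range forces ties.
keys = rng.integers(0, 3, size=(2, 3, 4))

# axis refers to a single key (ndim 2 here), not to the stacked keys array.
idx_last = np.lexsort(keys, axis=-1)   # sort within each row of the keys
idx_first = np.lexsort(keys, axis=0)   # sort within each column of the keys

# Each row of idx_last should equal a 1D lexsort over the matching key rows.
rows_match = all(
    np.array_equal(idx_last[i], np.lexsort(keys[:, i, :]))
    for i in range(keys.shape[1])
)
# Each column of idx_first should equal a 1D lexsort over the matching key columns.
cols_match = all(
    np.array_equal(idx_first[:, j], np.lexsort(keys[:, :, j]))
    for j in range(keys.shape[2])
)
print(idx_last.shape, idx_first.shape, rows_match, cols_match)
```

Under this reading, the result always has the shape of one key, and the axis only selects which direction inside the keys gets sorted.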

@charris
Member

charris commented Feb 4, 2018

Apropos the current problem, this may be what you want

In [3]: lexsort(a[None,...], axis=1)
Out[3]: 
array([[0, 1],
       [1, 0],
       [0, 1],
       [1, 0],
       [0, 1],
       [1, 0]])

Note that you probably need to remove the first dimension of the result.

@charris
Member

charris commented Feb 4, 2018

And whatever is intended, I think it is buggy.

@ogauthe
Author

ogauthe commented Feb 4, 2018

The syntax np.lexsort(a[None,...], axis=0) seems complicated and counterintuitive to me. Moreover, it does not give the result I need:

>>> a[np.lexsort(a[None,...], axis=0)[:,0]]
array([[ 0,  1],
       [ 0,  0],
       [ 0, -1],
       [ 0,  1],
       [ 1,  0],
       [ 1, -1]])

OK, it sorts by the first column, but then it ignores the second! Actually, it gives the same result as np.argsort(a,axis=0) here, which it should not.

To sort by columns, the easy way is to transpose, but I also do not understand why the sort order is second row, then first row (hence the [::-1]):

>>> a[np.lexsort(a.T[::-1])]
array([[ 0, -1],
       [ 0,  0],
       [ 0,  1],
       [ 0,  1],
       [ 1, -1],
       [ 1,  0]])

I would expect lexsort to let me choose the axis and then sort by keys 0 to N-1, with a simple syntax similar to argsort, np.lexsort(a, axis=0), and to return a 1D array of indices.
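Until something like that exists, the workaround can be wrapped in a small helper. This is only a sketch: `lexsort_rows` is a hypothetical name, not a NumPy function, and it assumes a 2D input whose rows should be ordered by column 0 first, then column 1, and so on.

```python
import numpy as np

def lexsort_rows(a):
    """Return 1D indices that sort the rows of 2D `a` lexicographically,
    with column 0 as the primary key. Hypothetical helper, not a NumPy API."""
    a = np.asarray(a)
    if a.ndim != 2:
        raise ValueError("expected a 2D array")
    # np.lexsort treats the LAST key as the primary one, so reverse the columns.
    return np.lexsort(a.T[::-1])

a = np.array([[0, 1], [1, 0], [0, 0], [0, -1], [0, 1], [1, -1]])
order = lexsort_rows(a)
print(order)
print(a[order])
```

The reversal is needed because np.lexsort gives the highest priority to the last key in the sequence.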

@adeak
Contributor

adeak commented Feb 27, 2021

I just ran into this while poking at lexsort. It took me 10 minutes to figure out why I was getting this error:

>>> np.lexsort(np.arange(2*3).reshape(2, 3), axis=-1)  # default axis
array([0, 1, 2])
>>> np.lexsort(np.arange(2*3).reshape(2, 3), axis=1)  # 1 is -1 in case of 2d, right?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in lexsort
numpy.AxisError: axis 1 is out of bounds for array of dimension 1

I now understand that lexsort always unpacks along the first dimension (interpreting arrays as mere iterables of arrays with one fewer dimension), and the axis keyword is understood in terms of these smaller arrays.
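A small sketch of that reading, assuming it is right: the 2D array behaves like a sequence of its rows, and only axis values valid for a single 1D key are accepted. The names `same`, `key_ndim`, and `outcomes` are made up for the illustration. The out-of-bounds case is caught as ValueError, which covers both the plain ValueError from the original report and the AxisError shown above (AxisError subclasses ValueError).

```python
import numpy as np

keys = np.arange(2 * 3).reshape(2, 3)   # two keys, each 1D with 3 elements

# Passing the 2D array should match passing its rows as a sequence of keys.
same = np.array_equal(np.lexsort(keys), np.lexsort((keys[0], keys[1])))
print("same as explicit tuple of rows:", same)

# axis is interpreted relative to one key, whose ndim is keys.ndim - 1.
key_ndim = keys.ndim - 1
outcomes = {}
for axis in (-1, 0, 1):
    try:
        np.lexsort(keys, axis=axis)
        outcomes[axis] = "ok"
    except ValueError as exc:   # AxisError is a ValueError subclass
        outcomes[axis] = f"rejected: {exc}"
    print(f"axis={axis}, key ndim={key_ndim}: {outcomes[axis]}")
```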

I haven't checked the blame in these past 3 years but the docstring is still confusing. There's no mention of the axis keyword beyond its own section in the docstring, and the rest of the docstring talks about 1d and 2d keys only (in which case the axis keyword is fairly redundant).

Should we try clarifying the docs? Looking at the above comments, I'm not even sure the behaviour is always as expected. It would also be nice to give a better error message, but I suspect that might need too much special-casing in the implementation (if I'm reading it correctly we'd have to patch somewhere around here).
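As a rough idea of what a clearer message could look like, here is a Python-level sketch. `lexsort_checked` is a hypothetical wrapper, not part of NumPy and not a proposal for the C implementation; it assumes the keys can be stacked into one array of shape `(k, ...)`.

```python
import numpy as np

def lexsort_checked(keys, axis=-1):
    """Hypothetical wrapper: validate `axis` against the dimensionality of a
    single key before delegating to np.lexsort."""
    keys = np.asarray(keys)
    if keys.ndim < 2:
        raise ValueError("expected a stack of keys with shape (k, ...)")
    key_ndim = keys.ndim - 1   # lexsort splits keys along the first dimension
    if not -key_ndim <= axis < key_ndim:
        raise ValueError(
            f"axis {axis} is out of bounds: lexsort splits `keys` along its "
            f"first dimension into {keys.shape[0]} keys of dimension {key_ndim}, "
            f"and `axis` indexes into those keys "
            f"(valid range: {-key_ndim} to {key_ndim - 1})"
        )
    return np.lexsort(keys, axis=axis)

keys = np.arange(2 * 3).reshape(2, 3)
print(lexsort_checked(keys))          # default axis, same as np.lexsort(keys)
try:
    lexsort_checked(keys, axis=1)
except ValueError as exc:
    message = str(exc)
    print(message)
```

The point of the message is to say explicitly that the axis is counted inside each key rather than in the array that was passed.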

@braindevices

Apparently this is still a problem.


6 participants