Sorting a structured array with 2^31 or more elements doesn't work: the
sort function returns immediately without sorting. The following is a simple test:
import numpy as np
dt = "u4, u4"
a = np.empty(2 ** 31, dtype=dt)
a["f0"][:] = np.random.randint(1, 1e9, 2 ** 31)
a["f1"][:] = np.random.randint(1, 1e9, 2 ** 31)
a.sort()
for i in xrange(len(a) - 1):
    u1, v1 = a[i]
    u2, v2 = a[i + 1]
    assert u1 < u2 or (u1 == u2 and v1 <= v2)
The above has been tested using Python 2.7 and NumPy version 1.6.2 as well as
the NumPy Git version a72ce7e. The test was done on a 64-bit Linux system with
48 GB of memory.
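The element-by-element loop in the test is slow at this scale. A vectorized check of the same ordering condition might look like the sketch below. The helper name is_lex_sorted is hypothetical, and the array is kept small so the example runs quickly, so it illustrates the check only and does not reproduce the bug.

```python
import numpy as np

def is_lex_sorted(a):
    """Return True if structured array `a` is ordered by field f0, then f1."""
    f0, f1 = a["f0"], a["f1"]
    # Same condition as the assert in the loop, applied to all adjacent pairs.
    ok = (f0[:-1] < f0[1:]) | ((f0[:-1] == f0[1:]) & (f1[:-1] <= f1[1:]))
    return bool(ok.all())

n = 1000  # small size for illustration; the report uses 2 ** 31
a = np.empty(n, dtype="u4, u4")
a["f0"] = np.random.randint(1, 10, n)       # few distinct values, so ties occur
a["f1"] = np.random.randint(1, 10 ** 9, n)
a.sort()
sorted_ok = is_lex_sorted(a)
```

With a working sort, sorted_ok is expected to be True. On an affected build with 2 ** 31 elements, the same check would be expected to fail.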
I have the same problem with sort, and none of the three sort kinds works.
It reports a segmentation fault.
Here are the gdb results:
GNU gdb (GDB) 7.1-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /usr/bin/python...(no debugging symbols found)...done.
(gdb) run mockgalaxy.py
Starting program: /usr/bin/python mockgalaxy.py
[Thread debugging using libthread_db enabled]
[New Thread 0x7fffee4e3700 (LWP 22553)]
Program received signal SIGSEGV, Segmentation fault.
_new_argsort (op=, axis=0, which=) at numpy/core/src/multiarray/item_selection.c:880
880 *iptr++ = i;
Here is my solution:
change line 818 of the file numpy-1.6.2/numpy/core/src/multiarray/item_selection.c
to
long long needcopy = 0, i;
then rebuild and reinstall NumPy.
It should work.
Nice catch! Would you mind preparing a pull request? The correct datatype for i in this context is npy_intp, just like N. (i is also used for iterating over the keys, whose count I think can safely be assumed to fit in an int, but that is fine. A check on the sequence length would not hurt, though I admit that giving 2**31 keys is pretty absurd; this may be in lexsort rather than in _new_argsort.) The lexsort function seems to have the same problem.
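To illustrate why the type of the counter matters: on typical 64-bit Linux builds a C int is 32 bits wide and cannot represent the index 2**31, while a pointer-sized signed integer can. The sketch below uses Python's ctypes module to show the difference. ctypes.c_ssize_t is used here only as a stand-in for npy_intp, on the assumption that both are pointer-sized signed integers on the platform in question.

```python
import ctypes

n = 2 ** 31  # array length from the report

# ctypes does no overflow checking, so the value is silently truncated
# to the width of the C type.
as_int = ctypes.c_int(n).value        # C int, 32 bits on common platforms
as_ssize = ctypes.c_ssize_t(n).value  # pointer-sized signed integer

int_bits = ctypes.sizeof(ctypes.c_int) * 8
ssize_bits = ctypes.sizeof(ctypes.c_ssize_t) * 8
print("int     (%d bits): %d" % (int_bits, as_int))
print("ssize_t (%d bits): %d" % (ssize_bits, as_ssize))
```

Where int is 32 bits, the first value is no longer 2**31, which is why a counter of that type cannot index every element of such an array.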
I totally agree with the change of datatype of i. It is much safer with npy_intp.
Go ahead with the pull request.
Fixed in master, and there is an open PR for backporting to 1.7.