Bug in sorting structured numpy array with more than 2^31 elements #427

Closed
parantapa opened this Issue Sep 5, 2012 · 5 comments

When a structured array has 2^31 or more elements, sorting doesn't work. The
sort function returns immediately without sorting. The following is a simple
test case.

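import numpy as np

# Structured array with two unsigned 32-bit fields (default names "f0" and "f1")
dt = "u4, u4"
a = np.empty(2 ** 31, dtype=dt)
a["f0"][:] = np.random.randint(1, 1e9, 2 ** 31)
a["f1"][:] = np.random.randint(1, 1e9, 2 ** 31)

a.sort(order=["f0", "f1"])

# Check that adjacent records are in non-decreasing (f0, f1) order
for i in xrange(len(a) - 1):
    u1, v1 = a[i]
    u2, v2 = a[i + 1]

    assert u1 < u2 or (u1 == u2 and v1 <= v2)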

The above has been tested using Python 2.7 with NumPy version 1.6.2 as well as
the NumPy Git version a72ce7e. The test was done on a 64-bit Linux system with
48G of memory.
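
The element-by-element loop in the test case is slow at this size. A minimal sketch of the same ordering check done with array comparisons, shown on a small array so it runs quickly; the names `is_sorted_by_fields` and `small` are hypothetical and not part of the original report:

```python
import numpy as np

def is_sorted_by_fields(a, first="f0", second="f1"):
    """Return True if `a` is in non-decreasing (first, second) order."""
    u, v = a[first], a[second]
    # Same condition as the assert in the test case, applied to every
    # adjacent pair at once: u1 < u2, or u1 == u2 and v1 <= v2.
    ok = (u[:-1] < u[1:]) | ((u[:-1] == u[1:]) & (v[:-1] <= v[1:]))
    return bool(ok.all())

# Small stand-in for the 2 ** 31 element array (size chosen only for speed).
rng = np.random.RandomState(0)
n = 1000
small = np.empty(n, dtype="u4, u4")
small["f0"] = rng.randint(1, 50, n)
small["f1"] = rng.randint(1, 50, n)

small.sort(order=["f0", "f1"])
assert is_sorted_by_fields(small)
```

The same function could be applied to the full-size array, at the cost of a few temporary boolean arrays of nearly the same length.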

ilaudy commented Jan 25, 2013

I have the same problem with sort; none of the three sort kinds works.
It reports a segmentation fault.
Here is the gdb output:

GNU gdb (GDB) 7.1-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /usr/bin/python...(no debugging symbols found)...done.
(gdb) run mockgalaxy.py
Starting program: /usr/bin/python mockgalaxy.py
[Thread debugging using libthread_db enabled]
[New Thread 0x7fffee4e3700 (LWP 22553)]
1024 1105

Program received signal SIGSEGV, Segmentation fault.
_new_argsort (op=, axis=0, which=) at numpy/core/src/multiarray/item_selection.c:880
880 *iptr++ = i;

ilaudy commented Jan 25, 2013

Here is my solution:
In the file numpy-1.6.2/numpy/core/src/multiarray/item_selection.c,
change line 818 to
long long needcopy = 0, i;
Then rebuild and reinstall numpy.
It should work.
Enjoy!

seberg (Member) commented Jan 25, 2013

Nice catch! Would you mind preparing a pull request? The correct datatype for i in this context is npy_intp, just like N. (i is also used to iterate over the keys, whose count I think can safely be assumed to fit in an int, but using npy_intp there is fine too. A check on the sequence length would not hurt, though I admit that giving 2**31 keys is pretty absurd; this may be in lexsort rather than in _new_argsort.) The lexsort function seems to have the same problem.
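
To illustrate why the counter type matters, here is a minimal sketch of what happens when the element count from the test case is stored in a 32-bit signed integer versus a pointer-sized one. It assumes the C `int` is 32 bits wide (as on the x86_64 Linux system in the report) and uses `ctypes.c_ssize_t` as a stand-in for `npy_intp`; how exactly the wrapped counter leads to the crash at line 880 is not spelled out in this thread.

```python
import ctypes
import numpy as np

n = 2 ** 31  # element count used in the test case

# Largest value a signed 32-bit counter can hold: one less than n.
int32_max = np.iinfo(np.int32).max

# ctypes does no overflow checking, so storing n in a 32-bit signed
# integer wraps around to a negative value.
as_int32 = ctypes.c_int32(n).value

# A pointer-sized signed integer (stand-in for npy_intp) holds n without
# wrapping on a 64-bit platform; on a 32-bit platform it would wrap too.
as_ssize = ctypes.c_ssize_t(n).value

print(int32_max, as_int32, as_ssize)
```

In C, overflowing a signed `int` is undefined behavior, so the wraparound shown here is only the typical outcome, not a guarantee.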

ilaudy commented Jan 25, 2013

I totally agree with the change of datatype of i. It is much safer with npy_intp.
Go ahead with the pull request.

seberg (Member) commented Feb 13, 2013

Fixed in master, and there is an open PR for backporting to 1.7.

seberg closed this Feb 13, 2013
