Pandas series.rank(ascending=False) get rise to SIGSEGV error #13445

HaoXJ · 2016-06-15T07:37:50Z

I read a DataFrame from an HDF5 file and rank one of its columns, but this raises a SIGSEGV.
Maybe it's a numpy or tables (PyTables) bug.

Code Sample, a copy-pastable example if possible

import pandas as pd

def test_df_ranks(f):
    df = pd.read_hdf(f, key="t")
    print(df.shape)
    print(type(df))
    print(df)
    s = df.non_current_asset_to_total_asset
    # s.rank()  # rank() works properly
    s.rank(ascending=False)  # rank(ascending=False) crashes

Expected Output

I expected this to work, but it raises a SIGSEGV instead.
The stack trace is as follows:
#7 OBJECT_compare (ip1=0x47a3ef4b2e420, ip2=0x7f5c5413f128, __NPY_UNUSED_TAGGEDap=0x7f5cd0100760) at numpy/core/src/multiarray/arraytypes.c.src:2753
#8 0x00007f5d0142c50e in npy_aquicksort (vv=vv@entry=0x7f5c5413f060, tosort=tosort@entry=0x7f5c5413cc80, num=num@entry=52, varr=varr@entry=0x7f5cd0100760) at numpy/core/src/npysort/quicksort.c.src:480
#9 0x00007f5d0139a78a in _new_argsortlike (op=op@entry=0x7f5cd0100760, axis=0, argsort=argsort@entry=0x7f5d0142c310 <npy_aquicksort>, argpart=argpart@entry=0x0, kth=kth@entry=0x0, nkth=nkth@entry=0) at numpy/core/src/multiarray/item_selection.c:1035
#10 0x00007f5d0139dd7b in PyArray_ArgSort (op=op@entry=0x7f5cd0100760, axis=0, which=) at numpy/core/src/multiarray/item_selection.c:1309
#11 0x00007f5d013dd012 in array_argsort (self=0x7f5cd0100760, args=, kwds=) at numpy/core/src/multiarray/methods.c:1278
#12 0x00007f5cf4eef28f in __Pyx_PyObject_Call (func=0x7f5cd1a1acc8, arg=0x7f5d0f900048, kw=0x0) at pandas/algos.c:201388
#13 0x00007f5cf504e006 in __pyx_pf_6pandas_5algos_8rank_1d_generic (__pyx_v_in_arr=__pyx_v_in_arr@entry=0x7f5cd0100620, __pyx_v_retry=1, __pyx_v_ties_method=0x7f5cf6999768, __pyx_v_ascending=0x7f5d0f6bd700 <_Py_FalseStruct>, __pyx_v_na_option=, __pyx_v_pct=0x7f5d0f6bd700 <_Py_FalseStruct>, __pyx_self=) at pandas/algos.c:14942
#14 0x00007f5cf5050481 in __pyx_pw_6pandas_5algos_9rank_1d_generic (__pyx_self=, __pyx_args=, __pyx_kwds=0x7f5cd8659488) at pandas/algos.c:14439
#15 0x00007f5d0f3b9477 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#16 0x00007f5d0f3b9f3e in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#17 0x00007f5d0f3b7a12 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#18 0x00007f5d0f3b9f3e in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#19 0x00007f5d0f3b7a12 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#20 0x00007f5d0f3b8e40 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#21 0x00007f5d0f3b9f3e in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#22 0x00007f5d0f3b7a12 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#23 0x00007f5d0f3b9f3e in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#24 0x00007f5d0f32a4b3 in function_call () from /lib64/libpython3.4m.so.1.0
#25 0x00007f5d0f301dcc in PyObject_Call () from /lib64/libpython3.4m.so.1.0
#26 0x00007f5d0f3b57c9 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0

output of `pd.show_versions()`

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Darwin
OS-release: 14.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.1
nose: None
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.10.4
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.0
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: 0.6.7.None
psycopg2: None
Jinja2: None

The HDF5 data file is in my GitHub repository:
https://github.com/HaoXJ/codefail/blob/master/data/test.h5

jreback · 2016-06-15T11:27:20Z

pls show a minimal example that can be copy-pasted, whether it's the HDF5 reading or something else.

jreback · 2016-06-16T12:23:30Z

closing as not-reproducible. @HaoXJ if you post an example then pls comment.

HaoXJ · 2016-06-22T13:57:31Z

import sys
import pandas as pd
def series_rank_test(f):
    df = pd.read_hdf(f, key="bad_series")
    #df.rank()
    df.rank(ascending=False)
if __name__=="__main__":
    series_rank_test(sys.argv[1])

The data is at https://github.com/HaoXJ/codefail/blob/master/data/test.h5

jorisvandenbossche · 2016-06-23T10:13:22Z

I can reproduce this.

However, I can't trim it down to a simple example. If I recreate the Series from a dict (the result of to_dict on the HDF5 file):

import pandas as pd
from math import nan

s = pd.Series({'000022.XSHE': nan,
 '000089.XSHE': nan,
 '000099.XSHE': nan,
 '000429.XSHE': nan,
 '000507.XSHE': nan,
 '000520.XSHE': nan,
 '000548.XSHE': nan,
 '000828.XSHE': nan,
 '000900.XSHE': nan,
 '000905.XSHE': nan,
 '000916.XSHE': nan,
 '002040.XSHE': nan,
 '600004.XSHG': nan,
 '600009.XSHG': nan,
 '600012.XSHG': nan,
 '600020.XSHG': nan,
 '600026.XSHG': nan,
 '600033.XSHG': nan,
 '600035.XSHG': nan,
 '600057.XSHG': nan,
 '600077.XSHG': nan,
 '600106.XSHG': nan,
 '600115.XSHG': nan,
 '600119.XSHG': nan,
 '600125.XSHG': nan,
 '600153.XSHG': nan,
 '600180.XSHG': nan,
 '600190.XSHG': nan,
 '600221.XSHG': nan,
 '600269.XSHG': nan,
 '600270.XSHG': nan,
 '600317.XSHG': nan,
 '600350.XSHG': nan,
 '600368.XSHG': nan,
 '600377.XSHG': nan,
 '600428.XSHG': nan,
 '600548.XSHG': nan,
 '600561.XSHG': nan,
 '600575.XSHG': nan,
 '600611.XSHG': nan,
 '600650.XSHG': nan,
 '600662.XSHG': nan,
 '600676.XSHG': nan,
 '600692.XSHG': nan,
 '600717.XSHG': nan,
 '600751.XSHG': nan,
 '600787.XSHG': nan,
 '600794.XSHG': nan,
 '600798.XSHG': nan,
 '600834.XSHG': nan,
 '600896.XSHG': nan,
 '600897.XSHG': nan}, name="non_current_asset_to_total_asset")

s.astype(object).rank(ascending=False)

This also crashes Python for me. In any case, it has something to do with the Series being object dtype rather than float (the read_hdf function returns such a Series when all values are NaN).
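
The object-versus-float distinction described above can be checked directly. The following is a minimal sketch and not something posted in the thread: it builds a small, hypothetical all-NaN object Series (three of the index labels above stand in for the HDF5 data) and converts it to float before ranking. Whether this conversion sidesteps the crash on pandas 0.17.1 is an assumption; nobody in this thread tested it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the all-NaN column that read_hdf returned
# with object dtype in the report.
s = pd.Series(
    [np.nan, np.nan, np.nan],
    index=["000022.XSHE", "000089.XSHE", "000099.XSHE"],
    dtype=object,
    name="non_current_asset_to_total_asset",
)
print(s.dtype)  # object: the dtype involved in the crash

# Assumed workaround: convert to float so ranking does not go through
# the object-dtype code path seen in the stack trace (rank_1d_generic).
as_float = s.astype(float)
ranked = as_float.rank(ascending=False)
print(ranked)
```

With the default na_option, missing values keep a NaN rank, so an all-NaN float Series should come back as all NaN, which matches what the reporter says they expected.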

HaoXJ · 2016-06-29T13:43:53Z

So the bug is in Series.rank(); I expected it to return NaN.
Thanks for the answer.

jreback added Can't Repro Usage Question labels Jun 15, 2016

jreback closed this as completed Jun 16, 2016

jorisvandenbossche reopened this Jun 23, 2016

jorisvandenbossche removed Can't Repro Usage Question labels Jun 23, 2016

jorisvandenbossche added the Bug and Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) labels Jun 23, 2016

HaoXJ closed this as completed Jun 29, 2016

HaoXJ reopened this Jun 29, 2016

jreback added this to the Next Major Release milestone Jun 30, 2016

jreback added Difficulty Intermediate labels Jun 30, 2016

dsm054 added a commit to dsm054/pandas that referenced this issue Aug 16, 2016

BUG: Avoid sentinel-infinity comparison problems (pandas-dev#13445)

7d79370

dsm054 mentioned this issue Aug 16, 2016

BUG: Avoid sentinel-infinity comparison problems (#13445) #14006

Closed

4 tasks

jreback modified the milestones: 0.19.0, Next Major Release Aug 16, 2016

jreback closed this as completed in 5c27c02 Aug 16, 2016
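
The thread itself does not describe the fix. Going only by the commit title ("Avoid sentinel-infinity comparison problems"), one plausible reading is that a placeholder "infinity" object substituted for missing values gave contradictory answers when compared with another placeholder, which is something a C-level sort cannot tolerate. The sketch below illustrates that idea only; it is an assumption, not the pandas code, and both class names are hypothetical.

```python
class NaiveNegInf:
    """Hypothetical sentinel that claims to be smaller than everything,
    including another sentinel of the same kind."""

    def __lt__(self, other):
        return True


class ConsistentNegInf:
    """Hypothetical sentinel that is smaller than everything except
    another sentinel, to which it compares equal."""

    def __eq__(self, other):
        return isinstance(other, ConsistentNegInf)

    def __hash__(self):
        return hash(ConsistentNegInf)

    def __lt__(self, other):
        return not isinstance(other, ConsistentNegInf)

    def __gt__(self, other):
        return False


naive_a, naive_b = NaiveNegInf(), NaiveNegInf()
good_a, good_b = ConsistentNegInf(), ConsistentNegInf()

# The naive sentinel says a < b and b < a at the same time.
naive_contradiction = (naive_a < naive_b) and (naive_b < naive_a)

# The consistent sentinel never orders two sentinels against each other,
# yet it still sorts ahead of ordinary numbers.
consistent_ok = not (good_a < good_b) and not (good_b < good_a)
ordered = sorted([1.0, good_a, 0.5, good_b])

print(naive_contradiction, consistent_ok)
print(ordered[2:])
```

A Series in which every value is missing, as in this report, would consist entirely of such placeholders, which would be the case where a contradictory comparison matters most.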
