Pandas Series.rank(ascending=False) gives rise to SIGSEGV error #13445

Closed
HaoXJ opened this issue Jun 15, 2016 · 5 comments
Labels
Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug

Comments

HaoXJ commented Jun 15, 2016

I read a DataFrame from an HDF5 file and rank one of its columns, but this raises a SIGSEGV.
Maybe it's a NumPy or PyTables bug.

Code Sample, a copy-pastable example if possible

import pandas as pd

def test_df_ranks(f):
    df = pd.read_hdf(f, key="t")
    print(df.shape)
    print(type(df))
    print(df)
    s = df.non_current_asset_to_total_asset
    # s.rank()  # rank() works properly
    s.rank(ascending=False)  # rank(ascending=False) crashes

Expected Output

I expected it to work, but it crashed with a SIGSEGV error.
The stack trace is as follows:
#7 OBJECT_compare (ip1=0x47a3ef4b2e420, ip2=0x7f5c5413f128, __NPY_UNUSED_TAGGEDap=0x7f5cd0100760) at numpy/core/src/multiarray/arraytypes.c.src:2753
#8 0x00007f5d0142c50e in npy_aquicksort (vv=vv@entry=0x7f5c5413f060, tosort=tosort@entry=0x7f5c5413cc80, num=num@entry=52, varr=varr@entry=0x7f5cd0100760) at numpy/core/src/npysort/quicksort.c.src:480
#9 0x00007f5d0139a78a in _new_argsortlike (op=op@entry=0x7f5cd0100760, axis=0, argsort=argsort@entry=0x7f5d0142c310 <npy_aquicksort>, argpart=argpart@entry=0x0, kth=kth@entry=0x0, nkth=nkth@entry=0) at numpy/core/src/multiarray/item_selection.c:1035
#10 0x00007f5d0139dd7b in PyArray_ArgSort (op=op@entry=0x7f5cd0100760, axis=0, which=) at numpy/core/src/multiarray/item_selection.c:1309
#11 0x00007f5d013dd012 in array_argsort (self=0x7f5cd0100760, args=, kwds=) at numpy/core/src/multiarray/methods.c:1278
#12 0x00007f5cf4eef28f in __Pyx_PyObject_Call (func=0x7f5cd1a1acc8, arg=0x7f5d0f900048, kw=0x0) at pandas/algos.c:201388
#13 0x00007f5cf504e006 in __pyx_pf_6pandas_5algos_8rank_1d_generic (__pyx_v_in_arr=__pyx_v_in_arr@entry=0x7f5cd0100620, __pyx_v_retry=1, __pyx_v_ties_method=0x7f5cf6999768, __pyx_v_ascending=0x7f5d0f6bd700 <_Py_FalseStruct>, __pyx_v_na_option=, __pyx_v_pct=0x7f5d0f6bd700 <_Py_FalseStruct>, __pyx_self=) at pandas/algos.c:14942
#14 0x00007f5cf5050481 in __pyx_pw_6pandas_5algos_9rank_1d_generic (__pyx_self=, __pyx_args=, __pyx_kwds=0x7f5cd8659488) at pandas/algos.c:14439
#15 0x00007f5d0f3b9477 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#16 0x00007f5d0f3b9f3e in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#17 0x00007f5d0f3b7a12 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#18 0x00007f5d0f3b9f3e in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#19 0x00007f5d0f3b7a12 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#20 0x00007f5d0f3b8e40 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#21 0x00007f5d0f3b9f3e in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#22 0x00007f5d0f3b7a12 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#23 0x00007f5d0f3b9f3e in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#24 0x00007f5d0f32a4b3 in function_call () from /lib64/libpython3.4m.so.1.0
#25 0x00007f5d0f301dcc in PyObject_Call () from /lib64/libpython3.4m.so.1.0
#26 0x00007f5d0f3b57c9 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0

output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Darwin
OS-release: 14.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.1
nose: None
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.10.4
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.0
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: 0.6.7.None
psycopg2: None
Jinja2: None

The HDF5 data is in my GitHub repository:
https://github.com/HaoXJ/codefail/blob/master/data/test.h5

jreback commented Jun 15, 2016

Please show a minimal, copy-pastable example, so we can tell whether the problem is the HDF5 reading or something else.

jreback commented Jun 16, 2016

Closing as not reproducible. @HaoXJ, if you post an example, please comment.

jreback closed this as completed Jun 16, 2016
HaoXJ commented Jun 22, 2016

import sys
import pandas as pd
def series_rank_test(f):
    df = pd.read_hdf(f, key="bad_series")
    #df.rank()
    df.rank(ascending=False)
if __name__=="__main__":
    series_rank_test(sys.argv[1])

The data is at https://github.com/HaoXJ/codefail/blob/master/data/test.h5

jorisvandenbossche commented

I can reproduce this.

However, I can't trim it down to a simple example. If I recreate the Series from a dict (the result of to_dict on the data from the h5 file) and cast it to object dtype:

import pandas as pd
from math import nan

s = pd.Series({'000022.XSHE': nan,
 '000089.XSHE': nan,
 '000099.XSHE': nan,
 '000429.XSHE': nan,
 '000507.XSHE': nan,
 '000520.XSHE': nan,
 '000548.XSHE': nan,
 '000828.XSHE': nan,
 '000900.XSHE': nan,
 '000905.XSHE': nan,
 '000916.XSHE': nan,
 '002040.XSHE': nan,
 '600004.XSHG': nan,
 '600009.XSHG': nan,
 '600012.XSHG': nan,
 '600020.XSHG': nan,
 '600026.XSHG': nan,
 '600033.XSHG': nan,
 '600035.XSHG': nan,
 '600057.XSHG': nan,
 '600077.XSHG': nan,
 '600106.XSHG': nan,
 '600115.XSHG': nan,
 '600119.XSHG': nan,
 '600125.XSHG': nan,
 '600153.XSHG': nan,
 '600180.XSHG': nan,
 '600190.XSHG': nan,
 '600221.XSHG': nan,
 '600269.XSHG': nan,
 '600270.XSHG': nan,
 '600317.XSHG': nan,
 '600350.XSHG': nan,
 '600368.XSHG': nan,
 '600377.XSHG': nan,
 '600428.XSHG': nan,
 '600548.XSHG': nan,
 '600561.XSHG': nan,
 '600575.XSHG': nan,
 '600611.XSHG': nan,
 '600650.XSHG': nan,
 '600662.XSHG': nan,
 '600676.XSHG': nan,
 '600692.XSHG': nan,
 '600717.XSHG': nan,
 '600751.XSHG': nan,
 '600787.XSHG': nan,
 '600794.XSHG': nan,
 '600798.XSHG': nan,
 '600834.XSHG': nan,
 '600896.XSHG': nan,
 '600897.XSHG': nan}, name="non_current_asset_to_total_asset")

s.astype(object).rank(ascending=False)

This also crashes Python for me. In any case, it has something to do with the Series having object dtype rather than float (read_hdf returns such a Series when all values are NaN).
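Since the crash is tied to the object dtype that read_hdf produced for the all-NaN column, one possible workaround, assuming the column is meant to be numeric, is to cast it to float before ranking. This is a sketch that builds a comparable Series directly instead of reading the HDF5 file; it has not been checked against the pandas 0.17.1 / NumPy 1.10.4 setup from the report.

```python
import numpy as np
import pandas as pd

# Stand-in for the column read_hdf returned: all NaN, but object dtype.
s = pd.Series([np.nan] * 52, dtype=object, name="non_current_asset_to_total_asset")

# Workaround sketch: cast to float64 so rank() works on float values
# instead of Python objects (the trace above shows the object path,
# rank_1d_generic, crashing inside the object argsort).
ranked = s.astype(float).rank(ascending=False)

print(ranked.dtype)  # float64
```

With every value missing, the ranked result is simply all NaN.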

jorisvandenbossche added the Bug and Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) labels Jun 23, 2016
HaoXJ commented Jun 29, 2016

So the bug is in Series.rank(); I expected it to return NaN.
Thanks for the answer.
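The expected result follows from the defaults of Series.rank (method='average', na_option='keep'): missing values keep a NaN rank, so an all-NaN input should produce all-NaN output. For comparison only, here is a minimal pure-Python sketch of those semantics with ascending=False; rank_desc is a hypothetical helper, not pandas' Cython rank_1d_generic.

```python
import math
from typing import List, Sequence


def rank_desc(values: Sequence[object]) -> List[float]:
    """Average ranks in descending order; None/NaN entries stay NaN.

    Hypothetical helper mimicking Series.rank(ascending=False) with
    method='average' and na_option='keep'.
    """
    # Coerce numeric values to float first (None -> NaN).
    floats = [math.nan if v is None else float(v) for v in values]
    result = [math.nan] * len(floats)

    # Only non-NaN entries are sorted, so NaN is never compared.
    valid = [i for i, x in enumerate(floats) if not math.isnan(x)]
    valid.sort(key=lambda i: floats[i], reverse=True)

    pos = 0
    while pos < len(valid):
        # Extend `end` over the run of values tied with valid[pos].
        end = pos
        while end + 1 < len(valid) and floats[valid[end + 1]] == floats[valid[pos]]:
            end += 1
        # Ranks are 1-based; tied entries share the mean of their positions.
        avg = (pos + 1 + end + 1) / 2.0
        for k in range(pos, end + 1):
            result[valid[k]] = avg
        pos = end + 1
    return result
```

For example, rank_desc([3.0, None, 1.0, 3.0]) gives [1.5, nan, 3.0, 1.5]: the two 3.0 values tie for first place and None stays NaN, while an all-NaN input returns all NaN.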

HaoXJ closed this as completed Jun 29, 2016
HaoXJ reopened this Jun 29, 2016
jreback added this to the Next Major Release milestone Jun 30, 2016
jreback modified the milestones: 0.19.0, Next Major Release Aug 16, 2016
3 participants