Segmentation fault when constructing DataFrame with specified datetime dtype of one column #5191

Closed
agravier opened this issue Oct 12, 2013 · 3 comments · Fixed by #5192
Labels: Bug, Error Reporting (Incorrect or improved errors from pandas)

@agravier

Description

When building a DataFrame with specified column names and dtypes, one might expect one of two possible behaviours:

  • The column names and dtypes specs are perfectly cromulent, and Pandas goes on to build the object.
  • The column names or dtypes don't match the data shape, or the dtypes are badly specified, and Pandas gives an error message.

Instead, I have encountered a segmentation fault.

Now, it is unclear to me whether my column names and dtype spec are written correctly, or whether my data is valid (see the example below). Either way, pandas should not crash.

Reproducing

To reproduce, please run:

import pandas as pd
import datetime as dt
import itertools as it

df_test = pd.DataFrame(data = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9)),
                       columns=["A", "B", "C"],
                       dtype=[("A","datetime64[h]"), ("B","str"), ("C","int32")])

Modes of failure

I have found that the above script always crashes on my machine (see the configuration section below for details). It fails in one of two ways:

First mode of failure: hanging

Python 2.7.5 (default, Sep  6 2013, 09:55:21) 
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import datetime as dt
>>> import itertools as it
>>> 
>>> df_test = pd.DataFrame(data = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9)),
...                        columns=["A", "B", "C"],
...                        dtype=[("A","datetime64[h]"), ("B","str"), ("C","int32")])
*** Error in `python': corrupted double-linked list: 0x0000000001bfd8e0 ***

After that line, the terminal stops responding.

Second mode of failure: segfault

Python 2.7.5 (default, Sep  6 2013, 09:55:21)
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import datetime as dt
>>> import itertools as it
>>>
>>> df_test = pd.DataFrame(data = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9)),
...                        columns=["A", "B", "C"],
...                        dtype=[("A","datetime64[h]"), ("B","str"), ("C","int32")])
*** Error in `python2': double free or corruption (!prev): 0x00000000027161d0 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x72ecf)[0x7f2bd7ab9ecf]
/usr/lib/libc.so.6(+0x7869e)[0x7f2bd7abf69e]
/usr/lib/libc.so.6(+0x79377)[0x7f2bd7ac0377]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(_field_transfer_data_free+0x2e)[0x7f2bd634d47e]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0x9a1c9)[0x7f2bd63a61c9]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0xa4a3a)[0x7f2bd63b0a3a]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0xab0a1)[0x7f2bd63b70a1]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0xb838b)[0x7f2bd63c438b]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0xb8643)[0x7f2bd63c4643]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4c2f)[0x7f2bd80ec2ef]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dc9)[0x7f2bd80ec489]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dc9)[0x7f2bd80ec489]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dc9)[0x7f2bd80ec489]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dc9)[0x7f2bd80ec489]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(+0x6dbdd)[0x7f2bd807cbdd]
/usr/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f2bd8058c13]
/usr/lib/libpython2.7.so.1.0(+0x5841d)[0x7f2bd806741d]
/usr/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f2bd8058c13]
/usr/lib/libpython2.7.so.1.0(+0x9de57)[0x7f2bd80ace57]
/usr/lib/libpython2.7.so.1.0(+0x9cbcf)[0x7f2bd80abbcf]
/usr/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f2bd8058c13]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x1321)[0x7f2bd80e89e1]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f2bd80ed392]
/usr/lib/libpython2.7.so.1.0(+0xf708f)[0x7f2bd810608f]
/usr/lib/libpython2.7.so.1.0(PyRun_InteractiveOneFlags+0x140)[0x7f2bd8107fb0]
/usr/lib/libpython2.7.so.1.0(PyRun_InteractiveLoopFlags+0x4e)[0x7f2bd810819e]
/usr/lib/libpython2.7.so.1.0(PyRun_AnyFileExFlags+0x3e)[0x7f2bd81087fe]
/usr/lib/libpython2.7.so.1.0(Py_Main+0xc7f)[0x7f2bd8118c2f]
/usr/lib/libc.so.6(__libc_start_main+0xf5)[0x7f2bd7a68bc5]
python2[0x400741]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:11 1886483                            /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/bin/python2
00600000-00601000 r--p 00000000 08:11 1886483                            /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/bin/python2
00601000-00602000 rw-p 00001000 08:11 1886483                            /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/bin/python2
012d1000-029b7000 rw-p 00000000 00:00 0                                  [heap]
7f2bced0d000-7f2bced11000 r-xp 00000000 08:01 923895                     /usr/lib/python2.7/lib-dynload/termios.so
7f2bced11000-7f2bcef10000 ---p 00004000 08:01 923895                     /usr/lib/python2.7/lib-dynload/termios.so
7f2bcef10000-7f2bcef11000 r--p 00003000 08:01 923895                     /usr/lib/python2.7/lib-dynload/termios.so
7f2bcef11000-7f2bcef13000 rw-p 00004000 08:01 923895                     /usr/lib/python2.7/lib-dynload/termios.so
7f2bcef13000-7f2bcef26000 r-xp 00000000 08:11 57747                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/json.so
7f2bcef26000-7f2bcf125000 ---p 00013000 08:11 57747                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/json.so
7f2bcf125000-7f2bcf126000 r--p 00012000 08:11 57747                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/json.so
7f2bcf126000-7f2bcf127000 rw-p 00013000 08:11 57747                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/json.so
7f2bcf127000-7f2bcf171000 r-xp 00000000 08:11 57858                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/parser.so
7f2bcf171000-7f2bcf370000 ---p 0004a000 08:11 57858                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/parser.so
7f2bcf370000-7f2bcf371000 r--p 00049000 08:11 57858                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/parser.so
7f2bcf371000-7f2bcf376000 rw-p 0004a000 08:11 57858                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/parser.so
7f2bcf376000-7f2bcf377000 rw-p 00000000 00:00 0
7f2bcf377000-7f2bcf3d9000 r-xp 00000000 08:01 798526                     /usr/lib/libssl.so.1.0.0
7f2bcf3d9000-7f2bcf5d8000 ---p 00062000 08:01 798526                     /usr/lib/libssl.so.1.0.0
7f2bcf5d8000-7f2bcf5dc000 r--p 00061000 08:01 798526                     /usr/lib/libssl.so.1.0.0
7f2bcf5dc000-7f2bcf5e3000 rw-p 00065000 08:01 798526                     /usr/lib/libssl.so.1.0.0
7f2bcf5e3000-7f2bcf5eb000 r-xp 00000000 08:01 923889                     /usr/lib/python2.7/lib-dynload/_ssl.so
Aborted (core dumped)

Configuration information

Python:

Python 2.7.5

uname -a:

Linux agravier-archvm 3.10.10-1-ARCH #1 SMP PREEMPT Fri Aug 30 11:30:06 CEST 2013 x86_64 GNU/Linux

pip freeze --local:

QSTK==0.2.6
matplotlib==1.3.0
nose==1.3.0
numpy==1.7.1
pandas==0.12.0
pyparsing==2.0.1
python-dateutil==2.1
pytz==2013.7
scikit-learn==0.14.1
scipy==0.12.1
six==1.4.1
yolk==0.4.3

Concluding remarks

Note that the number of rows in the data, list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9)), affects whether Python crashes. With fewer than 9 rows, I get this output instead:

Python 2.7.5 (default, Sep  6 2013, 09:55:21)
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import datetime as dt
>>> import itertools as it
>>>
>>> df_test = pd.DataFrame(data = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 8)),
...                        columns=["A", "B", "C"],
...                        dtype=[("A","datetime64[h]"), ("B","str"), ("C","int32")])
>>> df_test
                     A                           B                            C
0  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
1  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
2  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
3  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
4  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
5  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
6  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
7  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)

Now, this output doesn't make much sense to me, and it doesn't seem to respect the dtype spec that I gave. It is quite possible, though, that I don't understand the dtype spec well and that this is actually perfectly sensible output.
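One way to see what the dtype argument in the report actually describes is to hand the same list to NumPy directly. The following is only an inspection sketch (the variable names are illustrative); it shows that NumPy reads a list of (name, type) tuples as a single structured (compound) dtype with three fields, not as three per-column dtypes, which may be why each cell in the output above looks like a three-element record.

```python
import numpy as np

# The same spec that was passed as dtype= in the report.
spec = [("A", "datetime64[h]"), ("B", "str"), ("C", "int32")]

# NumPy interprets a list of (name, type) tuples as ONE structured dtype.
compound = np.dtype(spec)

# Structured dtypes expose their field names; plain dtypes have names == None.
print(compound.names)
print(np.dtype("int32").names)
```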

@jreback
Contributor

jreback commented Oct 12, 2013

Specifying a dtype will try to coerce to that dtype, but it must be a singular (not compound) type; issue #4464 may allow this at some point.

Works fine w/o specifying a dtype.

In [8]: df_test = pd.DataFrame(data = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9)),
                       columns=["A", "B", "C"])

In [9]: df_test
Out[9]: 
                    A   B   C
0 2001-01-01 00:00:00  aa  20
1 2001-01-01 00:00:00  aa  20
2 2001-01-01 00:00:00  aa  20
3 2001-01-01 00:00:00  aa  20
4 2001-01-01 00:00:00  aa  20
5 2001-01-01 00:00:00  aa  20
6 2001-01-01 00:00:00  aa  20
7 2001-01-01 00:00:00  aa  20
8 2001-01-01 00:00:00  aa  20

In [10]: df_test.dtypes
Out[10]: 
A    datetime64[ns]
B            object
C             int64
dtype: object

You can cast a specific column if you want

In [11]: df_test['C'] = df_test['C'].astype(np.int32)

In [12]: df_test.dtypes
Out[12]: 
A    datetime64[ns]
B            object
C             int32
dtype: object

datetime64[h] is not a valid dtype in pandas, nor is it necessary. str is converted to object and is not necessary either.

That said, this shouldn't dump core.
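The workaround described in this comment can be put together as a short script. This is a minimal sketch, assuming a pandas version newer than the 0.12.0 in the report, since it passes a column-to-dtype mapping to astype; the names rows and df_cast are illustrative and not from the thread.

```python
import datetime as dt
import itertools as it

import pandas as pd

# Same nine rows as in the report.
rows = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9))

# Build the frame WITHOUT a dtype argument and let pandas infer column types.
df_test = pd.DataFrame(rows, columns=["A", "B", "C"])

# Then cast only the column that needs a non-default dtype.
df_cast = df_test.astype({"C": "int32"})

print(df_cast.dtypes)
```

Column A is inferred as a datetime column and needs no explicit dtype; only C is narrowed from the default integer type.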

@agravier
Author

@jreback Thanks for the explanations. I suspected that my dtype specification was incorrect one way or another.

@jreback
Contributor

jreback commented Oct 12, 2013

Yep, thanks for the bug report. I'm putting in an error message for 0.13, so this should raise NotImplementedError.
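The kind of check described here could look roughly like the sketch below. It is not the actual pandas change from #5192; reject_compound_dtype is a hypothetical helper, and the error message text is made up for illustration. It only shows the idea of rejecting a compound dtype up front instead of passing it on to NumPy.

```python
import numpy as np


def reject_compound_dtype(dtype):
    """Hypothetical guard: return a resolved singular dtype or raise.

    Not pandas' real implementation, just an illustration of the idea.
    """
    if dtype is None:
        return None
    resolved = np.dtype(dtype)
    # Structured (compound) dtypes have named fields; singular dtypes do not.
    if resolved.names is not None:
        raise NotImplementedError(
            "compound dtypes are not supported here; cast columns individually"
        )
    return resolved


# A singular dtype passes through unchanged.
ok = reject_compound_dtype("int32")

# The spec from the report is rejected with an exception, not a crash.
try:
    reject_compound_dtype([("A", "datetime64[h]"), ("B", "str"), ("C", "int32")])
    rejected = False
except NotImplementedError:
    rejected = True
```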
