Segmentation fault with read_csv(io.StringIO("a\na"), float_precision="round_trip") #15140

Closed
Rufflewind opened this issue Jan 16, 2017 · 4 comments
Labels: Bug, IO CSV (read_csv, to_csv)

@Rufflewind
Contributor

Rufflewind commented Jan 16, 2017

Code Sample, a copy-pastable example if possible

import io, pandas
pandas.read_csv(io.StringIO("a\na"), float_precision="round_trip")

The input needs to be at least two lines and must contain non-numerical data.
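
When narrowing down the two conditions above, it helps to keep the crash out of the interpreter you are working in. Below is a minimal sketch that runs the reproduction in a child process. It assumes a POSIX system, where subprocess reports death by signal as a negative return code; run_isolated and describe are hypothetical helper names, not part of pandas.

```python
import subprocess
import sys

# The reproduction from this report, kept as source text so that it only
# ever executes inside the child interpreter.
REPRO = (
    "import io, pandas\n"
    "pandas.read_csv(io.StringIO('a\\na'), float_precision='round_trip')\n"
)


def run_isolated(source: str) -> int:
    """Run `source` in a fresh interpreter and return its exit status."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode


def describe(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable verdict."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # POSIX only: the child was terminated by signal number -returncode.
        return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe(run_isolated(REPRO)))
```

On an affected installation the child should be reported as killed by a signal, while the parent keeps running and no Python traceback is lost.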

Experienced this problem on Arch Linux with Python 3.

Problem description

Why is the current behaviour a problem?

(1) I can't parse a CSV file containing text with round_trip precision.
(2) It is a possible security vulnerability.
(3) It fills up my hard drive with core dumps.

Expected Output

Nothing.

Actual Output

#0  0x00007ffff7440350 in PyErr_Restore () from /usr/lib/libpython3.6m.so.1.0
#1  0x00007ffff7440a62 in PyErr_FormatV () from /usr/lib/libpython3.6m.so.1.0
#2  0x00007ffff7440b24 in PyErr_Format () from /usr/lib/libpython3.6m.so.1.0
#3  0x00007ffff73c2c60 in PyOS_string_to_double ()
   from /usr/lib/libpython3.6m.so.1.0
#4  0x00007fffe5359a73 in ?? ()
   from /usr/lib/python3.6/site-packages/pandas/parser.cpython-36m-x86_64-linux-gnu.so
#5  0x00007fffe53675c6 in ?? ()
   from /usr/lib/python3.6/site-packages/pandas/parser.cpython-36m-x86_64-linux-gnu.so
#6  0x00007fffe535c74a in ?? ()
   from /usr/lib/python3.6/site-packages/pandas/parser.cpython-36m-x86_64-linux-gnu.so
#7  0x00007fffe537db89 in ?? ()
   from /usr/lib/python3.6/site-packages/pandas/parser.cpython-36m-x86_64-linux-gnu.so
#8  0x00007ffff74220b6 in PyCFunction_Call () from /usr/lib/libpython3.6m.so.1.0
#9  0x00007fffe534fe29 in ?? ()
   from /usr/lib/python3.6/site-packages/pandas/parser.cpython-36m-x86_64-linux-gnu.so
#10 0x00007fffe536f78e in ?? ()
   from /usr/lib/python3.6/site-packages/pandas/parser.cpython-36m-x86_64-linux-gnu.so
#11 0x00007fffe5344cf4 in ?? ()
   from /usr/lib/python3.6/site-packages/pandas/parser.cpython-36m-x86_64-linux-gnu.so
#12 0x00007ffff7421ddc in _PyCFunction_FastCallDict ()
   from /usr/lib/libpython3.6m.so.1.0
#13 0x00007ffff740c21f in ?? () from /usr/lib/libpython3.6m.so.1.0
#14 0x00007ffff73cd067 in _PyEval_EvalFrameDefault ()
   from /usr/lib/libpython3.6m.so.1.0
#15 0x00007ffff740ade9 in ?? () from /usr/lib/libpython3.6m.so.1.0
#16 0x00007ffff740bf9a in ?? () from /usr/lib/libpython3.6m.so.1.0
#17 0x00007ffff740c303 in ?? () from /usr/lib/libpython3.6m.so.1.0
#18 0x00007ffff73cd067 in _PyEval_EvalFrameDefault ()
   from /usr/lib/libpython3.6m.so.1.0
#19 0x00007ffff740aaa1 in ?? () from /usr/lib/libpython3.6m.so.1.0
#20 0x00007ffff740bf9a in ?? () from /usr/lib/libpython3.6m.so.1.0
#21 0x00007ffff740c303 in ?? () from /usr/lib/libpython3.6m.so.1.0
#22 0x00007ffff73cd067 in _PyEval_EvalFrameDefault ()
   from /usr/lib/libpython3.6m.so.1.0
#23 0x00007ffff740bd4a in ?? () from /usr/lib/libpython3.6m.so.1.0
#24 0x00007ffff740c303 in ?? () from /usr/lib/libpython3.6m.so.1.0
#25 0x00007ffff73cd067 in _PyEval_EvalFrameDefault ()
   from /usr/lib/libpython3.6m.so.1.0
#26 0x00007ffff740ade9 in ?? () from /usr/lib/libpython3.6m.so.1.0
#27 0x00007ffff740bf9a in ?? () from /usr/lib/libpython3.6m.so.1.0
#28 0x00007ffff740c303 in ?? () from /usr/lib/libpython3.6m.so.1.0
#29 0x00007ffff73cde87 in _PyEval_EvalFrameDefault ()
   from /usr/lib/libpython3.6m.so.1.0
#30 0x00007ffff740c757 in PyEval_EvalCodeEx () from /usr/lib/libpython3.6m.so.1.0
#31 0x00007ffff73ccd4b in PyEval_EvalCode () from /usr/lib/libpython3.6m.so.1.0
#32 0x00007ffff74ae112 in ?? () from /usr/lib/libpython3.6m.so.1.0
#33 0x00007ffff74b097d in PyRun_FileExFlags () from /usr/lib/libpython3.6m.so.1.0
#34 0x00007ffff74b0b67 in PyRun_SimpleFileExFlags ()
   from /usr/lib/libpython3.6m.so.1.0
#35 0x00007ffff74a4a91 in Py_Main () from /usr/lib/libpython3.6m.so.1.0
#36 0x0000000000400a5d in main ()

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.13-1-ARCH
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 33.0.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0rc2+2914.g1fa4dd705.dirty
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: 0.9.2
apiclient: 1.5.5
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: None
pandas_datareader: None

@Rufflewind
Contributor Author

Rufflewind commented Jan 16, 2017

It appears to have died while trying to read the thread state:

oldtype = tstate->curexc_type;

tstate is null because the global interpreter lock has been released:

with nogil:
    error = _try_double_nogil(parser, col, line_start, line_end,
                              na_filter, na_hashset, use_na_flist,
                              na_fset, NA, data, &na_count)

This is problematic, as the round_trip converter does call back into Python:

double round_trip(const char *p, char **q, char decimal, char sci, char tsep,
                  int skip_trailing) {
#if PY_VERSION_HEX >= 0x02070000
    return PyOS_string_to_double(p, q, 0);
#else
    return strtod(p, q);
#endif
}

The funny thing is that even after deleting the "with nogil:" line, it still fails, this time with a Python exception, because the code in _try_double_nogil was never designed to handle Python exceptions to begin with.
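
The exception-setting behaviour of PyOS_string_to_double can be observed from Python itself. The following is a rough sketch using ctypes.pythonapi, which keeps the GIL held during the call and re-raises any exception the C function leaves pending. The wrapper name parse_like_round_trip is hypothetical; it only mirrors the argument pattern of round_trip (a non-NULL end pointer and a NULL overflow exception).

```python
import ctypes

# double PyOS_string_to_double(const char *s, char **endptr,
#                              PyObject *overflow_exception)
_string_to_double = ctypes.pythonapi.PyOS_string_to_double
_string_to_double.restype = ctypes.c_double
_string_to_double.argtypes = [
    ctypes.c_char_p,                  # s
    ctypes.POINTER(ctypes.c_void_p),  # endptr
    ctypes.c_void_p,                  # overflow_exception (NULL here)
]


def parse_like_round_trip(text: bytes) -> float:
    """Call PyOS_string_to_double the way round_trip() does.

    Hypothetical helper for illustration only. Because the call goes
    through ctypes.pythonapi, the GIL is held and a pending Python
    exception is raised in the caller instead of being left dangling.
    """
    end = ctypes.c_void_p()
    return _string_to_double(text, ctypes.byref(end), None)


if __name__ == "__main__":
    print(parse_like_round_trip(b"1.5"))
    try:
        parse_like_round_trip(b"a")
    except ValueError as exc:
        # Non-numeric input makes the C function set a Python exception.
        print("ValueError:", exc)
```

With the GIL held, the failure surfaces as an ordinary ValueError. Inside a nogil block there is no thread state to attach that exception to, which matches the null tstate seen in the backtrace.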

@Rufflewind
Contributor Author

This appears to work around the problem by keeping the GIL held and silencing the Python exception:

diff --git a/pandas/parser.pyx b/pandas/parser.pyx
index bd793c98e..dc0292e5b 100644
--- a/pandas/parser.pyx
+++ b/pandas/parser.pyx
@@ -1699,10 +1699,9 @@ cdef _try_double(parser_t *parser, int col, int line_start, int line_end,
     result = np.empty(lines, dtype=np.float64)
     data = <double *> result.data
     na_fset = kset_float64_from_list(na_flist)
-    with nogil:
-        error = _try_double_nogil(parser, col, line_start, line_end,
-                                  na_filter, na_hashset, use_na_flist,
-                                  na_fset, NA, data, &na_count)
+    error = _try_double_nogil(parser, col, line_start, line_end,
+                              na_filter, na_hashset, use_na_flist,
+                              na_fset, NA, data, &na_count)
     kh_destroy_float64(na_fset)
     if error != 0:
         return None, None
diff --git a/pandas/src/parser/tokenizer.c b/pandas/src/parser/tokenizer.c
index 87e17fe5f..77c36ef8a 100644
--- a/pandas/src/parser/tokenizer.c
+++ b/pandas/src/parser/tokenizer.c
@@ -1774,7 +1774,9 @@ double precise_xstrtod(const char *str, char **endptr, char decimal, char sci,
 double round_trip(const char *p, char **q, char decimal, char sci, char tsep,
                   int skip_trailing) {
 #if PY_VERSION_HEX >= 0x02070000
-    return PyOS_string_to_double(p, q, 0);
+    double r = PyOS_string_to_double(p, q, 0);
+    PyErr_Clear();
+    return r;
 #else
     return strtod(p, q);
 #endif
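
In Python terms, the behaviour the patch aims for can be sketched as follows: a failed conversion is swallowed at the lowest level and reported to the caller as "not a float column" rather than as an exception. This is only an analogy under stated assumptions. try_double here is a hypothetical pure-Python stand-in for the Cython _try_double, and the built-in float() accepts a slightly different grammar than the C converter.

```python
from typing import List, Optional, Sequence


def try_double(tokens: Sequence[str]) -> Optional[List[float]]:
    """Convert every token to float, or return None if any token fails.

    Hypothetical stand-in for the Cython routine: the ValueError is
    caught here (the analogue of PyErr_Clear) and failure is signalled
    through the return value, as with `return None, None` in the diff.
    """
    result = []
    for token in tokens:
        try:
            result.append(float(token))
        except ValueError:
            # Not numeric: give up on this column without raising.
            return None
    return result


if __name__ == "__main__":
    print(try_double(["1.5", "2"]))
    print(try_double(["a"]))
```

The caller can then fall back to treating the column as text when None comes back, so no exception ever has to cross the conversion loop.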

@jreback
Contributor

jreback commented Jan 17, 2017

Yeah, this looks like it violates the requirement to hold the GIL. You are welcome to add a test and a fix.

It is probably simplest to just hold the GIL whenever float_precision is specified, since that is not the default.
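
A regression test for this could look roughly like the sketch below. It is an illustration, not the test from any actual pull request: the function name is hypothetical, and it assumes that with a fix in place the text column is simply returned as strings.

```python
import io


def test_round_trip_with_text_column():
    """read_csv should not crash on non-numeric data with round_trip."""
    # Imported inside the test so that merely loading this module does
    # not require pandas to be installed.
    import pandas

    df = pandas.read_csv(io.StringIO("a\na"), float_precision="round_trip")

    # The first line is the header, the second line is the only data row.
    assert list(df.columns) == ["a"]
    assert df["a"].tolist() == ["a"]
```

On an unfixed build this test would take down the whole test process rather than fail cleanly, so it may be worth running it in isolation first.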

@Rufflewind
Contributor Author

I made a pull request here: #15148

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
`round_trip` calls back into Python, so the GIL must be held.  It also
fails to silence the Python exception, leading to spurious errors.
Closes pandas-dev#15140.

Author: Phil Ruffwind <rf@rufflewind.com>

Closes pandas-dev#15148 from Rufflewind/master and squashes the following commits:

c513d2e [Phil Ruffwind] BUG: Segfault due to float_precision='round_trip'