c-parser branch: Iteration over an open file handle makes the parser fail #2071
Comments
In general terms, it seems that any iteration on the open file handle breaks things:
In [26]: with open("test.txt") as handle:
    for line in handle:
        print line
        if "CCC" in line:
            break
    res = pandas.read_table(handle, squeeze=True, header=None)
....:
AAA
BBB
CCC
In [27]: res
Out[27]:
Empty DataFrame
Columns: array([], dtype=object)
Index: array([], dtype=object)
While this is not the case for the Python parser:
In [28]: with open("test.txt") as handle:
    for line in handle:
        print line
        if "CCC" in line:
            break
    res = pandas.read_table(handle, squeeze=True, header=None, engine="python")
....:
AAA
BBB
CCC
In [29]: res
Out[29]:
0 DDD
1 EEE
2 FFF
3 GGG
4 None
Name: X0 |
Gave it another go after the merge to master; this time the parser simply segfaults on this case (100% reproducible). |
And here's a backtrace:
Program received signal SIGSEGV, Segmentation fault.
buffer_rd_bytes (source=0xf068c0, nbytes=<optimized out>, bytes_read=0x7fffffffb5c8, status=0x7fffffffb5c4) at pandas/src/parser/io.c:128
128 if (!PyBytes_Check(result)) {
(gdb) bt
#0 buffer_rd_bytes (source=0xf068c0, nbytes=<optimized out>, bytes_read=0x7fffffffb5c8, status=0x7fffffffb5c4) at pandas/src/parser/io.c:128
#1 0x00007fffeeba2637 in parser_buffer_bytes (self=self@entry=0x6e6c10, nbytes=<optimized out>) at pandas/src/parser/parser.c:493
#2 0x00007fffeeba2d1f in _tokenize_helper (self=0x6e6c10, nrows=nrows@entry=1, all=all@entry=0) at pandas/src/parser/parser.c:1188
#3 0x00007fffeeba2dd7 in tokenize_nrows (self=<optimized out>, nrows=nrows@entry=1) at pandas/src/parser/parser.c:1218
#4 0x00007fffeeb7f9bd in __pyx_f_6pandas_7_parser_10TextReader__tokenize_rows (__pyx_v_self=0x6dcd20, __pyx_v_nrows=1) at pandas/src/parser.c:5893
#5 0x00007fffeeb81a21 in __pyx_f_6pandas_7_parser_10TextReader__get_header (__pyx_v_self=0x6dcd20) at pandas/src/parser.c:4946
#6 0x00007fffeeb8306b in __pyx_pf_6pandas_7_parser_10TextReader___cinit__ (__pyx_v_verbose=0x7ffff7d90a80 <_Py_ZeroStruct>, __pyx_v_skip_footer=0x61e9b0, __pyx_v_skiprows=0x7fffef08fde8, __pyx_v_low_memory=0x7ffff7d90a60 <_Py_TrueStruct>,
__pyx_v_use_unsigned=0x7ffff7d90a80 <_Py_ZeroStruct>, __pyx_v_compact_ints=0x7ffff7d90a80 <_Py_ZeroStruct>, __pyx_v_na_values=0x7fffef08f878, __pyx_v_na_filter=0x7ffff7d90a60 <_Py_TrueStruct>,
__pyx_v_warn_bad_lines=0x7ffff7d90a60 <_Py_TrueStruct>, __pyx_v_error_bad_lines=0x7ffff7d90a60 <_Py_TrueStruct>, __pyx_v_usecols=0x7ffff7da4e20 <_Py_NoneStruct>, __pyx_v_dtype=0x7ffff7da4e20 <_Py_NoneStruct>,
__pyx_v_thousands=0x7ffff7da4e20 <_Py_NoneStruct>, __pyx_v_decimal=<optimized out>, __pyx_v_encoding=0x7ffff7da4e20 <_Py_NoneStruct>, __pyx_v_quoting=0x61e9b0, __pyx_v_quotechar=0x7ffff6b3ff08,
__pyx_v_doublequote=0x7ffff7d90a60 <_Py_TrueStruct>, __pyx_v_escapechar=0x7ffff7da4e20 <_Py_NoneStruct>, __pyx_v_skipinitialspace=0x7ffff7d90a80 <_Py_ZeroStruct>, __pyx_v_as_recarray=0x7ffff7d90a80 <_Py_ZeroStruct>,
__pyx_v_factorize=0x0, __pyx_v_converters=0xf4bf10, __pyx_v_compression=<optimized out>, __pyx_v_delim_whitespace=<optimized out>, __pyx_v_tokenize_chunksize=<optimized out>, __pyx_v_memory_map=<optimized out>,
__pyx_v_names=0x7ffff7da4e20 <_Py_NoneStruct>, __pyx_v_header=<optimized out>, __pyx_v_delimiter=0x7ffff7da4e20 <_Py_NoneStruct>, __pyx_v_source=0x7ffff7f385d0, __pyx_v_self=<optimized out>, __pyx_v_comment=<optimized out>,
__pyx_v_buffer_lines=<optimized out>) at pandas/src/parser.c:3601
#7 __pyx_pw_6pandas_7_parser_10TextReader_1__cinit__ (__pyx_v_self=__pyx_v_self@entry=0x6dcd20, __pyx_args=__pyx_args@entry=0x7ffff7eec310, __pyx_kwds=__pyx_kwds@entry=0xf4ccc0) at pandas/src/parser.c:2481
#8 0x00007fffeeb8699e in __pyx_tp_new_6pandas_7_parser_TextReader (t=<optimized out>, a=0x7ffff7eec310, k=0xf4ccc0) at pandas/src/parser.c:19587
|
Thanks. I'll have a look |
The underlying problem is that the new parser relies on being able to call
The case where calling
|
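To make the read-ahead effect concrete, here is a minimal Python 3 sketch. It is not pandas code, and it does not reproduce the Python 2 behaviour from this thread exactly. It only shows, assuming a small file and the default buffer size, that iterating over a handle pulls much more data from the underlying file than the lines it hands back. The sample contents and the helper name `positions_after_first_line` are invented for the illustration.

```python
import os
import tempfile


def positions_after_first_line(path):
    """Return (logical_pos, raw_pos) after reading one line by iteration."""
    with open(path, "rb") as handle:
        next(handle)                  # consume only the first line
        logical_pos = handle.tell()   # position as seen by the caller
        raw_pos = handle.raw.tell()   # how far the unbuffered file was read
    return logical_pos, raw_pos


# Hypothetical sample data, mirroring the AAA..GGG lines in the report.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"AAA\nBBB\nCCC\nDDD\nEEE\nFFF\nGGG\n")
    sample_path = tmp.name

logical_pos, raw_pos = positions_after_first_line(sample_path)
file_size = os.path.getsize(sample_path)
os.remove(sample_path)

print(logical_pos, raw_pos, file_size)
```

With a file this small, the raw position should already sit at the end of the file after a single line has been returned, so any consumer that bypasses the buffer and reads from the underlying source finds nothing left.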
In short, should doing what the example does be considered broken? I assume the Python parser did not rely upon read() directly, or did it work by pure chance? |
It worked only because the Python code used the |
On Thursday, 15 November 2012 08:03:53, Wes McKinney wrote:
That's good enough for now, I merely switched to the python engine all the
Luca Beltrame - KDE Forums team |
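Besides switching engines, another possible workaround is to never hand the half-iterated handle to the parser at all. The sketch below assumes the file is small enough to hold in memory; the helper name `rest_after_marker` and the sample data are made up for illustration and are not part of pandas.

```python
import io
import os
import tempfile


def rest_after_marker(path, marker):
    """Skip lines up to and including the first one containing `marker`,
    then return the remaining text in a fresh in-memory buffer."""
    with open(path) as handle:
        lines = iter(handle)
        for line in lines:
            if marker in line:
                break
        # Keep draining the same iterator instead of calling handle.read(),
        # so only one access style is ever used on the handle.
        remainder = "".join(lines)
    return io.StringIO(remainder)


# Hypothetical sample data, mirroring the AAA..GGG lines in the report.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("AAA\nBBB\nCCC\nDDD\nEEE\nFFF\nGGG\n")
    sample_path = tmp.name

buffer = rest_after_marker(sample_path, "CCC")
os.remove(sample_path)
remaining_text = buffer.getvalue()
print(remaining_text)
```

The returned `io.StringIO` starts at position 0 and supports `read()`, so it should be usable as the first argument to `pandas.read_table` with either engine, although that combination is not tested here.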
An example is better than words:
This works with the python engine. Note that the handle is not really iterated through: while debugging I noticed that, after the iterator has been used, the handle stays at the same line of the file (in other words, the parser is not iterating over it at all).