Issues reading ragged CSV files #2981

Closed
wesm opened this Issue · 4 comments

3 participants

@y-p y-p closed this
@y-p y-p reopened this
@dsm054

I just came across another case where, because the input data is very ragged, it's surprisingly difficult even to get the data into a DataFrame for processing. Entirely independent of performance, it'd be nice to have a canonical way that Just Worked(tm) for this.
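
For illustration, here is one possible workaround for parser versions that reject ragged input: pad the short rows before building the DataFrame. This is a minimal sketch using only the standard library, not anything pandas ships. The helper name `pad_ragged_rows` and its `width` and `fill` parameters are made up for this example.

```python
import csv
from io import StringIO


def pad_ragged_rows(text, width=None, fill=None):
    """Parse CSV text and pad short rows with `fill` up to `width` fields.

    Hypothetical helper for illustration only. If `width` is omitted,
    the widest row in the input decides the column count.
    """
    rows = list(csv.reader(StringIO(text)))
    if width is None:
        width = max((len(row) for row in rows), default=0)
    # Refuse to silently drop data when a row is wider than requested.
    too_long = [i for i, row in enumerate(rows) if len(row) > width]
    if too_long:
        raise ValueError("rows %s have more than %d fields" % (too_long, width))
    return [row + [fill] * (width - len(row)) for row in rows]


data = """1,2,3
1,2,3,4
1,2,3,4,5
1,2
1,2,3,4"""

# Every row now has the same number of fields; values are still strings.
rows = pad_ragged_rows(data)
```

The padded list of lists can then be handed to `pd.DataFrame(rows, columns=[...])`. Note that `csv.reader` yields strings, so numeric conversion would still need to happen afterwards.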

@wesm wesm closed this in 900a552
@wesm
Owner

And we're in business ("just works" now):

In [5]: paste
data = """1,2,3
1,2,3,4
1,2,3,4,5
1,2
1,2,3,4"""

## -- End pasted text --

In [6]: pd.read_csv(StringIO(data), names=['a', 'b', 'c', 'd', 'e'])
Out[6]: 
   a  b   c   d   e
0  1  2   3 NaN NaN
1  1  2   3   4 NaN
2  1  2   3   4   5
3  1  2 NaN NaN NaN
4  1  2   3   4 NaN
@dsm054

I think the fix may have unintended consequences:

>>> import pandas as pd
>>> from StringIO import StringIO
>>> 
>>> data = """
... 1,2
... 3,4,5
... """.strip()
>>> 
>>> pd.read_csv(StringIO(data), header=None, names=range(3))
   0  1   2
0  1  2 NaN
1  3  4   5
>>> pd.read_csv(StringIO(data), header=None, names=range(20))
*** glibc detected *** python: realloc(): invalid next size: 0x0a58f158 ***

Not sure if it's worth opening a separate issue or not.
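
Until the parser is fixed, one defensive option might be to avoid asking for more columns than the widest row contains. The sketch below assumes the crash is tied to `names` being much longer than any row, which this thread does not confirm. `trim_names` is a hypothetical helper that uses only the standard library.

```python
import csv
from io import StringIO


def trim_names(text, names):
    """Return at most as many names as the widest row has fields.

    Hypothetical guard for illustration; it scans the whole input once.
    """
    widest = max((len(row) for row in csv.reader(StringIO(text))), default=0)
    return list(names)[:widest]


data = "1,2\n3,4,5"

# Only three names survive, matching the widest row (3,4,5).
safe_names = trim_names(data, range(20))
```

The trimmed list could then be passed as `names=safe_names` in place of `range(20)`.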

@y-p y-p reopened this
@wesm
Owner

Thanks. I will have a look at the C code.

@wesm wesm closed this