read_csv: usecols doesn't work if separator is not "," #2733

Closed
bmu opened this issue Jan 23, 2013 · 10 comments
Labels
Bug Enhancement IO Data IO issues that don't fit into a more specific label

Comments


bmu commented Jan 23, 2013

If I have data (without a header) separated by "," usecols works, however if the data is separated by white space it doesn't seem to work:

In [30]: data = '1,2,3\n4,5,6\n7,8,9'

In [31]: pd.read_csv(StringIO(data), usecols=[0, 1], header=None)
Out[31]: 
   0  1
0  1  2
1  4  5
2  7  8

In [32]: data = '1 2 3\n4 5 6\n7 8 9'

In [33]: pd.read_csv(StringIO(data), sep='\s+', header=None)
Out[33]: 
   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9

In [34]: pd.read_csv(StringIO(data), sep='\s+', usecols=[0, 1], header=None)
Out[34]: 
   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9
Contributor

garaud commented Jan 23, 2013

I had a strange problem with the usecols parameter yesterday -- with a ';' separator. I'll try to give an example. I didn't realize it might be a separator problem.

Contributor

garaud commented Jan 23, 2013

I wrote a new test case in test_parsers:TestCParserLowMemory. This test fails, reproducing the problem.

See garaud@fe36220

I don't see what's wrong. I took a look at #2654. I think I'll read pandas/src/parser.pyx.

Member

wesm commented Jan 23, 2013

The C parser does not support multi-character and regex delimiters yet. Try delim_whitespace=True. The pure Python parser does not have usecols implemented, but this shouldn't be too difficult to do.
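
A minimal sketch of this workaround, assuming Python 3 and a recent pandas release (neither is what the thread used). It passes the whitespace pattern through sep as a raw string, which the pandas documentation describes as equivalent to delim_whitespace=True; the variable names are illustrative only:

```python
from io import StringIO

import pandas as pd

# Whitespace-separated data without a header row, as in the report above.
data = "1 2 3\n4 5 6\n7 8 9"

# r"\s+" is a raw string, so the backslash reaches pandas unchanged.
# Assumption: a recent pandas, where this pattern is handled as a
# whitespace delimiter and can be combined with usecols.
df = pd.read_csv(StringIO(data), sep=r"\s+", usecols=[0, 1], header=None)

print(df)
```

The raw string also avoids the invalid-escape warning that newer Python versions emit for a plain '\s+' literal.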

Contributor

garaud commented Jan 23, 2013

OK. Thanks! Good to know. First, I'll try with the delim_whitespace parameter. Then, I'll propose a "usecols" implementation in the pure Python parser (with its test).

Edit:

In [24]: pd.read_csv(StringIO(data), delim_whitespace=True, header=None, usecols=(1,2))
Out[24]: 
   1  2
0  2  3
1  5  6
2  8  9

works fine.

Contributor

garaud commented Jan 24, 2013

See #2748

A test case with the delim_whitespace parameter.

Member

wesm commented Feb 10, 2013

I added an explicit error message for this case. Pushing the fix until later.

Contributor

dsm054 commented Mar 1, 2014

Is this still an issue?

>>> import pandas as pd
>>> from StringIO import StringIO
>>> pd.__version__
'0.13.1-343-g6efa4c1'
>>> data = '1 2 3\n4 5 6\n7 8 9'
>>> pd.read_csv(StringIO(data), sep='\s+', usecols=[0, 1], header=None)
   0  1
0  1  2
1  4  5
2  7  8

[3 rows x 2 columns]
>>> pd.read_csv(StringIO(data), sep='\s+', usecols=[0, 1], header=None, engine='c')
   0  1
0  1  2
1  4  5
2  7  8

[3 rows x 2 columns]
>>> pd.read_csv(StringIO(data), sep='\s+', usecols=[0, 1], header=None, engine='python') 
   0  1
0  1  2
1  4  5
2  7  8

[3 rows x 2 columns]
>>> pd.read_csv(StringIO(data.replace(" ",";")), sep=';', usecols=[1, 2], header=None, engine='c')
   1  2
0  2  3
1  5  6
2  8  9

[3 rows x 2 columns]
>>> pd.read_csv(StringIO(data.replace(" ",";")), sep=';', usecols=[1, 2], header=None, engine='python')
   1  2
0  2  3
1  5  6
2  8  9

[3 rows x 2 columns]
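
The same check can be sketched as a script for Python 3 and a recent pandas (an assumption; the session above ran on Python 2 with pandas 0.13.1). The helper read_with and the results dictionary are hypothetical names introduced here for illustration, not part of pandas:

```python
from io import StringIO

import pandas as pd

data = "1 2 3\n4 5 6\n7 8 9"

# (text, separator, usecols) combinations taken from the session above.
cases = [
    (data, r"\s+", [0, 1]),
    (data.replace(" ", ";"), ";", [1, 2]),
]


def read_with(engine, text, sep, usecols):
    """Hypothetical helper: parse headerless text with the given engine."""
    return pd.read_csv(
        StringIO(text), sep=sep, usecols=usecols, header=None, engine=engine
    )


# For each separator, record whether both engines select the same
# column labels and the same values.
results = {}
for text, sep, usecols in cases:
    c_df = read_with("c", text, sep, usecols)
    py_df = read_with("python", text, sep, usecols)
    results[sep] = (
        list(c_df.columns) == list(py_df.columns)
        and c_df.values.tolist() == py_df.values.tolist()
    )

print(results)
```

If either engine ignored usecols, as in the original report, the column lists would differ and the corresponding entry would be False.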

Contributor

jreback commented Mar 1, 2014

Hmm, seems OK.

Can you take a quick look and see if you can find a release note for the change that fixed this?

Contributor

dsm054 commented Mar 1, 2014

Maybe it's #5211?

Contributor

jreback commented Mar 1, 2014

closed by #5211

jreback closed this as completed Mar 1, 2014