read csv thousands separator #4322

Closed
hayd opened this issue Jul 22, 2013 · 10 comments · Fixed by #4945
Labels
Bug IO CSV read_csv, to_csv IO Data IO issues that don't fit into a more specific label
Comments

@hayd
Contributor

hayd commented Jul 22, 2013

From this SO question, with the input modified to include a thousands separator:


In [1]: s = '06.02.2013;13:00;1.000,215;0,215;0,185;0,205;0,00'

In [2]: pd.read_csv(StringIO(s), sep=';', header=None, parse_dates={'Dates': [0, 1]}, index_col=0, decimal=',')
Out[2]:
                              2      3      4      5  6
Dates
2013-06-02 13:00:00  10.000,215  0.215  0.185  0.205  0

In [3]: pd.read_csv(StringIO(s), sep=';', header=None, parse_dates={'Dates': [0, 1]}, index_col=0, decimal=',', thousands='.')
Out[3]:
                        2      3      4      5  6
Dates
6022013 13:00   1.000,215  0.215  0.185  0.205  0

Note: the Dates column is not parsed as a date (and the thousands separator is not converted either).

@hayd
Contributor Author

hayd commented Jul 31, 2013

Possibly this is a dupe of #2594

@jreback
Contributor

jreback commented Jul 31, 2013

@hayd
Contributor Author

hayd commented Jul 31, 2013

@jreback not sure what you're asking?

@jreback
Contributor

jreback commented Jul 31, 2013

is this a bug? e.g. the dtypes not working with thousands or a bug in thousands sep? or a feature request (which is there already)?

@hayd
Contributor Author

hayd commented Jul 31, 2013

@jreback this is a bug, thousand separator doesn't seem to work

in the docs you link to it says

For large integers that have been written with a thousands separator, you can set the thousands keyword to True so that integers will be parsed correctly:

in the docstring for read_csv it asks for:

thousands : str, default None
Thousands separator

Doesn't seem to be working with '.', as well as screwing up the dates.
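
For reference, a minimal sketch of what the docstring's `thousands : str` is meant to allow, assuming a pandas release that includes the float fix referenced later in this thread. The two-column sample data is hypothetical and only mirrors the number format from the report ('.' for thousands, ',' for decimals):

```python
from io import StringIO

import pandas as pd

# Hypothetical sample in the European number format from the report:
# '.' groups thousands and ',' marks the decimal point.
data = "a;b\n1.000,215;0,215\n2.500.000,5;1,0\n"

# `thousands` takes the separator character itself (a str), not True.
df = pd.read_csv(StringIO(data), sep=";", decimal=",", thousands=".")

print(df.dtypes)
print(df)
```

If the separator is handled, both columns should come back as floats rather than as strings.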

@jreback
Contributor

jreback commented Jul 31, 2013

ahh ok...i c now.....

guyrt added a commit to guyrt/pandas that referenced this issue Aug 23, 2013
Adds support for the thousands character in csv parser for floats.

Updated docs to reflect bug fix.
@jreback
Contributor

jreback commented Aug 23, 2013

closed by #4598

@jreback jreback closed this as completed Aug 23, 2013
@hayd
Contributor Author

hayd commented Aug 26, 2013

@jreback The date aspect of this pr is still not fixed. For some reason the thousands separator attacks the date and makes it a string.

@hayd
Contributor Author

hayd commented Aug 26, 2013

Wow, so this is a real edge case... it's because the date column is just 06.02.2013, which is read as the number 06022013 once the '.' thousands separators are stripped... it's possible dates are sometimes written this way on the continent (along with . thousands): http://en.wikipedia.org/wiki/Date_and_time_notation_in_Europe

Not sure what solution is.
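
To illustrate the collision described above, here is a plain-Python sketch. `strip_thousands` is a hypothetical helper written for this example, not pandas' actual parser code; it only mimics the effect of dropping the thousands character before numeric conversion:

```python
def strip_thousands(field: str, thousands: str = ".") -> str:
    """Hypothetical helper: drop every thousands separator from a field."""
    return field.replace(thousands, "")


raw_date = "06.02.2013"
stripped = strip_thousands(raw_date)

print(stripped)       # 06022013
print(int(stripped))  # 6022013, matching the mangled index in Out[3]
```

Once the field looks like a valid integer, nothing is left for the date parser to recognise.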

@jreback
Contributor

jreback commented Aug 26, 2013

but it should ignore dates columns entirely (for thousands parsing...).....hmmm...why don't you open a separate issue and can cross-link it
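
Until date columns are skipped automatically, one possible workaround (a sketch, not something proposed in this thread) is to read the date and time columns as strings so that the thousands handling never touches them, then parse them explicitly. The day-first format string is an assumption about the data:

```python
from io import StringIO

import pandas as pd

s = "06.02.2013;13:00;1.000,215;0,215;0,185;0,205;0,00"

# Force columns 0 and 1 to str so the thousands separator is not applied to them.
df = pd.read_csv(
    StringIO(s),
    sep=";",
    header=None,
    decimal=",",
    thousands=".",
    dtype={0: str, 1: str},
)

# Assumption: the dates are day-first (dd.mm.yyyy), as is common in Europe.
dates = pd.to_datetime(df[0] + " " + df[1], format="%d.%m.%Y %H:%M")
df = df.drop(columns=[0, 1]).set_index(dates.rename("Dates"))

print(df)
```

This keeps `thousands='.'` active for the numeric columns while the index is built from the untouched date strings.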
