read csv thousands separator #4322

Closed
hayd opened this issue Jul 22, 2013 · 10 comments · Fixed by #4945

Comments

@hayd
Contributor

commented Jul 22, 2013

From this SO question, with the input modified to include a thousands separator:


In [1]: s = '06.02.2013;13:00;1.000,215;0,215;0,185;0,205;0,00'

In [2]: pd.read_csv(StringIO(s), sep=';', header=None, parse_dates={'Dates': [0, 1]}, index_col=0, decimal=',')
Out[2]:
                              2      3      4      5  6
Dates
2013-06-02 13:00:00  10.000,215  0.215  0.185  0.205  0

In [3]: pd.read_csv(StringIO(s), sep=';', header=None, parse_dates={'Dates': [0, 1]}, index_col=0, decimal=',', thousands='.')
Out[3]:
                        2      3      4      5  6
Dates
6022013 13:00   1.000,215  0.215  0.185  0.205  0

Note: in the second call the Dates column is no longer parsed as a date, and the thousands separator is not converted either.
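
For reference, here is a minimal pure-Python sketch of the values one would expect from the numeric fields of that input when `.` is the thousands separator and `,` is the decimal mark. The helper `to_float` is a hypothetical name used only for illustration; it is not part of pandas and says nothing about how the pandas parser is implemented.

```python
def to_float(token, thousands=".", decimal=","):
    """Illustrative helper (not a pandas API): parse a European-style number."""
    # Drop the thousands separator first, then normalise the decimal mark.
    cleaned = token.replace(thousands, "").replace(decimal, ".")
    return float(cleaned)


# Numeric fields from the sample line in this report.
numeric_fields = "1.000,215;0,215;0,185;0,205;0,00".split(";")
expected = [to_float(t) for t in numeric_fields]
print(expected)
```

Under these assumptions the first numeric field should come out as 1000.215 rather than staying the string 1.000,215.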

@hayd

Contributor Author

commented Jul 31, 2013

Possibly this is a dupe of #2594

@jreback

Contributor

commented Jul 31, 2013

@hayd

Contributor Author

commented Jul 31, 2013

@jreback not sure what you're asking?

@jreback

Contributor

commented Jul 31, 2013

is this a bug? e.g. the dtypes not working with thousands, or a bug in the thousands sep itself? or a feature request (for something which is there already)?

@hayd

Contributor Author

commented Jul 31, 2013

@jreback this is a bug, thousand separator doesn't seem to work

in the docs you link to it says

For large integers that have been written with a thousands separator, you can set the thousands keyword to True so that integers will be parsed correctly:

in the docstring for read_csv it asks for:

thousands : str, default None
Thousands separator

Doesn't seem to be working with '.', as well as screwing up the dates.

@jreback

Contributor

commented Jul 31, 2013

ahh ok...i c now.....

guyrt added a commit to guyrt/pandas that referenced this issue Aug 23, 2013
BUG: fixes issue pandas-dev#4322
Adds support for the thousands character in csv parser for floats.

Updated docs to reflect bug fix.
@jreback

Contributor

commented Aug 23, 2013

closed by #4598

@jreback jreback closed this Aug 23, 2013

@hayd

Contributor Author

commented Aug 26, 2013

@jreback The date aspect of this pr is still not fixed. For some reason the thousands separator attacks the date and makes it a string.

@hayd

Contributor Author

commented Aug 26, 2013

Wow, so this is very much an edge case... it's because the date column is just 06.02.2013, which, once the . thousands separator is stripped, is read as the number 06022013 (i.e. 6022013)... it's possible dates are sometimes written this way on the continent (along with . thousands): http://en.wikipedia.org/wiki/Date_and_time_notation_in_Europe

Not sure what solution is.
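
To make the mechanism concrete, here is a small sketch (plain Python, not the pandas parser) of what happens to the date field if the thousands separator is stripped from every field before type inference. `strip_thousands` is a hypothetical helper name, and "strip first, infer later" is an assumption about the behaviour, based only on the output shown in this report.

```python
def strip_thousands(token, thousands="."):
    """Illustrative helper: remove the thousands separator from a raw field."""
    return token.replace(thousands, "")


date_field = "06.02.2013"
stripped = strip_thousands(date_field)

# After stripping, the field is all digits, so it looks like an integer.
looks_numeric = stripped.isdigit()
as_int = int(stripped)
print(stripped, looks_numeric, as_int)
```

The integer this produces matches the 6022013 that shows up in the Dates index in Out[3].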

@jreback

Contributor

commented Aug 26, 2013

but it should ignore date columns entirely (for thousands parsing)... hmmm... why don't you open a separate issue and cross-link it?
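
A rough sketch of that idea, in plain Python rather than pandas internals: columns flagged as date columns bypass thousands stripping and are handed to the date parser untouched. The function `parse_row` and its `date_cols` argument are hypothetical, and the day-first format string is an assumption about how the sample date is meant to be read.

```python
from datetime import datetime


def parse_row(fields, date_cols, thousands=".", decimal=","):
    """Hypothetical row converter: date columns skip thousands handling."""
    out = []
    for i, field in enumerate(fields):
        if i in date_cols:
            # Leave date/time fields as raw strings for the date parser.
            out.append(field)
        else:
            cleaned = field.replace(thousands, "").replace(decimal, ".")
            out.append(float(cleaned))
    return out


row = "06.02.2013;13:00;1.000,215;0,215;0,185;0,205;0,00".split(";")
parsed = parse_row(row, date_cols={0, 1})

# Assumption: the date is day-first (European notation).
stamp = datetime.strptime(parsed[0] + " " + parsed[1], "%d.%m.%Y %H:%M")
print(stamp, parsed[2:])
```

With the date columns excluded, the date string survives intact and only the remaining fields go through thousands and decimal conversion.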
