Handle European decimal formats in parsers at a lower level #584

Closed
wesm opened this issue Jan 6, 2012 · 4 comments

Comments

@wesm
Member

commented Jan 6, 2012

No description provided.

@wesm

Member Author

commented Nov 27, 2012

Done in new parser engine

@wesm wesm closed this Nov 27, 2012

@keluc

commented Dec 18, 2012

It seems that parsing works when a value uses only the decimal sign or only the thousands separator, but not when both appear in the same value.
Should the issue be reopened?

Example

import pandas as pd
from io import StringIO

# A: thousands separator only, B: decimal comma only, C: both combined
data = """A;B;C
0;0,11;0,11
1.000;1000,11;1.000,11
20.000;20000,22;20.000,22
300.000;300000,33;300.000,33
4.000.000;4000000,44;4.000.000,44
5.000.000.000;5000000000,55;5.000.000.000,55"""

df = pd.read_csv(StringIO(data), sep=';', thousands='.', decimal=',')
print(df.dtypes)
print(df)

Results in

A      int64
B    float64
C     object
            A             B                 C
0           0  1.100000e-01              0,11
1        1000  1.000110e+03          1.000,11
2       20000  2.000022e+04         20.000,22
3      300000  3.000003e+05        300.000,33
4     4000000  4.000000e+06      4.000.000,44
5  5000000000  5.000000e+09  5.000.000.000,55

@wesm

Member Author

commented Dec 24, 2012

I'll open a separate issue: currently, thousands separators are not handled at all for floating-point numbers.
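
For readers hitting the same limitation, one possible workaround is to skip numeric parsing entirely and clean the strings afterwards. The snippet below is only a sketch under that assumption, not the fix that went into the parser; the two-column sample and the names `raw` and `cleaned` are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Shortened, hypothetical sample in the same format as the example above.
data = """A;C
1.000;1.000,11
4.000.000;4.000.000,44"""

# Read every column as text so the parser attempts no numeric conversion.
raw = pd.read_csv(StringIO(data), sep=";", dtype=str)

# Remove the dot (thousands separator) first, then turn the decimal comma
# into a dot, and only then cast to float.
cleaned = (
    raw["C"]
    .str.replace(".", "", regex=False)
    .str.replace(",", ".", regex=False)
    .astype(float)
)
print(cleaned)
```

The order of the two replacements matters: swapping the comma first would leave the decimal point indistinguishable from the thousands dots.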

@matthias-ollig

commented Jul 31, 2013

I wrote a converter that removes the thousands separator and tried to use it together with the dtype argument of read_csv, without success. What does work, though, is removing the separator and casting in the same step:

# in your case you want to replace the dot, as that is your thousands separator
rem_thousand_sep_and_cast_to_float = lambda x: float(x.replace(",", ""))

You can then use that function to convert the desired columns with the converters argument of read_csv. Let me know if that works for you.

Used in an example:

df = pd.read_csv("my.csv", sep=",", thousands=",",
                 converters={"a": rem_thousand_sep_and_cast_to_float,
                             "b": rem_thousand_sep_and_cast_to_float})
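
Adapted to the European format from the earlier example (dot for thousands, comma for decimals), the same converter idea might look like the sketch below. The function name `european_to_float` and the shortened sample data are illustrative assumptions, not part of the original comments.

```python
from io import StringIO

import pandas as pd


def european_to_float(text):
    """Convert a string such as '1.000,11' to the float 1000.11."""
    # Drop the dot (thousands separator), then replace the decimal comma.
    return float(text.replace(".", "").replace(",", "."))


# Shortened copy of the sample data from the earlier comment.
data = """A;B;C
0;0,11;0,11
1.000;1000,11;1.000,11
20.000;20000,22;20.000,22"""

# Apply the converter to every column, so no thousands/decimal arguments
# are needed and the parser never sees the raw separators.
df = pd.read_csv(
    StringIO(data),
    sep=";",
    converters={name: european_to_float for name in ["A", "B", "C"]},
)
print(df)
```

Because the converter returns floats, column A comes back as floating point as well; cast it with `astype(int)` afterwards if integers are wanted.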