Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I wish I could use pandas to easily convert numbers formatted in the Brazilian style (1.234,56) into numeric types.
Currently, pd.to_numeric() does not support this format, and users have to manually apply .str.replace(".", "", regex=False).str.replace(",", ".", regex=False) before converting, which is not intuitive.
This feature would simplify data handling for users in Brazil and other countries with similar numerical formats.
Feature Description
Add a new function to_numeric_br() to automatically convert strings with the Brazilian numeric format into floats.
Proposed Implementation (Pseudocode)
def to_numeric_br(series, errors="raise"):
    """
    Converts Brazilian-style numeric strings (1.234,56) into float.

    Parameters
    ----------
    series : pandas.Series
        Data to be converted.
    errors : str, default 'raise'
        - 'raise' : Throws an error for invalid values.
        - 'coerce' : Converts invalid values to NaN.
        - 'ignore' : Returns the original data in case of error.

    Returns
    -------
    pandas.Series
        Series with numeric values.
    """
Expected Behavior
import pandas as pd
df = pd.DataFrame({"values": ["1.234,56", "5.600,75", "100,50"]})
df["converted_values"] = to_numeric_br(df["values"], errors="coerce")
print(df)
Expected Output:
     values  converted_values
0  1.234,56           1234.56
1  5.600,75           5600.75
2    100,50            100.50
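One way the proposed function could be written, as a minimal sketch rather than a definitive design: it assumes the input is a Series of strings, strips the thousands separator, swaps the decimal comma for a point, and delegates to pd.to_numeric(). The handling of errors="ignore" (returning the input unchanged on failure) is an assumption based on the docstring above.

```python
import pandas as pd


def to_numeric_br(series: pd.Series, errors: str = "raise") -> pd.Series:
    """Convert Brazilian-style numeric strings (1.234,56) to floats."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError("errors must be 'raise', 'coerce', or 'ignore'")

    # Drop the thousands separator, then turn the decimal comma into a point.
    # regex=False matters: as a regex, "." would match every character.
    cleaned = (
        series.str.strip()
        .str.replace(".", "", regex=False)
        .str.replace(",", ".", regex=False)
    )

    if errors == "ignore":
        # Assumption: 'ignore' hands back the original Series on failure.
        try:
            return pd.to_numeric(cleaned, errors="raise")
        except (ValueError, TypeError):
            return series

    return pd.to_numeric(cleaned, errors=errors)


df = pd.DataFrame({"values": ["1.234,56", "5.600,75", "100,50"]})
df["converted_values"] = to_numeric_br(df["values"], errors="coerce")
print(df)
```

Note that this sketch does not validate where the separators sit, so a US-style string such as "1,234.56" would be silently read as 1.23456. A real implementation would probably need to reject or flag such input.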
Alternatively, instead of a standalone function, this could be implemented as an enhancement to pd.to_numeric(), adding a locale="br" parameter.
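To illustrate what a locale parameter might look like from the outside, here is a hedged sketch of a wrapper around pd.to_numeric(). The name to_numeric_locale, the SEPARATORS table, and the locale keys are all hypothetical; nothing like this exists in pandas today.

```python
import pandas as pd

# Hypothetical table: locale key -> (thousands separator, decimal separator).
SEPARATORS = {
    "br": (".", ","),
    "us": (",", "."),
}


def to_numeric_locale(series: pd.Series, locale: str = "br", errors: str = "raise") -> pd.Series:
    """Hypothetical wrapper emulating pd.to_numeric(..., locale=...)."""
    try:
        thousands, decimal = SEPARATORS[locale]
    except KeyError:
        raise ValueError(f"unsupported locale: {locale!r}") from None

    # Remove the grouping character first, then normalise the decimal mark.
    cleaned = series.str.replace(thousands, "", regex=False)
    if decimal != ".":
        cleaned = cleaned.str.replace(decimal, ".", regex=False)
    return pd.to_numeric(cleaned, errors=errors)
```

Keeping the separators in a lookup table would let other formats be added without a new function per country, which is one argument for the parameter over to_numeric_br().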
Alternative Solutions
Currently, users must manually apply string replacements before using pd.to_numeric(), like this:
# regex=False treats "." as a literal dot; as a regex it would match every character
df["values"] = df["values"].str.replace(".", "", regex=False).str.replace(",", ".", regex=False)
df["values"] = pd.to_numeric(df["values"], errors="coerce")
While this works, it is not user-friendly, especially for beginners.
Another alternative is using third-party packages like babel, but this requires additional dependencies and is not built into Pandas.
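When the data comes from a delimited file rather than an existing Series, pd.read_csv() already accepts decimal and thousands arguments, which covers part of this use case. A small example, where the inline CSV text and the column names are made up for illustration:

```python
import io

import pandas as pd

# Made-up sample data using ";" as the field separator, as is common
# in files that use "," as the decimal mark.
csv_text = "id;values\n1;1.234,56\n2;5.600,75\n3;100,50\n"

df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",", thousands=".")
print(df.dtypes)
print(df)
```

This does not help once the strings are already in a DataFrame, which is the gap the request is about.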
Additional Context
- Similar requests have been made by users handling locale-specific number formats.
- Would the maintainers prefer a standalone function (to_numeric_br()) or a locale parameter in pd.to_numeric()?
- Happy to implement this if maintainers approve!