ENH: Add to_numeric_br() function to convert Brazilian-formatted numbers #60998

@Veras-D

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish I could use Pandas to easily convert numbers formatted in the Brazilian style (1.234,56) into numeric types.

Currently, pd.to_numeric() does not support this format, so users have to strip the separators by hand first with .str.replace(".", "", regex=False).str.replace(",", ".", regex=False), which is not intuitive.

This feature would simplify data handling for users in Brazil and other countries with similar numerical formats.

Feature Description

Add a new function to_numeric_br() to automatically convert strings with the Brazilian numeric format into floats.

Proposed Implementation (Pseudocode)

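import pandas as pd

def to_numeric_br(series, errors="raise"):
    """
    Convert Brazilian-style numeric strings (1.234,56) to floats.

    Parameters
    ----------
    series : pandas.Series
        Strings to convert.
    errors : str, default 'raise'
        - 'raise' : Raise an error on invalid values.
        - 'coerce' : Convert invalid values to NaN.
        - 'ignore' : Return the original data if any value is invalid.

    Returns
    -------
    pandas.Series
        Series with numeric values.
    """
    # Sketch only: drop the thousands separator, then turn the decimal comma into a point.
    cleaned = (
        series.str.replace(".", "", regex=False)
        .str.replace(",", ".", regex=False)
    )
    if errors == "ignore":
        # Handled explicitly so the untouched input is returned on failure.
        try:
            return pd.to_numeric(cleaned, errors="raise")
        except (ValueError, TypeError):
            return series
    return pd.to_numeric(cleaned, errors=errors)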

Expected Behavior

import pandas as pd

df = pd.DataFrame({"values": ["1.234,56", "5.600,75", "100,50"]})
df["converted_values"] = to_numeric_br(df["values"], errors="coerce")

print(df)

Expected Output:

      values  converted_values
0  1.234,56          1234.56
1  5.600,75          5600.75
2    100,50           100.50

Alternatively, instead of a standalone function, this could be implemented as an enhancement to pd.to_numeric(), adding a locale="br" parameter.
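A rough sketch of what the locale-based variant could look like, written as a standalone helper rather than a change to pd.to_numeric() itself. The name to_numeric_locale, the locale keyword, and the _SEPARATORS table are all hypothetical and not part of pandas; the input is assumed to be a Series of strings.

```python
import pandas as pd

# Hypothetical lookup table: locale code -> (thousands separator, decimal separator).
_SEPARATORS = {"br": (".", ","), "us": (",", ".")}


def to_numeric_locale(values, locale="br", errors="raise"):
    """Hypothetical helper: parse locale-formatted numeric strings in a Series."""
    thousands, decimal = _SEPARATORS[locale]
    # Remove the thousands separator literally (regex=False avoids "." matching any character).
    cleaned = values.str.replace(thousands, "", regex=False)
    if decimal != ".":
        # Normalize the decimal separator to the point that pd.to_numeric expects.
        cleaned = cleaned.str.replace(decimal, ".", regex=False)
    return pd.to_numeric(cleaned, errors=errors)


result = to_numeric_locale(pd.Series(["1.234,56", "5.600,75", "100,50"]), locale="br")
coerced = to_numeric_locale(pd.Series(["abc", "2,5"]), locale="br", errors="coerce")
```

Keeping the separators in a table means other formats could be supported by adding an entry instead of a new function, which is one argument for the parameter-based design over to_numeric_br().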

Alternative Solutions

Currently, users must manually apply string replacements before using pd.to_numeric(), like this:

df["values"] = df["values"].str.replace(".", "", regex=False).str.replace(",", ".", regex=False)
df["values"] = pd.to_numeric(df["values"], errors="coerce")

While this works, it is not user-friendly, especially for beginners.

Another alternative is using third-party packages like babel, but this requires additional dependencies and is not built into Pandas.
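For data that arrives from a delimited file, pandas can already handle this format at read time: pd.read_csv() accepts decimal and thousands arguments. The snippet below is a minimal illustration; the semicolon-delimited sample data is made up for the example. It does not help with strings that are already in a DataFrame, which is the case this request targets.

```python
import io

import pandas as pd

# Made-up sample: semicolon-delimited text with Brazilian-formatted numbers.
raw = io.StringIO("id;values\n1;1.234,56\n2;5.600,75\n3;100,50\n")

# decimal and thousands tell the parser which separators the file uses.
df = pd.read_csv(raw, sep=";", decimal=",", thousands=".")

print(df.dtypes)
print(df)
```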

Additional Context

  • Similar requests have been made by users handling locale-specific number formats.
  • Would the maintainers prefer a standalone function (to_numeric_br()) or a locale parameter in pd.to_numeric()?
  • Happy to implement this if maintainers approve!
