Skip to content

ENH: Support for list of formats in pd.to_datetime() #55226

@NickiForte

Description

@NickiForte

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Frequently, pd.to_datetime() needs to be used to convert dates of mixed formats. Currently, it supports the format 'mixed', but as the documentation says, it is risky:

“mixed”, to infer the format for each element individually. This is risky, and you should probably use it along with dayfirst.

Wouldn't it be good if this mixed format interpretation could be done without being "risky"?

Feature Description

This mixed format interpretation could be done without ambiguity by letting format accept also lists of multiple formats, e.g. ["%Y-%m-%d", "%Y-%m-%d %H:%M:%S"]. Then the function would first convert all entries that can be converted using the first rule, and then proceed with trying to convert the remaining unparsed entries using the next rules in the list.

Example usage:

import pandas as pd
data = ['2023-09-21', '2023-09-22 14:30:00', '21/09/2023']
parsed_dates = pd.to_datetime(data, format=["%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y"])

Alternative Solutions

This example illustrates quite well how this could be implemented:

import pandas as pd

def to_datetime(series, format=None, errors="raise", **kwargs):
    temp_series = pd.Series(index=series.index, dtype="datetime64[ns]")
    for rule in format:
        temp_series.loc[temp_series.isnull()] = pd.to_datetime(
            series.loc[temp_series.isnull()], format=rule, errors="coerce", **kwargs
        )
        if temp_series.isnull().sum() == 0:
            series.loc[:] = temp_series
            break
    else:
        if errors == "raise":
            example_date = series[temp_series.isnull()].iloc[0]
            raise ValueError(
                f"Could not parse all dates with the provided rules, for example: {example_date}"
            )
        elif errors == "coerce":
            series.loc[:] = temp_series
    return series


to_datetime(
    pd.Series(["2020-01-01", "2020-01-02 00:00:01", "01.03.2020"]),
    format=[
        "%Y-%m-%d",
        "%Y-%m-%d %H:%M:%S",
    ],
    errors="coerce",
)

Additional Context

No response

Metadata

Metadata

Assignees

Labels

DatetimeDatetime data dtypeEnhancementNeeds DiscussionRequires discussion from core team before further action

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions