The read_csv function has a parameter date_format, which can be "str or dict of columns"; see the documentation.
So for parsing date columns you can either use:
str = all columns have the same datetime format
dict = explicitly specify the datetime format for every individual datetime column
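As a minimal sketch of the two existing options (assuming pandas 2.0 or later, where date_format is available; the column names and sample rows are made up for illustration):

```python
import io

import pandas as pd

# Made-up sample data: two date columns sharing one format.
csv_same = "start,end\n01-05-2024,05-05-2024\n02-05-2024,06-05-2024\n"

# str: a single format applied to every column listed in parse_dates.
df_str = pd.read_csv(
    io.StringIO(csv_same),
    parse_dates=["start", "end"],
    date_format="%d-%m-%Y",
)

# Made-up sample data: one date column and one datetime column.
csv_mixed = "start,updated\n01-05-2024,05-05-2024 12:30\n02-05-2024,06-05-2024 08:15\n"

# dict: one format per column, spelled out explicitly.
df_dict = pd.read_csv(
    io.StringIO(csv_mixed),
    parse_dates=["start", "updated"],
    date_format={"start": "%d-%m-%Y", "updated": "%d-%m-%Y %H:%M"},
)

print(df_str.dtypes)
print(df_dict.dtypes)
```

With only two columns the dict is short, but it grows by one entry for every additional date column.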
However, in practice, when a csv file contains different datetime formats, it's usually just a date format and a datetime (or time) format. In other words, the date format can differ a lot between csv files, but it usually doesn't differ much within one file. Theoretically there could be US and European date formats mixed in one csv file, but I work with a lot of csv data and I've never seen this. In my experience this is a very uncommon use case.
Feature Description
So for example, a csv file can have 10 date columns formatted like 01-05-2024 and 5 columns formatted like 05-05-2024 12:30.
When reading such a csv file with many datetime columns using read_csv, the str option is not sufficient, but the dict option is overkill: you have to explicitly set the format for each column when there are basically just two groups, so it's not very practical.
So my feature request is:
Can read_csv be updated so that the date_format parameter also accepts a list of date format strings for the date columns? For example: date_format=['%d-%m-%Y', '%d-%m-%Y %H:%M:%S']
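For comparison, something close to the requested behaviour can be approximated today after loading. The sketch below is only an illustration: parse_with_formats is a hypothetical helper, not part of pandas, and it assumes that each column uses exactly one of the candidate formats throughout. The sample data uses %H:%M because the example values have no seconds.

```python
import io

import pandas as pd


def parse_with_formats(df, columns, formats):
    """Hypothetical helper: for each column, apply the first format
    in `formats` that parses every value in that column."""
    out = df.copy()
    for col in columns:
        for fmt in formats:
            try:
                # Raises ValueError if any value does not match fmt exactly.
                out[col] = pd.to_datetime(df[col], format=fmt)
                break
            except ValueError:
                continue
        else:
            raise ValueError(f"no format matched column {col!r}")
    return out


# Made-up sample data: one date column and one datetime column.
csv_text = "created,updated\n01-05-2024,05-05-2024 12:30\n02-05-2024,06-05-2024 08:15\n"

raw = pd.read_csv(io.StringIO(csv_text))  # date columns stay as strings here
parsed = parse_with_formats(
    raw, ["created", "updated"], ["%d-%m-%Y", "%d-%m-%Y %H:%M"]
)
print(parsed.dtypes)
```

This parses after reading rather than inside read_csv, so it is a workaround rather than the feature itself.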
I am OK with this idea, but I think this can be accomplished in multiple ways without affecting practicality, so I'm not sure it is necessary.
As for the alternative idea, I'm not really a fan of it: that dict format is not very readable in my opinion, and with just a few lines of code you can convert it to the appropriate dict format.
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Alternative Solutions
Alternatively, I think it could be practical for most typical use-cases to give groups of dateformats. So instead of having to supply a parameter for each individual column, like this:
It could be changed so that you supply groups like this, which is less code and better reflects the actual situation:
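As a rough illustration of the grouped idea, here is one possible shape for such a mapping (format string to list of columns) and how it can be expanded into the per-column dict that date_format already accepts. The grouping shape, the column names, and the expand_format_groups helper are all assumptions made for this sketch; none of them is an existing pandas API.

```python
import io

import pandas as pd


def expand_format_groups(groups):
    """Hypothetical helper: turn {format: [columns]} into {column: format}."""
    return {col: fmt for fmt, cols in groups.items() for col in cols}


# Assumed grouping shape: one entry per format instead of one per column.
groups = {
    "%d-%m-%Y": ["created", "shipped"],
    "%d-%m-%Y %H:%M": ["updated"],
}
per_column = expand_format_groups(groups)

# Made-up sample data matching the column names above.
csv_text = (
    "created,shipped,updated\n"
    "01-05-2024,03-05-2024,05-05-2024 12:30\n"
    "02-05-2024,04-05-2024,06-05-2024 08:15\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=list(per_column),
    date_format=per_column,
)
print(df.dtypes)
```

The expansion is a single dict comprehension, which is the kind of short conversion that works with the current str-or-dict parameter.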
Additional Context
See code examples below for typical csv files with date values (it is all randomly generated test data)