Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
If you run pd.to_datetime on a Series of date strings in which only the day-first reading fits every row (it contains 13-02-2000), pandas (>= 2.0) will infer the datetime format from the first non-missing example (%m-%d-%Y), try to apply this format to the whole Series, fail on 13-02-2000, and raise an error (before version 2.0, this would silently parse the rows with mixed formats). I wish pandas could infer the right format from such a Series, where only one format works for all rows.
Feature Description
Pseudo code
If guess_datetime_format returns different formats for dayfirst=True and dayfirst=False on the first non-missing example (i.e. both readings parse it):
Try both formats on the Series (probably on a random subset for speed).
If one works for all rows, return this format.
If both work, trust the dayfirst parameter (and maybe raise a warning).
If neither works and errors="raise", raise an error. If errors="coerce" or errors="ignore", one could either trust the dayfirst parameter or see which dayfirst value leads to the smallest number of unparsed values.
Implementation
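As a rough illustration of the pseudo code above, here is a minimal pure-Python sketch. It is not pandas code: choose_format, _parses, sample_size and seed are hypothetical names, the two candidate formats are passed in by hand (standing in for the two guess_datetime_format results), datetime.strptime stands in for array_strptime, and the example values are assumed for illustration.

```python
import random
import warnings
from datetime import datetime


def _parses(value, fmt):
    """Return True if `value` matches `fmt` under strict strptime parsing."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (ValueError, TypeError):
        return False


def choose_format(values, dayfirst_fmt, monthfirst_fmt, dayfirst=False,
                  errors="raise", sample_size=100, seed=0):
    """Hypothetical helper: pick one of two candidate formats by trying both."""
    non_null = [v for v in values if v is not None]
    # Test on a random subset for speed; a small sample can miss the one
    # row that disambiguates, which is a trade-off of this approach.
    if len(non_null) > sample_size:
        sample = random.Random(seed).sample(non_null, sample_size)
    else:
        sample = non_null

    preferred = dayfirst_fmt if dayfirst else monthfirst_fmt
    n_day = sum(_parses(v, dayfirst_fmt) for v in sample)
    n_month = sum(_parses(v, monthfirst_fmt) for v in sample)
    day_ok = n_day == len(sample)
    month_ok = n_month == len(sample)

    if day_ok and month_ok:
        # Both formats fit every sampled row: trust the dayfirst parameter.
        if dayfirst_fmt != monthfirst_fmt:
            warnings.warn("Both formats fit the sample; using the dayfirst setting.")
        return preferred
    if day_ok:
        return dayfirst_fmt
    if month_ok:
        return monthfirst_fmt
    if errors == "raise":
        raise ValueError("Neither candidate format parses every sampled value.")
    # errors is "coerce" or "ignore": keep the format that leaves the fewest
    # unparsed values, falling back to the dayfirst setting on a tie.
    if n_day == n_month:
        return preferred
    return dayfirst_fmt if n_day > n_month else monthfirst_fmt


# Assumed example: the first value is ambiguous, the second is only valid day-first.
dates = ["01-02-2000", "13-02-2000"]
print(choose_format(dates, "%d-%m-%Y", "%m-%d-%Y"))  # prints %d-%m-%Y
```

Here the day-first format wins even though dayfirst defaults to False, because it is the only candidate that parses every row.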
Change the function _guess_datetime_format_for_array (in pandas.core.tools.datetimes) so that it tries both dayfirst=True and dayfirst=False on the first non-null example. In the same function, if the two options give different formats, try array_strptime with both formats on a random subset of the array (100 elements?) with strict error handling, and check that one of the attempts doesn't fail.
Alternative Solutions
I don't know.
Additional Context
No response