string to date format ignored on apply #3669

Closed
hayd opened this Issue May 21, 2013 · 7 comments

2 participants
Contributor

hayd commented May 21, 2013

From the SO question.

I think apply should be passing on the format keyword argument:

In [1]: s = pd.Series(['12/1/2012', '30/01/2012'])

In [2]: s.apply(pd.to_datetime, format='%d/%m/%Y')
Out[2]:
0   2012-12-01 00:00:00
1   2012-01-30 00:00:00
dtype: datetime64[ns]

In [3]: pd.to_datetime(s, format='%d/%m/%Y')
Out[3]:
0   2012-01-12 00:00:00
1   2012-01-30 00:00:00
dtype: datetime64[ns]

This seems to be the case only for Series, not for DataFrames:

>>> import pandas as pd
>>> pd.__version__
'0.11.0'
>>> s = pd.Series(['12/1/2012', '30/01/2012'])
>>> s.apply(pd.to_datetime, format='%d/%m/%Y')
0   2012-12-01 00:00:00
1   2012-01-30 00:00:00
dtype: datetime64[ns]
>>> df = pd.DataFrame(s)
>>> df
            0
0   12/1/2012
1  30/01/2012
>>> df.apply(pd.to_datetime, format='%d/%m/%Y')
                    0
0 2012-01-12 00:00:00
1 2012-01-30 00:00:00
>>> df[0].apply(pd.to_datetime, format='%d/%m/%Y')
0   2012-12-01 00:00:00
1   2012-01-30 00:00:00
Name: 0, dtype: datetime64[ns]
>>> df[[0]].apply(pd.to_datetime, format='%d/%m/%Y')
                    0
0 2012-01-12 00:00:00
1 2012-01-30 00:00:00
Contributor

hayd commented May 21, 2013

I'm thinking this may have something to do with dayfirst. Perhaps it should default to None and we should check it (or is this external? I'm sure I have looked into this or something similar before). It seems to interfere here:

In [21]: pd.to_datetime(s[0], format='%d/%m/%Y')
Out[21]: datetime.datetime(2012, 12, 1, 0, 0)

In [22]: pd.to_datetime(s[0], format='%d/%m/%Y', dayfirst=True)
Out[22]: datetime.datetime(2012, 1, 12, 0, 0)

Not cool.
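For reference, the standard library's strptime shows what this format string means for the first value, independent of pandas. This is only a minimal sketch for comparison; the variable names are illustrative and not taken from the pandas code:

```python
from datetime import datetime

fmt = '%d/%m/%Y'

# With day first and month second, '12/1/2012' is 12 January 2012.
parsed = datetime.strptime('12/1/2012', fmt)
print(parsed)
```

If the format were honoured, Out[21] above should match this value without needing dayfirst=True.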

As you see in the example you gave, there is also a difference between a string and a series:

>>> pd.to_datetime(s, format='%d/%m/%Y')
0   2012-01-12 00:00:00
1   2012-01-30 00:00:00
dtype: datetime64[ns]
>>> pd.to_datetime(s[0], format='%d/%m/%Y')
datetime.datetime(2012, 12, 1, 0, 0)

I looked at the code, and I may be overlooking something entirely, but it seems that the format argument is not used when dealing with a single string: https://github.com/pydata/pandas/blob/master/pandas/tseries/tools.py#L135, which could explain the behaviour when adding dayfirst.

Could that also be the reason that the s.apply(pd.to_datetime, format='%d/%m/%Y') from the original question does not work?

  • apply on Series -> individual values of the Series are fed to the function -> strings -> format not used (dateutil.parse)
  • apply on DataFrame (column) -> a Series is fed to the function -> format is used (tslib.array_strptime(arg, format))
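The two paths above can be checked by recording what type the applied function actually receives. This is a rough sketch, not pandas internals: the helper names (record_series_arg, record_frame_arg) are made up for illustration, and the last line assumes that passing the whole Series to pd.to_datetime is the path on which format is honoured, as in Out[3] of the first post:

```python
import pandas as pd

s = pd.Series(['12/1/2012', '30/01/2012'])
df = pd.DataFrame(s)

series_apply_types = []
frame_apply_types = []

def record_series_arg(x):
    # Series.apply calls the function once per element.
    series_apply_types.append(type(x).__name__)
    return x

def record_frame_arg(col):
    # DataFrame.apply calls the function once per column.
    frame_apply_types.append(type(col).__name__)
    return col

s.apply(record_series_arg)
df.apply(record_frame_arg)

print(set(series_apply_types))  # type of each value seen via Series.apply
print(set(frame_apply_types))   # type of each value seen via DataFrame.apply

# Converting the whole Series at once avoids the per-string path.
converted = pd.to_datetime(s, format='%d/%m/%Y')
print(converted)
```

So on the affected version, calling pd.to_datetime on the Series directly rather than through s.apply is the safer way to get the format applied.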

This can be closed I think. Solved by #3890

Contributor

hayd commented Jul 6, 2013

@jorisvandenbossche no, don't think so, master still shows same behaviour as first post.

Contributor

hayd commented Jul 6, 2013

@jorisvandenbossche I'm talking nonsense! You're right!

hayd closed this Jul 6, 2013
