Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/ENH: to_datetime dayfirst ought to be strict #3341

Closed
hayd opened this issue Apr 13, 2013 · 15 comments
Closed

BUG/ENH: to_datetime dayfirst ought to be strict #3341

hayd opened this issue Apr 13, 2013 · 15 comments

Comments

@hayd
Copy link
Contributor

hayd commented Apr 13, 2013

xref #12585

I think this ought to raise a ValueErrror:

In [1]: pd.to_datetime('01-13-2012', dayfirst=True)
Out[1]: datetime.datetime(2012, 1, 13, 0, 0)

This means if my data is bad (i.e. somehow I have a mixture of american and british string-dates), I won't be told there was an issue when parsing! I can't think of a usecase where this kind of precedence would be appropriate, and it means you'll end up with incorrect dates.

The precedence behaviour appears to be a feature from dateutil (which tslib.array_to_datetime calls):

In [2]: from dateutil.parser import parse as parse_date

In [3]: parse_date('1/12/2012', dayfirst=True)
Out[3]: datetime.datetime(2012, 12, 1, 0, 0)

In [4]: parse_date('1/13/2012', dayfirst=True)
Out[4]: datetime.datetime(2012, 1, 13, 0, 0)

It should raising like this:

In [5]: parse_date('14/14/2012', dayfirst=True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-ca14fc16e421> in <module>()
----> 1 parse_date('14/14/2012', dayfirst=True)

/Library/Python/2.7/site-packages/dateutil/parser.pyc in parse(timestr, parserinfo, **kwargs)
    718         return parser(parserinfo).parse(timestr, **kwargs)
    719     else:
--> 720         return DEFAULTPARSER.parse(timestr, **kwargs)
    721
    722

/Library/Python/2.7/site-packages/dateutil/parser.pyc in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    315             if value is not None:
    316                 repl[attr] = value
--> 317         ret = default.replace(**repl)
    318         if res.weekday is not None and not res.day:
    319             ret = ret+relativedelta.relativedelta(weekday=res.weekday)

ValueError: month must be in 1..12

... :(

@hayd
Copy link
Contributor Author

hayd commented Apr 13, 2013

Another usecase, is if you call to_datetime (without setting dayfirst) on a list of british date-strings, atm it parses without error but does so "incorrectly" (using a mixture of dayfirst and not, depending). Which is confusing.

@ghost
Copy link

ghost commented Apr 13, 2013

I'm shocked that dateutil does that.

@hayd
Copy link
Contributor Author

hayd commented Jun 14, 2013

cc @jreback.

@jreback
Copy link
Contributor

jreback commented Jul 9, 2013

I think we ought to drop this parameter in favor of just using format; dateutil just does the wrong thing. as an alternative, could just have dayfirst imply format="%d/%m/%Y"....or some such (but not nearly as flexible)

until/unless we actually do more date parsing internally

thoughts?

@hayd
Copy link
Contributor Author

hayd commented Jul 9, 2013

but but but dayfirst is the best argument ever, it's very very useful for munging (it's great if you don't know the format or worse the format is mixed (!)).

It being not strict is sad, but not having it at all would be much worse. please please can we keep it!

I've looked before where in their code it's not being strict, but it's pretty messy and not sure how actively maintained it is...

I'm sure @wesm suggested another (C?) library fairly recently but can't find the issue.

@jreback
Copy link
Contributor

jreback commented Jul 9, 2013

how about a big warning in the docstring/docs? that you really ought to specify format instead?

I see the warning you have, but EVEN bigger ? (and then in docstrings too), http://pandas.pydata.org/pandas-docs/dev/timeseries.html#converting-to-timestamps

@hayd
Copy link
Contributor Author

hayd commented Jul 9, 2013

+1 on that, definitely users should know what they are letting themselves in for.

@jreback
Copy link
Contributor

jreback commented Jul 9, 2013

alright...so let's just make this a doc enhancement for now?

@hayd
Copy link
Contributor Author

hayd commented Jul 9, 2013

How's about (pr coming):

dayfirst : boolean, default False
    If True parses dates with the day first, eg 20/01/2005
    Warning: dayfirst=True is not strict, but will prefer to parse
    with day first (this is a known bug).

@jankatins
Copy link
Contributor

Switching from #11725, where this happened, e.g. daysfirst=False would parse month as days

>>> pd.to_datetime(["29.01.1945","1.3.1945", "02.03.1945"])
DatetimeIndex(['1945-01-29', '1945-01-03', '1945-02-03'], dtype='datetime64[ns]', freq=None)

right now, the doc only describes the problem for the other way:

Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior).

As dayfirst is False, this makes no mention that the other way can also happen

@jorisvandenbossche
Copy link
Member

@JanSchulz Indeed .. Maybe also worth mentioning that if you want strict behaviour, you can use format=... if possible?

@jorisvandenbossche
Copy link
Member

I opened an issue for this at the dateutil tracker, as I couldn't find one: dateutil/dateutil#214

@WillAyd
Copy link
Member

WillAyd commented Jul 6, 2018

Not a pandas bug and documented accordingly

@WillAyd WillAyd closed this as completed Jul 6, 2018
@WillAyd WillAyd added the Docs label Jul 6, 2018
@WillAyd WillAyd modified the milestones: Someday, No action Jul 6, 2018
@jbrockmendel
Copy link
Member

jbrockmendel commented Jul 9, 2018 via email

@WillAyd
Copy link
Member

WillAyd commented Jul 9, 2018

@jbrockmendel I closed because there's nothing technically to be done on the pandas side and this is arguably duplicative of #12585. Feel free to reopen if you think it's this issue in particular adds value to tracking

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants