BUG: to_datetime issue parsing non-zero padded month in 0.17.1 #11871

Closed
dpinte opened this Issue Dec 20, 2015 · 5 comments

dpinte commented Dec 20, 2015

In pandas 0.16.2, the following date (non-zero padded month) was parsing correctly:

>>> import pandas
>>> pandas.__version__
'0.16.2'
>>> pandas.to_datetime('2005-1-13', format='%Y-%m-%d')
Timestamp('2005-01-13 00:00:00')

With 0.17.1, it raises a ValueError:

>>> import pandas
>>> pandas.__version__
u'0.17.1'
>>> pandas.to_datetime('2005-1-13', format='%Y-%m-%d')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/util/decorators.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/tseries/tools.py", line 276, in to_datetime
    unit=unit, infer_datetime_format=infer_datetime_format)
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/tseries/tools.py", line 397, in _to_datetime
    return _convert_listlike(np.array([ arg ]), box, format)[0]
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/tseries/tools.py", line 383, in _convert_listlike
    raise e
ValueError: time data '2005-1-13' does match format specified

Even though %m is meant for zero-padded months, Python's strptime function parses non-zero-padded months properly.
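The standard-library behaviour referred to here can be checked without pandas. A minimal sketch using only `datetime.strptime`, with the same input string and format as in the report (the variable names are illustrative):

```python
from datetime import datetime

# When parsing, %m and %d accept one- or two-digit values, so a
# non-zero-padded month is not an error for the standard library.
unpadded = datetime.strptime('2005-1-13', '%Y-%m-%d')
padded = datetime.strptime('2005-01-13', '%Y-%m-%d')

print(unpadded)            # 2005-01-13 00:00:00
print(unpadded == padded)  # True
```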

Is this a known issue?

dpinte commented Dec 20, 2015

It looks like the following works:

>>> pandas.to_datetime('2005-1-13', format='%Y-%m-%d', infer_datetime_format=True)
Timestamp('2005-01-13 00:00:00')

This could be related to #11142 and could be considered a regression. Having to infer the datetime format when the given format is already the appropriate one is overkill:

>>> from pandas.tseries import tools
>>> tools._guess_datetime_format('2005-1-13')
'%Y-%m-%d'
Contributor

chris-b1 commented Dec 21, 2015

This PR (conveniently also mine) is a more likely cause for the problem - I'll take a look later.
pydata#10615

Contributor

chris-b1 commented Dec 22, 2015

This happens because there is a special fastpath (in C) for ISO 8601 formatted dates, but that code doesn't handle dates without leading 0s. As a workaround, you can just not specify the format.

To fix this, we probably need to do one of the following:

  1. Let fastpath code fall back to the regular parser. This code is already pretty complex, and this would just make it more so.
  2. Update C code to handle dates without leading 0s. Not sure if this can be done in a performance neutral way?
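
The two options can be sketched in pure Python. This is only an illustration under stated assumptions: the real fast path is C code inside pandas, and the names `_STRICT_ISO`, `_LENIENT_ISO`, `parse_with_fallback`, and `parse_lenient` are hypothetical, not pandas APIs.

```python
import re
from datetime import datetime

# Hypothetical stand-in for the strict fast path: requires zero-padded fields.
_STRICT_ISO = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
# Hypothetical lenient variant: month and day may be one or two digits.
_LENIENT_ISO = re.compile(r'(\d{4})-(\d{1,2})-(\d{1,2})')


def parse_with_fallback(value, fmt='%Y-%m-%d'):
    """Option 1: try the strict fast path, fall back to strptime on a miss."""
    match = _STRICT_ISO.fullmatch(value)
    if match:
        year, month, day = map(int, match.groups())
        return datetime(year, month, day)
    # Slow path: the regular parser decides whether the string is valid.
    return datetime.strptime(value, fmt)


def parse_lenient(value):
    """Option 2: make the fast path itself accept missing leading zeros."""
    match = _LENIENT_ISO.fullmatch(value)
    if match is None:
        raise ValueError('time data %r does not match format' % value)
    year, month, day = map(int, match.groups())
    return datetime(year, month, day)
```

Both functions return the same result for '2005-1-13'. Option 1 adds a second code path to maintain, while option 2 keeps a single path at the cost of a slightly more permissive match.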

jorisvandenbossche added this to the 0.18.0 milestone Dec 22, 2015

dpinte commented Dec 29, 2015

@chris-b1 The second option is definitely the best one, as it would keep the behaviour closer to the standard behaviour of strptime. Even if it is not performance neutral, supporting dates without leading zeros in the C code should not add serious overhead.

Contributor

jreback commented Dec 29, 2015

Yes, more flexibility is good here. BTW, this is quite easy to do, as it is pretty straightforward C code.

jreback changed the title from `to_datetime` issue parsing non-zero padded month in 0.17.1 to BUG: to_datetime issue parsing non-zero padded month in 0.17.1 Dec 29, 2015

jreback closed this in 5de6b84 Jan 26, 2016
