BUG: fix TypeError for invalid integer dates %Y%m%d with errors='ignore' (# GH 26583)

array_strptime raised TypeError when it tried to slice an integer that was too long for the given format %Y%m%d (for example 2121010101).
After parsing a date from the first 8 characters, it tried to report the remaining characters in the ValueError message by slicing the integer, which in turn raised TypeError.
The value converted to a string is now used to build the slice for that ValueError message.
In the case of 20209911, it parsed 20209911 as datetime(2020, 9, 9) and had 11 left unparsed.
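
The failure mode described above can be reproduced without pandas. The sketch below is illustrative only: the regex is a simplified stand-in for whatever pattern array_strptime compiles from %Y%m%d (an assumption, not the pandas implementation), and the helper names unconverted_tail and slicing_int_fails are hypothetical.

```python
import re

# Simplified stand-in for the pattern a '%Y%m%d' format compiles to.
# This is an assumption for illustration, not the pattern pandas builds.
YMD = re.compile(
    r"(?P<Y>\d{4})"
    r"(?P<m>1[0-2]|0[1-9]|[1-9])"
    r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])"
)


def unconverted_tail(value):
    """Return the characters left over after the date prefix is matched."""
    val = str(value)  # slice the string form, never the raw integer
    found = YMD.match(val)
    if found is None:
        raise ValueError("time data %r does not match format" % val)
    return val[found.end():]


def slicing_int_fails(value):
    """Show that slicing the raw integer is what raised TypeError."""
    try:
        value[8:]
    except TypeError:
        return True
    return False


print(unconverted_tail(20209911))     # leftover digits after 2020, 9, 9
print(unconverted_tail(2121010101))   # leftover digits after the first 8
print(slicing_int_fails(2121010101))  # integers are not subscriptable
```

Building the error message from the string form keeps the intended ValueError ("unconverted data remains") reachable, so errors='ignore' can catch it and return the input unchanged.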
nathalier committed Jun 1, 2019
1 parent 8154efb commit d93d835
Showing 3 changed files with 23 additions and 3 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -427,6 +427,7 @@ Datetimelike
- Bug in :class:`Series` and :class:`DataFrame` repr where ``np.datetime64('NaT')`` and ``np.timedelta64('NaT')`` with ``dtype=object`` would be represented as ``NaN`` (:issue:`25445`)
- Bug in :func:`to_datetime` which does not replace the invalid argument with ``NaT`` when error is set to coerce (:issue:`26122`)
- Bug in adding :class:`DateOffset` with nonzero month to :class:`DatetimeIndex` would raise ``ValueError`` (:issue:`26258`)
- Bug in :func:`to_datetime` which raises ``TypeError`` for ``format='%Y%m%d'`` when called for invalid integer dates with length >= 6 digits with ``errors='ignore'``

Timedelta
^^^^^^^^^
6 changes: 3 additions & 3 deletions pandas/_libs/tslibs/strptime.pyx
@@ -140,13 +140,13 @@ def array_strptime(object[:] values, object fmt,
iresult[i] = NPY_NAT
continue
raise ValueError("time data %r does not match "
- "format %r (match)" % (values[i], fmt))
+ "format %r (match)" % (val, fmt))
if len(val) != found.end():
if is_coerce:
iresult[i] = NPY_NAT
continue
raise ValueError("unconverted data remains: %s" %
- values[i][found.end():])
+ val[found.end():])

# search
else:
@@ -156,7 +156,7 @@ def array_strptime(object[:] values, object fmt,
iresult[i] = NPY_NAT
continue
raise ValueError("time data %r does not match format "
- "%r (search)" % (values[i], fmt))
+ "%r (search)" % (val, fmt))

iso_year = -1
year = 1900
19 changes: 19 additions & 0 deletions pandas/tests/indexes/datetimes/test_tools.py
@@ -114,6 +114,25 @@ def test_to_datetime_format_integer(self, cache):
result = to_datetime(s, format='%Y%m', cache=cache)
assert_series_equal(result, expected)

@pytest.mark.parametrize('int_date, expected', [
# valid date, length == 8
[20121030, datetime(2012, 10, 30)],
# short valid date, length == 6
[199934, datetime(1999, 3, 4)],
# long integer date partially parsed to datetime(2012,1,1), length > 8
[2012010101, 2012010101],
# invalid date partially parsed to datetime(2012,9,9), length == 8
[20129930, 20129930],
# short integer date partially parsed to datetime(2012,9,9), length < 8
[2012993, 2012993],
# short invalid date, length == 4
[2121, 2121]])
def test_int_to_datetime_format_YYYYMMDD_typeerror(self, int_date,
expected):
# GH 26583
result = to_datetime(int_date, format='%Y%m%d', errors='ignore')
assert result == expected

@pytest.mark.parametrize('cache', [True, False])
def test_to_datetime_format_microsecond(self, cache):
