Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERR: Raise the correct error for to_datetime() #23969

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.24.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1313,12 +1313,14 @@ Datetimelike
- Bug in :class:`DatetimeIndex` where calling ``np.array(dtindex, dtype=object)`` would incorrectly return an array of ``long`` objects (:issue:`23524`)
- Bug in :class:`Index` where passing a timezone-aware :class:`DatetimeIndex` and `dtype=object` would incorrectly raise a ``ValueError`` (:issue:`23524`)
- Bug in :class:`Index` where calling ``np.array(dtindex, dtype=object)`` on a timezone-naive :class:`DatetimeIndex` would return an array of ``datetime`` objects instead of :class:`Timestamp` objects, potentially losing nanosecond portions of the timestamps (:issue:`23524`)
- Bug in :func:`to_datetime` which would raise an (incorrect) ``ValueError`` when called with a date far into the future and the ``format`` argument specified instead of raising ``OutOfBoundsDatetime`` (:issue:`23830`)
- Bug in :class:`Categorical.__setitem__` not allowing setting with another ``Categorical`` when both are undordered and have the same categories, but in a different order (:issue:`24142`)
- Bug in :func:`date_range` where using dates with millisecond resolution or higher could return incorrect values or the wrong number of values in the index (:issue:`24110`)
- Bug in :class:`DatetimeIndex` where constructing a :class:`DatetimeIndex` from a :class:`Categorical` or :class:`CategoricalIndex` would incorrectly drop timezone information (:issue:`18664`)
- Bug in :class:`DatetimeIndex` and :class:`TimedeltaIndex` where indexing with ``Ellipsis`` would incorrectly lose the index's ``freq`` attribute (:issue:`21282`)
- Clarified error message produced when passing an incorrect ``freq`` argument to :class:`DatetimeIndex` with ``NaT`` as the first entry in the passed data (:issue:`11587`)


Timedelta
^^^^^^^^^
- Bug in :class:`DataFrame` with ``timedelta64[ns]`` dtype division by ``Timedelta``-like scalar incorrectly returning ``timedelta64[ns]`` dtype instead of ``float64`` dtype (:issue:`20088`, :issue:`22163`)
Expand Down
12 changes: 4 additions & 8 deletions pandas/_libs/tslib.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -599,11 +599,9 @@ cpdef array_to_datetime(ndarray[object] values, str errors='raise',
iresult[i] = NPY_NAT
continue
elif is_raise:
raise ValueError("time data {val} doesn't "
"match format specified"
.format(val=val))
raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont' think think this case is actually an OOB error, rather a parsing error.

return values, tz_out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't add extra whitespace

try:
py_dt = parse_datetime_string(val,
dayfirst=dayfirst,
Expand Down Expand Up @@ -661,7 +659,7 @@ cpdef array_to_datetime(ndarray[object] values, str errors='raise',
else:
raise TypeError("{typ} is not convertible to datetime"
.format(typ=type(val)))

except OutOfBoundsDatetime:
if is_coerce:
iresult[i] = NPY_NAT
Expand All @@ -671,9 +669,7 @@ cpdef array_to_datetime(ndarray[object] values, str errors='raise',
# dateutil parser will return incorrect result because
# it will ignore nanoseconds
if is_raise:
raise ValueError("time data {val} doesn't "
"match format specified"
.format(val=val))
raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is legit. can you add a comment here about this, e.g. its OOB (even though its just a few lines above)

assert is_ignore
return values, tz_out
raise
Expand Down
11 changes: 11 additions & 0 deletions pandas/tests/indexes/datetimes/test_tools.py
Original file line number Diff line number Diff line change
Expand Up @@ -960,6 +960,17 @@ def test_to_datetime_barely_out_of_bounds(self):
with pytest.raises(OutOfBoundsDatetime):
to_datetime(arr)

def test_to_datetime_out_of_bounds_with_format_arg(self):
# GH#23830 raise the correct exception when
# the format argument is passed.
msg = r"Out of bounds nanosecond timestamp"
with pytest.raises(
OutOfBoundsDatetime,
match=msg
):
to_datetime("2417-10-27 00:00:00",
format="%Y-%m-%d %H:%M:%S")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this out as a separate test and reference the issue number.

Also, check the error here with the match parameter in pytest.raises.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. I'll move it out. However, my main issue here was that this test would pass locally but fail on the CI build (https://travis-ci.org/pandas-dev/pandas/jobs/460708541). My best guess was that it was due to tslib.pyx not being recompiled. Any ideas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your build may not have had this added test, as I'm relatively certain that your change above was not correct.

@pytest.mark.parametrize('cache', [True, False])
def test_to_datetime_iso8601(self, cache):
result = to_datetime(["2012-01-01 00:00:00"], cache=cache)
Expand Down