ENH: raise OverflowError on datetime64/timedelta64 arithmetic overflow#31378
Conversation
Addition, subtraction, and integer multiplication of datetime64 and timedelta64 arrays previously wrapped silently on signed-integer overflow, often producing values indistinguishable from NaT. The affected ufunc loops now use the existing safe_add/safe_sub/safe_mul helpers from npy_extint128.h and raise OverflowError instead, matching the behavior of unit-conversion casts (numpyGH-31085). Results that land exactly on NPY_DATETIME_NAT (== INT64_MIN) are also treated as overflow, since they would otherwise be misinterpreted as NaT by downstream code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| with pytest.raises(OverflowError, match="Overflow"): | ||
| arr + np.timedelta64(2, "s") | ||
| with pytest.raises(OverflowError, match="Overflow"): | ||
| arr - np.timedelta64(2, "s") |
There was a problem hiding this comment.
Maybe it is a nit, but this test is quite long, and doing a lot of things. It may make sense to wait for feedback from NumPy devs on the C code first, but my intuition is that it might be nice to split this test into separate tests a bit.
For example, typing out with pytest.raises(OverflowError, match="Overflow") 10 separate times for various permutations of binary operations might be split out to a parametrized test with i.e., (x, y, operator).
Splits the two large overflow tests into five tests, each asserting one property: overflow-raises (add/sub), overflow-raises (multiply), result-equals-NaT corner case, NaT propagation, and valid-boundary regression guard. Addresses reviewer feedback on numpyGH-31378. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ngoldbaum
left a comment
There was a problem hiding this comment.
Overall looks good. A few comments inline.
| const npy_int64 result = safe_add(in1, in2, &overflow); | ||
| if (overflow || result == NPY_DATETIME_NAT) { | ||
| npy_gil_error(PyExc_OverflowError, | ||
| "Overflow in datetime64 addition"); |
There was a problem hiding this comment.
minor nit: here and elsewhere where you generate overflow errors, it would be nicer to include the type information context you have, so here you could say "Overflow in datetime64 + timedelta64 addition".
|
|
||
| small = np.timedelta64(3, "s") | ||
| assert small * np.int64(7) == np.timedelta64(21, "s") | ||
| assert np.int64(7) * small == np.timedelta64(21, "s") |
There was a problem hiding this comment.
Can you generate a C coverage report and scrutinize the C code you changed in this PR to make sure all code paths are covered, including the new error paths? You can generate a C coverage report with spin test --gcov --gcov-format=html if you have gcov installed. You can probably ask Claude to help you run the coverage tests and fill in coverage gaps. I asked for a review of this code and got the following comments, I didn't actually verify any of the claims from Claude:
Details
- safe_add overflow that does not land on NaT — none of the current add tests isolate this branch. Add e.g.:
INT64_MAX + 2 wraps to INT64_MIN + 1 (valid timedelta), so only safe_add can catch it
with pytest.raises(OverflowError, match="Overflow"):
np.timedelta64(big, "s") + np.timedelta64(2, "s")
- (Note: np.datetime64(big, "s") - dt_neg already isolates safe_sub — that test wraps to -3, not NaT — so subtract is fine.)
- Same for safe_mul — every multiply overflow test lands on NaT. Add:
with pytest.raises(OverflowError, match="Overflow"):
np.timedelta64(big, "s") * np.int64(2) - Negative-side overflow in add (a < 0 && b < NPY_MIN_INT64 - a branch of safe_add):
with pytest.raises(OverflowError, match="Overflow"):
np.timedelta64(-big, "s") + np.timedelta64(-2, "s") - Negative multiplier and INT64_MIN multiplier:
with pytest.raises(OverflowError, match="Overflow"):
np.timedelta64(big, "s") * np.int64(-2)
with pytest.raises(OverflowError, match="Overflow"):
np.timedelta64(2, "s") * np.int64(np.iinfo(np.int64).min) - NaT short-circuit beats overflow check — a regression guard:
assert np.isnat(np.datetime64('NaT', 's') + np.timedelta64(big, 's')) - Result-equals-NaT in the array path. Scalar coverage is good, but the strided loop should also raise, not wrap:
arr = np.array([0, -big], dtype="timedelta64[s]")
with pytest.raises(OverflowError, match="Overflow"):
arr + np.timedelta64(-1, "s") - Asymmetric array coverage. Array tests currently exercise only mm_m_add, mm_m_subtract, mq_m_multiply. Five of the eight loops have no
array-path coverage:
- DATETIME_Mm_M_add (arr_M + arr_m)
- DATETIME_mM_M_add (arr_m + arr_M)
- DATETIME_Mm_M_subtract (arr_M - arr_m)
- DATETIME_MM_m_subtract (arr_M - arr_M)
- TIMEDELTA_qm_m_multiply (int_arr * arr_m)
The comment claims the array case "exercises the actual strided ufunc loop," but that's only true for the loops that are exercised. At least one
array test per loop would close the gap.
Nothing here is a correctness blocker — the implementation is solid. The added-test items above mostly tighten the safety net rather than expose
bugs I can see by inspection.
|
Sorry, a fly-by question (the answer is probably just "nah"). Have you considered whether we should just return |
Short answer: nah. Longer answer: i could imagine this if either a) we had a second sentinel analogous to "inf" or b) it was behind a np.errstate flag. |
Each of the eight datetime64/timedelta64 add/sub/multiply loops now names both operand types in its OverflowError message (e.g. "datetime64 + timedelta64 addition" rather than the ambiguous "datetime64 addition" that previously appeared in two different loops). Also reword the section-header comment so it clearly applies to every overflow-checked loop below it, not only the next function. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten the coverage added in the previous test split: * Isolate the safe_add and safe_mul bounds checks from the result == NPY_DATETIME_NAT short-circuit by adding overflow cases whose wrapped result is *not* INT64_MIN. * Cover the negative-side branch of safe_add (a < 0 && b < INT64_MIN - a) and the negative-multiplier / INT64_MIN-multiplier branches of safe_mul. * Add an array-path case for every signature so the strided ufunc kernel is exercised for Mm_M_add, mM_M_add, Mm_M_subtract, MM_m_subtract, and qm_m_multiply (which previously only had scalar coverage), plus an array case for the result-equals-NaT check. * Add a regression guard that the NaT short-circuit beats the overflow check, so a NaT operand combined with an otherwise-overflowing value still propagates NaT instead of raising. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OK, just wanted to bring it up in case you/others had thoughts. We can do that, but unless we hook it into the existing floats it doesn't matter (not great). So if the default would be an error (which seems fine to me, under the assumption that is a really weird niche thing, though) then this is good. (We can do that, but it's involved.) |
|
FWIW raising on overflow has been the pandas behavior for a while and we've never gotten any complaints about it. |
seberg
left a comment
There was a problem hiding this comment.
Since nobody voiced concerns about raising overflows more, let's get this in, thanks.
(I wonder slightly if we should make the NaT handling post-processing now to avoid the NaT checks on the fast-path, but only a potential follow up.)
Claude wrote this, I reviewed it manually. cc @seberg