BUG: NaT now casts to NaN float64 #11890

tylerjereddy · 2018-09-05T22:57:20Z

If I'm reading the docs correctly, I used the "older" array iteration API here, though it still seems to be used in the source.

numpy/core/src/multiarray/methods.c

eric-wieser · 2018-09-06T07:37:56Z

numpy/core/src/multiarray/methods.c

+
+        /* confine the NaT to NaN substitution checks by type cast */
+        if ((PyArray_DESCR(self)->type_num == NPY_TIMEDELTA) &&
+            (PyArray_DESCR(ret)->type_num == NPY_FLOAT64)) {


This ought to handle the other float sizes too.

tylerjereddy · 2018-09-06T18:22:21Z

I tried to revise accordingly. Some notes / thoughts:

DECREF: I'm not sure it was needed, but adding it doesn't cause test failures or build warnings for me locally.
Expanding to all FLOAT types: I'm not sure you actually meant all of them, even NPY_FLOAT256 and so on, but that's what I've done here, protected with preprocessor directives in case the system doesn't define them. I've adjusted the unit test to iterate over the IEEE-standard 16-, 32-, and 64-bit floats. Is there a clean way to get pytest to conditionally iterate over the "less standard" ones that NumPy defines when they're available on the system? Also, is the value of NaN well established for those more unusual float types?
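One possible answer to the pytest question, as a minimal sketch rather than what the PR ended up doing: build the list of float types from whatever names the running NumPy build exposes, then hand that list to the test parametrization. The names `CANDIDATE_FLOAT_NAMES`, `AVAILABLE_FLOAT_TYPES`, and `nan_survives` are hypothetical and only used for illustration.

```python
import numpy as np

# Candidate type names (hypothetical list for this sketch); the
# extended-precision ones are only defined on some platforms.
CANDIDATE_FLOAT_NAMES = ("float16", "float32", "float64", "float96", "float128")

# Keep only the types that this NumPy build actually defines.
AVAILABLE_FLOAT_TYPES = [
    getattr(np, name) for name in CANDIDATE_FLOAT_NAMES if hasattr(np, name)
]


def nan_survives(float_type):
    """Return True if NaN is still NaN after conversion to `float_type`."""
    return bool(np.isnan(float_type(np.nan)))
```

A list built this way could be passed to `pytest.mark.parametrize("float_type", AVAILABLE_FLOAT_TYPES)`, so the extended types are exercised only where they exist.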

eric-wieser · 2018-09-06T18:48:02Z

numpy/core/src/multiarray/methods.c

+            (PyArray_DESCR(ret)->type_num == NPY_FLOAT128) ||
+#endif
+#ifdef NPY_FLOAT256
+            (PyArray_DESCR(ret)->type_num == NPY_FLOAT256)


Just use NPY_HALF, NPY_FLOAT, NPY_DOUBLE, NPY_LONGDOUBLE here, or whatever the names really are. Never use the sized types when writing type-generic code.

Is it safe to dispense with the preprocessor guards if I do that?

Yes, I think it is. I think long double is always defined, even if it's just the same size as double.

tylerjereddy · 2018-09-07T04:26:11Z

I updated the type specifications and tried removing the preprocessor guards -- that works locally anyway.

eric-wieser · 2018-09-07T04:32:28Z

numpy/core/src/multiarray/methods.c

+            while (self_iter->index < self_iter->size) {
+               /* GETITEM will retrieve Py_None for NaT */
+               item = PyArray_GETITEM(self, PyArray_ITER_DATA(self_iter));
+               Py_XDECREF(item);


This DECREF ought to come after you're done with the item, not before (although strictly it doesn't matter).

numpy/core/src/multiarray/methods.c

tylerjereddy · 2018-09-07T16:33:35Z

Revised to move the decref, but I may need some guidance on the NULL comment.

tylerjereddy · 2018-09-25T17:41:39Z

May need clarification re: revision requested by @eric-wieser

tylerjereddy · 2018-10-06T21:25:58Z

Rebased & ping @shoyer for guidance

shoyer · 2018-10-09T21:10:06Z

I have two concerns here:

Using the Python C API for this could make the inner loop really slow. I wouldn't be surprised if something like np.where(np.isnat(source), np.nan, dest) is actually faster, just because the loops run at a lower level. It would be interesting to see a few micro-benchmark results.
This is a strange place to put this code. Ideally we would not add special cases for datetime64 to core NumPy functions. At the very least, we should split this addition out into a helper function.
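For reference, the np.where alternative mentioned above can be sketched at the Python level as follows. This is an illustration only, not code from the PR; the helper name `timedelta_to_float` is hypothetical, and no claim is made here about what the plain `astype` call returns for NaT on any given NumPy version.

```python
import numpy as np


def timedelta_to_float(source):
    """Cast a timedelta64 array to float64, mapping NaT entries to NaN."""
    source = np.asarray(source)
    # Plain cast first; whatever it produces for NaT is overwritten below.
    dest = source.astype(np.float64)
    # Replace every position that was NaT in the source with NaN.
    return np.where(np.isnat(source), np.nan, dest)


td = np.array([1, 2, 3], dtype="m8[D]")
td[1] = np.timedelta64("NaT", "D")
result = timedelta_to_float(td)
```

Because the substitution is driven by `np.isnat` on the source array, the result does not depend on how the underlying cast treats the NaT sentinel.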

tylerjereddy · 2018-10-14T04:01:29Z

Revised based on feedback from @shoyer -- I migrated the casting code right down to the inner type-type conversion loops that are code generated in "C++ style."

tylerjereddy · 2018-10-14T04:03:34Z

numpy/core/src/multiarray/arraytypes.c.src

@@ -1086,6 +1086,95 @@ TIMEDELTA_setitem(PyObject *op, void *ov, void *vap)

 /* Assumes contiguous, and aligned, from and to */

+/* special case TIMEDELTA NPY_DATETIME_NAT handling */


Note that a large amount of code duplication is introduced because of this specialization, but I think the choice is always going to be between highly-specialized loops for type-type conversion & more general loops where bystander type conversions suffer a performance hit from NaT checking.

numpy/core/src/multiarray/arraytypes.c.src

eric-wieser · 2018-10-14T04:51:03Z

numpy/core/src/multiarray/arraytypes.c.src

+            }
+            else if (*op == (npy_float)(*op)) {
+                *op = (float)NPY_NAN;
+            }


What are you trying to do with these ifs? You shouldn't ever read op, only write to it.

A form of type checking; I think I had issues using the template @type@ style things in comparison operators for some reason.

Is there a nice / clean way to inspect types on inner loops in our template language? I guess something involving PyArray_DESCR or similar might be cooked up, although the "ugly" way seems to work.

We could do with a better way of attaching traits to types at loop-expansion time, for sure.

One approach we've used elsewhere is just to introduce another parallel variable containing 1/0 for each branch.

Another approach I'd like to investigate, but we have no precedent for, is to use something like:

#define float64_is_float64

And then in the loop use

# ifdef @type@_is_float64 ... #endif

The highest performing approach is probably just to hard code the loops by type so that no check is needed at all, as alluded to above. Just means more code.

For now - can't you just write a separate loop for the non-floating point cases? I don't think that you need to distinguish between the different types of NaN anyway - C will cast for you just fine, although perhaps with a warning that makes travis unhappy.

Yeah, I think that's the debate I was having with myself in the comments above.

Right - but you were looking at it as a performance optimization, whereas in fact it's required for correctness.
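The split discussed in this thread can be modelled in a few lines of plain Python. This is only a sketch of the idea, not the generated C loops: the function names are hypothetical, and the one fact assumed from the NumPy source is that NaT is stored as the minimum int64 value (NPY_DATETIME_NAT).

```python
import math

# NumPy stores NaT as the minimum int64 value (NPY_DATETIME_NAT in the C source).
NAT = -2**63


def timedelta_to_float_loop(ticks):
    """Floating-point destination: each element is checked for NaT."""
    out = []
    for tick in ticks:
        out.append(math.nan if tick == NAT else float(tick))
    return out


def timedelta_to_int_loop(ticks):
    """Integer destination: plain copy with no NaT check, so no extra cost."""
    return [int(tick) for tick in ticks]
```

Keeping the two loops separate means the loop never has to inspect the destination type, which is the concern raised about reading `op` in the earlier revision.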

ahaldane · 2018-10-15T00:00:40Z

I was having a debate with myself in #8325 over whether we should even allow float <-> datetime casts at all.

But I guess after reading this issue, and the example of converting to float and dividing by 365 to get years, maybe it's useful.

Then it does seem like we should split it out into a separate loop as suggested above, and strictly check that the float value lies in the range MIN_INT64 to MAX_INT64, setting the result to NaT if it does not.
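A range-checked float-to-timedelta conversion of the kind described here might look like the following sketch. It is written in plain Python for clarity and is not taken from the PR; `float_to_timedelta_ticks` is a hypothetical name, and treating infinities like NaN is an assumption of this sketch.

```python
import math

INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
NAT = INT64_MIN  # NumPy's NaT sentinel is the minimum int64 value


def float_to_timedelta_ticks(value):
    """Return the int64 tick count for `value`, or NAT if it does not fit."""
    # NaN and infinities have no integer representation.
    if math.isnan(value) or math.isinf(value):
        return NAT
    # Python compares floats with ints exactly, so this check is strict.
    if value < INT64_MIN or value > INT64_MAX:
        return NAT
    return int(value)
```

One subtlety a C version would need to handle: INT64_MAX is not exactly representable as a double and rounds up to 2**63, so a naive comparison against `(double)INT64_MAX` would let the value 2**63 through.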

tylerjereddy · 2018-10-15T19:20:30Z

Revised based on feedback to:

use separate inner loops for floating vs. non-floating casts
try to improve patch coverage by also testing float->m8 typecasts
add a test that probes, e.g., the MAX_INT64 limits discussed above; it already passes unless I'm misunderstanding, perhaps because there's a fair bit of overflow guard code scattered throughout the datetime64 infrastructure (albeit with various attached TODOs)

One other thought--I think we're limited to m8 (timedelta64) in this PR--not sure if we want to expand to datetime64 (can't remember if I had a good reason not to include that; maybe just to focus scope or something).

tylerjereddy · 2018-10-15T19:25:21Z

numpy/core/tests/test_datetime.py

+    ])
+    def test_float_cast_limits(self, float_type, cast_type):
+        arr = np.array([[np.iinfo(np.int64).max + 1],
+                        [np.iinfo(np.int64).min - 1]], dtype=float_type)


This test feels a bit tangential to my changes in this PR -- maybe I need an example for where usage of those max integer limits currently causes undesirable behavior in master or in my feature branch.

tylerjereddy · 2018-10-15T20:06:24Z

Architecture specific test failure -- interesting! Looks like the max integer related test.

charris · 2022-06-09T15:53:40Z

@tylerjereddy Do you want to pursue this? I suppose it is reasonable to ask why one would convert a datetime type to float when the units will disappear. Needs a rebase in any case.

tylerjereddy · 2022-06-10T23:48:52Z

@charris this one may be less exciting but I don't mind trying to wrap it up once the current SciPy 1.9.0 release candidates start moving out.

tylerjereddy · 2022-12-10T21:08:11Z

Let me close this, priority may not be high, and I should perhaps focus on smaller things on the NumPy side for now.

tylerjereddy added 00 - Bug component: numpy.datetime64 (and timedelta64) labels Sep 5, 2018

tylerjereddy force-pushed the issue_8449 branch from 3ad7b75 to 91a1601 Compare September 6, 2018 18:16

tylerjereddy force-pushed the issue_8449 branch from 91a1601 to d1a727e Compare September 7, 2018 04:25

tylerjereddy force-pushed the issue_8449 branch from d1a727e to 3c8afb0 Compare September 7, 2018 16:29

tylerjereddy mentioned this pull request Oct 6, 2018

remainder is not implemented for timedelta64 #12092

Closed

tylerjereddy force-pushed the issue_8449 branch from 3c8afb0 to 4726163 Compare October 6, 2018 21:24

tylerjereddy force-pushed the issue_8449 branch from 4726163 to 778d254 Compare October 14, 2018 03:59

tylerjereddy mentioned this pull request Oct 14, 2018

np.nan incorrectly casted into datetime on powerpc, leading to failing tests of pandas #8325

Closed

tylerjereddy added 3 commits October 15, 2018 10:48

BUG: NaT now casts to NaN for floats.

e8734dc

MAINT: moved NaT->NaN cast to inner loops

e666d98

MAINT: address recent round of revisions

b8f6c75

tylerjereddy force-pushed the issue_8449 branch from 778d254 to b8f6c75 Compare October 15, 2018 19:11

tylerjereddy mentioned this pull request Jan 25, 2019

numpy.distutils.exec_command is not thread safe on windows #7607

Closed

Base automatically changed from master to main March 4, 2021 02:04

tylerjereddy closed this Dec 10, 2022
