ENH: linspace works with datetime64 and timedelta64. #14700

debsankha · 2019-10-15T09:21:54Z

This pull request is intended to partially fix the issue #10514.

The changes introduced in np.linspace does the following:

Calling linspace(start_datetime, stop_datetime, num, dtype=<dt>) with datetime64 and timedelta64 objects with a supplied dtype returns
without raising any exception.
However, if the dtype supplied has so low precision that num equispaced samples of that dtype cannot be drawn, then the array returned are not equispaced (just like the result of np.linspace(1, 10, 7, dtype=np.int64)) are not equispaced.

In [6]: d1 = np.datetime64(100, 'D')                                                                                                                          

In [7]: d2 = np.datetime64(200, 'D')                                                                                                                          

In [8]: np.linspace(d1, d2, 4, dtype='<M8[s]')                                                                                                                
Out[8]: 
array(['1970-04-11T00:00:00', '1970-05-14T08:00:00',
       '1970-06-16T16:00:00', '1970-07-20T00:00:00'],
      dtype='datetime64[s]')

In [9]: np.linspace(d1, d2, 4)                                                                                                                                
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-9-d64154514c2d> in <module>
----> 1 np.linspace(d1, d2, 4)

<__array_function__ internals> in linspace(*args, **kwargs)

~/src/numpy/build/testenv/lib/python3.5/site-packages/numpy/core/function_base.py in linspace(start, stop, num, endpoint, retstep, dtype, axis)
    137         # samples.
    138         assert dtype is not None, "linspace with {} objects cannot be called"\
--> 139             "with dtype=None".format(type(start))
    140         start = start.astype(dtype)
    141         stop = stop.astype(dtype)

AssertionError: linspace with datetime64[D]  objects cannot be called with dtype=None

eric-wieser · 2019-10-15T15:54:08Z

numpy/core/function_base.py

    if num > 1:
        step = delta / div
-        if _nx.any(step == 0):
+        if (not is_datelike) and _nx.any(step == 0):


I think this would be better as if np.issubdtype(step.dtype, np.floating), since that's the only case where denormal numbers apply

I tried that, and it fails test_equivalent_to_arange:

def test_equivalent_to_arange(self): for j in range(1000): assert_equal(linspace(0, j, j+1, dtype=int), > arange(j+1, dtype=int)) E AssertionError: E Arrays are not equal E E Mismatched elements: 1 / 23 (4.35%) E Max absolute difference: 1 E Max relative difference: 0.06666667 E x: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 14, 16, E 17, 18, 19, 20, 21, 22]) E y: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, E 17, 18, 19, 20, 21, 22])

In the last line of linspace, the y[15] happens to be 14.999999999999998, which becomes 14 when cast to int, hence failing the test.

Hmmm, interesting. Although I wonder if it is generally true that doing the division first is more precise. But it is slower.

This line of code is no longer present in the PR any more, as I refactored it to use ndarray.view

eric-wieser · 2019-10-15T15:54:44Z

numpy/core/function_base.py

+    # excepting datetime64 and timedelta64
+    _mult_inplace = _nx.isscalar(delta) and (not is_datelike)


Why is this necessary?

Because we cannot do y *= delta when y is int and delta is timedelta64.

eric-wieser · 2019-10-15T15:57:16Z

numpy/core/function_base.py

-    # Convert float/complex array scalars to float, gh-3504
-    # and make sure one can use variables that have an __array_interface__, gh-6634
-    start = asanyarray(start) * 1.0
-    stop  = asanyarray(stop)  * 1.0
+    else:
+        # Convert float/complex array scalars to float, gh-3504
+        start = start * 1.0
+        stop  = stop  * 1.0
+        dt = result_type(start, stop, float(num))
+        delta = stop - start
+        y = _nx.arange(0, num, dtype=dt).reshape((-1,) + (1,) * ndim(delta))

-    dt = result_type(start, stop, float(num))
-    if dtype is None:
-        dtype = dt
+        if dtype is None:
+            dtype = dt

-    delta = stop - start


If would be nice if we could do this without any special casing. Does the following work?

start = asanyarray(start) stop = asanyarray(stop) dt = result_type(start, stop, float(num)) if dtype is None: dtype = dt delta = stop.astype(dt) - start.astype(dt)

Unfortunately not. result_type(start, stop, float(num)) raises TypeError if start/stop are datetime64 since type promotion between datetime64 and float are not allowed.

How about

start = asanyarray(start) stop = asanyarray(stop) delta = stop - start dt = result_type(delta, float(num)) delta = delta.astype(dt) if dtype is None: dtype = dt

Edit: this does not handle overflow correctly for signed integers.

This would have worked if type promotion between timedelta64 and float resulted in float, but it doesn't:

>>> np.result_type(np.timedelta64(1, 'D'), float(1)) dtype('<m8[D]')

Is this type promotion the intended behaviour? I do not know numpy's internals well enough to judge this.

@seberg : Could you please comment on this if you have time?

How about ndarray.view? Since timedelta64 is just int64 with units attached (essentially), we can get the desired float by, e.g., np.asarray(np.timedelta64(1, 'D')).view(np.int64). And then dividing by the appropriate amount if we want to normalise to e.g. seconds.

I think the result_type not returning float is definitely right. However, one could argue that it should just fail in the generic case (and only work as promotion in certain ufuncs). Hameers suggestion to use the knowledge that it is int64 may work. Although have to be careful with NaT.

@hameerabbasi @seberg : Thank you for your comments. I have tried the ndarray.view approach and it works. Adopting this led to reducing the number of if/else blocks and improved readability.

seberg · 2019-11-27T17:27:57Z

numpy/core/function_base.py

+        # that it is at all possible to draw num equispaced
+        # samples.
+        assert dtype is not None, "linspace with {} objects cannot be called "\
+            "with dtype=None".format(start.dtype)


Asserts should not be used for checks that users can run into. This should be a ValueError probably. I could imagine writing it without negation may be a tiny bit nicer.

Thank you for this comment. I have changed this assert into a ValueError.

seberg · 2019-11-27T17:29:39Z

numpy/core/function_base.py

+        start = start.astype(dtype)
+        stop = stop.astype(dtype)
+        # y cannot be cast to datetime64/timedelta64, but left
+        # as int64, since later on we will do y * delta, and


(just for you, technically, on 32bit platforms, arange probably returns int32. It does not matter though.)

Debsankha, do you know right away why making y a datetime/timedelta here fails later?

@seberg : Sorry for the late reply. Making y datetime/timedelta doesn't work because of line 170/172, where we do y *= delta or y = y * delta. delta is stop-start, so timedelta anyway, and if y is also datetime/timedelta, then the multiplication is not allowed.

I have adapted the PR following @hameerabbasi 's ndarray.view idea. Now these lines are no longer present.

seberg · 2019-11-27T17:33:14Z

numpy/core/function_base.py

-    # Convert float/complex array scalars to float, gh-3504
-    # and make sure one can use variables that have an __array_interface__, gh-6634
-    start = asanyarray(start) * 1.0
-    stop  = asanyarray(stop)  * 1.0
+    else:
+        # Convert float/complex array scalars to float, gh-3504
+        start = start * 1.0
+        stop  = stop  * 1.0
+        dt = result_type(start, stop, float(num))
+        delta = stop - start
+        y = _nx.arange(0, num, dtype=dt).reshape((-1,) + (1,) * ndim(delta))

-    dt = result_type(start, stop, float(num))
-    if dtype is None:
-        dtype = dt
+        if dtype is None:
+            dtype = dt

-    delta = stop - start


I think the result_type not returning float is definitely right. However, one could argue that it should just fail in the generic case (and only work as promotion in certain ufuncs). Hameers suggestion to use the knowledge that it is int64 may work. Although have to be careful with NaT.

seberg · 2019-11-27T17:45:46Z

numpy/core/function_base.py

    if num > 1:
        step = delta / div
-        if _nx.any(step == 0):
+        if (not is_datelike) and _nx.any(step == 0):


Hmmm, interesting. Although I wonder if it is generally true that doing the division first is more precise. But it is slower.

mattip · 2019-12-08T13:37:40Z

@debsankha do you need help moving forward with this?

…ode review

…passed.

debsankha · 2019-12-11T10:12:32Z

@debsankha do you need help moving forward with this?

Sorry for responding late to the code reviews. But I have adapted my PR taking into account the comments now and replied in the commend threads.

seberg · 2019-12-11T15:21:35Z

numpy/core/function_base.py

+                    "dtype=None".format(start.dtype))
+        if np.isnat(start):
+            raise ValueError("start is NaT, linspace won't work.")
+        if np.isnat(stop):


Unfortunately the spelling is _nx in this file. Will review more later, but this causes tests to fail.

Thanks for the comment. I have fixed it and the testsuite now passes.

seberg · 2019-12-11T16:33:15Z

numpy/core/function_base.py

+        # samples.
+        if dtype is None:
+            raise ValueError("linspace with {} objects cannot be called with"
+                    "dtype=None".format(start.dtype))


Sorry, more coming later maybe, so no need to jump on anything. A bit of nitpicking (I can fix it up also if you prefer):

The upper string needs a space at the end. Maybe linspace() (python does it, but I do not think we currenty do this really, so no matter). I would slightly prefer to write something like requires the `dtype` argument to be set to the desired output dtype.

The ValueErrors below could also be phrased a bit nicer, but I I think they probably should just be removed or did I forget why the check was there? linspace with NaN returns an array full of NaN.

Thanks for the review :)

The upper string needs a space at the end. Maybe linspace() (python does it, but I do not think we currenty do this really, so no matter). I would slightly prefer to write something like requires the `dtype` argument to be set to the desired output dtype.

Done.

The ValueErrors below could also be phrased a bit nicer,

Done

but I I think they probably should just be removed or did I forget why the check was there? linspace with NaN returns an array full of NaN.

If we remove those checks, we get:

In [14]: np.linspace(start=np.array('2019-11-12', 'M8[s]'), stop=np.array('NaT', 'M8[s]') num=3, dtype='M8[s]') Out[14]: array([ '2019-11-12T00:00:00', '146138514308-05-25T07:32:16', 'NaT'], dtype='datetime64[s]')

The reason is, stop(=NaT) gets converted to -9223372036854775808 via .view(np.int64).

Another fact:

In [24]: np.linspace(start=np.nan, stop=100, num=3) Out[24]: array([ nan, nan, 100.])

What do you think should be done? 'NaT' check for start and stop or no checks?

Hmmm, maybe we should jump through the hoops and set everything to NaT? I am not sure, an error could work for now I guess.

I am OK with both. If we are returning an array of NaT, I think for sake of consistency we should return an array of nan for the non-datetime equivalent case as well.

Therefore I would (slightly) prefer to just raise an error for now.

Sorry, I need to take a step back here (and probably mainly remember all the other things that were tricky). Why can't step have the correct timedelta (not datetime of course) dtype? timedelta + timedelta works, timedelta * int works and datetime + timedelta works.

Sorry, this is probably just me forgetting an old discussion :/.

Yes, y would need special casing to ensure it is integer (in other words, it also needs to be unitless in any case). But it seems to me delta does not itself need to be unit-less. And if it was not, y = y * delta would have the unit, and the NaT should drop out correctly (or at least identically to NaN?

You are right. delta does not need to be unitless and the NaT's are automatically taken care of analogously to the NaNs.

I got this error while trying this out:

In [15]: np.linspace(np.array('NaT', 'M8[s]'), t2, 10,dtype='M8[D]', retstep=True) build/testenv/lib/python3.6/site-packages/numpy/core/function_base.py:158: DeprecationWarning: elementwise comparison failed; this will raise an error in the future. if _nx.any(step == 0): Out[15]: (array([ 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', '2019-11-12'], dtype='datetime64[D]'), numpy.timedelta64('NaT','s')) In [19]: np.array('NaT', 'M8[s]') == 0 runtests.py:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future. #!/usr/bin/env python Out[19]: False

Any ideas what to do?

The warning is a consequence of the 0 not having the correct dtype (datetime with unit s). It would go away if we do == np.array(0, step.dtype). And since the ufuncs do that internally anyway, I guess the overhead, while annoying, may not matter too much.

Ok, then. I have refactored the stuff. The testsuite passes apart from one mysterious failure in test_all_modules_are_expected

@seberg : I will fix the merge conflicts and rebase to master if you are happy with the PR otherwise.

… in code review.

…tian's comments

mattip · 2019-12-16T22:25:21Z

There are merge conflicts

seberg

Looks much cleaner to me now, thanks. @eric-wieser any opinions? I guess we may have to ping the mailing list again, its unfortunate that it requires special casing for datetime (due to it not being compatible with floating point).

We will need to add a few extra tests for sure. One for the NaT behaviour(s) (maybe even multiple dimensions). And I would like one test of what happens for start and stop having different time units, and different time units from the dtype=... argument.

seberg · 2019-12-16T22:57:01Z

numpy/core/function_base.py

+                    "to be set to the desired output dtype".format(start.dtype))
+        # dt will be used as the dtype argument of arange after this if/else block ends,
+        # to generate array y. Therefore it must be unitless, not timedelta or datetime.
+        dt = _nx.integer


Should use _nx.int64 since, probably defaults to something wrong on windows or 32bit linux.

Ok, thanks for the suggestion!

seberg · 2019-12-16T23:02:50Z

numpy/core/function_base.py


    if endpoint and num > 1:
        y[-1] = stop

    if axis != 0:
        y = _nx.moveaxis(y, 0, axis)

+    if is_datelike:
+        y = y.astype(dtype)


Is this line still necessary? I wonder if it makes more sense to move the astype up to cast start already (or at least put it into that branch). I suppose since the unit could be different, it may be wrong to cast start first, so I guess casting later is correct?

You are right, I had put in the line there because before we made step unit-aware, this was necessary. Now it's redundant, since the return statement has the cast anyway. I remove it now.

Ah man, I messed up here I think :(. delta / div has to be a floating point number, but if delta is timedelta, it is not. That should break precision somewhere along the line. Since it happens anyway, I suppose we could cast delta to float directly, only that does currently not convert NaT to NaN, which is a bug...

I understand what you mean, but is that really a problem? I have tried this test, that passes:

def test_datetime_precision(self): t1 = array('1969-06-06 00:00:00', dtype='M8[s]') dt = timedelta64(1, 's') N = 100 t2 = t1 + dt*N assert_((linspace(t1, t2, N+1, dtype='M8[s]') == t1+dt*arange(N+1)).all())

Of course if N is so high that N equally spaced samples cannot be drawn between start and stop with dtype precision, it won't work.

Ohh no, I was wrong. There is a problem:

In [20]: t1 = np.array('2019-11-12 08:00:00', 'M8[s]') In [21]: t2 = np.array('2019-11-12 08:00:01', 'M8[s]') In [23]: np.linspace(t1, t2, 5, dtype='M8[ms]') Out[23]: array(['2019-11-12T08:00:00.000', '2019-11-12T08:00:00.000', '2019-11-12T08:00:00.000', '2019-11-12T08:00:00.000', '2019-11-12T08:00:01.000'], dtype='datetime64[ms]')

But this can be taken care of by casting start and stop to dtype at the beginning.
Is there anything else I am missing? I will in any case add these tests to the testsuite.

Even then t2 - t1 will be numpy.timedelta64(1000,'ms') if you have num=33: (t2 - t1) / div gives numpy.timedelta64(29,'ms'), but (t2 - t1).astype(np.int64) / 34 (which gives a float value) is 29.41176470588235, multiply that by 30 and you have a catastrophic failure?

I see there's a problem, but I am sorry to say I cannot figure out what the expected behaviour of linspace in this case should be. Considering:

In [15]: np.linspace(1, 10, 7, dtype=np.int64) Out[15]: array([ 1, 2, 4, 5, 7, 8, 10])

This datetime case yields, btw:

In [8]: np.linspace(t1, t2, 33, dtype='M8[ms]') Out[8]: array(['2019-11-12T08:00:00.000', '2019-11-12T08:00:00.031', '2019-11-12T08:00:00.062', '2019-11-12T08:00:00.093', '2019-11-12T08:00:00.124', '2019-11-12T08:00:00.155', '2019-11-12T08:00:00.186', '2019-11-12T08:00:00.217', '2019-11-12T08:00:00.248', '2019-11-12T08:00:00.279', '2019-11-12T08:00:00.310', '2019-11-12T08:00:00.341', '2019-11-12T08:00:00.372', '2019-11-12T08:00:00.403', '2019-11-12T08:00:00.434', '2019-11-12T08:00:00.465', '2019-11-12T08:00:00.496', '2019-11-12T08:00:00.527', '2019-11-12T08:00:00.558', '2019-11-12T08:00:00.589', '2019-11-12T08:00:00.620', '2019-11-12T08:00:00.651', '2019-11-12T08:00:00.682', '2019-11-12T08:00:00.713', '2019-11-12T08:00:00.744', '2019-11-12T08:00:00.775', '2019-11-12T08:00:00.806', '2019-11-12T08:00:00.837', '2019-11-12T08:00:00.868', '2019-11-12T08:00:00.899', '2019-11-12T08:00:00.930', '2019-11-12T08:00:00.961', '2019-11-12T08:00:01.000'], dtype='datetime64[ms]') In [12]: np.diff(np.linspace(t1, t2, 33, dtype='M8[ms]')) Out[12]: array([31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 39], dtype='timedelta64[ms]')

I am sorry if I failed to get your point.

No, the point is exactly that due to the step of 31 here, the intermediate values will be off by more than 1ULP for most values. This can always happen to some degree, but as is, for integers it behaves much better:

In [1]: np.diff(np.linspace(0, 1000, 33, dtype='int64')) Out[1]: array([31, 31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32])

EDIT: And with no, I mean: yes of course that is exactly what I meant :)

I think your previous solution behaved better/correctly here :/. We are bound to bake in some annoying cases (which is not nice, since once it is in, it is difficult to fix), but weird NaT handling seems better than this to me...

Now I understand, thanks. So the options are:

NaT handling consistent with NaN, but outputs may in many cases be more off than necessary.

We must explicitly disallow either start or stop being NaT. Output as precise as dtype allows.

I find both these options annoying, but maybe not being able to call linspace with NaT's are not that big a disadvantage?

debsankha · 2019-12-17T07:35:35Z

We will need to add a few extra tests for sure. One for the NaT behaviour(s) (maybe even multiple dimensions). And I would like one test of what happens for start and stop having different time units, and different time units from the dtype=... argument.

These are all valid points. I will add the tests soon.

l-johnston · 2020-10-03T22:01:16Z

@rtatnet3 @seberg I have picked up this PR and submitted #17437 completing the work and with improvements:

Separated the datetime64/timedelta64 code path from the original improving readability
Eliminated the need to specify dtype in the call to linspace
Solved the step size issue discussed above
Implemented the tests @seberg requested

melissawm · 2022-01-31T18:51:08Z

@l-johnston this PR needs a rebase

@seberg can you take a look and see if this is close to merging? Thanks!

seberg · 2022-01-31T18:58:08Z

@melissawm I would prefer if someone familiar/heavy user of datetimes could check the past discussion and make a call/suggestion on any open questions (I think they still exists). I can see the weirder edge cases, but as a non-heavy user of this stuff, I find it difficult to make up my mind on them.

ENH: linspace works with datetime64 and timedelta64.

1830257

debsankha force-pushed the linspace-datetime branch from 5268629 to 1830257 Compare October 15, 2019 15:37

eric-wieser reviewed Oct 15, 2019

View reviewed changes

eric-wieser mentioned this pull request Nov 22, 2019

ENH: add datetime64 support to numpy.linspace. See #14918. #14957

Closed

mattip requested a review from seberg November 25, 2019 23:10

seberg reviewed Nov 27, 2019

View reviewed changes

Debsankha manik added 3 commits December 11, 2019 09:21

ENH: In linspace, replaced an assert by a ValueError in response to c…

857a7ae

…ode review

ENH: Refactored linspace taking into account code review in PR 14700

3bcd121

ENH: Refactored linspace, producing sensible error message if NaT is …

6491a3e

…passed.

seberg reviewed Dec 11, 2019

View reviewed changes

debsankha force-pushed the linspace-datetime branch from 45ce4ce to e0d483a Compare December 11, 2019 19:24

ENH: Improved error messages in linspace, as per sebastian's comments…

85be35b

… in code review.

debsankha force-pushed the linspace-datetime branch from e0d483a to 85be35b Compare December 11, 2019 19:26

ENH: Refactored linspace to no longer use unitless step, as per Sebas…

1d6a890

…tian's comments

debsankha force-pushed the linspace-datetime branch from 393e85c to 1d6a890 Compare December 16, 2019 22:21

seberg reviewed Dec 16, 2019

View reviewed changes

l-johnston mentioned this pull request Oct 3, 2020

ENH: add datetime64/timedelta64 support to linspace #17437

Closed

Base automatically changed from master to main March 4, 2021 02:04

github-actions bot added the 01 - Enhancement label Mar 4, 2021

seberg mentioned this pull request May 4, 2022

ENH: Support for datetimes in linspace #21332

Open

InessaPawson added the 52 - Inactive Pending author response label Apr 8, 2024

		# excepting datetime64 and timedelta64
		_mult_inplace = _nx.isscalar(delta) and (not is_datelike)

ENH: linspace works with datetime64 and timedelta64. #14700

Are you sure you want to change the base?

ENH: linspace works with datetime64 and timedelta64. #14700

Conversation

debsankha commented Oct 15, 2019 • edited

Choose a reason for hiding this comment

debsankha Oct 16, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser Oct 15, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattip commented Dec 8, 2019

debsankha commented Dec 11, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

debsankha Dec 16, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattip commented Dec 16, 2019

seberg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seberg Dec 18, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

debsankha commented Dec 17, 2019

l-johnston commented Oct 3, 2020

melissawm commented Jan 31, 2022

seberg commented Jan 31, 2022

debsankha commented Oct 15, 2019 •

edited

debsankha Oct 16, 2019 •

edited

eric-wieser Oct 15, 2019 •

edited

debsankha Dec 16, 2019 •

edited

seberg Dec 18, 2019 •

edited