ENH: add datetime64/timedelta64 support to linspace #17437

l-johnston · 2020-10-03T17:50:40Z

I picked up PR #14700 and completed the work with improvements:

Separated the datetime64/timedelta64 code path from the original improving readability
Eliminated the need to specify dtype in the call to linspace
Solved the step size issue discussed in ENH: linspace works with datetime64 and timedelta64. #14700
Implemented the tests @seberg requested

l-johnston · 2020-10-04T16:34:38Z

@seberg arange supports standard library datetime.

>>> from datetime import datetime, timedelta
>>> import numpy as np
>>> t1 = datetime(2020,1,1)
>>> t2 = datetime(2020,1,9)
>>> np.arange(t1, t2, timedelta(days=1))
array(['2020-01-01T00:00:00.000000', '2020-01-02T00:00:00.000000',
       '2020-01-03T00:00:00.000000', '2020-01-04T00:00:00.000000',
       '2020-01-05T00:00:00.000000', '2020-01-06T00:00:00.000000',
       '2020-01-07T00:00:00.000000', '2020-01-08T00:00:00.000000'],
      dtype='datetime64[us]')

Should I add this support to linspace?

numpy/core/function_base.py

l-johnston · 2020-10-14T13:58:55Z

@charris I added the release note. Is the note sufficient?

numpy/core/function_base.py

seberg

Looks good to me, I would feel slightly better if users of datetime64 could chime in (@jorisvandenbossche just in case you are interested and have time).

Do we worry about integer overflows which could create new NaTs or incorrect results? The integer code paths fall back to float64, so they will lose precision, but not give completely off results.
We do have the small thing that the timedelta64 step will lose precision compared to the internal float64. That is probably OK, I wonder if we should add it to Notes or so in the documentation?

@debsankha are you interested in having a look at this updated version of your old PR?

numpy/core/function_base.py

numpy/core/tests/test_function_base.py

numpy/core/function_base.py

numpy/core/tests/test_function_base.py

eric-wieser · 2020-10-15T11:35:03Z

I don't agree with that - linspace on floats behaves as:

In [6]: np.linspace(0, np.nan, 5)
Out[6]: array([nan, nan, nan, nan, nan])

so passing in NaT should just result in an array of NaTs

l-johnston · 2020-10-15T11:53:18Z

@eric-wieser When start is NaT, you'll get the expected result consistent with non-datetime types.

>>> np.linspace(np.datetime64("NaT"), np.datetime64("2020-01-01"), 3)
array([       'NaT',        'NaT', '2020-01-01'], dtype='datetime64[D]')
>>> np.linspace(np.nan, 3, 3)
array([nan, nan,  3.])

However, when stop is NaT, the returned array contains nonsense values.

>>> np.linspace(np.datetime64("2020-01-01"), np.datetime64("NaT"), 5)
array([              '2020-01-01',  '-6313183731939619-05-11',
       '-12626367463881258-09-19', '-18939551195822896-01-28',
                            'NaT'], dtype='datetime64[D]')

The internal computation of the sample deltas begins by casting delta, which is a timedelta64, to np.int64 and NaT casts to -9223372036854775808. There is no integer equivalent of NaN. This is the fundamental difference between the datetime code path and the non-datetime code path.

eric-wieser · 2020-10-15T11:55:57Z

It sounds like you have a bug to fix for when stop is nat then :)

numpy/core/function_base.py

Co-authored-by: Eric Wieser <wieser.eric@gmail.com>

l-johnston · 2020-10-15T19:00:42Z

@eric-wieser now the behavior is equivalent to non-datetime types:

>>> np.linspace(np.timedelta64(1, "s"), np.timedelta64("NaT"), 5)
array(['NaT', 'NaT', 'NaT', 'NaT', 'NaT'], dtype='timedelta64[s]')

l-johnston · 2020-11-13T16:41:18Z

@seberg is there anything else I need to do on this PR?

seberg · 2020-11-13T16:59:28Z

@l-johnston hmm, I guess I dropped the ball on this due to the retstep problem. We need to come to a decision for that. Whether that is an error, or just documenting that retstep has bigger precision issues with timedelta/datetime. Or maybe there is another solution?

l-johnston · 2020-11-13T18:47:34Z

@seberg In the case of integer output, the returned retstep is a float and has the same precision issue as date-like. For example:

>>> arr, step = np.linspace(0, 1000, 33, dtype="int64", retstep=True)
>>> step
31.25
>>> np.unique(np.diff(arr))
array([31, 32])

I propose documenting the precision issue.

l-johnston · 2022-02-01T20:27:44Z

@melissawm @seberg
I believe I have addressed the issues:
Closes #10514
Closes #14700

I picked up PR #14700 and completed the work with improvements:

Separated the datetime64/timedelta64 code path from the original improving readability
Eliminated the need to specify dtype in the call to linspace
Solved the step size issue discussed in ENH: linspace works with datetime64 and timedelta64. ENH: linspace works with datetime64 and timedelta64. #14700
Implemented the tests @seberg requested
Implemented the NaT behavior @eric-wieser requested
Showed that the retstep precision behavior is no different than for integer types
Rebased as requested and confirmed the feature still works

Expanding on the retstep issue. Pandas offers the convenience function pandas.timedelta_range() that gives equivalent precision behavior:

>>> import pandas as pd
>>> tdi = pd.timedelta_range("0 s", "1000 s", 33).astype("timedelta64[s]")
>>> np.unique(np.diff(tdi.values))
array([31., 32.])

The difference is pandas function internally coerces start and stop to timedelta64[ns], but the issue is there as well.

seberg · 2022-02-01T20:46:18Z

Ping @jbrockmendel, not sure this is up your alley, but if it is any input/opinion is appreciated.

About retstep, I agree that integers also return a float, and share the issue. But at least a float is still strictly numerical, a timedelta is not numerical anymore, so while we can maybe argue this is just fine, I do think there is a bigger surprise here.

l-johnston · 2022-02-01T21:08:03Z

@seberg Is this a matter of implementation or is date-like incompatible with np.linspace() such that it should never be included? It seems to me that np.arange() is used when exact step size matters and that 'np.linspace()` is used when an exact number of array elements matters.

jbrockmendel · 2022-02-01T21:13:27Z

I haven't worked much with linspace, but we have code similar to this in pandas analogous to arange in pandas.core.arrays._ranges. Most of the code there is handling various overflow cases.

akorman128 · 2022-05-04T18:22:44Z

Hello, has there been any more progress on this issue?

l-johnston · 2022-05-04T20:29:50Z

The PR has reached a stalemate regarding the retstep issue. My recommendation is to close the PR as rejected. If you're interested in the feature, you might consider proposing a new function to generate linearly spaced date-like points. Lee

…

On Wed, May 4, 2022 at 1:22 PM Alex Korman ***@***.***> wrote: Hello, has there been any more progress on this issue? — Reply to this email directly, view it on GitHub <#17437 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AEH5CHCF3LQTNTE5JVPBRZ3VIK577ANCNFSM4SDE4DTQ> . You are receiving this because you were mentioned.Message ID: ***@***.***>

l-johnston · 2022-05-18T17:41:35Z

@seberg

About retstep, I agree that integers also return a float, and share the issue. But at least a float is still strictly numerical, a timedelta is not numerical anymore, so while we can maybe argue this is just fine, I do think there is a bigger surprise here.

What is meant by 'bigger surprise'?

seberg · 2022-05-18T17:44:59Z

Well, integer to float does not drop the "unit" information, timedelta to float does, which seems a bit weird to me? If retstep was really the only issue left and we can't resolve it easy (e.g. with others voicing a clear opinion), then can't we defer it and really just refuse to support retstep?

l-johnston · 2022-05-18T18:23:46Z

In the current implementation, the returned retstep is a timedelta64 with unit set by the arguments to the function.

In [1]: import numpy as np

In [3]: np.linspace(np.timedelta64(1, "s"), np.timedelta64(3, "s"), 10, dtype="timedelta64[ms]", retstep=True)
Out[3]:
(array([1000, 1222, 1444, 1666, 1888, 2111, 2333, 2555, 2777, 3000],
       dtype='timedelta64[ms]'),
 numpy.timedelta64(222,'ms'))

In Pandas, they convert the inputs to nanoseconds.

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: pd.timedelta_range("1000 ms", "3000 ms", periods=10)
Out[3]:
TimedeltaIndex([          '0 days 00:00:01', '0 days 00:00:01.222222222',
                '0 days 00:00:01.444444444', '0 days 00:00:01.666666666',
                '0 days 00:00:01.888888888', '0 days 00:00:02.111111111',
                '0 days 00:00:02.333333333', '0 days 00:00:02.555555555',
                '0 days 00:00:02.777777777',           '0 days 00:00:03'],
               dtype='timedelta64[ns]', freq=None)
In [4]: np.diff(Out[3])
Out[6]:
array([222222222, 222222222, 222222222, 222222222, 222222223, 222222222,
       222222222, 222222222, 222222223], dtype='timedelta64[ns]')

l-johnston · 2022-05-18T18:41:27Z

@seberg

then can't we defer it and really just refuse to support retstep?

By defer, would it be acceptable to return retstep=None for date-like inputs?

seberg · 2022-05-19T00:30:47Z

I though raising NotImplemented if retstep=True, is returning None more useful?

l-johnston · 2022-05-19T12:23:03Z

@seberg I agree NotImplementedError would be better:

In [1]: import numpy as np
In [2]: np.linspace(np.timedelta64(1, "s"), np.timedelta64(3, "s"), 10, dtype="timedelta64[ms]", retstep=True)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 np.linspace(np.timedelta64(1, "s"), np.timedelta64(3, "s"), 10, dtype="timedelta64[ms]", retstep=True)
File <__array_function__ internals>:180, in linspace(*args, **kwargs)
File .../numpy/numpy/core/function_base.py:130, in linspace(start, stop, num, endpoint, retstep, dtype, axis)
    128 if start.dtype.kind in "mM":
    129     if retstep:
--> 130         raise NotImplementedError("'step` output not supported for date-like inputs")
    131     from numpy.ma import MaskedArray, filled
    132     if dtype is None:
NotImplementedError: 'step` output not supported for date-like inputs

Should we solicit mail list feedback?

InessaPawson · 2022-08-08T02:08:16Z

@seberg I agree NotImplementedError would be better:
Should we solicit mail list feedback?

@l-johnston Thank you for working on this PR! If you'd like to get feedback from the NumPy mailing list, please feel free to go ahead.

ENH: add datetime64/timedelta64 support to linspace

3da8454

github-actions bot added the 01 - Enhancement label Oct 3, 2020

l-johnston mentioned this pull request Oct 3, 2020

ENH: linspace works with datetime64 and timedelta64. #14700

Open

Honor end user's output dtype specification

cbe9c56

seberg self-requested a review October 4, 2020 16:29

charris added 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes component: numpy._core labels Oct 11, 2020

l-johnston added 2 commits October 11, 2020 08:14

Add release note

3fd0abd

Add release note

e6438a0

eric-wieser reviewed Oct 11, 2020

View reviewed changes

numpy/core/function_base.py Show resolved Hide resolved

l-johnston requested a review from eric-wieser October 11, 2020 16:34

eric-wieser reviewed Oct 14, 2020

View reviewed changes

numpy/core/function_base.py Outdated Show resolved Hide resolved

Address review comment regarding step==0

e718b31

seberg reviewed Oct 14, 2020

View reviewed changes

numpy/core/function_base.py Outdated Show resolved Hide resolved

numpy/core/tests/test_function_base.py Outdated Show resolved Hide resolved

numpy/core/function_base.py Show resolved Hide resolved

numpy/core/tests/test_function_base.py Show resolved Hide resolved

l-johnston added 2 commits October 14, 2020 18:21

Address review comments regarding astype copy and retstep

7bba61f

Address review comment regarding NaT

dad8253

eric-wieser reviewed Oct 15, 2020

View reviewed changes

numpy/core/function_base.py Outdated Show resolved Hide resolved

l-johnston and others added 2 commits October 15, 2020 07:02

Update numpy/core/function_base.py

00c6877

Co-authored-by: Eric Wieser <wieser.eric@gmail.com>

Add NaT support using MaskedArray strategy

6135c4b

l-johnston requested review from seberg and eric-wieser October 21, 2020 12:09

Base automatically changed from master to main March 4, 2021 02:05

l-johnston added 2 commits January 31, 2022 14:39

Merge branch 'numpy:main' into issue_10514

4ebed20

Fix linter issues

a12c0a2

This was referenced May 4, 2022

Linspace for datetime64 #21440

Closed

ENH: Support for datetimes in linspace #21332

Open

l-johnston added 3 commits May 21, 2022 07:47

Raise NotImplementedError for retstep True

aa482de

Fix line length

3bc4c97

Fix line length to less than 79 characters!

a72119f

l-johnston closed this by deleting the head repository Dec 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: add datetime64/timedelta64 support to linspace #17437

ENH: add datetime64/timedelta64 support to linspace #17437

l-johnston commented Oct 3, 2020 •

edited

Loading

l-johnston commented Oct 4, 2020

l-johnston commented Oct 14, 2020

seberg left a comment •

edited

Loading

eric-wieser commented Oct 15, 2020

l-johnston commented Oct 15, 2020

eric-wieser commented Oct 15, 2020

l-johnston commented Oct 15, 2020

l-johnston commented Nov 13, 2020

seberg commented Nov 13, 2020

l-johnston commented Nov 13, 2020

l-johnston commented Feb 1, 2022

seberg commented Feb 1, 2022

l-johnston commented Feb 1, 2022

jbrockmendel commented Feb 1, 2022

akorman128 commented May 4, 2022

l-johnston commented May 4, 2022 via email

l-johnston commented May 18, 2022

seberg commented May 18, 2022

l-johnston commented May 18, 2022

l-johnston commented May 18, 2022

seberg commented May 19, 2022

l-johnston commented May 19, 2022

InessaPawson commented Aug 8, 2022

ENH: add datetime64/timedelta64 support to linspace #17437

ENH: add datetime64/timedelta64 support to linspace #17437

Conversation

l-johnston commented Oct 3, 2020 • edited Loading

l-johnston commented Oct 4, 2020

l-johnston commented Oct 14, 2020

seberg left a comment • edited Loading

Choose a reason for hiding this comment

eric-wieser commented Oct 15, 2020

l-johnston commented Oct 15, 2020

eric-wieser commented Oct 15, 2020

l-johnston commented Oct 15, 2020

l-johnston commented Nov 13, 2020

seberg commented Nov 13, 2020

l-johnston commented Nov 13, 2020

l-johnston commented Feb 1, 2022

seberg commented Feb 1, 2022

l-johnston commented Feb 1, 2022

jbrockmendel commented Feb 1, 2022

akorman128 commented May 4, 2022

l-johnston commented May 4, 2022 via email

l-johnston commented May 18, 2022

seberg commented May 18, 2022

l-johnston commented May 18, 2022

l-johnston commented May 18, 2022

seberg commented May 19, 2022

l-johnston commented May 19, 2022

InessaPawson commented Aug 8, 2022

l-johnston commented Oct 3, 2020 •

edited

Loading

seberg left a comment •

edited

Loading