ENH: add datetime64/timedelta64 support to linspace #17437
@seberg
Should I add this support to linspace? |
@charris I added the release note. Is the note sufficient? |
Looks good to me, I would feel slightly better if users of datetime64 could chime in (@jorisvandenbossche just in case you are interested and have time).
Do we worry about integer overflows which could create new NaTs or incorrect results? The integer code paths fall back to float64, so they will lose precision, but not give completely off results.
We do have the small thing that the timedelta64 step will lose precision compared to the internal float64. That is probably OK, I wonder if we should add it to Notes or so in the documentation?
@debsankha are you interested in having a look at this updated version of your old PR?
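To illustrate the float64 fallback mentioned above, here is a minimal sketch of how a round trip through float64 drops low-order ticks of a nanosecond-resolution timestamp without producing a wildly wrong value. The timestamp is an arbitrary example chosen for illustration, not something taken from the PR.

```python
import numpy as np

# Arbitrary example timestamp with a 1 ns component (an assumption for illustration).
t = np.datetime64("2022-05-04T13:22:00.000000001", "ns")

# Nanoseconds since the epoch, on the order of 1.65e18.
ticks = t.astype(np.int64)

# Round trip through float64, as a float-based code path would do internally.
round_trip = np.int64(np.float64(ticks))

# float64 has a 53-bit significand, so not every integer above 2**53 is representable.
lost = int(ticks) - int(round_trip)
print(bool(ticks > 2**53), lost)
```

The result is off by less than one float64 spacing at that magnitude, which is the "lose precision, but not completely off" behavior described in the comment.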
I don't agree with that -
so passing in |
@eric-wieser When
However, when
The internal computation of the sample deltas begins by casting |
It sounds like you have a bug to fix for when |
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
@eric-wieser now the behavior is equivalent to non-datetime types:
|
@seberg is there anything else I need to do on this PR? |
@l-johnston hmm, I guess I dropped the ball on this due to the |
@seberg In the case of integer output, the returned
I propose documenting the precision issue. |
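A minimal sketch of the precision issue being discussed, assuming the step is computed in float64 on the underlying integer ticks and then cast back to the timedelta unit. It uses only released NumPy behavior on plain integers, since `np.linspace` support for date-like inputs is what this PR proposes; the variable names are illustrative.

```python
import numpy as np

start = np.timedelta64(1, "s").astype("timedelta64[ms]")
stop = np.timedelta64(3, "s").astype("timedelta64[ms]")

# Work on the underlying integer tick counts; np.linspace computes in float64.
lo = int(start.astype(np.int64))   # 1000
hi = int(stop.astype(np.int64))    # 3000
samples, float_step = np.linspace(lo, hi, 10, retstep=True)

# Casting the float step to timedelta64[ms] truncates the fractional part.
td_step = np.timedelta64(int(float_step), "ms")

print(float_step)   # roughly 222.22
print(td_step)      # 222 milliseconds
```

Nine truncated steps add up to 1998 ms rather than the full 2000 ms span, so a `timedelta64` step cannot reproduce the samples exactly.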
@melissawm @seberg I picked up PR #14700 and completed the work with improvements:
Expanding on the
>>> import pandas as pd
>>> tdi = pd.timedelta_range("0 s", "1000 s", 33).astype("timedelta64[s]")
>>> np.unique(np.diff(tdi.values))
array([31., 32.])
The difference is that the pandas function internally coerces |
Ping @jbrockmendel, not sure this is up your alley, but if it is, any input or opinion is appreciated. About |
@seberg Is this a matter of implementation or is |
I haven't worked much with linspace, but we have code similar to this in pandas analogous to arange in pandas.core.arrays._ranges. Most of the code there is handling various overflow cases. |
Hello, has there been any more progress on this issue? |
The PR has reached a stalemate regarding the retstep issue. My
recommendation is to close the PR as rejected. If you're interested in the
feature, you might consider proposing a new function to generate linearly
spaced date-like points.
Lee
|
What is meant by 'bigger surprise'? |
Well, integer to float does not drop the "unit" information, but timedelta to float does, which seems a bit weird to me. If |
In the current implementation, the returned
In [1]: import numpy as np
In [3]: np.linspace(np.timedelta64(1, "s"), np.timedelta64(3, "s"), 10, dtype="timedelta64[ms]", retstep=True)
Out[3]:
(array([1000, 1222, 1444, 1666, 1888, 2111, 2333, 2555, 2777, 3000],
dtype='timedelta64[ms]'),
numpy.timedelta64(222,'ms'))
In Pandas, they convert the inputs to nanoseconds.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: pd.timedelta_range("1000 ms", "3000 ms", periods=10)
Out[3]:
TimedeltaIndex([ '0 days 00:00:01', '0 days 00:00:01.222222222',
'0 days 00:00:01.444444444', '0 days 00:00:01.666666666',
'0 days 00:00:01.888888888', '0 days 00:00:02.111111111',
'0 days 00:00:02.333333333', '0 days 00:00:02.555555555',
'0 days 00:00:02.777777777', '0 days 00:00:03'],
dtype='timedelta64[ns]', freq=None)
In [4]: np.diff(Out[3])
Out[6]:
array([222222222, 222222222, 222222222, 222222222, 222222223, 222222222,
222222222, 222222222, 222222223], dtype='timedelta64[ns]') |
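The non-uniform differences in the nanosecond output can be reproduced with NumPy alone. This is a minimal sketch that assumes the points are computed in float64 on nanosecond tick counts and floored to whole ticks; it is not a description of how pandas is implemented.

```python
import numpy as np

# 1 s and 3 s expressed as nanosecond tick counts.
lo_ns, hi_ns = 1_000_000_000, 3_000_000_000

# Compute in float64, floor to whole ticks, then reinterpret as timedelta64[ns].
ticks = np.floor(np.linspace(lo_ns, hi_ns, 10)).astype(np.int64)
points = ticks.astype("timedelta64[ns]")

# The spacing between neighbors is not a single value once it is rounded to ticks.
steps = np.unique(np.diff(points))
print(steps)
```

Because 2 s does not divide evenly into nine nanosecond-aligned intervals, two distinct spacings appear, so no single `timedelta64` step describes the result exactly at any resolution.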
By defer, would it be acceptable to return |
I thought raising |
@seberg I agree NotImplementedError would be better:
In [1]: import numpy as np
In [2]: np.linspace(np.timedelta64(1, "s"), np.timedelta64(3, "s"), 10, dtype="timedelta64[ms]", retstep=True)
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 np.linspace(np.timedelta64(1, "s"), np.timedelta64(3, "s"), 10, dtype="timedelta64[ms]", retstep=True)
File <__array_function__ internals>:180, in linspace(*args, **kwargs)
File .../numpy/numpy/core/function_base.py:130, in linspace(start, stop, num, endpoint, retstep, dtype, axis)
128 if start.dtype.kind in "mM":
129 if retstep:
--> 130 raise NotImplementedError("'step` output not supported for date-like inputs")
131 from numpy.ma import MaskedArray, filled
132 if dtype is None:
NotImplementedError: 'step` output not supported for date-like inputs
Should we solicit mailing list feedback? |
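For readers who want to try the proposed behavior against a released NumPy, the guard in the traceback can be approximated with a standalone wrapper. This is a sketch under stated assumptions: `linspace_datelike` is a hypothetical name, the float64-then-floor conversion is one possible choice, and none of this is the PR's actual code.

```python
import numpy as np


def linspace_datelike(start, stop, num=50, retstep=False):
    """Hypothetical wrapper, not part of NumPy: linspace for date-like inputs."""
    start = np.asanyarray(start)
    stop = np.asanyarray(stop)

    # Non date-like inputs are passed straight through to np.linspace.
    if start.dtype.kind not in "mM":
        return np.linspace(start, stop, num, retstep=retstep)

    # Mirror the guard from the traceback: refuse to return a step.
    if retstep:
        raise NotImplementedError("step output not supported for date-like inputs")

    # Use the finer of the two units, compute on integer ticks in float64,
    # floor to whole ticks, and convert back to the date-like dtype.
    dtype = np.promote_types(start.dtype, stop.dtype)
    lo = start.astype(dtype).astype(np.int64)
    hi = stop.astype(dtype).astype(np.int64)
    ticks = np.floor(np.linspace(lo, hi, num)).astype(np.int64)
    return ticks.astype(dtype)


days = linspace_datelike(np.datetime64("2022-01-01"), np.datetime64("2022-01-05"), 5)
print(days)
```

Raising instead of returning a truncated step keeps callers from relying on a value that cannot reproduce the samples.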
@l-johnston Thank you for working on this PR! If you'd like to get feedback from the NumPy mailing list, please feel free to go ahead. |
Closes #10514
Closes #14700
I picked up PR #14700 and completed the work with improvements: