Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Timestamp and Timedelta .value changing in 2.0 #49076

Closed
Tracked by #46587
jbrockmendel opened this issue Oct 13, 2022 · 11 comments · Fixed by #50891
Closed
Tracked by #46587

API: Timestamp and Timedelta .value changing in 2.0 #49076

jbrockmendel opened this issue Oct 13, 2022 · 11 comments · Fixed by #50891
Labels
Non-Nano datetime64/timedelta64 with non-nanosecond resolution Timedelta Timedelta data type Timestamp pd.Timestamp and associated methods
Milestone

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Oct 13, 2022

import pandas as pd
import numpy as np

dt = np.datetime64("2016-01-01", "ms")
ts = pd.Timestamp(dt)

>>> ts.value
1451606400000        # <- 2.0/main
1451606400000000000  # <- 1.x

In previous versions .value has always been in nanoseconds. By changing the reso we get with some Timestamp/Timedelta inputs, we change the .value, which is technically public API.

One option would be just to document the change in .value behavior along with the other breaking changes docs.

Another would be to use e.g. ._value internally and keep .value as representing nanos.

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 13, 2022
@mroeschke
Copy link
Member

Probably least disruptive to have .value continue to represent nanos.

Personally, I was never fond of the vague meaning of .value and could use this as an opportunity to deprecate in favor of Timestamp.to_epoch #14772 and Timedelta.to_something

@mroeschke mroeschke added Timedelta Timedelta data type Non-Nano datetime64/timedelta64 with non-nanosecond resolution Timestamp pd.Timestamp and associated methods and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 14, 2022
@mroeschke mroeschke mentioned this issue Nov 7, 2022
5 tasks
@jorisvandenbossche jorisvandenbossche added this to the 2.0 milestone Jan 13, 2023
@jorisvandenbossche
Copy link
Member

The Arrow CI was failing because of this (in combination with the changes parsing of strings in Timestamp(..), #50704). Now, the failure was only because of using .value in a test to construct the expected result, and that is something easy to update.
However, I also looked into our actual conversion code, and it seems we do use .value to access the integer value, and we do assume that this is always nanoseconds at the moment (https://github.com/apache/arrow/blob/2b50694c10e09e4a1343b62c6b5f44ad4403d0e1/python/pyarrow/src/arrow/python/python_to_arrow.cc#L360-L365)

It's a bit of a corner case (and so apparently also not covered by a test, since we didn't get a failure for this), but can be triggered by converting a list of Timestamp objects (or object dtype array), and explicitly passing a nanoseconds timestamp type:

# using current pandas main branch
>>> pa.array([pd.Timestamp("2012-01-01 09:01:02")], type=pa.timestamp("ns"))
<pyarrow.lib.TimestampArray object at 0x7f46ce55ec80>
[
  1970-01-01 00:00:01.325408462
]

# triggering the pd.Timestamp to be nanosecond resolution -> correct result
>>> pa.array([pd.Timestamp("2012-01-01 09:01:02.000000000")], type=pa.timestamp("ns"))
<pyarrow.lib.TimestampArray object at 0x7f46ac585480>
[
  2012-01-01 09:01:02.000000000
]

This could be fixed on the pyarrow side by checking the unit of the Timestamp, and only using .value when it is actually nanoseconds, and otherwise falling back to interpreting it as datetime.datetime and getting the components that way.

But it would break current pyarrow releases, so if we want to change .value, ideally we would wait a bit longer with that to give time for pyarrow to update for this.

So to summarize, I agree with the above that the easiest would be to keep .value as returning nanoseconds, always, regardless the actual unit.

But long term it would be good to have some way to get the raw underlying integer (that is more efficient that building up this value from the components, as we do for datetime.datetime objects). But this could indeed also be a method, as suggested by @mroeschke

@MarcoGorelli
Copy link
Member

+1 on @mroeschke 's suggestion

I'll work on .value today

@jbrockmendel
Copy link
Member Author

Two bikeshed thoughts to what to use instead of .value internally: 1) using asi8 would allow for some code-sharing with DTA/DTI/TDA/TDI, 2) having something distinctive would make it easier to grep for places where we use it.

@MarcoGorelli
Copy link
Member

For now, in #50891, I just made it _value

I wasn't quite prepared for how many files it would involve changing

@MarcoGorelli
Copy link
Member

@jorisvandenbossche @jbrockmendel @mroeschke are we sure this is a good idea?

Because then the public-facing .value would be unavailable for old timestamps, e.g. Timestamp('300-01-01 00:00:00'), because it would overflow (and this is what motivated the introduction of non-nanosecond resolution)

@jbrockmendel
Copy link
Member Author

i guess we'd document that it is for nanos-only and direct users to use e.g. .asm8.view('i8') more generally

@MarcoGorelli
Copy link
Member

sure, I'll add that to the error message

Timedelta doesn't support non-nano reso yet, right? If so, perhaps I'll hold off timedelta changes for now then, that'll reduce the diff and make review a bit easier

@jbrockmendel
Copy link
Member Author

Timedelta supports non-nano, but doesn't do inference on strings

@MarcoGorelli
Copy link
Member

thanks - how do I set it? I'm seeing

In [6]: Timedelta(days=1, unit='s').unit
Out[6]: 'ns'

@MarcoGorelli
Copy link
Member

ah got it, nvm

In [2]: Timedelta(days=1, unit='s').as_unit('s').unit
Out[2]: 's'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Non-Nano datetime64/timedelta64 with non-nanosecond resolution Timedelta Timedelta data type Timestamp pd.Timestamp and associated methods
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants