New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: regression in Timedelta.__rfloordiv__ with integer data #19761

Closed
jorisvandenbossche opened this Issue Feb 19, 2018 · 12 comments

Comments

Projects
None yet
4 participants
@jorisvandenbossche
Member

jorisvandenbossche commented Feb 19, 2018

From a failing example in the docs (http://pandas-docs.github.io/pandas-docs-travis/timeseries.html#from-timestamps-to-epoch):

In [46]: stamps = pd.date_range('2012-10-08 18:15:05', periods=4, freq='D')

In [47]: stamps.view('int64') // pd.Timedelta(1, unit='s')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-47-9893b9bae6cf> in <module>()
----> 1 stamps.view('int64') // pd.Timedelta(1, unit='s')

/home/joris/scipy/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.Timedelta.__rfloordiv__()

TypeError: Invalid dtype int64 for __floordiv__

This was working correctly in 0.22, and is mentioned in the docs explicitly as a way to get unix epoch data.

The previously working example (http://pandas.pydata.org/pandas-docs/stable/timeseries.html#from-timestamps-to-epoch) looks like:

In [42]: stamps.view('int64') // pd.Timedelta(1, unit='s')
Out[42]: array([1349720105, 1349806505, 1349892905, 1349979305])

cc @jbrockmendel

@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Feb 19, 2018

I think this was introduced (on purpose, tests were added) in #18961.
I agree the above example that was included in the docs is a bit strange, as when dividing 'int / timedelta', the timedelta is implicitly cast to an integer as it nanoseconds. But, on the other hand, this was clearly in our docs as a way to get epoch data, so we cannot just break it I think

@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Mar 29, 2018

So we need to make a decision on this one. Options:

  1. Keep the behaviour on master (= keep limitation of behaviour of Timedelta.__rfloordiv__ ) and update the docs with an alternative solution. What would this alternative be?

  2. Undo the regression and re-enable the previous behaviour of Timedelta.__rfloordiv__

  3. Re-enable the previous behaviour of Timedelta.__rfloordiv__ but with a deprecation warning it will be removed in the future. Provide alternative solution in the warning and docs.

cc @jreback @TomAugspurger

Given that the method is not very intuitive, I would be in favor of option 3 if we have a good alternative.

@TomAugspurger

This comment has been minimized.

Contributor

TomAugspurger commented Mar 29, 2018

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Mar 29, 2018

For 1, the alternative solution would be to also use the integer value for the Timedelta

stamps.view('int64') // pd.Timedelta(1, unit='s').value

It's also a bit weird to have __floordiv__ work but not __div__ (as is the case in 0.22.0)

@jreback

This comment has been minimized.

Contributor

jreback commented Apr 15, 2018

so either use .value on the rhs operand (as @jbrockmendel suggest) or

In [12]: (stamps-pd.Timestamp(0)) // pd.Timedelta(1, 's')
Out[12]: Int64Index([1349720105, 1349806505, 1349892905, 1349979305], dtype='int64')

is a reasonable soln (This is more natural as you are dividing a TDI / Timedelta which is a bit more clear)

@jreback jreback modified the milestones: 0.23.0, 0.24.0 Apr 24, 2018

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.24.0, 0.23.0 Apr 24, 2018

@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Apr 24, 2018

Are we comfortable with advertising the use of .value for Timestamp/Timedeltas ?

In that case, the stamps.view('int64') // pd.Timedelta(1, unit='s').value seems maybe best. Although we also had discussion regarding deprecating view (of course we can also use .astype('int') (but that does an extra copy) or .asi8)

@jreback

This comment has been minimized.

Contributor

jreback commented Apr 24, 2018

you can use .view if u want it’s ok here

@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Apr 24, 2018

@jreback note that you actually proposed to deprecate view: #20251. If we still plan to do that, we should not use it here. If we think this is actually a good use case for view, we can also close that issue.

@jreback

This comment has been minimized.

Contributor

jreback commented Apr 24, 2018

i know

we have .asi8 which is slightly non standard
maybe just .astype(‘i8’) is canonical

@TomAugspurger

This comment has been minimized.

Contributor

TomAugspurger commented May 14, 2018

I have a fix for this using .value, but I think we should just implement #14772 quick (to_epoch). Will give that a shot now.

@TomAugspurger

This comment has been minimized.

Contributor

TomAugspurger commented May 14, 2018

We'll still need the warning for backwards compat, but we'll recommend to_epoch instead of doing a floordiv. I assume epoch conversion is the main use case for integer array / Timedelta.

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue May 14, 2018

Warn on ndarray[int] // timedelta
Closes pandas-dev#19761.

```python
In [2]: pd.DatetimeIndex(['1931', '1970', '2017']).view('i8') // pd.Timedelta(1, unit='s')
pandas-dev/bin/ipython:1: FutureWarning: Floor division between integer array and Timedelta is
deprecated. Use 'array // timedelta.value' instead.

Out[2]: array([-1230768000,           0,  1483228800])
```

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue May 14, 2018

Warn on ndarray[int] // timedelta
Closes pandas-dev#19761.

```python
In [2]: pd.DatetimeIndex(['1931', '1970', '2017']).view('i8') // pd.Timedelta(1, unit='s')
pandas-dev/bin/ipython:1: FutureWarning: Floor division between integer array and Timedelta is
deprecated. Use 'array // timedelta.value' instead.

Out[2]: array([-1230768000,           0,  1483228800])
```
@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented May 14, 2018

I was thinking, would actually allowing a `Timestamp(..) / Timedelta(..) = number" operation be another option?
Probably not, because it is not a reversible operation, and it also not necessarily clear that such an operation interprets the Timestamp as timedelta since 1970, but just throwing out here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment