BUG: boxing Timedeltas on .apply #11349

Closed
amelio-vazquez-reina opened this Issue Oct 16, 2015 · 2 comments

Contributor

amelio-vazquez-reina commented Oct 16, 2015

Consider the following Series:

object_id
0CKVYKjyFn    76 days
0CrPL2QKH3   -15 days
0CrVStlVrg    23 days
0Cc5ZvS67u    76 days
0CTOk5OdtI    76 days
0CTSWtTzBa    76 days
0CwBqVeNCX    76 days
0CIRJFIOcD    58 days
0CRQPCxzQe   350 days
0CAq4m9Nru    15 days
0C617yvXBj    76 days
0CzUUJNKX9   -16 days
Name: days_left, dtype: timedelta64[ns]

I am hoping to convert the above to hours.

If I do:

my_series.dt.hours

I get:

AttributeError: 'Series' object has no attribute 'hours'

What's even stranger is that if I do:

> my_series[0].total_seconds()/3600
1824.0

it works for one element, but if I do:

> my_series.apply(lambda x: x.total_seconds())

I get:

AttributeError: 'numpy.timedelta64' object has no attribute 'total_seconds'

I thought apply would run the function I pass it item by item in the series. Why does total_seconds() work for a single item, but not with apply?

Contributor

chris-b1 commented Oct 17, 2015

As outlined in the docs, the way to do conversions is via astype (which truncates to whole units) or by dividing by the appropriate delta (which doesn't):

In [8]: s.astype('m8[h]')
Out[8]: 
0     1824
1     -360
2      552
3     1824
4     1824
5     1824
6     1824
7     1392
8     8400
9      360
10    1824
11    -384
Name: 1, dtype: float64

In [9]: s / np.timedelta64(1, 'h')
Out[9]: 
0     1824
1     -360
2      552
3     1824
4     1824
5     1824
6     1824
7     1392
8     8400
9      360
10    1824
11    -384
Name: 1, dtype: float64

You're seeing that error with apply because a single element is boxed in a Timedelta when accessed (which has extra properties and methods such as total_seconds), but the underlying storage is a np.timedelta64 array, whose elements don't have them.
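A minimal sketch of the division approach, plus the equivalent route through the .dt accessor. The three values are taken from the Series in the report; the variable names are only illustrative, and this assumes a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

# Illustrative data: three of the values from the reported Series.
s = pd.Series(pd.to_timedelta([76, -15, 23], unit="D"), name="days_left")

# Dividing by a one-hour delta converts to hours and keeps any fraction.
hours_by_division = s / np.timedelta64(1, "h")

# The .dt accessor exposes total_seconds() for the whole Series at once.
hours_by_accessor = s.dt.total_seconds() / 3600

print(hours_by_division.tolist())
print(hours_by_accessor.tolist())
```

Both forms are vectorized, so neither depends on how .apply hands elements to the function.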

Contributor

jreback commented Oct 17, 2015

In [5]: s = Series(pd.timedelta_range('1 day 1 s',periods=5,freq='h'))

In [6]: s
Out[6]: 
0   1 days 00:00:01
1   1 days 01:00:01
2   1 days 02:00:01
3   1 days 03:00:01
4   1 days 04:00:01
dtype: timedelta64[ns]

In [7]: s.dt.components
Out[7]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0     1      0        0        1             0             0            0
1     1      1        0        1             0             0            0
2     1      2        0        1             0             0            0
3     1      3        0        1             0             0            0
4     1      4        0        1             0             0            0

In [8]: s.dt.
s.dt.components      s.dt.days            s.dt.freq            s.dt.microseconds    s.dt.nanoseconds     s.dt.seconds         s.dt.to_pytimedelta  s.dt.total_seconds   

@amelio-vazquez-reina the reason we don't support hours/minutes is for compatibility with datetime.timedelta and to make it slightly less confusing.

datetime.timedelta gives you days, seconds and microseconds, where seconds is the TOTAL number of seconds left over after the whole days, with hours and minutes folded in (which IMHO is actually confusing, but that is what the API is).

.components will give you the 'displayed' values (e.g. the components of the timedeltas), which you can then access.
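To make the difference concrete, here is a small sketch comparing the datetime.timedelta attributes with .dt.components. The example duration and the variable names are made up for illustration.

```python
import datetime

import pandas as pd

# A made-up duration: 1 day, 2 hours, 3 minutes, 4 seconds.
py_td = datetime.timedelta(days=1, hours=2, minutes=3, seconds=4)

# datetime.timedelta folds hours and minutes into .seconds.
folded_seconds = py_td.seconds  # 2*3600 + 3*60 + 4

s = pd.Series([pd.Timedelta(py_td)])

# .dt.seconds follows the datetime.timedelta convention ...
series_seconds = s.dt.seconds[0]

# ... while .dt.components splits the value into its displayed parts.
parts = s.dt.components
hours, minutes, seconds = (
    parts.loc[0, "hours"],
    parts.loc[0, "minutes"],
    parts.loc[0, "seconds"],
)

print(folded_seconds, series_seconds, hours, minutes, seconds)
```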

So s.apply(....) should actually box these into Timedelta objects (and not just leave them as np.timedelta64), as we do for .apply with a datetime64[ns]:

In [9]: s.apply(lambda x: type(x))
Out[9]: 
0    <type 'numpy.timedelta64'>
1    <type 'numpy.timedelta64'>
2    <type 'numpy.timedelta64'>
3    <type 'numpy.timedelta64'>
4    <type 'numpy.timedelta64'>
dtype: object

In [10]: Series(pd.date_range('20130101',periods=3)).apply(lambda x: type(x))
Out[10]: 
0    <class 'pandas.tslib.Timestamp'>
1    <class 'pandas.tslib.Timestamp'>
2    <class 'pandas.tslib.Timestamp'>
dtype: object

So this is a bug. The fix should be something like what happens in __iter__, where needs_i8_conversion and i8_boxer are called. I am going to repurpose this issue.

pull-requests welcome!
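A short check of the expected behaviour, assuming a pandas release in which .apply boxes timedelta64[ns] elements (this issue was milestoned for 0.18.0). It is only a sketch of what the boxing looks like from the user's side, not of the internal change.

```python
import pandas as pd

s = pd.Series(pd.timedelta_range("1 day 1 s", periods=3, freq="h"))

# Iterating over the Series yields boxed pandas Timedelta objects.
iter_types = {type(x).__name__ for x in s}

# With boxing in .apply, Timedelta methods are available per element.
seconds = s.apply(lambda x: x.total_seconds())

print(iter_types)
print(seconds.tolist())
```

On a version without the fix, the .apply call raises the AttributeError from the original report.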

jreback added this to the 0.17.1 milestone Oct 17, 2015

jreback changed the title from Operations with Series holding Timedeltas to BUG: boxing Timedeltas on .apply Oct 17, 2015

jreback modified the milestone: Next Major Release, 0.17.1 Nov 13, 2015

jreback modified the milestone: 0.18.0, Next Major Release Dec 30, 2015

jreback closed this in #11564 Dec 31, 2015
