BUG: output of a transform is cast to dtype of input #10972

Closed
TomAugspurger opened this Issue Sep 2, 2015 · 4 comments

3 participants

TomAugspurger commented Sep 2, 2015 (edited by jreback)

xref #11444, #13046 for additional tests

In [27]: df = pd.DataFrame({'a': np.random.randint(0, 5, 365), 'b': pd.date_range('2015-01-01', periods=365, freq='D')})

In [28]: df.head()
Out[28]:
   a          b
0  3 2015-01-01
1  3 2015-01-02
2  4 2015-01-03
3  2 2015-01-04
4  4 2015-01-05

In [29]: df.groupby('a').b.transform(lambda x: x.dt.dayofweek - x.dt.dayofweek.mean()).head()
Out[29]:
0   1970-01-01 00:00:00.000000000
1   1970-01-01 00:00:00.000000001
2   1970-01-01 00:00:00.000000001
3   1970-01-01 00:00:00.000000002
4   1969-12-31 23:59:59.999999997
Name: b, dtype: datetime64[ns]

I expected a float64 result. No idea how difficult this will be, so I marked it for 0.18. I won't have time to get to it any earlier, but if someone else wants to...
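Until `transform` infers its output dtype, one way around the cast is to convert the dates to numbers before grouping, so the series being transformed already has the dtype the result needs. A minimal sketch, using a small deterministic frame in place of the random one above (the frame contents are made up for illustration):

```python
import pandas as pd

# Small deterministic stand-in for the random frame in the report.
df = pd.DataFrame({
    'a': [0, 0, 1, 1, 1],
    'b': pd.date_range('2015-01-01', periods=5, freq='D'),
})

# Convert to day-of-week *before* grouping, and make it float so the
# series handed to transform already has the dtype the result needs.
dow = df['b'].dt.dayofweek.astype(float)

# Subtract each group's mean day-of-week; the result keeps df's index.
result = dow - dow.groupby(df['a']).transform('mean')
print(result.dtype)  # float64
```

Because `dow` is already float, casting the output back to the input dtype cannot turn it into datetimes.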

TomAugspurger added this to the 0.18.0 milestone Sep 2, 2015

jreback commented Sep 2, 2015

This is only a problem with transform; apply does this kind of inference:

In [6]: df.groupby('a').b.apply(lambda x: x.dt.dayofweek - x.dt.dayofweek.mean()).head()
Out[6]: 
0    0.214286
1    1.054795
2    1.837209
3    2.837209
4   -3.162791
dtype: float64
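A sketch of the `apply` behavior jreback describes, on a small deterministic frame. It assumes a reasonably recent pandas; `group_keys=False` is passed so the result keeps the original row index instead of gaining the group key as an extra index level, which some pandas versions add for `apply`.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [0, 0, 1, 1, 1],
    'b': pd.date_range('2015-01-01', periods=5, freq='D'),
})

def demean_dow(x):
    # Day-of-week minus the group's mean day-of-week: a float result
    # computed from a datetime64 group.
    dow = x.dt.dayofweek
    return dow - dow.mean()

# apply builds its result from what the function returns, so the
# dtype is inferred from those values rather than taken from 'b'.
applied = df.groupby('a', group_keys=False)['b'].apply(demean_dow)
print(applied.dtype)  # float64
```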

jreback modified the milestone: Next Major Release, 0.18.0 Sep 2, 2015

TomAugspurger commented Sep 2, 2015

Yeah, I've switched to apply for now. My actual case was transforming an integer to categorical (which raised an exception).

jreback commented Sep 2, 2015

It doesn't make sense to transform int -> cat; rather, just use .astype.

TomAugspurger commented Sep 2, 2015

Not that simple in my case. I have to group by a level and do some shift / diff logic to get my result.
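The thread doesn't show the actual data, so this is a hypothetical sketch of the pattern being discussed: do the group-wise `diff` on the numeric column first, then convert the finished result to `category` with `.astype` outside the groupby, so no transform has to change the dtype. The `store`/`day` levels, the `count` column and the up/down/flat labels are all invented for illustration.

```python
import pandas as pd

# Hypothetical data: a 'count' column indexed by (store, day).
idx = pd.MultiIndex.from_tuples(
    [('x', 1), ('x', 2), ('x', 3), ('y', 1), ('y', 2)],
    names=['store', 'day'],
)
df = pd.DataFrame({'count': [5, 7, 7, 2, 1]}, index=idx)

# Group-wise first difference within each store; the first row of
# each group has no predecessor, so it is NaN.
change = df.groupby(level='store')['count'].diff()

# Label each change, then convert to categorical in one step after
# the groupby, instead of asking transform to return a new dtype.
labels = pd.Series('first', index=df.index)
labels.loc[change > 0] = 'up'
labels.loc[change < 0] = 'down'
labels.loc[change == 0] = 'flat'
df['trend'] = labels.astype('category')
print(df)
```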

jreback modified the milestone: 0.18.1, Next Major Release Mar 12, 2016

jreback modified the milestone: 0.18.1, 0.18.2 Apr 26, 2016

jorisvandenbossche modified the milestone: 0.20.0, 0.19.0 Aug 21, 2016

jreback added a commit to jreback/pandas that referenced this issue Feb 27, 2017

jreback BUG: GH15429 transform result of timedelta from datetime
The transform() operation needs to return a like-indexed result. To
facilitate this, transform starts with a copy of the original series.
Then, after the computation for each group, it sets the appropriate
elements of the copied series equal to the result. At that point it
does a type comparison, and discovers that the timedelta is not
castable to a datetime.

closes #10972

Author: Jeff Reback <jeff@reback.net>
Author: Stephen Rauch <stephen.rauch+github@gmail.com>

Closes #15430 from stephenrauch/group-by-transform-timedelta-from-datetime and squashes the following commits:

c3b0dd0 [Jeff Reback] PEP fix
2f48549 [Jeff Reback] fixup slow transforms
cc43503 [Stephen Rauch] BUG: GH15429 transform result of timedelta from datetime
f5f244a
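The commit message explains why the output took the input's dtype: `transform` filled a copy of the original series group by group. A simplified sketch of that failure mode next to a concat-based version that lets the dtype be inferred from the group results. `naive_transform` and `inferring_transform` are hypothetical helpers, not pandas internals, and an integer input is used instead of a datetime so the cast (silent truncation) is easy to see.

```python
import pandas as pd

def naive_transform(s, keys, func):
    # Start from a copy of the input values, so the output is stuck
    # with the input's dtype (int64 here).
    out = s.to_numpy().copy()
    for idx in s.groupby(keys).indices.values():
        # Writing float results into an integer array truncates them.
        out[idx] = func(s.iloc[idx]).to_numpy()
    return pd.Series(out, index=s.index, name=s.name)

def inferring_transform(s, keys, func):
    # Collect each group's result, let concat infer a common dtype,
    # then restore the original row order.
    pieces = [func(group) for _, group in s.groupby(keys)]
    return pd.concat(pieces).reindex(s.index)

def demean(g):
    return g - g.mean()

s = pd.Series([1, 2, 10, 13], name='x')
keys = pd.Series(['a', 'a', 'b', 'b'])

print(naive_transform(s, keys, demean).tolist())      # [0, 0, -1, 1]
print(inferring_transform(s, keys, demean).tolist())  # [-0.5, 0.5, -1.5, 1.5]
```

For a datetime input the same pre-allocation step is what pushed float results back into `datetime64[ns]`, as in the original report.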

jreback added a commit to jreback/pandas that referenced this issue Feb 27, 2017
787feba

jreback closed this in 251826f Feb 27, 2017
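A minimal regression check for the original report, assuming a pandas release that includes this fix: `transform` should now return `float64`, matching `apply`. The data are a small deterministic stand-in for the random frame in the report.

```python
import pandas as pd
import pandas.testing as tm

df = pd.DataFrame({
    'a': [0, 0, 1, 1, 1],
    'b': pd.date_range('2015-01-01', periods=5, freq='D'),
})

def demean_dow(x):
    # Float-valued result computed from a datetime64 group.
    dow = x.dt.dayofweek
    return dow - dow.mean()

transformed = df.groupby('a')['b'].transform(demean_dow)
applied = df.groupby('a', group_keys=False)['b'].apply(demean_dow)

# With the fix, transform should infer float64 just as apply does,
# rather than casting back to the input's datetime64[ns].
assert transformed.dtype == 'float64'
tm.assert_series_equal(transformed, applied.sort_index(), check_names=False)
```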

AnkurDedania added a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
ca8c53c