Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: GH15429 transform result of timedelta from datetime #15430

Conversation

stephenrauch
Copy link
Contributor

@stephenrauch stephenrauch commented Feb 16, 2017

The transform() operation needs to return a like-indexed. To facilitate this, transform starts with a copy of the original series. Then, after the computation for each group, sets the appropriate elements of the copied series equal to the result. At that point is does a type comparison, and discovers that the timedelta is not cast-able to a datetime.

@codecov-io
Copy link

codecov-io commented Feb 17, 2017

Codecov Report

Merging #15430 into master will decrease coverage by -0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15430      +/-   ##
==========================================
- Coverage   90.36%   90.36%   -0.01%     
==========================================
  Files         136      136              
  Lines       49553    49555       +2     
==========================================
+ Hits        44780    44781       +1     
- Misses       4773     4774       +1
Impacted Files Coverage Δ
pandas/core/groupby.py 94.99% <100%> (ø)
pandas/util/testing.py 81.91% <0%> (-0.19%)
pandas/types/cast.py 85.65% <0%> (+0.23%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb7dc7d...c3b0dd0. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Feb 17, 2017

instead of this, use pandas.type.cast._find_common_type rather than np.common_type; that is some old code.

@jreback
Copy link
Contributor

jreback commented Feb 17, 2017

your bug report is the same as this: #10972

so please add these as tests cases.

@jreback jreback added Bug Groupby Dtype Conversions Unexpected or buggy dtype conversions labels Feb 17, 2017
@@ -579,6 +579,8 @@ Bug Fixes



- Bug in ``groupby.transform`` where calculating a ``timedelta`` from a ``datetime`` caused a ``ValueError`` (:issue:`15429`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add 10972 as well (you can leave the other issue as well)

@jreback
Copy link
Contributor

jreback commented Feb 23, 2017

can you update

@stephenrauch stephenrauch force-pushed the group-by-transform-timedelta-from-datetime branch from 4e790f6 to e0004e7 Compare February 27, 2017 08:06
@stephenrauch
Copy link
Contributor Author

I tried to use pandas.type.cast._find_common_type rather than np.common_type in the function where this work was done, but one of the other test cases broke. I looked at it, but didn't undertsand it well enough to know how to fix.

@stephenrauch stephenrauch force-pushed the group-by-transform-timedelta-from-datetime branch from e0004e7 to eb0f498 Compare February 27, 2017 08:13
@@ -624,7 +624,8 @@ Bug Fixes
- Bug in ``pd.concat()`` in which concatting with an empty dataframe with ``join='inner'`` was being improperly handled (:issue:`15328`)



- Bug in ``groupby.transform`` where calculating a ``timedelta`` from a ``datetime`` caused a ``ValueError`` (:issue:`15429`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just do a single note with the issue number. make this simpler, something like: .transform incorrectly changing the dtype of the output.

@@ -2906,8 +2906,15 @@ def transform(self, func, *args, **kwargs):
common_type = np.common_type(np.array(res), result)
if common_type != result.dtype:
result = result.astype(common_type)
except:
pass
except Exception as exc:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh? this is way complicated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this was a line I put into to see why the change to _find_common_type was causing problems.

@jorisvandenbossche
Copy link
Member

I looked at it, but didn't undertsand it well enough to know how to fix.

@stephenrauch Which test broke when you used _find_common_type? Can you try to describe the problem a bit more in detail?
(also, if you add new commits to a PR instead of amending it to the first, it is easier to see what you changed and to comment on those steps)

@jreback
Copy link
Contributor

jreback commented Feb 27, 2017

@stephenrauch I have a fix for this which avoids all of this complication. I will push soon.

@jreback
Copy link
Contributor

jreback commented Feb 27, 2017

@stephenrauch ok (aside from a flake issue) I think this should work. But if you'd run the perf suite for groupby to see that wasn't changed too much. IIRC that's the reason we did the inplace indexing; and I think that was before we did the fast_transform anyhow (which is now a different path), so this maybe be ok.

stephenrauch and others added 3 commits February 27, 2017 10:39
The transform() operation needs to return a like-indexed. To facilitate this, transform starts with a copy of the original series. Then, after the computation for each group, sets the appropriate elements of the copied series equal to the result. At that point is does a type comparison, and discovers that the timedelta is not cast-able to a datetime.
@jreback jreback force-pushed the group-by-transform-timedelta-from-datetime branch from 05d573e to c3b0dd0 Compare February 27, 2017 15:39
jreback added a commit to jreback/pandas that referenced this pull request Feb 27, 2017
The transform() operation needs to return a like-indexed. To
facilitate this, transform starts with a copy of the original series.
Then, after the computation for each group, sets the appropriate
elements of the copied series equal to the result. At that point is
does a type comparison, and discovers that the timedelta is not cast-
able to a datetime.

closes pandas-dev#10972

Author: Jeff Reback <jeff@reback.net>
Author: Stephen Rauch <stephen.rauch+github@gmail.com>

Closes pandas-dev#15430 from stephenrauch/group-by-transform-timedelta-from-datetime and squashes the following commits:

c3b0dd0 [Jeff Reback] PEP fix
2f48549 [Jeff Reback] fixup slow transforms
cc43503 [Stephen Rauch] BUG: GH15429 transform result of timedelta from datetime
@jreback jreback added this to the 0.20.0 milestone Feb 27, 2017
jreback added a commit to jreback/pandas that referenced this pull request Feb 27, 2017
The transform() operation needs to return a like-indexed. To
facilitate this, transform starts with a copy of the original series.
Then, after the computation for each group, sets the appropriate
elements of the copied series equal to the result. At that point is
does a type comparison, and discovers that the timedelta is not cast-
able to a datetime.

closes pandas-dev#10972

Author: Jeff Reback <jeff@reback.net>
Author: Stephen Rauch <stephen.rauch+github@gmail.com>

Closes pandas-dev#15430 from stephenrauch/group-by-transform-timedelta-from-datetime and squashes the following commits:

c3b0dd0 [Jeff Reback] PEP fix
2f48549 [Jeff Reback] fixup slow transforms
cc43503 [Stephen Rauch] BUG: GH15429 transform result of timedelta from datetime
@jreback jreback closed this in 251826f Feb 27, 2017
@jreback
Copy link
Contributor

jreback commented Feb 27, 2017

thanks @stephenrauch I merged this (I actually needed some more fixes for making sure .agg didn't do the same thing). And perf testing was about the same.

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
The transform() operation needs to return a like-indexed. To
facilitate this, transform starts with a copy of the original series.
Then, after the computation for each group, sets the appropriate
elements of the copied series equal to the result. At that point is
does a type comparison, and discovers that the timedelta is not cast-
able to a datetime.

closes pandas-dev#10972

Author: Jeff Reback <jeff@reback.net>
Author: Stephen Rauch <stephen.rauch+github@gmail.com>

Closes pandas-dev#15430 from stephenrauch/group-by-transform-timedelta-from-datetime and squashes the following commits:

c3b0dd0 [Jeff Reback] PEP fix
2f48549 [Jeff Reback] fixup slow transforms
cc43503 [Stephen Rauch] BUG: GH15429 transform result of timedelta from datetime
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: output of a transform is cast to dtype of input
4 participants