
groupby.apply datetime bug affecting 0.17 #11324

Closed
hadjmic opened this issue Oct 14, 2015 · 7 comments

@hadjmic

hadjmic commented Oct 14, 2015

An exception is raised when:
a) the original dataframe has a datetime column
b) the function passed to groupby.apply returns a Series object with a new datetime column

Code to reproduce:

import pandas as pd
import datetime

df = pd.DataFrame([['1', datetime.datetime.today()], 
                   ['2', datetime.datetime.today()],
                   ['2', datetime.datetime(2010, 1, 1)]],
                   columns=['record', 'date'])
dd = df.groupby('record').apply(lambda x: pd.Series({'max_date': x['date'].max()}))

This is a new issue affecting 0.17.
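For comparison, the same result can be built without groupby.apply at all; this is only a workaround sketch (it sidesteps the bug rather than fixing it):

```python
import datetime
import pandas as pd

df = pd.DataFrame([['1', datetime.datetime.today()],
                   ['2', datetime.datetime.today()],
                   ['2', datetime.datetime(2010, 1, 1)]],
                  columns=['record', 'date'])

# Aggregate the datetime column directly instead of going through apply,
# then name the resulting column explicitly.
dd = df.groupby('record')['date'].max().to_frame('max_date')
```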

@jreback
Contributor

jreback commented Oct 14, 2015

The canonical way of selecting the max of a column (and way, way more efficient):

In [59]: df.groupby('record').date.max()
Out[59]: 
record
1   2015-10-14 07:59:04.094327
2   2015-10-14 07:59:04.094343
Name: date, dtype: datetime64[ns]

@jreback
Contributor

jreback commented Oct 14, 2015

I guess this is a bug. You are doing a really odd thing here, though.

@jreback jreback added this to the Next Major Release milestone Oct 14, 2015
@hadjmic
Author

hadjmic commented Oct 14, 2015

Imagine it in the following context:
you have a massive Apache event log that you import into a pandas dataframe. The dataframe has as columns:
event_id, user_identifier, event_type, timestamp, other stuff

The objective is to create a new dataframe showing what the users have done. Thus, you need to group by user_identifier and somehow aggregate the events of each user. One of the things you need to find is the first and last timestamp at which the user interacted with the server.

Hope this clarifies things a bit.

Pandas is awesome by the way, you guys rule.

@TomAugspurger
Contributor

@hadjmic would

df.groupby(['user_identifier']).timestamp.agg(['min', 'max'])

work for you? You can also control the naming with .timestamp.agg({'max_date': 'max', 'min_date': 'min'}) (I might have the keys and values of that dictionary backward).
That will give the first (min) and last (max) timestamp per user. I realize you said that's just one of the things you need, though.
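A runnable sketch of this suggestion, using a made-up log (the column names are taken from the description above):

```python
import pandas as pd

# Hypothetical event log matching the columns described earlier in the thread.
log = pd.DataFrame({
    'user_identifier': ['u1', 'u1', 'u2'],
    'timestamp': pd.to_datetime(['2015-10-01 09:00',
                                 '2015-10-01 17:30',
                                 '2015-10-02 12:00']),
})

# First (min) and last (max) timestamp per user, in one pass.
summary = log.groupby('user_identifier').timestamp.agg(['min', 'max'])
```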

@jreback
Contributor

jreback commented Oct 14, 2015

Did my example in [59] not clarify? My point is that using apply like this is technically OK, but it is not the canonical way and is quite confusing.

@hadjmic
Author

hadjmic commented Oct 14, 2015

Perhaps it would have been clearer if I said I have a processUserEvents function. The function takes a dataframe of user events as input (i.e. each group of the groupby operation) and returns a Series with specific user characteristics. Among those are the min and max of the timestamp, but there is a lot of other stuff involved, such as values extracted from URL paths, query strings, flow paths, etc.
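A minimal sketch of what such a function might look like; the body here is invented for illustration, only the processUserEvents name and the per-group-Series pattern come from the comment above:

```python
import pandas as pd

def processUserEvents(events):
    # events is one user's slice of the log (one group of the groupby);
    # return a Series of per-user characteristics, including datetimes.
    return pd.Series({
        'first_seen': events['timestamp'].min(),
        'last_seen': events['timestamp'].max(),
        'n_events': len(events),
    })

# Hypothetical log with the columns described earlier in the thread.
log = pd.DataFrame({
    'user_identifier': ['u1', 'u1', 'u2'],
    'timestamp': pd.to_datetime(['2015-10-01', '2015-10-03', '2015-10-02']),
})

# One row per user; the Series index becomes the result's columns.
profile = log.groupby('user_identifier').apply(processUserEvents)
```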

@jreback jreback modified the milestones: 0.17.1, Next Major Release Oct 28, 2015
robdmc added a commit to robdmc/pandas that referenced this issue Nov 4, 2015 (…das-dev#11324): addressed PR comments; added comments and updated whatsnew
jreback pushed a commit that referenced this issue Nov 13, 2015: addressed PR comments; added comments and updated whatsnew
@jreback
Contributor

jreback commented Nov 13, 2015

closed by #11548
