GroupBy.apply type upcasting regression #3911

Closed
wesm opened this Issue Jun 15, 2013 · 9 comments

Owner

wesm commented Jun 15, 2013

Blocker for 0.11.1. This got broken sometime between 0.11.0 and master; I haven't had time to bisect yet. To reproduce, build a mixed-type DataFrame and apply a groupby function that extracts one row per group. This used to run type inference and convert object columns back to numeric in the resulting DataFrame. There was no test for it, so I don't blame anyone for breaking it by accident =)

In [151]: cafdata.dtypes
Out[151]:
id            int64
food         object
fgroup       object
nutrient     object
ngroup       object
units        object
value       float64
dtype: object

In [152]:
def max_value(group):
    return group.ix[group['value'].idxmax()]

max_value(cafdata)

Out[152]:
id                                      14366
food        Tea, instant, unsweetened, powder
fgroup                              Beverages
nutrient                             Caffeine
ngroup                                  Other
units                                      mg
value                                    3680
Name: 336702, dtype: object

In [153]: cafdata.groupby('fgroup').apply(max_value).dtypes
Out[153]:
id          object
food        object
fgroup      object
nutrient    object
ngroup      object
units       object
value       object
dtype: object
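
For readers without the cafdata set, the behaviour described above can be sketched on a small stand-in frame. Everything here is an assumption for illustration: the toy data, the use of .loc in place of the .ix indexer from the session above, and the use of infer_objects() as a present-day way to restore numeric dtypes. The per-group rows are stacked by hand rather than through apply, so this shows why the columns end up as object, not what any particular pandas version returns from apply.

```python
import pandas as pd

# Toy stand-in for cafdata (hypothetical values, same mix of dtypes).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "food": ["tea", "coffee", "cola", "cocoa"],
    "fgroup": ["Beverages", "Beverages", "Sweets", "Sweets"],
    "value": [3680.0, 40.0, 8.0, 230.0],
})

def max_value(group):
    # Same idea as the function in the issue, written with .loc.
    return group.loc[group["value"].idxmax()]

# A single row of a mixed-type frame has to be held in an object Series.
row = max_value(df)

# Stack one such row per group by hand; force object dtype to mimic
# the "no inference" outcome reported in the issue.
rows = [max_value(g) for _, g in df.groupby("fgroup")]
stacked = pd.DataFrame(rows).astype(object)

# Soft type inference turns the numeric columns back into numeric dtypes.
restored = stacked.infer_objects()
print(stacked.dtypes)
print(restored.dtypes)
```

The point of the sketch is only that the object dtype comes from the row extraction itself, so some inference step is needed afterwards to recover int64 and float64 columns.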
Owner

wesm commented Jun 15, 2013

Actually not sure this is a regression, here's with 0.11.0:

In [166]:

grouped = data.groupby(['nutrient', 'fgroup'])
results = grouped.apply(max_value)
results.dtypes
Out[166]:
id          object
food        object
fgroup      object
nutrient    object
ngroup      object
units       object
value       object
dtype: object

For some reason, making a bar plot of the value column fails in master. I will investigate when I can.

Member

cpcloud commented Jun 15, 2013

Is the failure a type error? Plotting object Series was deprecated by pydata#3572. You should do results.value.convert_objects().plot(kind='bar')

Owner

wesm commented Jun 15, 2013

Um, really? In this case the resulting plot is totally fine because all of the objects inside are floats. Why not attempt to convert_objects for dtype object and raise if it can't be converted to numeric?
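
The behaviour proposed here (try to convert object data, raise only if it cannot be made numeric) can be sketched as follows. This is not the code from the eventual fix: coerce_for_plot is a hypothetical helper name, and pd.to_numeric is used as a stand-in because convert_objects has since been removed from pandas.

```python
import pandas as pd

def coerce_for_plot(series):
    """Hypothetical helper: return numeric data for plotting or raise."""
    if series.dtype != object:
        # Already a concrete dtype; nothing to convert.
        return series
    try:
        # Attempt the conversion instead of rejecting object dtype outright.
        return pd.to_numeric(series)
    except (ValueError, TypeError) as exc:
        raise TypeError("object Series could not be converted to numeric") from exc

# An object Series that only holds floats converts cleanly.
values = pd.Series([3680.0, 40.0, 8.0], dtype=object)
converted = coerce_for_plot(values)
print(converted.dtype)

# An object Series holding real strings is rejected.
labels = pd.Series(["tea", "mg"], dtype=object)
try:
    coerce_for_plot(labels)
    rejected = False
except TypeError:
    rejected = True
```

With this shape, an object column whose contents are all floats plots as before, and only data that is not numeric at all produces an error.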

Owner

wesm commented Jun 15, 2013

I'm strongly 👎 on that change

Member

cpcloud commented Jun 15, 2013

OK, then I will change it now.

Member

cpcloud commented Jun 15, 2013

@wesm cpcloud/pandas@fe3f01f if you want to check while waiting for Travis...

Member

cpcloud commented Jun 15, 2013

@wesm Also, is that the only issue here? It's not clear to me from your comments above whether there's still a groupby issue.

Member

cpcloud commented Jun 15, 2013

Oh, I see. It still looks like casting back to the original types is an issue.

jreback closed this in #3913 Jun 15, 2013

Contributor

jreback commented Jun 15, 2013

Seldom-used case, but tested and fixed now.
