GroupBy.apply type upcasting regression #3911

Closed
wesm opened this issue Jun 15, 2013 · 9 comments · Fixed by #3913

@wesm
Member

wesm commented Jun 15, 2013

Blocker for 0.11.1. This got broken sometime between 0.11.0 and master; I haven't had time to bisect yet. Essentially, write a mixed type DataFrame and a groupby function that extracts a row. This used to do type inference and convert object back to numeric in the columns of the resulting DataFrame. Didn't have a test so I don't blame anyone for breaking it by accident =)

In [151]: cafdata.dtypes
Out[151]:
id            int64
food         object
fgroup       object
nutrient     object
ngroup       object
units        object
value       float64
dtype: object

In [152]:
def max_value(group):
    return group.ix[group['value'].idxmax()]

max_value(cafdata)

Out[152]:
id                                      14366
food        Tea, instant, unsweetened, powder
fgroup                              Beverages
nutrient                             Caffeine
ngroup                                  Other
units                                      mg
value                                    3680
Name: 336702, dtype: object

In [153]: cafdata.groupby('fgroup').apply(max_value).dtypes
Out[153]:
id          object
food        object
fgroup      object
nutrient    object
ngroup      object
units       object
value       object
dtype: object
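
The upcasting shown above can be reproduced on a small frame. The following is only a sketch: the frame `data` is a made-up stand-in for `cafdata`, and it uses `.loc` and `infer_objects()` from later pandas releases instead of the `.ix` indexer in the report. It illustrates why a row pulled from a mixed-type frame is `object`, and how rows stacked by hand can be converted back. It makes no claim about what `groupby.apply` itself returns in any particular version.

```python
import pandas as pd

# Hypothetical stand-in for cafdata: mixed numeric and string columns.
data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "fgroup": ["Beverages", "Beverages", "Sweets", "Sweets"],
    "value": [3680.0, 40.0, 12.5, 80.0],
})

def max_value(group):
    # Row with the largest 'value'. A single row spanning int, str and
    # float columns has to be stored as an object-dtype Series.
    return group.loc[group["value"].idxmax()]

row = max_value(data)
print(row.dtype)  # object

# Stack one such row per group by hand; every column ends up object.
rows = [max_value(g) for _, g in data.groupby("fgroup")]
stacked = pd.concat(rows, axis=1).T
print(stacked.dtypes)

# infer_objects() re-infers a more specific dtype for each object column.
restored = stacked.infer_objects()
print(restored.dtypes)
```

Here `restored` has numeric dtypes again for `id` and `value`, which is the type inference the report expects from `apply`.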
@wesm
Member Author

wesm commented Jun 15, 2013

Actually not sure this is a regression, here's with 0.11.0:

In [166]:

grouped = data.groupby(['nutrient', 'fgroup'])
results = grouped.apply(max_value)
results.dtypes
Out[166]:
id          object
food        object
fgroup      object
nutrient    object
ngroup      object
units       object
value       object
dtype: object

for some reason making a bar plot of the value column fails in master. i will investigate when i can

@cpcloud
Member

cpcloud commented Jun 15, 2013

is the failure a type error? plotting object Series was deprecated by #3572. should do results.value.convert_objects().plot(kind='bar')

@wesm
Member Author

wesm commented Jun 15, 2013

Um, really? In this case the resulting plot is totally fine because all of the objects inside are floats. Why not attempt to convert_objects for dtype object and raise if it can't be converted to numeric?
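
The convert-or-raise behavior proposed here could look roughly like the sketch below. It is not the code from the eventual fix: `to_numeric_or_raise` is a hypothetical helper, and it relies on `pd.to_numeric` from later pandas releases rather than `convert_objects`.

```python
import pandas as pd

def to_numeric_or_raise(series):
    # Hypothetical helper: pass numeric Series through unchanged, try to
    # convert object Series, and let the conversion error propagate.
    if series.dtype == object:
        return pd.to_numeric(series)  # errors="raise" is the default
    return series

# An object Series whose elements are all floats converts cleanly.
values = pd.Series([3680.0, 40.0, 12.5], dtype=object)
converted = to_numeric_or_raise(values)
print(converted.dtype)  # float64

# A Series holding a non-numeric string cannot be converted.
try:
    to_numeric_or_raise(pd.Series(["mg", 1.0], dtype=object))
except (ValueError, TypeError) as exc:
    print("cannot convert:", exc)
```

A plotting routine built this way would accept the `value` column from the example above, since every element is a float, and would still reject columns such as `units`.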

@wesm
Member Author

wesm commented Jun 15, 2013

I'm strongly 👎 on that change

@cpcloud
Member

cpcloud commented Jun 15, 2013

ok then will change it now

@cpcloud
Member

cpcloud commented Jun 15, 2013

@wesm cpcloud/pandas@fe3f01f if u want to check while waiting for travis...

@cpcloud
Member

cpcloud commented Jun 15, 2013

@wesm also is that the only issue here? not clear to me if there's still a groupby issue from ur comments above

@cpcloud
Member

cpcloud commented Jun 15, 2013

oh i c. still looks like casting back to original types is an issue

@jreback
Contributor

jreback commented Jun 15, 2013

seldom-used case, but tested and fixed now
