
Groupby aggregations could ignore non-numeric columns when axis=1 #3688

Closed
hayd opened this issue May 23, 2013 · 6 comments
Labels: Bug, Groupby, Nuisance Columns (Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply), Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

@hayd
Contributor

hayd commented May 23, 2013

Perhaps the following groupby aggregation should operate only on the numeric columns, as the same reduction does when called on the DataFrame directly:

In [1]: df = pd.DataFrame({'bar': {0: 1, 1: 1, 2: 1}, 'foo': {0: 0, 1: 1, 2: 2}, 'foo1': {0: 1, 1: 2, 2: 3}, 'hello': {0: 'a', 1: 'a', 2: 'a'}}, columns=['bar', 'foo', 'foo1', 'hello']); df.columns = ['bar', 'foo', 'foo', 'hello']

In [2]: df
Out[2]:
   bar  foo  foo hello
0    1    0    1     a
1    1    1    2     a
2    1    2    3     a

In [3]: df.mean()  # hello is ignored
Out[3]:
bar    1
foo    1
foo    2
dtype: float64

In [4]: df.groupby(level=0, axis=1).mean()
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-4-7c2612a8fbda> in <module>()
----> 1 df.groupby(level=0, axis=1).mean()

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/groupby.pyc in mean(self)
    351         """
    352         try:
--> 353             return self._cython_agg_general('mean')
    354         except GroupByError:
    355             raise

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
   1569
   1570     def _cython_agg_general(self, how, numeric_only=True):
-> 1571         new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
   1572         return self._wrap_agged_blocks(new_blocks)
   1573

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_blocks(self, how, numeric_only)
   1616
   1617         if len(new_blocks) == 0:
-> 1618             raise DataError('No numeric types to aggregate')
   1619
   1620         return new_blocks

DataError: No numeric types to aggregate

From this SO question, where I gave a very hacky workaround.

cc #3683 @jreback, was this the question you were talking about? This one is related, but in the sense of running into problems with non-unique columns. I thought I should mention it here anyway.
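For readers who land here needing a workaround, a minimal sketch (not the SO answer referenced above) is to select the numeric columns first and then group the column labels as rows by transposing. This assumes a pandas version with `DataFrame.select_dtypes`; recent pandas releases also deprecate `axis=1` in `groupby` in favour of transposing, which this sketch avoids.

```python
import pandas as pd

# Rebuild the example frame from the issue: two columns share the label "foo".
df = pd.DataFrame(
    {"bar": [1, 1, 1], "foo": [0, 1, 2], "foo1": [1, 2, 3], "hello": ["a", "a", "a"]}
)
df.columns = ["bar", "foo", "foo", "hello"]

# Drop the non-numeric "hello" column up front instead of relying on
# groupby to skip it.
numeric = df.select_dtypes(include="number")

# Transpose so the column labels become the row index, group equal labels
# with level=0, average them, then transpose back.
result = numeric.T.groupby(level=0).mean().T

print(result)
```

The result has a `bar` column of 1.0 and a single `foo` column holding the row-wise means of the two original `foo` columns (0.5, 1.5, 2.5).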

@jreback
Contributor

jreback commented May 23, 2013

This is the question, but it still breaks (even on my new branch). Will look at this soon.

@jreback
Contributor

jreback commented May 23, 2013

linking to #3679

jreback modified the milestones: 0.15.0, 0.14.0 Mar 11, 2014
@keir

keir commented Apr 29, 2014

Any update on this @jreback? Did your branch ever go anywhere?

@jreback
Contributor

jreback commented Apr 29, 2014

You are welcome to do a PR if you would like.

@WillAyd
Member

WillAyd commented Jul 6, 2018

Still a problem, though note that this only fails because of axis=1.
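To illustrate the axis=0 case that does work, here is a small sketch using a hypothetical frame (`df2`, `key`, `val`, `label` are made up for the example). It assumes a pandas version where `GroupBy.mean` accepts `numeric_only`; passing it explicitly keeps the behaviour the same whether or not that version drops nuisance columns by default.

```python
import pandas as pd

# Hypothetical frame: "key" defines row groups, "label" is a non-numeric
# nuisance column.
df2 = pd.DataFrame(
    {"key": ["x", "x", "y"], "val": [1, 2, 3], "label": ["a", "b", "c"]}
)

# Grouping rows (the default axis=0) and requesting numeric columns only
# drops "label" instead of raising.
result = df2.groupby("key").mean(numeric_only=True)

print(result)
```

Here `result` keeps only `val`, with 1.5 for group `x` and 3.0 for group `y`.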

WillAyd changed the title "Groupby aggregations could ignore non-numeric columns" to "Groupby aggregations could ignore non-numeric columns when axis=1" Jul 6, 2018
datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018
jbrockmendel added the Numeric Operations (Arithmetic, Comparison, and Logical operations) label Sep 20, 2020
mroeschke added the Bug label and removed the API Design label Apr 11, 2021
jbrockmendel added the Nuisance Columns (Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply) label Oct 29, 2021
@NumberPiOso
Contributor

Nowadays, running the example emits a FutureWarning for both operations.

In [3]: df.mean()
FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError.  Select only valid columns before calling the reduction.


In [4]:  df.groupby(level=0, axis=1).mean() 
FutureWarning: Dropping invalid columns in DataFrameGroupBy.mean is deprecated. In a future version, a TypeError will be raised. Before calling .mean, select only columns which should be valid for the function.
  df.groupby(level=0, axis=1).mean()
Out[4]: 
Empty DataFrame
Columns: [bar, foo, hello]
Index: []

This change was introduced by PR #41480. So, nowadays, groupby aggregations should NOT silently ignore non-numeric columns. Closing this issue.
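Following the advice in the warnings above ("Select only valid columns before calling the reduction"), a minimal sketch of the explicit alternatives for the plain DataFrame reduction, assuming a pandas version where `DataFrame.mean` accepts `numeric_only=True`:

```python
import pandas as pd

# Same example frame as in the issue, with a duplicated "foo" label.
df = pd.DataFrame(
    {"bar": [1, 1, 1], "foo": [0, 1, 2], "foo1": [1, 2, 3], "hello": ["a", "a", "a"]}
)
df.columns = ["bar", "foo", "foo", "hello"]

# Option 1: ask the reduction itself to skip non-numeric columns.
means = df.mean(numeric_only=True)

# Option 2: select the valid columns first, then reduce.
means_selected = df.select_dtypes(include="number").mean()

print(means)
```

Both produce one mean per numeric column (`bar` 1.0, first `foo` 1.0, second `foo` 2.0), with `hello` excluded by request rather than dropped implicitly.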


8 participants