Error when key-only Grouper is passed to groupby in a list #14334

jonmmease · 2016-10-03T00:13:34Z

Overview

A Grouper object configured with only a key specification may be passed to groupby to group a DataFrame by a particular column. For example:

In [19]: import pandas as pd

In [20]: df = pd.DataFrame({'A': [0, 0, 0, 1, 1, 1], 
    ...:                    'B': [1, 1, 2, 2, 3, 3],
    ...:                    'C': [1, 2, 3, 4, 5, 6]})

In [26]: df.groupby(pd.Grouper(key='A')).count()
Out[26]: 
   B  C
A      
0  3  3
1  3  3

However, when this key-only Grouper is passed to groupby inside a list, a TypeError is raised. The following example uses a single-element list containing the same Grouper as in the previous example.

In [27]: df.groupby([pd.Grouper(key='A')]).count()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-f4f86763ebfc> in <module>()
----> 1 df.groupby([pd.Grouper(key='A')]).count()

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)
   3776         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   3777                        sort=sort, group_keys=group_keys, squeeze=squeeze,
-> 3778                        **kwargs)
   3779 
   3780     def asfreq(self, freq, method=None, how=None, normalize=False):

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in groupby(obj, by, **kwds)
   1425         raise TypeError('invalid type: %s' % type(obj))
   1426 
-> 1427     return klass(obj, by, **kwds)
   1428 
   1429 

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)
    352                                                     level=level,
    353                                                     sort=sort,
--> 354                                                     mutated=self.mutated)
    355 
    356         self.obj = obj

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in _get_grouper(obj, key, axis, level, sort, mutated)
   2400                         sort=sort,
   2401                         in_axis=in_axis) \
-> 2402             if not isinstance(gpr, Grouping) else gpr
   2403 
   2404         groupings.append(ping)

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in __init__(self, index, grouper, obj, name, level, sort, in_axis)
   2197 
   2198                 # get the new grouper
-> 2199                 grouper = self.grouper._get_binner_for_grouping(self.obj)
   2200                 self.obj = self.grouper.obj
   2201                 self.grouper = grouper

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in _get_binner_for_grouping(self, obj)
    290         group_axis = obj._get_axis(self.axis)
    291         return Grouping(group_axis, None, obj=obj, name=self.key,
--> 292                         level=self.level, sort=self.sort, in_axis=False)
    293 
    294     @property

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in __init__(self, index, grouper, obj, name, level, sort, in_axis)
   2213                     t = self.name or str(type(self.grouper))
   2214                     raise ValueError("Grouper for '%s' not 1-dimensional" % t)
-> 2215                 self.grouper = self.index.map(self.grouper)
   2216                 if not (hasattr(self.grouper, "__len__") and
   2217                         len(self.grouper) == len(self.index)):

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/indexes/base.py in map(self, mapper)
   2238         applied : array
   2239         """
-> 2240         return self._arrmap(self.values, mapper)
   2241 
   2242     def isin(self, values, level=None):

pandas/src/generated.pyx in pandas.algos.arrmap_int64 (pandas/algos.c:94003)()

TypeError: 'NoneType' object is not callable

The same error occurs for lists containing multiple key-only Groupers, e.g. [pd.Grouper(key='A'), pd.Grouper(key='B')].
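A minimal sketch of the two-Grouper case, compared against grouping by the plain column labels. The names `expected` and `result` are only illustrative. On the pandas 0.18.1 install reported here the Grouper list should hit the `except` branch; on a version where this is fixed, both groupings should produce the same counts.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 1, 1, 1],
                   'B': [1, 1, 2, 2, 3, 3],
                   'C': [1, 2, 3, 4, 5, 6]})

# Reference result: group by plain column labels, no Grouper involved.
expected = df.groupby(['A', 'B']).count()

try:
    # Same grouping expressed as a list of key-only Groupers.
    # This is the call that raises TypeError on the affected version.
    result = df.groupby([pd.Grouper(key='A'), pd.Grouper(key='B')]).count()
except TypeError as exc:
    result = None
    print('TypeError:', exc)
else:
    # Compare the per-group counts of the remaining column.
    print(result['C'].tolist() == expected['C'].tolist())
```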

Expected Output

In [27]: df.groupby([pd.Grouper(key='A')]).count()
Out[27]: 
   B  C
A      
0  3  3
1  3  3

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: 1.4.1
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None


chris-b1 · 2016-10-03T13:40:20Z

This probably should work, thanks for the report! But I'm curious, why are you using a Grouper in this case instead of df.groupby(['A']), df.groupby(['A','B']), etc.?

jonmmease · 2016-10-03T15:04:08Z

Yeah, good question.

I'm working on implementing #5677, which will allow the A in df.groupby(['A']) to refer to an index level named A if df has no column named A. Per the conversation in #5677, if a frame has both a column named A and an index level named A, then df.groupby(['A']) will group by the column (this maintains backwards compatibility). I came across this bug while writing tests to verify that this ambiguity is resolved properly.
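For context, here is a rough sketch of the two explicit spellings that sidestep the name ambiguity altogether: `pd.Grouper(key=...)` for a column and `pd.Grouper(level=...)` for an index level. It does not exercise the #5677 behaviour itself, and `indexed`, `by_column` and `by_level` are just illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 1, 1, 1],
                   'B': [1, 1, 2, 2, 3, 3],
                   'C': [1, 2, 3, 4, 5, 6]})

# Here 'A' is a column, so a key-only Grouper selects it.
by_column = df.groupby(pd.Grouper(key='A')).count()

# Move 'A' into the index; now it exists only as an index level.
indexed = df.set_index('A')

# A level-only Grouper selects the index level explicitly.
by_level = indexed.groupby(pd.Grouper(level='A')).count()

print(by_column)
print(by_level)
```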

…d to groupby in a list (GH14334) closes pandas-dev#14334 Author: Jon M. Mease <jon.mease@jhuapl.edu> Closes pandas-dev#14342 from jmmease/bug_14334 and squashes the following commits: 5e96797 [Jon M. Mease] Add tests for grouping on two columns cee5ce6 [Jon M. Mease] Added bug description to new test case f9ef05b [Jon M. Mease] Moved whatsnew to 0.19.1 and clarified description 14a4ae6 [Jon M. Mease] Added whatsnew for GH 14334 9805c30 [Jon M. Mease] Fix for GH 14334 dfd3e09 [Jon M. Mease] Added test case for GH 14334 (cherry picked from commit 4852008)

chris-b1 added Bug Groupby labels Oct 3, 2016

jonmmease mentioned this issue Oct 4, 2016

Bug: Error when key-only Grouper is passed to groupby in a list (GH14334) #14342

Closed

4 tasks

jonmmease mentioned this issue Oct 15, 2016

ENH: Allow the groupby by param to handle columns and index levels (GH5677) #14432

Merged

8 tasks

jreback added this to the 0.19.1 milestone Oct 19, 2016

jreback closed this as completed in 4852008 Oct 25, 2016
