Error when key-only Grouper is passed to groupby in a list #14334

Closed
jonmmease opened this issue Oct 3, 2016 · 2 comments · Fixed by #14432

@jonmmease
Contributor

Overview

A Grouper object configured with only a key specification may be passed to groupby to group a DataFrame by a particular column. For example:

In [19]: import pandas as pd

In [20]: df = pd.DataFrame({'A': [0, 0, 0, 1, 1, 1], 
    ...:                    'B': [1, 1, 2, 2, 3, 3],
    ...:                    'C': [1, 2, 3, 4, 5, 6]})

In [26]: df.groupby(pd.Grouper(key='A')).count()
Out[26]: 
   B  C
A      
0  3  3
1  3  3
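
For reference, here is a self-contained version of the working case. It is a sketch that assumes a recent pandas release, and it checks that a key-only Grouper gives the same result as passing the column label directly.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 1, 1, 1],
                   'B': [1, 1, 2, 2, 3, 3],
                   'C': [1, 2, 3, 4, 5, 6]})

# A Grouper with only `key` set names a column, like passing the label itself.
via_grouper = df.groupby(pd.Grouper(key='A')).count()
via_label = df.groupby('A').count()

print(via_grouper)
print(via_grouper.equals(via_label))
```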

However, when this key-only Grouper is passed to groupby inside a list, a TypeError is raised. The following example passes a single-element list containing the same Grouper as in the previous example.

In [27]: df.groupby([pd.Grouper(key='A')]).count()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-f4f86763ebfc> in <module>()
----> 1 df.groupby([pd.Grouper(key='A')]).count()

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)
   3776         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   3777                        sort=sort, group_keys=group_keys, squeeze=squeeze,
-> 3778                        **kwargs)
   3779 
   3780     def asfreq(self, freq, method=None, how=None, normalize=False):

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in groupby(obj, by, **kwds)
   1425         raise TypeError('invalid type: %s' % type(obj))
   1426 
-> 1427     return klass(obj, by, **kwds)
   1428 
   1429 

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)
    352                                                     level=level,
    353                                                     sort=sort,
--> 354                                                     mutated=self.mutated)
    355 
    356         self.obj = obj

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in _get_grouper(obj, key, axis, level, sort, mutated)
   2400                         sort=sort,
   2401                         in_axis=in_axis) \
-> 2402             if not isinstance(gpr, Grouping) else gpr
   2403 
   2404         groupings.append(ping)

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in __init__(self, index, grouper, obj, name, level, sort, in_axis)
   2197 
   2198                 # get the new grouper
-> 2199                 grouper = self.grouper._get_binner_for_grouping(self.obj)
   2200                 self.obj = self.grouper.obj
   2201                 self.grouper = grouper

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in _get_binner_for_grouping(self, obj)
    290         group_axis = obj._get_axis(self.axis)
    291         return Grouping(group_axis, None, obj=obj, name=self.key,
--> 292                         level=self.level, sort=self.sort, in_axis=False)
    293 
    294     @property

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in __init__(self, index, grouper, obj, name, level, sort, in_axis)
   2213                     t = self.name or str(type(self.grouper))
   2214                     raise ValueError("Grouper for '%s' not 1-dimensional" % t)
-> 2215                 self.grouper = self.index.map(self.grouper)
   2216                 if not (hasattr(self.grouper, "__len__") and
   2217                         len(self.grouper) == len(self.index)):

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/indexes/base.py in map(self, mapper)
   2238         applied : array
   2239         """
-> 2240         return self._arrmap(self.values, mapper)
   2241 
   2242     def isin(self, values, level=None):

pandas/src/generated.pyx in pandas.algos.arrmap_int64 (pandas/algos.c:94003)()

TypeError: 'NoneType' object is not callable

The same error occurs for lists containing multiple key-only Groupers, e.g. [pd.Grouper(key='A'), pd.Grouper(key='B')].

Expected Output

In [27]: df.groupby([pd.Grouper(key='A')]).count()
Out[27]: 
   B  C
A      
0  3  3
1  3  3
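
A minimal regression check in the spirit of the fix might look like the following. It assumes pandas 0.19.1 or later (the milestone this issue was assigned to) and is not the test that was actually added in the pull request. It covers both the single-element list and the multi-Grouper list, comparing each against grouping by plain column labels.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 1, 1, 1],
                   'B': [1, 1, 2, 2, 3, 3],
                   'C': [1, 2, 3, 4, 5, 6]})

# Single-element list: the case that raised TypeError in pandas 0.18.1.
single = df.groupby([pd.Grouper(key='A')]).count()

# Several key-only Groupers in one list.
multi = df.groupby([pd.Grouper(key='A'), pd.Grouper(key='B')]).count()

# Each result should match grouping by the equivalent column labels.
pd.testing.assert_frame_equal(single, df.groupby(['A']).count())
pd.testing.assert_frame_equal(multi, df.groupby(['A', 'B']).count())
```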

Output of pd.show_versions()

## INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: 1.4.1
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

@chris-b1
Contributor

chris-b1 commented Oct 3, 2016

This probably should work, thanks for the report! But I'm curious, why are you using a Grouper in this case instead of df.groupby(['A']), df.groupby(['A','B']), etc.?

@jonmmease
Contributor Author

Yeah, good question.

I'm in the process of implementing #5677, which will allow the A in df.groupby(['A']) to refer to an index level of df named A if df has no column named A. Per the conversation in #5677, if a frame has both a column named A and an index level named A, then df.groupby(['A']) will group by the column (this maintains backwards compatibility). I came across this bug while writing tests to verify that this ambiguity is resolved properly.
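
Grouper's `key` and `level` arguments are one way to state the intent explicitly when a column and an index level share a name. The sketch below is illustrative and not taken from the #5677 work: it assumes a recent pandas that includes this issue's fix, and it uses a hypothetical frame whose column A and index level A hold different values so that the two choices produce visibly different groups.

```python
import pandas as pd

# Hypothetical frame: column 'A' and the index level named 'A' hold
# different values, so grouping by one or the other gives different groups.
df = pd.DataFrame({'A': [0, 0, 0, 1, 1, 1],
                   'B': [1, 1, 2, 2, 3, 3],
                   'C': [1, 2, 3, 4, 5, 6]},
                  index=pd.Index([5, 5, 6, 6, 6, 6], name='A'))

# key= names a column, even though an index level has the same name.
by_column = df.groupby([pd.Grouper(key='A')]).count()

# level= names an index level, even though a column has the same name.
by_level = df.groupby([pd.Grouper(level='A')]).count()

print(by_column['B'])  # groups 0 and 1, three rows each
print(by_level['B'])   # groups 5 and 6, two and four rows
```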

jreback added this to the 0.19.1 milestone Oct 19, 2016
jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this issue Nov 2, 2016
…d to groupby in a list (GH14334)

closes pandas-dev#14334

Author: Jon M. Mease <jon.mease@jhuapl.edu>

Closes pandas-dev#14342 from jmmease/bug_14334 and squashes the following commits:

5e96797 [Jon M. Mease] Add tests for grouping on two columns
cee5ce6 [Jon M. Mease] Added bug description to new test case
f9ef05b [Jon M. Mease] Moved whatsnew to 0.19.1 and clarified description
14a4ae6 [Jon M. Mease] Added whatsnew for GH 14334
9805c30 [Jon M. Mease] Fix for GH 14334
dfd3e09 [Jon M. Mease] Added test case for GH 14334

(cherry picked from commit 4852008)