Error when key-only Grouper is passed to groupby in a list #14334

Closed
jonmmease opened this Issue Oct 3, 2016 · 2 comments

@jonmmease
Contributor

jonmmease commented Oct 3, 2016

Overview

A Grouper object configured with only a key specification may be passed to groupby to group a DataFrame by a particular column. For example:

In [19]: import pandas as pd

In [20]: df = pd.DataFrame({'A': [0, 0, 0, 1, 1, 1], 
    ...:                    'B': [1, 1, 2, 2, 3, 3],
    ...:                    'C': [1, 2, 3, 4, 5, 6]})

In [26]: df.groupby(pd.Grouper(key='A')).count()
Out[26]: 
   B  C
A      
0  3  3
1  3  3
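As a quick cross-check of the behaviour above, the following sketch compares the key-only Grouper with grouping by the plain column label. It assumes only the public pandas API already used in this report; the variable names `via_grouper` and `via_label` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 1, 1, 1],
                   'B': [1, 1, 2, 2, 3, 3],
                   'C': [1, 2, 3, 4, 5, 6]})

# Group by column 'A' through a key-only Grouper ...
via_grouper = df.groupby(pd.Grouper(key='A')).count()
# ... and through the plain column label.
via_label = df.groupby('A').count()

# Both results should hold the same per-group counts for columns B and C.
print(via_grouper)
print(via_label)
```

For this frame the two calls should be interchangeable, which is why the list-wrapped form below is expected to behave the same way.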

However, when the same key-only Grouper is passed to groupby inside a list, a TypeError is raised. The following example uses a single-element list containing the Grouper from the previous example.

In [27]: df.groupby([pd.Grouper(key='A')]).count()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-f4f86763ebfc> in <module>()
----> 1 df.groupby([pd.Grouper(key='A')]).count()

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)
   3776         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   3777                        sort=sort, group_keys=group_keys, squeeze=squeeze,
-> 3778                        **kwargs)
   3779 
   3780     def asfreq(self, freq, method=None, how=None, normalize=False):

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in groupby(obj, by, **kwds)
   1425         raise TypeError('invalid type: %s' % type(obj))
   1426 
-> 1427     return klass(obj, by, **kwds)
   1428 
   1429 

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)
    352                                                     level=level,
    353                                                     sort=sort,
--> 354                                                     mutated=self.mutated)
    355 
    356         self.obj = obj

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in _get_grouper(obj, key, axis, level, sort, mutated)
   2400                         sort=sort,
   2401                         in_axis=in_axis) \
-> 2402             if not isinstance(gpr, Grouping) else gpr
   2403 
   2404         groupings.append(ping)

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in __init__(self, index, grouper, obj, name, level, sort, in_axis)
   2197 
   2198                 # get the new grouper
-> 2199                 grouper = self.grouper._get_binner_for_grouping(self.obj)
   2200                 self.obj = self.grouper.obj
   2201                 self.grouper = grouper

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in _get_binner_for_grouping(self, obj)
    290         group_axis = obj._get_axis(self.axis)
    291         return Grouping(group_axis, None, obj=obj, name=self.key,
--> 292                         level=self.level, sort=self.sort, in_axis=False)
    293 
    294     @property

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/core/groupby.py in __init__(self, index, grouper, obj, name, level, sort, in_axis)
   2213                     t = self.name or str(type(self.grouper))
   2214                     raise ValueError("Grouper for '%s' not 1-dimensional" % t)
-> 2215                 self.grouper = self.index.map(self.grouper)
   2216                 if not (hasattr(self.grouper, "__len__") and
   2217                         len(self.grouper) == len(self.index)):

/Users/measejm1/anaconda/lib/python3.5/site-packages/pandas/indexes/base.py in map(self, mapper)
   2238         applied : array
   2239         """
-> 2240         return self._arrmap(self.values, mapper)
   2241 
   2242     def isin(self, values, level=None):

pandas/src/generated.pyx in pandas.algos.arrmap_int64 (pandas/algos.c:94003)()

TypeError: 'NoneType' object is not callable
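Reading the last frames of the traceback, it looks as if `_get_binner_for_grouping` builds a `Grouping` with `None` as the grouper, and that `None` is later handed to `Index.map` as the mapping function. The sketch below is plain Python, not pandas internals; it only illustrates, under that assumption, why applying `None` as a mapper ends in this particular message. The names `values`, `mapper` and `message` are illustrative.

```python
values = [0, 1, 2]
mapper = None  # stands in for the missing grouper function (assumption)

message = None
try:
    # map() is lazy in Python 3, so the failure happens on iteration,
    # when None is called on the first element.
    list(map(mapper, values))
except TypeError as exc:
    message = str(exc)

print(message)
```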

The same error occurs for lists containing multiple key-only Groupers, e.g. [pd.Grouper(key='A'), pd.Grouper(key='B')].
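For the multi-Grouper case, a minimal sketch of the intended behaviour is shown here. It assumes a pandas release that contains the fix for this issue, and that a list of key-only Groupers should match grouping by the list of column labels; the names `multi_grouper` and `multi_label` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 1, 1, 1],
                   'B': [1, 1, 2, 2, 3, 3],
                   'C': [1, 2, 3, 4, 5, 6]})

# Two key-only Groupers passed in a list (raises TypeError on 0.18.1).
multi_grouper = df.groupby([pd.Grouper(key='A'), pd.Grouper(key='B')]).count()
# The equivalent grouping by plain column labels.
multi_label = df.groupby(['A', 'B']).count()

print(multi_grouper)
print(multi_label)
```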

Expected Output

In [27]: df.groupby([pd.Grouper(key='A')]).count()
Out[27]: 
   B  C
A      
0  3  3
1  3  3
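The expected output can also be written as a small regression check. This is a sketch, assuming a pandas release that contains the fix and provides `pandas.testing.assert_frame_equal`; the function name `check_list_wrapped_grouper` is hypothetical and not taken from the pandas test suite.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def check_list_wrapped_grouper():
    """Compare a bare key-only Grouper with the same Grouper inside a list."""
    df = pd.DataFrame({'A': [0, 0, 0, 1, 1, 1],
                       'B': [1, 1, 2, 2, 3, 3],
                       'C': [1, 2, 3, 4, 5, 6]})
    expected = df.groupby(pd.Grouper(key='A')).count()
    result = df.groupby([pd.Grouper(key='A')]).count()
    # Raises AssertionError if the two frames differ.
    assert_frame_equal(result, expected)
    return result


result = check_list_wrapped_grouper()
print(result)
```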

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: 1.4.1
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

@chris-b1

Contributor

chris-b1 commented Oct 3, 2016

This probably should work, thanks for the report! But I'm curious, why are you using a Grouper in this case instead of df.groupby(['A']), df.groupby(['A','B']), etc.?

@jonmmease

Contributor

jonmmease commented Oct 3, 2016

Yeah, good question.

I'm working on implementing #5677, which will allow the A in df.groupby(['A']) to refer to an index level in df named A if df has no column named A. Per the conversation in #5677, if a frame has both a column named A and an index level named A, then df.groupby(['A']) will group by the column (this maintains backwards compatibility). I came across this bug while writing tests to verify that this ambiguity is resolved properly.
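To make the unambiguous half of that concrete, here is a sketch of a frame that has an index level named A and no column named A. It assumes a pandas release in which the behaviour described in #5677 is available; the names `frame`, `by_name` and `by_level` are illustrative. The case where both a column and an index level are named A is deliberately left out, since how it is handled may differ between pandas versions.

```python
import pandas as pd

# Index level named 'A'; there is no column named 'A'.
frame = pd.DataFrame({'B': [1, 1, 2, 2, 3, 3]},
                     index=pd.Index([0, 0, 0, 1, 1, 1], name='A'))

# With the #5677 behaviour, the name 'A' resolves to the index level.
by_name = frame.groupby(['A']).count()
# Explicit, unambiguous reference to the index level.
by_level = frame.groupby(level='A').count()

print(by_name)
print(by_level)
```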

@jreback jreback added this to the 0.19.1 milestone Oct 19, 2016

@jreback jreback closed this in 4852008 Oct 25, 2016

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this issue Nov 2, 2016

[Backport #14342] Bug: Error when key-only Grouper is passed to groupby in a list (GH14334)

closes #14334

Author: Jon M. Mease <jon.mease@jhuapl.edu>

Closes #14342 from jmmease/bug_14334 and squashes the following commits:

5e96797 [Jon M. Mease] Add tests for grouping on two columns
cee5ce6 [Jon M. Mease] Added bug description to new test case
f9ef05b [Jon M. Mease] Moved whatsnew to 0.19.1 and clarified description
14a4ae6 [Jon M. Mease] Added whatsnew for GH 14334
9805c30 [Jon M. Mease] Fix for GH 14334
dfd3e09 [Jon M. Mease] Added test case for GH 14334

(cherry picked from commit 4852008)

amolkahat added a commit to amolkahat/pandas that referenced this issue Nov 26, 2016

Bug: Error when key-only Grouper is passed to groupby in a list (GH14334)
