DataFrame.groupby() interprets tuple as list of keys #17979

Closed
toobaz opened this Issue Oct 25, 2017 · 0 comments

toobaz commented Oct 25, 2017

Code Sample, a copy-pastable example if possible

In [2]: df = pd.DataFrame([[1, 2, 3, 4], [3, 4, 5, 6], [1, 4, 2, 3]],
   ...:                           columns=pd.MultiIndex.from_arrays([['a', 'b', 'b', 'c'],
   ...:                                                              [1, 1, 2, 2]]))

In [3]: df.groupby(('b', 1))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-ee2e2124876f> in <module>()
----> 1 df.groupby(('b', 1))

/home/nobackup/repo/pandas/pandas/core/generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)
   5205         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   5206                        sort=sort, group_keys=group_keys, squeeze=squeeze,
-> 5207                        **kwargs)
   5208 
   5209     def asfreq(self, freq, method=None, how=None, normalize=False,

/home/nobackup/repo/pandas/pandas/core/groupby.py in groupby(obj, by, **kwds)
   1757         raise TypeError('invalid type: %s' % type(obj))
   1758 
-> 1759     return klass(obj, by, **kwds)
   1760 
   1761 

/home/nobackup/repo/pandas/pandas/core/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)
    390                                                     level=level,
    391                                                     sort=sort,
--> 392                                                     mutated=self.mutated)
    393 
    394         self.obj = obj

/home/nobackup/repo/pandas/pandas/core/groupby.py in _get_grouper(obj, key, axis, level, sort, mutated, validate)
   2861                         sort=sort,
   2862                         in_axis=in_axis) \
-> 2863             if not isinstance(gpr, Grouping) else gpr
   2864 
   2865         groupings.append(ping)

/home/nobackup/repo/pandas/pandas/core/groupby.py in __init__(self, index, grouper, obj, name, level, sort, in_axis)
   2611                 if getattr(self.grouper, 'ndim', 1) != 1:
   2612                     t = self.name or str(type(self.grouper))
-> 2613                     raise ValueError("Grouper for '%s' not 1-dimensional" % t)
   2614                 self.grouper = self.index.map(self.grouper)
   2615                 if not (hasattr(self.grouper, "__len__") and

ValueError: Grouper for 'b' not 1-dimensional

Problem description

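('b', 1) is a valid column key in this MultiIndex and should be interpreted as a single key. Instead, it is interpreted as the list of keys ['b', 1], so groupby tries to group by 'b' (which selects two columns, hence "not 1-dimensional") and by 1.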

This is related to #17977, but the fix should be pretty easy.
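
To make the ambiguity concrete, here is a minimal pure-Python sketch of one possible disambiguation rule. It is not the pandas implementation: `resolve_keys` is a hypothetical helper, and the rule "a tuple that matches a column label is one key" is an assumption used only for illustration.

```python
def resolve_keys(by, columns):
    """Hypothetical helper: return the list of grouping keys implied by `by`."""
    # A tuple that is itself a column label is treated as ONE key.
    if isinstance(by, tuple) and by in columns:
        return [by]
    # Any other tuple or list is read as a sequence of keys.
    if isinstance(by, (tuple, list)):
        return list(by)
    # A scalar is a single key.
    return [by]


# Column labels of the frame in the code sample, written as plain tuples.
columns = [('a', 1), ('b', 1), ('b', 2), ('c', 2)]

single = resolve_keys(('b', 1), columns)               # tuple is a label
several = resolve_keys([('b', 1), ('c', 2)], columns)  # explicit list of keys
print(single)
print(several)
```

Under this rule a tuple that is not a column label would still fall back to being read as several keys, which keeps the existing behaviour for flat column indexes.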

Expected Output

In [4]: df.groupby([('b', 1)])
Out[4]: <pandas.core.groupby.DataFrameGroupBy object at 0x7fa35bf27780>
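
As the call above suggests, wrapping the tuple in a list removes the ambiguity. A small self-contained sketch of that workaround, using the same frame (the names `grouped` and `sizes` are only for illustration):

```python
import pandas as pd

# Same frame as in the code sample: two-level column index.
columns = pd.MultiIndex.from_arrays([['a', 'b', 'b', 'c'],
                                     [1, 1, 2, 2]])
df = pd.DataFrame([[1, 2, 3, 4], [3, 4, 5, 6], [1, 4, 2, 3]],
                  columns=columns)

# A one-element list makes it explicit that ('b', 1) is a single key.
grouped = df.groupby([('b', 1)])

# Number of rows in each group of column ('b', 1).
sizes = grouped.size()
print(sizes)
```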

Output of pd.show_versions()

INSTALLED VERSIONS

commit: b539298
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.21.0rc1+30.gb539298ca
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

jreback modified the milestones: 0.21.0, 0.21.1, 0.22.0 on Oct 27, 2017

jreback closed this in 1719437 on Nov 1, 2017

1kastner added a commit to 1kastner/pandas that referenced this issue Nov 5, 2017

BUG: DataFrame.groupby() interprets tuple as list of keys
closes pandas-dev#17979

Author: sfoo <sfoohei@gmail.com>
Author: Jeff Reback <jeff@reback.net>

Closes pandas-dev#17996 from GuessWhoSamFoo/groupby_tuples and squashes the following commits:

afb0031 [Jeff Reback] TST: separate out grouping-type tests
c52b2a8 [sfoo] Moved notes to 0.22; created is_axis_multiindex var - pending internal use
fb52c1c [sfoo] Added whatsnew; checked match_axis_length
99ebc4e [sfoo] Cast groupby tuple as list when multiindex

No-Stream added a commit to No-Stream/pandas that referenced this issue Nov 28, 2017
