DataFrameGroupBy.aggregate can not work with `tuple` as an argument #18079

Closed
sinanonur opened this Issue Nov 2, 2017 · 7 comments

sinanonur commented Nov 2, 2017

The following code raises a ValueError:

grouped_df = df.groupby(group_by_attributes, as_index=False).aggregate(tuple)

Here is a more reproducible version:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 3), columns=list('ABC'))
grouped_df = df.groupby(['A', 'B'], as_index=False).aggregate(tuple)

Problem description

The statement above does not work, apparently because `tuple` is a type (class) rather than a plain function. It throws:
ValueError: no results
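
For context, `tuple` is a built-in type rather than a function object, although it is still callable. The following plain-Python check illustrates that distinction only; it makes no claim about how pandas chooses its aggregation path, and the variable names are illustrative.

```python
import types

# tuple is a class (a type object), not a function object...
is_type = isinstance(tuple, type)
is_function = isinstance(tuple, types.FunctionType)

# ...but it is still callable, like any class.
is_callable = callable(tuple)

# Wrapping it in a lambda produces an ordinary function object.
wrapper = lambda x: tuple(x)
wrapper_is_function = isinstance(wrapper, types.FunctionType)

print(is_type, is_function, is_callable, wrapper_is_function)
```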

Workaround

Use the following groupby statement instead:

grouped_df = df.groupby(['A', 'B'], as_index=False).aggregate(lambda x: tuple(x))
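
A small sketch of the same workaround on a deterministic frame, so the grouped tuples are easy to inspect. With random floats nearly every (A, B) pair is unique, so each group holds a single row. The helper name `to_tuple` and the sample data are assumptions for illustration, and behaviour may differ between pandas versions.

```python
import pandas as pd

# Small frame with a repeated (A, B) key so one group has two rows.
df = pd.DataFrame({'A': [1, 1, 3], 'B': [1, 1, 4], 'C': [1, 3, 4]})


def to_tuple(values):
    """Hypothetical helper: wrap the tuple class in a plain function."""
    return tuple(values)


grouped_df = df.groupby(['A', 'B'], as_index=False).aggregate(to_tuple)

# Convert elements to built-in ints so the result is easy to compare.
collected = [tuple(int(v) for v in t) for t in grouped_df['C']]
print(grouped_df)
```

A named function behaves like the lambda in the workaround and is easier to reuse in a test.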

This issue was opened as a result of the following discussion: PyCQA/pylint#1709 (comment)

Expected Output

The call should work without raising a ValueError.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-97-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.3
pytest: None
pip: 8.1.2
setuptools: 28.2.0
Cython: 0.24.1
numpy: 1.13.3
scipy: 0.18.1
xarray: None
IPython: 5.4.1
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None

Contributor

bobhaffner commented Nov 2, 2017

Hi @sinanonur, your example works in the newly released 0.21.0. Please upgrade when you can.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 3), columns=list('ABC'))
grouped_df = df.groupby(['A', 'B'], as_index=False).aggregate(tuple)
print(pd.__version__)
print(grouped_df.head())

0.21.0
          A         B                      C
0 -2.159796 -1.233800  (1.0704251018658486,)
1 -1.947438  2.082122  (0.5849118717358551,)
2 -1.738639  0.653051  (1.1259850203053805,)
3 -1.638240 -0.799216  (0.3626490086583796,)
4 -1.562435 -0.232689  (-1.120885109955278,)

gfyoung added the Groupby label Nov 3, 2017

Member

gfyoung commented Nov 3, 2017

@bobhaffner : Cool! Mind adding that test so that we can close this issue?

gfyoung added the Testing label Nov 3, 2017

jreback added this to the 0.22.0 milestone Nov 3, 2017

Contributor

bobhaffner commented Nov 3, 2017

Hi @gfyoung Yes, sure thing.

Contributor

bobhaffner commented Nov 3, 2017

So this is interesting. I was building a test for this and everything started out as expected:

import pandas as pd
print(pd.__version__)
0.21.0
df = pd.DataFrame({'A' : [1, 1, 3], 'B' :  [1, 1, 4], 'C' :  [1, 3, 4]})
result = df.groupby(['A', 'B']).aggregate(tuple)
print(result)
          C
A B
1 1  (1, 3)
3 4    (4,)

But then I tried this:

df = pd.DataFrame({'A' : [1, 1, 3], 'B' :  [1, 2, 4]})
result = df.groupby('A').aggregate(tuple)

and got this error: ValueError: Shape of passed values is (2, 2), indices imply (2, 1)

Am I missing something obvious? I'll try to dig into this more later on.

Full traceback:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/groupby.py in aggregate(self, arg, *args, **kwargs)
3636 result = self._aggregate_multiple_funcs(
-> 3637 [arg], _level=_level, _axis=self.axis)
3638 result.columns = Index(

~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _level, _axis)
595 if not len(results):
--> 596 raise ValueError("no results")
597

ValueError: no results

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
4632 blocks = form_blocks(arrays, names, axes)
-> 4633 mgr = BlockManager(blocks, axes)
4634 mgr._consolidate_inplace()

~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
3027 if do_integrity_check:
-> 3028 self._verify_integrity()
3029

~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/internals.py in _verify_integrity(self)
3238 if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 3239 construction_error(tot_items, block.shape[1:], self.axes)
3240 if len(self.items) != tot_items:

~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
4602 raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4603 passed, implied))
4604

ValueError: Shape of passed values is (2, 2), indices imply (2, 1)

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
in ()
1 df = pd.DataFrame({'A' : [1, 1, 3], 'B' : [1, 2, 4]})
----> 2 result = df.groupby('A').aggregate(tuple)
3 result

~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/groupby.py in aggregate(self, arg, *args, **kwargs)
4187 versionadded=''))
4188 def aggregate(self, arg, *args, **kwargs):
-> 4189 return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
4190
4191 agg = aggregate

~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/groupby.py in aggregate(self, arg, *args, **kwargs)
3640 name=self._selected_obj.columns.name)
3641 except:
-> 3642 result = self._aggregate_generic(arg, *args, **kwargs)
3643
3644 if not self.as_index:

~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/groupby.py in _aggregate_generic(self, func, *args, **kwargs)
3675 result[name] = data.apply(wrapper, axis=axis)
3676
-> 3677 return self._wrap_generic_output(result, obj)
3678
3679 def _wrap_aggregated_output(self, output, names=None):

~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/groupby.py in _wrap_generic_output(self, result, obj)
4225 if self.axis == 0:
4226 return DataFrame(result, index=obj.columns,
-> 4227 columns=result_index).T
4228 else:
4229 return DataFrame(result, index=obj.index,

~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
328 dtype=dtype, copy=copy)
329 elif isinstance(data, dict):
--> 330 mgr = self._init_dict(data, index, columns, dtype=dtype)
331 elif isinstance(data, ma.MaskedArray):
332 import numpy.ma.mrecords as mrecords

~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
459 arrays = [data[k] for k in keys]
460
--> 461 return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
462
463 def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
6138 axes = [_ensure_index(columns), _ensure_index(index)]
6139
-> 6140 return create_block_manager_from_arrays(arrays, arr_names, axes)
6141
6142

~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
4635 return mgr
4636 except ValueError as e:
-> 4637 construction_error(len(arrays), arrays[0].shape, axes, e)
4638
4639

~/miniconda3/envs/dev35/lib/python3.5/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
4601 raise ValueError("Empty data passed with indices specified.")
4602 raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4603 passed, implied))
4604
4605

ValueError: Shape of passed values is (2, 2), indices imply (2, 1)

Member

gfyoung commented Nov 3, 2017

That's odd, because if we define the following function:

def f(x):
    return tuple(x)

the code works. Have a look to see what's going on there.
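
A sketch of that single-key case, using the same frame as the failing example above. It assumes a pandas version where a plain function is accepted here, as described in this comment, and the intermediate name `values` is illustrative.

```python
import pandas as pd

# Same data as the failing single-key example in the thread.
df = pd.DataFrame({'A': [1, 1, 3], 'B': [1, 2, 4]})


def f(x):
    # Plain function wrapper around the tuple class.
    return tuple(x)


result = df.groupby('A').aggregate(f)

# Convert elements to built-in ints for easy comparison.
values = [tuple(int(v) for v in t) for t in result['B']]
print(result)
```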

Contributor

bobhaffner commented Nov 4, 2017

Back on this again, but not having much luck figuring out the root cause.

Below is an excerpt from the aggregate method of NDFrameGroupBy.

When I do df.groupby(['A', 'B']).aggregate(tuple), it follows the if branch, as expected.

When I do df.groupby('A').aggregate(tuple), it follows the else branch.

@gfyoung's successful tuple function also follows the else branch.

Not sure why the if is needed in the first place?

Also, this issue warns about using agg with classes

Any guidance is appreciated.

if self.grouper.nkeys > 1:
    return self._python_agg_general(arg, *args, **kwargs)
else:

    # try to treat as if we are passing a list
    try:
        assert not args and not kwargs
        result = self._aggregate_multiple_funcs(
            [arg], _level=_level, _axis=self.axis)
        result.columns = Index(
            result.columns.levels[0],
            name=self._selected_obj.columns.name)
    except Exception:
        result = self._aggregate_generic(arg, *args, **kwargs)

Member

gfyoung commented Nov 4, 2017

"Not sure why the if is needed in the first place?"

Try removing it and see what happens. When there's a bug in our code, all bets are (almost) off 😄

bobhaffner referenced this issue Nov 18, 2017

Merged

BUG fixes tuple agg issue 18079 #18354

4 of 4 tasks complete