Different return type when using groupby with nunique #21090

RaulPL · 2018-05-16T17:59:26Z

Code Sample

I have the following code:

import pandas as pd
print(pd.__version__) # 0.22
df = pd.DataFrame(
    {'A': ['Jane', 'Jane', 'Charles', 'Charles'], 
     'B': ['red', 'blue', 'green', 'green']})

# Here I want to group by one of the columns (A in this case) and aggregate the other.
# These two lines each return a pandas DataFrame.
df.groupby('A', as_index=False).agg({'B': pd.Series.count})  # pd.DataFrame
df.groupby('A', as_index=False).agg({'B': pd.Series.nunique})  # pd.DataFrame

# But when I select the column first, the last line returns a pandas Series, and I don't know why.
df.groupby('A', as_index=False).B.count()  # pd.DataFrame
df.groupby('A', as_index=False).B.nunique()  # pd.Series

Problem description

When I aggregate using the `.B.nunique()` notation with `as_index=False`, I get a pandas Series instead of a DataFrame. The returned Series also drops the values of the grouping column.

Expected Output

I think that the last line of code should return a pandas DataFrame in order to be consistent.

I am happy to help with this issue if it's possible. I am not an expert, but I would like to contribute.

Thanks a lot, this is an awesome library =).

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-41-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.2
scipy: 1.1.0
pyarrow: 0.9.0
xarray: 0.10.3
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.4
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None


WillAyd · 2018-05-16T21:49:16Z

Interesting indeed. I think the pattern is that aggregation functions implemented internally in Cython (e.g. sum, count, min, max) work, but functions that go down the apply route do not. To illustrate further:

In [38]: df.groupby('A', as_index=False).B.min()  # Cythonized min func
Out[38]: 
         A      B
0  Charles  green
1     Jane   blue

In [39]: df.groupby('A', as_index=False).B.apply(min) # apply route
Out[39]: 
0    green
1     blue
dtype: object

Admittedly this might be tough for a first contribution, but if you want to give it a shot, the details of this implementation will be in pandas.core.groupby.groupby.py

RaulPL · 2018-05-17T14:04:00Z

What do you mean by the apply route? I will start reading about it.
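
For readers with the same question, here is a minimal sketch of one reading of "the apply route", which the thread itself does not spell out: `.min()` is a built-in groupby reduction, while `.apply(func)` hands each group to an arbitrary Python function and then combines the results. The helper `record_min` and the list `seen` are hypothetical names used only for illustration, and the default `as_index=True` is used so that the output does not depend on the behavior discussed in this issue.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': ['Jane', 'Jane', 'Charles', 'Charles'],
     'B': ['red', 'blue', 'green', 'green']})

seen = []  # hypothetical: records the values of each group handed to the function


def record_min(group):
    # `group` is the slice of column B belonging to one value of A
    seen.append(sorted(group.tolist()))
    return group.min()


builtin_result = df.groupby('A')['B'].min()            # built-in reduction
apply_result = df.groupby('A')['B'].apply(record_min)  # generic apply path

print(builtin_result)
print(apply_result)
print(seen)
```

Both calls compute the same per-group minimum here; the difference is in how the result is produced, which is presumably why the two paths can end up wrapping their output differently.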

jbrockmendel added the Groupby label Aug 1, 2018

rhshadrach mentioned this issue May 5, 2020

BUG: groupby with as_index=False shouldn't modify grouping columns #34012

jreback added this to the 1.1 milestone May 14, 2020

jreback closed this as completed in #34012 May 27, 2020
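
According to the timeline above, the fix was targeted at the 1.1 milestone. For code that still has to run on older versions, a minimal workaround sketch (not part of the linked fix, and assuming a DataFrame with the group keys as a regular column is what is wanted) is to group with the default `as_index=True` and move the keys back with `reset_index()`. The variable name `unique_counts` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': ['Jane', 'Jane', 'Charles', 'Charles'],
     'B': ['red', 'blue', 'green', 'green']})

# With the default as_index=True, nunique() returns a Series indexed by A.
# reset_index() then turns the index back into an ordinary column,
# giving a DataFrame with columns A and B.
unique_counts = df.groupby('A')['B'].nunique().reset_index()

print(type(unique_counts).__name__)
print(unique_counts)
```

This avoids `as_index=False` entirely, so it does not rely on which aggregation path pandas takes internally.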
