import pandas as pd
print(pd.__version__)  # 0.22

df = pd.DataFrame(
    {'A': ['Jane', 'Jane', 'Charles', 'Charles'],
     'B': ['red', 'blue', 'green', 'green']})

# here I would like to group by one of the columns (A in this case), and aggregate the other.
# These two lines return a pandas DataFrame
df.groupby('A', as_index=False).agg({'B': pd.Series.count})    # pd.DataFrame
df.groupby('A', as_index=False).agg({'B': pd.Series.nunique})  # pd.DataFrame

# But when I do it in this way I don't know why I am getting a pandas Series in the last line
df.groupby('A', as_index=False).B.count()    # pd.DataFrame
df.groupby('A', as_index=False).B.nunique()  # pd.Series
Problem description
I am getting a pandas Series when trying to aggregate using "col.nunique()" notation with as_index set to False. Also, the pandas Series that is returned drops the values of the grouped column.
Expected Output
I think that the last line of code should return a pandas DataFrame in order to be consistent.
I am happy to help with this issue if it's possible. I am not an expert, but I would like to contribute.
Interesting indeed. I think the pattern is that aggregation functions implemented internally in Cython (e.g. sum, count, min, max) work, but functions that go down the apply route do not. To illustrate further:
In [38]: df.groupby('A', as_index=False).B.min()  # Cythonized min func
Out[38]:
         A      B
0  Charles  green
1     Jane   blue

In [39]: df.groupby('A', as_index=False).B.apply(min)  # apply route
Out[39]:
0    green
1     blue
dtype: object
Admittedly this might be tough for a first contribution, but if you want to give it a shot, the details of this implementation will be in pandas.core.groupby.groupby.py
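Until the inconsistency is resolved, a possible workaround is sketched below. It is not a fix for the underlying behaviour, only two ways to get a DataFrame that still carries the grouped column. The variable names `via_reset` and `via_agg` are made up for this example; the `agg` form is the one already reported above as returning a DataFrame, here written with the string alias 'nunique' instead of `pd.Series.nunique`.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': ['Jane', 'Jane', 'Charles', 'Charles'],
     'B': ['red', 'blue', 'green', 'green']})

# Option 1: group with the default as_index=True, then turn the group
# keys back into a regular column. reset_index() on a named Series
# returns a DataFrame with the index as the first column.
via_reset = df.groupby('A')['B'].nunique().reset_index()

# Option 2: keep as_index=False but go through agg() with a dict,
# which keeps the grouped column A in the result.
via_agg = df.groupby('A', as_index=False).agg({'B': 'nunique'})

print(type(via_reset).__name__, list(via_reset.columns))
print(type(via_agg).__name__, list(via_agg.columns))
```

Both results should have the columns A and B, with one row per group, so downstream code does not need to care which aggregation path was taken.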
Thanks a lot, this is an awesome library =).
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-41-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.2
scipy: 1.1.0
pyarrow: 0.9.0
xarray: 0.10.3
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.4
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None