BUG: Different behavior from .agg("mean") and .agg(["mean"]) on a groupby df with a datetime64[ns] column #47166

@leodtprojectsd

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from pandas import Timestamp

print("pandas Version:", pd.__version__)

# Sample frame: a string key, a datetime64[ns] column, and a float column.
df = pd.DataFrame.from_dict({
    'filename': ['03_', '03_', '03_', '05_', '05_', '05_', '05_', '05_', '08_', '08_'],
    'date_time': [Timestamp('2022-05-24 12:10:56'), Timestamp('2022-05-24 12:11:24'), Timestamp('2022-05-24 12:11:51'),
                  Timestamp('2022-05-24 12:41:54'), Timestamp('2022-05-24 12:42:21'), Timestamp('2022-05-24 12:42:49'),
                  Timestamp('2022-05-24 12:43:16'), Timestamp('2022-05-24 12:43:44'), Timestamp('2022-05-24 12:57:30'),
                  Timestamp('2022-05-24 12:57:58')],
    'r': [80466.36, 71467.12, 72641.21, 76961.35, 86747.23, 81995.81, 74451.46, 69401.51, 73670.12, 78180.65],
})

# df.info() prints the summary itself and returns None, hence the "None" after the label in the output.
print("df column types: ", df.info())

# The same mean reduction, requested in three different ways.
print('\nWorks with: df.groupby(["filename"]).agg(["mean"])\n', df.groupby(["filename"]).agg(["mean"]))
print('\nNot working with: df.groupby(["filename"]).agg("mean")\n', df.groupby(["filename"]).agg("mean"))
print('\nNot working with: df.groupby(["filename"]).mean()\n', df.groupby(["filename"]).mean())


OUT: 
pandas Version: 1.3.5
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   filename   10 non-null     object        
 1   date_time  10 non-null     datetime64[ns]
 2   r          10 non-null     float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 368.0+ bytes
df column types:  None

Works with: df.groupby(["filename"]).agg(["mean"]) #See date_time column appearing
                              date_time          r
                                  mean       mean
filename                                         
03_      2022-05-24 12:11:23.666666752  74858.230
05_      2022-05-24 12:42:48.800000000  77911.472
08_      2022-05-24 12:57:44.000000000  75925.385

Not working with: df.groupby(["filename"]).agg("mean") #date_time column is gone
                   r
filename           
03_       74858.230
05_       77911.472
08_       75925.385

Not working with: df.groupby(["filename"]).mean() #date_time column is gone
                   r
filename           
03_       74858.230
05_       77911.472
08_       75925.385

Issue Description

I expected the same behavior from

  • df.groupby(["filename"]).agg(["mean"])
  • df.groupby(["filename"]).agg("mean")
  • df.groupby(["filename"]).mean()

Instead, when the DataFrame has a datetime64[ns] column, only .agg(["mean"]) keeps that column in the result; .agg("mean") and .mean() drop it.
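
A possible workaround, sketched here rather than taken from the report: request the datetime column explicitly, either by passing numeric_only=False to mean() or by selecting the column before reducing. The frame below is a reduced, two-group version of the one in the example above. That the numeric_only keyword behaves this way on the installed pandas version is an assumption; it has not been checked against 1.3.5 here.

```python
import pandas as pd

# Reduced version of the issue's frame: two groups instead of three.
df = pd.DataFrame({
    "filename": ["03_", "03_", "03_", "08_", "08_"],
    "date_time": pd.to_datetime([
        "2022-05-24 12:10:56", "2022-05-24 12:11:24", "2022-05-24 12:11:51",
        "2022-05-24 12:57:30", "2022-05-24 12:57:58",
    ]),
    "r": [80466.36, 71467.12, 72641.21, 73670.12, 78180.65],
})

grouped = df.groupby("filename")

# Option 1: ask for non-numeric columns to be included in the reduction.
# (Assumption: the installed pandas accepts numeric_only here.)
explicit = grouped.mean(numeric_only=False)

# Option 2: select the datetime column and reduce it on its own.
per_column = grouped["date_time"].mean()

print(explicit)
print(per_column)
```

Option 2 goes through the per-column path, which is also what .agg(["mean"]) appears to exercise in the output above.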

Expected Behavior

I expect agg(["mean"]), agg("mean"), and mean() to behave the same.
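
To check this expectation on a given pandas version, a small diagnostic along the following lines may help. It is only a sketch: columns_kept is a hypothetical helper, not part of pandas, and the sample frame is made up for illustration. It reports which source columns survive each of the three call forms, so a difference between them shows up as differing lists.

```python
import pandas as pd


def columns_kept(frame, key):
    """Map each call form to the source column names present in its result.

    Hypothetical helper for illustration; not a pandas API.
    """
    grouped = frame.groupby(key)
    results = {
        'agg(["mean"])': grouped.agg(["mean"]),
        'agg("mean")': grouped.agg("mean"),
        "mean()": grouped.mean(),
    }
    kept = {}
    for label, result in results.items():
        columns = result.columns
        # agg(["mean"]) returns MultiIndex columns; level 0 holds the source names.
        if isinstance(columns, pd.MultiIndex):
            columns = columns.get_level_values(0)
        kept[label] = sorted(set(columns))
    return kept


# Made-up sample with one datetime column and one float column.
sample = pd.DataFrame({
    "filename": ["a", "a", "b"],
    "date_time": pd.to_datetime(["2022-05-24 12:00:00", "2022-05-24 12:00:10", "2022-05-24 13:00:00"]),
    "r": [1.0, 3.0, 5.0],
})

report = columns_kept(sample, "filename")
for label, names in report.items():
    print(label, "->", names)
```

If the three forms behave the same, every entry lists the same columns; which columns appear depends on the pandas version in use.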

Installed Versions

INSTALLED VERSIONS
------------------
commit : 66e3805
python : 3.7.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.188+
Version : #1 SMP Sun Apr 24 10:03:06 PDT 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.5
numpy : 1.21.6
pytz : 2022.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 57.4.0
Cython : 0.29.30
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.6
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 5.5.0
pandas_datareader: 0.9.0
bs4 : 4.6.3
bottleneck : 1.3.4
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.8.1
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.13.3
pyarrow : 6.0.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.4.36
tables : 3.7.0
tabulate : 0.8.9
xarray : 0.20.2
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.51.2
None

Labels: Bug, Duplicate Report, Groupby, Nuisance Columns
