PERF: groupby is significantly slower for DatetimeIndex with timezone #58956
Labels
Needs Triage: Issue that has not been reviewed by a pandas team member
Performance: Memory or execution speed performance
Pandas version checks
Reproducible Example
The above code does a groupby with an arbitrary reduction, both with and without tz in the DatetimeIndex. The timings are the following:
This is a significant performance difference, while the results are equal. Is this expected or a bug? I can work around this with df.tz_localize(None) on my dataframe with timezones, but it still seemed good to report this.
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2e
python : 3.11.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : Intel64 Family 6 Model 154 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : Dutch_Netherlands.1252
pandas : 2.2.2
numpy : 1.26.4
pytz : 2023.3.post1
dateutil : 2.9.0.post0
setuptools : 68.2.2
pip : 23.3.1
Cython : None
pytest : 7.4.3
hypothesis : None
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : 1.3.7
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.12.1
gcsfs : None
matplotlib : 3.8.4
numba : 0.58.1
numexpr : None
odfpy : None
openpyxl : 3.1.3
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : 2024.3.0
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : 2.4.1
pyqt5 : None
Prior Performance
Tested for pandas 2.1.4 and 2.2.2; both versions behave similarly.
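For readers who want to try the comparison and the df.tz_localize(None) workaround described in the report, here is a minimal sketch. It is not the original reproducible example: the data size, the Europe/Amsterdam timezone, the "value" column, the grouping key (calendar date of the index), and the helper names daily_sum and timed are all assumptions chosen for illustration, and the timings it prints will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: size, frequency, timezone, and column name are assumptions.
n = 100_000
index = pd.date_range("2024-01-01", periods=n, freq="min", tz="Europe/Amsterdam")
df_tz = pd.DataFrame({"value": np.arange(n, dtype="float64")}, index=index)

# Workaround from the report: drop the timezone while keeping local wall times.
df_naive = df_tz.tz_localize(None)


def daily_sum(df):
    # Assumed grouping: calendar date of each timestamp, reduced with sum().
    return df.groupby(df.index.date)["value"].sum()


def timed(df):
    # Return the groupby result together with the elapsed wall-clock seconds.
    start = time.perf_counter()
    result = daily_sum(df)
    return result, time.perf_counter() - start


result_tz, seconds_tz = timed(df_tz)
result_naive, seconds_naive = timed(df_naive)

print(f"tz-aware: {seconds_tz:.4f} s, tz-naive: {seconds_naive:.4f} s")
print("results equal:", result_tz.equals(result_naive))
```

Because tz_localize(None) keeps the local wall-clock times, the two frames should group into the same calendar dates here; for a different grouping key or a reduction that depends on UTC offsets, it would be worth checking equality before relying on the workaround.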