Skip to content

PERF: to_excel slowing runtime #59180

@OnindoK

Description

@OnindoK

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this issue exists on the latest version of pandas.

  • I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

import numpy as np 
import pandas as pd 

rng = np.random.default_rng()
df = pd.DataFrame(rng.integers(0, 100, size=(100_000, 500)), columns=np.arange(0,500))

%time df.to_excel(r"\to_excel issues\test.xlsx")

New: Wall time: 18min 35s, 18min 29s, 15min 27s, 15min 25s
Old: Wall time: 14min 2s, 12min 10s,14min 7s

The differences aren't that large for this size but when you start getting bigger it grows exponentially (100k, 1k) already takes over an hour or over two hours.

Installed Versions

New:
INSTALLED VERSIONS

commit : a671b5a
python : 3.11.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : AMD64 Family 25 Model 80 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 2.1.4
numpy : 1.26.4
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : None
pytest : 7.4.0
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : None
pymysql : 1.4.6
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.20.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : 1.3.7
dataframe-api-compat: None
fastparquet : None
fsspec : 2023.10.0
gcsfs : None
matplotlib : 3.8.0
numba : 0.59.0
numexpr : 2.8.7
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 14.0.2
pyreadstat : None
pyxlsb : None
s3fs : 2023.10.0
scipy : 1.11.4
sqlalchemy : 2.0.25
tables : 3.9.2
tabulate : 0.9.0
xarray : 2023.6.0
xlrd : None
zstandard : 0.19.0
tzdata : 2023.3
qtpy : 2.4.1
pyqt5 : None

Prior Performance

Old:
INSTALLED VERSIONS

Wall time: 14min 2s, 12min 10s,14min 7s

commit : 2cb9652
python : 3.8.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : AMD64 Family 25 Model 80 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.2.4
numpy : 1.20.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 52.0.0.post20210125
Cython : 0.29.23
pytest : 6.2.3
hypothesis : None
sphinx : 4.0.1
blosc : None
feather : None
xlsxwriter : 1.3.8
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.22.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.9.0
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 9.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1

Metadata

Metadata

Assignees

No one assigned

    Labels

    IO Excelread_excel, to_excelPerformanceMemory or execution speed performance

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions