BUG: DataFrame.to_excel treats decimal.Decimal as a string instead of a numeric type; the data in the Excel cell is formatted as a string, not a number #49598

Open
3 tasks done
rspocz opened this issue Nov 9, 2022 · 6 comments
Labels
Bug IO Excel read_excel, to_excel

Comments

rspocz commented Nov 9, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# ----------------------------------
### imports

import pandas as pd
import decimal as dc

# ----------------------------------
### definitions

value_str = '2.445'

# ----------------------------------
### initialization

df_float = pd.DataFrame(data=[float(value_str)])
df_decimal = pd.DataFrame(data=[dc.Decimal(value_str)])

# ----------------------------------
### exports

# excel
df_float.to_excel('float.xlsx')
df_decimal.to_excel('decimal.xlsx')

Issue Description

The code writes the value to float.xlsx as a number, but to decimal.xlsx as a string.

I identified that the type checks starting at the condition here

if is_integer(val):
do not handle the decimal.Decimal datatype, so the value is converted to a string. I'd propose returning decimal.Decimal without converting it to a string, but this will require testing whether the Excel writing libraries can handle it.
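
To illustrate the fall-through described above, here is a minimal sketch using only the standard library. It is not the pandas source: `value_with_fmt_sketch` is a hypothetical, simplified stand-in for the writer's conversion step, and the exact set of type checks in pandas is an assumption.

```python
import datetime
import decimal


def value_with_fmt_sketch(val):
    """Hypothetical, simplified stand-in for the cell conversion step."""
    if isinstance(val, bool):
        return val
    if isinstance(val, int):
        return int(val)
    if isinstance(val, float):
        return float(val)
    if isinstance(val, (datetime.datetime, datetime.date)):
        return val
    # Any type without its own branch, including decimal.Decimal,
    # falls through to here and is turned into text.
    return str(val)


float_cell = value_with_fmt_sketch(float("2.445"))
decimal_cell = value_with_fmt_sketch(decimal.Decimal("2.445"))

print(type(float_cell).__name__, float_cell)      # float 2.445
print(type(decimal_cell).__name__, decimal_cell)  # str 2.445
```

Because the Decimal reaches the writer engine as a str, the engine has no way to know it was numeric.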

The same issue was raised before in #26277, but it was closed with no solution (by mistake?).

Expected Behavior

Values in both DataFrames should be written to Excel as numbers.

Installed Versions

INSTALLED VERSIONS

commit : 91111fd
python : 3.9.13.final.0
python-bits : 64
OS : Linux
OS-release : 6.0.5-200.fc36.x86_64
Version : #1 SMP PREEMPT_DYNAMIC Wed Oct 26 15:55:21 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.5.1
numpy : 1.23.4
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.5.1
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

@rspocz rspocz added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 9, 2022
@alexanderwertman3

Hi, I would like to resolve this issue.

Member

phofl commented Nov 14, 2022

Investigations welcome, but the other issue indicates that openpyxl cannot handle decimal.

@phofl phofl added IO Excel read_excel, to_excel and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 14, 2022
Author

rspocz commented Nov 14, 2022

For openpyxl, decimal should be supported from version 1.7.0 (2013-10-31) https://foss.heptapod.net/openpyxl/openpyxl/-/blob/branch/3.0/doc/changes.rst

For xlsxwriter, decimal should be supported from version 0.3.8 (August 23 2013) https://xlsxwriter.readthedocs.io/changes.html

I found both by quickly looking at the changelogs, so this needs to be confirmed. But it seems like an optimistic start.

@alexanderwertman3 Do you have time to investigate the compatibility with openpyxl/xlsxwriter in more detail and, eventually, propose a change in the code? Otherwise, I can have a look in a couple of weeks. Thanks

@alexanderwertman3

@rspocz, I will begin investigating this week


alexanderwertman3 commented Nov 28, 2022

@rspocz: Yes, based on the changelogs for openpyxl and xlsxwriter, it seems decimal should be supported.

@phofl: Did the other issue cite a reason why openpyxl cannot handle decimal?


alexanderwertman3 commented Nov 28, 2022

  • In pandas/pandas/io/excel/_base.py, lines 49-54, is_bool, is_float, is_integer, etc. are imported from pandas.core.dtypes.common, which imports them from pandas.core.dtypes.inference, which in turn imports them from pandas._libs.lib.
  • is_decimal is declared in pandas/_libs/lib.pyi.

I believe we just need to add is_decimal to the list of functions imported here in pandas/pandas/io/excel/_base.py:
[Screenshot: Screen Shot 2022-11-28 at 12 57 20 PM]

and then call it in an elif block here (also in pandas/pandas/io/excel/_base.py):
[Screenshot: Screen Shot 2022-11-28 at 12 55 23 PM]

so that val is returned as a decimal.Decimal rather than as a string.
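
The change proposed above could look roughly like the following sketch. It uses only the standard library: `is_decimal_sketch` is a hypothetical stand-in for pandas' is_decimal, and `value_with_fmt_patched` is a hypothetical, simplified conversion function, not the actual code in _base.py. Whether each writer engine then stores the Decimal as a number still has to be tested.

```python
import decimal


def is_decimal_sketch(val):
    # Hypothetical stand-in for the is_decimal helper mentioned above.
    return isinstance(val, decimal.Decimal)


def value_with_fmt_patched(val):
    """Hypothetical conversion step with an added Decimal branch."""
    if isinstance(val, bool):
        return val
    elif isinstance(val, int):
        return int(val)
    elif isinstance(val, float):
        return float(val)
    elif is_decimal_sketch(val):
        # Proposed branch: hand the Decimal to the writer engine unchanged
        # instead of letting it fall through to str().
        return val
    else:
        return str(val)


patched_cell = value_with_fmt_patched(decimal.Decimal("2.445"))
print(type(patched_cell).__name__, patched_cell)  # Decimal 2.445
```

Passing the Decimal through unchanged, rather than casting it to float, leaves the numeric representation up to the engine, which is why the openpyxl/xlsxwriter compatibility check matters.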
