Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: show warning when exporting dataframe and hitting excel truncating limit #56954

Closed
1 of 3 tasks
SophieGarden opened this issue Jan 19, 2024 · 4 comments · Fixed by #57103
Closed
1 of 3 tasks

ENH: show warning when exporting dataframe and hitting excel truncating limit #56954

SophieGarden opened this issue Jan 19, 2024 · 4 comments · Fixed by #57103
Assignees
Labels
Docs Enhancement IO Excel read_excel, to_excel Warnings Warnings that appear or should be added to pandas

Comments

@SophieGarden
Copy link

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish I can see a warning when I save a pandas dataframe to excel, and one of the columns contain super long strings, and these strings are truncated, possibly due to a limit on the Excel side.

Feature Description

detect string length and Show a warning when the some contents of a cell exceed Excel length/size limit, to let the users know that the data won't be saved fully if choose to export to excel. (csv is probably better)

Alternative Solutions

show warning all the time or add this to pd.to_excel documentation

Additional Context

No response

@SophieGarden SophieGarden added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 19, 2024
@BenjaminHelyer
Copy link

Hi @SophieGarden , could you let us know the length of strings you're dealing with? I've personally not yet encountered an issue of strings being erased in a .xlsx file (not to say it can't happen), and I've dealt with exporting some pretty long strings in converting to .xlsx. Is it possible that you're running into the visibility limit for Excel, rather than the character limit?

This might be helpful to add into the pd.to_excel() documentation if this is what you're encountering. I expect others will run into the same, and if we can divert the questions to the Excel side this would probably be the best approach.

@asishm
Copy link
Contributor

asishm commented Jan 25, 2024

Seems like it is 32767 characters. See https://support.microsoft.com/en-au/office/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3.

Reproducer:

import pandas as pd
from io import BytesIO
df = pd.DataFrame({'col1': ['a'*35000, 'b']})
fobj = BytesIO()
df.to_excel(fobj, index=False)
fobj.seek(0)
df2 = pd.read_excel(fobj, engine='openpyxl')
assert len(df2['col1'].iloc[0]) == 35000 # raises (returns 32767 instead)

@rhshadrach
Copy link
Member

rhshadrach commented Jan 26, 2024

I think it makes sense to check the number of rows and columns (if we don't do so already). I would be opposed to checking every limitation @asishm linked to, and it does seem to me that the number of characters in a string is too detailed. That said, it appears there is a decent place for such a check in ExcelWriter._value_with_fmt, so I'd be okay with adding one there.

I think it would be good to add something like

pandas will check the number of rows, columns, and cell character count does not exceed Excel's limitations. All other limitations must be checked by the user.

to the documentation

@rhshadrach rhshadrach added Docs IO Excel read_excel, to_excel Warnings Warnings that appear or should be added to pandas and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 26, 2024
@luke396
Copy link
Contributor

luke396 commented Jan 27, 2024

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Enhancement IO Excel read_excel, to_excel Warnings Warnings that appear or should be added to pandas
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants