to_excel cannot handle large sheet #26051

Closed
Pzoom522 opened this issue Apr 11, 2019 · 10 comments

@Pzoom522

commented Apr 11, 2019

Problem description

For extremely large sheets (more than 1048576 rows or more than 16384 columns; see the code of the XlsxWriter engine), the area that exceeds the limit is not written.
However, no error or warning is raised.

Expected Output

This sheet is too large! Area beyond (1048576, 16384) will not be printed.

Output of pd.show_versions()

latest (v0.24.2)

@chris-b1

Contributor

commented Apr 11, 2019

A little strange that nothing happens on the xlsxwriter side, but yes, we would definitely take a PR that guards and raises on our end.

@WillAyd

Member

commented Apr 11, 2019

I think this makes more sense as an enhancement request for xlsxwriter that would just pass through here rather than doing it on our end

@anordin95

Contributor

commented Apr 12, 2019

I'm new to pandas-dev. Does the closed tag on the feature request I see above indicate the work is already done and this entire issue should be closed?

@chris-b1

Contributor

commented Apr 12, 2019

I think we could handle this on the pandas side. XlsxWriter's interface is essentially cell-based, so there isn't a way for it to know in advance whether those limits are going to be broken. Because pandas knows the total table size in advance, I think it makes sense to check and raise.
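A minimal sketch of the "check and raise" idea described above, not the actual pandas implementation. The helper name `check_sheet_size` and the message wording are hypothetical; the limits are the ones quoted in the issue description. The sketch also ignores any extra header rows or index columns that a real writer might add.

```python
# Worksheet limits quoted in the issue description.
MAX_ROWS = 1_048_576
MAX_COLS = 16_384


def check_sheet_size(num_rows, num_cols, max_rows=MAX_ROWS, max_cols=MAX_COLS):
    """Hypothetical helper: raise if a table cannot fit on one worksheet."""
    if num_rows > max_rows or num_cols > max_cols:
        raise ValueError(
            f"This sheet is too large! Your sheet size is: {num_rows}, {num_cols}. "
            f"Max sheet size is: {max_rows}, {max_cols}."
        )


# A table exactly at the limit passes silently.
check_sheet_size(MAX_ROWS, MAX_COLS)

# One row too many is rejected before any cell would be written.
try:
    check_sheet_size(MAX_ROWS + 1, 1)
except ValueError as exc:
    print(exc)
```

With a DataFrame `df`, the same check could be called as `check_sheet_size(*df.shape)` before handing the data to the writer, since the shape is known up front.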

@anordin95

Contributor

commented Apr 12, 2019

I agree @chris-b1. I assume this hasn't been addressed, but I'm uncertain due to the red "Closed" indicator I see beside the feature request above my original comment.

@chris-b1

Contributor

commented Apr 12, 2019

That "Closed" is from the linked issue in the xlsxwriter repo.

@anordin95

Contributor

commented Apr 12, 2019

Ah, makes sense. Thank you!

@anordin95

Contributor

commented Apr 12, 2019

Going to raise a ValueError unless anyone has a better suggestion.

@anordin95

Contributor

commented Apr 13, 2019

I believe XlsxWriter hangs when passed a sheet that is too large. My test checks that calling df.to_excel raises a ValueError for an oversized sheet, and with the new size check in place it passes. Without the check, however, the to_excel call hangs and the test stalls instead of failing. Any ideas on how best to proceed? I believe the best course would be to leave the new checking logic untested, because adding the test I described would just add unhelpful code.
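One possible way around the stalling test, sketched under assumptions: exercise the size check against a cheap stand-in writer rather than a real Excel engine, so a missing check makes the test fail quickly instead of hanging. `RecordingWriter`, `write_table`, and the test name are all hypothetical and are not pandas or XlsxWriter APIs; this is not necessarily how the issue was resolved.

```python
MAX_ROWS = 1_048_576  # row limit quoted in the issue
MAX_COLS = 16_384     # column limit quoted in the issue


class RecordingWriter:
    """Hypothetical stand-in for a cell-based writer; it only counts calls."""

    def __init__(self):
        self.cells_written = 0

    def write_cell(self, row, col, value):
        self.cells_written += 1


def write_table(shape, writer):
    """Hypothetical writer loop: check the size first, then emit cells."""
    num_rows, num_cols = shape
    if num_rows > MAX_ROWS or num_cols > MAX_COLS:
        raise ValueError(
            f"This sheet is too large! Sheet size is: {num_rows}, {num_cols}"
        )
    for row in range(num_rows):
        for col in range(num_cols):
            writer.write_cell(row, col, None)


def test_oversized_sheet_raises_before_writing():
    writer = RecordingWriter()
    try:
        write_table((MAX_ROWS + 1, 1), writer)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an oversized sheet")
    # The guard fired before the cell loop, so nothing was written.
    assert writer.cells_written == 0


test_oversized_sheet_raises_before_writing()
```

Because the stand-in only increments a counter, removing the guard would let the loop finish and trip the `AssertionError`, so the test reports a failure rather than stalling.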

@anordin95

Contributor

commented Apr 13, 2019

Solved.
