Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: Improve Styler to_excel Performance #47371

Merged
merged 16 commits into from
Jun 29, 2022

Conversation

tehunter
Copy link
Contributor

@tehunter tehunter commented Jun 15, 2022

@pep8speaks
Copy link

pep8speaks commented Jun 15, 2022

Hello @tehunter! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-06-29 14:18:30 UTC

@tehunter
Copy link
Contributor Author

ASV Results:

..       before           after         ratio
     [6f0be79b]       [db365209]
                      <styler-performance>
-       915±200ms         594±30ms     0.65  io.excel.WriteExcel.time_write_excel('openpyxl')
-       658±200ms         323±20ms     0.49  io.excel.WriteExcel.time_write_excel('xlsxwriter')
-       3.08±0.1s         934±40ms     0.30  io.excel.WriteExcelStyled.time_write_excel_style('xlsxwriter')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@tehunter tehunter changed the title PERF: Improve Styler Performance PERF: Improve Styler to_excel Performance Jun 15, 2022
@tehunter
Copy link
Contributor Author

@github-actions pre-commit

@datapythonista datapythonista added Performance Memory or execution speed performance IO Excel read_excel, to_excel labels Jun 16, 2022
@tehunter tehunter marked this pull request as ready for review June 16, 2022 11:43
@tehunter
Copy link
Contributor Author

tehunter commented Jun 16, 2022

Overview of changes:

  • Moved CSSResolver.expand_* methods (which are accessed in .atomize) to a dictionary called CSS_EXPANSIONS which is keyed directly by the CSS property, instead of having to do a string replace and concatenation.
  • CSSResolver.__call__ now accepts an iterable of declaration tuples (property, value) in addition to the previous string type.
    • When an iterable is passed, .parse does not need to be called
  • CSSExcelCell.__init__ no longer converts the css_styles (i.e. Styler.ctx) from a list of tuples to a CSS string. Instead, the list is converted to a frozenset via a dictionary (to keep only the final instance of each property declaration)
    • The list of tuples was already obtained from the initial CSS string(s) in Styler._update_ctx, so we were previously parsing, reforming into string, and parsing again.
  • CSSToExcelConverter.__call__ is now cached using lru_cache. The frozenset was necessary to make all the arguments hashable for the caching mechanism.

@tehunter
Copy link
Contributor Author

Benchmarks as of 8e56402

.       before           after         ratio
     [2b1184dd]       [8e56402e]
                      <styler-performance>
-        260±10ms          233±5ms     0.90  io.excel.WriteExcel.time_write_excel('xlsxwriter')
-        620±70ms         450±20ms     0.73  io.excel.WriteExcel.time_write_excel('openpyxl')
-      2.51±0.02s        1.42±0.3s     0.57  io.excel.WriteExcelStyled.time_write_excel_style('openpyxl')
-      1.69±0.07s         608±10ms     0.36  io.excel.WriteExcelStyled.time_write_excel_style('xlsxwriter')
- ```

@mroeschke mroeschke added this to the 1.5 milestone Jun 29, 2022
@mroeschke mroeschke merged commit ad842d3 into pandas-dev:main Jun 29, 2022
@mroeschke
Copy link
Member

Awesome improvement! Thanks @tehunter

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
* Move CSS expansion lookup to dictionary

* Implement simple CSSToExcelConverter cache

* Eliminate list -> str -> list in CSSResolver

* Allow for resolution of duplicate properties

* Add performance benchmark for styled Excel

* CLN: Clean up PEP8 issues

* DOC: Update PR documentation

* CLN: Clean up PEP8 issues

* Fixes from pre-commit [automated commit]

* Make Excel CSS case-insensitive

* Test for ordering and caching

* Pre-commit fixes

* Remove built-in filter

* Increase maxsize of Excel cache

Co-authored-by: Thomas Hunter <Thomas.Hunter@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PERF: Excel Styler treatment of CSS side expansion is slow
4 participants