Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STYLE enable pylint: method-cache-max-size-none #49751

Merged

Conversation

natmokval
Copy link
Contributor

@natmokval natmokval commented Nov 17, 2022

Issue #48855. This PR enables pylint type "W" warning: method-cache-max-size-none
Based on issue #47371 PERF: Improve Styler to_excel Performance I believe the warning is a false-positive in pandas/io/formats/excel.py#L176.
I disabled this warning on the line 176

@MarcoGorelli
Copy link
Member

Thanks @natmokval - not sure, I think this might be a case where the check might be telling us something useful

@mroeschke do you happen to remember why max_size was set to None, rather than the default 128? #47371 (comment)
The jobs' logs aren't available anymore

@mroeschke
Copy link
Member

@mroeschke do you happen to remember why max_size was set to None, rather than the default 128? #47371 (comment)

Not too familiar with excel styling, but probably was thinking the opposite, why should the cache size be limited?

Could the caching be refactored to an independent function like in https://pylint.pycqa.org/en/latest/user_guide/messages/warning/method-cache-max-size-none.html?

@mroeschke mroeschke added the Code Style Code style, linting, code_checks label Nov 17, 2022
@natmokval
Copy link
Contributor Author

Thanks @natmokval - not sure, I think this might be a case where the check might be telling us something useful

Thanks for checking @MarcoGorelli. In case I set max_size to 128, what will be the best way to check that my changes do not lead to regression?

@natmokval
Copy link
Contributor Author

Could the caching be refactored to an independent function like in https://pylint.pycqa.org/en/latest/user_guide/messages/warning/method-cache-max-size-none.html?

Thanks, @mroeschke. On the surface I don’t see how the below content can be moved out of the class:

properties = self.compute_css(declarations, self.inherited)
return self.build_xlstyle(properties)

Can you please advise me if you see a direction?

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should fix this one, my programming messiah Anthony Sottile has a video on how lru_cache on methods should be avoided: https://youtu.be/sVjtp6tGo0g

I haven't tested this out yet, but I'm "requesting changes" to block this for now

@mroeschke
Copy link
Member

If I am understanding how this is used from my cursory scan of the code, this is used in to_excel to CSS style each cell of the result but will be cached relative to the number of unique Styler.ctx + Styler.ctx_index + Styler.ctx_columns

So to justify changing maxsize from None to 128 (or similar), is it reasonable to expect a user to provide 100's of CSS declarations? cc @attack68

@attack68
Copy link
Contributor

The number of total possible CSS properties must only be 100-200. So to use them all per cell is never done. V Rare to go more that 10.

Of course potentially lots more independent declarations if going by number of cells, but I'm assuming you meant the former.

@MarcoGorelli
Copy link
Member

Thanks - let's just go with the default (128) and put @lru_cache() then?

@mroeschke
Copy link
Member

Thanks - let's just go with the default (128) and put @lru_cache() then?

Sounds good

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this a bit more closely (and trying to repeat what Anthony does in the video I linked), and I think there's a real issue here

For this investigation, I applied the following diff:

diff --git a/pandas/io/formats/excel.py b/pandas/io/formats/excel.py
index a26b85390f..3a8a9ee960 100644
--- a/pandas/io/formats/excel.py
+++ b/pandas/io/formats/excel.py
@@ -170,6 +170,9 @@ class CSSToExcelConverter:
             self.inherited = self.compute_css(inherited)
         else:
             self.inherited = None
+    
+    def __del__(self) -> None:
+        print('heeeelp I\'m being deeeleteeeed')
 
     compute_css = CSSResolver()
 
@@ -193,6 +196,7 @@ class CSSToExcelConverter:
             A style as interpreted by ExcelWriter when found in
             ExcelCell.style.
         """
+        print('I\'m being computed (not looked up in cache)')
         properties = self.compute_css(declarations, self.inherited)
         return self.build_xlstyle(properties)

Once converter.__call__ has been called garbage collection no longer deletes converter!

In [1]: from pandas.io.formats.excel import CSSToExcelConverter

In [2]: import gc

In [3]: converter = CSSToExcelConverter("font-weight: bold")

In [4]: converter('vertical-align: top')
I'm being computed (not looked up in cache)
Out[4]: {'alignment': {'vertical': 'top'}, 'font': {'bold': True}}

In [5]: converter = None

In [6]: gc.collect()  # converter doesn't get deleted!
Out[6]: 373

However, if we then apply this trick from the video:

diff --git a/pandas/io/formats/excel.py b/pandas/io/formats/excel.py
index a26b85390f..692dd23e95 100644
--- a/pandas/io/formats/excel.py
+++ b/pandas/io/formats/excel.py
@@ -170,12 +170,20 @@ class CSSToExcelConverter:
             self.inherited = self.compute_css(inherited)
         else:
             self.inherited = None
+        self._call_cached = lru_cache(maxsize=None)(self._call_uncached)
 
     compute_css = CSSResolver()
 
-    @lru_cache(maxsize=None)
     def __call__(
         self, declarations: str | frozenset[tuple[str, str]]
+    ) -> dict[str, dict[str, str]]:
+        return self._call_cached(declarations)
+
+    def _call_uncached(
+        self, declarations: str | frozenset[tuple[str, str]]
     ) -> dict[str, dict[str, str]]:
         """
         Convert CSS declarations to ExcelWriter style.
@@ -193,6 +201,7 @@ class CSSToExcelConverter:
             A style as interpreted by ExcelWriter when found in
             ExcelCell.style.
         """
         properties = self.compute_css(declarations, self.inherited)
         return self.build_xlstyle(properties)

then it looks like it solves the issue:

In [14]: converter = CSSToExcelConverter("font-weight: bold")

In [15]: converter('vertical-align: top')
I'm being computed (not looked up in cache)
Out[15]: {'alignment': {'vertical': 'top'}, 'font': {'bold': True}}

In [16]: converter('vertical-align: top')  # caching still works
Out[16]: {'alignment': {'vertical': 'top'}, 'font': {'bold': True}}

In [17]: converter = None

In [18]: gc.collect()  # now, it gets deleted!
heeeelp I'm being deeeleteeeed
Out[18]: 698

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Nov 18, 2022

Can you please advise me if you see a direction?

I think what needs doing, then is to apply the second diff from #49751 (review) , but:

  • no need to add the __del__ method (that was just for debugging), nor the other print statement
  • the docstring of _call_uncached should go to the docstring of __call__
  • above the assignment self._call_cached = , there should probably be a commenting linking back to this issue, explaining why we avoid lru_cache on the method

pandas/tests/io/formats/test_to_excel.py will also need adjusting slightly, please make sure the tests pass (feel free to reach out on the Slack group if anything trips you up)

@natmokval
Copy link
Contributor Author

Changes from the second diff are applied. pandas/tests/io/formats/test_to_excel.py works with the changes. My local test run passed.

MarcoGorelli
MarcoGorelli approved these changes Nov 18, 2022
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good pending green, well done!

Couple last requests:

  1. this should probably be backported to 1.5.2, could you add a note to the v1.5.2 whatsnew note please? Just say that a memory leak has been fixed, and mention the class
  2. just left a really minor comment

pandas/io/formats/excel.py Outdated Show resolved Hide resolved
@MarcoGorelli MarcoGorelli added this to the 1.5.2 milestone Nov 18, 2022
natmokval and others added 2 commits November 18, 2022 20:53
Co-authored-by: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com>
@natmokval
Copy link
Contributor Author

Mentioned the memory leak in v1.5.2 whatsnew with issue #48855. Maybe it would be better to skip the number or create a new issue?

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. cc @MarcoGorelli merge when ready

@MarcoGorelli
Copy link
Member

Mentioned the memory leak in v1.5.2 whatsnew with issue #48855. Maybe it would be better to skip the number or create a new issue?

The PR number is fine if there's no issue open

Cool, let's wait for tests and we can ship it

@@ -134,7 +134,6 @@ disable = [
"invalid-envvar-default",
"invalid-overridden-method",
"keyword-arg-before-vararg",
"method-cache-max-size-none",
Copy link
Member

@mroeschke mroeschke Nov 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli have we been using pylint before releasing 1.5? Not sure if this change will cause a merge conflict if backported, and I don't think it's worth backporting pylint configurations to 1.5.x

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree about not backporting the configuration, I'd just backport the fix itself - I'd expect to get a merge conflict, I'll sort this out there

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @natmokval !

@MarcoGorelli MarcoGorelli merged commit 42bb8fe into pandas-dev:main Nov 19, 2022
@lumberbot-app
Copy link

lumberbot-app bot commented Nov 19, 2022

Oops, something went wrong applying the patch ... Please have a look at my logs.

@natmokval
Copy link
Contributor Author

Thank you @MarcoGorelli and @mroeschke for your help and support. Your explanations helped me to understand the problem.

mliu08 pushed a commit to mliu08/pandas that referenced this pull request Nov 27, 2022
* STYLE enable pylint: method-cache-max-size-none

* STYLE enable pylint: method-cache-max-size-none II

* Minor comment update

Co-authored-by: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com>

* add a note to the v1.5.2 whatsnew

* text improvement

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

Co-authored-by: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code Style Code style, linting, code_checks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants