
Add a bound to the inference tips cache (and fix regressions from #2158) #2181

Merged: 5 commits into main from performance-regression, May 15, 2023

Conversation

@jacobtylerwalls (Member) commented May 15, 2023

Type of Changes

✓ 🐛 Bug fix

Overview

Fix some performance regressions from #2158 (1 and 2 below), and then improve on the status quo by adding a bound to the inference tips cache (3).

Closes #2180

Description

1. Not every exit path was removing the recursion guard, so this "cache", which is meant to always return to empty after recursion concludes, was not doing so: 6bdd92d.
2. Add an isEmpty() method on InferenceContext so that None can be used as the portion of the cache key for any empty context, instead of fresh, distinct InferenceContext instances: c807c03 (both fixes are sketched just below).
3. Add a bound to the inference tips cache: 4757af2.
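A minimal sketch of fixes (1) and (2), with simplified names: only the guard/normalization logic mirrors the diffs quoted later in this thread, the `is_empty()` spelling follows Python convention and may not match the actual method (the description above calls it `isEmpty()`), and the rest is illustrative.

```python
# Illustrative sketch only -- not the PR's exact code.
_CURRENTLY_INFERRING: set = set()  # recursion guard, keyed by (func, node)


def _cached_infer(func, node, context, **kwargs):
    # Fix (2): fresh, empty contexts would each form a distinct cache key and
    # defeat the cache, so treat them the same as "no context at all".
    if context is not None and context.is_empty():  # method name assumed
        context = None

    partial_cache_key = (func, node)
    _CURRENTLY_INFERRING.add(partial_cache_key)
    try:
        # May raise (e.g. UseInferenceDefault), which is exactly why the
        # guard removal lives in a finally block.
        return list(func(node, context, **kwargs))
    finally:
        # Fix (1): remove the recursion guard on *every* exit path.
        _CURRENTLY_INFERRING.discard(partial_cache_key)
```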

Benchmark for (3.)

Benchmark: pylint home-assistant/core/homeassistant/helpers
test code:

diff --git a/astroid/inference_tip.py b/astroid/inference_tip.py
index c0e121c5..7a484bf6 100644
--- a/astroid/inference_tip.py
+++ b/astroid/inference_tip.py
@@ -19,6 +19,8 @@ from astroid.typing import (
     TransformFn,
 )
 
+hits = 0
+misses = 0
 _cache: OrderedDict[
     tuple[InferFn[Any], NodeNG, InferenceContext | None], list[InferenceResult]
 ] = OrderedDict()
@@ -52,6 +54,9 @@ def _inference_tip_cached(func: InferFn[_NodesT]) -> InferFn[_NodesT]:
             context = None
         try:
             yield from _cache[func, node, context]
+            global hits
+            hits += 1
+            print("HITS: ", hits)
             return
         except KeyError:
             # Recursion guard with a partial cache key.
@@ -65,6 +70,9 @@ def _inference_tip_cached(func: InferFn[_NodesT]) -> InferFn[_NodesT]:
                 result = _cache[func, node, context] = list(
                     func(node, context, **kwargs)
                 )
+                global misses
+                misses += 1
+                print("MISSES: ", misses)
             finally:
                 # Remove recursion guard.
                 _CURRENTLY_INFERRING.remove(partial_cache_key)
| maxsize | hits | misses |
| --- | --- | --- |
| 2 | 5279 | 4452 |
| 64 | 5946 | 3533 |
| 512 | 5979 | 3480 |

Those are some pretty impressive results for a maxsize of 2! 😱 (I saw the same with several projects.)

For now, I decided to stick with 64.
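The bounding itself is just FIFO eviction on an `OrderedDict` (visible in the lines removed by the alternative diff further down); here is a rough sketch with illustrative wrapper names and the 64 bound chosen here:

```python
from collections import OrderedDict

_MAX_SIZE = 64  # the bound picked in this PR
_cache: OrderedDict = OrderedDict()


def _cached(key, compute):
    """Return a cached result, evicting the oldest entry once the bound is exceeded."""
    try:
        return _cache[key]
    except KeyError:
        result = _cache[key] = compute()
        if len(_cache) > _MAX_SIZE:
            # Drop the oldest insertion (FIFO eviction); this is why an
            # OrderedDict is used instead of a plain dict (see the review thread).
            _cache.popitem(last=False)
        return result
```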

Future work

In a subsequent PR we could explore going all the way down to a cache size of 1 (🙀), and then be able to throw away the ugly code we have here for managing dictionary entries and just store last_inference_tip.

Small bounds still yield about equal hits and misses.

Further work could determine if storing only the last result is optimal.
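A purely hypothetical sketch of that "cache of one" idea (names made up, not part of this PR):

```python
# Keep only the most recent inference tip result instead of a dictionary.
_last_key = None
_last_result = None


def _infer_with_last_result(func, node, context, **kwargs):
    global _last_key, _last_result
    key = (func, node, context)
    if key == _last_key and _last_result is not None:
        return _last_result  # hit on the single stored entry
    _last_result = list(func(node, context, **kwargs))
    _last_key = key
    return _last_result
```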
@codecov (bot) commented May 15, 2023

Codecov Report

Merging #2181 (5a33300) into main (835de84) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##             main    #2181   +/-   ##
=======================================
  Coverage   92.60%   92.60%           
=======================================
  Files          94       94           
  Lines       10798    10810   +12     
=======================================
+ Hits         9999    10011   +12     
  Misses        799      799           
| Flag | Coverage Δ |
| --- | --- |
| linux | 92.36% <100.00%> (+<0.01%) ⬆️ |
| pypy | 87.56% <100.00%> (+0.01%) ⬆️ |
| windows | 92.20% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| astroid/context.py | 97.46% <100.00%> (+0.06%) ⬆️ |
| astroid/inference_tip.py | 97.72% <100.00%> (+0.66%) ⬆️ |

@jacobtylerwalls (Member, Author):

Given the transient failure I just saw (and had to re-run to get passing), we might want to restore the skip from c1e4c95 (it's still described in the comment).

@Pierre-Sassoulas (Member) left a comment:

Looks good! I missed the reason why we can't use lru_cache after extracting a function?

@@ -18,9 +19,9 @@
     TransformFn,
 )
 
-_cache: dict[
+_cache: OrderedDict[
Collaborator:

We are on 3.7+ right? Dicts are ordered by default?

@jacobtylerwalls (Member, Author):

>>> d = {}
>>> d.popitem(last=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: dict.popitem() takes no keyword arguments
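(For comparison, a sketch not from the thread: a plain 3.7+ dict preserves insertion order but needs two steps to evict its oldest entry, where `OrderedDict.popitem(last=False)` needs one.)

```python
d = {"a": 1, "b": 2}
oldest_key = next(iter(d))  # the first-inserted key
del d[oldest_key]
assert d == {"b": 2}
```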

@jacobtylerwalls (Member, Author):

The one thing OrderedDict is still good at :-)

Collaborator:

TIL 😄

@jacobtylerwalls (Member, Author) commented May 15, 2023

> Looks good! I missed the reason why we can't use lru_cache after extracting a function?

Arbitrary **kwargs cannot be hashed in the cache key. (See comment). So this only works if I omit them:

diff --git a/astroid/inference_tip.py b/astroid/inference_tip.py
index 9eda5b4f..6829d91b 100644
--- a/astroid/inference_tip.py
+++ b/astroid/inference_tip.py
@@ -6,8 +6,8 @@
 
 from __future__ import annotations
 
-from collections import OrderedDict
 from collections.abc import Generator
+from functools import lru_cache
 from typing import Any, TypeVar
 
 from astroid.context import InferenceContext
@@ -19,10 +19,6 @@ from astroid.typing import (
     TransformFn,
 )
 
-_cache: OrderedDict[
-    tuple[InferFn[Any], NodeNG, InferenceContext | None], list[InferenceResult]
-] = OrderedDict()
-
 _CURRENTLY_INFERRING: set[tuple[InferFn[Any], NodeNG]] = set()
 
 _NodesT = TypeVar("_NodesT", bound=NodeNG)
@@ -30,7 +26,14 @@ _NodesT = TypeVar("_NodesT", bound=NodeNG)
 
 def clear_inference_tip_cache() -> None:
     """Clear the inference tips cache."""
-    _cache.clear()
+    run_func.cache_clear()
+
+
+@lru_cache
+def run_func(func, node, context):
+    _CURRENTLY_INFERRING.add((func, node, context))
+    # Omit **kwargs! 😞
+    return list(func(node, context))
 
 
 def _inference_tip_cached(func: InferFn[_NodesT]) -> InferFn[_NodesT]:
@@ -51,32 +54,13 @@ def _inference_tip_cached(func: InferFn[_NodesT]) -> InferFn[_NodesT]:
             # Fresh, empty contexts will defeat the cache.
             context = None
         try:
-            yield from _cache[func, node, context]
-            return
-        except KeyError:
-            # Recursion guard with a partial cache key.
-            # Using the full key causes a recursion error on PyPy.
-            # It's a pragmatic compromise to avoid so much recursive inference
-            # with slightly different contexts while still passing the simple
-            # test cases included with this commit.
-            _CURRENTLY_INFERRING.add(partial_cache_key)
+            yield from run_func(func, node, context)
+        finally:
+            # Remove recursion guard.
             try:
-                # May raise UseInferenceDefault
-                result = _cache[func, node, context] = list(
-                    func(node, context, **kwargs)
-                )
-            finally:
-                # Remove recursion guard.
-                try:
-                    _CURRENTLY_INFERRING.remove(partial_cache_key)
-                except KeyError:
-                    pass  # Recursion may beat us to the punch.
-
-                if len(_cache) > 64:
-                    _cache.popitem(last=False)
-
-        # https://github.com/pylint-dev/pylint/issues/8686
-        yield from result  # pylint: disable=used-before-assignment
+                _CURRENTLY_INFERRING.remove(partial_cache_key)
+            except KeyError:
+                pass  # Recursion may beat us to the punch.
 
     return inner

I guess it's a flaw in the status quo that we're leaving out **kwargs from the full cache key, but I think the next step should really be to replace this cache altogether with a reference to the last inference tip (like a cache with maxsize of 1) after doing some more serious benchmarking.
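To illustrate the hashing problem with a generic example (not astroid code; `callcontext` is a made-up kwarg name): `lru_cache` builds its key from every positional and keyword argument, so a single unhashable kwarg value breaks the lookup.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def infer_like(node, context=None, **kwargs):
    return (node, context)


infer_like("node", None)  # fine: every part of the cache key is hashable
try:
    # An arbitrary kwarg with an unhashable value breaks key construction.
    infer_like("node", None, callcontext=[1, 2])
except TypeError as err:
    print(err)  # unhashable type: 'list'
```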

@Pierre-Sassoulas (Member) left a comment:

OK, thanks for the explanation. Having arbitrary kwargs is also a problem IMO, albeit not an easy one to solve.

@jacobtylerwalls (Member, Author) commented May 15, 2023

Thanks for the review!

Pulling out from above since it was hiding behind the diff:

> I think the next step should really be to replace this cache altogether with a reference to the last inference tip (like a cache with maxsize of 1) after doing some more serious benchmarking.

But I think we should merge this now so we can unblock pylint alphas. I'll open a follow-up issue (#2183).

@jacobtylerwalls merged commit 1f9ba55 into main on May 15, 2023; 18 checks passed.
@jacobtylerwalls deleted the performance-regression branch on May 15, 2023 at 14:09.