
Add a bound to the inference tips cache (and fix regressions from #2158) #2181

Merged: 5 commits into main from performance-regression, May 15, 2023

Conversation

@jacobtylerwalls (Member) commented May 15, 2023

Type of Changes

✓ 🐛 Bug fix

Overview

Fix some performance regressions from #2158 (1 and 2 below), and then improve on the status quo by adding a bound to the inference tips cache (3).

Closes #2180

Description

1. Not every exit path was removing the recursion guard, so this "cache", which is meant to always return to empty after recursion concludes, was not doing so: 6bdd92d.
2. Add an isEmpty() method on InferenceContext so that None can be used as the portion of the cache key for any empty context, instead of fresh, distinct InferenceContext instances: c807c03 (both fixes are sketched just below).
3. Add a bound to the inference tips cache: 4757af2.
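A minimal sketch of fixes (1) and (2), with simplified names: only the guard/normalization logic mirrors the diffs quoted later in this thread, the `is_empty()` spelling follows Python convention and may not match the actual method (the description above calls it `isEmpty()`), and the rest is illustrative.

```python
# Illustrative sketch only -- not the PR's exact code.
_CURRENTLY_INFERRING: set = set()  # recursion guard, keyed by (func, node)


def _cached_infer(func, node, context, **kwargs):
    # Fix (2): fresh, empty contexts would each form a distinct cache key and
    # defeat the cache, so treat them the same as "no context at all".
    if context is not None and context.is_empty():  # method name assumed
        context = None

    partial_cache_key = (func, node)
    _CURRENTLY_INFERRING.add(partial_cache_key)
    try:
        # May raise (e.g. UseInferenceDefault), which is exactly why the
        # guard removal lives in a finally block.
        return list(func(node, context, **kwargs))
    finally:
        # Fix (1): remove the recursion guard on *every* exit path.
        _CURRENTLY_INFERRING.discard(partial_cache_key)
```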

Benchmark for (3.)

Benchmark: pylint home-assistant/core/homeassistant/helpers
test code:

diff --git a/astroid/inference_tip.py b/astroid/inference_tip.py
index c0e121c5..7a484bf6 100644
--- a/astroid/inference_tip.py
+++ b/astroid/inference_tip.py
@@ -19,6 +19,8 @@ from astroid.typing import (
     TransformFn,
 )
 
+hits = 0
+misses = 0
 _cache: OrderedDict[
     tuple[InferFn[Any], NodeNG, InferenceContext | None], list[InferenceResult]
 ] = OrderedDict()
@@ -52,6 +54,9 @@ def _inference_tip_cached(func: InferFn[_NodesT]) -> InferFn[_NodesT]:
             context = None
         try:
             yield from _cache[func, node, context]
+            global hits
+            hits += 1
+            print("HITS: ", hits)
             return
         except KeyError:
             # Recursion guard with a partial cache key.
@@ -65,6 +70,9 @@ def _inference_tip_cached(func: InferFn[_NodesT]) -> InferFn[_NodesT]:
                 result = _cache[func, node, context] = list(
                     func(node, context, **kwargs)
                 )
+                global misses
+                misses += 1
+                print("MISSES: ", misses)
             finally:
                 # Remove recursion guard.
                 _CURRENTLY_INFERRING.remove(partial_cache_key)
| maxsize | hits | misses |
| --- | --- | --- |
| 2 | 5279 | 4452 |
| 64 | 5946 | 3533 |
| 512 | 5979 | 3480 |

Those are some pretty impressive results for a maxsize of 2! 😱 (I saw the same with several projects.)

For now, I decided to stick with 64.
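The bounding itself is just FIFO eviction on an `OrderedDict` (visible in the lines removed by the alternative diff further down); here is a rough sketch with illustrative wrapper names and the 64 bound chosen here:

```python
from collections import OrderedDict

_MAX_SIZE = 64  # the bound picked in this PR
_cache: OrderedDict = OrderedDict()


def _cached(key, compute):
    """Return a cached result, evicting the oldest entry once the bound is exceeded."""
    try:
        return _cache[key]
    except KeyError:
        result = _cache[key] = compute()
        if len(_cache) > _MAX_SIZE:
            # Drop the oldest insertion (FIFO eviction); this is why an
            # OrderedDict is used instead of a plain dict (see the review thread).
            _cache.popitem(last=False)
        return result
```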

Future work

In a subsequent PR we could explore going all the way down to a cache size of 1 (🙀), and then be able to throw away the ugly code we have here for managing dictionary entries and just store last_inference_tip.

Small bounds still yield about equal hits and misses.

Further work could determine if storing only the last result is optimal.
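A purely hypothetical sketch of that "cache of one" idea (names made up, not part of this PR):

```python
# Keep only the most recent inference tip result instead of a dictionary.
_last_key = None
_last_result = None


def _infer_with_last_result(func, node, context, **kwargs):
    global _last_key, _last_result
    key = (func, node, context)
    if key == _last_key and _last_result is not None:
        return _last_result  # hit on the single stored entry
    _last_result = list(func(node, context, **kwargs))
    _last_key = key
    return _last_result
```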
@codecov (bot) commented May 15, 2023

Codecov Report

Merging #2181 (5a33300) into main (835de84) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##             main    #2181   +/-   ##
=======================================
  Coverage   92.60%   92.60%           
=======================================
  Files          94       94           
  Lines       10798    10810   +12     
=======================================
+ Hits         9999    10011   +12     
  Misses        799      799           
| Flag | Coverage Δ |
| --- | --- |
| linux | 92.36% <100.00%> (+<0.01%) ⬆️ |
| pypy | 87.56% <100.00%> (+0.01%) ⬆️ |
| windows | 92.20% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| astroid/context.py | 97.46% <100.00%> (+0.06%) ⬆️ |
| astroid/inference_tip.py | 97.72% <100.00%> (+0.66%) ⬆️ |

@jacobtylerwalls (Member, Author):

Given the transient failure I just saw (and had to re-run to get passing), we might want to restore the skip from c1e4c95 (it's still described in the comment).

@Pierre-Sassoulas (Member) left a comment:

Looks good! I missed the reason why we can't use lru_cache after extracting a function?

@@ -18,9 +19,9 @@
     TransformFn,
 )
 
-_cache: dict[
+_cache: OrderedDict[
Collaborator:

We are on 3.7+ right? Dicts are ordered by default?

@jacobtylerwalls (Member, Author):

>>> d = {}
>>> d.popitem(last=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: dict.popitem() takes no keyword arguments
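(For comparison, a sketch not from the thread: a plain 3.7+ dict preserves insertion order but needs two steps to evict its oldest entry, where `OrderedDict.popitem(last=False)` needs one.)

```python
d = {"a": 1, "b": 2}
oldest_key = next(iter(d))  # the first-inserted key
del d[oldest_key]
assert d == {"b": 2}
```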

@jacobtylerwalls (Member, Author):

The one thing OrderedDict is still good at :-)

Collaborator:

TIL 😄

@jacobtylerwalls (Member, Author) commented May 15, 2023

> Looks good! I missed the reason why we can't use lru_cache after extracting a function?

Arbitrary **kwargs cannot be hashed in the cache key. (See comment). So this only works if I omit them:

diff --git a/astroid/inference_tip.py b/astroid/inference_tip.py
index 9eda5b4f..6829d91b 100644
--- a/astroid/inference_tip.py
+++ b/astroid/inference_tip.py
@@ -6,8 +6,8 @@
 
 from __future__ import annotations
 
-from collections import OrderedDict
 from collections.abc import Generator
+from functools import lru_cache
 from typing import Any, TypeVar
 
 from astroid.context import InferenceContext
@@ -19,10 +19,6 @@ from astroid.typing import (
     TransformFn,
 )
 
-_cache: OrderedDict[
-    tuple[InferFn[Any], NodeNG, InferenceContext | None], list[InferenceResult]
-] = OrderedDict()
-
 _CURRENTLY_INFERRING: set[tuple[InferFn[Any], NodeNG]] = set()
 
 _NodesT = TypeVar("_NodesT", bound=NodeNG)
@@ -30,7 +26,14 @@ _NodesT = TypeVar("_NodesT", bound=NodeNG)
 
 def clear_inference_tip_cache() -> None:
     """Clear the inference tips cache."""
-    _cache.clear()
+    run_func.cache_clear()
+
+
+@lru_cache
+def run_func(func, node, context):
+    _CURRENTLY_INFERRING.add((func, node, context))
+    # Omit **kwargs! 😞
+    return list(func(node, context))
 
 
 def _inference_tip_cached(func: InferFn[_NodesT]) -> InferFn[_NodesT]:
@@ -51,32 +54,13 @@ def _inference_tip_cached(func: InferFn[_NodesT]) -> InferFn[_NodesT]:
             # Fresh, empty contexts will defeat the cache.
             context = None
         try:
-            yield from _cache[func, node, context]
-            return
-        except KeyError:
-            # Recursion guard with a partial cache key.
-            # Using the full key causes a recursion error on PyPy.
-            # It's a pragmatic compromise to avoid so much recursive inference
-            # with slightly different contexts while still passing the simple
-            # test cases included with this commit.
-            _CURRENTLY_INFERRING.add(partial_cache_key)
+            yield from run_func(func, node, context)
+        finally:
+            # Remove recursion guard.
             try:
-                # May raise UseInferenceDefault
-                result = _cache[func, node, context] = list(
-                    func(node, context, **kwargs)
-                )
-            finally:
-                # Remove recursion guard.
-                try:
-                    _CURRENTLY_INFERRING.remove(partial_cache_key)
-                except KeyError:
-                    pass  # Recursion may beat us to the punch.
-
-                if len(_cache) > 64:
-                    _cache.popitem(last=False)
-
-        # https://github.com/pylint-dev/pylint/issues/8686
-        yield from result  # pylint: disable=used-before-assignment
+                _CURRENTLY_INFERRING.remove(partial_cache_key)
+            except KeyError:
+                pass  # Recursion may beat us to the punch.
 
     return inner

I guess it's a flaw in the status quo that we're leaving out **kwargs from the full cache key, but I think the next step should really be to replace this cache altogether with a reference to the last inference tip (like a cache with maxsize of 1) after doing some more serious benchmarking.
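To illustrate the hashing problem with a generic example (not astroid code; `callcontext` is a made-up kwarg name): `lru_cache` builds its key from every positional and keyword argument, so a single unhashable kwarg value breaks the lookup.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def infer_like(node, context=None, **kwargs):
    return (node, context)


infer_like("node", None)  # fine: every part of the cache key is hashable
try:
    # An arbitrary kwarg with an unhashable value breaks key construction.
    infer_like("node", None, callcontext=[1, 2])
except TypeError as err:
    print(err)  # unhashable type: 'list'
```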

@Pierre-Sassoulas (Member) left a comment:

OK, thanks for the explanation. Having arbitrary kwargs is also a problem IMO, albeit not an easy one to solve.

@jacobtylerwalls (Member, Author) commented May 15, 2023

Thanks for the review!

Pulling out from above since it was hiding behind the diff:

> I think the next step should really be to replace this cache altogether with a reference to the last inference tip (like a cache with maxsize of 1) after doing some more serious benchmarking.

But I think we should merge this now so we can unblock pylint alphas. I'll open a follow-up issue (#2183).

@jacobtylerwalls merged commit 1f9ba55 into main on May 15, 2023; 18 checks passed.
@jacobtylerwalls deleted the performance-regression branch on May 15, 2023 at 14:09.