Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

With #116680, test_importlib started leaking on refleaks #116731

Closed
jaraco opened this issue Mar 13, 2024 · 15 comments
Closed

With #116680, test_importlib started leaking on refleaks #116731

jaraco opened this issue Mar 13, 2024 · 15 comments
Assignees
Labels
topic-importlib type-bug An unexpected behavior, bug, or error

Comments

@jaraco
Copy link
Member

jaraco commented Mar 13, 2024

Ironically, since this was merged, test_importlib started leaking on refleaks buildbots.

Originally posted by @encukou in #116680 (comment)

Linked PRs

@jaraco jaraco self-assigned this Mar 13, 2024
@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

Example of leak:

$ ./python -m test test_importlib -R 3:3 -m test_requires_egg_info_empty
Using random seed: 1080788092
0:00:00 load avg: 1.15 Run 1 test sequentially
0:00:00 load avg: 1.15 [1/1] test_importlib
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
XXX XXX
test_importlib leaked [116, 116, 116] references, sum=348
test_importlib leaked [78, 78, 78] memory blocks, sum=234
test_importlib failed (reference leak)

== Tests result: FAILURE ==

1 test failed:
    test_importlib

Total duration: 2.0 sec
Total tests: run=1 (filtered)
Total test files: run=1/1 (filtered) failed=1
Result: FAILURE

Originally posted by @vstinner in #116680 (comment)

@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

That's interesting - the ref leak failure is in a importlib.metadata test, whereas the PR only made changes to the importlib.resources tests. My guess right now is that the importlib.resources module leak was masking the importlib.metadata ref leaks. If that's the case, it should be possible to replicate the failure prior to the PR merge.

@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

Indeed, the ref leak existed prior to the PR:

 cpython main @ git switch --detach a2548077614f81f25a2c3465dabb7a0a3885c40c~1
HEAD is now at bb66600558 CI: Process stale issues twice per day (#116636)
 cpython bb66600558 @ ./python.exe -m test test_importlib -R 3:3 -m test_requires_egg_info_empty
Using random seed: 1648965432
0:00:00 load avg: 2.87 Run 1 test sequentially
0:00:00 load avg: 2.87 [1/1] test_importlib
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
XXX XXX
test_importlib leaked [116, 116, 116] references, sum=348
test_importlib leaked [78, 78, 78] memory blocks, sum=234
test_importlib failed (reference leak)

== Tests result: FAILURE ==

1 test failed:
    test_importlib

Total duration: 449 ms
Total tests: run=1 (filtered)
Total test files: run=1/1 (filtered) failed=1
Result: FAILURE

@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

Disabling the logic under test, the leaks go away, so it appears the leak is in the call to requires().

 cpython main @ git diff
diff --git a/Lib/test/test_importlib/test_metadata_api.py b/Lib/test/test_importlib/test_metadata_api.py
index 33c6e85ee9..8e388b89cd 100644
--- a/Lib/test/test_importlib/test_metadata_api.py
+++ b/Lib/test/test_importlib/test_metadata_api.py
@@ -219,7 +219,7 @@ def test_requires_egg_info_empty(self):
             },
             self.site_dir.joinpath('egginfo_pkg.egg-info'),
         )
-        deps = requires('egginfo-pkg')
+        deps = []
         assert deps == []
 
     def test_requires_dist_info(self):
 cpython main @ ./python.exe -m test test_importlib -R 3:3 -m test_requires_egg_info_empty
Using random seed: 3436765869
0:00:00 load avg: 1.12 Run 1 test sequentially
0:00:00 load avg: 1.12 [1/1] test_importlib
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
XX. ...

== Tests result: SUCCESS ==

1 test OK.

Total duration: 439 ms
Total tests: run=1 (filtered)
Total test files: run=1/1 (filtered)
Result: SUCCESS

@sobolevn sobolevn added type-bug An unexpected behavior, bug, or error topic-importlib labels Mar 13, 2024
@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

The leaks don't emerge when calling Distribution.discover() but do emerge when assigning the result to a variable:

 cpython main @ git diff
diff --git a/Lib/test/test_importlib/test_metadata_api.py b/Lib/test/test_importlib/test_metadata_api.py
index 33c6e85ee9..643f17726e 100644
--- a/Lib/test/test_importlib/test_metadata_api.py
+++ b/Lib/test/test_importlib/test_metadata_api.py
@@ -219,8 +219,8 @@ def test_requires_egg_info_empty(self):
             },
             self.site_dir.joinpath('egginfo_pkg.egg-info'),
         )
-        deps = requires('egginfo-pkg')
-        assert deps == []
+        Distribution.discover(name='egginfo-pkg')
+        dist, = Distribution.discover(name='egginfo-pkg')
 
     def test_requires_dist_info(self):
         deps = requires('distinfo-pkg')

So it's the evaluating of the generator returned by Distribution.discover that's triggering the leaks.

@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

Digging deeper, I can trigger the leak with:

        from importlib.metadata import DistributionFinder
        context = DistributionFinder.Context(name='egginfo-pkg')
        path_resolver, = Distribution._discover_resolvers()
        tuple(path_resolver(context))

@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

Just to confirm, this bug exists at least as far back as Python 3.11, and likely has been around for even longer.

 cpython 3.11 @ ./python.exe -m test test_importlib -R 3:3 -m test_requires_egg_info_empty
Using random seed: 1497279497
0:00:00 load avg: 9.25 Run 1 test sequentially
0:00:00 load avg: 9.25 [1/1] test_importlib
beginning 6 repetitions
123456
......
test_importlib leaked [150, 150, 150] references, sum=450
test_importlib leaked [56, 56, 56] memory blocks, sum=168
test_importlib failed (reference leak)

== Tests result: FAILURE ==

1 test failed:
    test_importlib

Total duration: 609 ms
Total tests: run=1 (filtered)
Total test files: run=1/1 (filtered) failed=1
Result: FAILURE

@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

The issue goes away when removing the lru_cache on FastPath:

diff --git a/Lib/importlib/metadata/__init__.py b/Lib/importlib/metadata/__init__.py
index c612fbefee..7d3b8a7733 100644
--- a/Lib/importlib/metadata/__init__.py
+++ b/Lib/importlib/metadata/__init__.py
@@ -665,7 +665,6 @@ class FastPath:
     ['...']
     """
 
-    @functools.lru_cache()  # type: ignore
     def __new__(cls, root):
         return super().__new__(cls)

@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

It seems that because each iteration of the test creates a new temporary site dir and because those are cached in FastPath(), that's creating the reference leaks.

 cpython main @ git diff
diff --git a/Lib/importlib/metadata/__init__.py b/Lib/importlib/metadata/__init__.py
index c612fbefee..13c42e6572 100644
--- a/Lib/importlib/metadata/__init__.py
+++ b/Lib/importlib/metadata/__init__.py
@@ -665,8 +665,8 @@ class FastPath:
     ['...']
     """
 
-    @functools.lru_cache()  # type: ignore
     def __new__(cls, root):
+        print(cls, root)
         return super().__new__(cls)
 
     def __init__(self, root):
 cpython main @ ./python.exe -m test test_importlib -R 3:3 -m test_requires_egg_info_empty
Using random seed: 595199935
0:00:00 load avg: 1.56 Run 1 test sequentially
0:00:00 load avg: 1.56 [1/1] test_importlib
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
<class 'importlib.metadata.FastPath'> /var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/tmpouk96igu
X<class 'importlib.metadata.FastPath'> /var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/tmp_mkqfkjq
X<class 'importlib.metadata.FastPath'> /var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/tmpfybimqfz
.<class 'importlib.metadata.FastPath'> /var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/tmps1cnufs4
 .<class 'importlib.metadata.FastPath'> /var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/tmp2qgor2va
.<class 'importlib.metadata.FastPath'> /var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/tmplqwy62rr
.

== Tests result: SUCCESS ==

1 test OK.

Total duration: 458 ms
Total tests: run=1 (filtered)
Total test files: run=1/1 (filtered)
Result: SUCCESS

@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

This patch also works around the ref leak, at least for tests in APITests:

 cpython main @ git diff
diff --git a/Lib/test/test_importlib/test_metadata_api.py b/Lib/test/test_importlib/test_metadata_api.py
index 33c6e85ee9..b30240e291 100644
--- a/Lib/test/test_importlib/test_metadata_api.py
+++ b/Lib/test/test_importlib/test_metadata_api.py
@@ -8,6 +8,7 @@
 from . import fixtures
 from importlib.metadata import (
     Distribution,
+    FastPath,
     PackageNotFoundError,
     distribution,
     entry_points,
@@ -37,6 +38,9 @@ class APITests(
 ):
     version_pattern = r'\d+\.\d+(\.\d)?'
 
+    def tearDown(self):
+        FastPath.__new__.cache_clear()
+
     def test_retrieves_version_of_self(self):
         pkg_version = version('egginfo-pkg')
         assert isinstance(pkg_version, str)

@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

These ref leaks are by design - this cache is there to improve performance.

I'm not quite sure how to handle this situation robustly. I know there's the importlib.invalidate_caches and it's even used and tested in the MetaPathFinder... but that MetadataPathFinder is only installed to the sys.meta_path on the importlib_metadata backport. In CPython, it's called explicitly, so no invalidate_caches method will be discovered.

It does look like the following patch also addresses the issue:

 cpython main @ git diff
diff --git a/Lib/importlib/_bootstrap_external.py b/Lib/importlib/_bootstrap_external.py
index 46ddceed07..b26be8583d 100644
--- a/Lib/importlib/_bootstrap_external.py
+++ b/Lib/importlib/_bootstrap_external.py
@@ -1470,6 +1470,9 @@ def invalidate_caches():
         # https://bugs.python.org/issue45703
         _NamespacePath._epoch += 1
 
+        from importlib.metadata import MetadataPathFinder
+        MetadataPathFinder.invalidate_caches()
+
     @staticmethod
     def _path_hooks(path):
         """Search sys.path_hooks for a finder for 'path'."""
diff --git a/Lib/importlib/metadata/__init__.py b/Lib/importlib/metadata/__init__.py
index c612fbefee..41c2a4a608 100644
--- a/Lib/importlib/metadata/__init__.py
+++ b/Lib/importlib/metadata/__init__.py
@@ -797,6 +797,7 @@ def _search_paths(cls, name, paths):
             path.search(prepared) for path in map(FastPath, paths)
         )
 
+    @classmethod
     def invalidate_caches(cls) -> None:
         FastPath.__new__.cache_clear()
 
diff --git a/Lib/test/test_importlib/test_metadata_api.py b/Lib/test/test_importlib/test_metadata_api.py
index 33c6e85ee9..e2936303ac 100644
--- a/Lib/test/test_importlib/test_metadata_api.py
+++ b/Lib/test/test_importlib/test_metadata_api.py
@@ -37,6 +37,9 @@ class APITests(
 ):
     version_pattern = r'\d+\.\d+(\.\d)?'
 
+    def tearDown(self):
+        importlib.invalidate_caches()
+
     def test_retrieves_version_of_self(self):
         pkg_version = version('egginfo-pkg')
         assert isinstance(pkg_version, str)

That patch feels like it's closer to the right thing to do. It first exposes the cache invalidation in the PathFinder, so the caches will be invalidated as a matter of course. That seems worthwhile no matter what.

I'm less confident about using the tearDown on the APITests, because I'm far from confident that's the right place to have this hook. It may be too broad but may also be too narrow. I guess I'll run the whole refleaks test and see what else is failing and make sure the cache invalidation happens for any implicated test.

@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

There's another ref leak in test_implicit_files, which can be suppressed by disabling the behavior:

diff --git a/Lib/test/test_importlib/resources/test_files.py b/Lib/test/test_importlib/resources/test_files.py
index 26c8b04e44..90834fcab2 100644
--- a/Lib/test/test_importlib/resources/test_files.py
+++ b/Lib/test/test_importlib/resources/test_files.py
@@ -99,7 +99,7 @@ def test_implicit_files(self):
                 '__init__.py': textwrap.dedent(
                     """
                     import importlib.resources as res
-                    val = res.files().joinpath('res.txt').read_text(encoding='utf-8')
+                    val = 'resources are the best'
                     """
                 ),
                 'res.txt': 'resources are the best',

@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

Distilling that issue, it looks like it boils down to calling inspect.stack() inside of an imported module:

diff --git a/Lib/test/test_importlib/resources/test_files.py b/Lib/test/test_importlib/resources/test_files.py
index 26c8b04e44..335f0b6718 100644
--- a/Lib/test/test_importlib/resources/test_files.py
+++ b/Lib/test/test_importlib/resources/test_files.py
@@ -95,18 +95,15 @@ def test_implicit_files(self):
         Without any parameter, files() will infer the location as the caller.
         """
         spec = {
-            'somepkg': {
-                '__init__.py': textwrap.dedent(
-                    """
-                    import importlib.resources as res
-                    val = res.files().joinpath('res.txt').read_text(encoding='utf-8')
-                    """
-                ),
-                'res.txt': 'resources are the best',
-            },
+            'somepkg.py': textwrap.dedent(
+                """
+                import inspect
+                inspect.stack()
+                """
+            ),
         }
         _path.build(spec, self.site_dir)
-        assert importlib.import_module('somepkg').val == 'resources are the best'
+        importlib.import_module('somepkg')
 
 
 if __name__ == '__main__':

@jaraco
Copy link
Member Author

jaraco commented Mar 13, 2024

In c6d794f, I devised a workaround, clearing some global state after running the test. This change and the other allows the refleaks test to pass for test_importlib. None of these refleaks are particularly meaningful. They're pointing out that the test suite consumes unique resources on each run.

@encukou
Copy link
Member

encukou commented Mar 14, 2024

Thank you for hunting this down!

There's a place in regrtest for clearing caches. I'd add it there, and be very narrow -- when calling all of invalidate_caches it'd be easy to miss a cache that could grow without bounds, but shouldn't.
I filed #116804 for the inspect ones. IMO those should be bounded. But that's for later, today let's clear the caches.

encukou added a commit that referenced this issue Mar 14, 2024
…clear_caches (GH-116805)

Co-authored-by: Jason R. Coombs <jaraco@jaraco.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Mar 14, 2024
…es in clear_caches (pythonGH-116805)

(cherry picked from commit bae6579)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Jason R. Coombs <jaraco@jaraco.com>
encukou added a commit that referenced this issue Mar 14, 2024
…hes in clear_caches (GH-116805) (GH-116820)

gh-116731: libregrtest: Clear inspect & importlib.metadata caches in clear_caches (GH-116805)
(cherry picked from commit bae6579)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Jason R. Coombs <jaraco@jaraco.com>
@encukou encukou closed this as completed Mar 18, 2024
vstinner pushed a commit to vstinner/cpython that referenced this issue Mar 20, 2024
…es in clear_caches (pythonGH-116805)

Co-authored-by: Jason R. Coombs <jaraco@jaraco.com>
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
…es in clear_caches (pythonGH-116805)

Co-authored-by: Jason R. Coombs <jaraco@jaraco.com>
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…es in clear_caches (pythonGH-116805)

Co-authored-by: Jason R. Coombs <jaraco@jaraco.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-importlib type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants