Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't cache exception values in the module cache #837

Closed
wants to merge 1 commit into from

Conversation

rtpg
Copy link
Contributor

@rtpg rtpg commented Aug 30, 2020

Steps

  • For new features or bug fixes, add a ChangeLog entry describing what your PR does.
  • Write a good description on what the PR does.

Description

Exceptions hold tracebacks, which can get very large (especially for highly recursive functions). Especially if the exceptions are being used for control flow instead of actual exception management, by throwing them away as fast as possible we can avoid generating a lot of memory pressure (though at a cost of losting information)

This commit decides to just store a simple Error object for AstroidImportErrors, instead of the full exception. While this reduces visibility in stuff and is a bit hacky, on a "real world" codebase this reduced memory usage by ~80% on a run over just 100 files.

I haven't written a changelog entry yet because I would like to know if this is "mostly OK". If it looks good I'll try adding more documentation to the code itself + a changelog entry.

A picture is worth 1000 words, so here are two:

image

^ Basically "look at all the objects stored on an AstroidImportError! Do we need all these tracebacks for the lifetime of the program?

image

^ Left: memory usage on 100 files before the patch. Right: memory usage on 100 files after the patch.

For more detail see : https://rtpg.co/2020/10/12/pylint-usage.html

Type of Changes

Type
🔨 Refactoring

@rtpg rtpg force-pushed the no-exception-storage branch 3 times, most recently from 426c922 to 910809e Compare August 30, 2020 14:46
Exceptions hold tracebacks, which can get very large (especially for
highly recursive functions). Especially if the exceptions are being used
for control flow instead of actual exception management, we should be
throwing them away as fast as possible.

This commit deals with this with a bit of a hacky solution, but really
the way the module cache handles things ought to be reworked a bit.
@pstch
Copy link

pstch commented Oct 13, 2020

I think that using with_traceback(None) on the Exception would be a better solution, and would involve less changes. Also, in the current version of this PR, there are two conflicting definitions of AstroidImportError.

@char101
Copy link

char101 commented Oct 14, 2020

The changes as per @pstch comments above

diff --git a/astroid/manager.py b/astroid/manager.py
index 82208ad..4f88798 100644
--- a/astroid/manager.py
+++ b/astroid/manager.py
@@ -236,11 +236,11 @@ class AstroidManager:
                 value = exceptions.AstroidImportError(
                     "Failed to import module {modname} with error:\n{error}.",
                     modname=modname,
-                    error=ex,
+                    error=ex.with_traceback(None),
                 )
             self._mod_file_cache[(modname, contextfile)] = value
         if isinstance(value, exceptions.AstroidBuildingError):
-            raise value
+            raise value.with_traceback(None)
         return value

     def ast_from_module(self, module, modname=None):

@Pierre-Sassoulas
Copy link
Member

Thank you for the PR, and also for the interesting article about the step you took to analyse the problem (I edited your issue with it). I don't know if we use this stack-trace in pylint, I'll try to find the time to check the pylint's test suite against your branch and @pstch proposed change this week.

@rtpg
Copy link
Contributor Author

rtpg commented Oct 21, 2020

@Pierre-Sassoulas I haven'th ad time to pick this up (I'll have time next week), I believe that @pstch's patch would be the "correct" way to tackle this issue overall, although there might be a bit more of a memory cost. If next week you haven't gotten around to this I'll try the alternative and measure the memory effects as well to see what the result is.

@rtpg
Copy link
Contributor Author

rtpg commented Oct 26, 2020

I have made an alternate PR (#847) that uses the with_traceback(None) solution. I think it's better overall, despite using a bit more memory

@Pierre-Sassoulas
Copy link
Member

Hello, thank you for making the alternate branch, I also think the simpler solution is better. I opened a pylint MR (pylint-dev/pylint#3922) in order to see the effect on pylint's test for all envs and everything is still passing which means we don't use those informations in pylint. And pylint is the main "client" for astroid so I think this is mergeable. There is no noticeable effects on execution time, but the memory usage is lower as expected.

@Pierre-Sassoulas Pierre-Sassoulas self-assigned this Oct 27, 2020
@Pierre-Sassoulas
Copy link
Member

Closing in favor of #847

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants