Stop purging modules which are garbage collected before shutdown #62414
Currently when a module is garbage collected its dict is purged by replacing all values except __builtins__ by None. This helps clear things at shutdown.
But this can cause problems if it occurs *before* shutdown: if we use a function defined in a module which has been garbage collected, then that function must not depend on any globals, because they will have been purged.
Usually this problem only occurs with programs which manipulate sys.modules. For example when setuptools and nose run tests they like to reset sys.modules each time.
See for example http://bugs.python.org/issue15881
See also http://bugs.python.org/issue16718
The trivial patch attached prevents the purging behaviour for modules gc'ed before shutdown begins. Usually garbage collection will end up clearing the module's dict anyway.
I checked the count of refs and blocks reported on exit when running a trivial program and a full regrtest (which will cause quite a bit of sys.modules manipulation). The difference caused by the patch is minimal.
Without patch: With patch: |
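To make the failure mode described above concrete, here is a minimal sketch that simulates the purge in pure Python. The helper purge_module_dict and the module name demo_mod are hypothetical and exist only for illustration; the real purge happens inside the interpreter in C, and this sketch only mimics the "replace every value except __builtins__ by None" behaviour described in the report.

```python
import types


def purge_module_dict(mod_dict):
    """Simulate the purge: set every value except __builtins__ to None."""
    for key in list(mod_dict):
        if key != "__builtins__":
            mod_dict[key] = None


# Build a throwaway module (hypothetical name) with one global and one function.
demo = types.ModuleType("demo_mod")
exec(
    "GREETING = 'hello'\n"
    "def greet():\n"
    "    return GREETING.upper()\n",
    demo.__dict__,
)

greet = demo.greet      # a caller keeps only the function
before = greet()        # works: GREETING is still a str

purge_module_dict(demo.__dict__)

# The function object survives, but its globals are now all None.
try:
    greet()
    after = "no error"
except AttributeError as exc:
    after = type(exc).__name__
```

After the simulated purge the function still exists, but the lookup of GREETING yields None, so the call fails. This is the kind of breakage that shows up when sys.modules is reset while functions from the old modules are still in use.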
This is not true, since global objects might have a __del__ and then hold the whole module dict alive through a reference cycle. Happily though, PEP-442 is going to make that concern obsolete. As for the interpreter shutdown itself, I have a pending patch (post-PEP 442) to get rid of the globals cleanup as well. It may be better to merge the two approaches. |
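As a rough illustration of why PEP 442 matters here, the sketch below builds an object that has a __del__ method and sits in a reference cycle, then asks the cyclic collector to reclaim it. It assumes CPython 3.4 or later, where PEP 442 is in effect; the class name Holder and the helper collect_cycle are made up for this example.

```python
import gc
import weakref

finalized = []


class Holder:
    """Object with a finalizer that is part of a reference cycle."""

    def __init__(self, name):
        self.name = name
        self.cycle = self  # self-reference creates a cycle

    def __del__(self):
        finalized.append(self.name)


def collect_cycle():
    holder = Holder("h")
    ref = weakref.ref(holder)
    del holder      # the cycle keeps the object alive past this point
    gc.collect()    # the cyclic collector has to reclaim it
    return ref() is None


collected = collect_cycle()
```

With PEP 442 the finalizer runs and the object is reclaimed. Before PEP 442, such an object would have been left in gc.garbage, and anything it referenced, such as a module dict, would have stayed alive with it.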
On 15/06/2013 7:11pm, Antoine Pitrou wrote:
I did say "usually".
So you would just depend on garbage collection? Do you know how many BTW, I had a more complicated patch which keeps track of module dicts |
No, I also clean up those modules that are left alive after a garbage |
Now that PEP-442 is committed, here is the patch. |
Slightly better patch. Also, as I pointed out in python-dev (http://mail.python.org/pipermail/python-dev/2013-July/127673.html), this is still imperfect due to various ways modules can be kept alive from long-lived C variables. |
Updated patch has tests and also removes several cleanup hacks. |
Updated patch with a hack in Lib/site to unpatch builtins early at shutdown. |
New changeset 79e2f5bbc30c by Antoine Pitrou in branch 'default': |
Let's wait for the buildbots on this one too. |
I played a bit with the patch and -v -Xshowrefcount. The number of references and blocks left at exit varies (and is higher than for unpatched python). It appears that a few (1-3) module dicts are not being purged because they have been "orphaned" (i.e. the module object was garbage collected before we check the weakref, but the module dict survived). Presumably it is the hash randomization causing the randomness. Maybe 8 out of 50+ module dicts actually die a natural death by being garbage collected before they are purged. Try
|
Module globals can be kept alive by any function defined in that module. So if that function is registered eternally in a C static variable, the globals dict will never get collected.
I always get either: # remaining {'encodings', '__main__'} or # remaining {'__main__', 'encodings'} ... which seems to hint that it is quite stable actually.
I get different numbers from you. If I run "./python -v -c pass", most modules in the "wiping" phase are C extension modules, which is expected. Pretty much every pure Python module ends up garbage collected before that. By the way, please also try bpo-18608 which will bring another improvement. |
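The "orphaned" module dict discussed in this exchange can be reproduced in a few lines. This is only a sketch, assuming CPython 3.4 or later, where a module's dict is no longer purged when the module object is deallocated; the module name demo_mod is hypothetical.

```python
import gc
import types
import weakref

# Build a throwaway module with one global and one function using it.
mod = types.ModuleType("demo_mod")
exec(
    "VALUE = 42\n"
    "def get_value():\n"
    "    return VALUE\n",
    mod.__dict__,
)

get_value = mod.get_value   # the function references mod.__dict__, not mod
mod_ref = weakref.ref(mod)  # lets us observe when the module object dies

del mod
gc.collect()

module_collected = mod_ref() is None   # the module object is gone...
value_after = get_value()              # ...but its globals are still usable
```

The weak reference goes dead while get_value.__globals__ keeps the dict alive. If such a function is held by a long-lived C variable, the dict is never collected on its own.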
Actually, it's not surprising. Blob's methods hold a reference to the __main__ globals, and there's still a Blob object alive in encodings. If you replace the end of your script with the following:
    for name, mod in sys.modules.items():
        if name != 'encodings':
            mod.__dict__["__blob__"] = Blob(name)
    del name, mod, Blob
then at the end of the shutdown phase, remaining is empty. |
On 01/08/2013 10:59am, Antoine Pitrou wrote:
On Windows, even with this change, I get for example: # remaining {'encodings.mbcs', '__main__', 'encodings.cp1252'} or # remaining {'__main__', 'encodings'} |
You might want to open a prompt and look at gc.get_referrers() for encodings.mbcs.__dict__ (or another of those modules). |
>>> gc.get_referrers(sys.modules['encodings.mbcs'].__dict__)
[<module 'encodings.mbcs' from 'C:\\Repos\\cpython-dirty\\lib\\encodings\\mbcs.py'>, <function decode at 0x01DEEF38>, <function getregentry at 0x01DFA038>, <function IncrementalEncoder.encode at 0x01DFA098>]
>>> gc.get_referrers(sys.modules['encodings.cp1252'].__dict__)
[<module 'encodings.cp1252' from 'C:\\Repos\\cpython-dirty\\lib\\encodings\\cp1252.py'>, <function getregentry at 0x02802578>, <function Codec.encode at 0x02802518>, <function Codec.decode at 0x028025D8>, <function IncrementalEncoder.encode at 0x02802638>, <function IncrementalDecoder.decode at 0x02802698>]
>>> gc.get_referrers(sys.modules['__main__'].__dict__)
[<function Blob.__init__ at 0x0057ABD8>, <function Blob.__del__ at 0x02AD36F8>,
<frame object at 0x027DFA80>, <function <listcomp> at 0x02AD3DB8>, <frame object at 0x02A38038>, <module '__main__' (<_frozen_importlib.SourceFileLoader object
at 0x0271EAB8>)>] |
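Most of the referrers in the output above are functions whose __globals__ is the module dict. A small helper can filter gc.get_referrers() down to just those. This is a sketch, not part of the patch; the helper name functions_holding_globals and the module demo_mod are hypothetical.

```python
import gc
import types


def functions_holding_globals(module):
    """Return sorted qualified names of functions whose globals are module.__dict__."""
    mod_dict = module.__dict__
    return sorted(
        ref.__qualname__
        for ref in gc.get_referrers(mod_dict)
        if isinstance(ref, types.FunctionType) and ref.__globals__ is mod_dict
    )


# Try it on a throwaway module with two functions.
demo = types.ModuleType("demo_mod")
exec(
    "def f():\n"
    "    pass\n"
    "def g():\n"
    "    pass\n",
    demo.__dict__,
)
holders = functions_holding_globals(demo)
```

Run against sys.modules['encodings.mbcs'] at a prompt, a helper like this would list the functions that keep that module's dict alive, without the module and frame objects mixed in.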
The *module* gets gc'ed, sure. But you can't tell from "./python -v -c pass" when the *module dict* gets gc'ed. Using "./python -v check_purging.py", before the purging stage (# cleanup [3]) I only get # purge/gc operator 54 That leaves lots of pure python module dicts to be purged later on. |
Here (Linux) I get the following: # purge/gc os.path 12 Also, do note that purge/gc after wiping can still be a regular gc pass unless the module has been wiped. The gc could be triggered by another module being wiped. |
For me, the modules which die naturally after purging begins are # purge/gc encodings.aliases 34 Of these, all but the first appear to happen during the final cyclic |
That said, I welcome any suggestions to improve things. The ultimate Do you agree that this patch is ok and we should address those two |
Yes, I agree the patch is ok. It would be much simpler to keep track of the module dicts if |
Ok, let's attack the rest separately then :) |
By the way, you may be interested to learn that the patch in bpo-10241 has made things quite a bit better now: C extension modules can be collected much earlier. |