-
-
Notifications
You must be signed in to change notification settings - Fork 31.6k
[C API] Heap types (PyType_FromSpec) must fully implement the GC protocol #87138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Copy of my email sent to python-dev: Hi, In the Python stdlib, many heap types currently don't "properly" There is an on-going effect to destroy all Python objects at exit It's an hard problem and I don't see any simple/obvious solution right == Only the Python stdlib should be affected == PyType_FromSpec() was added to Python 3.2 by the PEP-384 to define To be clear, static types are not affected by this email. Third party extension modules using the limited C API (to use the == Heap type instances now stores a strong reference to their type == In March 2019, the PyObject_Init() function was modified in bpo-35810 == GC and heap types == The new problem is that most heap types don't collaborate well with A heap type must respect the following 3 conditions to collaborate with the GC:
If one of these conditions is not met, the GC can fail to destroy a In practice, when a type is not deleted, a test using subinterpreter == Boring traverse functions == Currently, there is no default traverse implementation which visits the type. For example, I had the implement the following function for _thread.LockType: static int
lock_traverse(lockobject self, visitproc visit, void arg)
{
Py_VISIT(Py_TYPE(self));
return 0;
} It's a little bit annoying to have to implement the GC protocol There is exactly one strong reference: to the type. == Workaround: loop on gc.collect() == A workaround is to run gc.collect() in a loop until it returns 0 (no == Traverse automatically? Nope. == Pablo Galindo attempts to automatically visit the type in the traverse function: https://bugs.python.org/issue40217 Moreover, What's New in Python 3.9 contains a long section suggesting This solution causes too many troubles, and so instead, traverse Currently in the master branch, 89 types are defined as heap types on == How should we address this issue? == I'm not sure what should be done. Working around the issue by If core developers miss these bugs and have troubles to debug them, I == GC+heap type bugs became common == I'm fixing such GC issue for 1 year as part as the work on cleaning Today, I learnt the hard way that defining a traverse is not enough. 11ef53a... == Reference cycles are common == The GC only serves to break reference cycles. But reference cycles are First of all, most types create reference cycles involing themselves. => The GC must break the cycle, otherwise the type cannot be destroyed When a function is defined in a Python module, the function If a function is used as a callback somewhere, the whole module => The GC == Debug GC issues ==
== __del__ hack for debugging == If you want to play with the issue or if you have to debug a GC issue, class VerboseDel:
def __del__(self):
print("DELETE OBJECT")
obj = VerboseDel() Warning: creating such object in a module also prevents to destroy the == Long email == Yeah, I like to put titles in my long emails. Enjoy. Happy hacking! -- |
In June 2020, I create PR 20983 to attempt to automatically traverse the type: I abandoned my PR. I marked bpo-41036 as a duplicate of this issue. |
Should we proceed with fixing GC for all heap types before continuing work with bpo-40077? |
I'm marking this as a 3.10 release blocker untill all converted types that are in 3.10 have GC support. |
I've added a checkbox for types that fully implement the GC protocol to https://discuss.python.org/t/list-of-built-in-types-converted-to-heap-types/8403/1. Heap types that fully implement the GC protocol:
Heap types that do not fully implement the GC protocol:
|
Is there a deterministic way to test these changes? Will something a la this be sufficient: import gc
import sys
gc.collect()
before = sys.gettotalrefcount()
import somemod
del sys.modules['somemod']
del somemod
gc.collect()
after = sys.gettotalrefcount() assert after == before |
Christian, I've got a PR ready for Modules/_ssl.c, but I won't submit it if you'd rather do it yourself. I'll stay off the sha/md5 types unless you approve :) |
Please open PRs and assign them to me. I'll review them as soon as possible. |
Thanks! Hashlib PR comin' up. |
Victor, can you take a look at the opened PRs? |
|
I've opened PR's to fix most of the heap types converted during the 3.10 alpha phase. What's missing (of the 3.10 batch) is:
For the types converted during 3.9 dev, should we backport to 3.9 or just 3.10? |
_winapi is leaky still even with my PR: >>> import sys,gc
>>> for _ in range(5):
... print(sys.gettotalrefcount())
... import _winapi
... del sys.modules['_winapi']
... del _winapi
... gc.collect()
...
50468
51076
51432
51788
52144 I just noticed this, but _winapi doesn't have a m_traverse/m_clear/m_free in the PyModuleDef eventhough it creates nearly 100 objects in m_slot->Py_mod_exec. I'm not a multi phase init expert, but shouldn't there be a cleanup function or am I confusing something here :( ? |
Oops sorry I think I'm wrong. most of those objects may be borrowed refs. Just 1 type object is causing the leak. |
#70587 is failing with an access violation on Windows. It's failing in one of the flaky tests. I wonder if the segfault is related to flaky tests somehow... test_pha_optional (test.test_ssl.TestPostHandshakeAuth) ... ok Current thread 0x000009e0 (most recent call first): Thread 0x000003c4 (most recent call first): Thread 0x00001700 (most recent call first): |
Hm, I'm unable to reproduce it w/addr sanitiser on macOS (FWIW). $ ./python.exe -m test test_ssl -F -u all -m test_pha_required_nocert Passing 1000 successful runs now. I'll see if I can get a Win dev env set up later. |
I'm not fully satisfied by the implementation of the two LRU types in functools. See the discussion in these two PRs: The _lru_list_elem doesnt implement the GC protocol for performance reasons: But I'm not sure if it's ok that _lru_list_elem doesn't implement the GC protocol: it's disucssion in this issue. |
I'm not fully satified by the overlapped_dealloc() implementation neither. There is an unpleasant code path for Windows XP but it's no longer needed in Python 3.11. I would prefer to always call PyObject_GC_UnTrack() and call it earlier. See the dicsussion in the PR: But it can be modified later. |
Can we close this, Victor? |
Yep, I close it. Thanks to everyone who helped to fix the issue! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
The text was updated successfully, but these errors were encountered: