-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
Ensure that objects entering the GC are valid #82573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
A bug in Python/hamt.c was only discovered 4 months after the code was added to Python:
The problem was that an object was tracked by the GC, whereas calling its traverse method crashed Python... depending when the traverse method is called exactly. Pseudo-code: PyObject_GC_Track(<partially initialized object>);
...
obj2 = PyObject_GC_NewVar(...);
...
<finish to initialize the first object> The problem is that PyObject_GC_NewVar() can trigger a GC collection which uses the partially initialized object, and then we get a cryptic crash in the GC (usually seen in visit_decref()). I propose to make PyObject_GC_Track() stricter: validate that the object seems to be "valid" or "fully initialized". From the GC point of view, it means that calling tp_traverse must not crash. Attached PR is a concrete implementation. This issue is a follow-up of my previously failed attempt bpo-36389 "Add gc.enable_object_debugger(): detect corrupted Python objects in the GC". While bpo-36389 attempts to discoverd corrupted objects once a GC collection is triggered, this issue is restricted to objects entering the GC. First problem: my PR breaks Modules/pyexpat.c whereas the code is not strictly a bug: in Python 3.8, it's ok to put a partially initialized object into the GC if no GC collection can be triggered before the object is fully initialized. In Modules/pyexpat.c case, the code looks safe from that point of view. It means that such change is a backward incompatible change. IMHO it's a good step forward, since it is likely to discovered hidden bugs. |
Pablo, Tim, Neal: what do you think of this idea? |
I'm pretty sure you meant nascheme, not nnorwitz. |
It could be possible to do this in backward compatible way. PyObject_GC_Track() could add the object to the list of new objects (all objects are already linked in bi-linked list, so it would need to just move the object to the specified list), and PyObject_GC_NewVar() could check all new objects and clear the list of new objects (just move the head). I am not sure it is worth to do such complication. |
From documentation: .. c:function:: void PyObject_GC_Track(PyObject *op) Adds the object *op* to the set of container objects tracked by the So there was a bug in Modules/pyexpat.c. |
Oops, right :-) |
That's different than what I'm proposing here. Checking "later" is more on the bpo-36389 side: my bpo-36389 PR modified PyObject_GC_NewVar() to check that young object are valid. Moreover, my bpo-36389 PR required threshold: checking all young objects at each PyObject_GC_NewVar() simply killed performance which made the whole approach unusable in practice. Because of the threshold, my implementation failed to detect the hamt.c bug bpo-33803: the partially initialized was already fully initialized when my "object debugger" ran. PR 16615 detects immediately the hamt.c bug bpo-33803. I designed bpo-36389 to attempt to detect when an object is corrupted after its creation, corrupted by code running "later". We had a customer with a bug in OpenStack Nova. Nova crashed in visit_decref() but I failed to debug the issue. It motivated me to modify the debug ABI in Python 3.8, so later it will become possible to switch from release to debug mode to attempt to debug a program using additional checks added by the debug build. Here my intent is to ensure that objects entering the GC are valid to help to keep the GC list consistent and valid. But it doesn't prevent an object from being corrupted after its creation. We should try to find a different approach for that (maybe revisit bpo-36389). |
Ok, I pushed a change. Let's see how third party projects like it :-) We still have time to revert it if it causes too many damage. |
I am sure this will break third-party projects. But I thing that we should not revert this change, as it completely matches the documentation. It is good that it was applied so early at development cycle. Stefan, could you please test Cython and lxml with this? |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: