Reference loss for local classes #78223
I have a test in Nuitka designed to detect reference counting problems with Python code. It takes a snapshot of the total refcount, runs a function until the count stabilizes, then reports PASS, or else reports the last diff. Obviously, for CPython it is supposed to pass. Testing with a self-compiled 3.7.0 and with the 3.7.0-1 package currently in Debian testing (buster), this is what I see (a cutout; there are more than 100 functions, and I am only listing the ones that report a failure):

    simpleFunction16: FAILED 118414 118412 leaked -2
    simpleFunction21: FAILED 118337 118333 leaked -4

This is really bad, because normally leaked values are positive, i.e. a release is merely being prevented. A negative value indicates that references have been lost, which will normally end up corrupting Python's memory allocator, although I have not seen that happen yet. I have a few remaining cases where compiled code causes negative leaks too, but I have not forced that issue yet. Note that this is of course with the debug Python version, and let me stress that pure CPython is used to run the test code. When compiling with Nuitka and 3.7, my private code gives the same ref counts there, and the failing functions pretty much all look like this:

    def simpleFunction16():
        class EmptyClass:
            pass
        return EmptyClass

    def simpleFunction39():
        class Parent(object):
            pass

All the other failing cases also use locally defined classes, and only test cases using local classes are failing for me here. Reproducing is easy: git clone --branch develop http://git.nuitka.net/Nuitka.git

Thanks,
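To make the methodology concrete, here is a minimal sketch of such a refcount check (my own illustration, not Nuitka's actual harness); it assumes a --with-pydebug build, where sys.gettotalrefcount exists, and ignores small measurement noise:

    import gc
    import sys

    def refcount_delta(func, warmup=3, runs=10):
        # Warm-up calls let caches and interned objects settle first.
        for _ in range(warmup):
            func()
        gc.collect()
        before = sys.gettotalrefcount()
        for _ in range(runs):
            func()
        gc.collect()
        return sys.gettotalrefcount() - before

    def simpleFunction16():
        class EmptyClass:
            pass
        return EmptyClass

    # Close to zero on a correct debug build, clearly negative on an
    # affected 3.7.0 build.
    print("simpleFunction16:", refcount_delta(simpleFunction16))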
The simplest reproducer:

    import gc, sys

    for i in range(100):
        class A:
            pass
        del A
        c = gc.collect()
        del c
        print(sys.gettotalrefcount())
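One assumption worth spelling out: sys.gettotalrefcount only exists on debug (--with-pydebug) builds, so the script above fails with an AttributeError on a regular release build. A guarded variant along these lines (a sketch, not part of the original reproducer) degrades gracefully:

    import gc
    import sys

    # Only debug builds expose total-refcount accounting.
    get_total = getattr(sys, "gettotalrefcount", None)

    for i in range(100):
        class A:
            pass
        del A
        gc.collect()
        if get_total is not None:
            # Drifts steadily downward on an affected 3.7.0 debug build.
            print(get_total())
        else:
            print("release build: sys.gettotalrefcount not available")
            break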
Just to confirm, this is also happening on Windows, with both 32-bit and 64-bit builds.
I investigated that commit but I cannot find anything :(
I think I found it. After doing the bisect manually (as redirecting stdout/stderr seems to somehow affect the test), I found commit b0a7a03 as the culprit. Reverting this commit on current master seems to solve the problem. This is the diff of the revert:

diff --git a/Objects/dictobject.c b/Objects/dictobject.c
-
-static PyObject *
-clone_combined_dict(PyDictObject *orig)
-{
- assert(PyDict_CheckExact(orig));
- assert(orig->ma_values == NULL);
- assert(orig->ma_keys->dk_refcnt == 1);
-
- Py_ssize_t keys_size = _PyDict_KeysSize(orig->ma_keys);
- PyDictKeysObject *keys = PyObject_Malloc(keys_size);
- if (keys == NULL) {
- PyErr_NoMemory();
- return NULL;
- }
-
- memcpy(keys, orig->ma_keys, keys_size);
-
- /* After copying key/value pairs, we need to incref all
- keys and values and they are about to be co-owned by a
- new dict object. */
- PyDictKeyEntry *ep0 = DK_ENTRIES(keys);
- Py_ssize_t n = keys->dk_nentries;
- for (Py_ssize_t i = 0; i < n; i++) {
- PyDictKeyEntry *entry = &ep0[i];
- PyObject *value = entry->me_value;
- if (value != NULL) {
- Py_INCREF(value);
- Py_INCREF(entry->me_key);
- }
- }
-
- PyDictObject *new = (PyDictObject *)new_dict(keys, NULL);
- if (new == NULL) {
- /* In case of an error, `new_dict()` takes care of
- cleaning up `keys`. */
- return NULL;
- }
- new->ma_used = orig->ma_used;
- assert(_PyDict_CheckConsistency(new));
- if (_PyObject_GC_IS_TRACKED(orig)) {
- /* Maintain tracking. */
- _PyObject_GC_TRACK(new);
- }
- return (PyObject *)new;
-}
-
PyObject *
PyDict_New(void)
{
@@ -2527,13 +2481,7 @@ PyDict_Copy(PyObject *o)
PyErr_BadInternalCall();
return NULL;
}
-
mp = (PyDictObject *)o;
- if (mp->ma_used == 0) {
- /* The dict is empty; just return a new dict. */
- return PyDict_New();
- }
-
if (_PyDict_HasSplitTable(mp)) {
PyDictObject *split_copy;
Py_ssize_t size = USABLE_FRACTION(DK_SIZE(mp->ma_keys));
@@ -2560,27 +2508,6 @@ PyDict_Copy(PyObject *o)
_PyObject_GC_TRACK(split_copy);
return (PyObject *)split_copy;
}
-
- if (PyDict_CheckExact(mp) && mp->ma_values == NULL &&
- (mp->ma_used >= (mp->ma_keys->dk_nentries * 2) / 3))
- {
- /* Use fast-copy if:
-
- (1) 'mp' is an instance of a subclassed dict; and
-
- (2) 'mp' is not a split-dict; and
-
- (3) if 'mp' is non-compact ('del' operation does not resize dicts),
- do fast-copy only if it has at most 1/3 non-used keys.
-
- The last condition (3) is important to guard against a pathological
- case when a large dict is almost emptied with multiple del/pop
- operations and copied after that. In cases like this, we defer to
- PyDict_Merge, which produces a compacted copy.
- */
- return clone_combined_dict(mp);
- }
-
copy = PyDict_New();
if (copy == NULL)
return NULL;
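Since the reverted commit only affects the fast-copy path for non-empty combined dicts in PyDict_Copy, the accounting drift should also be visible with plain dict copies. This is my own narrowing of the reproducer (under the assumption of an affected --with-pydebug build), not something run in this thread:

    import sys

    def copy_drift(n=1000):
        d = {"x": 1, "y": 2}        # non-empty combined (non-split) dict
        before = sys.gettotalrefcount()
        for _ in range(n):
            c = d.copy()            # goes through PyDict_Copy's fast path
            del c
        return sys.gettotalrefcount() - before

    # Expected to be roughly -n on an affected 3.7.0 debug build and
    # close to zero once the accounting fix is in.
    print(copy_drift())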
@antoine Wow, it seems that we both found b0a7a037b8fde56b62f886d5188bced7776777b4 with one minute of difference :D I added Yury to the nosy list as he is the author of that PR.
This isn't a real reference bug, but rather a bug in total-refs accounting. It seems that I missed the fact that we track refs to the keys table with the DK_INCREF macro; the new clone_combined_dict() copies the keys table directly without going through it, so _Py_RefTotal is never incremented. The fix:

diff --git a/Objects/dictobject.c b/Objects/dictobject.c
index 7a1bcebec6..3ac6a54415 100644
--- a/Objects/dictobject.c
+++ b/Objects/dictobject.c
@@ -656,6 +656,7 @@ clone_combined_dict(PyDictObject *orig)
/* Maintain tracking. */
_PyObject_GC_TRACK(new);
}
+ _Py_INC_REFTOTAL;
return (PyObject *)new;
}

I don't think this is a critical-level bug that needs an emergency 3.7.1 release, but I'll submit a PR right now. I would appreciate it if you could review it.
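To illustrate the distinction drawn here, a small sketch (mine, assuming an affected debug build): the number of live, GC-tracked objects stays flat while the debug counter drifts, i.e. nothing is actually leaking:

    import gc
    import sys

    def drift_vs_live(n=100):
        gc.collect()
        live_before = len(gc.get_objects())
        refs_before = sys.gettotalrefcount()
        for _ in range(n):
            class A:
                pass
            del A
            gc.collect()
        return (sys.gettotalrefcount() - refs_before,
                len(gc.get_objects()) - live_before)

    # On an affected 3.7.0 debug build the first number is clearly
    # negative while the second stays near zero.
    print(drift_vs_live())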
Agreed; if it were a real reference bug, the interpreter should crash before the total reference count goes negative ;-)
First thing I checked with Serhiy's script :) A PR with a fix: #8119
Hello, so it's harmless, and it explains the other reference counting issue I saw, where a change in call convention could make a reference counting bug show up or go away:

    codecs.open(TESTFN, encoding='cp949')

This was showing it, whereas

    codecs.open(TESTFN, "rb", 'cp949')

was not, with the only difference being whether a dictionary is passed into the function call or not. I am now assuming that with your fix this will also go away. It has caused me a bit of head scratching already. If you could do what you often do, and make this what distributions like Debian pull from, it would be good enough in terms of a release for me, as it currently blocks Nuitka tests from passing on them.

Yours,
I'm not sure what you mean by "what you often do". I'll push a fix to the master and 3.7 branches in a few minutes, so perhaps you can open a PR to ask Linux vendors to add a patch for 3.7.0 to their packages. An easier solution would be for you to disable refcounting tests that rely on gettotalrefcount just for 3.7.0. And I apologize for the hassle :(
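A sketch of the kind of guard being suggested here (a hypothetical helper, not code from CPython or Nuitka): only run total-refcount assertions when the build has the counter and is not the affected 3.7.0 release:

    import sys

    def refcount_checks_usable():
        # Total-refcount accounting exists only in --with-pydebug builds,
        # and is unreliable on 3.7.0 because of this accounting bug.
        if not hasattr(sys, "gettotalrefcount"):
            return False
        return sys.version_info[:3] != (3, 7, 0)

A test harness can then report such checks as skipped rather than failed on 3.7.0.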
Thank you for reporting the issue, Kay. And huge thanks to Antoine and Pablo for bisecting.