-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Same constants in tuples are not merged while compile() #78281
Comments
In the Python 3.7.0 interpreter, the following evaluates to False. In 3.6.4, it was True:
This seems to be because, in some cases, Python 3.7 fails to intern integers inside tuples: a = (500,500)
print(a[0] is a[1]) # False
a = (500,500,42)
print(a[0] is a[1]) # False
a = (500,500,'42')
print(a[0] is a[1]) # False
answer = 42
a = (500,500,answer)
print(a[0] is a[1]) # True
a = (500,500,[42])
print(a[0] is a[1]) # True
a = [500,500]
print(a[0] is a[1]) # True I believe the above should all return True. |
Another curious case: a = (500,500); b = (500,500)
print(a[0] is b[0]) # True
print(a[0] is b[1]) # False
print(a[1] is b[0]) # False
print(a[1] is b[1]) # True |
This was a side-effect of bpo-29469. |
Should the AST optimizer "interns" constants? |
Thank you for finding it. I had worked on constant merging. 1 But it may be a too big to fix only this regression. |
This is not only with integers. >>> a = ((1, 2), (1, 2))
>>> a[0] is a[1]
False
>>> a = ('@#$', '@#$')
>>> a[0] is a[1]
False
>>> a = (1.0, 1.0)
>>> a[0] is a[1]
False The only exception is short ASCII identifier-like strings (as a side effect of interning them): >>> a = ('foo', 'foo')
>>> a[0] is a[1]
True I'm not sure this is a problem which should be resolved. |
The language doesn't define anything about this - any program relying on accidental identity is in error itself. Still, it's nice if a code object's co_consts vector is as short as reasonably possible. That's a matter of pragmatics, not of correctness. |
OK, unless example code this regression affects seriously is shown, I target only 3.8. |
The co_consts vector is already as short as possible, except cases when tuples are created at code generation time, but this is not related to this issue (see bpo-33318 and bpo-33325). >>> def f():
... a = (1.0, 1.0)
... b = (1.0, 1.0)
...
>>> f.__code__.co_consts
(None, (1.0, 1.0))
>>> f.__code__.co_consts[1][0] is f.__code__.co_consts[1][1]
False |
Fine, Serhiy, so reword it a tiny bit: it's nice if a code object's co_consts vector references as few distinct objects as possible. Still a matter of pragmatics, not of correctness. |
FYI, same constants are shared even when they are in different tuples on 3.6. For modules having large constant table containing large integer or floats, non name-like string, and bytes are affected significantly. |
Honestly, I don't think that it's a bug. We might enhance Python 3.8 to reduce constants duplication, but there is no need to "fix" Python 3.7. |
I also agree that there's no bug here, so I'm marking this an enhancement and removing the regression label. |
I think (space) 'performance' would be a better label, as this is strictly an implementation improvement, not a language change. But we often (usually? sometimes?) limit performance improvements to the 'next version' so we have the alpha/beta/candidate releases to discover possible regressions. I think this is worth considering just because the pattern is so odd. But if this is ironed out as part of a broader and better patch, great. |
Counting object types in logging/pycache/init.cpython-38.pyc: master: #52588: It reduced about 5% objects. I chose logging/init because it's large except tests, and it's written in OO-based. (Large application has many OO-based code). |
This is another memory overhead comparison. import sys, django, flask
sys._debugmallocstats() ### master branch class size num pools blocks in use avail blocks ### merge-const branch class size num pools blocks in use avail blocks |
This change introduced a lot of memory leaks (reference leaks): The following change fix "make && ./python -m test -R 3:3 test_code": diff --git a/Python/compile.c b/Python/compile.c
index acb5cfe29b..cb3e73740d 100644
--- a/Python/compile.c
+++ b/Python/compile.c
@@ -1277,6 +1277,7 @@ merge_consts_recursive(struct compiler *c, PyObject *o)
else {
u = k;
}
+ Py_DECREF(k);
Py_INCREF(u);
PyTuple_SET_ITEM(tuple, i, u);
i++;
@@ -1288,6 +1289,7 @@ merge_consts_recursive(struct compiler *c, PyObject *o)
Py_DECREF(key);
return NULL;
}
+ Py_DECREF(PyTuple_GET_ITEM(key, 1));
PyTuple_SET_ITEM(key, 1, new);
}
But I dislike the frozenset branch of merge_consts_recursive(): modifying a tuple is a bad idea. For example, if you replace PyTuple_SET_ITEM() with PyTuple_SetItem(), you get a PyErr_BadInternalCall() because the reference count is greater than 1. I don't see how to rewrote properly the code, so I reverted (removed) it to let someone else fix the code: PR 10743. |
merge_consts_recursive(): rather than modifying the 'key' tuple, would it make sense to *add* the new key to the cache? |
I agree that modifying tuple is bad idea in general case. But in case of constants, in-place modify is the easiest approach. Our purpose is "merging" constant. On the other hand, "build new tuple and So I don't think we need to follow the general principle unless it reduces code. FYI, we have similar in-place editing for interning already. |
Well, it's up to you. I will review your PR ;-) |
Thanks INADA Naoki! Thanks a simple and cool optimization! Maybe you might want to document it at https://docs.python.org/dev/whatsnew/3.8.html#optimizations ? |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: