-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[subinterpreter] _PyUnicode_EqualToASCIIId() issue with subinterpreters #90164
Comments
_PyUnicode_EqualToASCIIId() seems to be incompatible with subinterpreter: it makes the assumption that if direct pointer comparison fails and the string is interned, the two strings are not equal. -- super_init_without_args() of Objects/typeobject.c calls _PyUnicode_EqualToASCIIId(name, &PyId___class__) to test if the Unicode string 'name' is equal to "__class__". int
_PyUnicode_EqualToASCIIId(PyObject *left, _Py_Identifier *right)
{
right_uni = _PyUnicode_FromId(right);
...
if (left == right_uni)
return 1;
if (PyUnicode_CHECK_INTERNED(left))
return 0;
...
return unicode_compare_eq(left, right_uni);
} _PyUnicode_EqualToASCIIId() makes the assumption that left and right are not equal if left and _PyUnicode_FromId(right) pointers are not equal and left is an interned string. In the reproducer, left object is abc.ABCMeta.__new__.__code__.co_freevars[0]. Depending on how the stdlib abc.py file was loaded (in the main interpreter and in the subinterpreter), __code__.co_freevars[0] may or may not be an interned string. If __code__.co_freevars[0] is an interned string, _PyUnicode_EqualToASCIIId() fails in a subinterpreter if the direct pointer comparison fails (if left and right_uni pointers are not equal). -- Reproducer from: ninia/jep#358 (comment)
Output: Before creating sub interpreter
Traceback (most recent call last):
File "/opt/py310/lib/python3.10/io.py", line 52, in <module>
File "/opt/py310/lib/python3.10/abc.py", line 184, in <module>
File "/opt/py310/lib/python3.10/abc.py", line 106, in __new__
RuntimeError: super(): __class__ cell not found
Fatal Python error: _PyThreadState_Delete: tstate 0x7f9f2001c710 is still current
Python runtime state: initialized Current thread 0x00007f9f27c99640 (most recent call first): py-bt command in gdb: (gdb) py-bt
Traceback (most recent call first):
File "/opt/py310/lib/python3.10/abc.py", line 106, in __new__
cls = super().__new__(mcls, name, bases, namespace, **kwargs)
<built-in method __build_class__ of module object at remote 0x7fffea0b4cc0>
File "/opt/py310/lib/python3.10/abc.py", line 184, in <module>
class ABC(metaclass=ABCMeta):
<built-in method exec of module object at remote 0x7fffea0b4cc0>
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "/opt/py310/lib/python3.10/io.py", line 52, in <module>
import abc
<built-in method exec of module object at remote 0x7fffea0b4cc0>
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
<built-in method __import__ of module object at remote 0x7fffea0b4cc0> |
In Python 3.9, the code works because the _Py_IDENTIFIER() API shares Python Unicode objects between all interpreters. _PyUnicode_FromId() was modified to be per-interpreter in bpo-39465 by: New changeset ba3d67c by Victor Stinner in branch 'master': |
Serhiy: Do you recall the idea of the PyUnicode_CHECK_INTERNED() optimization? The PyUnicode_CHECK_INTERNED() test is as old as the _PyUnicode_EqualToASCIIId() function. commit f5894dd
|
When the bug occurs, I see that the Python stdlib abc.py file is loaded twice: the main interpreter builds a code object, and then subinterpreter builds its own code object: same content, but different Python object (at different memory addresses so inequal pointers!). I modified reproducer.c to add "Py_VerboseFlag = 1;" before the Py_Initialize() call. Truncated output: ...
# code object from /opt/py310/lib/python3.10/abc.py
...
# code object from /opt/py310/lib/python3.10/abc.py
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
...
RuntimeError: super(): __class__ cell not found
... |
There are around 27 _PyUnicode_EqualToASCIIId() calls in the Python code base. I don't think that avoiding _PyUnicode_EqualToASCIIId() is a good solution :-) Fixing _PyUnicode_EqualToASCIIId() to make it compatible with subinterpreters sound more reasonable: remove the PyUnicode_CHECK_INTERNED() test optimization. |
Should |
Right now, there still many cases where objects are still shared between two interpreters:
More details in the following issues: |
That's too bad. |
If two strings are interned and part of the same interpreter, the "ptr1 == ptr2" comparison continues to work. What is not compatible with subinterpreters, which have other interned string objects, is the assumption that two interned strings are not equal if both strings are interned and pointeres are not equal. _PyUnicode_EqualToASCIIId() is the only function making this assumption.
None of these functions have a fast path for interned strings. Well, _PyUnicode_EqualToASCIIId() is different since right is always an interned string. Note: unicode_compare() is strange, it requires both strings to be equal, but it does not consider that two strings are not equal if their kinds are not equal. If both strings are ready and their kinds are not equal, the two strings cannot be equal... unicode_compare_eq() returns 0 if the two kinds are not equal. |
Yeah, AFAIK Comparing two interned strings from different interpreters are not the available use-case. |
I marked bpo-46034 as a duplicate: setting a class "__init__" attribute doesn't update properly its tp_init slot. update_slot() of Objects/typeobject.c only uses a fast pointer comparison since both strings are interned, but the test fails if the two strings are part of two different Python interpreters. In bpo-46034 case, the problem is that the "slotdefs" array uses interned strings created in the main interpreter, whereas the update_slot() name parameter (Python string) was created in a sub-interpreter. Reproducer: import _testcapi
code = r"""class Greeting():
pass
def new_init(inst, name):
inst.text = f"Hello, {name}\n"
Greeting.__init__ = new_init Greeting("Miro")""" _testcapi.run_in_subinterp(code) ========================================= |
Attachd cmp_interned_strings.patch fix _PyUnicode_EqualToASCIIId() and update_slot() for subinterpreters. I will create a PR once we agree if it's required to support subinterpreters there, and if there is no better (faster) option. For update_slot(), one option to avoid adding a "slow" _PyUnicode_EQ() call would be to have per-interpreter interned strings for the slotdefs array. |
I created PR 30123 to fix _PyUnicode_EqualToASCIIId() and type update_slot() functions. I added comments explaining why we can no longer optimize the comparison of two interned string objects. |
These bug prevent the Fedora infra team from upgrading the Koji builders to Fedora 35. Koji runs on mod_wsgi which is affected by the bug: |
See also bpo-46070: I don't know if it's related. |
The problem here is that different sub-interpreters have different strings for the same Python string. Unless sub-interpreters are fully independent, and they cannot be due to limitations imposed by the stable API, then all sub-interpreters must share the same poll of strings. Since the only object reachable from a string is the As Victor points out, there is no bug in 3.9 because interned strings are common across all interpreter. We should revert that behavior. |
It sounds like this bug is another case where we have made some objects For now can we just move the relevant per-interpreter strings from Personally, I'd rather we not revert the original change. Moving the data Of course, the potential bug would still exist in _PyUnicode_EqualToASCIIId(). |
FWIW, it makes sense to me for the interned strings to be per-interpreter eventually. However, if we end up with immortal objects then I think all the strings created |
I've created a PR for moving the interned strings and identifiers to _PyRuntimeState until we are ready to move them back to the interpreter. |
If that seems okay, I'll work on a backport PR for 3.10. |
are there any objections to my PR? |
Even in 3.10? Your PR looks pretty heavy for a bugfix release, and won't apply cleanly to 3.10.
I guess that's what makes the PR hard to review. What's the plan (or are there more conflicting plans), and what's the state in 3.10 vs. main? |
IMO writing a complete rationale for running multiple interpreters in parallel which require a whole PEP. I didn't write such PEP yet since there are still non-trivial technical issues, especially the problem of static types: bpo-40601. I don't have time right now to measure the performance overhead of PR 30123 on _PyUnicode_EqualToASCIIId() and on changing a type attribute (like overriding the __init__() method). I expect a minor overhead on micro-benchmark and no significant change on pyperformance macrobenchmarks. But again, right I don't have time to go through such analysis. The priority for now is to unblock the Python 3.11 release and repair the Python 3.10 regression, so I'm fine with reverting my commit ea25180 which introduced the regression. PR 30131 makes more changes than just reverting the commit, it changes the _PyRuntimeState structure and it also reverts the identifiers change, whereas so far, there is no known relationship between this issue and identifiers. IMO it's ok to leave identifiers unchanged. |
+1 on just reverting in both branches. I can deal with my stuff separately. |
FYI, I'm planning on having such a PEP published in the next few days. |
_PyUnicode_EqualToASCIIId() and type update_slot() functions are fixed in 3.10 and main branches. The regression is now fixed. But the revert reintroduces the issue on subinterpreters, so I created bpo-46283: "[subinterpreters] Unicode interned strings must not be shared between interpreters". |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: