[subinterpreters] Design a subinterpreter friendly alternative to _Py_IDENTIFIER #83646
Both #18066 (collections module) and #18032 (asyncio module) ran into the problem where porting them to multi-phase initialisation involves replacing their usage of the _Py_IDENTIFIER macro. When _posixsubprocess was ported, the replacement was a relatively ad hoc combination of string interning and the interpreter-managed module-specific state: 5a7d2e1 I'm wondering if we may be able to devise a comparable struct-field based system that replaces the
And then the following additional state management macros would be needed to handle the string interning and reference counting:
// Module state struct declaration
typedef struct {
    // This would declare an initialised array of _Py_Identifier structs
    // under a name like __cached_identifiers__. The end of the array
    // would be indicated by a struct with "value" set to NULL.
    _Py_START_CACHED_IDENTIFIERS;
    _Py_CACHED_IDENTIFIER(disable);
    _Py_CACHED_IDENTIFIER(enable);
    _Py_CACHED_IDENTIFIER(isenabled);
    _Py_END_CACHED_IDENTIFIERS;
} _posixsubprocessstate;
With the requirement to declare usage of the cached identifiers, they could be lazily initialized the same way the existing static variables are (even re-using the same struct declaration).
Note: this is just a draft of one possible design. The intent of this issue is to highlight that this problem has now come up multiple times, and it would be good to have a standard answer available.
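To make the lazy-initialisation idea concrete, here is a minimal sketch under stated assumptions. It does not use the draft `_Py_START_CACHED_IDENTIFIERS` macros; instead it swaps in an X-macro list, and heap-allocated C strings stand in for interned str objects. Every `cid_` / `CID_` name is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* X-macro listing the identifiers a module declares (names from the draft). */
#define CID_IDENTIFIERS(X) \
    X(disable)             \
    X(enable)              \
    X(isenabled)

/* One enum constant per declared identifier, plus a trailing count. */
#define CID_AS_ENUM(name) CID_##name,
enum { CID_IDENTIFIERS(CID_AS_ENUM) CID_COUNT };

/* Shared, read-only table of the identifier spellings. */
#define CID_AS_STRING(name) #name,
static const char *const cid_names[CID_COUNT] = {
    CID_IDENTIFIERS(CID_AS_STRING)
};

/* Hypothetical per-module state: one cache slot per declared identifier.
   A slot stays NULL until the identifier is first used. */
typedef struct {
    char *cached[CID_COUNT];
} cid_module_state;

/* Return the cached value, creating it on first use.
   Returns NULL if the allocation fails. */
static const char *cid_get(cid_module_state *state, int id) {
    assert(state != NULL && id >= 0 && id < CID_COUNT);
    if (state->cached[id] == NULL) {
        size_t len = strlen(cid_names[id]) + 1;
        char *copy = malloc(len);
        if (copy == NULL) {
            return NULL;
        }
        memcpy(copy, cid_names[id], len);
        state->cached[id] = copy;
    }
    return state->cached[id];
}

/* Release every cached value, as a module "clear" hook might. */
static void cid_clear(cid_module_state *state) {
    for (int i = 0; i < CID_COUNT; i++) {
        free(state->cached[i]);
        state->cached[i] = NULL;
    }
}
```

A zero-initialised `cid_module_state` starts with every slot empty; `cid_get(&state, CID_enable)` fills the slot on the first call and returns the same pointer afterwards, while a second state instance gets its own copy.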
I once discussed this with Eric Snow during a core developer sprint: _Py_IDENTIFIER() should use an "interpreter local storage" for identifier values. _Py_IDENTIFIER() would only be a "key", and _PyUnicode_FromId() would store the value in a hash table stored in PyInterpreterState. Something similar to the TSS API, but per interpreter rather than per thread. The key can simply be the variable's address in memory. It only has to be unique within the process.
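The "interpreter local storage" idea can be sketched as a small table keyed by the identifier's address. This is only an illustration under assumptions: `ils_interp` stands in for the relevant part of PyInterpreterState, `ils_from_id()` plays the role of _PyUnicode_FromId(), C strings stand in for str objects, and the fixed-size open-addressing table is a simplification chosen to keep the sketch short. All `ils_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for _Py_Identifier: its address is the key. */
typedef struct {
    const char *string;
} ils_identifier;

#define ILS_TABLE_SIZE 64  /* power of two, fixed size for brevity */

typedef struct {
    const ils_identifier *key;  /* NULL marks an empty slot */
    char *value;                /* this interpreter's own copy */
} ils_slot;

/* Hypothetical stand-in for the per-interpreter state. */
typedef struct {
    ils_slot slots[ILS_TABLE_SIZE];
} ils_interp;

/* Hash a pointer into a table index. */
static size_t ils_hash(const void *p) {
    uintptr_t x = (uintptr_t)p;
    x ^= x >> 4;
    x *= (uintptr_t)2654435761u;
    return (size_t)x & (ILS_TABLE_SIZE - 1);
}

/* Look up the value for `id` in `interp`, creating it on first use.
   Returns NULL if allocation fails or the table is full. */
static const char *ils_from_id(ils_interp *interp, const ils_identifier *id) {
    assert(interp != NULL && id != NULL);
    size_t start = ils_hash(id);
    for (size_t probe = 0; probe < ILS_TABLE_SIZE; probe++) {
        size_t index = (start + probe) & (ILS_TABLE_SIZE - 1);
        ils_slot *slot = &interp->slots[index];
        if (slot->key == id) {
            return slot->value;
        }
        if (slot->key == NULL) {
            size_t len = strlen(id->string) + 1;
            char *copy = malloc(len);
            if (copy == NULL) {
                return NULL;
            }
            memcpy(copy, id->string, len);
            slot->key = id;
            slot->value = copy;
            return copy;
        }
    }
    return NULL;
}

/* Free every value owned by this interpreter. */
static void ils_interp_clear(ils_interp *interp) {
    for (size_t i = 0; i < ILS_TABLE_SIZE; i++) {
        free(interp->slots[i].value);
        interp->slots[i].key = NULL;
        interp->slots[i].value = NULL;
    }
}
```

With two zero-initialised `ils_interp` instances and a single `ils_identifier` variable, each interpreter ends up with its own value for the same key, which is the property the proposal is after.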
What is the problem between _Py_IDENTIFIER and multi-phase initialisation modules? If the two are incompatible, we may need a different but similar API: values would be stored in a hash table per module object. The hash table can be stored in the module object directly, or it can be stored in a second hash table (module => hash table). If we want a unified API, maybe we can use module=NULL (or any other marker) for "global" identifiers (not specific to a module).
AFAIK there is no problem now, except possibly a race condition when initializing the identifiers. The problem will come with per-interpreter reference counting, or when the
The GIL avoids any risk of race condition, no?
Looks like the GIL would affect performance more or less?
+1. IMHO, for those two cases, the simplest idea is to move IDENTIFIER to moduleState, which would use more memory than keeping it in InterpreterState.
As Petr notes, as long as all subinterpreters share the GIL, and share str instances, then the existing _Py_IDENTIFIER mechanism will work fine for both single phase and multi-phase initialisation.
However, that constraint also goes the other way: as long as we have modules that use the existing _Py_IDENTIFIER mechanism, then subinterpreters *must* share str instances, and hence *must* share the GIL.
Hence the "enhancement" classification: there's nothing broken right now, but if we're ever going to achieve the design goal of using subinterpreters to exploit multiple CPU cores without the overhead of running multiple full interpreter processes, we're going to need to design a different way of handling this. Something to keep in mind with
The reason multi-phase initialisation makes this more complicated is that it means we can't use the memory addresses of C process globals as unique identifiers any more, since more than one module object may be created from the same C shared library. However, if we assume we've moved to per-module state storage (to get unique memory addresses back), then we can largely re-use the existing
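The address-uniqueness point can be shown with a small sketch. It assumes nothing about CPython internals: `key_module_new()` merely stands in for creating a second module object from the same shared library, and all `key_` names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    const char *string;
} key_identifier;

/* A C process global: one copy per loaded shared library,
   no matter how many module objects are created from it. */
static key_identifier key_global_enable = {"enable"};

/* Hypothetical per-module state holding its own identifier slot. */
typedef struct {
    key_identifier enable;
} key_module_state;

/* Stand-in for module creation: each call yields a fresh state block.
   Returns NULL if the allocation fails. */
static key_module_state *key_module_new(void) {
    key_module_state *state = malloc(sizeof(*state));
    if (state != NULL) {
        state->enable.string = "enable";
    }
    return state;
}

/* Key derived from the process global: identical for every module. */
static const void *key_from_global(const key_module_state *state) {
    (void)state;
    return &key_global_enable;
}

/* Key derived from per-module state: distinct for every module. */
static const void *key_from_state(const key_module_state *state) {
    assert(state != NULL);
    return &state->enable;
}
```

For two states returned by `key_module_new()`, `key_from_global()` yields the same address for both, so it cannot tell the module objects apart, while `key_from_state()` yields two different addresses.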
I created bpo-40602: "Move Modules/hashtable.h to Include/internal/pycore_hashtable.h".
Attached bench.py: micro-benchmark on _PyUnicode_FromId(). It requires the attached bench.patch to be applied.
This change introduced a subtle regression: bpo-46006 "[subinterpreter] _PyUnicode_EqualToASCIIId() issue with subinterpreters".