Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single-Phase Init Extension Module Init Functions Still Run in Isolated Interpreters #117953

Open
ericsnowcurrently opened this issue Apr 16, 2024 · 3 comments
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-subinterpreters type-bug An unexpected behavior, bug, or error

Comments

@ericsnowcurrently
Copy link
Member

ericsnowcurrently commented Apr 16, 2024

Bug report

Bug description:

When an extension module is imported the first time, we load the shared-object file and get the module's init function from it. We then run that init function and use the returned object to decide if the module is single-phase init or multi-phase init.

For isolated subinterpreters, where PyInterpreterConfig.check_multi_interp_extensions (AKA Py_RTFLAGS_MULTI_INTERP_EXTENSIONS) is True, we immediately fail single-phase init modules. The problem is that at that point the init function has already run, which all sorts of potential side effects and process-global state (including registered callbacks) that we mostly can't clean up.

This has come up before, for example with the readline module. It's potentially a bigger problem than I thought at first, so I'd like to get it sorted out for 3.13.


FWIW, the simplest solution I can think of is to always call the module init func from the main interpreter (without necessarily doing all the other import steps there). That would look something like this:

  1. start a normal import in an isolated subinterpreter
  2. get the init function
  3. switch to the main interpreter
  4. call the init function
  5. switch back
  6. fail if it is single-phase init (and remember that fact)

For the main interpreter and non-isolated subinterpreters, nothing different would happen from now; there would be no switching. Also, if the first attempt was in an isolated interpreter (which would fail), a subsequent import of that module in the main interpreter (or a non-isolated one) would succeed.

The only tricky part is, when the init function raises an exception, how to an propagate it from the main interpreter to the subinterpreter. For multi-phase init (if known) we would just call the init func again after switching back. For single-phase init (or unknown) we'd have preserve the exception somehow. This is something I had to deal with for _interpreters.exec(), but I'm not sure the same thing will work here.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

@ericsnowcurrently ericsnowcurrently added type-bug An unexpected behavior, bug, or error interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-subinterpreters 3.13 bugs and security fixes labels Apr 16, 2024
ericsnowcurrently added a commit that referenced this issue Apr 23, 2024
)

This is a collection of very basic cleanups I've pulled out of gh-118116.  It is mostly renaming variables and moving a couple bits of code in functionally equivalent ways.
ericsnowcurrently added a commit that referenced this issue Apr 24, 2024
These are cleanups I've pulled out of gh-118116.  Mostly, this change moves code around to align with some future changes and to improve clarity a little.  There is one very small change in behavior: we now add the module to the per-interpreter caches after updating the global state, rather than before.
ericsnowcurrently added a commit that referenced this issue Apr 24, 2024
…inglephase or Not (gh-118193)

This change makes other upcoming changes simpler.
ericsnowcurrently added a commit that referenced this issue Apr 24, 2024
This helps with a later change that splits up _PyImport_LoadDynamicModuleWithSpec().
ericsnowcurrently added a commit that referenced this issue Apr 24, 2024
A couple of refleaks slipped through in gh-118194. This takes care of them.

(AKA _Py_ext_module_loader_info_init() does not steal references.)
ericsnowcurrently added a commit to ericsnowcurrently/cpython that referenced this issue Apr 24, 2024
…-118250)

A couple of refleaks slipped through in pythongh-118194. This takes care of them.

(AKA _Py_ext_module_loader_info_init() does not steal references.)
@encukou
Copy link
Member

encukou commented Apr 29, 2024

Continuing here from the closed PR. @ericsnowcurrently wrote:

Thanks for bringing [m_slots=NULL] up. What you've said makes sense and appreciate the clarity of it. There are indeed some gaps (fairly small, I expect), both functionally and in tests. It isn't clear what the impact is in practice, but I definitely think they should be addressed regardless. I'll be sure to open some issues in the next week or two.

Thanks!

AFAICS, the current state is that is_singlephase() function might return true on some multi-phase modules. And it's currently only used in asserts, where assert(is_singlephase(...)) is fine, but assert(!is_singlephase(...)) runs into the false positive. The current use of the latter is in create_builtin, which is AFAIK only used modules that don't have m_slots=NULL. Another use in #118203 broke a test.

IMO, current PRs should simply avoid assert(!is_singlephase); in another PR the function should become assert_is_singlephase(), to limit future uses to the case that works.

FWIW, the general thought of kinds of extension modules was already on my mind (and a bit clearer in my original mega-PR, #118116). For now I have a later PR (#118205) that is a bit more deliberate about keeping track of what the init function returns. While that PR doesn't do so currently, I might look at explicitly tracking the kind on the module def. That would help address the point you've brought up.

Yes, looks like this tracking will be needed in order to use this for more than asserts.
I guess “tracking the kind on the module def” is a typo and you meant “he kind of the module def”? (PyModuleDef is part of the stable ABI, and it doesn't have space for more data.)

ericsnowcurrently added a commit that referenced this issue Apr 29, 2024
Basically, I've turned most of _PyImport_LoadDynamicModuleWithSpec() into two new functions (_PyImport_GetModInitFunc() and _PyImport_RunModInitFunc()) and moved the rest of it out into _imp_create_dynamic_impl().  There shouldn't be any changes in behavior.

This change makes some future changes simpler.  This is particularly relevant to potentially calling each module init function in the main interpreter first.  Thus the critical part of the PR is the addition of _PyImport_RunModInitFunc(), which is strictly focused on running the init func and validating the result.  A later PR will take it a step farther by capturing error information rather than raising exceptions.

FWIW, this change also helps readers by clarifying a bit more about what happens when an extension/builtin module is imported.
@ericsnowcurrently
Copy link
Member Author

(PyModuleDef is part of the stable ABI, and it doesn't have space for more data.)

Right. Tracking it *on the module def would require hiding the bit(s) in one of the existing fields. That's doable, but I'd do that only if there wasn't another good place to stick the info.

ericsnowcurrently added a commit that referenced this issue Apr 29, 2024
…nsions (gh-118204)

This change will make some later changes simpler. It also brings more consistent behavior and lower maintenance costs.
ericsnowcurrently added a commit that referenced this issue May 1, 2024
…chinery (gh-118205)

This change will make some later changes simpler.
ericsnowcurrently added a commit that referenced this issue May 3, 2024
This change will make some later changes simpler.
ericsnowcurrently added a commit that referenced this issue May 4, 2024
We have only been tracking each module's PyModuleDef.  However, there are some problems with that.  For example, in some cases we load single-phase init extension modules from def->m_base.m_init or def->m_base.m_copy, but if multiple modules share a def then we can end up with unexpected behavior.

With this change, we track the following:

* PyModuleDef (same as before)
* for some modules, its init function or a copy of its __dict__, but specific to that module
* whether it is a builtin/core module or a "dynamic" extension
* the interpreter (ID) that owns the cached __dict__ (only if cached)

This also makes it easier to remember the module's kind (e.g. single-phase init) and if loading it previously failed, which I'm doing separately.
ericsnowcurrently added a commit that referenced this issue May 7, 2024
)

This ensures the kind is always either _Py_ext_module_kind_SINGLEPHASE or _Py_ext_module_kind_MULTIPHASE.
ericsnowcurrently added a commit that referenced this issue May 7, 2024
…h-118157)

This change makes sure all extension/builtin modules have their init function run first by the main interpreter before proceeding with import in the original interpreter (main or otherwise).  This means when the import of a single-phase init module fails in an isolated subinterpreter, it won't tie any global state/callbacks to the subinterpreter.
@ericsnowcurrently
Copy link
Member Author

The core change has landed, but there are a few small follow-up things to wrap up.

SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024
…ort Machinery (pythongh-118205)

This change will make some later changes simpler.
SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024
SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024
…ongh-118532)

We have only been tracking each module's PyModuleDef.  However, there are some problems with that.  For example, in some cases we load single-phase init extension modules from def->m_base.m_init or def->m_base.m_copy, but if multiple modules share a def then we can end up with unexpected behavior.

With this change, we track the following:

* PyModuleDef (same as before)
* for some modules, its init function or a copy of its __dict__, but specific to that module
* whether it is a builtin/core module or a "dynamic" extension
* the interpreter (ID) that owns the cached __dict__ (only if cached)

This also makes it easier to remember the module's kind (e.g. single-phase init) and if loading it previously failed, which I'm doing separately.
SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024
…ythongh-118684)

This ensures the kind is always either _Py_ext_module_kind_SINGLEPHASE or _Py_ext_module_kind_MULTIPHASE.
SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024
…irst (pythongh-118157)

This change makes sure all extension/builtin modules have their init function run first by the main interpreter before proceeding with import in the original interpreter (main or otherwise).  This means when the import of a single-phase init module fails in an isolated subinterpreter, it won't tie any global state/callbacks to the subinterpreter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-subinterpreters type-bug An unexpected behavior, bug, or error
Projects
Status: Todo
Development

No branches or pull requests

2 participants