ENH: Please support subinterpreters #24755

Open
mkostousov opened this issue Sep 20, 2023 · 6 comments
@mkostousov

Proposed new feature or change:

NumPy 1.25.1, Python 3.12.0rc2
After creating a subinterpreter with its own GIL via the Python C API:

PyInterpreterConfig config = {
    /* Reject extension modules that do not declare multi-interpreter support */
    .check_multi_interp_extensions = 1,
    /* Give the subinterpreter its own GIL (PEP 684) */
    .gil = PyInterpreterConfig_OWN_GIL,
};
PyThreadState *tstate = NULL;
PyStatus status = Py_NewInterpreterFromConfig(&tstate, &config);
if (PyStatus_Exception(status)) {
    return -1;
}

importing numpy raises an exception:
ImportError: module numpy.core._multiarray_umath does not support loading in subinterpreters
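The same failure can be reproduced from pure Python, without embedding. A minimal sketch, assuming the private _xxsubinterpreters module that ships with CPython 3.12 (an unstable, internal module that may change in later releases):

import _xxsubinterpreters as interpreters

interp = interpreters.create()  # isolated subinterpreter
try:
    # Fails with RunFailedError wrapping the ImportError quoted above
    interpreters.run_string(interp, "import numpy")
finally:
    interpreters.destroy(interp)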

@mattip changed the title from "Support for subinterpreters" to "ENH: Please support subinterpreters" on Sep 21, 2023
@mattip (Member) commented Sep 21, 2023

PEP 554 states:

To mitigate that impact and accelerate compatibility, we will do the following:

  • be clear that extension modules are not required to support use in multiple interpreters
  • raise ImportError when an incompatible module is imported in a subinterpreter
  • provide resources (e.g. docs) to help maintainers reach compatibility
  • reach out to the maintainers of Cython and of the most used extension modules (on PyPI) to get feedback and possibly provide assistance

The PEP also links to Isolating Extensions, which has a lot of theory but does not clearly state how to migrate a large existing C-extension library like NumPy to support subinterpreters. I think we would need to:

  • move to HeapTypes
  • move all static state into module state
  • carefully analyze code for possible shared state.

I am a bit unclear on whether subinterpreters share a single GIL; if not, we would also have to carefully examine the code for possible race conditions.

This is a lot of work, and may have performance implications. What is your use case for subinterpreters? Do you think you could help with the effort or find funding for this effort?

@seberg (Member) commented Sep 26, 2023

This is a lot of work, and may have performance implications. What is your use case for subinterpreters? Do you think you could help with the effort or find funding for this effort?

I suspect the vast majority of changes will be relatively easy, but there is still the same problem: we need someone to explicitly dedicate time to this, and I doubt it will be one of the current core devs.
We even added a warning a long time back saying exactly that, but it seems the CPython changes made to improve subinterpreter support in the long run now enforce an error rather than a warning.

@a-reich commented Oct 10, 2023

PEP 554 states: …

FWIW, the recent CPython changes should be due to PEP 684 ("Per-Interpreter GIL"); PEP 554, which covers the Python API and subinterpreter-management features, is still in draft status.

@mdekstrand

Since PEP 684 there's a very strong use case for subinterpreters in parallel processing, one that I expect would be useful to a lot of numpy client code: running subinterpreters in separate threads enables shared memory (at least in the read-only case) with significantly less hassle than multiprocessing (a sketch follows below).
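A hypothetical sketch of that pattern, again assuming the private _xxsubinterpreters module from CPython 3.12. Today each worker would fail on import numpy with the error from this issue; once numpy supports subinterpreters, the workers would run in parallel under per-interpreter GILs:

import threading
import _xxsubinterpreters as interpreters

def run_in_subinterpreter(script):
    interp = interpreters.create()  # isolated: its own GIL under PEP 684
    try:
        interpreters.run_string(interp, script)
    finally:
        interpreters.destroy(interp)

script = "import numpy"  # currently raises; see the error at the top of this issue
threads = [threading.Thread(target=run_in_subinterpreter, args=(script,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()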

@a-reich commented Nov 10, 2023

I’m also very excited about the potential of using subinterpreters with numpy, and agree with what @mdekstrand said. In particular, the latest draft of PEP 734 discusses sharing data via the buffer protocol (already implemented in the private interpreters module since CPython 3.13.0a1). Since ndarrays can export their buffer or be created from one without copies, this could be a very nice pattern:

  • pickle your array with protocol 5 to get some small serialized metadata plus a memoryview of the data,
  • pass that view to a bunch of interpreters (which is basically instant), along with the small metadata,
  • unpickle in each interpreter: now all of them share the underlying data through their own arrays,
  • and if you don’t want to worry about data races, numpy can handle that by setting the read-only flag.

You get concurrency with performant, opt-in data sharing, without the hassle of managing subprocesses or of multiprocessing.shared_memory, where you have to allocate a fixed-size shared buffer ahead of time and can only create arrays backed by it. With interpreters you can take any array you happen to have and share it easily (see the sketch below).
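A minimal in-process sketch of the zero-copy mechanics. The cross-interpreter hand-off of the memoryview would use the PEP 734 draft API; here the round trip stays in one interpreter just to show that no data is copied:

import pickle
import numpy as np

arr = np.arange(1_000_000, dtype=np.float64)
arr.setflags(write=False)  # mark read-only so concurrent readers cannot race

# Protocol 5 splits the pickle into small metadata plus out-of-band buffers.
buffers = []
meta = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)
# buffers now holds one PickleBuffer viewing arr's memory; nothing was copied.

# Under PEP 734, `meta` (small bytes) and the buffer would be sent to each
# subinterpreter; unpickling reconstructs an array over the shared memory.
shared = pickle.loads(meta, buffers=buffers)
assert np.shares_memory(arr, shared)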
