Clarify what is a "sub-interpreter" and what is an "interpreter". #69

Open
markshannon opened this issue Oct 19, 2020 · 12 comments

Comments

@markshannon

PEP 554 is entitled "Multiple Interpreters in the Stdlib", yet the term "subinterpreters" is used throughout this repo.

There is the additional confusion of the C struct names.
It seems to me that the C struct PyInterpreterState corresponds to the sub-interpreter and that the C struct _PyRuntimeState corresponds to the interpreter.

Confusion about which is which makes the goals of this project unclear, and I fear it may have resulted in some unnecessary work, as data structures are moved to PyInterpreterState that could more easily, and with less impact, have been moved to (or left in) _PyRuntimeState.

@encukou
Collaborator

encukou commented Oct 20, 2020

IMO, "subinterpreter" is not a good term; generally we should aim to make all interpreters equal (though that can be a long-term goal).

@markshannon
Author

Sub-interpreters already exist, whether we like the term or not.
They share the same heap, although they cannot see each other's sub-heap, just common objects like builtin types and numbers.

Why not leave them working as they do now, and enable multiple interpreters?
That way seems easier to implement in practice, and causes less breakage (at least, no more breakage).

@encukou
Collaborator

encukou commented Oct 20, 2020

As far as I can see, "sub-interpreter" and "interpreter" are basically interchangeable terms at this point. See e.g. the first two sentences in the Py_NewInterpreter docs.
They share some objects like builtin types and numbers, which are immutable and currently OK to share, until you want a per-interpreter GIL, which is one of the goals in this repo.
And unfortunately they also sometimes share some objects which they shouldn't, like anything that references a Python function's globals, so I'd rather fix them, not leave them working as they do now.

_PyRuntimeState holds the stuff that's common to all (sub-)interpreters, such as, well, the list of (sub-)interpreters. Everything else should be per-(sub)interpreter.
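
To make that split concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not CPython's actual struct layout: the class names `RuntimeState` and `InterpreterState` and the fields `interpreters`, `next_id`, and `modules` are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InterpreterState:
    """Toy stand-in for per-interpreter state (not CPython's real layout)."""
    interp_id: int
    # Each interpreter gets its own module table, so imports stay isolated.
    modules: Dict[str, object] = field(default_factory=dict)


@dataclass
class RuntimeState:
    """Toy stand-in for process-wide state common to all interpreters."""
    # The list of interpreters is the canonical example of runtime-wide state.
    interpreters: List[InterpreterState] = field(default_factory=list)
    next_id: int = 0

    def new_interpreter(self) -> InterpreterState:
        interp = InterpreterState(interp_id=self.next_id)
        self.next_id += 1
        self.interpreters.append(interp)
        return interp


runtime = RuntimeState()
first = runtime.new_interpreter()
second = runtime.new_interpreter()

# State stored on one interpreter is invisible to its siblings.
first.modules["spam"] = object()
```

In this model, anything stored on `runtime` is visible to every interpreter, while anything stored on `first` stays private to it.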

@markshannon
Author

The problem with that approach is that it involves a lot of moving stuff from _PyRuntimeState to PyInterpreterState.
Wouldn't allowing several _PyRuntimeState be less work as it already has a GIL?
It would also allow subinterpreters to work as they currently do.

Until multiple interpreters can run in parallel, moving global state into _PyRuntimeState has no adverse impact on performance. Moving that state into PyInterpreterState slows things down.

@encukou
Collaborator

encukou commented Oct 20, 2020

> Wouldn't allowing several _PyRuntimeState be less work as it already has a GIL?

I doubt it – you'd need to make a per-_PyRuntimeState GIL, whereas in the current approach you'd need to make a per-PyInterpreterState GIL. The main issues, like making sure threads don't mangle a shared object's refcounts, are basically the same.
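
The refcount concern can be illustrated without real threads by replaying one possible interleaving by hand. This is a sketch under stated assumptions, not CPython code: `SharedObject` and its plain integer `refcount` are hypothetical, and the "threads" are just ordered statements standing in for two interpreters that hold different GILs and so do not exclude each other.

```python
class SharedObject:
    """Toy object with a manual reference count (hypothetical, for illustration)."""

    def __init__(self):
        self.refcount = 1


def interleaved_incref(obj):
    """Replay two increments whose read and write steps overlap."""
    seen_by_a = obj.refcount      # thread A reads 1
    seen_by_b = obj.refcount      # thread B reads 1 before A writes
    obj.refcount = seen_by_a + 1  # thread A writes 2
    obj.refcount = seen_by_b + 1  # thread B also writes 2: one increment is lost


def serialized_incref(obj):
    """Replay the same two increments when a common lock serializes them."""
    obj.refcount = obj.refcount + 1  # thread A, holding the shared lock
    obj.refcount = obj.refcount + 1  # thread B, after A releases it


racy = SharedObject()
interleaved_incref(racy)

safe = SharedObject()
serialized_incref(safe)
```

The same lost update can happen whether the separate locks belong to two runtimes or to two interpreters, which is why the choice of struct does not by itself remove the problem for shared objects.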

What exactly do you mean by allowing subinterpreters to work as they currently do?

@markshannon
Author

All sub-interpreters share the same heap (even though they can see different parts of it) and share the GIL.

@encukou
Collaborator

encukou commented Oct 21, 2020

So, to clarify, under your proposal with multiple _PyRuntimeState, we would plan to make one GIL per _PyRuntimeState?
Would sub-interpreters from different _PyRuntimeStates not share the heap?

@markshannon
Author

Doesn't sharing a heap between interpreters require synchronization for the cycle GC?

@markshannon
Author

My main point is that without clearer naming, it is impossible to discuss these alternatives without a lot of confusion.

@encukou
Collaborator

encukou commented Oct 21, 2020

OK. Here's my take.
You can have multiple interpreters in a single process. They should be isolated from each other; we're working on improving that isolation.
The term subinterpreter essentially means the same thing as interpreter. There are subtle differences:

  • If you start one interpreter from another, you'd call the child a "subinterpreter". (But you can also start interpreters from pure C code, and subinterpreters should be able to outlive their parents, though I don't think the high-level API is built for that.)
  • Saying "subinterpreters" makes it clear that you're working on better support for multiple interpreters, as opposed to improving other aspects of Python. Not a very good label, IMO, but it's what's used.

As for an earlier question, I don't think that moving stuff from _PyRuntimeState to PyInterpreterState is more work than allowing several _PyRuntimeState. But then, I'm not the one actually doing that work.

@ericsnowcurrently
Owner

The key detail is that there is a "main" interpreter:

  • created during runtime initialization
  • used during runtime initialization
  • used during runtime finalization
  • the initial interpreter exposed to users
  • has the "main" thread

We have been calling all other interpreters in the runtime "subinterpreters".

FWIW, in the context of PEP 554, we start at the main interpreter. Each new interpreter then effectively ends up as a node in an implicit tree relative to the "parent" interpreter under which the new one was created. However, that isn't fundamental at the C level.
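
A rough way to picture the distinction is a toy model in which the runtime keeps a flat list of interpreters and the parent link is extra bookkeeping. This is only a sketch: `Runtime`, `Interpreter`, `create`, `is_subinterpreter`, and `depth` are hypothetical names, and none of this mirrors the real C structs or the PEP 554 API.

```python
class Interpreter:
    """Toy interpreter record; the parent link is illustrative bookkeeping only."""

    def __init__(self, interp_id, parent=None):
        self.interp_id = interp_id
        self.parent = parent


class Runtime:
    """Toy runtime: a flat list of interpreters plus one distinguished main one."""

    def __init__(self):
        self.interpreters = []
        # The main interpreter is created during runtime initialization.
        self.main = self.create(parent=None)

    def create(self, parent):
        interp = Interpreter(len(self.interpreters), parent)
        self.interpreters.append(interp)
        return interp

    def is_subinterpreter(self, interp):
        # Every interpreter other than the main one counts as a subinterpreter.
        return interp is not self.main


def depth(interp):
    """Count the parent links between an interpreter and the root of its tree."""
    steps = 0
    while interp.parent is not None:
        interp = interp.parent
        steps += 1
    return steps


runtime = Runtime()
child = runtime.create(parent=runtime.main)
grandchild = runtime.create(parent=child)
```

Here the tree exists only through the `parent` attribute; the runtime itself sees three peers in one list, with the main interpreter singled out.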

@ericsnowcurrently
Owner

ericsnowcurrently commented Oct 22, 2020

FYI, the C-API docs have a paragraph explaining the distinction (thanks to @nanjekyejoannah).

@markshannon, do you think it would help to have more detail there? (IMHO, there isn't much more to say than what we say there.)

3 participants