Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Interpreter Isolation #100227

Closed
ericsnowcurrently opened this issue Dec 14, 2022 · 18 comments
Closed

Improve Interpreter Isolation #100227

ericsnowcurrently opened this issue Dec 14, 2022 · 18 comments
Assignees
Labels
3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-subinterpreters type-feature A feature request or enhancement

Comments

@ericsnowcurrently
Copy link
Member

ericsnowcurrently commented Dec 14, 2022

In CPython, we store (persistent) runtime data in several locations. (See https://github.com/ericsnowcurrently/multi-core-python/wiki/3-CPython's-Runtime#cpythons-run-time-data.)

The state of each interpreter (PyInterpreterState) encapsulates the data which is unique to that interpreter and isolated from other interpreters. That isolation is important for the proper operation of multiple interpreters in a process, and will be critical for a per-interpreter GIL, AKA PEP 684 (assuming it gets approved). There are still a number of gaps in isolation which we will address here.

The deficiencies may be categorized by the following:

  • shared runtime state (_PyRuntimeState) conceptually better suited to each interpreter
  • shared runtime state currently protected by the GIL
  • shared global variables that shouldn't be shared
  • shared global variables currently protected by the GIL

Linked PRs

@ericsnowcurrently ericsnowcurrently added type-feature A feature request or enhancement interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-subinterpreters 3.12 bugs and security fixes labels Dec 14, 2022
@ericsnowcurrently ericsnowcurrently self-assigned this Dec 14, 2022
@ericsnowcurrently
Copy link
Member Author

ericsnowcurrently commented Dec 14, 2022

A related analysis of _PyRuntimeState and the remaining global variables is found at https://github.com/ericsnowcurrently/multi-core-python/wiki/3-CPython's-Runtime#isolating-interpreter-state.

In summary, the following state needs attention:

  • isolate to PyInterpreterState
    • _PyRuntimeState.obmalloc  (issue)
    • _PyRuntimeState.signals.default_handler (move to module state)
    • _PyRuntimeState.signals.ignore_handler (move to module state)
    • _PyRuntimeState.imports.lock
    • _PyRuntimeState.imports.find_and_load
    • _PyRuntimeState.dtoa
    • _PyRuntimeState.dict_state.global_version
    • _PyRuntimeState.dict_state.next_keys_version
    • _PyRuntimeState.func_state.next_version
    • _PyRuntimeState.types.next_version_tag  (issue) (special-case for global/immortal types)
    • _PyRuntimeState.cached_objects.str_replace_inf
    • _PyRuntimeState.cached_objects.interned
  • currently protected by the GIL (need granular locks for per-interpreter GIL)
    • _PyRuntimeState.exitfuncs (race in Py_AtExit())
    • _PyRuntimeState.nexitfuncs (race in Py_AtExit())
    • _PyRuntimeState.allocators (PyMem_SetAllocator() with "wrappers"; also when using mem/obj allocators)
    • _PyRuntimeState.audit_hook_head (race in PySys_AddAuditHook())
    • _PyRuntimeState.ceval.perf
    • Objects/longobject.c - (long_from_non_binary_base) log_base_BASE (lazy)
    • Objects/longobject.c - (long_from_non_binary_base) convwidth_base (lazy)
    • Objects/longobject.c - (long_from_non_binary_base) convmultmax_base (lazy)
  • both (move some state and keep some state; copy)
    • Objects/object.c - _Py_RefTotal  (issue)

Moves to PyInterpreterState with the GIL (not part of this issue):

  • _PyRuntimeState.gilstate.tstate_current

Also note:

  • the GIL is currently fine as it is (part of _PyRuntimeState); moving it to PyInterpreterState is a separate matter
  • we'll handle faulthandler and tracemalloc state separately
old list

The following state needs further investigation and maybe further attention:

  • maybe isolate to PyInterpreterState
    • _PyRuntimeState.allocators.obj_arena (effectively coupled to _PyRuntime.obmalloc)
    • _PyRuntimeState.obmalloc.dump_debug_stats
    • ...
    • _PyRuntimeState.signals.wakeup
    • _PyRuntimeState.signals.is_tripped
    • _PyRuntimeState.signals.unhandled_keyboard_interrupt
    • ...
    • _PyRuntimeState.tracemalloc.traced_memory
    • _PyRuntimeState.tracemalloc.peak_traced_memory
    • _PyRuntimeState.tracemalloc.traces
    • _PyRuntimeState.tracemalloc.domains
    • ...
    • _PyRuntimeState.parser.memo_statistics
  • maybe protected by the GIL
    • _PyRuntimeState.fileutils.force_ascii
    • Modules/posixmodule.c - (os_dup2_impl) dup3_works (lazy)
    • Modules/faulthandler.c - (faulthandler_dump_traceback) reentrant
    • Objects/unicodeobject.c - bloom_linebreak (lazy)
    • Python/pylifecycle.c - (_Py_FatalErrorFormat) reentrant
    • Python/pylifecycle.c - (fatal_error) reentrant
  • currently protected by the GIL but maybe isolate to PyInterpreterState
    • _PyRuntimeState.imports.last_module_index
    • _PyRuntimeState.imports.extensions
    • _PyRuntimeState.imports.pkgcontext
    • ...
    • _PyRuntimeState.tracemalloc.filenames
    • _PyRuntimeState.tracemalloc.traceback
    • _PyRuntimeState.tracemalloc.tracebacks
  • maybe either
    • _PyRuntimeState.open_code_hook
    • _PyRuntimeState.open_code_userdata
    • ...
    • _PyRuntimeState.faulthandler.fatal_error
    • _PyRuntimeState.faulthandler.thread
    • _PyRuntimeState.faulthandler.user_signals
    • _PyRuntimeState.faulthandler.stack
    • _PyRuntimeState.faulthandler.old_stack
    • ...
    • _PyRuntimeState.tracemalloc.config
    • _PyRuntimeState.tracemalloc.allocators
    • _PyRuntimeState.tracemalloc.tables_lock
    • ...
    • Parser/action_helpers.c - (_PyPegen_dummy_name) cache
  • safe as-is but may be unnecessary (or need other solution)
    • _PyRuntimeState.gilstate.check_enabled
    • _PyRuntimeState.gilstate.autoInterpreterState

@corona10
Copy link
Member

corona10 commented Dec 14, 2022

Objects/object.c - _Py_RefTotal

Hmm how about introducing a new debug API: _Py_Get_RefTotal()?
And then move _Py_RefTotal into per interpreter attribute(_ref_total) from the global.
_Py_Get_RefTotal() will return the aggregated value of each interpreter _ref_total value..
Current _Py_RefTotal will only return the main interpreter value and will be deprecated.

Ahh.. But it looks like stable API, I am not sure it's proper approach..

@encukou
Copy link
Member

encukou commented Dec 14, 2022

It's OK to deprecate it. It just can't be removed :)
If PEP 684 is accepted, _Py_RefTotal should be used for old-style interpreters, and interpreters with own_gil could get their own counter.

@markshannon
Copy link
Member

Before proceeding with these changes, I think we need some up-front design.
(TBH I think we need more up-front design for a lot of things in CPython)

Memory allocation

We need some sort of global memory allocator. Is a per-interpreter ob_malloc, backed by a global allocator, presumably malloc the correct design? It might be, but we seem to be making an architectural decision by accident here.

An alternative design for allocation, that I would like to try, is per-interpreter size-based freelists backed by a global allocator.
This allows backwards compatibility with pluggable allocators and is fast. For more details faster-cpython/ideas#132

Tracemalloc

Tracemalloc data structures should be per-interpreter. Access to global values (like process-wide total memory traced) will need to be done through a function interface.

Efficiency

As more and more of the VM becomes per-interpreter, the performance impact of getting the interpreter with a C-API that was not designed with it in mind becomes larger.
@vstinner added a version of many C API functions that take the thread-state, but these aren't joined up, so there is a lot of PyThreadState_Get() calls scattered throughout the code.

We need to decide whether to pass the thread state or interpreter state.
IMO, we should follow the lead of HPy and pass the interpreter state.
We will need new API functions in many places, but many of those will be simple mechanical transformations.

Backwards compatibility

_Py_RefTotal is only defined if Py_REF_DEBUG is defined.
All updates to _Py_RefTotal can be made atomic (which will have terrible performance with multiple parallel interpreters) to preserve the API.

Ownership (runtime or per-interpreter) and constants

Making something const means it can be global (and shouldn't be in the runtime state, which is mutable).
A few of the values listed above are constants. E.g. log_base_BASE

Some values are not const but are constant once initialized. These can be shared without locks. These values should be grouped separately from values that are shared and mutable and will need locks.

Nothing should be in the "maybe both" category above.

@ericsnowcurrently
Copy link
Member Author

FYI, I have a plan for _Py_RefTotal that should preserve stable ABI compatibility without breaking interpreter isolation (and without relying on granular locks). Basically:

  • deprecate _Py_Reftotal but keep exporting the symbol (for stable ABI compatibility)
  • add _PyRuntimeState.object_state.last_legacy_reftotal
  • add _PyRuntimeState.object_state.reftotal
  • update everywhere to use the reftotal field (for simplicity, add #define _Py_RefTotal _PyRuntime.object_state.reftotal)

We would update _Py_GetRefTotal() to something like this:

Py_ssize_t
_Py_GetRefTotal(void)
{
    // _Py_GetLegacyRefTotal() returns the value of the actual `_Py_RefTotal` global.
    Py_ssize_t legacy = _Py_GetLegacyRefTotal();
    _PyRuntimeState.object_state.reftotal += legacy - _PyRuntimeState.object_state.last_legacy_reftotal;
    _PyRuntimeState.object_state.last_legacy_reftotal = legacy;
    return _PyRuntimeState.object_state.reftotal;
}

All this assumes that folks are not using _Py_RefTotal directly.


For a per-interpreter GIL, I see two options:

  1. add a granular lock around modifying _PyRuntimeState.object_state.reftotal, which would hurt the performance of incref/decref, but only on debug builds
  2. switch to PyInterpreterState.object_state.reftotal and aggregate the values from all interpreters in _Py_GetRefTotal()

@ericsnowcurrently
Copy link
Member Author

FYI, I have a plan for _Py_RefTotal that should preserve stable ABI compatibility without breaking interpreter isolation

FWIW, the 3.10 changes to Py_INCREF() and Py_DECREF() help here because _Py_RefTotal is no longer used by stable ABI extensions (other than ones compiled against 3.9 and earlier).

@ericsnowcurrently
Copy link
Member Author

Before proceeding with these changes, I think we need some up-front design.

Yes, for any changes that don't make sense without a per-interpreter GIL, we would not merge them before PEP 684 is accepted.

All the points you have brought up are worth discussing. Note, though, that I consider the current per-interpreter GIL work to be a minimal approach to getting there, with at little disruption as possible while tackling a variety of improvements that make sense anyway, including improving the isolation of the existing interpreters. Much of what you've brought up are important points, but ones that will require much more discussion, take substantially more effort, and introduce more disruption. I don't think we should hold up the current project for a resolution of those broader matters.

That said, in the context of the above checklists, I agree that we must be very conservative in how we bake in any new design/architecture. I just consider isolated interpreters to be an old design (with a less than ideal API and an incomplete implementation) that we are carrying through. Improving the design is a separate project, which I wholly support!

@ericsnowcurrently
Copy link
Member Author

FWIW, I made a comment suggesting we leave some process-level globals as static variables and not moving them to the interpreter state:

#100262 (comment)

More generally, for fields (or variables) that are currently thread-safe due to the GIL we would solve them one of two ways (for a per-interpreter GIL):

  • use a global lock (other than the GIL) for the same effect
  • move the field to PyInterpreterState

Which one we do will depend on the following:

  • how hot the field/variable is
  • how well a move fits conceptually
  • how much code churn there would be
  • if there would be any changes in semantics
  • (likely other criteria)

@ericsnowcurrently
Copy link
Member Author

@pablogsal, regarding the currently runtime-global state for the perf trampoline:

struct {
#ifdef PY_HAVE_PERF_TRAMPOLINE
perf_status_t status;
Py_ssize_t extra_code_index;
struct code_arena_st *code_arena;
struct trampoline_api_st trampoline_api;
FILE *map_file;
#else
int _not_used;
#endif
} perf;

What parts make sense moving to PyInterpreterState? (For a per-interpreter GIL we'd probably want a granular lock for the rest.) My rough guess is that we'd move extra_code_index at least. I want to say all of it makes sense as per-interpreter, but I'm not quite sure.

@ericsnowcurrently
Copy link
Member Author

@markshannon, in which contexts do any of the version tags (e.g. dict version) make sense as a globally-unique value. Would there be any problem moving them all to PyInterpreterState?

@ericsnowcurrently
Copy link
Member Author

@pablogsal, would it make sense to move _PyRuntimeState.parser.memo_statistics to PyInterpreterState? What would be the value of keeping it runtime-global? (If needed, it would be easy enough to aggregate the value from each interpreter after the fact.)

@pablogsal
Copy link
Member

pablogsal commented Dec 16, 2022

@pablogsal, would it make sense to move _PyRuntimeState.parser.memo_statistics to PyInterpreterState? What would be the value of keeping it runtime-global? (If needed, it would be easy enough to aggregate the value from each interpreter after the fact.)

I don't think so. These structures are mainly for debugging so I think it doesn't make sense to worry a lot to make them compatibles with multiple interpreters because you would only use them when debugging the parser. They are not even active in debug mode IIRC as there are only activated by hand.

@pablogsal
Copy link
Member

@pablogsal, regarding the currently runtime-global state for the perf trampoline:

struct {
#ifdef PY_HAVE_PERF_TRAMPOLINE
perf_status_t status;
Py_ssize_t extra_code_index;
struct code_arena_st *code_arena;
struct trampoline_api_st trampoline_api;
FILE *map_file;
#else
int _not_used;
#endif
} perf;

What parts make sense moving to PyInterpreterState? (For a per-interpreter GIL we'd probably want a granular lock for the rest.) My rough guess is that we'd move extra_code_index at least. I want to say all of it makes sense as per-interpreter, but I'm not quite sure.

That depends on how we decide that the perf trampoline should behave with multiple interpreters. The easiest thing to do is just activate it globally (which is what I think it makes more sense and is more maintainable). In this case we just need the code index because code objects are per interpreter IIRC

@ericsnowcurrently
Copy link
Member Author

The easiest thing to do is just activate it globally (which is what I think it makes more sense and is more maintainable)

That's fine with me. Would it be okay to restrict sys.activate_stack_trampoline() to the main interpreter then?


FWIW, this is part of a relatively small class of runtime-global state that (conceptually or concretely) manages or utilizes certain process-global resources. We encountered this with the syslog module recently and applies a similar restriction (after some discussion). In fact, I'm growing confident that, in these cases, restricting to the main interpreter is the cleanest solution that does not sacrifice our ability to expand support later or go in a different direction.

AFAICS, the main alternative to support a per-interpreter GIL (assuming that is approved), aside from moving to PyInterpreterState, is to add fine-grained locks around the relevant state. (That said, I would argue that better interpreter isolation is valuable, regardless of a per-interpreter GIL.)

Furthermore, while some currently global state/resources are conceptually better suited as per-interpreter, other such state and resources make the most sense staying global. So far, it has made sense to tie them to the main interpreter each of the (few) cases we've seen. This aligns with the prevalent idea that the app should be responsible for managing certain global resources, and the "app" in Python is essentially the __main__ module, running in the main thread of the main interpreter.

@pablogsal
Copy link
Member

Would it be okay to restrict sys.activate_stack_trampoline() to the main interpreter then?

I think it should be activated in all interpreters at the same time. The switch should be global, the same way is global for all threads.

@markshannon
Copy link
Member

in which contexts do any of the version tags (e.g. dict version) make sense as a globally-unique value?

  • Function objects: Per interpreter
  • Dict keys: Per interpreter
  • Type versions: Global for static, immutable types, like int. Otherwise per-interpreter.

@ericsnowcurrently
Copy link
Member Author

ericsnowcurrently commented Jan 31, 2023

Having looked through the faulthandler code, it definitely leans on whichever interpreter calls into the module (assuming -X faulthandler or -X dev isn't used). We could improve isolation there, as well as expand the output to cover all interpreters (rather than just one), but I'm not sure it's worth the effort at the moment. So I'm going to mark it as such.

(We'll address this separately in gh-101509.)

@ericsnowcurrently
Copy link
Member Author

The same goes for tracemalloc: gh-101520.

warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
…rpreters (pythongh-102925)

This is effectively two changes.  The first (the bulk of the change) is where we add _Py_AddToGlobalDict() (and _PyRuntime.cached_objects.main_tstate, etc.).  The second (much smaller) change is where we update PyUnicode_InternInPlace() to use _Py_AddToGlobalDict() instead of calling PyDict_SetDefault() directly.

Basically, _Py_AddToGlobalDict() is a wrapper around PyDict_SetDefault() that should be used whenever we need to add a value to a runtime-global dict object (in the few cases where we are leaving the container global rather than moving it to PyInterpreterState, e.g. the interned strings dict).  _Py_AddToGlobalDict() does all the necessary work to make sure the target global dict is shared safely between isolated interpreters.  This is especially important as we move the obmalloc state to each interpreter (pythongh-101660), as well as, potentially, the GIL (PEP 684).

python#100227
warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
…obal Interned Dict Safe for Isolated Interpreters" (pythongh-103063)

This reverts commit 87be8d9.

This approach to keeping the interned strings safe is turning out to be too complex for my taste (due to obmalloc isolation). For now I'm going with the simpler solution, making the dict per-interpreter. We can revisit that later if we want a sharing solution.
warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
…ate (pythongh-102339)

We can revisit the options for keeping it global later, if desired.  For now the approach seems quite complex, so we've gone with the simpler isolation solution in the meantime.

python#100227
warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
…Interpreters (pythongh-103084)

Sharing mutable (or non-immortal) objects between interpreters is generally not safe.  We can work around that but not easily. 
 There are two restrictions that are critical for objects that break interpreter isolation.

The first is that the object's state be guarded by a global lock.  For now the GIL meets this requirement, but a granular global lock is needed once we have a per-interpreter GIL.

The second restriction is that the object (and, for a container, its items) be deallocated/resized only when the interpreter in which it was allocated is the current one.  This is because every interpreter has (or will have, see pythongh-101660) its own object allocator.  Deallocating an object with a different allocator can cause crashes.

The dict for the cache of module defs is completely internal, which simplifies what we have to do to meet those requirements.  To do so, we do the following:

* add a mechanism for re-using a temporary thread state tied to the main interpreter in an arbitrary thread
   * add _PyRuntime.imports.extensions.main_tstate` 
   * add _PyThreadState_InitDetached() and _PyThreadState_ClearDetached() (pystate.c)
   * add _PyThreadState_BindDetached() and _PyThreadState_UnbindDetached() (pystate.c)
* make sure the cache dict (_PyRuntime.imports.extensions.dict) and its items are all owned by the main interpreter)
* add a placeholder using for a granular global lock

Note that the cache is only used for legacy extension modules and not for multi-phase init modules.

python#100227
warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
Decref the key in the right interpreter in _extensions_cache_set().

This is a follow-up to pythongh-103084. I found the bug while working on pythongh-101660.
warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
…it (pythongh-103315)

This cleans things up a bit and simplifies adding new granular global locks.
ericsnowcurrently added a commit that referenced this issue Apr 24, 2023
Deep-frozen code objects are cannot be shared (currently) by
interpreters, due to how adaptive specialization can modify the
bytecodes. We work around this by only using the deep-frozen objects in
the main interpreter. This does incur a performance penalty for
subinterpreters, which we may be able to resolve later.
carljm added a commit to carljm/cpython that referenced this issue Apr 24, 2023
* main:
  pythongh-100227: Only Use deepfreeze for the Main Interpreter (pythongh-103794)
  pythongh-103492: Clarify SyntaxWarning with literal comparison (python#103493)
  pythongh-101100: Fix Sphinx warnings in `argparse` module (python#103289)
carljm added a commit to carljm/cpython that referenced this issue Apr 24, 2023
* superopt:
  pythongh-100227: Only Use deepfreeze for the Main Interpreter (pythongh-103794)
  pythongh-103492: Clarify SyntaxWarning with literal comparison (python#103493)
  pythongh-101100: Fix Sphinx warnings in `argparse` module (python#103289)
ericsnowcurrently added a commit that referenced this issue Apr 25, 2023
…gh-103460)

The lock is unnecessary as long as there's a GIL, but completely
necessary with a per-interpreter GIL.
ericsnowcurrently added a commit that referenced this issue Jun 8, 2023
The risk of a race with this state is relatively low, but we play it safe anyway.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 8, 2023
…gh-105514)

The risk of a race with this state is relatively low, but we play it safe anyway.
(cherry picked from commit 7799c8e)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
ericsnowcurrently added a commit that referenced this issue Jun 8, 2023
The risk of a race with this state is relatively low, but we play it safe anyway.
ericsnowcurrently added a commit that referenced this issue Jun 8, 2023
…5514) (gh-105517)

The risk of a race with this state is relatively low, but we play it safe anyway.
(cherry picked from commit 7799c8e)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
ericsnowcurrently added a commit to ericsnowcurrently/cpython that referenced this issue Jun 8, 2023
The risk of a race with this state is relatively low, but we play it safe anyway.
ericsnowcurrently added a commit that referenced this issue Jun 8, 2023
…h-105516)

The risk of a race with this state is relatively low, but we play it safe anyway. We do avoid using the lock in performance-sensitive cases where the risk of a race is very, very low.
ericsnowcurrently added a commit that referenced this issue Jun 8, 2023
…h-105525)

The risk of a race with this state is relatively low, but we play it safe anyway.
(cherry picked from commit e822a67)
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 8, 2023
…ate (pythongh-105516)

The risk of a race with this state is relatively low, but we play it safe anyway. We do avoid using the lock in performance-sensitive cases where the risk of a race is very, very low.
(cherry picked from commit 68dfa49)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
ericsnowcurrently added a commit that referenced this issue Jun 8, 2023
…tate (gh-105516) (gh-105532)

The risk of a race with this state is relatively low, but we play it safe anyway. We do avoid using the lock in performance-sensitive cases where the risk of a race is very, very low.
(cherry picked from commit 68dfa49)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-subinterpreters type-feature A feature request or enhancement
Projects
Status: Done
Development

No branches or pull requests

5 participants