diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 98b068b4f81..b26159bf04f 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -1,5 +1,5 @@ PEP: 788 -Title: PyInterpreterRef: Interpreter References in the C API +Title: Protecting the C API from Interpreter Finalization Author: Peter Bierma Sponsor: Victor Stinner Discussions-To: https://discuss.python.org/t/93653 @@ -15,38 +15,65 @@ Post-History: `10-Mar-2025 `__, Abstract ======== +This PEP introduces a suite of functions in the C API to safely attach to an +interpreter. For example: + +.. code-block:: c + + static int + thread_function(PyInterpreterView view) + { + PyInterpreterLock lock = PyInterpreterLock_AcquireView(view); + if (lock == 0) { + return -1; + } + PyThreadView thread_view = PyThreadState_Ensure(lock); + if (thread_view == 0) { + PyInterpreterLock_Release(lock); + return -1; + } + + /* Call Python code, without worrying about the thread hanging due to + finalization. */ + + PyThreadState_Release(thread_view); + PyInterpreterLock_Release(lock); + return 0; + } + +In addition, the APIs in the ``PyGILState`` family are deprecated by this +proposal. + +Background +========== + In the C API, threads are able to interact with an interpreter by holding an -:term:`attached thread state` for the current thread. This works well, but -can get complicated when it comes to creating and attaching -:term:`thread states ` in a thread-safe manner. - -Specifically, the C API doesn't have any way to ensure that an interpreter -is in a state where it can be called when creating and/or attaching a thread -state. As such, attachment might hang the thread, or it might flat-out crash -due to the interpreter's structure being deallocated in subinterpreters. +:term:`attached thread state` for the current thread. 
This can get complicated +when it comes to creating and attaching :term:`thread states ` +in a safe manner, because any non-Python thread (one not created via the +:mod:`threading` module) is considered to be "daemon", meaning that the interpreter +won't wait on that thread before shutting down. Instead, the interpreter will hang the +thread when it goes to attach a thread state, making the thread unusable past that +point. + +Attaching a thread state can happen at any point when invoking Python, such +as in-between bytecode instructions (to yield the :term:`GIL` to a different thread), +or when a C function exits a :c:macro:`Py_BEGIN_ALLOW_THREADS` block, so simply +guarding against whether the interpreter is finalizing isn't enough to safely +call Python code. (Note that hanging the thread is a relatively new behavior; +in older versions, the thread would exit, but the issue is the same.) + +Currently, the C API doesn't have any way to ensure that an interpreter +is in a state where it won't hang a thread when trying to attach. This can be a frustrating issue to deal with in large applications that want to execute Python code alongside some other native code. -In addition, assumptions about which interpreter to use tend to be wrong -inside of subinterpreters, primarily because :c:func:`PyGILState_Ensure` -always creates a thread state for the main interpreter in threads where -Python hasn't ever run. - -This PEP intends to solve these kinds issues through the introduction of -interpreter references that prevent an interpreter from finalizing (or more -technically, entering a stage in which attachment of a thread state hangs). -This allows for more structure and reliability when it comes to thread state -management, because it forces a layer of synchronization between the -interpreter and the caller. - -With this new system, there are a lot of changes needed in CPython and -third-party libraries to adopt it. 
For example, in APIs that don't require -the caller to hold an attached thread state, a strong interpreter reference -should be passed to ensure that it targets the correct interpreter, and that -the interpreter doesn't concurrently deallocate itself. The best example of -this in CPython is :c:func:`PyGILState_Ensure`. As part of this proposal, -:c:func:`PyThreadState_Ensure` is provided as a modern replacement that -takes a strong interpreter reference. +In addition, a common pattern among users creating non-Python threads is to +use :c:func:`PyGILState_Ensure`, which was introduced in :pep:`311`. This has +been very unfortunate for subinterpreters, because :c:func:`PyGILState_Ensure` +tends to choose to create a thread state for the main interpreter instead of the current interpreter. This leads +to thread-safety issues when extensions create threads that interact with the +Python interpreter, because assumptions about the GIL are incorrect. Motivation ========== @@ -55,10 +82,8 @@ Non-Python Threads Always Hang During Finalization -------------------------------------------------- Many large libraries might need to call Python code in highly-asynchronous -situations where the desired interpreter -(:ref:`typically the main interpreter `) -could be finalizing or deleted, but want to continue running code after -invoking the interpreter. This desire has been +situations where the desired interpreter could be finalizing or deleted, but +want to continue running code after invoking the interpreter. This desire has been `brought up by users `_. For example, a callback that wants to call Python code might be invoked when: @@ -84,26 +109,14 @@ Generally, this pattern would look something like this: /* ... */ } -In the current C API, any non-Python thread (one not created via the -:mod:`threading` module) is considered to be "daemon", meaning that the interpreter -won't wait on that thread before shutting down. 
Instead, the interpreter will hang the -thread when it goes to :term:`attach ` a :term:`thread state`, -making the thread unusable past that point. Attaching a thread state can happen at -any point when invoking Python, such as in-between bytecode instructions -(to yield the :term:`GIL` to a different thread), or when a C function exits a -:c:macro:`Py_BEGIN_ALLOW_THREADS` block, so simply guarding against whether the -interpreter is finalizing isn't enough to safely call Python code. (Note that hanging -the thread is relatively new behavior; in prior versions, the thread would exit, -but the issue is the same.) - This means that any non-Python thread may be terminated at any point, which is severely limiting for users who want to do more than just execute Python code in their stream of calls. -``Py_IsFinalizing`` Is Insufficient -*********************************** +``Py_IsFinalizing`` Is Not Atomic +********************************* -The :ref:`docs ` +Due to the problem mentioned previously, the :ref:`docs ` currently recommend :c:func:`Py_IsFinalizing` to guard against termination of the thread: @@ -113,23 +126,31 @@ the thread: interpreter is in process of being finalized before calling this function to avoid unwanted termination. -Unfortunately, this isn't correct, because of time-of-call to time-of-use +Unfortunately, this doesn't work reliably, because of time-of-call to time-of-use issues; the interpreter might not be finalizing during the call to :c:func:`Py_IsFinalizing`, but it might start finalizing immediately afterwards, which would cause the attachment of a thread state to hang the thread. -Daemon Threads Can Break Finalization -************************************* +Users have `expressed a desire `_ for an +atomic way to call ``Py_IsFinalizing`` in the past. 
+ +Locks in Native Extensions Can Be Unusable During Finalization +-------------------------------------------------------------- -When acquiring locks, it's extremely important to detach the thread state to -prevent deadlocks. This is true on both the with-GIL and free-threaded builds. +When acquiring locks in a native API, it's common to release the GIL (or +critical sections on the free-threaded build) to avoid lock-ordering deadlocks. +This can be problematic during finalization, because threads holding locks might +be hung. For example: -When the GIL is enabled, a deadlock can occur pretty easily when acquiring a -lock if the GIL wasn't released; thread A grabs a lock, and starts waiting on -its thread state to attach, while thread B holds the GIL and is waiting on the -lock. A similar deadlock can occur on the free-threaded build during stop-the-world -pauses when running the garbage collector. +1. A thread goes to acquire a lock, first detaching its thread state to avoid + deadlocks. +2. The main thread begins finalization and tells all thread states to hang + upon attachment. +3. The thread acquires the lock it was waiting on, but is then hung by attempting + to reattach its thread state via :c:macro:`Py_END_ALLOW_THREADS`. +4. The main thread can no longer acquire the lock, because the thread holding it + has been hung. This affects CPython itself, and there's not much that can be done to fix it with the current API. For example, @@ -140,91 +161,10 @@ for :data:`sys.stderr`, and then a finalizer tried to write to it. Ideally, a thread should be able to temporarily prevent the interpreter from hanging it while it holds the lock. -However, it's generally unsafe to acquire Python locks (for example, -:class:`threading.Lock`) in finalizers, because the garbage collector -might run while the lock is held, which would deadlock if another finalizer -tried to acquire the lock. 
This does not apply to many C locks, such as with -:data:`sys.stderr`, because Python code cannot be run while the lock is held. -This PEP intends to fix this problem for C locks, not Python locks. - -Daemon Threads Are Not the Problem -********************************** - -Prior to this PEP, deprecating daemon threads was discussed -`extensively `_. Daemon threads technically -cause many of the issues outlined in this proposal, so removing daemon threads -could be seen as a potential solution. The main argument for removing daemon -threads is that they're a large cause of problems in the interpreter -`[1] `_. - - Except that daemon threads don’t actually work reliably. They’re attempting - to run and use Python interpreter resources after the runtime has been shut - down upon runtime finalization. As in they have pointers to global state for - the interpreter. - -However, in practice, daemon threads are useful for simplifying many threading -applications in Python, and since the program is about to close in most cases, -it's not worth the added complexity to try and gracefully shut down a thread -`[2] `_. - - When I’ve needed daemon threads, it’s usually been the case of “Long-running, - uninterruptible, third-party task” in terms of the examples in the linked issue. - Basically I’ve had something that I need running in the background, but I have - no easy way to terminate it short of process termination. Unfortunately, I’m on - Windows, so ``signal.pthread_kill`` isn’t an option. I guess I could use the - Windows Terminate Thread API, but it’s a lot of work to wrap it myself compared - to just letting process termination handle things. - -Finally, removing Python-level daemon threads does not fix the whole problem. -As noted by this PEP, extension modules are free to create their own threads -and attach thread states for them. 
Similar to daemon threads, Python doesn't -try and join them during finalization, so trying to remove daemon threads -as a whole would involve trying to remove them from the C API, which would -require a much more massive API change than what is currently being proposed -`[3] `_. - - Realize however that even if we get rid of daemon threads, extension - module code can and does spawn its own threads that are not tracked by - Python. ... Those are realistically an alternate form of daemon thread - ... and those are never going to be forbidden. - -Joining the Thread Isn't Always a Good Idea -******************************************* - -Even in daemon threads, it's generally *possible* to prevent hanging of -non-Python threads through :mod:`atexit` functions. -A thread could be started by some C function, and then as long as -that thread is joined by :mod:`atexit`, then the thread won't hang. - -:mod:`atexit` isn't always an option for a function, because to call it, it -needs to already have an :term:`attached thread state` for the thread. If -there's no guarantee of that, then :func:`atexit.register` cannot be safely -called without the risk of hanging the thread. This shifts the contract -of joining the thread to the caller rather than the callee, which again, -isn't reliable enough in practice to be a viable solution. - -For example, large C++ applications might want to expose an interface that can -call Python code. To do this, a C++ API would take a Python object, and then -call :c:func:`PyGILState_Ensure` to safely interact with it (for example, by -calling it). If the interpreter is finalizing or has shut down, then the thread -is hung, disrupting the C++ stream of calls. 
- -The GIL-state APIs Are Buggy and Confusing ------------------------------------------- - -There are currently two public ways for a user to create and attach a -:term:`thread state` for their thread; manual use of :c:func:`PyThreadState_New` -and :c:func:`PyThreadState_Swap`, or the convenient :c:func:`PyGILState_Ensure`. - -The latter, :c:func:`PyGILState_Ensure`, is significantly more common, having -`nearly 3,000 hits `_ in a code -search, whereas :c:func:`PyThreadState_New` has -`less than 400 hits `_. - .. _pep-788-hanging-compat: Finalization Behavior for ``PyGILState_Ensure`` Cannot Change -************************************************************* +------------------------------------------------------------- There will always have to be a point in a Python program where :c:func:`PyGILState_Ensure` can no longer attach a thread state. @@ -241,22 +181,11 @@ the thread or emit a fatal error, as noted in proceed. The API was designed as "it'll block and only return once it has the GIL" without any other option. -For this reason, we can't make any real changes to how :c:func:`PyGILState_Ensure` +As a result, CPython can't make any real changes to how :c:func:`PyGILState_Ensure` works during finalization, because it would break existing code. -``PyGILState_Ensure`` Generally Crashes During Finalization -*********************************************************** - -At the time of writing, the current behavior of :c:func:`PyGILState_Ensure` does not -always match the documentation. Instead of hanging the thread during finalization -as previously noted, it's possible for it to crash with a segmentation -fault. This is a `known issue `_ -that could be fixed in CPython, but it's definitely worth noting -here, because acceptance and implementation of this PEP will likely fix -the existing crashes caused by :c:func:`PyGILState_Ensure`. 
- The Term "GIL" Is Tricky for Free-threading -******************************************* +------------------------------------------- A large issue with the term "GIL" in the C API is that it is semantically misleading. This was noted in `python/cpython#127989 @@ -297,256 +226,180 @@ used on objects shared between the threads. For example, if the thread had access to object A, which belongs to a subinterpreter, but then called :c:func:`PyGILState_Ensure`, the thread would have an :term:`attached thread state` pointing to the main interpreter, -not the subinterpreter. This means that any :term:`GIL` assumptions about the -object are wrong! There isn't any synchronization between the two GILs, so both -the thread and the main thread could try to increment the object's reference count -at the same time, causing a data race. +not the subinterpreter. This means that any GIL assumptions about the +object are wrong, because there isn't any synchronization between the two GILs. + +There's not any great way to solve this, other than introducing a new API that +explicitly takes an interpreter from the caller. -An Interpreter Can Concurrently Deallocate ------------------------------------------- +Subinterpreters Can Concurrently Deallocate +------------------------------------------- The other way of creating a non-Python thread, :c:func:`PyThreadState_New` and :c:func:`PyThreadState_Swap`, is a lot better for supporting subinterpreters (because :c:func:`PyThreadState_New` takes an explicit interpreter, rather than assuming that the main interpreter was requested), but is still limited by the -current hanging problems in the C API. Manual creation of thread states -("manual" in contrast to the implicit creation of one in -:c:func:`PyGILState_Ensure`) does not solve any of the aforementioned -thread-safety issues with thread states. 
- -In addition, subinterpreters typically have a much shorter lifetime than the -main interpreter, so if there was no synchronization between the calling thread -and the created thread, there's a much higher chance that an interpreter-state -passed to a thread will have already finished and have been deallocated, -causing use-after-free crashes. As of writing, this is a relatively -theoretical problem, but it's likely this will become more of an issue -in newer versions with the recent acceptance of :pep:`734`. +current hanging problems in the C API, and is subject to crashes when the +subinterpreter finalizes before the thread has a chance to start. This is because +in subinterpreters, the ``PyInterpreterState *`` structure is allocated on the +heap, whereas the main interpreter is statically allocated on the Python runtime +state. Rationale ========= -Preventing Interpreter Shutdown With Reference Counting -------------------------------------------------------- +Preventing Interpreter Shutdown +------------------------------- -This PEP takes an approach where an interpreter is given a reference count -that prevents it from shutting down. So, holding a "strong reference" to the -interpreter will make it safe to call the C API without worrying about the -thread being hung. +This PEP takes an approach where an interpreter comes with a locking API +that prevents it from shutting down. Holding an interpreter lock will make it +safe to call the C API without worrying about the thread being hung. This means that interfacing Python (for example, in a C++ library) will need -a reference to the interpreter in order to safely call the object, which is -definitely more inconvenient than assuming the main interpreter is the right -choice, but there's not really another option. A future proposal could perhaps -make this cleaner by adding a tracking mechanism for an object's interpreter -(such as a field on :c:type:`PyObject`). 
- -Generally speaking, a strong interpreter reference should be short-lived. An -interpreter reference should act similar to a lock, or a "critical section", -where the interpreter must not hang the thread or deallocate. For example, -when acquiring an IO lock, a strong interpreter reference should be acquired -before locking, and then released once the lock is released. - -Weak References -*************** - -This proposal also comes with weak references to an interpreter that don't -prevent it from shutting down, but can be promoted to a strong reference when -the user decides that they want to call the C API. If an interpreter is -destroyed or past the point where it can create strong references, promotion -of a weak reference will fail. - -A weak reference will typically live much longer than a strong reference. -This is useful for many of the asynchronous situations stated previously, -where the thread itself shouldn't prevent the desired interpreter from shutting -down, but also allow the thread to execute Python when needed. - -For example, a (non-reentrant) event handler may store a weak interpreter -reference in its ``void *arg`` parameter, and then that weak reference will -be promoted to a strong reference when it's time to call Python code. - -Removing the Outdated GIL-state APIs ------------------------------------- - -Due to the unfixable issues with ``PyGILState``, this PEP intends to do away -with them entirely. 
In today's C API, all ``PyGILState`` functions are -replaceable with ``PyThreadState`` counterparts that are compatibile with -subinterpreters: - -- :c:func:`PyGILState_Ensure`: :c:func:`PyThreadState_Swap` & :c:func:`PyThreadState_New` -- :c:func:`PyGILState_Release`: :c:func:`PyThreadState_Clear` & :c:func:`PyThreadState_Delete` -- :c:func:`PyGILState_GetThisThreadState`: :c:func:`PyThreadState_Get` (roughly) -- :c:func:`PyGILState_Check`: ``PyThreadState_GetUnchecked() != NULL`` +a lock to the interpreter in order to safely call the object, which is more +inconvenient than assuming the main interpreter is the right choice, but +there's not really another option. -This PEP specifies a deprecation for these functions (while remaining -in the stable ABI), because :c:func:`PyThreadState_Ensure` and -:c:func:`PyThreadState_Release` will act as more-correct replacements for -:c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`, due to the -requirement of a specific interpreter. - -The exact details of this deprecation aren't too clear. It's likely that -the usual five-year deprecation (as specificed by :pep:`387`) will be too -short, so for now, these functions will have no specific removal date. +This proposal also comes with "views" to an interpreter that can be used to +safely poke at an interpreter that may be dead or alive. Using a view, users +can acquire an interpreter lock at any point during its lifecycle, and +will safely fail if the interpreter can no longer support calling Python code. Compatibility Shim for ``PyGILState_Ensure`` -------------------------------------------- -This proposal comes with :c:func:`PyUnstable_GetDefaultInterpreterRef` as a +This proposal comes with :c:func:`PyUnstable_InterpreterView_FromDefault` as a compatibility hack for some users of :c:func:`PyGILState_Ensure`. It is a -thread-safe way to acquire a strong reference to the main (or "default") +thread-safe way to acquire a lock to the main (or "default") interpreter. 
The main drawback to porting new code to :c:func:`PyThreadState_Ensure` is that it isn't a drop-in replacement for :c:func:`!PyGILState_Ensure`, as it needs -an interpreter reference argument. In some large applications, refactoring to -use a :c:type:`PyInterpreterRef` everywhere might be tricky; so, this function -acts as a silver bullet for users who explicitly want to disallow support for +an interpreter lock argument. In some large applications, refactoring to +use a :c:type:`PyInterpreterLock` everywhere might be tricky; so, this function +acts as a last resort for users who explicitly want to disallow support for subinterpreters. Specification ============= -Interpreter References to Prevent Shutdown ------------------------------------------- - -An interpreter will keep a reference count that's managed by users of the -C API. When the interpreter starts finalizing, it will wait until its reference -count reaches zero before proceeding to a point where threads will be hung and -it may deallocate its state. The interpreter will wait on its reference count -around the same time when :class:`threading.Thread` objects are joined, but -note that this *is not* the same as joining the thread; the interpreter will -only wait until the reference count is zero, and then proceed. -After the reference count has reached zero, threads can no longer prevent the -interpreter from shutting down (thus :c:func:`PyInterpreterRef_FromCurrent` and -:c:func:`PyInterpreterWeakRef_Promote` will fail). +Interpreter Locks +----------------- -A weak reference to an interpreter won't prevent it from finalizing, and can -be safely accessed after the interpreter no longer supports creating strong -references, and even after the interpreter-state has been deleted. Deletion -and duplication of the weak reference will always be allowed, but promotion -(:c:func:`PyInterpreterWeakRef_Promote`) will always fail after the -interpreter reaches a point where strong references have been waited on. 
+.. c:type:: PyInterpreterLock -Strong Interpreter References -***************************** + An opaque interpreter lock. -.. c:type:: PyInterpreterRef + By holding an interpreter lock, the caller can know that the interpreter + will be in a state where it can safely execute Python code. - An opaque, strong reference to an interpreter. - - The interpreter will wait until a strong reference has been released - before shutting down. + This is a special type of "readers-writers" lock; threads may hold an + interpreter's lock concurrently, and the interpreter will have to wait + until all threads have released the lock before it can enter finalization. This type is guaranteed to be pointer-sized. -.. c:function:: PyInterpreterRef PyInterpreterRef_FromCurrent(void) - - Acquire a strong reference to the current interpreter. +.. c:function:: PyInterpreterLock PyInterpreterLock_AcquireCurrent(void) - On success, this function returns a strong reference to the current - interpreter, and returns ``0`` with an exception set on failure. + Acquire a lock for the current interpreter. - Failure typically indicates that the interpreter has already finished - waiting on strong references. + On success, this function locks the interpreter and returns an opaque + reference to the lock, or returns ``0`` with an exception set on failure. The caller must hold an :term:`attached thread state`. -.. c:function:: PyInterpreterRef PyUnstable_GetDefaultInterpreterRef(PyInterpreterRef *ref) - Acquire a strong reference to the main interpreter. +.. c:function:: PyInterpreterLock PyInterpreterLock_AcquireView(PyInterpreterView view) - This function only exists for special cases where a specific interpreter - can't be saved. Prefer safely acquiring a reference through - :c:func:`PyInterpreterRef_FromCurrent` whenever possible. + Acquire a lock to an interpreter through a view. 
- On success, this function returns a strong reference to the main - interpreter, and returns ``0`` without an exception set on failure. + On success, this function returns a lock to the interpreter + denoted by *view*. The view is still valid after calling this + function. - Failure typically indicates that the main interpreter has already finished - waiting on its reference count. + If the interpreter no longer exists or can no longer support calling Python + code safely, then this function returns ``0`` without an exception set. The caller does not need to hold an :term:`attached thread state`. -.. c:function:: PyInterpreterState *PyInterpreterRef_GetInterpreter(PyInterpreterRef ref) - Return the :c:type:`PyInterpreterState` pointer denoted by *ref*. +.. c:function:: PyInterpreterState *PyInterpreterLock_GetInterpreter(PyInterpreterLock lock) + + Return the :c:type:`PyInterpreterState` pointer denoted by *lock*. This function cannot fail, and the caller doesn't need to hold an :term:`attached thread state`. -.. c:function:: PyInterpreterRef PyInterpreterRef_Dup(PyInterpreterRef ref) +.. c:function:: PyInterpreterLock PyInterpreterLock_Copy(PyInterpreterLock lock) - Duplicate a strong reference to an interpreter. + Duplicate a lock to an interpreter. - On success, this function returns a strong reference to the interpreter - denoted by *ref*, and returns ``0`` without an exception set on failure. + On success, this function returns a lock to the interpreter + denoted by *lock*, and returns ``0`` without an exception set on failure. The caller does not need to hold an :term:`attached thread state`. -.. c:function:: void PyInterpreterRef_Close(PyInterpreterRef ref) +.. c:function:: void PyInterpreterLock_Release(PyInterpreterLock lock) - Release a strong reference to an interpreter, allowing it to shut down - if there are no references left. + Release an interpreter's lock, possibly allowing it to shut down. 
This function cannot fail, and the caller doesn't need to hold an :term:`attached thread state`. -Weak Interpreter References -*************************** +Interpreter Views +----------------- -.. c:type:: PyInterpreterWeakRef +.. c:type:: PyInterpreterView - An opaque, weak reference to an interpreter. + An opaque view of an interpreter. - The interpreter will *not* wait for the reference to be - released before shutting down. + This is a thread-safe way to access an interpreter that may be finalized + in another thread. This type is guaranteed to be pointer-sized. -.. c:function:: int PyInterpreterWeakRef_FromCurrent(PyInterpreterWeakRef *wref) +.. c:function:: PyInterpreterView PyInterpreterView_FromCurrent(void) - Acquire a weak reference to the current interpreter. + Create a view to the current interpreter. This function is generally meant to be used in tandem with - :c:func:`PyInterpreterWeakRef_Promote`. + :c:func:`PyInterpreterLock_AcquireView`. - On success, this function returns a weak reference to the current + On success, this function returns a view to the current interpreter, and returns ``0`` with an exception set on failure. The caller must hold an :term:`attached thread state`. -.. c:function:: PyInterpreterWeakRef PyInterpreterWeakRef_Dup(PyInterpreterWeakRef wref) +.. c:function:: PyInterpreterView PyInterpreterView_Copy(PyInterpreterView view) - Duplicate a weak reference to an interpreter. + Duplicate a view to an interpreter. - On success, this function returns a non-zero weak reference to the - interpreter denoted by *wref*, and returns ``0`` without an exception set + On success, this function returns a non-zero view to the + interpreter denoted by *view*, and returns ``0`` without an exception set on failure. This function cannot fail, and the caller doesn't need to hold an :term:`attached thread state`. -.. c:function:: PyInterpreterRef PyInterpreterWeakRef_Promote(PyInterpreterWeakRef wref) +.. 
c:function:: void PyInterpreterView_Close(PyInterpreterView view) - Acquire a strong reference to an interpreter through a weak reference. + Delete an interpreter view. - On success, this function returns a strong reference to the interpreter - denoted by *wref*. The weak reference is still valid after calling this - function. + This function cannot fail, and the caller doesn't need to hold an + :term:`attached thread state`. - If the interpreter no longer exists or has already finished waiting - for its reference count to reach zero, then this function returns ``0`` - without an exception set. +.. c:function:: PyInterpreterView PyUnstable_InterpreterView_FromDefault(void) - This function is not safe to call in a re-entrant signal handler. + Create a view for an arbitrary "main" interpreter. - The caller does not need to hold an :term:`attached thread state`. + This function only exists for special cases where a specific interpreter + can't be saved. -.. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef wref) + On success, this function returns a view to the main + interpreter, and returns ``0`` without an exception set on failure. - Release a weak reference to an interpreter. + The caller does not need to hold an :term:`attached thread state`. - This function cannot fail, and the caller doesn't need to hold an - :term:`attached thread state`. Ensuring And Releasing Thread States ------------------------------------ @@ -554,20 +407,20 @@ Ensuring And Releasing Thread States This proposal includes two new high-level threading APIs that intend to replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`. -.. c:type:: PyThreadRef +.. c:type:: PyThreadView - An opaque reference to a :term:`thread state`. + An opaque view of a :term:`thread state`. - In the initial implementation, holding a thread reference will - not block finalization of threads or interpreters. - This may change in the future. 
+ In this PEP, a thread view comes with no additional properties over a + :c:expr:`PyThreadState *` pointer. APIs for ``PyThreadView`` may be added + in the future. This type is guaranteed to be pointer-sized. -.. c:function:: int PyThreadState_Ensure(PyInterpreterRef ref, PyThreadRef *thread) +.. c:function:: PyThreadView PyThreadState_Ensure(PyInterpreterLock lock) Ensure that the thread has an :term:`attached thread state` for the - interpreter denoted by *ref*, and thus can safely invoke that + interpreter denoted by *lock*, and thus can safely invoke that interpreter. It is OK to call this function if the thread already has an attached thread state, as long as there is a subsequent call to :c:func:`PyThreadState_Release` that matches this one. @@ -575,18 +428,16 @@ replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`. Nested calls to this function will only sometimes create a new :term:`thread state`. If there is no attached thread state, then this function will check for the most recent attached thread - state used by this thread. If none exists or it doesn't match *ref*, - a new thread state is created. If it does match *ref*, it is reattached. + state used by this thread. If none exists or it doesn't match *lock*, + a new thread state is created. If it does match *lock*, it is reattached. If there is an attached thread state, then a similar check occurs; - if the interpreter matches *ref*, it is attached, and otherwise a new + if the interpreter matches *lock*, it is attached, and otherwise a new thread state is created. - The old thread state is stored as a thread reference in *\*thread*, and is - to be restored by :c:func:`PyThreadState_Release`. - - Return ``0`` on success, and ``-1`` without an exception set on failure. + Return a non-zero thread view of the old thread state on success, and + ``0`` on failure. -.. c:function:: void PyThreadState_Release(PyThreadRef ref) +.. 
c:function:: void PyThreadState_Release(PyThreadView lock) Release a :c:func:`PyThreadState_Ensure` call. @@ -597,8 +448,8 @@ replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`. This function cannot fail. -Deprecation of GIL-state APIs ------------------------------ +Deprecation of ``PyGILState`` APIs +---------------------------------- This PEP deprecates all of the existing ``PyGILState`` APIs in favor of the existing and new ``PyThreadState`` APIs. Namely: @@ -611,25 +462,15 @@ existing and new ``PyThreadState`` APIs. Namely: instead. All of the ``PyGILState`` APIs are to be removed from the non-limited C API in -a future Python version. They will remain available in the stable ABI for +Python 3.20. They will remain available in the stable ABI for compatibility. -It's worth noting that :c:func:`PyThreadState_Get` and -:c:func:`PyThreadState_GetUnchecked` aren't perfect replacements for -:c:func:`PyGILState_GetThisThreadState`, because -:c:func:`PyGILState_GetThisThreadState` is able to return a thread state even -when it is :term:`detached `. This PEP intentionally -doesn't leave a perfect replacement for this, because the GIL-state pointer -(which holds the last used thread state by the thread) is only useful for -those implementing :c:func:`PyThreadState_Ensure` or similar. It's not a -common API to want as a user. - Backwards Compatibility ======================= This PEP specifies a breaking change with the removal of all the -``PyGILState`` APIs from the public headers of the non-limited C API in a -future version. +``PyGILState`` APIs from the public headers of the non-limited C API in +Python 3.20. Security Implications ===================== @@ -657,62 +498,65 @@ Imagine that you're developing a C library for logging. You might want to provide an API that allows users to log to a Python file object. -With this PEP, you'd implement it like this: +With this PEP, you would implement it like this: .. 
code-block:: c int - LogToPyFile(PyInterpreterWeakRef wref, + LogToPyFile(PyInterpreterView view, PyObject *file, - const char *text) + PyObject *text) { - PyInterpreterRef ref = PyInterpreterWeakRef_Promote(wref); - if (ref == 0) { + PyInterpreterLock lock = PyInterpreterLock_AcquireView(view); + if (lock == 0) { /* Python interpreter has shut down */ return -1; } - PyThreadRef thread_ref; - if (PyThreadState_Ensure(ref, &thread_ref) < 0) { - PyInterpreterRef_Close(ref); + PyThreadView thread_view = PyThreadState_Ensure(lock); + if (thread_view == 0) { + PyInterpreterLock_Release(lock); fputs("Cannot call Python.\n", stderr); return -1; } - char *to_write = do_some_text_mutation(text); + const char *to_write = PyUnicode_AsUTF8(text); + if (to_write == NULL) { + // Since the exception may be destroyed upon calling PyThreadState_Release(), + // print out the exception ourselves. + PyErr_Print(); + PyThreadState_Release(thread_view); + PyInterpreterLock_Release(lock); + return -1; + } int res = PyFile_WriteString(to_write, file); free(to_write); - PyErr_Print(); + if (res < 0) { + PyErr_Print(); + } - PyThreadState_Release(thread_ref); - PyInterpreterRef_Close(ref); + PyThreadState_Release(thread_view); + PyInterpreterLock_Release(lock); return res < 0; } -If you were to use :c:func:`PyGILState_Ensure` for this case, then your -thread would hang if the interpreter were to be finalizing at that time! - -Additionally, the API supports subinterpreters. If you were to assume that -the main interpreter created the file object (via :c:func:`PyGILState_Ensure`), -then using file objects owned by a subinterpreter could possibly crash. - Example: A Single-threaded Ensure ********************************* -This example shows acquiring a lock in a Python method. +This example shows acquiring a C lock in a Python method. If this were to be called from a daemon thread, then the interpreter could hang the thread while reattaching the thread state, leaving us with the lock -held.
Any future finalizer that wanted to acquire the lock would be deadlocked! +held. Any future finalizer that attempted to acquire the lock would be deadlocked. .. code-block:: c static PyObject * - my_critical_operation(PyObject *self, PyObject *unused) + my_critical_operation(PyObject *self, PyObject *Py_UNUSED(args)) { assert(PyThreadState_GetUnchecked() != NULL); - PyInterpreterRef ref = PyInterpreterRef_FromCurrent(); - if (ref == 0) { + PyInterpreterLock lock = PyInterpreterLock_AcquireCurrent(); + if (lock == 0) { /* Python interpreter has shut down */ return NULL; } @@ -726,7 +570,7 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked! release_some_lock(); Py_END_ALLOW_THREADS; - PyInterpreterRef_Close(ref); + PyInterpreterLock_Release(lock); Py_RETURN_NONE; } @@ -774,17 +618,17 @@ This is the same code, rewritten to use the new functions: static int thread_func(void *arg) { - PyInterpreterRef interp = (PyInterpreterRef)arg; - PyThreadRef thread_ref; - if (PyThreadState_Ensure(interp, &thread_ref) < 0) { - PyInterpreterRef_Close(interp); + PyInterpreterLock interp = (PyInterpreterLock)arg; + PyThreadView thread_view = PyThreadState_Ensure(interp); + if (thread_view == 0) { + PyInterpreterLock_Release(interp); return -1; } if (PyRun_SimpleString("print(42)") < 0) { PyErr_Print(); } - PyThreadState_Release(thread_ref); - PyInterpreterRef_Close(interp); + PyThreadState_Release(thread_view); + PyInterpreterLock_Release(interp); return 0; } @@ -794,13 +638,13 @@ This is the same code, rewritten to use the new functions: PyThread_handle_t handle; PyThead_indent_t indent; - PyInterpreterRef ref = PyInterpreterRef_FromCurrent(); - if (ref == 0) { + PyInterpreterLock lock = PyInterpreterLock_AcquireCurrent(); + if (lock == 0) { return NULL; } - if (PyThread_start_joinable_thread(thread_func, (void *)ref, &ident, &handle) < 0) { - PyInterpreterRef_Close(ref); + if (PyThread_start_joinable_thread(thread_func, (void *)lock, &ident, &handle) < 
0) { + PyInterpreterLock_Release(lock); return NULL; } Py_BEGIN_ALLOW_THREADS @@ -815,7 +659,7 @@ Example: A Daemon Thread With this PEP, daemon threads are very similar to how non-Python threads work in the C API today. After calling :c:func:`PyThreadState_Ensure`, simply -release the interpreter reference to allow the interpreter to shut down (and +release the interpreter lock to allow the interpreter to shut down (and hang the current thread forever). .. code-block:: c @@ -823,19 +667,19 @@ hang the current thread forever). static int thread_func(void *arg) { - PyInterpreterRef ref = (PyInterpreterRef)arg; - PyThreadRef thread_ref; - if (PyThreadState_Ensure(ref, &thread_ref) < 0) { - PyInterpreterRef_Close(ref); + PyInterpreterLock lock = (PyInterpreterLock)arg; + PyThreadView thread_view = PyThreadState_Ensure(lock); + if (thread_view == 0) { + PyInterpreterLock_Release(lock); return -1; } - /* Release the interpreter reference, allowing it to + /* Release the interpreter lock, allowing it to finalize. This means that print(42) can hang this thread. */ - PyInterpreterRef_Close(ref); + PyInterpreterLock_Release(lock); if (PyRun_SimpleString("print(42)") < 0) { PyErr_Print(); } - PyThreadState_Release(thread_ref); + PyThreadState_Release(thread_view); return 0; } @@ -845,13 +689,13 @@ hang the current thread forever). PyThread_handle_t handle; PyThead_indent_t indent; - PyInterpreterRef ref = PyInterpreterRef_FromCurrent(); - if (ref == 0) { + PyInterpreterLock lock = PyInterpreterLock_AcquireCurrent(); + if (lock == 0) { return NULL; } - if (PyThread_start_joinable_thread(thread_func, (void *)ref, &ident, &handle) < 0) { - PyInterpreterRef_Close(ref); + if (PyThread_start_joinable_thread(thread_func, (void *)lock, &ident, &handle) < 0) { + PyInterpreterLock_Release(lock); return NULL; } Py_RETURN_NONE; @@ -860,56 +704,54 @@ hang the current thread forever). 
Example: An Asynchronous Callback ********************************* -In some cases, the thread might not ever start, such as in a callback. -We can't use a strong reference here, because a strong reference would -deadlock the interpreter if it's not released. - .. code-block:: c typedef struct { - PyInterpreterWeakRef wref; + PyInterpreterView view; } ThreadData; static int async_callback(void *arg) { - ThreadData *data = (ThreadData *)arg; - PyInterpreterWeakRef wref = data->wref; - PyInterpreterRef ref = PyInterpreterWeakRef_Promote(wref); - if (ref == 0) { + ThreadData *tdata = (ThreadData *)arg; + PyInterpreterView view = tdata->view; + PyInterpreterLock lock = PyInterpreterLock_AcquireView(view); + if (lock == 0) { fputs("Python has shut down!\n", stderr); return -1; } - PyThreadRef thread_ref; - if (PyThreadState_Ensure(ref, &thread_ref) < 0) { - PyInterpreterRef_Close(ref); + PyThreadView thread_view = PyThreadState_Ensure(lock); + if (thread_view == 0) { + PyInterpreterLock_Release(lock); return -1; } if (PyRun_SimpleString("print(42)") < 0) { PyErr_Print(); } - PyThreadState_Release(thread_ref); - PyInterpreterRef_Close(ref); + PyThreadState_Release(thread_view); + PyInterpreterLock_Release(lock); + PyInterpreterView_Close(view); + PyMem_RawFree(tdata); return 0; } static PyObject * setup_callback(PyObject *self, PyObject *unused) { - // Weak reference to the interpreter. It won't wait on the callback + // View to the interpreter. It won't wait on the callback // to finalize. 
ThreadData *tdata = PyMem_RawMalloc(sizeof(ThreadData)); if (tdata == NULL) { PyErr_NoMemory(); return NULL; } - PyInterpreterWeakRef wref = PyInterpreterWeakRef_FromCurrent(); - if (wref == 0) { + PyInterpreterView view = PyInterpreterView_FromCurrent(); + if (view == 0) { PyMem_RawFree(tdata); return NULL; } - tdata->wref = wref; + tdata->view = view; register_callback(async_callback, tdata); Py_RETURN_NONE; @@ -919,37 +761,31 @@ Example: Calling Python Without a Callback Parameter **************************************************** There are a few cases where callback functions don't take a callback parameter -(``void *arg``), so it's impossible to acquire a reference to any specific -interpreter. The solution to this problem is to acquire a reference to the main -interpreter through :c:func:`PyUnstable_GetDefaultInterpreterRef`. - -But wait, won't that break with subinterpreters, per -:ref:`pep-788-subinterpreters-gilstate`? Fortunately, since the callback has -no callback parameter, it's not possible for the caller to pass any objects or -interpreter-specific data, so it's completely safe to choose the main -interpreter here. +(``void *arg``), so it's difficult to acquire a lock to any specific +interpreter. The solution to this problem is to acquire a lock to the main +interpreter through :c:func:`PyUnstable_InterpreterView_FromDefault`. .. 
code-block:: c static void call_python(void) { - PyInterpreterRef ref = PyUnstable_GetDefaultInterpreterRef(); - if (ref == 0) { + PyInterpreterLock lock = PyUnstable_InterpreterView_FromDefault(); + if (lock == 0) { fputs("Python has shut down.", stderr); return; } - PyThreadRef thread_ref; - if (PyThreadState_Ensure(ref, &thread_ref) < 0) { - PyInterpreterRef_Close(ref); + PyThreadView thread_view = PyThreadState_Ensure(lock); + if (thread_view == 0) { + PyInterpreterLock_Release(lock); return -1; } if (PyRun_SimpleString("print(42)") < 0) { PyErr_Print(); } - PyThreadState_Release(thread_ref); - PyInterpreterRef_Close(ref); + PyThreadState_Release(thread_view); + PyInterpreterLock_Release(lock); return 0; } @@ -965,13 +801,13 @@ Open Issues How Should the APIs Fail? ------------------------- -There is a bit of disagreement on how the ``PyInterpreter[Weak]Ref`` APIs +There is a bit of disagreement on how the ``PyInterpreter[Lock|View]`` APIs should indicate a failure to the caller. There are two competing ideas: 1. Return -1 to indicate failure, and 0 to indicate success. On success, - functions will assign to a ``PyInterpreter[Weak]Ref`` pointer passed as an + functions will assign to a ``PyInterpreter[Lock|View]`` pointer passed as an argument. -2. Directly return a ``PyInterpreter[Weak]Ref``, which a value of 0 being +2. Directly return a ``PyInterpreter[Lock|View]``, with a value of 0 being equivalent to ``NULL``, indicating failure. Currently, the PEP spells the latter. @@ -979,16 +815,55 @@ Currently, the PEP spells the latter. Rejected Ideas ============== +Interpreter Reference Counting +------------------------------ + +There were two iterations of this proposal that both specified an interpreter to +have a reference count, and the interpreter would wait for that reference count +to hit zero before shutting down. + +The first iteration of this idea did this by adding implicit reference counting +to ``PyInterpreterState *`` pointers.
A function known as ``PyInterpreterState_Hold`` +would increment the reference count (making it a "strong reference"), and +``PyInterpreterState_Release`` would decrement it. An interpreter's ID (a +standalone ``int64_t``) was used as a form of weak reference, which could be +used to look up an interpreter state and atomically increment its reference +count. These ideas were ultimately rejected because they seemed to make things +very confusing -- all existing uses of ``PyInterpreterState *`` would be +borrowed, which would make it difficult for developers to understand which +areas of their code required/used a strong reference. + +In response to that pushback, this PEP specified ``PyInterpreterRef`` APIs +that would also mimic reference counting, but in a more explicit manner that +made it easier on developers. ``PyInterpreterRef`` was analogous to +:c:type:`PyInterpreterLock` in this PEP. Similarly, the older revision included +``PyInterpreterWeakRef``, which was analogous to :c:type:`PyInterpreterView`. + +Eventually, the notion of reference counting was completely abandoned from +this proposal for a few reasons: + +1. There was contention about overcomplication in the API design; the reference + counting design looked very similar to that of HPy, which had no precedent + in CPython. There was fear that this proposal was being overcomplicated to + look more like HPy. +2. Unlike traditional reference counting APIs, acquiring a strong reference to + an interpreter could arbitrarily fail, and an interpreter would not + immediately deallocate when its reference count reached zero. +3. There was prior discussion about adding "true" reference counting to + interpreters (which would deallocate upon reaching zero), which would have + been very confusing if there was an existing API in CPython titled + ``PyInterpreterRef`` that did something different.
+ Non-daemon Thread States ------------------------ -In prior iterations of this PEP, interpreter references were a property of +In earlier revisions of this PEP, interpreter locks were a property of a thread state rather than a property of an interpreter. This meant that -:c:func:`PyThreadState_Ensure` stole a strong interpreter reference, and +:c:func:`PyThreadState_Ensure` kept an interpreter lock held, and it was released upon calling :c:func:`PyThreadState_Release`. A thread state -that held a reference to an interpreter was known as a "non-daemon thread +that held a lock to an interpreter was known as a "non-daemon thread state." At first, this seemed like an improvement, because it shifted management -of a reference's lifetime to the thread instead of the user, which eliminated +of a lock's lifetime to the thread instead of the user, which eliminated some boilerplate. However, this ended up making the proposal significantly more complex and @@ -998,56 +873,12 @@ hurt the proposal's goals: threads as the problem, which hurt the clarity of the PEP. Additionally, the phrase "non-daemon" added extra confusion, because non-daemon Python threads are explicitly joined, whereas a non-daemon C thread is only waited on - until it releases its reference. -- In many cases, an interpreter reference should outlive a singular thread - state. Stealing the interpreter reference in :c:func:`PyThreadState_Ensure` + until it releases its lock. +- In many cases, an interpreter lock should outlive a singular thread + state. Stealing the interpreter lock in :c:func:`PyThreadState_Ensure` was particularly troublesome for these cases. If :c:func:`PyThreadState_Ensure` - didn't steal a reference with non-daemon thread states, it would muddy the - ownership story of the interpreter reference, leading to a more confusing API. 
- -Retrofiting the Existing Structures with Reference Counts ---------------------------------------------------------- - -Interpreter-State Pointers for Reference Counting -************************************************* - -Originally, this PEP specified :c:func:`!PyInterpreterState_Hold` -and :c:func:`!PyInterpreterState_Release` for managing strong references -to an interpreter, alongside :c:func:`!PyInterpreterState_Lookup` which -converted interpreter IDs (weak references) to strong references. - -In the end, this was rejected, primarily because it was needlessly -confusing. Interpreter states hadn't ever had a reference count prior, so -there was a lack of intuition about when and where something was a strong -reference. The :c:type:`PyInterpreterRef` and :c:type:`PyInterpreterWeakRef` -types seem a lot clearer. - -Interpreter IDs for Reference Counting -************************************** - -Some iterations of this API took an ``int64_t interp_id`` parameter instead of -``PyInterpreterState *interp``, because interpreter IDs cannot be concurrently -deleted and cause use-after-free violations. The reference counting APIs in -this PEP sidestep this issue anyway, but an interpreter ID have the advantage -of requiring less magic: - -- Nearly all existing interpreter APIs already return a :c:type:`PyInterpreterState` - pointer, not an interpreter ID. Functions like - :c:func:`PyThreadState_GetInterpreter` would have to be accompanied by - frustrating calls to :c:func:`PyInterpreterState_GetID`. -- Threads typically take a ``void *arg`` parameter, not an ``int64_t arg``. - As such, passing a reference requires much less boilerplate - for the user, because an additional structure definition or heap allocation - would be needed to store the interpreter ID. This is especially an issue - on 32-bit systems, where ``void *`` is too small for an ``int64_t``. 
-- To retain usability, interpreter ID APIs would still need to keep a - reference count, otherwise the interpreter could be finalizing before - the non-Python thread gets a chance to attach. The problem with using an - interpreter ID is that the reference count has to be "invisible"; it - must be tracked elsewhere in the interpreter, likely being *more* - complex than :c:func:`PyInterpreterRef_FromCurrent`. There's also a lack - of intuition that a standalone integer could have such a thing as - a reference count. + didn't steal a lock with non-daemon thread states, it would muddy the + ownership story of the interpreter lock, leading to a more confusing API. .. _pep-788-activate-deactivate-instead: @@ -1094,9 +925,9 @@ Acknowledgements ================ This PEP is based on prior work, feedback, and discussions from many people, -including Victor Stinner, Antoine Pitrou, Da Woods, Sam Gross, Matt Page, +including Victor Stinner, Antoine Pitrou, David Woods, Sam Gross, Matt Page, Ronald Oussoren, Matt Wozniski, Eric Snow, Steve Dower, Petr Viktorin, -and Gregory P. Smith. +Gregory P. Smith, and Alyssa Coghlan. Copyright =========