Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move the eval_breaker to PyThreadState #112175

Closed
Tracked by #108219
colesbury opened this issue Nov 16, 2023 · 4 comments
Closed
Tracked by #108219

Move the eval_breaker to PyThreadState #112175

colesbury opened this issue Nov 16, 2023 · 4 comments
Assignees
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-feature A feature request or enhancement

Comments

@colesbury
Copy link
Contributor

colesbury commented Nov 16, 2023

Feature or enhancement

The eval_breaker is a variable that keeps track of requests to break out of the eval loop to handle things like signals, run a garbage collection, or handle asynchronous exceptions. It is currently in the interpreter state (in interp->ceval.eval_breaker). However, some of the events are specific to a given thread. For example, signals and some pending calls can only be executed on the "main" thread of an interpreter.

We should move the eval_breaker to PyThreadState to better handle these thread-specific events. This is more important for the --disable-gil builds where multiple threads within the same interpreter may be running at the same time.

@markshannon suggested a combination of per-interpreter and per-thread state, where the thread copies the per-interpreter eval_breaker state to the per-thread state when it acquires the GIL.

Linked PRs

@colesbury colesbury added type-feature A feature request or enhancement interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.13 bugs and security fixes topic-free-threading labels Nov 16, 2023
@gaogaotiantian
Copy link
Member

If I remember correctly, eval_breaker also contains the global version for instrumentation. Will having multiple eval_breakers in different threads work with the current instrumentation mechanism?

@colesbury
Copy link
Contributor Author

colesbury commented Nov 16, 2023

I don't think it'll be an issue in the default build, but I'll need to think about how instrumentation works in --disable-gil builds with multiple threads.

I think Mark's idea works basically like:

  • Set the eval_breaker in PyThreadState.eval_breaker.
  • On _PyThreadState_Detach(), copy bits from PyThreadState.eval_breaker to PyInterpreterState.interp_eval_breaker
  • On _PyThreadState_Attach(), copy bits from PyInterpreterState.interp_eval_breaker back to PyThreadState.eval_breaker

In --disable-gil builds, we will need to loop over all the threads when setting interpreter-wide bits.

@swtaarrs
Copy link
Member

I'm currently working on this, so if anyone has any related ideas/comments, please post them!

@swtaarrs
Copy link
Member

@markshannon, do you have any more context on your per-thread eval_breaker idea that Sam described a couple comments up? Also, this is a heads up that I'm working on this, since you expressed an interest in it in a previous PR comment.

I'm fleshing out exactly how all the flags will work in free-threaded vs. normal builds. As Sam says above, we need to loop over all threads for interpreter-wide flags in a free-threaded build, and this is pretty straightforward. All eval_breaker flags will go in a new member of PyThreadState. I'm planning on preserving the interpreter-wide interp_eval_breaker (renamed from eval_breaker) to keep holding the global instrumentation version. Properly supporting the version number in a free-threaded build will be handled separately from this issue.

For normal builds, if we want to avoid looping over all threads, we can set interpreter-wide flags on the active thread and use interp_eval_breaker to shuffle them between threads when a context switch happens. Each flag is slightly different, so here's my current plan for how to handle them (this is just for the normal build, where we still have the GIL):

  • _PY_SIGNALS_PENDING_BIT is easy - signals are only ever handled by the main thread in the main interpreter. We set the bit on that thread when a signal is received.
  • _PY_ASYNC_EXCEPTION_BIT also applies to a single, known thread, so we set the bit on that specific thread.
  • _PY_GIL_DROP_REQUEST_BIT is again set on a specific thread (the one holding the GIL). This is only set while holding gil->mutex, which should ensure that the thread holding the GIL doesn't change between when we decide to set the flag and when we actually set the flag.
  • _PY_GC_SCHEDULED_BIT can be handled by any thread in the targeted interpreter. The one existing caller of the private function _Py_ScheduleGC(PyInterpreterState*) schedules a GC for the current interpreter, so I believe it should be safe to change it to operate on a PyThreadState* instead, and always pass the current thread. The next time that thread checks its eval_breaker, it will run the GC. If it yields before then, it will move the GC bit to interp_eval_breaker for the next scheduled thread to pick up. If we want to preserve the ability for one interpreter to schedule a GC in another interpreter, the strategy used for _PY_CALLS_TO_DO (next item) should work.
  • _PY_CALLS_TO_DO is the least restricted case, because since gh-104812: Run Pending Calls in any Thread #104813, any thread can add a callback to run in any interpreter, and callbacks can be allowed to run in any thread. The general strategy is similar to _PY_GC_SCHEDULED_BIT: set the bit on the current thread, and if that thread yields before processing the callback, move the bit to interp_eval_breaker. If no thread in the targeted interpreter holds the GIL, directly set interp_eval_breaker.
    • To safely set this signal across interpreters, I will ensure that a) this bit is only set while holding gil->mutex for the signaled interpreter (similar to _PY_GIL_DROP_REQUEST), and b) bits will only be transferred between eval_breaker and interp_eval_breaker while holding gil->mutex. Half of part b) already happens with update_eval_breaker_from_thread(), called from take_gil(). I'll add a reversed version of that in drop_gil(). Both directions will only apply to _PY_GC_SCHEDULED_BIT and _PY_CALLS_TO_DO_BIT, since the other flags only apply to a single thread.

Does this all sound reasonable, especially for the interpreter-wide flags? If it's too complicated, I could set them by looping over all threads in both build types. That would add some overhead to normal builds, but I expect it wouldn't be measurable overall. It's not much work to try out both implementations, so I could see if there's a measurable performance difference, if that would help make the decision.

swtaarrs added a commit to swtaarrs/cpython that referenced this issue Feb 7, 2024
swtaarrs added a commit to swtaarrs/cpython that referenced this issue Feb 8, 2024
swtaarrs added a commit to swtaarrs/cpython that referenced this issue Feb 9, 2024
This change adds an `eval_breaker` field to `PyThreadState`, renaming the existing `eval_breaker` to
`interp_eval_breaker` (its uses are explained further down). The primary motivation is for
performance in free-threaded builds: with thread-local eval breakers, we can stop a specific thread
(e.g., for an async exception) without interrupting other threads.

There are still two situations where we want the first available thread to handle a request:
- Running a garbage collection: In normal builds, we set `_PY_GC_SCHEDULED_BIT` on the current
  thread. In case a thread suspends before handling the collection, the bit is copied to and from
  `interp_eval_breaker` on thread suspend and resume, respectively. In a free-threaded build, we
  simply iterate over all threads and set the bit. The first thread to check its eval breaker runs
  the collection, unsetting the bit on all threads.
  - Free-threaded builds could have multiple threads attempt a GC from one trigger if we get very
    unlucky with thread scheduling. I didn't put any protections against this in place because a)
    the consequences of it happening are just that one or more threads will check the GC thresholds
    right after a collection finishes, which won't affect correctness and b) it's incredibly,
    vanishingly unlikely.
- Pending calls not limited to the main thread (possible since python/cpython@757b402ea1c2). This is
  a little tricker, since the callback can be added from any thread, with or without the GIL
  held. If the targeted interpreter's GIL is locked, we signal the holding thread. When a thread is
  resumed, its `_PY_CALLS_TO_DO` bit is derived from the source of truth for pending calls (one of
  two `_pending_calls` structs). This handles situations where no thread held the GIL when the call
  was first added, or if the active thread did not handle the call before releasing the GIL. In a
  free-threaded build, all threads all signaled, similar to scheduling a GC.

The source of truth for the global instrumentation version is still in `interp_eval_breaker`, in
both normal and free-threaded builds. Threads usually read the version from their local
`eval_breaker`, where it continues to be colocated with the eval breaker bits, and the method for
keeping it up to date depends on build type. All builds first update the version in
`interp_eval_breaker`, and then:
- Normal builds update the version in the current thread's `eval_breaker`. When a thread takes the
  GIL, it copies the current version from `interp_eval_breaker` as part of the same operation that
  copies `_PY_GC_SCHEDULED_BIT`.
- Free-threaded builds again iterate over all threads in the current interpreter, updating the
  version on each one.
Instrumentation (and the specializing interpreter more generally) will need more work to be
compatible with free-threaded builds, so these changes are just intended to maintain the status quo
in normal builds for now.

Other notable changes are:
- The `_PY_*_BIT` macros now expand to the actual bit being set, rather than the bit's index. I
  think this is simpler overall. I also moved their definitions from `pycore_ceval.h` to
  `pycore_pystate.h`, since their main usage is on `PyThreadState`s now.
- Most manipulations of `eval_breaker` are done with a new pair of functions:
  `_PyThreadState_Signal()` and `_PyThreadState_Unsignal()`. Having two separate functions to
  set/unset a bit, rather than one function that takes the bit value to use, lets us use a single
  atomic `or`/`and`, rather than a loop around an atomic compare/exchange like the old
  `_Py_set_eval_breaker_bit` function.

Existing tests provide pretty good coverage for most of this functionality. The one new test I added
is to make sure a GC still happens if a thread schedules it then drops the GIL before the GC runs. I
don't love how complicated this test ended up so I'm open to other ideas for how to test this (or
other things to test in general).
colesbury pushed a commit that referenced this issue Feb 20, 2024
This change adds an `eval_breaker` field to `PyThreadState`. The primary
motivation is for performance in free-threaded builds: with thread-local eval
breakers, we can stop a specific thread (e.g., for an async exception) without
interrupting other threads.

The source of truth for the global instrumentation version is stored in the
`instrumentation_version` field in PyInterpreterState. Threads usually read the
version from their local `eval_breaker`, where it continues to be colocated
with the eval breaker bits.
@Eclips4 Eclips4 closed this as completed Feb 22, 2024
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
This change adds an `eval_breaker` field to `PyThreadState`. The primary
motivation is for performance in free-threaded builds: with thread-local eval
breakers, we can stop a specific thread (e.g., for an async exception) without
interrupting other threads.

The source of truth for the global instrumentation version is stored in the
`instrumentation_version` field in PyInterpreterState. Threads usually read the
version from their local `eval_breaker`, where it continues to be colocated
with the eval breaker bits.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
This change adds an `eval_breaker` field to `PyThreadState`. The primary
motivation is for performance in free-threaded builds: with thread-local eval
breakers, we can stop a specific thread (e.g., for an async exception) without
interrupting other threads.

The source of truth for the global instrumentation version is stored in the
`instrumentation_version` field in PyInterpreterState. Threads usually read the
version from their local `eval_breaker`, where it continues to be colocated
with the eval breaker bits.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants