GH-109369: Merge all eval-breaker flags and monitoring version into one word. #109846

markshannon · 2023-09-25T14:17:09Z

Merge the various eval-breaker flags and monitoring version into a single atomic value to allow faster checking by combining the eval-breaker check and the monitoring version check.
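A minimal sketch of the idea, not the code from this PR: assuming the low 8 bits of one word hold the eval-breaker flags and the remaining bits hold the monitoring version, a single relaxed load and one comparison against the version a code object was instrumented with covers both checks. The names `eval_word`, `FLAG_BITS`, `set_flag`, `set_version`, and `needs_slow_path`, and the bit positions, are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low FLAG_BITS bits are flags, the rest is the version. */
#define FLAG_BITS 8
#define FLAG_MASK (((uintptr_t)1 << FLAG_BITS) - 1)
#define GIL_DROP_REQUEST_BIT 0
#define SIGNALS_PENDING_BIT 1

static _Atomic uintptr_t eval_word;

/* Set or clear one flag bit without disturbing the version bits. */
static void set_flag(int bit, bool on)
{
    assert(bit >= 0 && bit < FLAG_BITS);
    uintptr_t mask = (uintptr_t)1 << bit;
    if (on) {
        atomic_fetch_or(&eval_word, mask);
    }
    else {
        atomic_fetch_and(&eval_word, ~mask);
    }
}

/* Replace the version while keeping whatever flag bits are currently set. */
static void set_version(uintptr_t version)
{
    uintptr_t old = atomic_load(&eval_word);
    uintptr_t desired;
    do {
        desired = (old & FLAG_MASK) | (version << FLAG_BITS);
    } while (!atomic_compare_exchange_weak(&eval_word, &old, desired));
}

/* code_version is stored pre-shifted with all flag bits zero, so equality
   holds only when no flag is set AND the versions match. */
static bool needs_slow_path(uintptr_t code_version)
{
    return atomic_load_explicit(&eval_word, memory_order_relaxed) != code_version;
}
```

With this layout, either a raised flag or a version bump makes the comparison fail, so the fast path needs only one load and one compare.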

Issue: Executors might ignore instrumentation. #109369


ericsnowcurrently · 2023-09-27T01:04:50Z

Python/ceval_gil.c

+        calls_to_do = _Py_atomic_load_int32_relaxed(
+            &_PyRuntime.ceval.pending_mainthread.calls_to_do);


Isn't this fully covered by the places we call SIGNAL_PENDING_CALLS()?

Not in this case.

When a non-main thread is running while pending_mainthread.calls_to_do is non-zero, we don't want _PY_CALLS_TO_DO_BIT set, to avoid constantly calling _Py_HandlePending.
So when we switch to the main thread, we need to set _PY_CALLS_TO_DO_BIT if pending_mainthread.calls_to_do is non-zero.
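A rough sketch of that behaviour, under stated assumptions: the handler clears the bit even on a non-main thread (which cannot run the main-thread calls), and taking the GIL on the main thread re-arms it. The globals and the functions `set_bit`, `bit_is_set`, `handle_pending`, and `on_take_gil` are hypothetical stand-ins, not the functions in Python/ceval_gil.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CALLS_TO_DO_BIT 2   /* hypothetical bit position */

static _Atomic uintptr_t eval_breaker;
static _Atomic int32_t mainthread_calls_to_do;  /* calls only the main thread may run */

static bool bit_is_set(int bit)
{
    return (atomic_load(&eval_breaker) >> bit) & 1;
}

static void set_bit(int bit, bool on)
{
    uintptr_t mask = (uintptr_t)1 << bit;
    if (on) {
        atomic_fetch_or(&eval_breaker, mask);
    }
    else {
        atomic_fetch_and(&eval_breaker, ~mask);
    }
}

/* Slow path: clear the bit; only the main thread drains the main-thread queue. */
static void handle_pending(bool is_main_thread)
{
    set_bit(CALLS_TO_DO_BIT, false);
    if (is_main_thread) {
        atomic_store(&mainthread_calls_to_do, 0);  /* pretend the calls ran */
    }
}

/* On acquiring the GIL: re-arm the bit if this is the main thread and
   main-thread calls are still queued. */
static void on_take_gil(bool is_main_thread)
{
    if (is_main_thread &&
        atomic_load_explicit(&mainthread_calls_to_do, memory_order_relaxed) != 0) {
        set_bit(CALLS_TO_DO_BIT, true);
    }
}
```

In this model a non-main thread enters the slow path at most once for a queued main-thread call, and the main thread still sees the bit as soon as it runs.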

Similarly to #109846 (comment), why does acquiring the GIL require that we reproduce what was already done via Py_AddPendingCall()?

Is it because we always flip the bit in _make_pending_calls(), even if we're not in the main thread but there are global pending calls waiting? It seems we could use a separate bit for the global pending calls.

Is it because we don't want to interrupt all the other threads while waiting for the main thread to handle its pending calls? I suppose the current approach sort of addresses that, but (as indicated in #109846 (comment)) that means we put off those extraneous interruptions only until the main thread takes the GIL again. Then the interruptions resume until the main thread handles its pending calls. That feels very jumpy. ISTM a per-thread eval breaker would be a clearer solution.

This all applies to the signal-related code here in update_eval_breaker_from_thread() too.

I'm not suggesting that we change things up in this PR. Rather, if a less jumpy approach makes sense then we could do it in a follow-up PR.

ericsnowcurrently · 2023-09-27T01:07:22Z

Python/ceval_gil.c

+    if (tstate->async_exc != NULL) {
+        _Py_set_eval_breaker_bit(interp, _PY_ASYNC_EXCEPTION_BIT, 1);
+    }


Where are we not already setting the bit when tstate->async_exc is set, such that we must do so here?

If tstate->async_exc is set on a sleeping thread, then we need to set the bit when that thread gets the GIL.

Let's see if I understand things correctly. A thread will only ever acquire the GIL either by the eval loop itself or via the C-API:

- C-API¹
  - before the eval loop has started
  - while the eval loop is running
  - after the eval loop has finished
- running eval loop--for some bytecode instructions²

On the other hand, all the "pending" actions³ are performed only in a running eval loop, driven by the eval breaker². The eval loop calls _Py_HandlePending(), which executes in this order:

1. handle signals, if any (main thread only)
2. make per-interpreter pending calls
3. make global pending calls (main thread only)
4. run GC⁴
5. release & re-acquire the GIL (if another thread has asked for it)
6. apply the async exception, if applicable (sort of thread-specific)
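The ordering listed above can be modelled as a fixed sequence of steps, two of which are skipped off the main thread. This is only an illustration of the order as described in this comment: the `step` enum, `pending_sketch` struct, and `handle_pending` function are hypothetical and do not mirror the real _Py_HandlePending().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Steps in the order described in the comment above. */
enum step {
    STEP_SIGNALS,       /* main thread only */
    STEP_INTERP_CALLS,
    STEP_GLOBAL_CALLS,  /* main thread only */
    STEP_GC,
    STEP_GIL_DROP,
    STEP_ASYNC_EXC,
    STEP_COUNT
};

struct pending_sketch {
    bool is_main_thread;
    bool requested[STEP_COUNT];
    enum step log[STEP_COUNT];  /* steps actually run, in order */
    size_t log_len;
};

/* Run every requested step in declaration order; main-thread-only steps
   stay requested when called from another thread. */
static void handle_pending(struct pending_sketch *p)
{
    assert(p != NULL);
    p->log_len = 0;
    for (int s = 0; s < STEP_COUNT; s++) {
        bool main_only = (s == STEP_SIGNALS || s == STEP_GLOBAL_CALLS);
        if (!p->requested[s] || (main_only && !p->is_main_thread)) {
            continue;
        }
        p->requested[s] = false;
        p->log[p->log_len++] = (enum step)s;
    }
}
```

Calling it first from a non-main thread and then from the main thread shows the two main-thread-only steps being deferred until the main thread runs.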

When the GIL is acquired via the C-API (outside a running eval loop), none of that will happen until at least one eval loop is running and it reaches one of the instructions that checks the eval breaker².

When the GIL is acquired by a running eval loop, the first 4 pending events have already just been handled, and the last one (async exception) is about to be.

With all that in mind, the question is: how does the effect of PyThreadState_SetAsyncExc() relate to acquiring the GIL? Hence, why should we worry about _PY_ASYNC_EXCEPTION_BIT in update_eval_breaker_from_thread()? PyThreadState_SetAsyncExc() always sets tstate->async_exc and then sets the eval breaker for the target interpreter. That eval breaker bit never gets cleared until the targeted thread takes care of the event. So why do we need to set the bit when taking the GIL, regardless of whether the thread is sleeping or not? It seems unnecessary.

That does lead to another question: why have the async exception be part of the eval breaker at all? It only applies to the target thread, but when the eval breaker bit is set it will interrupt all threads. Would it make sense to have a per-thread eval breaker? Then again, it only matters if async exceptions are used by the community often enough.

Footnotes

¹ e.g. PyEval_RestoreThread() (after earlier releasing it), or for the first time when an interpreter is created or with threading.Thread.start().

² the small set of instructions that invoke CHECK_EVAL_BREAKER(): the intermediate end of loops; after calls; sometimes for RESUME; _JUMP_TO_TOP and ENTER_EXECUTOR (whatever those are for).

³ which are driven by the events represented in this PR

⁴ GC is also triggered in other places, like the signal-related C-API

There's a similar question for signals and the main thread pending calls. However, unlike async exceptions, we know that signals are somewhat prevalent. Maybe a separate per-thread eval breaker would be worth it? Perhaps it would help to have a separate (generated) variant of the eval loop that also checks a per-thread eval breaker (or checks it exclusively)?

FWIW, I don't think we have much room to change the order of actions in _Py_HandlePending().

ericsnowcurrently · 2023-09-27T01:09:19Z

Python/ceval_gil.c

+        if (_Py_atomic_load(&_PyRuntime.signals.is_tripped)) {
+            _Py_set_eval_breaker_bit(interp, _PY_SIGNALS_PENDING_BIT, 1);
+        }


I'm guessing this is something the signals module can't do directly. Is that right?

Only the main thread can handle signals. So if the main thread is not the running thread when a signal happens, we don't want to set the eval breaker bit then, but we do want to set it when the main thread gets the GIL.
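A small sketch of that deferral, assuming a tripped flag that is always recorded and an eval-breaker bit that is raised only for the main thread. `signal_received`, `on_take_gil`, `signals_bit_is_set`, and the globals are hypothetical names for illustration, not the signal module's actual API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SIGNALS_PENDING_BIT 1   /* hypothetical bit position */

static _Atomic int is_tripped;          /* "a signal arrived", always recorded */
static _Atomic uintptr_t eval_breaker;

static bool signals_bit_is_set(void)
{
    return (atomic_load(&eval_breaker) >> SIGNALS_PENDING_BIT) & 1;
}

/* Signal arrival: record it, but interrupt the eval loop only if the
   currently running thread is the one that can handle signals. */
static void signal_received(bool running_thread_is_main)
{
    atomic_store(&is_tripped, 1);
    if (running_thread_is_main) {
        atomic_fetch_or(&eval_breaker, (uintptr_t)1 << SIGNALS_PENDING_BIT);
    }
}

/* GIL acquisition: the main thread picks up any signal recorded while
   another thread was running. */
static void on_take_gil(bool is_main_thread)
{
    if (is_main_thread && atomic_load(&is_tripped)) {
        atomic_fetch_or(&eval_breaker, (uintptr_t)1 << SIGNALS_PENDING_BIT);
    }
}
```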

See #109846 (comment).

ericsnowcurrently · 2023-09-27T01:10:25Z

Python/ceval_gil.c

-        _PyEval_SignalAsyncExc(tstate->interp);
-    }
+    RESET_GIL_DROP_REQUEST(interp);
+    update_eval_breaker_from_thread(interp, tstate);


This seems less critical since we have the single source of truth now.

This is important, as _PY_CALLS_TO_DO_BIT, _PY_SIGNALS_PENDING_BIT and _PY_ASYNC_EXCEPTION_BIT are, to varying degrees, per-thread.
Whenever we switch thread we need to update them.

Yeah, it comes back to how we are being a little tricky with unsetting certain bits when they apply to the main thread and setting them again here when the main thread takes the GIL.

Python/ceval_gil.c

markshannon · 2023-09-29T14:23:39Z

I would have expected a slight speedup, due to the simpler check, but it doesn't seem to make a difference.
Nominally 0.2% slower, but in the noise

ericsnowcurrently · 2023-09-29T15:39:18Z

Python/ceval_gil.c

-        return -1;
+    if (_Py_eval_breaker_bit_is_set(interp, _PY_ASYNC_EXCEPTION_BIT)) {
+        _Py_set_eval_breaker_bit(interp, _PY_ASYNC_EXCEPTION_BIT, 0);
+        if (tstate->async_exc != NULL) {


Doesn't the bit being set imply an exception is set? If so, this should be an assert. Furthermore, PyThreadState_SetAsyncExc() should probably be fixed to unset _PY_ASYNC_EXCEPTION_BIT when NULL is passed in.

ericsnowcurrently · 2023-09-29T17:41:00Z

Python/ceval_gil.c

+    if (tstate->async_exc != NULL) {
+        _Py_set_eval_breaker_bit(interp, _PY_ASYNC_EXCEPTION_BIT, 1);
+    }



markshannon · 2023-10-02T13:13:03Z

Hence, why should we worry about _PY_ASYNC_EXCEPTION_BIT in update_eval_breaker_from_thread()? PyThreadState_SetAsyncExc() always sets tstate->async_exc and then sets the eval breaker for the target interpreter. That eval breaker bit never gets cleared until the targeted thread takes care of the event. So why do we need to set the bit when taking the GIL, regardless of if the thread is sleeping or not? It seems unnecessary.

If async_exc is set for one thread, we don't want the eval breaker bit set while other threads are running; otherwise we would constantly be calling _Py_HandlePending() to no effect, which would be terrible for performance.
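To make the cost concrete, here is a single-threaded simulation sketch (no real threads or GIL) that counts slow-path entries under two policies: leaving the bit set until the target thread handles it, versus letting any thread clear it and re-arming it when the target takes the GIL. The types `thread_sketch` and `sim` and the function `run_thread` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct thread_sketch {
    bool has_async_exc;   /* stands in for tstate->async_exc != NULL */
};

struct sim {
    bool async_bit;        /* stands in for the async-exception bit */
    long slow_path_calls;  /* how often the handler was entered */
};

/* Simulate `steps` eval-breaker checks on the thread that holds the GIL.
   If clear_on_any_thread is true, the handler clears the bit even when the
   running thread is not the target of the async exception. */
static void run_thread(struct sim *s, struct thread_sketch *running,
                       long steps, bool clear_on_any_thread)
{
    assert(steps >= 0);
    /* Taking the GIL: re-arm the bit if this thread is the target. */
    if (running->has_async_exc) {
        s->async_bit = true;
    }
    for (long i = 0; i < steps; i++) {
        if (!s->async_bit) {
            continue;   /* fast path: nothing pending */
        }
        s->slow_path_calls++;
        if (running->has_async_exc) {
            running->has_async_exc = false;   /* exception delivered */
            s->async_bit = false;
        }
        else if (clear_on_any_thread) {
            s->async_bit = false;
        }
    }
}
```

In this model, a non-target thread that runs 1000 checks enters the handler 1000 times when the bit is left set, but only once when the handler clears it and the target re-arms it on taking the GIL.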

I suspect that a per-thread eval-breaker would be more efficient and simpler. But that's for another PR.

ericsnowcurrently

I think we're on the same page about the possible per-thread eval breaker. So mostly LGTM.

I'm approving the PR, but please address the one new comment I've left before merging.

markshannon added 6 commits September 13, 2023 11:11

Remove nonsense thread checks

dba7b7e

Use one bit per check in eval_breaker

2b8766e

Add main callback enum

f7205ff

Put instrumentation version and eval-breaker flags into same word

4da67aa

Restore name of eval_breaker

4fa77bc

Use full word for monitoring. Avoids running out of versions on 64 bit machines.

b8258df

markshannon requested a review from pablogsal as a code owner September 25, 2023 14:17

bedevere-app bot added the awaiting review label Sep 25, 2023

bedevere-app bot mentioned this pull request Sep 25, 2023

Executors might ignore instrumentation. #109369

Closed

markshannon added 11 commits September 25, 2023 16:04

Merge branch 'main' into tidy-up-eval-breaker

44c7900

Remove some #includes

5589233

Fix a couple of sizes and make read atomic

66cd1c3

Keep eval_breaker synchronized with thread state

0b85608

Avoid using tstate when it might have been freed

558a4f9

Address review comments

36f85b7

Add news

9643290

Convert magic numbers to named constants

b6a49f5

Make sure that async exception bit is cleared when handling pending events.

94b0515

Relax a load and use named const

0bb8bd2

Regen files

7c74435

ericsnowcurrently reviewed Sep 27, 2023

Use named constant

92cf1ff

ericsnowcurrently mentioned this pull request Sep 29, 2023

test_threading failed: test_reinit_tls_after_fork() failed with "env changed" on GHA Address Sanitizer: process 19857 is still running after 300.4 seconds #110031

Closed

markshannon requested a review from ericsnowcurrently September 29, 2023 14:20

ericsnowcurrently reviewed Sep 29, 2023

markshannon added 2 commits October 2, 2023 04:39

Merge branch 'main' into tidy-up-eval-breaker

134ae82

Clarify distinction between main thread handling calls and main thread handling signals.

da0d844

ericsnowcurrently approved these changes Oct 2, 2023

bedevere-app bot added awaiting core review and removed awaiting review labels Oct 2, 2023

markshannon force-pushed the tidy-up-eval-breaker branch 3 times, most recently from 200f545 to da0d844 Compare October 4, 2023 10:45

markshannon mentioned this pull request Oct 4, 2023

Merge checks for Python recursion limit and stack buffer overflow. faster-cpython/ideas#620

Open

markshannon merged commit bf4bc36 into python:main Oct 4, 2023
23 checks passed

bedevere-app bot removed the awaiting core review label Oct 4, 2023

gaogaotiantian mentioned this pull request Oct 12, 2023

Assertion failure in instrumentation during interpreter finalization #110752

Closed

swtaarrs mentioned this pull request Jan 23, 2024

Move the eval_breaker to PyThreadState #112175

Closed

swtaarrs mentioned this pull request Feb 9, 2024

gh-112175: Add eval_breaker to PyThreadState #115194

Merged
