Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-111964: Implement stop-the-world pauses #112471

Merged
merged 5 commits into from
Jan 23, 2024

Conversation

colesbury
Copy link
Contributor

@colesbury colesbury commented Nov 27, 2023

The --disable-gil builds occasionally need to pause all but one thread. Some examples include:

  • Cyclic garbage collection, where this is often called a "stop the world event"
  • Before calling fork(), to ensure a consistent state for internal data structures
  • During interpreter shutdown, to ensure that daemon threads aren't accessing Python objects

This adds the following functions to implement global and per-interpreter pauses:

  • _PyRuntimeState_StopTheWorld and _PyRuntimeState_StartTheWorld
  • _PyInterpreterState_StopTheWorld and _PyInterpreterState_StartTheWorld

These functions are no-ops outside of the --disable-gil build.

This also adds _PyRWMutex, a "readers-writer" lock, which is used to serialize global stop-the-world pauses with per-interpreter pauses.

@colesbury
Copy link
Contributor Author

@eendebakpt - thanks!

@colesbury colesbury marked this pull request as ready for review November 27, 2023 22:21
@gvanrossum
Copy link
Member

Mark and I are currently on vacation. I'm not back until after Dec 8. Mark should be back this week.

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change mostly makes sense, but it took me a little while to connect all the pieces in my head. I've left various comments related to that and to a number of other things that might be worth adjusting.

Include/internal/pycore_lock.h Outdated Show resolved Hide resolved
Python/parking_lot.c Outdated Show resolved Hide resolved
Include/internal/pycore_lock.h Outdated Show resolved Hide resolved
Include/internal/pycore_lock.h Outdated Show resolved Hide resolved
Python/lock.c Outdated Show resolved Hide resolved
Python/pystate.c Outdated Show resolved Hide resolved
Python/pystate.c Outdated Show resolved Hide resolved
Python/pystate.c Outdated Show resolved Hide resolved
Python/pystate.c Outdated Show resolved Hide resolved
Python/pystate.c Show resolved Hide resolved
@bedevere-app
Copy link

bedevere-app bot commented Dec 6, 2023

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Copy link
Contributor Author

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split off the _PyRWMutex PR here: #112859. (This is now rebased on that one.)

I don't think the stop-the-world functionality belongs in ceval related files. It's fundamentally about managing PyThreadState objects (particularly, the "state" field) and is closely tied to the existing _PyThreadState_Attach/Detach functions. The connection to eval_breaker is only a small piece, similar to how the GC has an eval breaker bit.

Comment on lines 42 to 58
// Support for stop-the-world events. This exists in both the PyRuntime struct
// for global pauses and in each PyInterpreterState for per-interpreter pauses.
struct _stoptheworld_state {
PyMutex mutex; // Serializes stop-the-world attempts.

// NOTE: The fields below are protected by HEAD_LOCK(runtime), not by the
// above mutex.
bool requested; // Set when a pause is requested.
bool world_stopped; // Set when the world is stopped.
bool is_global; // Set when contained in PyRuntime struct.

PyEvent stop_event; // Set when thread_countdown reaches zero.
Py_ssize_t thread_countdown; // Number of threads that must pause.

PyThreadState *requester; // Thread that requested the pause (may be NULL).
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only loosely related to the eval_breaker. It's fundamentally about PyThreadState.state: transitioning threads to the _Py_THREAD_GC state. You could still have this without eval-breaker (it would just be much less responsive), but you can't implement it without PyThreadState.state.

Include/internal/pycore_lock.h Outdated Show resolved Hide resolved
Include/internal/pycore_lock.h Outdated Show resolved Hide resolved
Include/internal/pycore_lock.h Outdated Show resolved Hide resolved
Include/internal/pycore_runtime.h Show resolved Hide resolved
Python/pystate.c Outdated Show resolved Hide resolved
Python/pystate.c Outdated Show resolved Hide resolved
Python/lock.c Outdated Show resolved Hide resolved
Python/lock.c Outdated Show resolved Hide resolved
Comment on lines 157 to 175
// Perform a stop-the-world pause for all threads in all interpreters.
//
// Threads in the "attached" state are paused and transitioned to the "GC"
// state. Threads in the "detached" state switch to the "GC" state, preventing
// them from reattaching until the stop-the-world pause is complete.
//
// NOTE: This is a no-op outside of Py_GIL_DISABLED builds.
extern void _PyRuntimeState_StopTheWorld(_PyRuntimeState *runtime);
extern void _PyRuntimeState_StartTheWorld(_PyRuntimeState *runtime);

// Perform a stop-the-world pause for threads in the specified interpreter.
//
// NOTE: This is a no-op outside of Py_GIL_DISABLED builds.
extern void _PyInterpreterState_StopTheWorld(PyInterpreterState *interp);
extern void _PyInterpreterState_StartTheWorld(PyInterpreterState *interp);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I like the common prefix for the global and per-interpreter APIs.
  • I don't think we should combine the APIs because you don't need a thread-state or interpreter state to start a global stop the world pause.

Comment on lines 32 to 36
// (bound thread) (stop-the-world thread)
// [attached] <-> [detached] <-> [gc]
// [attached] <-> [detached] <-> [suspended]
// + ^
// +--------------------------------------------------------+
// (bound thread)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the text above makes it sound like thread a thread can go from "suspended" straight to "attached".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, (bound thread) shows up at the top of the diagram and at the bottom. I'm not sure how to interpret that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By text above do you mean the diagram or the paragraph above it? I've updated both in an attempt to make them more clear.

The "(bound thread)" and "(stop-the-world thread)" labels indicate which thread can perform the transition. It's at the bottom because it's labeling the one-way transition from "attached" to "suspended".

Comment on lines 156 to 159
// If there is no stop-the-world pause in progress, then this function is
// a no-op. Returns one if the thread was switched to the "suspended" state and
// zero otherwise.
extern int _PyThreadState_Suspend(PyThreadState *tstate);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to fall back to detaching?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that's conceptually simpler.

Include/internal/pycore_runtime.h Show resolved Hide resolved
Python/pystate.c Show resolved Hide resolved
Comment on lines 157 to 175
// Perform a stop-the-world pause for all threads in all interpreters.
//
// Threads in the "attached" state are paused and transitioned to the "GC"
// state. Threads in the "detached" state switch to the "GC" state, preventing
// them from reattaching until the stop-the-world pause is complete.
//
// NOTE: This is a no-op outside of Py_GIL_DISABLED builds.
extern void _PyRuntimeState_StopTheWorld(_PyRuntimeState *runtime);
extern void _PyRuntimeState_StartTheWorld(_PyRuntimeState *runtime);

// Perform a stop-the-world pause for threads in the specified interpreter.
//
// NOTE: This is a no-op outside of Py_GIL_DISABLED builds.
extern void _PyInterpreterState_StopTheWorld(PyInterpreterState *interp);
extern void _PyInterpreterState_StartTheWorld(PyInterpreterState *interp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main reservation is that the PyInterpreterState_ and PyRuntimeState_ prefixes are used for either simple getters/setters or struct lifecycle. For anything beyond that we either use a Py_ prefix or a prefix that relates to a specific feature. (We're a bit freer with the PyThreadState_ prefix, but I'm okay with that.) In this case I got the sense that this is a ceval feature, hence suggesting the _PyEval_ prefix.

Regardless, let's find a prefix other than _Py*State_.

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly things are looking good. There's the _PyRWMutex PR, I have one comment here, and there are a couple still open from my earlier review.

Copy link
Contributor Author

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the functions to _PyEval_StopTheWorldAll, etc. at least as a placeholder. I couldn't think of anything better. We could also go with only the _Py prefix.

Comment on lines 32 to 36
// (bound thread) (stop-the-world thread)
// [attached] <-> [detached] <-> [gc]
// [attached] <-> [detached] <-> [suspended]
// + ^
// +--------------------------------------------------------+
// (bound thread)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By text above do you mean the diagram or the paragraph above it? I've updated both in an attempt to make them more clear.

The "(bound thread)" and "(stop-the-world thread)" labels indicate which thread can perform the transition. It's at the bottom because it's labeling the one-way transition from "attached" to "suspended".

Comment on lines 156 to 159
// If there is no stop-the-world pause in progress, then this function is
// a no-op. Returns one if the thread was switched to the "suspended" state and
// zero otherwise.
extern int _PyThreadState_Suspend(PyThreadState *tstate);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that's conceptually simpler.

Python/pystate.c Show resolved Hide resolved
The `--disable-gil` builds occasionally need to pause all but one thread. Some
examples include:

* Cyclic garbage collection, where this is often called a "stop the world event"
* Before calling `fork()`, to ensure a consistent state for internal data structures
* During interpreter shutdown, to ensure that daemon threads aren't accessing Python objects

This adds the following functions to implement global and per-interpreter pauses:

* `_PyRuntimeState_StopTheWorld` and `_PyRuntimeState_StartTheWorld`
* `_PyInterpreterState_StopTheWorld` and `_PyInterpreterState_StartTheWorld`

These functions are no-ops outside of the `--disable-gil` build.

This also adds `_PyRWMutex`, a "readers-writer" lock, which is used to
serialize global stop-the-world pauses with per-interpreter pauses.
Starting in the "suspended" state blocks any _PyThreadState_Attach()
call until the stop-the-world event completes.
_PyThreadState_Suspend now switches to "detached" if there is no active
stop-the-world request.
@colesbury
Copy link
Contributor Author

@ericsnowcurrently, I've rebased the PR onto main to get the changes from gh-112859 (_PyRWMutex).

@colesbury
Copy link
Contributor Author

Hi @ericsnowcurrently - happy New Year! Would you please look this over and let me know if there are any issues that still need to be addressed?

I don't think we came to a decision on the names of _PyEval_StopTheWorldAll(), etc. functions. Especially as these are internal-only, I'm happy to go with whatever naming you prefer.

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. LGTM

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +2057 to +2087
void
_PyEval_StopTheWorldAll(_PyRuntimeState *runtime)
{
#ifdef Py_GIL_DISABLED
stop_the_world(&runtime->stoptheworld);
#endif
}

void
_PyEval_StartTheWorldAll(_PyRuntimeState *runtime)
{
#ifdef Py_GIL_DISABLED
start_the_world(&runtime->stoptheworld);
#endif
}

void
_PyEval_StopTheWorld(PyInterpreterState *interp)
{
#ifdef Py_GIL_DISABLED
stop_the_world(&interp->stoptheworld);
#endif
}

void
_PyEval_StartTheWorld(PyInterpreterState *interp)
{
#ifdef Py_GIL_DISABLED
start_the_world(&interp->stoptheworld);
#endif
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think these and the related static functions make more sense in Python/ceval_gil.c, but not enough to hold up this PR.

We can revisit some the discussion (including some of the alternate names I suggested) separately.

@colesbury
Copy link
Contributor Author

Thanks @ericsnowcurrently! Would you please merge the PR?

@ericsnowcurrently ericsnowcurrently merged commit 441affc into python:main Jan 23, 2024
31 of 32 checks passed
@ericsnowcurrently
Copy link
Member

Thanks, @colesbury!

aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
The `--disable-gil` builds occasionally need to pause all but one thread.  Some
examples include:

* Cyclic garbage collection, where this is often called a "stop the world event"
* Before calling `fork()`, to ensure a consistent state for internal data structures
* During interpreter shutdown, to ensure that daemon threads aren't accessing Python objects

This adds the following functions to implement global and per-interpreter pauses:

* `_PyEval_StopTheWorldAll()` and `_PyEval_StartTheWorldAll()` (for the global runtime)
* `_PyEval_StopTheWorld()` and `_PyEval_StartTheWorld()` (per-interpreter)

(The function names may change.)

These functions are no-ops outside of the `--disable-gil` build.
@colesbury colesbury deleted the stop-the-world branch February 15, 2024 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants