Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-112529: Implement GC for free-threaded builds #114262

Merged
merged 7 commits into from Jan 25, 2024

Conversation

colesbury
Copy link
Contributor

@colesbury colesbury commented Jan 18, 2024

This implements the GC for free-threaded builds. The free-threading GC follows the same basic algorithms as the existing GC, but operates on different data types. Specifically:

  • GC-enabled objects are found by using mimalloc APIs to traverse heaps (instead of the PyGC_Head linked list)
  • The ob_tid field (thread id) is abused for computing gc_refs as well as for "worklists" that keep track of the "unreachable" set of objects, objects with legacy finalizers, and weakref callbacks.
  • The marking stack uses _PyObjectStack, a linked-list of fixed sized chunks. This requires memory allocation during GC for marking. It would be possible to avoid this, but at the cost of extra scans over the whole heap, which I don't think is worthwhile. For context, GC implementations in other languages, like OpenJDK and Go use essentially the same data structure for marking.
  • The resurrection check uses ob_ref_local for computing gc_refs because ob_tid is in use for the "unreachable" worklist

There is a bunch of clean-up and improvements that I'd like to defer to later PRs to keep this size of this manageable:

  • Define NUM_GENERATIONS to 1 in free-threaded builds. As written, every collection in the free-threaded build is a full collection, but we still behave as if we have three generations in a few places.
  • Remove the now unused PyGC_Head pre-header in free-threaded builds
  • Thread-safe counting of the number of allocated objects (used to schedule GCs)
  • Refactor out common code from Python/gc.c and Python/gc_free_threading.c once Mark's incremental GC is landed
  • Add stop-the-world calls (needs gh-111964: Implement stop-the-world pauses #112471)

This implements a mark and sweep GC for the free-threaded builds of
CPython. The implementation relies on mimalloc to find GC tracked
objects (i.e., "containers").
@colesbury colesbury marked this pull request as ready for review January 22, 2024 21:52
@colesbury colesbury requested a review from a team as a code owner January 22, 2024 21:52
@colesbury
Copy link
Contributor Author

@DinoV, @pablogsal, @nascheme - I think this is ready for review now.

There is duplicate code between Python/gc.c and Python/gc_free_threading.c for now. I'd like to avoid any substantial refactoring, while @markshannon's incremental GC PR is still outstanding, even at the cost of some duplicate code.

I'll put up a PR for the devguide as well.

@nascheme
Copy link
Member

The design, structure and code style look good to me.

A couple of minor suggestions. Next to the definition of ob_tid, I think it should also mention that this field is (ab)used for the linked-list and for the GC refs count. Instead of from_block() I'd prefer a bit more descriptive name, e.g. op_from_block().

I think using the marking stack is okay, despite the extra allocations needed. If it is turns out to be a problem, e.g. a program with a very deeply linked object structure, a fairly simple fix is as follows. Limit the size of the stack. If the size is exceeded, re-start the marking process. An additional refinement is to have a "already marked" bit on each mimalloc page and that way if you re-start, those pages can be quickly skipped.

Note that the PyGC_Head info cannot be removed unless a different mechanism for the "trashcan" is made. It could use the _PyObjectStack structure as well.

@colesbury
Copy link
Contributor Author

Thanks for the review Neil. I've renamed from_block to op_from_block and added a comment to the ob_tid definition.

I think using the marking stack is okay, despite the extra allocations needed. If it is turns out to be a problem... a fairly simple fix is as follows...

Yeah, that makes sense.

Note that the PyGC_Head info cannot be removed unless a different mechanism for the "trashcan" is made.

In a subsequent change (not yet a PR), I'm using ob_tid for the trashcan: colesbury@9e06349, which avoids having to worry about allocation failures. I expect to use _PyObjectStack for the (not yet implemented) biased reference counting an inter-thread queue, where ob_tid cannot be (ab)used as a linked list pointer.

}

static int
gc_visit_heaps_locked(PyInterpreterState *interp, mi_block_visit_fun *visitor,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to call this gc_visit_heaps_lock_held per our discussion on naming these style of functions?

if (_PyObject_GC_IS_TRACKED(op) && !_Py_IsImmortal(op)) {
// If update_refs hasn't reached this object yet, mark it
// as (tentatively) unreachable and initialize ob_tid to zero.
if (!gc_is_unreachable(op)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to factor this into a helper like gc_init_refs or gc_maybe_init_refs that is used here and in update_refs

if (PyTuple_CheckExact(op)) {
_PyTuple_MaybeUntrack(op);
if (!_PyObject_GC_IS_TRACKED(op)) {
if (gc_is_unreachable(op)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could similarly be a gc_restore_refs or gc_maybe_restore_refs and used below

return NULL;
}

// Append all objects to a worklist. This abuses ob_tid. We will restore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth stopping the world here? It seems like we're fine to race with ob_tid from a correctness stand point, but it seems like a lot of objects are going to end up with merged ref counts in a multithreaded environment!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to stop the world for the heap traversal (gc_visit_heaps) to be thread-safe... which I forgot to add here. I'll also add an assert to gc_visit_heaps() that the world is stopped.

This doesn't merge most refcounts. They get restored from the mimalloc data structure. They'll only get merged if the owning thread has already exited.

// it later. NOTE: We can't append to the list during gc_visit_heaps()
// because PyList_Append() may reclaim an abandoned mimalloc segment
// while we are traversing it.
struct get_objects_args args = { 0 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto to get referrers


op->ob_tid = 0;
op->ob_ref_local = 0;
op->ob_ref_shared = _Py_REF_SHARED(refcount, _Py_REF_MERGED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this needs to be an atomic operation? Can this just be a call to _Py_ExplicitMergeRefcount?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other threads in the interpreter must be paused, so no atomics necessary. I'll add an assert here.

We could probably use _Py_ExplicitMergeRefcount(), but it seems convenient to avoid the atomic operations.


// Clear weakrefs and enqueue callbacks (but do not call them).
clear_weakrefs(state);
_PyEval_StartTheWorld(interp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think if I understand this correctly what's going to happen with the thread ID and any reference count changes that may happen at this point, but the assumption is that the thread ID is never going to actually clash with an object reference in the work list. And then anything which does have a reference count operation will just become merged when we restore it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the key assumption is that _PyThread_Id() is itself a pointer to a distinct object so never conflicts with an object in the worklists. The implementation of _PyThread_Id() uses the address of the platform's thread control block or equivalent.

On the GC side, we abuse ob_tid for two purposes:

  • gc_refs, which can have arbitrary values (might conflict!), but this is only happens while other threads are paused, so nobody sees it.
  • worklist pointers, which must not conflict with _PyThread_Id()

Some worklists are only used while other threads are paused (e.g., in _PyGC_GetObjects()). Other worklists are created when threads are paused, but still used while other threads may be running (e.g., unreachable, wrcb_to_call).

If the worklist continues to be used while other threads are running, then there are two other important considerations:

  • The worklist holds a strong reference to the object so that it can't be freed while in the worklist.
  • The object's reference count fields are merged before adding it to the worklist. This avoids some other thread trying to merge the refcount fields, which would be messy. (It also makes the deallocation happen more quickly when we get around to calling tp_clear).

@colesbury
Copy link
Contributor Author

Here's the PR so far for the devguide: python/devguide#1263

@DinoV DinoV merged commit b52fc70 into python:main Jan 25, 2024
34 checks passed
aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
* pythongh-112529: Implement GC for free-threaded builds

This implements a mark and sweep GC for the free-threaded builds of
CPython. The implementation relies on mimalloc to find GC tracked
objects (i.e., "containers").
@colesbury colesbury deleted the gh-112529-gc branch February 15, 2024 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants