
Add a public, fail-able Py_INCREF for the free-threaded build #113920

Open
dpdani opened this issue Jan 10, 2024 · 7 comments
Labels: topic-free-threading, type-feature (A feature request or enhancement)

Comments


dpdani commented Jan 10, 2024

Feature or enhancement

Proposal:

C extensions targeting the free-threaded build will run into the problem of trying to incref a PyObject that is concurrently being decrefed, and possibly deallocated, by another thread.

This is a normal situation in concurrent reference counting, but at the moment there is no public C API in main to handle it.
To handle references to objects that may be concurrently deallocated, there needs to be a routine that attempts to increment the reference count and returns a failure status when the incref is not safe.
The caller is then supposed to handle the failure according to its own logic.

The original nogil fork has a function called _Py_TRY_INCREF that does precisely this, and it is mentioned in PEP 703, but it is marked as private.
The implementation in the nogil-3.12 fork differs, and that fork additionally provides another function with different semantics.

I'll briefly give a somewhat concrete example, taken from my library cereggii here (which uses nogil-3.9). Consider the case where a reference to a PyObject is stored inside another PyObject,1 and can be modified by more than one thread:

PyObject *AtomicRef_Get(AtomicRef *self)
{
    PyObject *reference;
    reference = self->reference;  // L1
    while (!_Py_TRY_INCREF(reference)) {  // L2
        reference = self->reference;
    }
    return reference;
}

Between lines L1 and L2, another thread could decrement the refcount of the referenced object and deallocate it.
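
For instance, a hypothetical writer for the same structure (only a sketch, not actual cereggii code) could look like the following; its final Py_DECREF is exactly the decrement in question:

void AtomicRef_Set(AtomicRef *self, PyObject *new_reference)
{
    Py_INCREF(new_reference);
    PyObject *old = self->reference;
    self->reference = new_reference;  // in practice an atomic exchange
    Py_DECREF(old);  // may drop the last reference and deallocate `old`
}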

If Py_INCREF were used (unconditionally) instead of _Py_TRY_INCREF, the interpreter could encounter a segmentation fault if the memory page had already been freed, or, possibly even worse, the memory that held the referenced object could be recycled for another PyObject, so that this example method suddenly returns an object that was never stored inside self->reference.

This issue is of course due to the fact that no lock protects objects' reference counts and their deallocation; it is a normal problem in concurrent reference counting.

I know there is still a lot of work to be done on the free-threaded build, but I think there is enough general interest among C extension authors in having such a fail-able Py_INCREF in the public ABI to justify adding this item to the to-do list.

During a brief exchange with @colesbury, he noted that the implementation present in nogil-3.12 is quite difficult to explain, and thus a bad candidate for a public ABI.
However, the public macro doesn't necessarily have to be what CPython uses internally; it could just as well be something made specifically for C extensions.

I'm willing to invest some time in porting the conditional incref functions of nogil-3.12 into main, and possibly adding another public C API, if that can be helpful. (This isn't specifically tracked in #108219; I think Sam wants to implement it before or while working on dict/list.)

Also, notice that compatibility with the default build can be achieved trivially:

#if !defined(Py_NOGIL)
#  define Py_TRY_INCREF Py_INCREF
#endif
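
One caveat, though: Py_INCREF evaluates to void, so for the fallback to be usable inside a condition (as in the retry loop above), it would have to yield a value. A possible shape, assuming the proposed Py_TRY_INCREF name and a 1-on-success convention:

#if !defined(Py_NOGIL)
// on the default build an incref can never fail, so always report success
#  define Py_TRY_INCREF(op) (Py_INCREF(op), 1)
#endif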

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

https://discuss.python.org/t/towards-a-python-util-concurrent/41325/16

Footnotes

  1. As it is in this case, but the same reasoning applies to any other container data structure as well.


dpdani commented Jan 10, 2024

Suppose the new public function is Py_TRY_INCREF(o). The expected semantics would be:

  • returns 1 upon successfully incrementing the reference count of o
  • returns 0 if the increment could not be performed, because:
    • the refcount of o is 0,
    • o is in the process of being finalized/deallocated, or
    • possibly, other conditions I might be missing?

Note that it doesn't return 0 if the increment needed to be retried before succeeding.
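
For illustration, a minimal sketch of a caller relying on these semantics (Py_TRY_INCREF is the proposed API, not an existing CPython function, and borrow_or_default is a made-up helper):

PyObject *
borrow_or_default(PyObject **slot, PyObject *fallback)
{
    PyObject *ob = *slot;  // reading *slot must itself be safe, see the discussion below
    if (ob != NULL && Py_TRY_INCREF(ob)) {
        return ob;  // success: the caller now owns a strong reference
    }
    Py_INCREF(fallback);  // failure: apply caller-specific logic instead of retrying
    return fallback;
}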


I have an open question about how the ABA problem is avoided in the current refcounting implementation for the free-threaded build; I might come back to edit this message after some research.


dpdani commented Mar 12, 2024

I've only skimmed through the work on QSBR (gh-115103) and it seems to me like a good candidate to implement this special incref.

That is, IIUC, the fail-able incref could be something quite trivial, like:

  1. check the reference count > 0, if not then fail;
  2. try incrementing it, and report success or failure.

This should be enough because a concurrent call to Py_DECREF would not free any object until a quiescent state is reached, and a quiescent state cannot be reached during a call to this routine because the thread executing it cannot yet have yielded to the eval breaker. (Please, do correct me if I'm wrong.)

It is implicitly expected that the object pointer is read just before this call, in the sense that the read and the incref cannot be interleaved with code that may reach a quiescent state.
The signature could thus be changed to something like:

PyObject *Py_TRY_INCREF_AND_FETCH(PyObject **ref_holder);

which could return NULL to signal failure. Calling it could look like:

PyObject *ob_ref = Py_TRY_INCREF_AND_FETCH(&spam->ob_ref);

if (ob_ref != NULL) {
    ...
}

Possibly the implementation could be something along these lines:

PyObject *
Py_TRY_INCREF_AND_FETCH(PyObject **ref_holder)
{
    PyObject *ref;
read:
    ref = *ref_holder;
    if (ref->ob_ref_local + ref->ob_ref_shared == 0) {
        return NULL;  // the refcount is already zero: fail
    }
    if (_Py_atomic_add_uint32(&ref->ob_ref_shared, 1)) {
        return ref;  // success: report the new reference
    }
    goto read;  // otherwise re-read the holder and retry
}

@colesbury wdyt?

colesbury (Contributor) commented

If I understand correctly, your proposal requires that objects have their deallocation delayed until safe using QSBR. We do that for a few objects under specific circumstances, but not for most objects. In general, we want to free objects as soon as their refcounts reach zero -- delaying destruction increases memory usage.


dpdani commented Mar 13, 2024

I see. Would it be feasible to have the option to opt into QSBR for some objects?

edit: If it would be, I'd be glad to work on it.

colesbury (Contributor) commented

It is technically feasible, but I don't think it's a good candidate for a public API at this time.


dpdani commented Mar 13, 2024

Do you say that because you think the QSBR implementation itself is subject to change?
Or do you consider a first free-threading release to be required before knowing what a public API should look like?


colesbury commented Mar 13, 2024

It's a combination of things:

  • The implementation may change in the future
  • I'm concerned that implementation details may leak through. In other words, callers may depend on how soon an object is deallocated, or on the internal batching we use, or on other similar details that I haven't thought enough about.
  • The semantics are complex, which makes them difficult to describe accurately
  • I'm not yet convinced that there are a sufficient number of use cases to warrant a new public API
  • I think there are more important and better candidates for free-threading-related public C APIs, like PyMutex and critical sections, which should be addressed first (a rough sketch of those follows below).
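
For reference, a rough sketch of what using one of those primitives could look like from an extension's point of view (CounterObject is a made-up example type; PyMutex refers to the mutex API mentioned above):

typedef struct {
    PyObject_HEAD
    PyMutex mutex;  // must start zero-initialized; PyType_GenericAlloc zeroes new object memory
    Py_ssize_t value;
} CounterObject;

static PyObject *
Counter_increment(CounterObject *self, PyObject *Py_UNUSED(args))
{
    PyMutex_Lock(&self->mutex);    // blocks until the per-object lock is acquired
    self->value++;
    PyMutex_Unlock(&self->mutex);
    Py_RETURN_NONE;
}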
