Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-110481: Implement _Py_DECREF_NO_DEALLOC for free-threaded build #111560

Closed
wants to merge 4 commits into from

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Oct 31, 2023

I simply implement _Py_DECREF_NO_DEALLOC for the free-threaded build.
I assumed refcount will not be zero even after calling _Py_DECREF_NO_DEALLOC.

Please let me know if I missed something.

@corona10 corona10 changed the title gh-110481: Implement _Py_DECREF_NO_DEALLOC for free-threaded build [WIP] gh-110481: Implement _Py_DECREF_NO_DEALLOC for free-threaded build Oct 31, 2023
@corona10 corona10 changed the title [WIP] gh-110481: Implement _Py_DECREF_NO_DEALLOC for free-threaded build gh-110481: Implement _Py_DECREF_NO_DEALLOC for free-threaded build Oct 31, 2023
Py_ssize_t refcount = _Py_atomic_load_ssize_relaxed(&op->ob_ref_shared);
Py_ssize_t new_shared;
// Shared refcount can be zero but we should consider local refcount.
int should_queue = (refcount == 0 || refcount == _Py_REF_MAYBE_WEAKREF);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colesbury Out of curiosity, Don't we have to consider that shared_refcount can be a negative value at this moment due to imbalance refcounting?

Same question for the

should_queue = (shared == 0 || shared == _Py_REF_MAYBE_WEAKREF);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully understand your question. Yes, shared_refcount may be negative. should_queue is always false if shared is negative because:

  1. we only queue objects once
  2. we queue them the first time the shared refcount becomes negative

So if it's already negative then we must have already queued it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if it's already negative then we must have already queued it.

Make sense, thank you for explain.

assert(refcount != 0);
refcount--;
_Py_atomic_store_uint32_relaxed(&op->ob_ref_local, refcount);
if (refcount == 0) {
Copy link
Member Author

@corona10 corona10 Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question, do we have to handle zero local refcounting cases from _Py_DECREF_NO_DEALLOC
or it can be handled as deferred merging from somewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you need _Py_MergeZeroLocalRefcount. You can't defer it -- that would break a bunch of invariants. For example, the same thread may try calling Py_DECREF again leading to a negative local refcount.

@corona10 corona10 closed this Oct 31, 2023
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this change will actually improve performance. In the default build, _Py_DECREF_NO_DEALLOC can avoid a comparison and avoids emitting a function call. With Py_NOGIL, we still need the comparison and call to _Py_MergeZeroLocalRefcount.

In the end, I think the Py_NOGIL version of _Py_DECREF_NO_DEALLOC may look exactly like Py_DECREF.

assert(refcount != 0);
refcount--;
_Py_atomic_store_uint32_relaxed(&op->ob_ref_local, refcount);
if (refcount == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you need _Py_MergeZeroLocalRefcount. You can't defer it -- that would break a bunch of invariants. For example, the same thread may try calling Py_DECREF again leading to a negative local refcount.

Py_ssize_t refcount = _Py_atomic_load_ssize_relaxed(&op->ob_ref_shared);
Py_ssize_t new_shared;
// Shared refcount can be zero but we should consider local refcount.
int should_queue = (refcount == 0 || refcount == _Py_REF_MAYBE_WEAKREF);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully understand your question. Yes, shared_refcount may be negative. should_queue is always false if shared is negative because:

  1. we only queue objects once
  2. we queue them the first time the shared refcount becomes negative

So if it's already negative then we must have already queued it.

@corona10 corona10 reopened this Oct 31, 2023
@corona10
Copy link
Member Author

corona10 commented Oct 31, 2023

I'm not sure this change will actually improve performance. In the default build, _Py_DECREF_NO_DEALLOC can avoid a > comparison and avoids emitting a function call. With Py_NOGIL, we still need the comparison and call to > _Py_MergeZeroLocalRefcount.

In the end, I think the Py_NOGIL version of _Py_DECREF_NO_DEALLOC may look exactly like Py_DECREF.

I totally agree. If we consider edge cases of _Py_DECREF_NO_DEALLOC, it will be similar to Py_DECREF of Py_NOGIL version.

I close the PR, and if there is a way to improve the performance, I will re-investigate it.
Thank you!

@corona10 corona10 closed this Oct 31, 2023
@corona10 corona10 deleted the _Py_DECREF_NO_DEALLOC branch November 1, 2023 00:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants