Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.11] gh-106883 Fix deadlock in threaded application #117332

Closed
wants to merge 5 commits into from

Conversation

diegorusso
Copy link
Contributor

@diegorusso diegorusso commented Mar 28, 2024

When using threaded applications, there is a high risk of a deadlock in the intepreter.
It's a lock ordering deadlock with HEAD_LOCK(&_PyRuntime); and the GIL.

It has been suggested to disable GC during the
_PyThread_CurrentFrames() and _PyThread_CurrentExceptions() calls.

When using threaded applications, there is a high risk of a deadlock in
the intepreter.
It's a lock ordering deadlock with HEAD_LOCK(&_PyRuntime); and the GIL.

It has been suggested to disable GC during the
_PyThread_CurrentFrames() and _PyThread_CurrentExceptions() calls.
@pablogsal
Copy link
Member

This will break users that have disabled the gc, we need to do the dance only if it has not been disabled

@pablogsal
Copy link
Member

Also there are return paths for errors where this leaves the gc disabled

@diegorusso
Copy link
Contributor Author

diegorusso commented Mar 28, 2024

@pablogsal thanks for your comments! Of course this was more an "attempt" to check if the approach was in the right direction and to check if the C API I used are the correct ones. Next week I will produce a more complete patch.

@colesbury colesbury self-requested a review March 28, 2024 19:35
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this is the right general approach. I left some inline comments to expand on what @pablogsal wrote.

Python/pystate.c Outdated Show resolved Hide resolved
Python/pystate.c Outdated Show resolved Hide resolved
Python/pystate.c Outdated Show resolved Hide resolved
Python/pystate.c Outdated Show resolved Hide resolved
diegorusso and others added 3 commits April 5, 2024 15:03
When using threaded applications, there is a high risk of a deadlock in
the intepreter.
It's a lock ordering deadlock with HEAD_LOCK(&_PyRuntime); and the GIL.

By disabling the GC during the _PyThread_CurrentFrames() and
_PyThread_CurrentExceptions() calls fixes the issue.
@diegorusso
Copy link
Contributor Author

@pablogsal @colesbury I've updated the PR addressing your comments and adding a test to verify the patch. Please let me know what you think.

@diegorusso
Copy link
Contributor Author

It is currently failing on the Address Sanitiser build, I'm investigating what the problem is.

@diegorusso diegorusso marked this pull request as draft April 5, 2024 17:13
@diegorusso diegorusso marked this pull request as ready for review April 9, 2024 10:31
@diegorusso
Copy link
Contributor Author

@pablogsal @colesbury

I have pushed a new test for it. It is slower compare to the existent tests: it takes about 10 seconds on my machine (Parallels on MBP) when it succeeds. If it fails (it shouldn't) then it's the timeout of 40 seconds.
When used in the build with the address sanitizer enabled, it is skipped because it will be way slower.
My questions to you are: do we really need the test? if yes, can we mark it as slow?

Thoughts?

@pablogsal
Copy link
Member

pablogsal commented Apr 9, 2024

We have a thing to consider which is that 3.11.9 was the last bug fix release of 3.11 and that is already released. I could make an extra bug fix release to include a fix for this PR but technically this is too late. I will check with the other release managers to see what they think

@diegorusso
Copy link
Contributor Author

Sure, no problem. Whatever the outcome is, it has been entertaining debugging it :)

@arhadthedev arhadthedev added the pending The issue will be closed if no feedback is provided label Apr 10, 2024
@diegorusso
Copy link
Contributor Author

I just want to give a gentle ping on this PR. What will be its fate? Merge or close? :) Joking apart, it would be nice to have a closure.

@pablogsal
Copy link
Member

pablogsal commented May 28, 2024

Unfortunately unless you want to argue that this is a security fix, we cannot backport bug fixes anymore to 3.11.

If this issue is something that's affecting a considerable number of users I am willing to reconsider, though. But in any case the release will be folded with the next security release of 3.11.

@diegorusso
Copy link
Contributor Author

Hello, I don't think I can argue that this is a security fix (it isn't :) ) but also I left a comment on dask/distributed#8616 issue as they bumped into the same problem. I requested them to comment directly here: if they have found a way to workaround the issue, we can just close the PR otherwise we need to understand what the blockers are and what's the magnitude of the issue.

@pablogsal
Copy link
Member

Let's check what the other RMs think just to get some consensus on this.

CC @ambv @Yhg1s @hugovk

@diegorusso diegorusso deleted the branch python:3.11 June 25, 2024 15:34
@diegorusso diegorusso closed this Jun 25, 2024
@diegorusso diegorusso deleted the 3.11 branch June 25, 2024 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review pending The issue will be closed if no feedback is provided
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants