Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-44422: Fix threading.enumerate() reentrant call #26727

Merged
merged 3 commits into from
Jun 15, 2021
Merged

bpo-44422: Fix threading.enumerate() reentrant call #26727

merged 3 commits into from
Jun 15, 2021

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Jun 14, 2021

The threading.enumerate() function now uses a reentrant lock to
prevent a hang on reentrant call.

https://bugs.python.org/issue44422

@vstinner
Copy link
Member Author

@pablogsal @pitrou @serhiy-storchaka @methane: I'm not sure about this change, would you mind to have a look?

@vstinner
Copy link
Member Author

If the CI tests pass, I will try the buildbot label to check on more platforms.

#
# bpo-44422: Use a reentrant lock to allocate reentrant calls to functions like
# threading.enumerate().
_active_limbo_lock = RLock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RLock() is the right way to go to prevent self-deadlock.

We could temporarily turn GC off but that seems hard to do in a way that reliably restores the prior enabled/disabled mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could temporarily turn GC off but that seems hard to do in a way that reliably restores the prior enabled/disabled mode.

Sure, that's the other option that I considered. But gc.disable() is process-wide: it affects all Python threads, and so it might have surprising side effects. Some code might rely on the current exact GC behavior.

Another option would be to rewrite the code in C. The problem is that other functions rely on this lock, like active_count(). I would prefer to not have to rewrite "half" of threading.py in C. Using a RLock is less intrusive.

@vstinner vstinner added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 14, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @vstinner for commit 2907ca88719503493acd930790ffdb978e73dbbb 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 14, 2021
@pitrou
Copy link
Member

pitrou commented Jun 14, 2021

(@iritkatriel : a concrete example of GC-induced reentrancy issue)

@vstinner
Copy link
Member Author

vstinner commented Jun 14, 2021

I ran manual tests on this PR (my laptop has 8 logical CPUs, 4 physical Intel CPUs):

  • Run ./python rec_threading.py: SUCCESS (rec_threading.py is attached to bpo-44422)
  • Run ./python -m test -j10 -r twice: SUCCESS
  • Run ./python -m test test_threading -F -j10 --timeout=60 for 15 minutes: SUCCESS (no timeout)

@vstinner
Copy link
Member Author

Oops, there is a typo in a comment :-( "Use a reentrant lock to allocate reentrant calls": you should read "to allow". I will update my PR later, but I prefer to wait until the buildbot jobs complete.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, a RLock seems like the best effort solution here. Presumably, only the non-mutating code protected by _active_limbo_lock can be called reentrantly (why would a reentrate call mutate the locks table?).

Still, a reminder that a lot of things can unfortunately happen in destructors and trigger reentrancy into unsuspecting code.

@vstinner
Copy link
Member Author

(@iritkatriel : a concrete example of GC-induced reentrancy issue)

These bugs are surprising. Nobody expects the GC reentrancy! In OpenStack, it's even more surprising since threading.current_thread() is monkey-patched for even more fun!

@pitrou
Copy link
Member

pitrou commented Jun 14, 2021

In OpenStack, it's even more surprising since threading.current_thread() is monkey-patched for even more fun!

Really? For what kind of fun? :-o

@vstinner
Copy link
Member Author

Really? For what kind of fun? :-o

Python became too deterministic and boring. It's time to make it non deterministic again!

@vstinner
Copy link
Member Author

9 Refleak builds failed:

  • buildbot/AMD64 Fedora Stable Refleaks PR
  • buildbot/AMD64 RHEL7 Refleaks PR
  • buildbot/AMD64 RHEL8 Refleaks PR
  • buildbot/PPC64LE Fedora Stable Refleaks PR
  • buildbot/PPC64LE RHEL7 Refleaks PR
  • buildbot/PPC64LE RHEL8 Refleaks PR
  • buildbot/s390x Fedora Refleaks PR
  • buildbot/s390x RHEL7 Refleaks PR
  • buildbot/s390x RHEL8 Refleaks PR

test__xxsubinterpreters and/or test_threading failed:

test__xxsubinterpreters leaked [55, 55, 55] references, sum=165
test__xxsubinterpreters leaked [31, 31, 31] memory blocks, sum=93
...
test_threading leaked [110, 110, 110] references, sum=330
test_threading leaked [62, 62, 62] memory blocks, sum=186

@iritkatriel
Copy link
Member

The same lock is used in a few other places. Maybe in one of them it's not right to make it re-entrant?

@iritkatriel
Copy link
Member

There is also another _active_limbo_lock = _allocate_lock() assignment in _after_fork:
https://github.com/python/cpython/blob/2907ca88719503493acd930790ffdb978e73dbbb/Lib/threading.py#L1562

@vstinner
Copy link
Member Author

test__xxsubinterpreters and/or test_threading failed

Ah, the https://bugs.python.org/issue42972#msg385297 bug strikes back.

I wrote #26727 to fix the leak.

@vstinner
Copy link
Member Author

There is also another _active_limbo_lock = _allocate_lock() assignment in _after_fork:

Oh, nicely stopped @iritkatriel! I will update the PR, once #26727 is merged.

@vstinner
Copy link
Member Author

Ooops sorry, the PR to fix the leak is: #26734

The threading.enumerate() function now uses a reentrant lock to
prevent a hang on reentrant call.
Issue spotted by Irit.
@vstinner
Copy link
Member Author

I fixed _after_fork() and the typo (allocate => allow), and I rebased it on top of the merged #26734 fix.

@iritkatriel: Would you mind to review the updated PR?

@iritkatriel
Copy link
Member

This comment is now obsolete:

                    # We don't call self._delete() because it also
                    # grabs _active_limbo_lock.
                    del _active[get_ident()]

@vstinner
Copy link
Member Author

This comment is now obsolete

Right @iritkatriel, I plan to write a second change only for that, but only change it in the main branch to avoid any risk of regression in stable branches. Oh I forgot to mention it here, I have a local patch for it ;-)

So @iritkatriel, does it look good to you?

Copy link
Member

@iritkatriel iritkatriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vstinner vstinner merged commit 243fd01 into python:main Jun 15, 2021
@miss-islington
Copy link
Contributor

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.9.
🐍🍒⛏🤖

@vstinner vstinner deleted the enumerate_rlock branch June 15, 2021 14:14
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 15, 2021
The threading.enumerate() function now uses a reentrant lock to
prevent a hang on reentrant call.
(cherry picked from commit 243fd01)

Co-authored-by: Victor Stinner <vstinner@python.org>
@bedevere-bot
Copy link

GH-26737 is a backport of this pull request to the 3.10 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Jun 15, 2021
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 15, 2021
The threading.enumerate() function now uses a reentrant lock to
prevent a hang on reentrant call.
(cherry picked from commit 243fd01)

Co-authored-by: Victor Stinner <vstinner@python.org>
@bedevere-bot bedevere-bot removed the needs backport to 3.9 only security fixes label Jun 15, 2021
@bedevere-bot
Copy link

GH-26738 is a backport of this pull request to the 3.9 branch.

@vstinner
Copy link
Member Author

Thanks everyone for your useful reviews ;-) I wasn't confident at all that this change was safe. At least, the responsibility of any possible regression is now distributed :-D

miss-islington added a commit that referenced this pull request Jun 15, 2021
The threading.enumerate() function now uses a reentrant lock to
prevent a hang on reentrant call.
(cherry picked from commit 243fd01)

Co-authored-by: Victor Stinner <vstinner@python.org>
vstinner added a commit that referenced this pull request Jun 15, 2021
)

The threading.enumerate() function now uses a reentrant lock to
prevent a hang on reentrant call.
(cherry picked from commit 243fd01)

Co-authored-by: Victor Stinner <vstinner@python.org>
@vstinner
Copy link
Member Author

This comment is now obsolete: (...)

@iritkatriel: I created #26741 to use again the _delete() method in _bootstrap_inner().

jdevries3133 pushed a commit to jdevries3133/cpython that referenced this pull request Jun 19, 2021
The threading.enumerate() function now uses a reentrant lock to
prevent a hang on reentrant call.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants