Race condition in Thread._wait_for_tstate_lock() #89437
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
assignee = None closed_at = <Date 2021-09-27.12:53:47.975> created_at = <Date 2021-09-23.20:13:09.454> labels = ['library', '3.9', '3.10', '3.11'] title = 'Race condition in Thread._wait_for_tstate_lock()' updated_at = <Date 2022-03-12.00:31:02.958> user = 'https://github.com/vstinner'
activity = <Date 2022-03-12.00:31:02.958> actor = 'vstinner' assignee = 'none' closed = True closed_date = <Date 2021-09-27.12:53:47.975> closer = 'vstinner' components = ['Library (Lib)'] creation = <Date 2021-09-23.20:13:09.454> creator = 'vstinner' dependencies =  files = ['50299', '50300'] hgrepos =  issue_num = 45274 keywords = ['patch'] message_count = 12.0 messages = ['402523', '402524', '402532', '402535', '402540', '402542', '402597', '402704', '402706', '402707', '402710', '402718'] nosy_count = 7.0 nosy_names = ['serhiy.storchaka', 'eryksun', 'pablogsal', 'maggyero', 'miss-islington', 'bjs', 'gaborjbernat'] pr_nums = ['28532', '28579', '28580', '31290'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue45274' versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']
Bernát Gábor found an interesting bug on Windows. Sometimes, when a process is interrupted on Windows with CTRL+C, its parent process hangs in thread.join():
I tried attaching to the Python process: there is a single thread, the main thread, which is blocked in thread.join(). You can also see it in the faulthandler traceback.
I did a long analysis of the _tstate_lock and checked that the thread really completed. Raw debug traces:
== thread 6200 exit ==
thread_run[pid=3984, thread_id=6200]: clear
== main thread is calling join() but gets a KeyboardInterrupt ==
[pid=3984, thread_id=8000] Lock<obj=000001C1122669C0>.acquire() -> ACQUIRED
== main thread calls repr(thread) ==
ROOT:  KeyboardInterrupt - teardown started
== main thread calls thread.join()... which hangs ==
_wait_for_tstate_lock[pid=3984, current thread_id=8000, self thread_id=6200]: acquire(block=True, timeout=-1): lock obj= 0x1c1122669c0
    def _wait_for_tstate_lock(self, block=True, timeout=-1):
        lock = self._tstate_lock
        if lock is None:
            assert self._is_stopped
        elif lock.acquire(block, timeout):
            # -- got KeyboardInterrupt here ---
            lock.release()
            self._stop()
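The window can be illustrated without a real signal. The following is a minimal sketch, not the attached reproducer: `wait_for_lock` and its `interrupt_after_acquire` flag are hypothetical names, and a plain `threading.Lock` stands in for `_tstate_lock` after the thread has already released it. The simulated `KeyboardInterrupt` lands between `acquire()` and `release()`, which leaves the lock held with nobody left to release it.

```python
import threading


def wait_for_lock(lock, interrupt_after_acquire=False):
    """Mimic the acquire()/release() pair from _wait_for_tstate_lock()."""
    if lock.acquire(True, -1):
        if interrupt_after_acquire:
            # Stand-in for CTRL+C arriving between acquire() and release().
            raise KeyboardInterrupt
        lock.release()
        return True
    return False


# Stand-in for _tstate_lock once the thread has exited and released it.
lock = threading.Lock()

try:
    wait_for_lock(lock, interrupt_after_acquire=True)
except KeyboardInterrupt:
    print("join failed with: KeyboardInterrupt()")

# The lock was acquired but never released. A second blocking acquire()
# with no timeout would hang forever, so use a short timeout to observe it.
still_locked = lock.locked()
second_try = lock.acquire(True, 0.1)
print(still_locked, second_try)
```

A second `join()` in the real code path takes the blocking branch with `timeout=-1`, which is why the parent process hangs instead of returning `False`.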
You can reproduce the issue on Linux with the attached patch and script:

    $ git apply threading_bug.patch
    $ ./python threading_bug.py
    join...
    join failed with: KeyboardInterrupt()
    join again...
I'm now working on a PR to fix the race condition.
I am not sure that it can be solved at the Python level.
    if lock._blink(block, timeout):
        self._stop()
    with suppress_interrupt():
        if not lock._blink(block, timeout):
            return
        self._stop()
Right. In pure Python, we cannot write code which works in all cases. My PR 28532 fixes the most common case: an application interrupted by a single CTRL+C.
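One way to handle the single-CTRL+C case in pure Python is to catch the exception, check whether the lock ended up held, and finish the release before re-raising. The sketch below illustrates that idea on a standalone lock; it is not presented as the exact code of the PR. `wait_recovering`, `on_stopped`, and the `interrupt` flag are hypothetical names, with `on_stopped` standing in for `self._stop()`.

```python
import threading


def wait_recovering(lock, on_stopped, block=True, timeout=-1, interrupt=False):
    """acquire()/release() pair that cleans up if interrupted in between."""
    try:
        if lock.acquire(block, timeout):
            if interrupt:
                # Simulated CTRL+C between acquire() and release().
                raise KeyboardInterrupt
            lock.release()
            on_stopped()
    except BaseException:
        if lock.locked():
            # Heuristic: assume acquire() succeeded and the exception
            # arrived before release(). locked() cannot tell who holds
            # the lock, so this is not correct in every case.
            lock.release()
            on_stopped()
        raise


calls = []
lock = threading.Lock()  # stand-in for an already-released _tstate_lock

try:
    wait_recovering(lock, lambda: calls.append("stopped"), interrupt=True)
except KeyboardInterrupt:
    print("join failed with: KeyboardInterrupt()")

# The lock was released by the except branch, so a second wait does not hang.
wait_recovering(lock, lambda: calls.append("stopped"))
print(lock.locked(), calls)
```

A second exception raised inside the `except` branch, or while `on_stopped()` runs, would still leave things inconsistent, which matches the point that pure Python cannot cover every case.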
It's important to be able to interrupt acquire() which can be called in blocking mode with no timeout: it's exactly what tox does, and users expect to be able to interrupt tox in this case.
The acquire()+release() sequence can be made atomic in C, but it doesn't solve the problem of _stop() which can be interrupted by a second exception.
This bug is likely as old as Python. I don't think that we should attempt to design a perfect solution. I only propose to make the race condition (way) less likely.
run causes briefcase to hang beeware/briefcase#809