[Windows] KeyboardInterrupt during Thread.join hangs that Thread #66021
Comments
I am attempting to join a thread after a previous join was interrupted by Ctrl-C. I did not find any warning about this case in the threading module docs, so I assume this is legal. The full test script is attached, but the essential code is:

    from threading import Thread
    from time import sleep

    def work(t):
        sleep(t)

    twork = 3; twait = 4
    t = Thread(target=work, args=(twork,))
    try:
        t.start()
        t.join(twait)  # here I do Ctrl-C
    except KeyboardInterrupt:
        pass
    t.join()  # this hangs if twork < twait

I can observe the following reproduction sequence:
I tried replacing try-except clause with custom signal handler for SIGINT, as shown in the script. If the handler does not raise an exception, the thread can be normally joined. If it does, however, the behavior is the same as with default handler. My _guess_ is that the exception prevents some finishing code that puts Thread into proper stopped state after its target completes. Running Python 3.4.0 on Windows 7 x64 |
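A minimal sketch of the non-raising handler variant described above, for anyone who wants to try it without the attached script. The names `sigint_seen`, `note_sigint` and `run_and_join` are hypothetical and do not come from the attachment; the only assumption is that the handler records the interrupt in a flag rather than raising.

```python
import signal
import threading
import time

sigint_seen = threading.Event()  # hypothetical flag, set by the handler


def note_sigint(signum, frame):
    """Record the interrupt instead of raising KeyboardInterrupt."""
    sigint_seen.set()


def run_and_join(twork, twait):
    """Start a sleeping worker, wait up to twait, then join without a timeout."""
    previous = None
    # signal.signal() may only be called from the main thread.
    if threading.current_thread() is threading.main_thread():
        previous = signal.signal(signal.SIGINT, note_sigint)
    try:
        t = threading.Thread(target=time.sleep, args=(twork,))
        t.start()
        t.join(twait)  # Ctrl-C here only sets the flag; nothing is raised
        t.join()       # second join, as in the report
        return t.is_alive(), sigint_seen.is_set()
    finally:
        # Restore whatever handler was installed before.
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
```

Calling `run_and_join(3, 4)` and pressing Ctrl-C during the first join should, per the observation above, return normally with the flag set, because no exception unwinds through `Thread.join`.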
This works for me under Linux. |
I've confirmed this issue on Windows. Attached is an SSCCE which seems to reliably reproduce the issue in a Windows environment. Long story short: if a KeyboardInterrupt occurs during Thread.join(), there's a good chance that Thread._is_stopped never gets set. |
Thanks for the update. Since I still can't reproduce on Linux, perhaps you (or one of our resident Windows experts) can try to propose a patch fixing the issue? |
Not sure if I'll do the full fix (need to check w/ my employer), but I'm doing some investigation. Here's what I know so far: At the Python level, the KeyboardInterrupt is being raised within _wait_for_tstate_lock, on "elif lock.acquire(block, timeout)". Going into the C code, it looks like this goes through lock_PyThread_acquire_lock -> acquire_timed -> PyThread_acquire_lock_timed. lock_PyThread_acquire_lock will abort the lock acquisition if it receives PY_LOCK_INTR from acquire_timed. My best guess right now is that PyThread_acquire_lock_timed never returns PY_LOCK_INTR. Indeed, I see this comment at the top of the NT version of that function:
And indeed, the thread_pthread.h implementations both have a path for returning PY_LOCK_INTR, while the thread_nt.h version does not. ...And that's where I am thus far. |
For POSIX systems, try the following test function several times. For the bug to manifest, Thread._wait_for_tstate_lock has to be interrupted in between acquiring and releasing the sentinel lock. Maybe it could use a reentrant lock in order to avoid this problem.

    import os
    import signal
    import threading
    import time

    def raise_sigint():
        print('raising SIGINT')
        time.sleep(0.5)
        if os.name == 'nt':
            os.kill(0, signal.CTRL_C_EVENT)
        else:
            os.kill(os.getpid(), signal.SIGINT)
        print('finishing')

    def test(f=raise_sigint):
        global g_sigint
        g_sigint = threading.Thread(target=f, name='SIGINT')
        g_sigint.start()
        print('waiting')
        for i in range(100):
            try:
                if not g_sigint.is_alive():
                    break
                g_sigint.join(0.01)
            except KeyboardInterrupt:
                print('KeyboardInterrupt ignored')
        print('g_sigint is alive:', g_sigint.is_alive())

POSIX-only code normally wouldn't join() with a small timeout in a loop as in the above example. This is a workaround for the problem that's demonstrated in msg221180, in which a signal doesn't interrupt a wait on the main thread. Other than time.sleep(), most waits in Windows CPython have not been updated to include the SIGINT Event object when waiting in the main thread. (It's possible to fix some cases such as waiting on locks, but only as long as the implementation continues to use semaphores and kernel waits instead of native condition variables with SRW locks.) |
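To make the window between acquiring and releasing the sentinel lock easier to see, here is a toy model. It is not CPython's code: `ToyJoinState` and its `interrupt` parameter are hypothetical, and the exception is injected by hand at the point where a signal handler would have to raise. The lock starts out unlocked, which stands for a target that has already finished.

```python
import threading


class ToyJoinState:
    """Toy stand-in for the sentinel-lock handshake; not CPython's code."""

    def __init__(self):
        self.tstate_lock = threading.Lock()  # unlocked: the target is done
        self.is_stopped = False

    def wait_for_lock(self, interrupt=False, timeout=-1):
        if self.tstate_lock.acquire(True, timeout):
            if interrupt:
                # Simulate a handler raising right after acquire() returns.
                raise KeyboardInterrupt
            self.tstate_lock.release()
            self.is_stopped = True
            return True
        return False


state = ToyJoinState()
try:
    state.wait_for_lock(interrupt=True)
except KeyboardInterrupt:
    pass

# The lock was never released, so every later wait fails or blocks.
print("lock still held:", state.tstate_lock.locked())
print("second wait succeeded:", state.wait_for_lock(timeout=0.1))
```

In this model the second wait only returns because it is given a timeout; an untimed wait would block forever, which matches the hang on the second `join()`.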
Focusing on the Windows case specifically... One way to possibly make this work (although perhaps not as clean as may be desired) would be to add polling logic into the thread_nt.h version of PyThread_acquire_lock_timed. That would have the benefit of avoiding the complexity of the various "non recursive mutex" implementations (i.e. semaphores vs "emulated" condition variables vs native condition variables) and may be less code than setting up "alertable" WaitForSingleObjectEx calls (plus whatever else needs to be done for the SRW-lock-based code path). Thoughts or feedback? (I've not done any mainline Python commits yet so I'm totally ready to be completely wrong or misguided here.) |
I think adding polling to such a widely-used primitive is out of the question. |
I'm guessing this is because of the perceived performance impact? (That's the main thought I have against polling in this case.) Or is it something else? I can certainly look at tweaking the 3 mutex implementations I mentioned previously, but I do expect that will be a bit more code; I at least wanted to put the "simpler" idea out there. Fully knowing that "simpler" isn't necessarily better. |
The problem is polling is pretty detrimental to power saving on modern CPUs. There might also be a performance impact that makes things worse :-) More generally, we really would like locks to be interruptible under Windows, as it would be useful in many more situations than joining threads. Unfortunately, as you have discovered we have several lock implementations for Windows right now. The SRWLock one is probably difficult to make interruptible, but it's never enabled by default. The Semaphore one looks legacy, so could perhaps be removed. Remains the condition variable-based implementation. |
Ahh, thanks for the explanation. I didn't think about that. Let's *not* do that then. :) I'll see if I can squeeze in some time to play with the alternatives. |
Okay, I think I need to abandon my research into this. This does seem to have quite an amount of complexity to it and is probably more than I should be taking on at this point in time. Anyone else who wants to look at this, consider it fair game. Parting thoughts based on my limited expertise in the area, take them or ignore them:
Hopefully the above notes are of some value. |
multiprocessing semaphores support Ctrl-C under Windows, so it should be doable for regular locks as well (notice the |
Good point, I forgot about WaitForMultipleObjectsEx; something like that seems like it would be much simpler for the first 2 cases. |
I think I'm hitting this with subprocesses inside tox (parallel feature), any plans to fix this? |
I am having the same blocked signal issue on Windows when using Thread.join. This program does not print "interrupted" after pressing Ctrl+C:

    import threading
    import time

    def f():
        while True:
            print("processing")
            time.sleep(1)

    if __name__ == "__main__":
        try:
            thread = threading.Thread(target=f)
            thread.start()
            thread.join()
        except KeyboardInterrupt:
            print("interrupted")

For reference, 2 years ago Nathaniel Smith gave an interesting explanation here: |
This is a different issue: bpo-29971. Currently, threading.Lock.acquire() cannot be interrupted by CTRL+C. |
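Until an untimed lock wait can be interrupted, the short-timeout join loop mentioned earlier in this thread can be applied to a program like the one above. The following is only a sketch under assumptions: the `stop` event, the `ticks` list, and the names `worker` and `join_interruptibly` are hypothetical, and on interpreter versions affected by the original bug an interrupt can still land in the unlucky window.

```python
import threading
import time


def worker(stop, ticks):
    """Do periodic work until asked to stop."""
    while not stop.is_set():
        ticks.append(time.monotonic())  # stand-in for print("processing")
        stop.wait(0.05)                 # sleep, but wake early on stop


def join_interruptibly(thread, stop, poll=0.1, grace=2.0):
    """Join in short slices so the main thread regularly returns to Python code."""
    try:
        while thread.is_alive():
            thread.join(poll)
    except KeyboardInterrupt:
        print("interrupted")
        stop.set()            # ask the worker to finish
        thread.join(grace)    # bounded wait instead of an untimed join
    return not thread.is_alive()
```

Typical use is `stop = threading.Event()`, start `threading.Thread(target=worker, args=(stop, []))`, then call `join_interruptibly(thread, stop)` from the main thread. The design choice is to keep every wait bounded, so a missed or badly timed interrupt cannot leave the main thread blocked indefinitely.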
I mark this issue as a duplicate of bpo-45274. -- I fixed bpo-45274 with this change: New changeset a22be49 by Victor Stinner in branch 'main': I tested join.py with the fix. It now displays: The script no longer hangs. |
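For readers who want the general shape of a wait that survives an exception between acquire and release, here is a toy sketch. It is an illustration under assumptions, not the code of the changeset: `ToyJoinStateFixed`, its `interrupt` parameter and the local `acquired` flag are all hypothetical.

```python
import threading


class ToyJoinStateFixed:
    """Toy model that releases the sentinel lock on the exception path."""

    def __init__(self):
        self.tstate_lock = threading.Lock()  # unlocked: the target is done
        self.is_stopped = False

    def _stop(self):
        self.is_stopped = True

    def wait_for_lock(self, interrupt=False, timeout=-1):
        lock = self.tstate_lock
        acquired = False
        try:
            acquired = lock.acquire(True, timeout)
            if acquired and interrupt:
                # Simulate a handler raising right after acquire() returns.
                raise KeyboardInterrupt
            if acquired:
                lock.release()
                self._stop()
            return acquired
        except BaseException:
            # Only undo an acquisition this call actually made. In real code
            # an exception could still arrive before `acquired` is assigned,
            # so this narrows the window rather than closing it.
            if acquired:
                lock.release()
                self._stop()
            raise


state = ToyJoinStateFixed()
try:
    state.wait_for_lock(interrupt=True)
except KeyboardInterrupt:
    pass
print("lock still held:", state.tstate_lock.locked())
print("second wait succeeded:", state.wait_for_lock(timeout=0.1))
```

The flag is used instead of asking the lock whether it is locked, because a held lock can also mean the target is still running; releasing it in that case would wrongly mark a live thread as stopped.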