test_deadlock & test_threads sometimes deadlock on Windows refleaks buildbot #114440

Closed
encukou opened this issue Jan 22, 2024 · 4 comments
Labels
3.12 bugs and security fixes OS-windows tests Tests in the Lib/test dir

@encukou
Member

encukou commented Jan 22, 2024

The “AMD64 Windows11 Refleaks 3.12” buildbot sometimes deadlocks with:

3:58:18 load avg: 0.07 running (1): test.test_concurrent_futures.test_deadlock (2 hour 34 min)
3:58:48 load avg: 0.05 running (1): test.test_concurrent_futures.test_deadlock (2 hour 35 min)
3:59:18 load avg: 0.03 running (1): test.test_concurrent_futures.test_deadlock (2 hour 35 min)
3:59:48 load avg: 0.02 running (1): test.test_concurrent_futures.test_deadlock (2 hour 36 min)
command timed out: 14400 seconds elapsed running [b'Tools\\buildbot\\test.bat', b'-p', b'x64', b'-j2', b'-R', b'3:3', b'-u-cpu', b'-j2', b'--timeout', b'12000'], attempting to kill
4:00:18 load avg: 0.03 running (1): test.test_concurrent_futures.test_deadlock (2 hour 36 min)
program finished with exit code 1
elapsedTime=14430.167703

On some builds, a different test fails:

3:58:59 load avg: 0.03 running (1): test.test_multiprocessing_spawn.test_threads (1 hour 43 min)
3:59:29 load avg: 0.06 running (1): test.test_multiprocessing_spawn.test_threads (1 hour 44 min)
command timed out: 14400 seconds elapsed running [b'Tools\\buildbot\\test.bat', b'-p', b'x64', b'-j2', b'-R', b'3:3', b'-u-cpu', b'-j2', b'--timeout', b'12000'], attempting to kill
3:59:59 load avg: 0.04 running (1): test.test_multiprocessing_spawn.test_threads (1 hour 44 min)
program finished with exit code 1
elapsedTime=14430.167536
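
The progress lines above can be read mechanically to see which test was stuck and for how long. The sketch below is only an illustration: `PROGRESS` and `stuck_tests` are hypothetical names, and the pattern covers just the "N hour M min" and "M min" forms seen in these logs. The two limits are copied from the logged command line; treating `--timeout` as regrtest's per-test limit is my reading of that option, not something the log states.

```python
import re

# Two progress lines copied from the logs above.
LOG = """\
4:00:18 load avg: 0.03 running (1): test.test_concurrent_futures.test_deadlock (2 hour 36 min)
3:59:59 load avg: 0.04 running (1): test.test_multiprocessing_spawn.test_threads (1 hour 44 min)
"""

# Hypothetical pattern: matches only the duration forms shown in these logs.
PROGRESS = re.compile(
    r"running \(\d+\): (?P<test>\S+) "
    r"\((?:(?P<hours>\d+) hour )?(?P<minutes>\d+) min\)"
)

PER_TEST_TIMEOUT = 12000  # from `--timeout 12000` in the logged command
BUILD_TIMEOUT = 14400     # from "command timed out: 14400 seconds elapsed"


def stuck_tests(log):
    """Map each test in a 'running' line to its longest reported time in seconds."""
    longest = {}
    for match in PROGRESS.finditer(log):
        seconds = int(match["hours"] or 0) * 3600 + int(match["minutes"]) * 60
        name = match["test"]
        longest[name] = max(longest.get(name, 0), seconds)
    return longest


for name, seconds in stuck_tests(LOG).items():
    # True means the test was still under the per-test limit when the
    # buildbot's overall limit stopped the run.
    print(name, seconds, seconds < PER_TEST_TIMEOUT)
```

For both logs the stuck test was still below 12000 seconds when the 14400-second build limit was reached, which would explain why the run ended with "command timed out" rather than a per-test timeout.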

I am able to reproduce locally, by running these two (or this and test_concurrent_futures.test_shutdown) in parallel. The deadlock occurs in test_crash_big_data.
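
A rough sketch of running the two test packages in parallel locally, reusing the flags from the buildbot command line in the logs above. This is not the exact reproducer used in this issue: `regrtest_command` is a hypothetical helper, the whole `test_concurrent_futures` and `test_multiprocessing_spawn` packages are selected rather than single test files, and `-R` is assumed to need a debug build of Python.

```python
import subprocess
import sys


def regrtest_command(test_names, workers=2, refleak="3:3", timeout=12000):
    """Build a regrtest command line similar to the one in the buildbot log."""
    return [
        sys.executable, "-m", "test",
        f"-j{workers}",             # run tests in parallel worker processes
        "-R", refleak,              # refleak hunting, as on the Refleaks buildbot
        "-u-cpu",                   # same resource setting as in the log
        "--timeout", str(timeout),  # timeout value taken from the log
        *test_names,
    ]


if __name__ == "__main__":
    cmd = regrtest_command(["test_concurrent_futures", "test_multiprocessing_spawn"])
    print(" ".join(cmd))
    # The run can take hours on an affected build, so it is left commented out.
    # subprocess.run(cmd, check=False)
```

Since the hang is intermittent, several runs may be needed before it shows up.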

Linked PRs

@encukou encukou self-assigned this Jan 22, 2024
@encukou encukou added the 3.12 bugs and security fixes label Jan 23, 2024
@erlend-aasland erlend-aasland added tests Tests in the Lib/test dir OS-windows labels Jan 23, 2024
encukou added a commit to encukou/cpython that referenced this issue Jan 23, 2024
…ot concurrent.futures

This was left out of the 3.12 backport for three related issues:
- pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`)
- pythongh-109370 (which changes this to be only called on Windows)
- pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`)

Without this change, ProcessPoolExecutor sometimes hangs on Windows
when a worker process is terminated.

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@encukou
Member Author

encukou commented Jan 23, 2024

A backporting/rebasing error crept into the 3.12 backports for gh-109047, gh-109370, and gh-107219.

The backports to 3.12 were applied in a different order (and with different bases) than the original changes, leaving the Windows-specific close call in concurrent.futures. Moving it to multiprocessing.Queue fixes test_concurrent_futures.test_deadlock for me.
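
For context, here is a minimal sketch of the situation these changes concern: a worker process dies abruptly while the pool is in use. It is not the simplified reproducer or `test_crash_big_data` (which, going by its name, also involves large payloads); `crash` and `outcome_of_crashing_task` are hypothetical names. The parent would normally be expected to see `BrokenProcessPool` rather than wait forever.

```python
import os
from concurrent.futures import ProcessPoolExecutor, TimeoutError
from concurrent.futures.process import BrokenProcessPool


def crash():
    """Hypothetical task: end the worker process without any cleanup."""
    os._exit(1)


def outcome_of_crashing_task(timeout=60):
    """Submit a crashing task and describe what the parent process observed."""
    pool = ProcessPoolExecutor(max_workers=1)
    try:
        future = pool.submit(crash)
        try:
            future.result(timeout=timeout)
        except BrokenProcessPool:
            return "broken pool reported"
        except TimeoutError:
            return "no result before timeout"
        return "task returned normally"
    finally:
        # Do not block on shutdown here: if the pool is stuck, waiting would hang too.
        pool.shutdown(wait=False, cancel_futures=True)


if __name__ == "__main__":
    # The guard is needed because Windows starts workers with the spawn method.
    print(outcome_of_crashing_task())
```

The timeout only bounds the wait for the result. It does not make a hung pool recover.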

encukou added a commit that referenced this issue Jan 24, 2024
…current.futures (GH-114489)

This was left out of the 3.12 backport for three related issues:
- gh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`)
- gh-109370 (which changes this to be only called on Windows)
- gh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`)

Without this change, ProcessPoolExecutor sometimes hangs on Windows
when a worker process is terminated.

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@encukou
Member Author

encukou commented Jan 24, 2024

Thanks Victor & Serhiy!

I'll close the issue but continue monitoring buildbots.

@encukou encukou closed this as completed Jan 24, 2024
@encukou
Member Author

encukou commented Jan 26, 2024

Victor pointed out that this might need a backport to 3.11.
(I haven't seen it on the 3.11 buildbots, and didn't try to reproduce there.)

@encukou encukou reopened this Jan 26, 2024
@encukou
Member Author

encukou commented Jan 29, 2024

I cannot reproduce on 3.11. (I am using a simplified reproducer. But I also haven't seen this fail on a 3.11 buildbot yet.)
If it comes up in the future, #109047 (356de02) probably needs to be backported before this.

@encukou encukou closed this as completed Jan 29, 2024
naveen521kk pushed a commit to naveen521kk/cpython that referenced this issue Feb 19, 2024
…ot concurrent.futures (pythonGH-114489)

This was left out of the 3.12 backport for three related issues:
- pythongh-107219 (which adds `self.call_queue._writer.close()` to `_ExecutorManagerThread` in `concurrent.futures`)
- pythongh-109370 (which changes this to be only called on Windows)
- pythongh-109047 (which moves the call to `multiprocessing.Queue`'s `_terminate_broken`)

Without this change, ProcessPoolExecutor sometimes hangs on Windows
when a worker process is terminated.

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
naveen521kk pushed a commit to naveen521kk/cpython that referenced this issue Feb 21, 2024
…ot concurrent.futures (pythonGH-114489)
