🐛 fix(test): prevent PowerShell activation test from crashing xdist workers on Windows #3128
Draft
gaborbernat wants to merge 8 commits into pypa:main
Add stdin=subprocess.DEVNULL to all Popen calls in activation tests to prevent subprocesses blocking on stdin in xdist workers. Reduce the communicate timeout from 120s to 60s so the explicit timeout handler fires before pytest-timeout (120s), avoiding a raw KeyboardInterrupt that crashes xdist workers on Windows. Add timeouts to previously unbounded communicate calls.
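The pattern this commit describes can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `run_with_timeout`, the 10-second cleanup bound, and the `RuntimeError` are assumptions made for the example.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout=60.0):
    """Run cmd with stdin detached; kill it if it exceeds timeout seconds.

    Hypothetical helper for illustration only.
    """
    process = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,  # the child can never block waiting for input
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        out, err = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        # Bounded cleanup (assumed 10s): collect output written before the kill.
        out, err = process.communicate(timeout=10)
        raise RuntimeError(
            f"{cmd!r} timed out after {timeout}s; partial output: {out!r}"
        )
    return process.returncode, out, err
```

Keeping `timeout` below the pytest-timeout limit means the `except` branch, rather than an injected KeyboardInterrupt, is what ends a hung subprocess.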
PowerShell hangs indefinitely on Windows Server 2025 GHA runners during activation testing. Mark as xfail(strict=False) so CI stays green while still running the test.
This reverts commit f8a8201.
auto-merge was automatically disabled
April 19, 2026 13:16
Pull request was converted to draft
Add echo markers between each test script command so timeout failures show which step was last completed. Capture and display partial output in the failure message for easier diagnosis.
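One way to picture the marker approach is sketched below. The marker text `STEP n DONE` and both function names are hypothetical; the commit message does not say what the real markers look like.

```python
import re

# Hypothetical marker format; tolerates Windows "\r\n" line endings.
MARKER = re.compile(r"^STEP (\d+) DONE\r?$", re.MULTILINE)


def add_markers(commands):
    """Interleave an echo marker after each command of a test script."""
    lines = []
    for index, command in enumerate(commands, start=1):
        lines.append(command)
        lines.append(f"echo STEP {index} DONE")
    return lines


def last_completed_step(partial_output):
    """Return the highest step whose marker appears in the output, or 0."""
    steps = [int(match.group(1)) for match in MARKER.finditer(partial_output)]
    return max(steps, default=0)
```

On a timeout, the failure message could then report `last_completed_step(partial_output)` alongside the captured text, which narrows the hang to the command that follows that step.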
Each python -c one-liner inside the activation test script was inheriting COVERAGE_PROCESS_START/COVERAGE_RUN from the parent pytest process. This caused every subprocess to start coverage measurement and attempt to write to the same SQLite coverage database on exit. On Windows, the file lock contention with the main pytest/coverage process caused subprocesses to hang indefinitely, never exiting and blocking the PowerShell script.
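A plausible remedy is to hand the test subprocesses an environment with those variables removed. The sketch below assumes that approach; the function name `env_without_coverage` is hypothetical, and only the two variable names come from the commit message.

```python
import os

# Variables named in the commit message; stripping them is an assumed fix.
COVERAGE_VARS = ("COVERAGE_PROCESS_START", "COVERAGE_RUN")


def env_without_coverage(base=None):
    """Return a copy of the environment without coverage trigger variables."""
    source = os.environ if base is None else base
    return {key: value for key, value in source.items() if key not in COVERAGE_VARS}
```

The result can be passed as `env=env_without_coverage()` to `subprocess.Popen`, so the nested python -c calls do not start coverage measurement or touch the shared coverage database.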
First PowerShell invocation on Windows 2025 GHA runners takes 47-62 seconds due to cold start (.NET JIT, AMSI scanning). Increase communicate timeout from 60s to 90s to accommodate this, and set pytest-timeout to 180s on the test to prevent the timeout race that crashed xdist workers.
This reverts commit 5171324.
First powershell.exe invocation on Windows 2025 GHA runners takes 47-62s due to .NET JIT and AMSI cold start. Add a warmup step in CI so the cost is paid outside the test timeout.
The scheduled CI on main has been failing consistently over the past week: 7 out of 9 failures trace back to test_powershell crashing pytest-xdist workers on windows-2025 runners. The crash cascades: after the worker dies, remaining tests can't finish within the 30-minute CI timeout, so the entire job fails.
The root cause is a timeout race. 🏁 The communicate(timeout=120) in the activation test conftest matches the global pytest-timeout of 120 seconds exactly. When PowerShell hangs, the test setup consumes ~10 seconds before communicate() starts, so pytest-timeout fires first. On Windows, that plugin uses _thread.interrupt_main() to raise KeyboardInterrupt in the main thread, but when that thread is blocked in the C-level communicate() call, the interrupt doesn't unwind cleanly in xdist workers, killing the worker process ("node down: Not properly terminated").
The fix redirects stdin to subprocess.DEVNULL on all Popen calls in the activation tests, preventing shells from blocking on stdin in xdist workers where stdin state is undefined. 🔧 The communicate() timeout drops from 120s to 60s so the explicit timeout handler always fires before pytest-timeout, letting the clean process.kill() path run. Previously unbounded communicate() calls (get_version, post-kill cleanup, RaiseOnNonSourceCall) now have timeouts too.