fix(browserstack-service): flush build-stop event on process kill (SDK-6050)#2
Open
harshit-browserstack wants to merge 1 commit into
Open
fix(browserstack-service): flush build-stop event on process kill (SDK-6050)#2harshit-browserstack wants to merge 1 commit into
harshit-browserstack wants to merge 1 commit into
Conversation
…K-6050)
When the WDIO process is killed mid-run (Ctrl+C / SIGINT / SIGTERM), the
service previously relied on a synchronous process.on('exit') handler that
cannot await the async build-stop call. As a result, TestHub never received
a terminal event for the build and waited out its inactivity timeout,
leaving the build in "Unknown Test Status" / TEST_TIMEOUT_WITH_BUILD_SUCCESS
for hours.
This change mirrors browserstack-node-agent's intExitHandler /
TestHubHandler.stop(signal) pattern:
- Register async handlers for SIGINT, SIGTERM, SIGHUP, SIGABRT, SIGQUIT
(and SIGBREAK on Windows). Each awaits the existing build-stop call
with a bounded 10-second grace, then exits with the POSIX code (128 +
signum). Idempotent: first signal wins; subsequent signals are ignored.
- Snapshot and remove pre-existing listeners for those signals at install
time, so create-wdio's synchronous process.exit hook and async-exit-hook
(registered transitively by @wdio/cli) cannot kill our async cleanup
mid-await.
- Thread the signal through BrowserstackCLI.stop(signal) and
GrpcClient.stopBinSession(signal) so the StopBinSessionRequest carries
{ exitSignal, exitReason: 'user_killed', exitCode: 1 }. The binary's
onStop reads these and forwards them to the TestHub stop API as
finished_metadata: [{ reason: 'user_killed', signal }] -- matching the
shape node-sdk and java-agent already send.
- For Direct (non-CLI) mode, stopBuildUpstream(signal) adds the same
finished_metadata payload directly to the PUT body.
Verified end-to-end with a 6-cell matrix (Direct + CLI x normal/SIGINT/
SIGTERM). Kill cells now emit the stop tokens, return POSIX exit codes
(130 for SIGINT, 143 for SIGTERM), and TestHub receives the build-stop
PUT with finished_metadata.{reason: 'user_killed', signal} as confirmed
by binary CLI logs.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Proposed changes
Fixes SDK-6050 — WDIO builds left in "Unknown Test Status" /
TEST_TIMEOUT_WITH_BUILD_SUCCESSwhen the test process is killed mid-run (Ctrl+C / SIGINT / SIGTERM).Root cause. The service registered only a synchronous
process.on('exit', ...)handler, which Node.js restricts to sync work — it cannot await the async build-stop call. On signal, the WDIO parent exited before any stop event reached TestHub, so the backend waited out its inactivity timeout. The gracefulonComplete()path always worked; the signal path was missing.What changes. Mirrors
browserstack-node-agent'sBrowserStackSetup.intExitHandler+TestHubHandler.stop(signal)pattern:SIGINT,SIGTERM,SIGHUP,SIGABRT,SIGQUIT(+SIGBREAKon Windows). Each awaits the existing build-stop call with a bounded 10-second grace period, then exits with POSIX128 + signum. Idempotent — first signal wins, subsequent signals ignored.create-wdio's synchronousprocess.exithook (packages/create-wdio/src/utils.ts:29) andasync-exit-hook's SIGINT hook (registered transitively by@wdio/cli) cannot kill our async cleanup mid-await.signalthroughBrowserstackCLI.stop(signal)→GrpcClient.stopBinSession(signal). TheStopBinSessionRequestnow carries{ exitSignal, exitReason: 'user_killed', exitCode: 1 }. The binary'sonStopreads these and forwards them to the TestHub stop API asfinished_metadata: [{ reason: 'user_killed', signal }]— matching the shapenode-sdkandjava-agentalready send.stopBuildUpstream(signal)adds the samefinished_metadatapayload directly to the PUT body.Verification. 6-cell test matrix (Direct + CLI/BinFlow × normal / SIGINT / SIGTERM). Pre-fix kill cells: silent build, Automate
status=timeout. Post-fix: build-stop PUT confirmed via binary CLI log withfinished_metadata.{reason: 'user_killed', signal: 'SIGINT'|'SIGTERM'}, POSIX exit codes (130 for SIGINT, 143 for SIGTERM),status=skippedon TestHub.Types of changes
Checklist
Backport Request
v9and doesn't need to be back-ported#XXXXXFurther comments
The TestHub dashboard renders build state from
status_stats.in_progress, decremented by per-testTestRunFinishedevents. When a test process is killed mid-test, the framework (Mocha/Playwright/etc.) cannot fire itsafterTesthook, so noTestRunFinishedis emitted for the in-flight test. That state must be reaped on the TestHub backend when it processes astopBuildwithfinished_metadata.reason='user_killed'. The SDK layer's contract ends at sending the signal; reapingin_progresstest_runs is backend responsibility. This PR honors that contract and does not over-reach into per-test enumeration (matching node-sdk and java-agent behavior).Reviewers: @webdriverio/project-committers