fix(browserstack-service): flush build-stop event on process kill (SDK-6050) #2

Open
harshit-browserstack wants to merge 1 commit into main from sdk-6050-build-stop-on-signal

@harshit-browserstack
Owner

Proposed changes

Fixes SDK-6050 — WDIO builds left in "Unknown Test Status" / TEST_TIMEOUT_WITH_BUILD_SUCCESS when the test process is killed mid-run (Ctrl+C / SIGINT / SIGTERM).

Root cause. The service registered only a synchronous process.on('exit', ...) handler, which Node.js restricts to sync work — it cannot await the async build-stop call. On signal, the WDIO parent exited before any stop event reached TestHub, so the backend waited out its inactivity timeout. The graceful onComplete() path always worked; the signal path was missing.
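
To make the root cause concrete, here is a minimal sketch of why making the 'exit' listener async does not help. It uses a plain EventEmitter as a stand-in for process (an assumption, so the snippet can run without terminating anything), and stopBuild is a hypothetical placeholder for the real build-stop call. Listeners are invoked synchronously and the promise returned by an async listener is discarded, so only the code before the first await has run by the time emit() returns.

```typescript
import { EventEmitter } from 'node:events'

// Hypothetical stand-in for the service's async build-stop call.
async function stopBuild(log: string[]): Promise<void> {
    log.push('stop:start')
    // Simulates the network round trip to TestHub.
    await new Promise<void>((resolve) => setTimeout(resolve, 10))
    log.push('stop:done')
}

// Stand-in for `process`; listeners on the real 'exit' event are called the same way.
const fakeProcess = new EventEmitter()
const log: string[] = []

fakeProcess.on('exit', async () => {
    await stopBuild(log)
})

// emit() calls the listener synchronously and ignores the promise it returns.
fakeProcess.emit('exit')

// Snapshot of what had happened by the time the "exit" dispatch finished.
const logAtExit = [...log]
console.log(logAtExit)
```

Here logAtExit holds only 'stop:start'. In a real process, Node.js exits as soon as the 'exit' listeners return, so the continuation that would record 'stop:done' never runs.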

What changes. Mirrors browserstack-node-agent's BrowserStackSetup.intExitHandler + TestHubHandler.stop(signal) pattern:

  1. Async signal handlers for SIGINT, SIGTERM, SIGHUP, SIGABRT, SIGQUIT (+ SIGBREAK on Windows). Each awaits the existing build-stop call with a bounded 10-second grace period, then exits with POSIX 128 + signum. Idempotent — first signal wins, subsequent signals ignored.
  2. At install time, snapshot and remove pre-existing listeners for those signals so create-wdio's synchronous process.exit hook (packages/create-wdio/src/utils.ts:29) and async-exit-hook's SIGINT hook (registered transitively by @wdio/cli) cannot kill our async cleanup mid-await.
  3. Thread signal through BrowserstackCLI.stop(signal) → GrpcClient.stopBinSession(signal). The StopBinSessionRequest now carries { exitSignal, exitReason: 'user_killed', exitCode: 1 }. The binary's onStop reads these and forwards them to the TestHub stop API as finished_metadata: [{ reason: 'user_killed', signal }], matching the shape node-sdk and java-agent already send.
  4. For Direct (non-CLI) mode, stopBuildUpstream(signal) adds the same finished_metadata payload directly to the PUT body.
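
As an illustration of item 1, the following is a rough sketch of an idempotent signal handler with a bounded grace period and a POSIX-style exit code. It is not the code in this PR: createSignalHandler, exitCodeForSignal, StopFn and ExitFn are hypothetical names, and the stop and exit functions are injected so the sketch can run without killing the process. Signal numbers come from Node's os.constants.signals.

```typescript
import { constants } from 'node:os'

type StopFn = (signal: string) => Promise<void>
type ExitFn = (code: number) => void

// POSIX convention: a process terminated by a signal reports 128 + signal number.
function exitCodeForSignal(signal: string): number {
    const signums: Record<string, number> = { ...constants.signals }
    return 128 + (signums[signal] ?? 0)
}

// Builds an idempotent handler: the first signal runs cleanup, later ones are ignored.
function createSignalHandler(stop: StopFn, exit: ExitFn, graceMs = 10_000) {
    let handling = false
    return async (signal: string): Promise<void> => {
        if (handling) {
            return
        }
        handling = true
        let timer: ReturnType<typeof setTimeout> | undefined
        const grace = new Promise<void>((resolve) => {
            timer = setTimeout(resolve, graceMs)
        })
        try {
            // Wait for the stop call, but never longer than the grace period.
            await Promise.race([stop(signal), grace])
        } catch {
            // A failed stop call must not prevent the process from exiting.
        } finally {
            clearTimeout(timer)
        }
        exit(exitCodeForSignal(signal))
    }
}
```

Wiring it up would look something like process.on('SIGINT', createSignalHandler(stopBuild, process.exit)), where stopBuild stands for whatever async stop call the service already has. Node passes the signal name to the listener as its first argument, which is what the handler relies on.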

Verification. 6-cell test matrix (Direct + CLI/BinFlow × normal / SIGINT / SIGTERM). Pre-fix kill cells: silent build, Automate status=timeout. Post-fix: build-stop PUT confirmed via binary CLI log with finished_metadata.{reason: 'user_killed', signal: 'SIGINT'|'SIGTERM'}, POSIX exit codes (130 for SIGINT, 143 for SIGTERM), status=skipped on TestHub.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • Polish
  • New feature
  • Breaking change
  • Documentation update
  • Specification changes
  • Internal updates

Checklist

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have added the necessary documentation (if appropriate)
  • I have added proper type definitions for new commands (if appropriate)

Backport Request

  • This change is solely for v9 and doesn't need to be back-ported
  • Back-ported PR at #XXXXX

Further comments

The TestHub dashboard renders build state from status_stats.in_progress, which is decremented by per-test TestRunFinished events. When a test process is killed mid-test, the framework (Mocha/Playwright/etc.) cannot fire its afterTest hook, so no TestRunFinished is emitted for the in-flight test. That state must be reaped on the TestHub backend when it processes a stopBuild with finished_metadata.reason='user_killed'. The SDK layer's contract ends at sending the signal; reaping in_progress test_runs is the backend's responsibility. This PR honors that contract and does not overreach into per-test enumeration (matching node-sdk and java-agent behavior).
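
On the SDK side, the contract described above comes down to attaching a small metadata record to the stop request. The sketch below shows one way to build such a body. Only the finished_metadata shape is taken from this PR description; buildStopBody, StopBuildBody and the finished_at field are assumptions made for illustration and may not match the real request.

```typescript
// Shape taken from the PR description.
interface FinishedMetadata {
    reason: 'user_killed'
    signal: string
}

interface StopBuildBody {
    // Hypothetical field standing in for whatever the stop request already carries.
    finished_at: string
    finished_metadata?: FinishedMetadata[]
}

// Adds finished_metadata only when the stop was triggered by a signal,
// so the graceful path keeps sending the same body as before.
function buildStopBody(finishedAt: string, signal?: string): StopBuildBody {
    const body: StopBuildBody = { finished_at: finishedAt }
    if (signal) {
        body.finished_metadata = [{ reason: 'user_killed', signal }]
    }
    return body
}
```

Calling buildStopBody(timestamp) models the graceful onComplete() path, while buildStopBody(timestamp, 'SIGINT') models the kill path. What the backend does with the record is outside the SDK's control.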

Reviewers: @webdriverio/project-committers

…K-6050)

When the WDIO process is killed mid-run (Ctrl+C / SIGINT / SIGTERM), the
service previously relied on a synchronous process.on('exit') handler that
cannot await the async build-stop call. As a result, TestHub never received
a terminal event for the build and waited out its inactivity timeout,
leaving the build in "Unknown Test Status" / TEST_TIMEOUT_WITH_BUILD_SUCCESS
for hours.

This change mirrors browserstack-node-agent's intExitHandler /
TestHubHandler.stop(signal) pattern:

- Register async handlers for SIGINT, SIGTERM, SIGHUP, SIGABRT, SIGQUIT
  (and SIGBREAK on Windows). Each awaits the existing build-stop call
  with a bounded 10-second grace, then exits with the POSIX code (128 +
  signum). Idempotent: first signal wins; subsequent signals are ignored.

- Snapshot and remove pre-existing listeners for those signals at install
  time, so create-wdio's synchronous process.exit hook and async-exit-hook
  (registered transitively by @wdio/cli) cannot kill our async cleanup
  mid-await.

- Thread the signal through BrowserstackCLI.stop(signal) and
  GrpcClient.stopBinSession(signal) so the StopBinSessionRequest carries
  { exitSignal, exitReason: 'user_killed', exitCode: 1 }. The binary's
  onStop reads these and forwards them to the TestHub stop API as
  finished_metadata: [{ reason: 'user_killed', signal }] -- matching the
  shape node-sdk and java-agent already send.

- For Direct (non-CLI) mode, stopBuildUpstream(signal) adds the same
  finished_metadata payload directly to the PUT body.
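
The listener-snapshot step from the second bullet above could look roughly like the sketch below. takeOverSignals and the SIGNALS list are hypothetical names, not the identifiers used in this change. The function accepts any EventEmitter so it can be exercised without touching the real process object.

```typescript
import { EventEmitter } from 'node:events'

type Listener = (...args: unknown[]) => void

// Signals named in the commit message (SIGBREAK would be added on Windows).
const SIGNALS = ['SIGINT', 'SIGTERM', 'SIGHUP', 'SIGABRT', 'SIGQUIT']

// Detaches listeners registered before us and returns them so they can be
// inspected or restored later; then installs our own handler for each signal.
function takeOverSignals(target: EventEmitter, handler: Listener): Map<string, Function[]> {
    const snapshot = new Map<string, Function[]>()
    for (const signal of SIGNALS) {
        // listeners() returns a copy, so the snapshot survives the removal below.
        snapshot.set(signal, target.listeners(signal))
        target.removeAllListeners(signal)
        target.on(signal, handler)
    }
    return snapshot
}
```

In the service this would be called with process as the target. One limitation of this approach: listeners registered after the takeover are not affected, so it only protects against hooks that were installed earlier.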

Verified end-to-end with a 6-cell matrix (Direct + CLI x normal/SIGINT/
SIGTERM). Kill cells now emit the stop tokens, return POSIX exit codes
(130 for SIGINT, 143 for SIGTERM), and TestHub receives the build-stop
PUT with finished_metadata.{reason: 'user_killed', signal} as confirmed
by binary CLI logs.