fix(browse): disable parent-process watchdog in headed mode #1018

Closed
sanghyuk-seo-nexcube wants to merge 1 commit into garrytan:main from sanghyuk-seo-nexcube:fix/headed-mode-parent-watchdog

sanghyuk-seo-nexcube commented Apr 16, 2026

Problem

connect starts a headed Chromium server, prints status, and exits. The server's parent-process watchdog (server.ts:761-769, polling every 15s) detects the CLI process is gone and calls shutdown(), killing the visible browser window the user is looking at.

The connect handler at cli.ts:825-834 set BROWSE_PARENT_PID to process.pid by default. It only passed '0' through when the env var was already set externally (the if (process.env.BROWSE_PARENT_PID === '0') branch), which only happens in pair-agent scenarios. Normal /open-gstack-browser usage always inherited the CLI's own PID.
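
The watchdog behaviour described above can be sketched roughly as follows. This is an illustration, not the code in server.ts: `checkParent`, `startParentWatchdog`, and the injected `isAlive` probe are hypothetical names. Only the 15s interval and the shut-down-when-parent-is-gone behaviour come from the description.

```typescript
// Hypothetical probe type: returns true while the given PID still exists.
// In Node or Bun one common way to implement it is process.kill(pid, 0)
// inside try/catch; that is an assumption here, not a claim about server.ts.
type IsAlive = (pid: number) => boolean;

// One watchdog tick: returns true if the parent is gone and shutdown was requested.
function checkParent(parentPid: number, isAlive: IsAlive, onParentGone: () => void): boolean {
  if (isAlive(parentPid)) return false;
  onParentGone();
  return true;
}

// Polls the parent every intervalMs (15s in the description) and stops after firing.
// Returns a function that cancels the polling.
function startParentWatchdog(
  parentPid: number,
  isAlive: IsAlive,
  onParentGone: () => void,
  intervalMs: number = 15000,
): () => void {
  const timer = setInterval(() => {
    if (checkParent(parentPid, isAlive, onParentGone)) clearInterval(timer);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Because `connect` exits right after printing status, the first tick after launch finds the CLI's PID missing, which is consistent with the ~15 second lifetime in the repro.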

Repro

# From Claude Code or any shell
$B connect
# Browser opens, "Connected to real Chrome" prints
# Wait ~15 seconds
$B status
# Mode: launched (not headed) — server died and respawned headless
# Browser window is gone

Why this matters

Every user who runs /open-gstack-browser or $B connect sees their browser window disappear ~15 seconds after launch. The workaround (BROWSE_PARENT_PID=0 $B connect) is non-obvious and undocumented for this use case.

Change

Set BROWSE_PARENT_PID: '0' unconditionally in the connect handler's serverEnv. Removes the conditional that only passed it through from the external env.
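
As a rough before/after sketch of that change: the function names and the `Env` type below are hypothetical, and the real handler in cli.ts builds `serverEnv` inline with other variables that are omitted here.

```typescript
type Env = Record<string, string | undefined>;

// Before (as described): '0' is passed through only if the caller already
// set it; otherwise the server watches the CLI's own PID.
function connectServerEnvBefore(callerEnv: Env, cliPid: number): Env {
  const parentPid = callerEnv.BROWSE_PARENT_PID === '0' ? '0' : String(cliPid);
  return { ...callerEnv, BROWSE_PARENT_PID: parentPid };
}

// After: a headed server never watches the short-lived CLI process.
function connectServerEnvAfter(callerEnv: Env): Env {
  return { ...callerEnv, BROWSE_PARENT_PID: '0' };
}
```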

The safety nets for headed mode are already solid:

  • browser.on('disconnected') → process.exit(2) when user closes the window (browser-manager.ts:472-476)
  • SIGTERM/SIGINT handlers call shutdown() (server.ts:1228-1229)
  • disconnect command for programmatic teardown
  • Idle timeout is already skipped in headed mode (server.ts:746)

Verified

  • macOS (Apple Silicon, Darwin 25.3.0): headed browser stays alive 20+ seconds (previously died at ~15s)
  • goto and snapshot work after 20s+ delay
  • Closing browser window still terminates the server (exit 2 via disconnected event)
  • Headless mode unaffected (watchdog still active via startServer default path at cli.ts:229)

Related

PR #1012 fixes the same class of bug on the headless path. This PR fixes the headed path. They are complementary.

🤖 Generated with Claude Code

The `connect` command starts a headed Chromium server, prints status, and
exits. The server's parent-process watchdog (polling every 15s) then detects
the CLI process is gone and shuts down the server, killing the visible
browser window the user is looking at.

In headed mode the browser should stay alive until the user explicitly
closes the window or runs `disconnect`. The watchdog is only useful for
headless mode where the server should be cleaned up when the spawning
Claude Code session ends.

Set BROWSE_PARENT_PID=0 unconditionally in the connect handler's serverEnv
so the watchdog is disabled for headed mode. The existing safety nets
remain intact:
- browser.on('disconnected') → process.exit(2) when user closes the window
- SIGTERM/SIGINT handlers for explicit shutdown
- `disconnect` command for programmatic teardown
- Idle timeout is already skipped in headed mode (server.ts:746)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@garrytan
Owner

Closing — this fix is already live on main (landed via PR #847 in v0.15.13.0). The browse server now skips idle timeout in headed mode and only activates the watchdog when BROWSE_PARENT_PID > 0. Thank you for the contribution!

@garrytan garrytan closed this Apr 16, 2026
garrytan added a commit that referenced this pull request Apr 16, 2026
The parent-process watchdog in server.ts polls the spawning CLI's PID
every 15s and self-terminates if it is gone. The connect command in
cli.ts exits with process.exit(0) immediately after launching the server,
so the watchdog would reliably kill the headed browser within ~15s.

This contradicted the idle timer's own design: server.ts:745 explicitly
skips headed mode because "the user is looking at the browser. Never
auto-die." The watchdog had no such exemption.

Two-layer fix:
1. CLI layer: connect handler always sets BROWSE_PARENT_PID=0 (was only
   pass-through for pair-agent subprocesses). The user owns the headed
   browser lifecycle; cleanup happens via browser disconnect event or
   $B disconnect.
2. CLI layer: startServer() honors caller's BROWSE_PARENT_PID=0 in the
   headless spawn path too. Lets CI, non-interactive shells, and Claude
   Code Bash calls opt into persistent servers across short-lived CLI
   invocations.
3. Server layer: defense-in-depth. Watchdog now also skips when
   BROWSE_HEADED=1, so even if a future launcher forgets PID=0, headed
   browsers won't die. Adds log lines when the watchdog is disabled
   so lifecycle debugging is easier.
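
The server-layer rule in item 3 could look something like the sketch below. `decideWatchdog` and `WatchdogDecision` are hypothetical names, and treating an unset `BROWSE_PARENT_PID` as 0 is an assumption of this sketch rather than something the commit message states.

```typescript
type Env = Record<string, string | undefined>;

type WatchdogDecision =
  | { enabled: true; parentPid: number }
  | { enabled: false; reason: string };

function decideWatchdog(env: Env): WatchdogDecision {
  // Server-side guard: a headed browser is never auto-killed by the watchdog,
  // even if a launcher forgot to pass BROWSE_PARENT_PID=0.
  if (env.BROWSE_HEADED === '1') {
    return { enabled: false, reason: 'BROWSE_HEADED=1' };
  }
  // Assumption: an unset variable is treated like '0'.
  const parentPid = parseInt(env.BROWSE_PARENT_PID ?? '0', 10);
  if (!(parentPid > 0)) {
    // Covers 0, negative values, and NaN from non-numeric input.
    return { enabled: false, reason: 'BROWSE_PARENT_PID is not a positive PID' };
  }
  return { enabled: true, parentPid };
}
```

Returning a reason alongside the decision makes it straightforward to emit the "watchdog disabled" log lines the commit mentions.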

Four community contributors diagnosed variants of this bug independently.
Thanks for the clear analyses and reproductions.

Closes #1020 (rocke2020)
Closes #1018 (sanghyuk-seo-nexcube)
Closes #1012 (rodbland2021)
Closes #986 (jbetala7)
Closes #1006
Closes #943

Co-Authored-By: rocke2020 <noreply@github.com>
Co-Authored-By: sanghyuk-seo-nexcube <noreply@github.com>
Co-Authored-By: rodbland2021 <noreply@github.com>
Co-Authored-By: jbetala7 <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request Apr 16, 2026
…1025)

* fix: headed browser no longer auto-shuts down after 15 seconds

* fix: disconnect handler runs full cleanup before exiting

When the user closed the headed browser window, the disconnect handler
in browser-manager.ts called process.exit(2) directly, bypassing the
server's shutdown() function entirely. That meant:

- sidebar-agent daemon kept polling a dead server
- session state wasn't saved
- Chromium profile locks (SingletonLock, SingletonSocket, SingletonCookie)
  weren't cleaned — causing "profile in use" errors on next $B connect
- state file at .gstack/browse.json was left stale

Now the disconnect handler calls onDisconnect(), which server.ts wires
up to shutdown(2). Full cleanup runs first, then the process exits with
code 2 — preserving the existing semantic that distinguishes user-close
(exit 2) from crashes (exit 1).

shutdown() now accepts an optional exitCode parameter (default 0) so
the SIGTERM/SIGINT paths and the disconnect path can share cleanup code
while preserving their distinct exit codes.
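
A minimal sketch of a shared cleanup path with a caller-chosen exit code, under stated assumptions: `makeShutdown`, the cleanup list, and the injected `exit` callback are hypothetical, and the real `shutdown()` is async while this version is synchronous to stay short.

```typescript
type Cleanup = () => void;

// Builds a shutdown function that runs every cleanup step, then exits
// with the requested code (default 0).
function makeShutdown(cleanups: Cleanup[], exit: (code: number) => void) {
  return function shutdown(exitCode: number = 0): void {
    for (const step of cleanups) {
      try {
        step();
      } catch {
        // Sketch-level choice: one failed step does not block the rest.
      }
    }
    exit(exitCode);
  };
}

// Hypothetical wiring, mirroring the commit message:
//   onDisconnect = () => shutdown(2);   // user closed the window
//   shutdown();                         // explicit teardown, exit code 0
```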

Surfaced by Codex during /plan-eng-review of the watchdog fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: pre-existing test flakiness in relink.test.ts

The 23 tests in this file all shell out to gstack-config + gstack-relink
(bash scripts doing subprocess work). Under parallel bun test load, those
subprocess spawns contend with other test suites and each test can drift
~200ms past Bun's 5s default timeout, causing 5+ flaky timeouts per run
in the gate-tier ship gate.

Wrap the `test` import to default the per-test timeout to 15s. Explicit
per-test timeouts (third arg) still win, so individual tests can lower
it if needed. No behavior change — only gives subprocess-heavy tests
more headroom under parallel load.

Noticed by /ship pre-flight test run. Unrelated to the main PR fix but
blocking the gate, so fixing as a separate commit per the test ownership
protocol.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: SIGTERM/SIGINT shutdown exit code regression

Node's signal listeners receive the signal name ('SIGTERM' / 'SIGINT')
as the first argument. When shutdown() started accepting an optional
exitCode parameter in the prior disconnect-cleanup commit, the bare
`process.on('SIGTERM', shutdown)` registration started silently calling
shutdown('SIGTERM'). The string passed through to process.exit(), Node
coerced it to NaN, and the process exited with code 1 instead of 0.

Wrap both listeners so they call shutdown() with no args — signal name
never leaks into the exitCode slot. Surfaced by /ship's adversarial
subagent.
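
The leak can be reproduced without real signals. In this sketch `deliverSignal` is a hypothetical stand-in for how a signal listener is invoked (signal name as first argument), and `exitArgs` records what would have reached `process.exit()`.

```typescript
// Records the value that would have been passed to process.exit().
const exitArgs: unknown[] = [];

function shutdown(exitCode: number = 0): void {
  exitArgs.push(exitCode);
}

// Mimics signal dispatch: the listener receives the signal name first.
function deliverSignal(signal: string, listener: (...args: any[]) => void): void {
  listener(signal);
}

// Bare registration: the signal name lands in the exitCode slot.
deliverSignal('SIGTERM', shutdown);

// Wrapped registration: shutdown() runs with no args, so the default applies.
deliverSignal('SIGTERM', () => shutdown());
```

After both calls, `exitArgs` holds `'SIGTERM'` followed by `0`, which is the difference between the regression and the fix.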

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: onDisconnect async rejection leaves process running

The disconnect handler calls this.onDisconnect() without awaiting it,
but server.ts wires the callback to shutdown(2) — which is async. If
that promise rejects, the rejection drops on the floor as an unhandled
rejection, the browser is already disconnected, and the server keeps
running indefinitely with no browser attached.

Add a sync try/catch for throws and a .catch() chain for promise
rejections. Both fall back to process.exit(2) so a dead browser never
leaves a live server. Also widen the callback type from `() => void`
to `() => void | Promise<void>` to match the actual runtime shape of
the wired shutdown(2) call.
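
One way to express that double guard is sketched below. `handleDisconnect` and the injected `exit` callback are hypothetical names; the real handler lives in browser-manager.ts and calls `process.exit(2)` directly.

```typescript
type OnDisconnect = () => void | Promise<void>;

function handleDisconnect(onDisconnect: OnDisconnect, exit: (code: number) => void): void {
  try {
    // Promise.resolve() accepts both a sync (void) result and a promise,
    // so one .catch() covers every async rejection.
    Promise.resolve(onDisconnect()).catch(() => exit(2));
  } catch {
    // The callback threw synchronously, before any promise existed.
    exit(2);
  }
}
```

Either failure path ends in exit code 2, so a dead browser cannot leave the server running.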

Surfaced by /ship's adversarial subagent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: honor BROWSE_PARENT_PID=0 with trailing whitespace

The strict string compare `process.env.BROWSE_PARENT_PID === '0'` meant
any stray newline or whitespace (common from shell `export` in a pipe or
heredoc) would fail the check and re-enable the watchdog against the
caller's intent.

Switch to parseInt + === 0, matching the server's own parseInt at
server.ts:760. Handles '0', '0\n', ' 0 ', and unset correctly; non-numeric
values (parseInt returns NaN, NaN === 0 is false) fail safe — watchdog
stays active, which is the safe default for unexpected input.
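
The check amounts to something like this sketch; the function name `callerDisabledWatchdog` is hypothetical.

```typescript
// True only when the caller's value parses to the integer 0.
function callerDisabledWatchdog(raw: string | undefined): boolean {
  // parseInt skips leading whitespace and stops at the first non-digit,
  // so '0', '0\n', and ' 0 ' all yield 0; '' and 'abc' yield NaN.
  return parseInt(raw ?? '', 10) === 0;
}
```

One consequence of `parseInt`'s prefix parsing is that a value such as `'0abc'` also parses to 0 and therefore disables the watchdog.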

Surfaced by /ship's adversarial subagent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: preserve bun:test sub-APIs in relink test wrapper

The previous commit wrapped bun:test's `test` to bump the per-test
timeout default to 15s but cast the wrapper `as typeof _bunTest`
without copying the sub-properties (`.only`, `.skip`, `.each`,
`.todo`, `.failing`, `.if`) from the original. The cast was a lie:
the wrapper was a plain function, not the full callable with those
chained properties attached.

The file doesn't use any of them today, but a future test.only or
test.skip would fail with a cryptic "undefined is not a function."
Object.assign the original _bunTest's properties onto the wrapper so
sub-APIs chain correctly forever.
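
A generic version of the wrapper, written against a simplified test-function shape rather than bun:test's actual types (which are not reproduced here), might look like this. `withDefaultTimeout`, `TestBody`, and `TestFn` are hypothetical names.

```typescript
type TestBody = () => void | Promise<void>;
type TestFn = (name: string, fn: TestBody, timeout?: number) => void;

// Wraps a test function so calls without an explicit timeout get defaultMs,
// then copies the original's own enumerable properties (.only, .skip, ...)
// onto the wrapper so chained sub-APIs keep working.
function withDefaultTimeout<T extends TestFn>(original: T, defaultMs: number): T {
  const wrapped: TestFn = (name, fn, timeout) => original(name, fn, timeout ?? defaultMs);
  return Object.assign(wrapped, original);
}

// Hypothetical usage in the test file (assumes bun:test's `test` fits TestFn):
//   const test = withDefaultTimeout(_bunTest, 15000);
```

Because `Object.assign` returns the intersection of wrapper and original, no `as typeof _bunTest` cast is needed in this simplified form.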

Surfaced by /ship's adversarial subagent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.18.1.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: regression tests for parent-process watchdog

End-to-end tests in browse/test/watchdog.test.ts that prove the three
invariants v0.18.1.0 depends on. Each test spawns the real server.ts
(not a mock), so any future change that breaks the watchdog logic fails
here — the thing /ship's adversarial review flagged as missing.

1. BROWSE_PARENT_PID=0 disables the watchdog
   Spawns server with PID=0, reads stdout, confirms the
   "watchdog disabled (BROWSE_PARENT_PID=0)" log line appears and
   "Parent process ... exited" does NOT. ~2s.

2. BROWSE_HEADED=1 disables the watchdog (server-side guard)
   Spawns server with BROWSE_HEADED=1 and a bogus parent PID (999999).
   Proves BROWSE_HEADED takes precedence over a present PID — if the
   server-side defense-in-depth regresses, the watchdog would try to
   poll 999999 and fire on the "dead parent." ~2s.

3. Default headless mode: watchdog fires when parent dies
   The regression guard for the original orphan-prevention behavior.
   Spawns a real `sleep 60` parent and a server watching its PID, then
   kills the parent and waits up to 25s for the server to exit. The
   watchdog polls every 15s so first tick is 0-15s after death, plus
   shutdown() cleanup. ~18s.

Total runtime: ~21s for all 3 tests. They catch the class of bug this
branch exists to fix: "does the process live or die when it should?"
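
The stdout check in test 1 reduces to a predicate along these lines. This is a sketch: `watchdogStayedDisabled` is a hypothetical helper, and because the middle of the "Parent process ... exited" line is elided in the commit message, it is matched loosely rather than guessed.

```typescript
// Log fragments quoted in the commit message.
const DISABLED_LINE = 'watchdog disabled (BROWSE_PARENT_PID=0)';
const PARENT_EXITED = /Parent process .* exited/;

// True when the server logged that the watchdog is off and never
// reported a dead parent.
function watchdogStayedDisabled(stdout: string): boolean {
  return stdout.includes(DISABLED_LINE) && !PARENT_EXITED.test(stdout);
}
```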

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: rocke2020 <noreply@github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>