
fix(windows): shared process.IsAlive for forge-cli and forge-ui (closes #59) #60

Merged
initializ-mk merged 1 commit into main from bug/windows-pid-alive on May 18, 2026


@initializ-mk (Contributor)

Summary

Fixes #59: forge ui on Windows reports "agent failed to start" for healthy daemons because forge-ui/process.go's pidAlive used os.Process.Signal(syscall.Signal(0)), which only works on Unix. On Windows, Go's os package implements only os.Kill for Process.Signal; every other signal, including os.Interrupt and Signal(0), returns an error. So pidAlive returned false regardless of whether the PID was alive, breaking both startup detection and discovery on Windows.

Symptom chain

| Step | macOS / Linux | Windows |
|---|---|---|
| 1. UI spawns forge serve start | OK | OK: daemon detaches and starts |
| 2. waitForPort reads PID from .forge/serve.json | OK | OK |
| 3. pidAlive(pid) checks the child is still alive | Signal(0) succeeds → true | Signal(0) errors → false |
| 4. waitForPort outcome | Loops until TCP port responds → true | Fast-fails at step 3, returns false |
| 5. UI reaction | Reports success | Reports failure, calls readServeLogs |
| 6. By this time serve.log contains the daemon's normal startup banner | n/a | UI displays the banner as "the error" |

forge-ui/discovery.go:133 had the same broken check — on Windows it deleted .forge/serve.json on every page-load discovery scan, orphaning still-running daemons from the dashboard.

The forge-cli daemon code already had the correct platform-aware variant (forge-cli/cmd/serve_windows.go's isProcessAlive uses OpenProcess + CloseHandle). The duplication is exactly what caused the forge-ui side to miss the Windows-specific implementation.

Approach — Option A from #59

Extract the platform-aware check into a single helper consumed by both forge-cli and forge-ui. Build-tag split lives in exactly one place.

New package: forge-core/util/process/

process_unix.go     //go:build !windows  — FindProcess + Signal(0)
process_windows.go  //go:build windows   — OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION) + CloseHandle
process_test.go     — cross-platform tests

Call sites updated:

  • forge-cli/cmd/serve_{unix,windows}.go — deleted isProcessAlive (two duplicates removed).
  • forge-cli/cmd/serve.go — 2 call sites switched to process.IsAlive.
  • forge-ui/process.go — deleted pidAlive; waitForPort now calls process.IsAlive. Dropped the now-unused syscall import.
  • forge-ui/discovery.go — uses process.IsAlive (this was the actively destructive serve.json-deletion site).

There is now exactly one implementation of "is this PID alive" in the tree.

Tests

forge-core/util/process/process_test.go:

  • TestIsAlive_SelfIsAlive: IsAlive(os.Getpid()) must be true.
  • TestIsAlive_ChildAfterExit: spawn a short-lived subprocess (`true` on Unix, `cmd /c exit 0` on Windows), wait for it to exit, and assert that IsAlive(pid) returns false. The subprocess-based assertion is deterministic across platforms; arbitrary high PIDs are not.
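The subprocess half of TestIsAlive_ChildAfterExit could be built from a helper like the one below. The names shortLivedCommand and exitedPID are hypothetical; the commands mirror the ones listed above. A test would then assert that IsAlive on the returned PID is false.

```go
package main

import (
	"os/exec"
	"runtime"
)

// shortLivedCommand returns a command that exits immediately with status 0:
// `true` on Unix, `cmd /c exit 0` on Windows (passed as separate arguments).
func shortLivedCommand() *exec.Cmd {
	if runtime.GOOS == "windows" {
		return exec.Command("cmd", "/c", "exit", "0")
	}
	return exec.Command("true")
}

// exitedPID starts the command, waits for it, and returns its PID. After Wait
// the child has been reaped, so the PID no longer names a live process
// (barring the unlikely case that the OS has already reused it).
func exitedPID() (int, error) {
	cmd := shortLivedCommand()
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	pid := cmd.Process.Pid
	if err := cmd.Wait(); err != nil {
		return 0, err
	}
	return pid, nil
}
```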

Verification

gofmt -l forge-core/ forge-cli/ forge-plugins/ forge-ui/                      # clean
golangci-lint run ./{forge-core,forge-cli,forge-plugins,forge-ui}/...         # 0 issues
GOOS=windows go build ./...   (forge-core, forge-cli, forge-ui)               # all succeed
go test ./...   (forge-core, forge-cli, forge-ui)                             # all pass

CI doesn't currently run Windows. The GOOS=windows go build step confirms, at minimum, that the build-tag split is wired correctly and the package compiles for Windows; it does not exercise runtime behavior. That behavior rests on the OpenProcess + CloseHandle pattern that forge-cli/cmd/serve_windows.go already shipped; the helper just consolidates it.

Acceptance criteria from #59

  • On Windows, the helper returns true for live PIDs and false for exited PIDs (supported by the build-tag-split implementation and by the forge-cli daemon code that has shipped with this pattern; not yet exercised by Windows CI).
  • forge ui no longer fast-fails startup on Windows when the daemon is healthy.
  • forge ui no longer deletes .forge/serve.json for still-running daemons on Windows.
  • macOS / Linux behavior is unchanged — Signal(0) is preserved on Unix.
  • Exactly one implementation of "is this PID alive" in the repo, consumed by both forge-cli and forge-ui.

Manual verification

  • On Windows: forge ui, start an agent. Confirm the dashboard shows the running state, not "agent failed to start".
  • On Windows: refresh the dashboard while an agent is running. Confirm .forge/serve.json is still there and the dashboard still sees the agent.
  • On macOS / Linux: same flows, confirm no regression.

Files changed

forge-core/util/process/process_unix.go     | new (30 lines)
forge-core/util/process/process_windows.go  | new (39 lines)
forge-core/util/process/process_test.go     | new (45 lines)
forge-cli/cmd/serve.go                      | 2 call-site swaps + 1 import
forge-cli/cmd/serve_unix.go                 | -8 lines (delete duplicate)
forge-cli/cmd/serve_windows.go              | -10 lines (delete duplicate)
forge-ui/process.go                         | -10 lines duplicate, +1 import, 1 call-site swap
forge-ui/discovery.go                       | +1 import, 1 call-site swap

Total: +122 / -32. Net new code is in the helper package and its tests; net removed code is the duplicates.

fix(windows): shared process.IsAlive for forge-cli and forge-ui (closes #59)

forge-ui's pidAlive used os.Process.Signal(syscall.Signal(0)), the
standard Unix idiom for "is this PID alive?". On Windows, Go's stdlib
implements only os.Kill for Process.Signal; any other signal,
including os.Interrupt and Signal(0), returns an error. So pidAlive
returned false on Windows regardless of whether the PID was alive,
breaking `forge ui` startup detection:

- waitForPort (forge-ui/process.go) fast-fails at the pidAlive check
  instead of looping until the TCP port responds. The UI shows
  "agent failed to start" and surfaces the daemon's normal startup
  banner (which by then is in serve.log) as the supposed error.
- Discovery (forge-ui/discovery.go:133, now 134) thinks the daemon
  is dead and deletes .forge/serve.json — actively orphaning a
  still-running daemon from the dashboard on every page load.

forge-cli already knew about this — serve_windows.go had a working
OpenProcess-based isProcessAlive. The duplication is what caused the
forge-ui side to miss it.

Option A from the issue: extract a single platform-split helper in
forge-core/util/process and delete the duplicates.

- forge-core/util/process/process_{unix,windows}.go — IsAlive(pid).
  Unix: FindProcess + Signal(0). Windows: OpenProcess with
  PROCESS_QUERY_LIMITED_INFORMATION + CloseHandle.
- forge-core/util/process/process_test.go — cross-platform tests:
  IsAlive(self) is true; IsAlive(spawned-then-waited subprocess pid)
  is false. The subprocess approach is deterministic on both Unix
  and Windows; picking arbitrary high PIDs is not.
- forge-cli/cmd/serve_{unix,windows}.go — delete isProcessAlive,
  keep only the daemon-attr and signal helpers.
- forge-cli/cmd/serve.go — switch to process.IsAlive (2 call sites).
- forge-ui/process.go — delete pidAlive; switch waitForPort to
  process.IsAlive. Drop the now-unused syscall import.
- forge-ui/discovery.go — switch to process.IsAlive.

There is now exactly one implementation of "is this PID alive" in
the tree, build-tag-split, consumed by both forge-cli and forge-ui.

Verified: gofmt clean; golangci-lint 0 issues across all four
modules; GOOS=windows go build succeeds for forge-core, forge-cli,
forge-ui; native tests pass.
initializ-mk merged commit 83e93e9 into main on May 18, 2026
9 checks passed

Linked issue #59: Windows: forge ui reports 'agent failed to start' for healthy daemons (Signal(0) unsupported on Windows)
