Summary
On Windows, forge ui reports agent failed to start even when the agent process is actually running successfully — port is bound, .forge/serve.log contains the normal startup banner (REST: ..., JSON-RPC: ..., Press Ctrl+C to stop) and structured success logs ({"msg":"scheduler reloaded"}). The bug does NOT occur on macOS/Linux.
The user-visible error text literally contains the agent's successful startup output as if it were the failure cause:
agent failed to start: REST: http://localhost:9105/tasks/send JSON-RPC: POST http://localhost:9105/
──────────────────────────────────────── Press Ctrl+C to stop
{"active":0,"level":"info","msg":"scheduler reloaded","time":"2026-05-18T20:21:04Z"}
That's a strong tell: the UI thinks the daemon died, falls through to its "read the log to find the error" path, and shows the daemon's success banner to the user as "the error".
Root cause
forge-ui/process.go:205-211 defines:
func pidAlive(pid int) bool {
	proc, err := os.FindProcess(pid)
	if err != nil {
		return false
	}
	return proc.Signal(syscall.Signal(0)) == nil
}
Signal(0) is the standard Unix idiom for "is this PID alive?". It does not work on Windows. Go's os.Process.Signal on Windows implements only os.Kill (→ TerminateProcess); the os package documentation states that sending os.Interrupt on Windows is not implemented. For any other signal, including Signal(0), it returns a not-supported error without inspecting the process. So pidAlive returns false on Windows regardless of whether the PID is alive: the check can never report "alive" there.
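To observe the behavior described above directly, a small diagnostic along these lines can be run on each platform. It is a sketch, not code from the repository: signalZero and signalZeroSelf are hypothetical helper names, and the helper returns the raw error instead of collapsing it to a bool so the reason for a "dead" verdict stays visible.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// signalZero sends signal 0 to pid and returns the raw error.
// Hypothetical diagnostic helper: pidAlive in forge-ui collapses this
// error to a bool, which hides why the check failed.
func signalZero(pid int) error {
	proc, err := os.FindProcess(pid)
	if err != nil {
		return err
	}
	return proc.Signal(syscall.Signal(0))
}

// signalZeroSelf probes the current process, which is alive by definition.
func signalZeroSelf() error {
	return signalZero(os.Getpid())
}

func main() {
	if err := signalZeroSelf(); err != nil {
		// This branch is the one expected on Windows.
		fmt.Printf("Signal(0) on own PID failed: %v\n", err)
		return
	}
	// This branch is the one expected on macOS/Linux.
	fmt.Println("Signal(0) on own PID succeeded")
}
```

Probing the program's own PID removes any doubt about whether the target is alive, so a failure here can only come from the probe itself.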
The forge-cli daemon code already knows this. Compare:
// forge-cli/cmd/serve_unix.go:14-20 (uses Signal(0); correct on Unix)
func isProcessAlive(pid int) bool {
	proc, err := os.FindProcess(pid)
	if err != nil { return false }
	return proc.Signal(syscall.Signal(0)) == nil
}

// forge-cli/cmd/serve_windows.go:14-22 (uses OpenProcess + CloseHandle; correct on Windows)
func isProcessAlive(pid int) bool {
	const processQueryLimitedInfo = 0x1000
	h, err := syscall.OpenProcess(processQueryLimitedInfo, false, uint32(pid))
	if err != nil { return false }
	_ = syscall.CloseHandle(h)
	return true
}
Both files implement a function called isProcessAlive, build-tag-gated. The forge-ui code didn't reuse this — it duplicated the logic with a single Unix-style pidAlive and forgot the Windows variant.
Symptom chain
| Step | macOS / Linux | Windows |
|---|---|---|
| 1. UI spawns forge serve start (DETACHED_PROCESS / Setsid) | OK | OK; daemon detaches cleanly |
| 2. waitForPort (forge-ui/process.go:162) reads PID from .forge/serve.json | OK | OK |
| 3. pidAlive(pid) called inside the poll loop (process.go:174) | Signal(0) succeeds → returns true | Signal(0) always errors → returns false |
| 4. waitForPort outcome | Loops until TCP port responds → returns true | Fast-fails at step 3, returns false immediately |
| 5. UI reaction | Reports success to the browser | Reports failure, calls readServeLogs |
| 6. serve.log content by this point | n/a | Contains the daemon's normal startup banner |
| 7. User-visible "error" | n/a | The banner itself, as in the issue report |
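Steps 3 and 4 can be illustrated with a simplified poll loop. This is a hypothetical reconstruction, not the code in forge-ui/process.go: the signature, the polling intervals, and the injected alive function are assumptions made so that both columns of the table can be compared on any platform.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// waitForPort polls addr until it accepts a TCP connection or the timeout
// elapses. It gives up early when alive(pid) reports false.
// Simplified, hypothetical version of the loop described in the table.
func waitForPort(addr string, pid int, alive func(int) bool, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if !alive(pid) {
			return false // fast-fail: the daemon is assumed to have died
		}
		conn, err := net.DialTimeout("tcp", addr, 200*time.Millisecond)
		if err == nil {
			conn.Close()
			return true
		}
		time.Sleep(100 * time.Millisecond)
	}
	return false
}

// demoWaitForPort opens a real listener, then runs the loop once with a
// liveness check that always says "alive" and once with one that always
// says "dead". It returns the two results in that order.
func demoWaitForPort() (bool, bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, false, err
	}
	defer ln.Close()
	addr := ln.Addr().String()

	alwaysAlive := func(int) bool { return true } // stands in for Signal(0) on Unix
	alwaysDead := func(int) bool { return false } // stands in for Signal(0) on Windows

	working := waitForPort(addr, 1234, alwaysAlive, 5*time.Second)
	broken := waitForPort(addr, 1234, alwaysDead, 5*time.Second)
	return working, broken, nil
}

func main() {
	working, broken, err := demoWaitForPort()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	fmt.Println("with working liveness check:", working)
	fmt.Println("with always-dead liveness check:", broken)
}
```

The port is bound in both runs; only the liveness check differs, which matches the report that the agent is healthy while the UI shows a failure.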
Second occurrence — same bug, different blast radius
forge-ui/discovery.go:133 reuses the same broken pidAlive:
if state.PID > 0 && !pidAlive(state.PID) {
	_ = os.Remove(statePath) // ← deletes .forge/serve.json
	return 0, false
}
On Windows this is actively destructive: every UI discovery call thinks the daemon is dead and deletes the daemon's serve.json state file. The daemon keeps running but disappears from the UI's view. Worth confirming as a separate symptom — Windows users may also see "agent disappeared from the dashboard" / "I have to recreate the agent every page reload".
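The destructive path can be reproduced in isolation with a sketch like the following. The serveState layout, its JSON field names, and the discover signature are assumptions for illustration; only the stale-file branch mirrors the snippet from discovery.go quoted above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// serveState holds only the fields this sketch needs. The real layout of
// .forge/serve.json is an assumption.
type serveState struct {
	PID  int `json:"pid"`
	Port int `json:"port"`
}

// discover returns the recorded port if the state file exists and the
// recorded PID passes the liveness check. A failed check is treated as a
// stale file, and the file is removed.
func discover(statePath string, alive func(int) bool) (int, bool) {
	data, err := os.ReadFile(statePath)
	if err != nil {
		return 0, false
	}
	var state serveState
	if err := json.Unmarshal(data, &state); err != nil {
		return 0, false
	}
	if state.PID > 0 && !alive(state.PID) {
		_ = os.Remove(statePath)
		return 0, false
	}
	return state.Port, true
}

// demoDiscovery writes a state file for the current (live) process, runs
// discovery with a check that always reports "dead", and reports whether
// the state file survived.
func demoDiscovery() (bool, error) {
	dir, err := os.MkdirTemp("", "forge-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	statePath := filepath.Join(dir, "serve.json")
	data, err := json.Marshal(serveState{PID: os.Getpid(), Port: 9105})
	if err != nil {
		return false, err
	}
	if err := os.WriteFile(statePath, data, 0o600); err != nil {
		return false, err
	}

	alwaysDead := func(int) bool { return false } // stands in for pidAlive on Windows
	discover(statePath, alwaysDead)

	_, statErr := os.Stat(statePath)
	return statErr == nil, nil
}

func main() {
	survived, err := demoDiscovery()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("state file survived discovery:", survived)
}
```

The recorded PID belongs to a running process, yet the file is gone after a single discovery call. That is the mechanism behind the "agent disappeared from the dashboard" symptom suggested above.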
Reproduction
1. On a Windows machine, forge init my-agent, configure a model + key.
2. forge ui, click "Start" on the agent.
3. Observed: red banner agent failed to start: <agent's success banner>. Backend agent is actually healthy: netstat -an | findstr <port> shows LISTEN, .forge/serve.log shows no errors, the JSON-RPC endpoint responds to curl.
4. Refresh the dashboard: .forge/serve.json is gone (deleted by discovery.go:133); the daemon process is still running but the UI can't find it.
Steps 3 and 4 both reproduce on any Windows machine; neither reproduces on macOS or Linux.
Recommended fix (Option A)
Extract the platform-aware liveness check into a shared, build-tag-split utility consumed by both forge-cli/cmd/serve_*.go (existing isProcessAlive) and forge-ui/{process,discovery}.go (broken pidAlive). Eliminates the duplicate function and keeps the build-tag boundary in exactly one place.
Candidate location: forge-core/util/processalive/ (or similar) — both forge-cli and forge-ui already depend on forge-core, so no circular dependency risk.
Proposed signature:
// forge-core/util/processalive/processalive.go (or wherever)
package processalive
// IsAlive reports whether a process with the given PID exists.
// Implementation is platform-specific: Unix uses Signal(0); Windows uses
// OpenProcess. Returns false on any error so callers treat ambiguous
// failures (permission, system call error) as "not alive" — same semantics
// as the original Unix-only check.
func IsAlive(pid int) bool { ... }
The two files:
// processalive_unix.go (//go:build !windows)
// processalive_windows.go (//go:build windows)
Then:
- forge-cli/cmd/serve_{unix,windows}.go delete their isProcessAlive and import the helper.
- forge-ui/process.go:205-211 deletes pidAlive and calls the helper.
- forge-ui/discovery.go:133 calls the helper.
A short alternative (Option B) — duplicate the platform-split inside forge-ui only — was rejected because it carries the same duplicate-implementation risk that caused this bug.
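As a sketch of the Unix half of Option A, processalive_unix.go could look roughly like this. It is shown as package main so it runs on its own; the real file would declare package processalive and drop main. The pid <= 0 guard and the selfAlive helper are additions made for this sketch, not part of the proposal. The Windows half would carry the OpenProcess body already present in forge-cli/cmd/serve_windows.go.

```go
//go:build !windows

// Sketch of processalive_unix.go, written as a standalone program.
package main

import (
	"fmt"
	"os"
	"syscall"
)

// IsAlive reports whether a process with the given PID exists.
// Any error, including a permission error, is reported as "not alive",
// matching the semantics in the proposed doc comment.
func IsAlive(pid int) bool {
	if pid <= 0 {
		// Assumption added in this sketch: non-positive values never name
		// a single process (kill(2) gives them other meanings), so they
		// are rejected rather than probed.
		return false
	}
	proc, err := os.FindProcess(pid)
	if err != nil {
		return false
	}
	return proc.Signal(syscall.Signal(0)) == nil
}

// selfAlive is a sanity check: the calling process must report as alive.
func selfAlive() bool {
	return IsAlive(os.Getpid())
}

func main() {
	fmt.Println("self alive:", selfAlive())
	fmt.Println("pid 0 alive:", IsAlive(0))
}
```

Callers in forge-cli and forge-ui would then depend only on IsAlive and never on a platform-specific syscall, which keeps the build-tag boundary inside the one package.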
Acceptance criteria
- On Windows, starting an agent from forge ui shows the success state, not "agent failed to start", for an agent that the backend actually started successfully.
- On Windows, forge ui does not delete .forge/serve.json for a still-running daemon.
- On macOS/Linux, behavior is unchanged.
- There is exactly one implementation of "is this PID alive" in the repo, build-tag-split. forge-cli and forge-ui both consume it.
- A test (a unit test where feasible) covers the Windows OpenProcess path. (CI may not run Windows; at minimum, the build-tag split is verified to compile with GOOS=windows.)
Out of scope
- The 5-second timeout in waitForPort is reasonable; this issue does not change it.
- forge-ui/process.go:213+ readServeLogs is fine; the bug is that it gets called at all on Windows, not what it does.
Cross-reference
The forge-cli side already has the correct Windows implementation since the daemon-lifecycle work; this is a regression where the forge-ui handler shipped without picking up the platform-aware variant.