Bug description
thv stop and thv rm report success but do not actually stop or clean up the workload when its status file is missing. The workload's proxy process is left running and keeps holding the workload's port:
- After thv stop, the surviving proxy's supervisor restarts the container, so the workload returns to running on its own.
- After thv rm, the container is removed but the orphaned proxy keeps holding the port.
Either way there is no thv-only recovery: a later thv run/thv start on the same port fails with "address already in use", and the user has to find and kill the proxy manually (e.g. lsof -ti :<port> | xargs kill). The documented stop behavior is "Stops proxy process → stops container → preserves state" (docs/arch/08-workloads-lifecycle.md), so the surviving proxy is a contract violation.
Steps to reproduce
The precondition is a missing status file. The steps below remove it deliberately to reproduce on demand; we have also seen workloads reach this state on their own (listed in thv ls but with no status file). This report is about the broken stop/rm behavior, not the cause of the missing file.
- Start a container workload with a pinned proxy port:
thv run --name demo --proxy-port 51999 fetch
- Confirm the proxy is listening and note its PID:
lsof -i :51999 -sTCP:LISTEN
- Remove the workload's status file:
  - macOS: ~/Library/Application Support/toolhive/statuses/demo.json
  - Linux: ~/.local/share/toolhive/statuses/demo.json
thv ls still lists demo as running (this entry comes from the runtime, not the status file).
- Run thv stop demo (or thv rm demo). It exits successfully.
- Observe that the proxy was not terminated (the PID from step 2 is still alive, e.g. ps -p <pid>, and :51999 is still held):
  - After thv stop: within ~10s thv ls shows demo back to running, because the surviving supervisor restarted the container.
  - After thv rm: demo is gone from thv ls, but the proxy still holds :51999, so thv run --proxy-port 51999 ... fails with "address already in use".
Expected behavior
thv stop and thv rm terminate the workload's proxy and free its port, even when the status file is missing — matching the documented "stops proxy process" behavior.
Actual behavior
The proxy is not terminated. After thv stop, the supervisor restarts the container and the workload returns to running. After thv rm, the container is removed but the orphaned proxy keeps holding the port. Both commands report success regardless.
Environment (if relevant)
- OS/version: macOS 15 (Darwin 25.5.0), OrbStack runtime.
- ToolHive version: v0.28.3 (also reproduced against main at 8932f938).
Additional context
Root cause: the proxy-stop path (stopProcess in pkg/workloads/manager.go) terminates the proxy using only the PID recorded in the status file (GetWorkloadPID). When the status file is missing, GetWorkloadPID returns (0, nil), KillProcess(0) fails, and no proxy is killed. The container is stopped/removed through the runtime, but the detached proxy (thv start <name> --foreground) keeps running; its supervisor (the RunWorkload retry loop) then restarts the container on stop. This path reads the PID from the status file on every platform, so the issue is not macOS-specific.