thv stop/rm leave the port held when the status file is missing #5393

@gkatz2

Bug description

thv stop and thv rm report success but do not actually stop or clean up the workload when its status file is missing. The workload's proxy process is left running and keeps holding the workload's port:

  • After thv stop, the surviving proxy's supervisor restarts the container, so the workload returns to running on its own.
  • After thv rm, the container is removed but the orphaned proxy keeps holding the port.

Either way there is no thv-only recovery: a later thv run/thv start on the same port fails with "address already in use", and the user has to find and kill the proxy manually (e.g. lsof -ti :<port> | xargs kill). The documented stop behavior is "Stops proxy process → stops container → preserves state" (docs/arch/08-workloads-lifecycle.md), so the surviving proxy is a contract violation.

Steps to reproduce

The precondition is a missing status file. The steps below remove it deliberately to reproduce on demand; we have also seen workloads reach this state on their own (listed in thv ls but with no status file). This report is about the broken stop/rm behavior, not the cause of the missing file.

  1. Start a container workload with a pinned proxy port:
    thv run --name demo --proxy-port 51999 fetch
  2. Confirm the proxy is listening and note its PID:
    lsof -i :51999 -sTCP:LISTEN
  3. Remove the workload's status file:
    • macOS: ~/Library/Application Support/toolhive/statuses/demo.json
    • Linux: ~/.local/share/toolhive/statuses/demo.json
  4. thv ls still lists demo as running (this entry comes from the runtime, not the status file).
  5. Run thv stop demo (or thv rm demo). It exits successfully.
  6. Observe that the proxy was not terminated: the PID from step 2 is still alive (check with ps -p <pid>) and :51999 is still held.
    • After thv stop: within ~10s thv ls shows demo back to running, because the surviving supervisor restarted the container.
    • After thv rm: demo is gone from thv ls, but the proxy still holds :51999, so thv run --proxy-port 51999 ... fails with "address already in use".

Expected behavior

thv stop and thv rm terminate the workload's proxy and free its port even when the status file is missing, matching the documented "stops proxy process" behavior.

Actual behavior

The proxy is not terminated. After thv stop, the supervisor restarts the container and the workload returns to running. After thv rm, the container is removed but the orphaned proxy keeps holding the port. Both commands report success regardless.

Environment (if relevant)

  • OS/version: macOS 15 (Darwin 25.5.0), OrbStack runtime.
  • ToolHive version: v0.28.3 (also reproduced against main at 8932f938).

Additional context

Root cause: the proxy-stop path (stopProcess in pkg/workloads/manager.go) terminates the proxy using only the PID recorded in the status file (GetWorkloadPID). When the status file is missing, GetWorkloadPID returns (0, nil), KillProcess(0) fails, and no proxy is killed. The container is still stopped or removed through the runtime, but the detached proxy (thv start <name> --foreground) keeps running. In the thv stop case, its supervisor (the RunWorkload retry loop) then restarts the container. This path reads the PID from the status file on every platform, so the issue is not macOS-specific.

Labels: bug, cli, go
