Simplify execution model: remove attach/detach, fix run lifecycle #196
Simplifying to two modes: interactive (attached) and non-interactive (monitor via moat logs). The attach command allowed reconnecting to running containers, which added complexity without sufficient value.
Non-interactive runs now always start in background. Interactive runs always own the terminal. Removes RunAttached and RunInteractive in favor of the single RunInteractiveAttached path.
Only Ctrl-/ k (stop) remains. Ctrl-/ d is no longer an escape sequence and passes through to the container.
Interactive sessions now run until the process exits or the user sends Ctrl-/ k to stop. No detach path exists.
Only StartAttached remains for interactive sessions. The separate Attach method was only needed for the removed moat attach command.
Point users to moat logs and moat stop instead of the removed moat attach command.
Remove documentation for moat attach command, --detach flag, and Ctrl-/ d escape sequence. Document the simplified two-mode model.
TTY was always set identically to Interactive by every caller and was never read inside internal/run/. The actual TTY decision is made at runtime via term.IsTerminal(os.Stdin) in manager.StartAttached().
StreamLogs was never true in production - all callers passed false. Non-interactive runs tell users to use 'moat logs -f' instead of streaming to stdout. Removes the streamLogs method and its stdcopy import.
Extract three shared helpers that were duplicated between Start() and StartAttached():
- setLogContext: configures structured logger with run fields
- setupPortBindings: resolves host ports and registers routes
- setupFirewall: configures iptables isolation for strict network policy

Reduces ~80 lines of duplication while preserving method-specific ordering (Start does firewall before ports, StartAttached the reverse).
With only one escape action (EscapeStop), the separate escapeCh channel was unnecessary. Escape errors now flow through attachDone directly and are handled inline, eliminating a channel and simplifying the goroutine to a one-liner.
Consolidate resource cleanup duplicated across Stop(), Wait(), monitorContainerExit(), and Destroy() into a single cleanupResources() method guarded by sync.Once. This ensures proxy unregistration, SSH agent stop, service/BuildKit teardown, container removal, network cleanup, route unregistration, and temp dir removal happen exactly once regardless of which code path runs first. Reduces ~300 lines of duplicated cleanup code across four methods.
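The once-only guarantee described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the `run` type, its `cleanups` counter, and `stopWaitDestroy` are hypothetical stand-ins, and the real teardown steps are reduced to a comment.

```go
package main

import (
	"fmt"
	"sync"
)

// run is a hypothetical stand-in for the manager's run record.
type run struct {
	cleanupOnce sync.Once
	cleanups    int // counts how many times the teardown body actually ran
}

// cleanupResources may be called from any exit path. The teardown body
// executes on the first call only; later calls return after it has finished.
func (r *run) cleanupResources() {
	r.cleanupOnce.Do(func() {
		r.cleanups++
		// Proxy unregistration, container removal, network cleanup and
		// temp dir removal would go here (assumed, not shown).
	})
}

// stopWaitDestroy simulates four exit paths racing to clean up the same run
// and returns how many times the teardown body ran.
func stopWaitDestroy(r *run) int {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.cleanupResources()
		}()
	}
	wg.Wait()
	return r.cleanups
}

func main() {
	fmt.Println(stopWaitDestroy(&run{}))
}
```

Because `sync.Once` also blocks concurrent callers until the first call completes, a caller that returns from `cleanupResources` can assume teardown is done, whichever path triggered it.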
All three provider CLIs (claude, codex, gemini) followed an identical ~170 line pattern for workspace resolution, grant dedup, config setup, dry run, and execution. Extract a RunProvider() helper in internal/cli that handles the shared boilerplate. Each provider is now ~30-50 lines of config + flag registration, with provider-specific logic isolated in BuildCommand and ConfigureAgent callbacks. The ProviderRunConfig struct makes the differences between providers explicit and declarative.
…etadata

Three fixes for the simplified execution model:
1. Non-interactive runs now block until the container exits instead of returning immediately. This ensures monitorContainerExit has time to capture logs and update state before manager.Close() cancels its context. Signal handling (SIGINT/SIGTERM) gracefully stops the run.
2. Interactive runs now update state to stopped/failed after the container exits normally. Previously state stayed "running" and stopped_at was never set.
3. Docker ContainerLogsAll now waits for the container to reach not-running state before reading logs. This prevents empty log capture for fast-exiting containers where Docker's log driver hasn't flushed yet.
Non-interactive runs block until exit, not background. Remove references to the removed -d/--detach flag from worktree examples.
📦 Previous Review (superseded by newer review)
Claude finished @dpup's task in 6m 24s
PR Review: Simplify execution model
Overview
This is a solid simplification: removing the attach/detach model and replacing it with two clean modes (interactive/blocking and non-interactive/blocking) removes a lot of complexity. The root causes of the bugs fixed here (orphaned containers, stuck metadata) are correctly identified and addressed. A few issues worth addressing before merging:
Bug: Escape-stop (
Address code review feedback:
1. Escape-stop (Ctrl-/ k) no longer sets state to "failed". The state update and log capture in StartAttached are skipped when the attach ended due to an escape or context cancellation, letting the caller's Stop() handle state and monitorContainerExit capture complete logs.
2. captureLogs no longer fires while the container is still running on escape-stop. Previously this caused a 10s stall (ContainerWait timeout) and captured incomplete logs.
3. Provider "Started..." message now prints via OnRunCreated callback before ExecuteRun blocks, not after the run completes.
📦 Previous Review (superseded by newer review)
Claude finished @dpup's task in 5m 8s
PR Review: Simplify execution model
What was fixed since the previous review
All four concrete bugs called out in the previous review are addressed:
Issues still open from previous review
Data race:
The real concern is
m.mu.RLock()
dc := m.daemonClient
m.mu.RUnlock()
r.cleanupOnce.Do(func() {
	...
	if r.ProxyAuthToken != "" && dc != nil {
		dc.UnregisterRun(ctx, r.ProxyAuthToken)
	}
	...
})
Not a correctness bug today, but an easy source of future divergence.
New issue: Apple TTY interactive logs lost on escape-stop
When
// captureLogs, line 2525
if r.Interactive && m.runtime.Type() == container.RuntimeApple {
	return
}
The This breaks the audit trail for any Apple-container interactive run that is escape-stopped or SIGTERM-stopped. The fix is to write the buffer even when
// Write Apple TTY buffer regardless of callerWillStop -
// the buffer is complete (tee stops when StartAttached returns)
// and Stop()/captureLogs() has no way to get Apple TTY output.
if r.Interactive && r.Store != nil && m.runtime.Type() == container.RuntimeApple {
if r.logsCaptured.CompareAndSwap(false, true) {
if lw, err := r.Store.LogWriter(); err == nil {
if logBuffer.Len() > 0 {
_, _ = lw.Write(logBuffer.Bytes())
}
lw.Close()
} else {
r.logsCaptured.Store(false)
}
}
}
Minor
Stale comment in
case <-ctx.Done():
	// Context canceled - caller chose to detach, don't stop the run
	return ctx.Err()
"detach" is removed from the model. The comment should say something like "Context canceled: caller is responsible for stopping the run."
What looks good
The core lifecycle redesign is solid. The escape-stop fixes are correct and the state machine is now consistent. Interactive runs use
Summary: The Apple TTY log-loss bug (#7) is a correctness issue for a real user scenario (macOS, interactive, escape-stop). The data race (#5) should be fixed before merge. Everything else is minor.
Address second round of code review feedback:
1. Apple TTY log buffer is now written unconditionally in StartAttached, not gated on callerWillStop. The buffer holds all output up to the escape point, and captureLogs() returns early for Apple interactive runs so this is the only path that writes logs.
2. cleanupResources snapshots m.daemonClient under m.mu.RLock before the cleanupOnce closure, preventing a data race with Create() which writes daemonClient under m.mu.Lock.
3. Stale "detach" comment in Wait() updated.
📦 Previous Review (superseded by newer review)
Claude finished @dpup's task in 4m 51s
PR Review: Simplify execution model
What was fixed since the previous two reviews
All five concrete bugs from previous rounds are addressed:
Bug: Partial data race fix —
…omment

Complete the daemonClient race fix: the UnregisterRoutes call inside cleanupOnce.Do was still accessing m.daemonClient directly instead of the snapshot taken under m.mu.RLock. Also update monitorContainerExit comment to remove stale "detached mode" reference.
📦 Previous Review (superseded by newer review)
Claude finished @dpup's task in 4m 43s
PR Review: Simplify execution model
What was fixed since the last review
All three outstanding items from round 3 have been addressed:
Bug:
1. Move state update (SetStateFailedAt/SetStateWithTime) before close(exitCh) in monitorContainerExit so Wait() reads the final state when it unblocks. Previously Wait() could race and return nil for a failed container.
2. Align StartAttached to use the same firewall→portBindings ordering as Start, preventing future divergence.
3. Update stale "detached completion" comment in captureLogs.
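The ordering in item 1 relies on a Go memory-model guarantee: closing a channel happens before a receive that returns because the channel is closed. A minimal sketch, with a hypothetical `run` type and a plain string field in place of the real state setters:

```go
package main

import (
	"errors"
	"fmt"
)

// run is a hypothetical stand-in for the manager's run record.
type run struct {
	state  string
	exitCh chan struct{}
}

func newRun() *run {
	return &run{state: "running", exitCh: make(chan struct{})}
}

// monitorExit records the final state first and only then closes exitCh.
// Any goroutine unblocked by the close is guaranteed to see the new state.
func (r *run) monitorExit(exitCode int) {
	if exitCode != 0 {
		r.state = "failed"
	} else {
		r.state = "stopped"
	}
	close(r.exitCh)
}

// wait blocks until the monitor signals exit, then reports failure.
func (r *run) wait() error {
	<-r.exitCh
	if r.state == "failed" {
		return errors.New("container exited with an error")
	}
	return nil
}

func main() {
	r := newRun()
	go r.monitorExit(1)
	fmt.Println(r.wait())
}
```

If the two statements in `monitorExit` were swapped, `wait` could observe `"running"` and return nil for a failed container, which is the race the commit describes.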
📦 Previous Review (superseded by newer review)
Claude finished @dpup's task in 7m 40s
PR Review: Simplify execution model
What was fixed since the last review
All four items from round 4 are addressed in the latest commit (
Remaining:
Snapshot m.daemonClient under m.mu.RLock before calling RegisterRoutes, same pattern as cleanupResources. Prevents data race with Create() which writes daemonClient under lock.
📦 Previous Review (superseded by newer review)
Claude finished @dpup's task in 8m 18s
PR Review: Simplify execution model
What was fixed since the last review
The single outstanding issue from round 5 is addressed in commit
Minor: Apple TTY log completeness on SIGTERM (carried from rounds 3–5)
Still unresolved and still minor. When SIGTERM arrives, That goroutine writes the Apple TTY In practice the write is a local disk
No new issues found
The complete pass over The lock ordering (
Overall
The PR is ready to merge. The core redesign is solid:
Six rounds of review and ten fixes later, the lifecycle semantics are clean and correct.
- Remove duplicate run ID: one "Started" line from OnRunCreated (provider path) or exec.go (moat run path), not both
- Remove stale "Stop with Ctrl+C" hint
- Add post-completion hint: 'moat logs <id>' to view output
- Remove post-ExecuteRun print in run.go (ExecuteRun blocks now)
📦 Previous Review (superseded by newer review)
Claude finished @dpup's task in 6m 26s
PR Review: Simplify execution model
What was fixed since the last review
All items from round 6 are addressed in the latest commit (
Bug: Normal interactive exit doesn't clean up resources
When
Everything persists until The fix is one line in the
m.cleanupResources(ctx, r)
Minor:
Non-interactive runs now stream container output to the terminal in real-time instead of blocking silently. This gives docker-run-like UX where users see output as it happens.

Key changes:
- monitorContainerExit uses context.Background() so it survives Close()
- Manager.Close() waits for tracked monitors via sync.WaitGroup before closing the runtime, ensuring logs are always captured
- Only monitors started via Start() are tracked; inherited monitors from loadPersistedRuns are not, so Close() doesn't hang on old containers
- Docker ContainerLogs demuxes multiplexed format for non-TTY containers
- Non-interactive path streams logs via FollowLogs while blocking on Wait
- Added Ctrl+C hint on start and moat logs hint on completion (dimmed)
- Updated help text and docs to describe streaming behavior
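The WaitGroup-tracked monitor pattern from the list above, reduced to a sketch. The `manager` type, its `captured` slice, and the `exited` channels are hypothetical; in this illustration "capturing logs" is just recording the run ID.

```go
package main

import (
	"fmt"
	"sync"
)

// manager is a hypothetical stand-in for the run manager.
type manager struct {
	monitors sync.WaitGroup // tracks monitors started by start()
	mu       sync.Mutex
	captured []string // run IDs whose logs were "captured"
}

// start launches a tracked monitor for the run. The monitor is not tied to a
// cancelable context, so it survives until the container has exited.
func (m *manager) start(id string, exited <-chan struct{}) {
	m.monitors.Add(1)
	go func() {
		defer m.monitors.Done()
		<-exited
		m.mu.Lock()
		m.captured = append(m.captured, id)
		m.mu.Unlock()
	}()
}

// close waits for every tracked monitor before tearing anything down and
// returns the IDs that were captured.
func (m *manager) close() []string {
	m.monitors.Wait()
	m.mu.Lock()
	defer m.mu.Unlock()
	return append([]string(nil), m.captured...)
}

func main() {
	m := &manager{}
	exited := make(chan struct{})
	m.start("run-1", exited)
	close(exited)
	fmt.Println(m.close())
}
```

Monitors for runs loaded from disk would simply skip the `Add`/`Done` pair, which is why `close` cannot hang on containers started by an earlier process.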
📦 Previous Review (superseded by newer review)
Claude finished @dpup's task in 6m 52s
PR Review: Simplify execution model
What was fixed since the last review
All items from round 7 are addressed in the latest commit (
Bug:
Address code review feedback:
- cleanupResources now called on normal interactive exit (natural container exit path). Without this, Docker containers, networks, sidecars, and temp dirs leaked until moat destroy.
- Docker ContainerLogs returns error on inspect failure instead of silently producing garbled multiplexed output.
- Run ID printed before both interactive and non-interactive sessions so moat run -i shows the run ID before the session starts.
📦 Previous Review (superseded by newer review)
Claude finished @dpup's task in 3m 16s
PR Review: Simplify execution model
What was fixed in
Apple's "container wait" CLI command hangs indefinitely when a container is stopped or removed externally (e.g., via "moat stop" from another process). This blocked monitorContainerExit forever, causing the original "moat run" process to hang. Switch to always using the polling-based wait, which checks container status via inspect every 500ms. If inspect fails (container removed), treat it as a clean exit instead of returning an error.
📦 Previous Review (superseded by newer review)
Claude finished @dpup's task in 2m 20s
PR Review: Simplify execution model
What was fixed in
Implementation plan for the fix/interactive-only branch. Updated to reflect the final design: non-interactive runs block with log streaming instead of fire-and-forget, plus additional fixes for monitor lifecycle, Docker log demuxing, and interactive resource cleanup.
📦 Previous Review (superseded by newer review)
Claude finished @dpup's task in 2m 29s
PR Review: Simplify execution model
What changed in the latest commit (
… errors

Address code review feedback on the polling-based wait:
- Distinguish "not found" errors (container removed) from transient XPC errors by checking stderr, matching RemoveContainer's pattern
- Handle Apple's inspect returning an empty JSON array for removed containers (exits 0 with "[]" rather than erroring)
- Log and retry on transient errors instead of treating all failures as container removal
Claude finished @dpup's task in 2m 4s
PR Review: Simplify execution model
What was fixed in
Summary
- moat attach command, --detach flag, and detach escape sequence (Ctrl-/ d)
- manager.Close() was canceling monitorContainerExit before it could finish)
- ProviderRunner helper to deduplicate provider CLI boilerplate
- cleanupResources and postStart helpers from run manager

Test plan
- make test-unit, 38 packages, race detector)
- make lint, 0 issues)
- echo captures logs and sets state: stopped
- sleep + SIGTERM stops gracefully
- echo sets state: stopped and stopped_at
- moat stop from another process cleanly unblocks the CLI

🤖 Generated with Claude Code