Pipeline Design 16

Seth Ford edited this page Feb 13, 2026 · 1 revision


Design: Add Linux systemd support for process supervision

Context

Shipwright's sw-launchd.sh (364 lines) provides macOS-only process supervision via launchd user agents. It generates three plist files (daemon, dashboard, connect) under ~/Library/LaunchAgents/ and manages them via launchctl load/unload. A hard check_macos() gate at line 43 blocks execution on non-Darwin platforms. Linux users — particularly those running headless CI servers or remote development machines — have no equivalent way to auto-start the daemon, dashboard, or connect services.

Constraints from the codebase:

  • All scripts must be Bash 3.2 compatible (no associative arrays, no readarray, no ${var,,})
  • scripts/lib/compat.sh already provides is_macos(), is_linux(), and the _COMPAT_UNAME override for testing
  • The daemon traps SIGINT and SIGTERM at line 4330 of sw-daemon.sh, touching a $SHUTDOWN_FLAG file for graceful shutdown. The poll loop checks this flag at 1-second intervals and waits up to 30 seconds for workers to finish (lines 4137–4199)
  • CI already runs on a macos-latest / ubuntu-latest matrix (.github/workflows/test.yml)
  • The test harness pattern uses mock binaries in $PATH, PASS/FAIL counters, TEMP_DIR sandboxing, and HOME redirection (see sw-daemon-test.sh lines 22–60)
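
The shutdown behavior described in the constraints can be pictured roughly as follows. This is a minimal sketch, not the code in sw-daemon.sh: the default flag path, DRAIN_TIMEOUT, active_workers, shutdown_requested, and drain_workers are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: every name below is hypothetical, not taken from sw-daemon.sh.
SHUTDOWN_FLAG="${SHUTDOWN_FLAG:-${TMPDIR:-/tmp}/sw-shutdown-sketch.$$}"
DRAIN_TIMEOUT="${DRAIN_TIMEOUT:-30}"

# Placeholder: print the number of workers still running.
active_workers() { echo 0; }

# The signal handler only records intent; the poll loop reacts to the flag.
request_shutdown() { touch "$SHUTDOWN_FLAG"; }
trap request_shutdown INT TERM

# True once a shutdown has been requested.
shutdown_requested() { [ -f "$SHUTDOWN_FLAG" ]; }

# Wait up to DRAIN_TIMEOUT seconds, polling once per second.
# Returns 0 if all workers exited, 1 if the timeout was reached.
drain_workers() {
    local waited=0
    while [ "$(active_workers)" -gt 0 ]; do
        [ "$waited" -ge "$DRAIN_TIMEOUT" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Because the drain can take up to 30 seconds, any supervisor that sends SIGTERM needs a stop timeout longer than that before it escalates to SIGKILL.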

Decision

Extend sw-launchd.sh with platform-dispatching install/uninstall/status commands that branch on is_macos() / is_linux() from lib/compat.sh. The macOS plist logic stays intact (extracted into install_launchd(), uninstall_launchd(), status_launchd() helper functions). Parallel install_systemd(), uninstall_systemd(), status_systemd() functions generate systemd user-level unit files.

systemd unit design

| Property | Value | Rationale |
| --- | --- | --- |
| Unit directory | ~/.config/systemd/user/ | User-level; no root required, mirrors launchd's user-agent model |
| KillSignal | SIGTERM | Daemon traps SIGTERM → touches shutdown flag → graceful drain |
| TimeoutStopSec | 35 (daemon), 10 (dashboard, connect) | Daemon waits 30s for workers; 35s gives margin before SIGKILL |
| Restart | on-failure | Auto-restart on crash, but not on clean shipwright daemon stop |
| StandardOutput | journal | Journald provides log rotation for free; queryable via journalctl --user -u shipwright-daemon |
| WantedBy | default.target | Standard user session target |
| Environment | PATH + HOME | Matches the macOS plist EnvironmentVariables pattern |

loginctl enable-linger is called during install so services survive user logout on headless servers. The install function checks for loginctl availability and warns (non-fatal) if absent.
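
A non-fatal linger step could look like the sketch below. The function name enable_linger and the warning wording are assumptions; only loginctl enable-linger itself comes from the design.

```shell
# Sketch: enable lingering for the current user, warning instead of failing.
# enable_linger is a hypothetical helper name.
enable_linger() {
    if ! command -v loginctl >/dev/null 2>&1; then
        echo "warning: loginctl not found; services may stop at logout" >&2
        return 0
    fi
    if ! loginctl enable-linger "$(id -un)" >/dev/null 2>&1; then
        echo "warning: loginctl enable-linger failed; services may stop at logout" >&2
    fi
    # Always succeed so that install continues.
    return 0
}
```

Returning 0 in every branch keeps the install path alive on hosts where lingering cannot be enabled, at the cost of relying on the user to read the warning.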

Three service units are generated:

| Unit | ExecStart | Notes |
| --- | --- | --- |
| shipwright-daemon.service | <sw_bin> daemon start | 35s stop timeout |
| shipwright-dashboard.service | <bun_bin> run <repo>/dashboard/server.ts | 10s stop timeout |
| shipwright-connect.service | <sw_bin> connect start | Only created if ~/.shipwright/team-config.json exists |
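
As an illustration of how the daemon unit might be generated with a heredoc, here is a minimal sketch. The function name write_daemon_unit, its arguments, and the Description text are hypothetical. It also assumes that shipwright daemon start stays in the foreground, which the default service type requires.

```shell
# Sketch: write the daemon unit file. write_daemon_unit is a hypothetical name.
# Usage: write_daemon_unit UNIT_DIR SW_BIN   (SW_BIN should be an absolute path)
write_daemon_unit() {
    local unit_dir="$1" sw_bin="$2"
    mkdir -p "$unit_dir" || return 1
    # Unquoted delimiter: $sw_bin, $PATH and $HOME are expanded at install time.
    cat > "$unit_dir/shipwright-daemon.service" <<EOF
[Unit]
Description=Shipwright daemon

[Service]
ExecStart=$sw_bin daemon start
KillSignal=SIGTERM
TimeoutStopSec=35
Restart=on-failure
StandardOutput=journal
Environment="PATH=$PATH"
Environment="HOME=$HOME"

[Install]
WantedBy=default.target
EOF
}

# Example call (hypothetical binary path):
#   write_daemon_unit "$HOME/.config/systemd/user" /usr/local/bin/sw
```

The dashboard and connect units would follow the same shape with their own ExecStart line and TimeoutStopSec=10.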

Error handling: Each systemctl --user call is guarded with || true + warning output, matching the existing launchd pattern of non-fatal load/unload failures (see lines 227–244 of current sw-launchd.sh). Unsupported platforms (neither macOS nor Linux) get a clear error message and exit 1.
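
One way to express the non-fatal guard is a small wrapper, sketched here under the assumption that a warning on stderr is enough. The name run_systemctl is hypothetical and is not taken from sw-launchd.sh.

```shell
# Sketch: run `systemctl --user ...`, warn on failure, never abort the caller.
# run_systemctl is a hypothetical helper name.
run_systemctl() {
    if ! systemctl --user "$@" >/dev/null 2>&1; then
        echo "warning: systemctl --user $* failed (continuing)" >&2
    fi
    return 0
}

# Example calls during install:
#   run_systemctl daemon-reload
#   run_systemctl enable --now shipwright-daemon.service
```

A missing systemctl binary is handled by the same branch, since the shell reports a non-zero status when the command cannot be found.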

Platform dispatch pattern:

cmd_install() {
    if is_macos; then install_launchd
    elif is_linux; then install_systemd
    else error "Unsupported platform"; exit 1
    fi
}

Alternatives Considered

  1. Separate sw-systemd.sh script + new CLI subcommand — Pros: no risk of breaking existing macOS behavior; clear separation. Cons: duplicates sw binary resolution, log directory setup, and connect-conditional logic; requires a new CLI router entry in scripts/sw; users must learn a different command per platform. Rejected in favor of extending the existing script, because the user-facing command should be platform-agnostic.

  2. System-level systemd units (/etc/systemd/system/) — Pros: survive reboots without linger; visible to all users. Cons: requires sudo for install/uninstall, which breaks the no-root pattern matching launchd user agents; multi-user install is out of scope. User-level units are the correct analog.

  3. Docker/container-based supervision — Pros: works on any OS. Cons: heavy dependency; the daemon itself spawns tmux sessions and Claude processes that assume a host environment; container isolation would break the core workflow.

Implementation Plan

  • Files to create:

    • scripts/sw-launchd-test.sh — New test suite (~13 tests) covering both platforms via _COMPAT_UNAME override
  • Files to modify:

    • scripts/sw-launchd.sh — Replace check_macos() with platform dispatch; extract macOS logic into install_launchd()/uninstall_launchd()/status_launchd(); add install_systemd()/uninstall_systemd()/status_systemd(); update help text and header
    • package.json (line 32) — Append && bash scripts/sw-launchd-test.sh to the test script chain
    • .github/workflows/test.yml — Add Run launchd tests step after the tmux tests block (line 92)
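  • Dependencies: None beyond systemd itself. systemctl, loginctl, and journalctl ship with systemd, so they are present on systemd-based distributions but not on distributions that use another init system (for example Alpine with OpenRC). The script degrades gracefully if they are missing.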

  • Risk areas:

    • _COMPAT_UNAME override fidelity — Tests mock the platform but can't exercise real systemctl/launchctl calls. The mock binaries in $PATH must faithfully simulate exit codes and stdout patterns. Mitigated by testing file generation content (grep for KillSignal=SIGTERM, TimeoutStopSec=35, etc.) rather than relying on tool behavior.
    • loginctl enable-linger on CI — GitHub Actions ubuntu runners may not have a full systemd user session. The install test should verify unit file generation without requiring systemctl daemon-reload to succeed. Mock loginctl in the test PATH.
    • Plist regression — Extracting macOS logic into helper functions could introduce bugs if variable scoping (local) is wrong. The macOS cases in the test suite, run on the macos-latest CI runner, serve as the regression guard.
    • Bash 3.2 compatibility — All new code must avoid associative arrays, readarray, ${var,,} lowercase, ${var^^} uppercase. No new syntax risks in this change since it's straightforward conditionals and heredocs.
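
The mitigation for mock fidelity, asserting on generated file content rather than on tool behavior, might be sketched like this. The mock body, MOCK_LOG, assert_file_has, and the two-line stand-in unit file are all hypothetical. A real test would call the install function instead of writing the unit by hand.

```shell
#!/usr/bin/env bash
# Sketch of the harness pattern: sandbox dir, mock binary in PATH, PASS/FAIL counters.
PASS=0
FAIL=0
TEMP_DIR="$(mktemp -d)"

# Mock systemctl: record its arguments and always succeed.
mkdir -p "$TEMP_DIR/bin"
cat > "$TEMP_DIR/bin/systemctl" <<'MOCK'
#!/bin/sh
printf '%s\n' "$*" >> "$MOCK_LOG"
exit 0
MOCK
chmod +x "$TEMP_DIR/bin/systemctl"
MOCK_LOG="$TEMP_DIR/systemctl.log"; export MOCK_LOG
PATH="$TEMP_DIR/bin:$PATH"; export PATH

# assert_file_has FILE LINE DESCRIPTION: pass if FILE contains LINE exactly.
assert_file_has() {
    if grep -qxF -- "$2" "$1" 2>/dev/null; then
        PASS=$((PASS + 1))
    else
        FAIL=$((FAIL + 1))
        echo "FAIL: $3" >&2
    fi
}

# Stand-in for the output of the install step under test.
unit="$TEMP_DIR/shipwright-daemon.service"
printf 'KillSignal=SIGTERM\nTimeoutStopSec=35\n' > "$unit"
systemctl --user daemon-reload

assert_file_has "$unit" "KillSignal=SIGTERM" "daemon unit sets KillSignal"
assert_file_has "$unit" "TimeoutStopSec=35" "daemon unit sets 35s stop timeout"
assert_file_has "$MOCK_LOG" "--user daemon-reload" "daemon-reload was invoked"

echo "passed: $PASS, failed: $FAIL"
```

Checking the recorded arguments in the log covers which systemctl calls were made without depending on a working systemd user session on the CI runner.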

Validation Criteria

  • shipwright launchd install on Linux generates .service files in ~/.config/systemd/user/ (daemon and dashboard always, connect only when ~/.shipwright/team-config.json exists) with correct KillSignal=SIGTERM, Restart=on-failure, and StandardOutput=journal directives, plus TimeoutStopSec=35 on the daemon unit and TimeoutStopSec=10 on the others
  • shipwright launchd install on macOS continues to generate the same .plist files in ~/Library/LaunchAgents/ as before (no regression)
  • shipwright launchd uninstall on Linux removes unit files and calls systemctl --user disable/stop for each service
  • shipwright launchd status on Linux queries systemctl --user is-active and shows journal entries
  • Connect service is only generated when ~/.shipwright/team-config.json exists (both platforms)
  • loginctl enable-linger is called during install; failure is a warning, not a hard error
  • Unsupported platforms (neither macOS nor Linux) get a clear error and exit 1
  • New sw-launchd-test.sh passes on both macos-latest and ubuntu-latest CI runners
  • All 23 test suites pass (npm test — 22 existing + 1 new)
  • No Bash 3.2 violations (no associative arrays, no readarray, no ${var,,})
  • Unit file ExecStart paths are resolved to absolute paths (no relative sw references)
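
For the last criterion, one possible way to resolve a command name to an absolute path before writing ExecStart is sketched below. The helper name resolve_bin is hypothetical, and the existing sw binary resolution in sw-launchd.sh may work differently.

```shell
# Sketch: print an absolute path for a command, or fail.
# resolve_bin is a hypothetical helper name.
resolve_bin() {
    local found
    found="$(command -v "$1" 2>/dev/null)" || return 1
    case "$found" in
        /*)
            # Already absolute.
            printf '%s\n' "$found"
            ;;
        */*)
            # Relative path such as ./sw: anchor it to its directory.
            printf '%s/%s\n' "$(cd "$(dirname "$found")" && pwd)" "$(basename "$found")"
            ;;
        *)
            # Builtin, function, or alias: not usable in ExecStart.
            return 1
            ;;
    esac
}

# Example: sw_bin="$(resolve_bin sw)" || echo "warning: sw not found in PATH" >&2
```

Rejecting bare names matters because a systemd user service does not necessarily start with the same PATH as the installing shell.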
