Add Unix domain socket transport for daemon communication #541
wesm merged 29 commits into roborev-dev:main from
Introduces a DaemonEndpoint abstraction to support both TCP loopback and Unix domain sockets for daemon communication. Motivated by security hardening (filesystem permissions) and NixOS container isolation via bind-mounted sockets. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add waitForServerReady endpoint-aware signature (critical: loopback check would reject Unix sockets and block daemon startup)
- Document all ~15 CLI command files that construct raw http.Client
- Add getDaemonHTTPClient() convenience helper for CLI commands
- Address postcommit.go hookHTTPClient lazy initialization
- Add TUI reconnectMsg struct changes and handleReconnectMsg handler
- Document daemon_cmd.go --addr flag accepting unix:// values
- Document tui_cmd.go --addr flag ParseEndpoint treatment
- Add ParseEndpoint http:// prefix stripping for backwards compat
- Fix CleanupZombieDaemons to use PID-based detection for Unix sockets
- Add container bind-mount note about explicit paths
- Add macOS path length safety net note
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ort flow Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9 tasks: DaemonEndpoint type, RuntimeInfo changes, probe/kill migration, server Start() flow, client update, CLI migration (atomic), TUI changes, test updates, dead code cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ime signature Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change ProbeDaemon, IsDaemonAlive, waitForServerReady, and KillDaemon to accept DaemonEndpoint instead of raw address strings. This lets Unix socket endpoints flow through the same probe/kill paths as TCP, using ep.HTTPClient() and ep.BaseURL() for transport-aware HTTP requests. Add isProcessAlive() on Unix (signal 0) and Windows (processExists) for fast PID-based liveness checks in CleanupZombieDaemons, avoiding slow HTTP probes against dead Unix sockets. KillDaemon now also cleans up Unix socket files when removing runtime artifacts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… cleanup Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace getDaemonAddr() with getDaemonEndpoint() which returns a
DaemonEndpoint instead of a raw URL string. This enables Unix socket
support throughout the CLI by using the endpoint's HTTPClient() and
BaseURL() methods for all daemon communication.
Key changes:
- getDaemonEndpoint() resolves from runtime file first, then parses
the --server flag via ParseEndpoint (handles "", "host:port",
"http://host:port", "unix://path")
- getDaemonHTTPClient(timeout) creates transport-aware HTTP clients
- All bare http.Post() and &http.Client{Timeout: N} replaced
- probeDaemonServerURL removed (ProbeDaemon used directly)
- ensureDaemon/startDaemon no longer mutate the serverAddr global
- --server flag default changed from "http://127.0.0.1:7373" to ""
(ParseEndpoint treats "" as default TCP endpoint)
- hookHTTPClient changed from var to func for testability
- fix.go recovery functions return getDaemonEndpoint().BaseURL()
instead of the global serverAddr
Three fix daemon-recovery tests fail because they relied on mutating the
serverAddr global to simulate daemon crashes; these need updates in
the test migration task.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ication Replace the string-based serverAddr field with a typed DaemonEndpoint in the TUI model, Config, and reconnectMsg. This enables Unix socket transport by using endpoint.BaseURL() for URL construction and endpoint.HTTPClient() for transport-aware HTTP clients. tui_cmd.go now parses --addr via daemon.ParseEndpoint instead of manual http:// prefix logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After the DaemonEndpoint migration, getDaemonEndpoint() discovers the daemon via runtime files before falling back to serverAddr. Tests that simulated daemon death by setting serverAddr to a dead address no longer worked because the runtime file still pointed at the live mock server. Fix getDaemonEndpoint() to use the test-overridable getAnyRunningDaemon var instead of calling daemon.GetAnyRunningDaemon() directly. Fix the three failing tests by removing daemon runtime files when simulating death, so getDaemonEndpoint falls back to serverAddr:
- TestFixSingleJobRecoversPostFixDaemonCalls
- TestRunFixOpenRecoversFromDaemonRestartOnRequery
- TestRunFixWithSeenDiscoveryAbortsOnConnectionError
Also add /api/ping to the recovery server in the restart-on-requery test so ensureDaemon() succeeds after recovery, and save/restore getAnyRunningDaemon in NewMockDaemon cleanup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
validateDaemonBindAddr had no callers after the DaemonEndpoint migration. parseDaemonBindAddr is kept because FindAvailablePort still uses it. getDaemonAddr and probeDaemonServerURL were already removed in prior tasks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The original reason I thought I wanted this (NixOS container network isolation) doesn't work for other reasons: the daemon accesses user directories to do git operations. That said, I put this up anyway in case there's a desire for the daemon to support Unix sockets. However, I don't want to bloat the app with stuff that's not being used, so I am 100% happy to close this out as YAGNI.
roborev: Combined Review (
I understand the need for an alternative to TCP, even if TCP/HTTP remains the main route for exposing the daemon. If it can be hardened without overcomplicating things, it seems reasonable to me. On Windows we might also need to support other non-HTTP transports for certain applications.
- Snapshot DaemonEndpoint once per operation to prevent split endpoint/client resolution (high: all CLI command files)
- Validate Unix socket parent directory permissions (0700, no group/world access) in Server.Start() (high)
- Add DisableKeepAlives to Unix socket HTTP transport to prevent file descriptor leaks from transient clients (medium)
- Validate auto-generated DefaultSocketPath() length at parse time, same as explicit paths (medium)
- Warn on stderr when --server flag has an invalid value instead of silently falling back to default (medium)
- Fix 2 test regressions from endpoint snapshotting in fix.go (daemon recovery tests need fresh endpoint resolution per call)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use os.Stat instead of os.Lstat for socket directory permission check so symlinked directories are evaluated correctly
- Update waitForPromptJob to accept DaemonEndpoint and use endpoint-aware HTTP client (fixes Unix socket support for roborev run --wait)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
…omptJob Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
…ag early
- analyze.go: pass DaemonEndpoint instead of base URL string through runSingleAnalysis, runPerFileAnalysis, enqueueAnalysisJob, runAnalyzeAndFix, waitForAnalysisJob; fixes Unix socket support for analyze --wait (base URL "http://localhost" can't round-trip)
- main.go: add PersistentPreRunE to validate --server flag at startup so invalid values fail fast instead of silently falling back to default
- daemon_lifecycle.go: cache validated endpoint from flag, remove warning-and-fallback pattern
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
Replaces http.DefaultClient with getDaemonHTTPClient so the fix.go retry infrastructure works with Unix socket transports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…test The custom http.Transport already has nil Proxy (zero value), but make it explicit and add a regression test that verifies HTTP_PROXY env vars don't interfere with Unix socket communication. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Re: Combined Review ( fix.go Daemon startup/discovery:
Unix domain socket paths use forward-slash absolute paths which filepath.IsAbs rejects on Windows, and os.Getuid is unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Protect Server.endpoint with a mutex to prevent a race between Start() (write) and Stop() (read from signal handler goroutine)
- Skip Unix socket parse/path tests on Windows where Unix-style absolute paths are invalid and os.Getuid is unavailable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move Unix socket cleanup to after httpServer.Shutdown() completes, which guarantees Start() has finished writing s.endpoint. No mutex needed — the happens-before relationship is established by Shutdown. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
roborev: Combined Review (
The ordering-based approach (socket cleanup after Shutdown) is not formally correct under the Go memory model when Stop() races with early Start() execution. The mutex is a single lock/unlock pair in a shutdown path — negligible overhead, clear correctness. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
Hey! @wesm asked me (Claude) to do a holistic review of this branch focused on functionality and ergonomics rather than code quality. The DaemonEndpoint abstraction is well-designed and the threading through the codebase is thorough. A few concerns worth discussing: 1.
1. Explicit --server flag now takes precedence over auto-discovered daemons. Only falls back to runtime file discovery when no flag is set. This matters when targeting a specific daemon instance.
2. fixSingleJob snapshots the daemon address once per operation instead of re-resolving on every API call. The retry helpers (withFixDaemonRetryContext) handle re-resolution internally on connection errors. Tests updated to simulate daemon death with connection-level failures (hijack+close) rather than 404s.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addressed in …
- #1 (--server flag precedence): Fixed.
- #2 (fixSingleJob re-resolution): Fixed. Snapshots the daemon address once at the top. The …
- #3 (Stat vs Lstat): Keeping …
- #4 (parsedServerEndpoint global): Acknowledged. The test infrastructure (…
- #5 (hookHTTPClient): Acknowledged as minor. One client per commit is fine.
- #6 (DaemonEndpoint.Equal): YAGNI for now. Both fields are strings; …
roborev: Combined Review (
I think we're good here, thanks @cpcloud |
Motivation
The daemon HTTP API has no authentication or TLS; security relies entirely on the TCP loopback restriction. This works but has two gaps:
127.0.0.1:7373. Unix sockets give per-user access control via filesystem permissions (0600) without needing mTLS.
Summary
- DaemonEndpoint type that encapsulates TCP vs Unix socket transport, with ParseEndpoint() supporting "", "host:port", "http://host:port", "unix://" (auto tmpdir path), and "unix:///explicit/path"
- DaemonEndpoint through RuntimeInfo, server Start(), ProbeDaemon/IsDaemonAlive/KillDaemon, HTTPClient, all ~20 CLI command files, and the TUI
- 0700 parent dir and 0600 socket permissions; cleanup on shutdown and zombie detection via PID liveness
- RuntimeInfo JSON gains a network field (backwards compatible: empty defaults to TCP)
- server_addr = "unix://" in config or --server unix:// on CLI

🤖 Generated with Claude Code