Skip to content

Add Unix domain socket transport for daemon communication#541

Merged
wesm merged 29 commits intoroborev-dev:mainfrom
cpcloud:worktree-witty-dancing-crystal
Mar 18, 2026
Merged

Add Unix domain socket transport for daemon communication#541
wesm merged 29 commits intoroborev-dev:mainfrom
cpcloud:worktree-witty-dancing-crystal

Conversation

@cpcloud
Copy link
Copy Markdown
Contributor

@cpcloud cpcloud commented Mar 18, 2026

Motivation

The daemon HTTP API has no authentication or TLS — security relies entirely on the TCP loopback restriction. This works but has two gaps:

  1. Other local users can connect. Any process on the same machine can reach 127.0.0.1:7373. Unix sockets give per-user access control via filesystem permissions (0600) without needing mTLS.
  2. NixOS containers with network isolation can't reach loopback. Bind-mounting a Unix socket across container boundaries works without exposing a TCP port or adding HTTPS.

Summary

  • Introduce DaemonEndpoint type that encapsulates TCP vs Unix socket transport, with ParseEndpoint() supporting "", "host:port", "http://host:port", "unix://" (auto tmpdir path), and "unix:///explicit/path"
  • Thread DaemonEndpoint through RuntimeInfo, server Start(), ProbeDaemon/IsDaemonAlive/KillDaemon, HTTPClient, all ~20 CLI command files, and the TUI
  • Server creates Unix sockets with 0700 parent dir and 0600 socket permissions; cleanup on shutdown and zombie detection via PID liveness
  • RuntimeInfo JSON gains a network field (backwards compatible — empty defaults to TCP)
  • TCP loopback remains the default; users opt in via server_addr = "unix://" in config or --server unix:// on CLI

🤖 Generated with Claude Code

cpcloud and others added 15 commits March 18, 2026 13:59
Introduces a DaemonEndpoint abstraction to support both TCP loopback
and Unix domain sockets for daemon communication. Motivated by security
hardening (filesystem permissions) and NixOS container isolation via
bind-mounted sockets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add waitForServerReady endpoint-aware signature (critical: loopback
  check would reject Unix sockets and block daemon startup)
- Document all ~15 CLI command files that construct raw http.Client
- Add getDaemonHTTPClient() convenience helper for CLI commands
- Address postcommit.go hookHTTPClient lazy initialization
- Add TUI reconnectMsg struct changes and handleReconnectMsg handler
- Document daemon_cmd.go --addr flag accepting unix:// values
- Document tui_cmd.go --addr flag ParseEndpoint treatment
- Add ParseEndpoint http:// prefix stripping for backwards compat
- Fix CleanupZombieDaemons to use PID-based detection for Unix sockets
- Add container bind-mount note about explicit paths
- Add macOS path length safety net note

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ort flow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9 tasks: DaemonEndpoint type, RuntimeInfo changes, probe/kill
migration, server Start() flow, client update, CLI migration
(atomic), TUI changes, test updates, dead code cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ime signature

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change ProbeDaemon, IsDaemonAlive, waitForServerReady, and KillDaemon to
accept DaemonEndpoint instead of raw address strings. This lets Unix
socket endpoints flow through the same probe/kill paths as TCP, using
ep.HTTPClient() and ep.BaseURL() for transport-aware HTTP requests.

Add isProcessAlive() on Unix (signal 0) and Windows (processExists) for
fast PID-based liveness checks in CleanupZombieDaemons, avoiding slow
HTTP probes against dead Unix sockets. KillDaemon now also cleans up
Unix socket files when removing runtime artifacts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… cleanup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace getDaemonAddr() with getDaemonEndpoint() which returns a
DaemonEndpoint instead of a raw URL string. This enables Unix socket
support throughout the CLI by using the endpoint's HTTPClient() and
BaseURL() methods for all daemon communication.

Key changes:
- getDaemonEndpoint() resolves from runtime file first, then parses
  the --server flag via ParseEndpoint (handles "", "host:port",
  "http://host:port", "unix://path")
- getDaemonHTTPClient(timeout) creates transport-aware HTTP clients
- All bare http.Post() and &http.Client{Timeout: N} replaced
- probeDaemonServerURL removed (ProbeDaemon used directly)
- ensureDaemon/startDaemon no longer mutate the serverAddr global
- --server flag default changed from "http://127.0.0.1:7373" to ""
  (ParseEndpoint treats "" as default TCP endpoint)
- hookHTTPClient changed from var to func for testability
- fix.go recovery functions return getDaemonEndpoint().BaseURL()
  instead of the global serverAddr

3 fix daemon-recovery tests fail because they relied on mutating the
serverAddr global to simulate daemon crashes; these need updates in
the test migration task.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ication

Replace the string-based serverAddr field with a typed DaemonEndpoint in the
TUI model, Config, and reconnectMsg. This enables Unix socket transport by
using endpoint.BaseURL() for URL construction and endpoint.HTTPClient() for
transport-aware HTTP clients. The tui_cmd.go now parses --addr via
daemon.ParseEndpoint instead of manual http:// prefix logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After the DaemonEndpoint migration, getDaemonEndpoint() discovers the
daemon via runtime files before falling back to serverAddr. Tests that
simulated daemon death by setting serverAddr to a dead address no longer
worked because the runtime file still pointed at the live mock server.

Fix getDaemonEndpoint() to use the test-overridable getAnyRunningDaemon
var instead of calling daemon.GetAnyRunningDaemon() directly.

Fix the three failing tests by removing daemon runtime files when
simulating death, so getDaemonEndpoint falls back to serverAddr:
- TestFixSingleJobRecoversPostFixDaemonCalls
- TestRunFixOpenRecoversFromDaemonRestartOnRequery
- TestRunFixWithSeenDiscoveryAbortsOnConnectionError

Also add /api/ping to the recovery server in the restart-on-requery test
so ensureDaemon() succeeds after recovery, and save/restore
getAnyRunningDaemon in NewMockDaemon cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
validateDaemonBindAddr had no callers after the DaemonEndpoint migration.
parseDaemonBindAddr is kept because FindAvailablePort still uses it.
getDaemonAddr and probeDaemonServerURL were already removed in prior tasks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@cpcloud
Copy link
Copy Markdown
Contributor Author

cpcloud commented Mar 18, 2026

The original stated reason I thought I wanted this (nixos container network isolation) doesn't work for other reasons (the daemon accesses user directories to do git operations).

That said, I put this up anyway in case there's a desire for the daemon to support unix sockets.

However, I don't want to bloat the app with stuff that's not being used, so I am 100% happy to close this out as YAGNI.

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 18, 2026

roborev: Combined Review (da904b7)

Summary verdict: The PR successfully introduces transport-aware daemon endpoints and Unix-
socket support, but requires fixes for local-IPC security, endpoint resolution consistency, and resource leaks.

High Severity Findings

  • Location: cmd/roborev/daemon_client.go:22, cmd/roborev/job_helpers.go:19,
    cmd/roborev/stream.go:47-73
    Problem: Call sites split daemon identity between a saved BaseURL() string and a freshly resolved client from getDaemonHTTPClient(...). With Unix sockets, every endpoint uses the same base URL (http://localhost), meaning
    paths can silently hit whichever socket getDaemonEndpoint() returns at request time. This breaks the guarantee that jobs will continue polling the same daemon that received them.
    Fix: Snapshot a single daemon.DaemonEndpoint per operation and derive both the URL and HTTPClient from that same value. Where a function currently
    takes serverAddr string, pass the endpoint itself instead.

  • Location: internal/daemon/server.go:160
    Problem: Unix-socket path hardening is incomplete. Start() calls MkdirAll and Chmod for the socket itself, but never verifies
    or tightens permissions on a pre-existing parent directory. If configured to use a shared directory (e.g., unix:///tmp/roborev.sock), other local users may be able to connect to the socket and drive daemon endpoints with the victim’s privileges.
    Fix: After resolving a Unix
    endpoint, Lstat the parent directory and fail closed unless it is owned by the current UID and has no group/world access bits. For auto-generated paths, explicitly Chmod the created parent directory to 0700; for explicit paths, reject unsafe existing directories rather than silently accepting them
    .

Medium Severity Findings

  • Location: cmd/roborev/daemon_lifecycle.go:67-75
    Problem: getDaemonEndpoint() silently swallows ParseEndpoint errors (such as invalid --server flags, relative Unix paths, or non-loop
    back IPs) and falls back to connecting to 127.0.0.1:7373. This is a regression that causes the CLI to connect to the default daemon instead of failing fast and informing the user.
    Fix: Return the parse error to callers to fail fast, or explicitly
    parse and validate the endpoint once during command setup (e.g., in a PersistentPreRunE hook) and store the validated result.

  • Location: internal/daemon/endpoint.go:55-57
    Problem: ParseEndpoint("unix://") returns DefaultSocketPath() without running the length validation used for explicit Unix paths. On systems with long os.TempDir() values, the code succeeds at parse time but fails later in net.Listen with a less actionable runtime error.
    Fix: Validate the auto-generated socket path before returning it
    , using the same null-byte and max-length checks applied to explicit unix:///... paths.

  • Location: internal/daemon/endpoint.go (HTTPClient method)
    Problem: Creating a new http.Transport for every Unix socket client without DisableKeep Alives: true causes file descriptor leaks. Because the client is often instantiated per-request, idle connections are left lingering in background goroutines until the CLI process exits.
    Fix: Set DisableKeepAlives: true on the custom Unix http.Transport to ensure connections are closed when
    the transient client is garbage collected.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@wesm
Copy link
Copy Markdown
Collaborator

wesm commented Mar 18, 2026

I understand the need for an alternative to TCP, even if TCP/HTTP is the main route to expose the daemon, so if it can be hardened and not overcomplicate things, it seems reasonable to me (on Windows we might need to support other non-HTTP transport for certain applications, too)

- Snapshot DaemonEndpoint once per operation to prevent split
  endpoint/client resolution (high: all CLI command files)
- Validate Unix socket parent directory permissions (0700, no
  group/world access) in Server.Start() (high)
- Add DisableKeepAlives to Unix socket HTTP transport to prevent
  file descriptor leaks from transient clients (medium)
- Validate auto-generated DefaultSocketPath() length at parse time,
  same as explicit paths (medium)
- Warn on stderr when --server flag has an invalid value instead of
  silently falling back to default (medium)
- Fix 2 test regressions from endpoint snapshotting in fix.go
  (daemon recovery tests need fresh endpoint resolution per call)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 18, 2026

roborev: Combined Review (f59c5b4)

Verdict: The PR successfully introduces transport-agnostic daemon endpoints and Unix-socket support, but contains a few medium-severity issues regarding configuration parsing, error handling, and client instantiation.

Medium Severity

  • Location: cmd/roborev/daemon _lifecycle.go:67

    • Problem: getDaemonEndpoint() no longer reads the configured server_addr; it only checks runtime files and then parses the raw --server flag value. With the global flag default now set to "", any code path that needs the configured
      endpoint before a runtime file exists falls back to hard-coded TCP 127.0.0.1:7373, so the new Unix-socket transport is ignored in the exact no-runtime/manual-daemon fallback path this migration was meant to support.
    • Fix: Resolve server _addr from the real config chain here (or once in root command setup) instead of parsing only the flag string.
  • Location: cmd/roborev/daemon_lifecycle.go:72

    • Problem: Invalid endpoint values are only warned about and then silently replaced with the
      default TCP endpoint. A typo in --server or config can therefore make commands talk to the wrong daemon instead of failing fast.
    • Fix: Propagate the parse error to the caller and abort command execution; do not fall back to the default endpoint on invalid user input.
  • Location
    :
    cmd/roborev/job_helpers.go:19

    • Problem: Several follow-up helpers still accept a serverAddr/base URL parameter, but now create their client with getDaemonHTTPClient(), which ignores that argument and dials whatever getDaemonEndpoint () currently resolves. That breaks the “talk to the same daemon” contract for Unix sockets, because the transport is selected by the client, not the URL, and http://localhost is shared by every Unix endpoint.
    • Fix: Thread a daemon.DaemonEndpoint or prebuilt
      *http.Client through these helpers and use that exact endpoint for polling, fetching, and canceling jobs.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

cpcloud and others added 2 commits March 18, 2026 15:20
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use os.Stat instead of os.Lstat for socket directory permission
  check so symlinked directories are evaluated correctly
- Update waitForPromptJob to accept DaemonEndpoint and use
  endpoint-aware HTTP client (fixes Unix socket support for
  roborev run --wait)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 18, 2026

roborev: Combined Review (09dc243)

Verdict: The PR implements Unix
domain socket transport but introduces significant security vulnerabilities regarding arbitrary file deletion and SSRF via sockets, alongside functional regressions in endpoint resolution.

High Severity

  • Location: internal/daemon/server.go:177, internal/daemon/endpoint.go:58
    Problem:
    ParseEndpoint accepts any absolute unix:///... path, and daemon startup blindly calls os.Remove(socketPath) before binding. If server_addr is sourced from repo-local config, a malicious shared repo can point it at an arbitrary user-writable path and cause file deletion when the
    daemon starts or restarts. This is a concrete destructive primitive.
    Fix: Only allow Unix socket paths under a roborev-owned directory, or only honor custom socket paths from explicit CLI/global-user config. Before unlinking, use os.Lstat and refuse to remove anything that is
    not an existing Unix socket owned by the current UID.

  • Location: cmd/roborev/daemon_lifecycle.go:67, cmd/roborev/main.go:21
    Problem: getDaemonEndpoint() only looks at a runtime file or the
    raw --server flag value. After changing --server to default to "", client commands no longer read server_addr from config when no runtime file exists, so a configured Unix socket endpoint falls back to 127.0.0.1:7373. That breaks the new transport for
    first-start/manual-start flows.
    Fix: Resolve the endpoint from config when the flag is unset, and have getDaemonEndpoint() use that resolved value instead of only the raw flag/global default.

Medium Severity

  • Location: internal/daemon/endpoint.go: 58, cmd/roborev/daemon_lifecycle.go:67
    Problem: Arbitrary absolute Unix socket paths are treated as trusted daemon endpoints. That widens the transport trust boundary from loopback TCP to any local Unix socket, so attacker-influenced config can redirect roborev
    ’s HTTP requests to privileged local services exposed only over sockets and use roborev as a confused deputy.
    Fix: Restrict accepted Unix socket paths to the default per-user roborev socket location, or require explicit CLI-only opt-in for non-default socket paths. For any custom endpoint
    , perform a positive identity check (/api/ping service/version) before sending mutating requests.

  • Location: cmd/roborev/daemon_lifecycle.go:73
    Problem: An invalid explicit --server value is only warned about and then silently replaced with the
    default TCP endpoint. That can send mutating commands to the wrong daemon instead of failing fast.
    Fix: Treat parse failures for user-supplied --server values as fatal and return the error before making any daemon request.

  • Location: cmd/roborev/analyze.go:37 2, cmd/roborev/analyze.go:475
    Problem: Analysis jobs are enqueued against one resolved endpoint, but --wait re-resolves the daemon with getDaemonEndpoint() before polling. If the runtime entry changes between enqueue and wait, the command can
    poll a different daemon than the one that owns the job.
    Fix: Capture the DaemonEndpoint once before enqueueing and reuse that same endpoint for all follow-up polling/result fetches.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

…omptJob

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 18, 2026

roborev: Combined Review (39aa801)

Verdict: The transport migration successfully introduces Unix socket support and preserves security restrictions, but
contains several endpoint resolution regressions that will cause failures or incorrect targeting when using Unix sockets.

High

  • Location: cmd/roborev/analyze.go:369, cmd/roborev/analyze.go:473
    Problem: The analysis wait path reconstruct
    s the endpoint from BaseURL(). That works for TCP, but for Unix sockets BaseURL() is always http://localhost, so reparsing it cannot recover the socket path. roborev analyze --wait will poll the wrong target or fail entirely when the daemon is using a Unix socket.

    Fix: Pass the original daemon.DaemonEndpoint through runAnalysis/runSingleAnalysis/runPerFileAnalysis and hand that directly to waitForPromptJob instead of reparsing the base URL.

  • Location: cmd/roborev/job_helpers. go:19, cmd/roborev/run.go:285, cmd/roborev/fix.go:1244, cmd/roborev/analyze.go:669
    Problem: Multiple helpers still take a serverAddr string, but now create their client with getDaemonHTTPClient(). For Unix transport, the URL is always http://localhost, so the actual socket is chosen only by that global client resolution. That means these helpers can silently talk to a different daemon than the caller intended if the resolved endpoint changes or if
    multiple sockets exist.
    Fix: Change these helpers to accept a daemon.DaemonEndpoint and derive both the http.Client and the request URL from that same endpoint.

Medium

  • Location: cmd/roborev/daemon_lifecycle.go:68

    Problem: getDaemonEndpoint() catches ParseEndpoint failures, prints a warning, and falls back to 127.0.0.1:7373. A mistyped --server value can therefore make commands operate on the default daemon instead of failing fast.

Fix: Validate --server up front and return an error to the command, or make getDaemonEndpoint() return an error instead of silently retargeting.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 18, 2026

roborev: Combined Review (f57f175)

Verdict: The PR successfully introduces Unix-socket transport support, but contains a high-severity bug in the analyze --wait command and a medium-severity issue with endpoint parsing error handling.

High

  • Location: cmd/roborev/analyze. go:369-372, 469-472
  • Problem: The runAnalysis functions store getDaemonEndpoint().BaseURL(), which resolves to "http://localhost" for Unix socket transports. Reparising this string before the --wait phase drops the socket path
    , causing the client to incorrectly dial localhost:80 via TCP instead of the Unix socket. This breaks roborev analyze --wait against Unix-socket daemons.
  • Fix: Thread the original daemon.DaemonEndpoint through the analyze helpers and pass it directly to waitForPrompt Job instead of converting to and reparsing a base URL, or simply call getDaemonEndpoint() inside the wait blocks.

Medium

  • Location: cmd/roborev/daemon_lifecycle.go:67
  • Problem: getDaemonEndpoint() swallows ParseEndpoint errors, prints a warning, and silently falls back to the default 127.0.0.1:7373. A malformed --server or invalid server_addr can cause mutating commands to hit the default daemon instead of failing fast, representing a correctness regression.
  • **
    Fix:** Validate the endpoint once during CLI startup and return the parse error to the caller. Only use the default endpoint when the configured value is actually empty.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

…ag early

- analyze.go: pass DaemonEndpoint instead of base URL string through
  runSingleAnalysis, runPerFileAnalysis, enqueueAnalysisJob,
  runAnalyzeAndFix, waitForAnalysisJob — fixes Unix socket support
  for analyze --wait (base URL "http://localhost" can't round-trip)
- main.go: add PersistentPreRunE to validate --server flag at startup
  so invalid values fail fast instead of silently falling back to default
- daemon_lifecycle.go: cache validated endpoint from flag, remove
  warning-and-fallback pattern

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 18, 2026

roborev: Combined Review (a6af201)

Verdict: This PR successfully introduces Unix socket support via a transport-agnostic DaemonEndpoint, but contains a medium-severity issue regarding proxy
configurations.

Medium

  • Location: internal/daemon/endpoint.go (DaemonEndpoint.HTTPClient)
  • Problem: The new Unix-socket client path appears to rely on the default http.Transport, which means ProxyFromEnvironment is still active. If the synthetic base URL host
    is something like http://unix, users with HTTP_PROXY/HTTPS_PROXY set will send daemon requests to the proxy instead of the local socket, breaking CLI/TUI communication in proxied environments.
  • Fix: For Unix-socket endpoints, use a transport with Proxy = nil (or otherwise force proxy bypass for the synthetic host) and add a regression test that sets HTTP_PROXY while exercising socket transport.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Replaces http.DefaultClient with getDaemonHTTPClient so the fix.go
retry infrastructure works with Unix socket transports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…test

The custom http.Transport already has nil Proxy (zero value), but
make it explicit and add a regression test that verifies HTTP_PROXY
env vars don't interfere with Unix socket communication.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 18, 2026

roborev: Combined Review (aa4380b)

Verdict: The Unix-socket migration is mostly complete, but a few core CLI paths still require updates to properly support Unix sockets.

High Severity

  • Location: cmd/roborev/fix.go

    • Problem:
      The Unix-socket migration is incomplete in the fix flow. fetchJob and fetchReview still construct requests with a plain http.Client against a string base URL instead of using the endpoint-aware transport. Consequently, roborev fix will fail when the daemon is bound to a Unix socket.

    • Fix: Thread daemon.DaemonEndpoint through these helpers and execute their requests with endpoint.HTTPClient() against the endpoint’s transport-specific base URL. Add an end-to-end Unix-socket test that covers the normal fix path.

  • Location:
    cmd/roborev/main.go

    • Problem: Daemon discovery and startup code still derives http://<addr> from runtime info and probes/registers the daemon with default HTTP clients. This remains TCP-specific, meaning version checks, auto-restart, and repo registration can break once the runtime advertises
      a Unix-socket endpoint.
    • Fix: Convert the remaining startup/discovery helpers to use RuntimeInfo.Endpoint() / ParseEndpoint(...) and send requests through the endpoint-aware client instead of assuming an HTTP host:port address. Add coverage for daemon auto-start and repo registration over a
      Unix socket.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cpcloud
Copy link
Copy Markdown
Contributor Author

cpcloud commented Mar 18, 2026

Re: Combined Review (aa4380b)

fix.go fetchJob/fetchReview: These already use getDaemonHTTPClient(30 * time.Second) which calls getDaemonEndpoint().HTTPClient(30s) — the transport IS endpoint-aware. The serverAddr string parameter is only used for URL construction (path component), not for transport selection. The Unix socket dialer is on the http.Client, not derived from the URL host. Additionally, doFixDaemonRequest was updated in 69704de to use getDaemonHTTPClient instead of http.DefaultClient.

Daemon startup/discovery: ensureDaemon already uses ProbeDaemon(info.Endpoint(), ...) (updated in 885860b, Task 3). registerRepo was the one remaining split pattern and is fixed in d3b52c9.

cpcloud and others added 3 commits March 18, 2026 16:26
Unix domain socket paths use forward-slash absolute paths which
filepath.IsAbs rejects on Windows, and os.Getuid is unavailable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Protect Server.endpoint with a mutex to prevent a race between
  Start() (write) and Stop() (read from signal handler goroutine)
- Skip Unix socket parse/path tests on Windows where Unix-style
  absolute paths are invalid and os.Getuid is unavailable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move Unix socket cleanup to after httpServer.Shutdown() completes,
which guarantees Start() has finished writing s.endpoint. No mutex
needed — the happens-before relationship is established by Shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 18, 2026

roborev: Combined Review (c64cd67)

Verdict: Review identified a high-severity issue with diff size limits preventing full analysis, and a medium-
severity bug in daemon auto-start endpoint propagation; security review passed.

High

  • Location: All files in commit range b1bb2f03..c64cd67
    • Problem: The diff content was replaced with a placeholder "(Diff too large to include...)", making
      static analysis impossible.
    • Fix: Review smaller batches of commits (e.g., 1-5 at a time) or increase the diff size limits for the roborev tool to allow the full context to be analyzed.

Medium

  • Location: cmd/robore v/daemon_lifecycle.go (startDaemon / ensureDaemon auto-start path)
    • Problem: The CLI now accepts non-default daemon endpoints, including Unix sockets, but the auto-start path still spawns the daemon as roborev daemon run without forwarding the
      already-parsed --server value. If a command is run against --server unix:///... (or a custom TCP address) while no daemon is running, it will start a daemon on the default/configured endpoint instead of the requested one, so the new transport selection silently does not work in the most common cold
      -start case.
    • Fix: Propagate the effective endpoint when spawning the daemon, e.g. by invoking roborev --server <endpoint> daemon run (or equivalent explicit config/env handoff), and add a regression test that auto-starts against a custom Unix socket endpoint.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 18, 2026

roborev: Combined Review (b3bb416)

Verdict: Clean. All agents agree the code is free of issues.

Summary:
The changes successfully implement transport-agnostic daemon endpoint handling, adding Unix socket support alongside HTTP across the daemon, CLI
, and TUI. The implementation correctly handles endpoint propagation, socket creation/cleanup, transport-specific HTTP clients, proxy bypasses, and race condition fixes without introducing any security vulnerabilities or structural issues.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

The ordering-based approach (socket cleanup after Shutdown) is not
formally correct under the Go memory model when Stop() races with
early Start() execution. The mutex is a single lock/unlock pair in
a shutdown path — negligible overhead, clear correctness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 18, 2026

roborev: Combined Review (4293940)

Verdict: Clean

All agents agree the code is clean. No medium, high, or critical severity issues were found in the implementation of transport-agnostic endpoints and Unix socket support.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@wesm
Copy link
Copy Markdown
Collaborator

wesm commented Mar 18, 2026

Hey! @wesm asked me (Claude) to do a holistic review of this branch focused on functionality and ergonomics rather than code quality. The DaemonEndpoint abstraction is well-designed and the threading through the codebase is thorough. A few concerns worth discussing:


1. --server flag is silently ignored when a daemon is running

getDaemonEndpoint() always prefers a discovered running daemon over the explicit --server flag:

func getDaemonEndpoint() daemon.DaemonEndpoint {
    if info, err := getAnyRunningDaemon(); err == nil {
        return info.Endpoint()   // ← always wins
    }
    if parsedServerEndpoint != nil {
        return *parsedServerEndpoint
    }
    ...
}

If someone runs roborev status --server unix:///my/socket but a TCP daemon happens to be running, the flag is silently ignored. With Unix sockets in the mix, it becomes more plausible that a user has multiple daemons or wants to explicitly target one. An explicit --server probably should take precedence over auto-discovery.

2. fixSingleJob re-resolves endpoint on every API call

Each call to fetchJob, fetchReview, markJobClosed, etc. independently calls getDaemonEndpoint().BaseURL(). The comment says this is for daemon recovery, but it means a single fix operation could talk to different daemon instances if the daemon restarts between calls — fetching a job from one daemon and its review from another, with different databases.

runFixBatch already does the right thing by capturing the endpoint once:

batchAddr := getDaemonEndpoint().BaseURL()

fixSingleJob should do the same, with an explicit re-resolve only when a connection error triggers daemon recovery.

3. Socket directory permission check follows symlinks

In server.go, the permission check uses os.Stat (follows symlinks) rather than os.Lstat:

if fi, err := os.Stat(socketDir); err == nil {
    if perm := fi.Mode().Perm(); perm&0077 != 0 {

If the socket directory were a symlink to a world-readable directory, Stat would check the target's permissions, not the link itself. The practical attack window is narrow since MkdirAll was just called, but Lstat would be more defensive.

4. parsedServerEndpoint global state is fragile for tests

parsedServerEndpoint is a package-level *DaemonEndpoint set by PersistentPreRunE and read by getDaemonEndpoint(). Tests that set serverAddr directly without calling validateServerFlag() could see stale values from a prior test. The test helpers handle this by setting parsedServerEndpoint directly, which works but is easy to forget.

5. Minor: new HTTP client per post-commit hook call

hookHTTPClient changed from a reused *http.Client to a function that creates a fresh client (and transport) each time. Fine for post-commit hooks (one call per commit), but worth noting if this pattern gets adopted in hotter paths.

6. Minor: DaemonEndpoint compared with !=

In handleReconnectMsg:

if msg.err == nil && msg.endpoint != m.endpoint {

Works today because both fields are strings, but adding a non-comparable field later would be a compile error. A DaemonEndpoint.Equal() method would be more future-proof.


Overall this looks good — the core design is solid, backward compatibility is handled well (empty Network defaults to TCP), and the Unix socket security model (0700 dir, 0600 socket, permission validation) is appropriate. Items 1 and 2 are the ones I'd want addressed before merging.

1. Explicit --server flag now takes precedence over auto-discovered
   daemons. Only falls back to runtime file discovery when no flag
   is set. This matters when targeting a specific daemon instance.

2. fixSingleJob snapshots the daemon address once per operation
   instead of re-resolving on every API call. The retry helpers
   (withFixDaemonRetryContext) handle re-resolution internally
   on connection errors. Tests updated to simulate daemon death
   with connection-level failures (hijack+close) rather than 404s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cpcloud
Copy link
Copy Markdown
Contributor Author

cpcloud commented Mar 18, 2026

Addressed in 6e6904b:

#1 (--server flag precedence): Fixed. getDaemonEndpoint() now checks serverAddr != "" first — an explicit --server flag takes precedence over auto-discovered daemons. Only falls back to runtime file discovery when no flag is set.

#2 (fixSingleJob re-resolution): Fixed. Snapshots the daemon address once at the top. The withFixDaemonRetryContext helpers handle re-resolution internally on connection errors. Tests updated to simulate daemon death with connection-level failures (hijack+close) instead of HTTP 404s, which properly exercises the retry path.

#3 (Stat vs Lstat): Keeping os.Stat (follows symlinks). Lstat would always reject symlinked directories on Linux because symlink permissions are 0777 regardless of the target. Stat checks the actual target directory's permissions, which is the correct behavior — if the target is 0700, it's safe.

#4 (parsedServerEndpoint global): Acknowledged. The test infrastructure (patchServerAddr) already handles this, but it's worth noting for future test authors.

#5 (hookHTTPClient): Acknowledged as minor. One client per commit is fine.

#6 (DaemonEndpoint.Equal): YAGNI for now. Both fields are strings; != is correct and we'd get a compile error if a non-comparable field were added.

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 18, 2026

roborev: Combined Review (6e6904b)

Verdict: Code review could not be completed because the combined diff exceeds inclusion limits and was omitted.

High

  • Global
    • Problem: The provided diff is empty or replaced with a truncation message ("Diff too large to include - please review the commits directly"), making code review impossible.
    • Fix: Reduce the commit range or provide the diff in smaller, reviewable segments so the code changes
      can be analyzed.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@wesm
Copy link
Copy Markdown
Collaborator

wesm commented Mar 18, 2026

I think we're good here, thanks @cpcloud

@wesm wesm merged commit 406e1fa into roborev-dev:main Mar 18, 2026
7 of 8 checks passed
@cpcloud cpcloud deleted the worktree-witty-dancing-crystal branch March 18, 2026 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants