
feat(routing): add hostname-based service routing for agents #1

Merged
dpup merged 31 commits into main from dan on Jan 13, 2026

Conversation

@dpup dpup commented Jan 13, 2026

Add hostname-based service routing for agents, enabling predictable URLs like http://web.myapp.localhost:8080 for local development with OAuth flows, webhooks, and multi-service architectures.

Key features:

  • Agent naming: Agents get names via --name flag, agent.yaml, or auto-generated (e.g., fluffy-chicken)
  • Port exposure: Configure ports: in agent.yaml to expose container services
  • Hostname routing: Access services at http://<service>.<agent-name>.localhost:<proxy-port>
  • Standalone proxy: New agent proxy start/stop/status commands for privileged port binding (e.g., sudo agent proxy start --port=80)
  • Environment injection: Containers receive AGENTOPS_URL_* vars for OAuth callback configuration

CLI improvements:

  • Simplified agent run [path] syntax (removed redundant positional arg)
  • New --name flag to override agent name
  • Enhanced agent list output showing services

Example usage:

```yaml
# agent.yaml
name: myapp
runtime:
  node: 20
ports:
  web: 3000
  api: 8080
```

```
$ agent run ./project
Started agent "myapp" (run run-a1b2c3d4)
Services:
  web: http://web.myapp.localhost:8080 (container :3000)
  api: http://api.myapp.localhost:8080 (container :8080)

$ agent run --name=otherbranch ./project
Started agent "otherbranch" (run run-a1b2c3d4)
Services:
  web: http://web.otherbranch.localhost:8080 (container :3000)
  api: http://api.otherbranch.localhost:8080 (container :8080)
```

dpup added 30 commits January 12, 2026 22:29
Design for exposing container services via hostname routing so multiple
agent instances can run with predictable URLs (e.g., web.myapp.localhost).

Key features:
- Agent naming: --name flag, agent.yaml, or random adjective-animal
- Port config via agent.yaml ports map
- Shared reverse proxy with automatic lifecycle
- Environment variable injection for OAuth-friendly URLs
Detailed step-by-step plan with TDD approach for implementing:
- Random name generation (adjective-animal)
- Global config for proxy port
- Port binding for Docker and Apple containers
- Route table and reverse proxy
- Proxy lifecycle management
- CLI integration with --name flag
- Environment variable injection
Add port binding support to Docker runtime:
- Update CreateContainer to configure exposed ports and port bindings
- Add GetPortBindings method to retrieve actual assigned host ports
- Add stub GetPortBindings to Apple runtime for interface compliance

Port bindings let Docker assign random host ports for container ports.
Hostname routing relies on this: a DNS lookup for *.localhost returns the
host IP, and the proxy forwards HTTP traffic to the container through its
port binding.
Remove agent routes and runsByName index entry when destroying a run.
This ensures the route table stays clean and agent names can be reused
after a run is destroyed.
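The cleanup described above amounts to deleting both index entries under one lock. A minimal sketch, with illustrative names (`Registry`, `DestroyRun`) rather than the repo's actual types:

```go
package main

import (
	"fmt"
	"sync"
)

// Registry is an illustrative stand-in for the manager's route table and
// name index; the real types in this PR may differ.
type Registry struct {
	mu         sync.Mutex
	routes     map[string]string // "<service>.<name>" -> backend address
	runsByName map[string]string // agent name -> run ID
}

// DestroyRun removes every route owned by the agent and frees its name,
// so the name can be reused by a later run.
func (r *Registry) DestroyRun(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for host := range r.routes {
		if hostBelongsTo(host, name) {
			delete(r.routes, host)
		}
	}
	delete(r.runsByName, name)
}

// hostBelongsTo reports whether host has the form "<service>.<name>".
func hostBelongsTo(host, name string) bool {
	n := len(host) - len(name)
	return n > 1 && host[n:] == name && host[n-1] == '.'
}

func main() {
	r := &Registry{
		routes:     map[string]string{"web.myapp": ":3000", "api.myapp": ":8080", "web.other": ":3000"},
		runsByName: map[string]string{"myapp": "run-a1b2c3d4", "other": "run-x"},
	}
	r.DestroyRun("myapp")
	fmt.Println(len(r.routes), len(r.runsByName)) // only "other" entries remain
}
```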
Add Lifecycle type that manages the shared reverse proxy:
- EnsureRunning: starts proxy if not running, reuses if already running
- Port mismatch detection: errors if requested port differs from running proxy
- ShouldStop: checks if proxy can be stopped (no more agents)
- Proper lock file cleanup on stop
- Add proxyLifecycle field to Manager struct
- Start routing proxy automatically when first agent with ports starts
- Stop routing proxy when last agent with ports is destroyed
- Use global config for proxy port setting
- run: Show agent name, run ID, and service URLs after start
- list: Show NAME, RUN ID, STATE, SERVICES columns instead of ID/AGENT/STATE/CREATED
Add integration test that verifies:
- Proxy lifecycle start/stop
- Route registration for multiple services
- Host header routing to correct backend
- Default service routing
- Unknown agent 404 response
- Route removal and cleanup

Also fix lifecycle port handling:
- Port 0 now means "use random available port"
- Port < 0 means "use default (8080)"
- Port mismatch check only triggers when specific port requested
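The port semantics above are small enough to show directly; this is a hedged sketch (function names are illustrative, not the lifecycle's actual API):

```go
package main

import "fmt"

const defaultProxyPort = 8080

// resolveProxyPort implements the lifecycle port semantics described above:
//   port  < 0 -> use the default (8080)
//   port == 0 -> ask the OS for a random free port (net.Listen on ":0")
//   port  > 0 -> use exactly that port
func resolveProxyPort(requested int) int {
	if requested < 0 {
		return defaultProxyPort
	}
	return requested // 0 is passed through; the listener then picks a free port
}

// mismatch reports whether a newly requested port conflicts with an
// already-running proxy; it only triggers for specific port requests.
func mismatch(requested, running int) bool {
	return requested > 0 && requested != running
}

func main() {
	fmt.Println(resolveProxyPort(-1), resolveProxyPort(0), resolveProxyPort(9999))
	fmt.Println(mismatch(0, 8080), mismatch(80, 8080))
}
```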
- Add "Hostname-Based Service Routing" section
- Document agent naming (--name flag, config, random)
- Document ports configuration in agent.yaml
- Document proxy URL structure and environment variables
- Document global proxy configuration
- Add --name flag to `agent run` command documentation
Two simple Python HTTP servers demonstrating hostname-based routing:
- web_server.py: responds "Hello from port 3000"
- api_server.py: responds "Hello from port 8080"
- agent.yaml: exposes both as web.demo.localhost and api.demo.localhost
The proxy runs in a separate process from agents. When a new agent
registers routes by writing to routes.json, the proxy's in-memory
RouteTable was stale. Now Lookup, LookupDefault, AgentExists, and
Agents reload from disk to pick up changes from other processes.
The <agent> positional arg was redundant with the name field in
agent.yaml and the --name flag. Now the CLI is simpler:

  agent run [path] [-- command]

The agent name comes from:
1. --name flag (highest priority)
2. name field in agent.yaml
3. Random generated name (e.g., fluffy-chicken)

Also removed the Agent field from Run struct and Options since
only Name is needed for identification and routing.
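The name precedence above can be sketched in a few lines; the word lists are placeholders, not the actual generator's vocabulary:

```go
package main

import (
	"fmt"
	"math/rand"
)

var adjectives = []string{"fluffy", "brave", "quiet"}
var animals = []string{"chicken", "otter", "lynx"}

// resolveName applies the precedence described above: --name flag, then
// the name field from agent.yaml, then a random adjective-animal pair.
// No rand.Seed needed: Go 1.20+ auto-seeds the global source.
func resolveName(flagName, configName string) string {
	if flagName != "" {
		return flagName
	}
	if configName != "" {
		return configName
	}
	return adjectives[rand.Intn(len(adjectives))] + "-" + animals[rand.Intn(len(animals))]
}

func main() {
	fmt.Println(resolveName("otherbranch", "myapp")) // flag wins
	fmt.Println(resolveName("", "myapp"))            // falls back to agent.yaml
	fmt.Println(resolveName("", ""))                 // random, e.g. "fluffy-chicken"
}
```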
Add `agent proxy start/stop/status` commands to run the routing proxy
as a standalone daemon, enabling privileged port binding:

  sudo agent proxy start --port=80
  agent run examples/multi-service -- bash /workspace/start.sh

Fix port binding issues:
- Use bridge network mode when ports need publishing (host mode ignores
  port bindings)
- Bind to 0.0.0.0 so Docker properly assigns random host ports
- Clean up routes when agents exit via Stop() or Wait()

Fix typo in example start.sh script.
- Remove deprecated rand.Seed() call (Go 1.20+ auto-seeds)
- Remove unused runsByName field from Manager
- Consolidate CLI output (remove redundant fmt.Printf calls)
- Rename Agent field to Name in run.Options struct usage in E2E tests
- Fix gofmt formatting issue in internal/run/run.go
dpup added a commit that referenced this pull request Jan 27, 2026
Addresses code review issue #1: BuildKit sidecar lifecycle management gap.

When Destroy() is called, the BuildKit sidecar container and network were
not being cleaned up, causing resource leaks.

Changes:
- Remove BuildKit sidecar container in Manager.Destroy() if present
- Remove BuildKit network in Manager.Destroy() if present
- Both operations are best-effort with appropriate error handling
- Network removal uses debug logging (may already be removed)

Prevents resource leaks when runs with docker:dind are destroyed.
dpup added a commit that referenced this pull request Jan 29, 2026
* feat(deps): add docker dependency for Docker access inside containers

Add support for running Docker commands inside moat containers via two modes:

**docker:host** - Mounts the host Docker socket
- Fast startup, shared image cache with host
- Full access to host Docker daemon
- Use when you trust the agent and want speed

**docker:dind** - Runs isolated Docker daemon inside container
- Complete isolation from host Docker
- Requires privileged mode (set automatically)
- Slower startup (~5-10s for daemon init)
- Use for untrusted code or when isolation is required

Key implementation details:
- DockerMode type with host/dind variants (explicit mode required)
- moat-init script handles setup for both modes:
  - Host: detects socket GID inside container, adds moatuser to group
  - Dind: starts dockerd, waits for readiness, adds moatuser to docker group
- Docker CLI installed from official Docker repo (docker-ce-cli)
- Dind mode also installs docker-ce, containerd.io, docker-buildx-plugin
- BuildKit disabled in dind mode (DOCKER_BUILDKIT=0) to avoid session issues
- Image tags include DockerMode to prevent cache collisions

Example usage in agent.yaml:
```yaml
dependencies:
  - docker:host  # or docker:dind
```

* docs(plans): add gVisor integration design

* docs(plans): add gVisor implementation plan

* feat(container): add GVisorAvailable detection function

* feat(container): add sandbox parameter to NewDockerRuntime

* feat(container): pass OCI runtime to container creation

* feat(container): add RuntimeOptions with sandbox support

* feat(cli): add --no-sandbox flag

* feat(run): wire NoSandbox through run manager

* feat(config): add sandbox field to agent.yaml

* feat(cli): honor sandbox setting from agent.yaml

* docs: document --no-sandbox flag and sandbox config

* fix(container): make gVisor install instructions copy-pasteable

Consolidate the Linux installation steps into a single command that
users can copy and run directly. Also adds -y flag to apt install
and includes systemctl reload step.

* docs(plans): add buildkit sidecar design

Add design document for automatic BuildKit sidecar with docker:dind mode.
Key points:
- BuildKit sidecar automatically deployed with docker:dind
- Shared Docker network for container communication
- Fast builds (buildkit) + full Docker daemon (runtime ops)
- No user configuration required

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(cli): admin commands don't require gVisor sandbox

Commands like 'moat status', 'moat clean', 'moat system images', and
'moat system containers' only query/manage existing resources - they
don't create new containers that need sandboxing.

These commands now explicitly use Sandbox: false to avoid requiring
gVisor when it's not needed.

* fix(container): address PR review feedback

- Deduplicate gVisor detection: NewDockerRuntime now uses GVisorAvailable()
- Use Docker's default runtime when sandbox disabled (empty string vs 'runc')
- Add config validation for sandbox field (only '' or 'none' allowed)
- Add timeout to TestGVisorAvailable to prevent CI hangs
- Add tests for sandbox config validation
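The sandbox-field validation mentioned above is a simple allowlist; a minimal sketch (not the repo's exact code):

```go
package main

import "fmt"

// validateSandbox enforces the config rule described above: the agent.yaml
// sandbox field may only be empty (platform default) or "none" (explicitly
// disabled).
func validateSandbox(value string) error {
	switch value {
	case "", "none":
		return nil
	default:
		return fmt.Errorf("invalid sandbox value %q: must be empty or %q", value, "none")
	}
}

func main() {
	for _, v := range []string{"", "none", "gvisor"} {
		fmt.Printf("sandbox=%q -> %v\n", v, validateSandbox(v))
	}
}
```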

* fix(e2e): disable sandbox for E2E tests

E2E tests verify moat functionality, not gVisor specifically.
Using NoSandbox: true allows tests to run in CI without gVisor.

* feat(container): add network create/remove methods

* fix(container): ensure RemoveNetwork properly ignores not-found errors

- Update RemoveNetwork to match RemoveContainer pattern by checking errdefs.IsNotFound
- Add test case TestDockerRuntime_RemoveNetwork_NotFound to verify best-effort behavior
- Add comment explaining bridge driver choice in CreateNetwork

This ensures the documentation matches actual behavior: RemoveNetwork is truly
best-effort and won't fail when the network doesn't exist.

* feat(container): add sidecar start method

* feat(storage): add buildkit sidecar metadata fields

* feat(run): add buildkit configuration logic

* feat(container): add network create/remove methods

* wip: integrate buildkit lifecycle (partial)

* feat(container): add sidecar start method

* feat(run): integrate buildkit sidecar in create lifecycle

* feat(container): add inspect container helper

* feat(run): cleanup buildkit sidecar and network on stop

* feat(audit): add buildkit sidecar audit events

* docs(reference): document buildkit sidecar for docker:dind

* fix(container): make RemoveNetwork truly best-effort

* fix: add missing InspectContainer method after merge

* docs(plans): add buildkit client integration plan

Detailed plan to fix image building by using BuildKit Go client
instead of Docker SDK when BuildKit sidecar is active.

Key points:
- Use github.com/moby/buildkit/client for proper BuildKit support
- Automatic fallback to Docker SDK when BuildKit not available
- Full feature support: cache mounts, multi-platform, progress streaming
- Backward compatible with existing docker:host configurations

* feat(deps): add buildkit client library

* feat(buildkit): add client wrapper for image builds

* feat(container): use buildkit client for image builds when available

* feat(buildkit): improve error messages and logging

Enhance error handling and observability for BuildKit integration:

Error Messages:
- BuildKit connection failures now suggest checking docker:dind sidecar status
  and BUILDKIT_HOST configuration
- Build failures include context about Dockerfile syntax and build context path
- Ping failures provide actionable suggestions for network troubleshooting

Debug Logging:
- Log routing decision (BuildKit vs Docker SDK) with relevant parameters
- Log BuildKit connection address during client creation
- Log build options (tag, platform, no_cache, context_dir) when starting build
- Log successful operations (ping, build completion) for debugging
- Log errors with tag context for easier troubleshooting

All logging follows existing patterns from internal/log package using
structured key-value pairs at appropriate levels (Debug/Error).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* test(buildkit): add integration tests for client

Add comprehensive integration and E2E tests for BuildKit integration:

**Integration tests (internal/buildkit/client_test.go):**
- TestClient_Build: Basic image build with BuildKit
- TestClient_BuildWithArgs: Build with build arguments
- TestClient_BuildWithInvalidDockerfile: Error handling for invalid Dockerfile
- TestClient_BuildWithUnreachableBuildKit: Error handling for unreachable BuildKit

All integration tests skip gracefully when BUILDKIT_HOST is not set,
following the pattern established by TestClient_Ping.

**E2E tests (internal/e2e/docker_test.go):**
- TestDockerDindBuildKitImageBuild: Full end-to-end test that:
  - Verifies BUILDKIT_HOST is set correctly in docker:dind containers
  - Builds an image using BuildKit
  - Verifies the built image exists and can be run
  - Tests the complete BuildKit integration flow

**Test fixes:**
- Fixed TestDockerRuntime_BuildImage_PathSelection to handle
  BuildKit connection errors correctly

All tests pass locally and skip appropriately when dependencies
(BuildKit, Docker) are not available.

* docs(buildkit): document client integration and troubleshooting

Add BuildKit client usage documentation to agent.yaml reference:
- Explain how BuildKit integration works with docker:dind
- Document fallback behavior when BuildKit is unavailable
- Add troubleshooting section for common error messages
- Clarify BuildKit vs Docker SDK usage patterns

* fix(buildkit): add privileged mode and proxy bypass for sidecar

Fixes two critical issues preventing BuildKit from working:

1. **Proxy bypass for BuildKit connections**: Added 'buildkit' hostname
   to NO_PROXY list. Without this, BuildKit gRPC connections were being
   routed through the HTTP proxy, causing "error reading server preface:
   EOF" errors because the proxy doesn't understand gRPC protocol.

2. **Privileged mode for BuildKit sidecar**: BuildKit requires privileged
   mode to perform bind mounts for its snapshot operations. Without this,
   builds fail with "operation not permitted" when BuildKit tries to
   mount its internal filesystem snapshots.

Changes:
- internal/run/manager.go: Add 'buildkit' to NO_PROXY list
- internal/container/runtime.go: Add Privileged field to SidecarConfig
- internal/container/docker.go: Honor Privileged field in StartSidecar
- internal/run/manager.go: Set Privileged=true for BuildKit sidecar

These fixes allow BuildKit to communicate directly without proxy
interference and have the necessary permissions to manage build contexts.

* fix(cli): show 'Initializing...' progress indicator on startup

Users reported no visible feedback when running 'moat run'. Now shows
'Initializing...' immediately while the runtime, proxy, and container
are being set up.

* fix(buildkit): mount docker socket for image export

Fixes image export failure in docker:dind mode. BuildKit sidecar now
mounts the dind Docker socket to export built images to the daemon.

Problem:
- BuildKit would build images successfully but couldn't export them
- Docker daemon would try to pull "moat/run:xxx" from Docker Hub
- Error: "pull access denied for moat/run, repository does not exist"

Root cause:
- BuildKit's "image" exporter requires access to Docker API to export
- Without socket access, images stayed in BuildKit's cache
- Docker daemon had no way to access the built images

Solution:
- Mount /var/run/docker.sock from dind into BuildKit sidecar
- This is the dind container's socket, NOT the host's socket
- Maintains isolation: BuildKit can only affect dind daemon, not host
- Allows BuildKit to export images directly to dind daemon

Security note:
- Socket mounting only affects docker:dind mode (already privileged)
- BuildKit mounts the isolated dind socket, not host socket
- docker:host mode remains unchanged (mounts host socket directly)
- dind provides better isolation: cannot access host filesystem

Changes:
- internal/container/runtime.go: Add Mounts field to SidecarConfig
- internal/container/docker.go: Process mounts in StartSidecar
- internal/run/manager.go: Mount docker socket to BuildKit sidecar

* fix(buildkit): use docker exporter instead of image exporter

The 'image' exporter writes to containerd image store, not Docker daemon.
Use 'docker' exporter which writes directly to Docker daemon via socket.

This ensures images built by BuildKit are immediately available to the
Docker daemon for running containers, fixing 'pull access denied' errors.

* fix(buildkit): mount /tmp for build context access

BuildKit sidecar needs access to the same /tmp directory where build
contexts are created. Without this, BuildKit cannot read the Dockerfile
and context files, resulting in 'method not supported' errors.

The build flow:
1. Main container creates temp dir: /tmp/moat-build-*
2. Main container writes Dockerfile to temp dir
3. Main container calls BuildKit with path to temp dir
4. BuildKit reads files from temp dir (needs access!)

Both containers now share /tmp, allowing BuildKit to access build
contexts created by the main container.

* fix(buildkit): use session-based file sync for build context

Replace deprecated LocalDirs with proper session-based file sync.
This fixes the 'method not supported by the client' error.

The issue:
- LocalDirs is deprecated and doesn't work reliably
- BuildKit couldn't access build context via LocalDirs
- Error: 'method /moby.filesync.v1.FileSend/diffcopy not supported'

The solution:
- Create a BuildKit session for each build
- Attach filesync.NewFSSyncProvider with StaticDirSource
- Upload build context from client to BuildKit via session protocol
- This is the modern, supported way to provide context to BuildKit

The build context is now properly uploaded to BuildKit over the
session protocol, regardless of filesystem boundaries between
containers.

* fix(container): default to legacy builder for docker:host reliability

Changes Docker SDK build behavior to use legacy builder (BuilderV1) by
default instead of BuilderBuildKit. This fixes 'no active sessions'
errors on Docker Desktop and other environments where Docker SDK's
BuildKit integration is unreliable.

The problem:
- Docker SDK's BuildKit integration (BuilderBuildKit flag) is buggy
- Causes 'no active sessions' errors with buildx and Docker Desktop
- Users had to set MOAT_DISABLE_BUILDKIT=1 workaround

The solution:
- docker:dind mode: Uses our BuildKit client (fast, reliable)
- docker:host mode: Uses legacy builder (slower, but works everywhere)
- Users can opt-in to Docker SDK's BuildKit via DOCKER_BUILDKIT=1

This ensures moat works out-of-the-box on all platforms while still
providing the fast BuildKit path for docker:dind users.

Future work: Detect and connect to buildx's BuildKit for docker:host
mode to get BuildKit benefits without docker:dind overhead.

* revert: restore BuildKit as default for Docker SDK

Reverts the previous change that made legacy builder the default.
BuildKit should be the default for better performance - the 'no active
sessions' issue was environment-specific and can be worked around with
MOAT_DISABLE_BUILDKIT=1.

This restores the original behavior where:
- Default: Uses BuildKit (BuilderBuildKit) for fast builds
- Opt-out: Set MOAT_DISABLE_BUILDKIT=1 for legacy builder

The previous commit incorrectly assumed all Docker Desktop setups
have broken BuildKit, when it's actually working for most users.

* fix(buildkit): resolve filesync error with docker exporter

Fixes "method /moby.filesync.v1.FileSend/diffcopy not supported by the
client" error that occurred during BuildKit image export phase.

Root cause: The docker exporter requires an Output function to handle
the tar stream. Manual session management created race conditions and
complexity.

Solution:
- Use LocalMounts for build context (BuildKit auto-manages session)
- Provide Output function that pipes tar stream to `docker load`
- Remove manual session lifecycle management
- Stream image directly without intermediate files

This simplifies the BuildKit integration while fixing nested moat
execution (moat-in-moat) and following BuildKit's recommended patterns.

Tested: All E2E tests pass including TestDockerDindBuildKitImageBuild
and all dependency runtime tests.

* chore: Fixups

* docs: add Docker dependency and BuildKit sidecar documentation

Updates conceptual documentation to explain docker:host and docker:dind
dependency modes, their security implications, and BuildKit sidecar
integration.

Changes:
- concepts/01-sandboxing.md: Add Docker access modes section explaining
  security tradeoffs between docker:host and docker:dind
- concepts/06-dependencies.md: Add comprehensive Docker dependencies
  section with mode comparison table and BuildKit sidecar details
- getting-started/02-installation.md: Fix misleading BuildKit installation
  note (BuildKit sidecar is automatic with docker:dind, no manual install
  needed)

The reference documentation (agent-yaml.md) already has comprehensive
technical details. These updates provide conceptual understanding and
decision-making guidance for users.

* fix(run): add BuildKit sidecar cleanup in Destroy

Addresses code review issue #1: BuildKit sidecar lifecycle management gap.

When Destroy() is called, the BuildKit sidecar container and network were
not being cleaned up, causing resource leaks.

Changes:
- Remove BuildKit sidecar container in Manager.Destroy() if present
- Remove BuildKit network in Manager.Destroy() if present
- Both operations are best-effort with appropriate error handling
- Network removal uses debug logging (may already be removed)

Prevents resource leaks when runs with docker:dind are destroyed.

* fix: address linter errors in BuildKit integration

- Check error returns for cleanup operations
- Remove unused types import
- Fix variable shadowing in buildImageWithBuildKit
- Pre-allocate mounts slice capacity
- Replace deprecated types.ContainerJSON with container.InspectResponse

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* test(e2e): skip nested dind tests in CI environments

Add skipIfNestedDind() helper to detect environments where nested dind
is unreliable. Nested dind (dind inside dind) is not a supported
configuration.

Detection strategy (checked in order):
1. GITHUB_ACTIONS=true - GitHub Actions always uses dind
2. /.dockerenv + /var/run/docker.sock - we're in a container with docker
3. CI=true + docker info shows dind indicators - other CI systems

This fixes test failures in CI where dind tests fail with
"dial unix /var/run/docker.sock: connect: no such file or directory"
because the inner dockerd can't start properly in nested dind setups.

The tests will still run in local development environments where
nested dind restrictions don't apply.
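The detection order above can be sketched with the environment and filesystem checks injected, so the logic is testable without a real CI box; the helper in the PR presumably reads the process environment directly:

```go
package main

import (
	"fmt"
	"os"
)

// nestedDind applies the detection strategy described above, in order.
// getenv and exists are injected for testability; dockerInfoDind stands in
// for parsing `docker info` output for dind indicators.
func nestedDind(getenv func(string) string, exists func(string) bool, dockerInfoDind bool) bool {
	if getenv("GITHUB_ACTIONS") == "true" {
		return true // GitHub Actions always runs jobs in dind
	}
	if exists("/.dockerenv") && exists("/var/run/docker.sock") {
		return true // inside a container that already has a docker socket
	}
	if getenv("CI") == "true" && dockerInfoDind {
		return true // other CI systems, confirmed via docker info
	}
	return false
}

func main() {
	statExists := func(p string) bool { _, err := os.Stat(p); return err == nil }
	fmt.Println(nestedDind(os.Getenv, statExists, false))
}
```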

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs: add runtime separation design

Design for improving separation between Docker and Apple container
runtimes using the Feature Manager pattern.

Problem: Current Runtime interface mixes universal operations with
Docker-only features, forcing type assertions and stub implementations.

Solution: Split Runtime into minimal core + optional feature managers
(NetworkManager, SidecarManager) that Docker provides and Apple returns
nil for. Manager code checks for nil and fails gracefully.

Benefits:
- No type assertions
- No stub implementations
- Clear feature boundaries
- Type safety
- Easy to extend with new Docker features

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
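The Feature Manager split described above looks roughly like this sketch (interface and type names are illustrative): the core `Runtime` stays minimal, Docker-only features hang off an optional accessor, and Apple simply returns nil for it.

```go
package main

import "fmt"

// NetworkManager is an optional feature: only runtimes that support
// user-defined networks provide a non-nil implementation.
type NetworkManager interface {
	CreateNetwork(name string) error
	RemoveNetwork(name string) error
}

type Runtime interface {
	Name() string
	// NetworkManager returns nil when the runtime has no network support.
	NetworkManager() NetworkManager
}

type dockerRuntime struct{}

func (dockerRuntime) Name() string                   { return "docker" }
func (dockerRuntime) NetworkManager() NetworkManager { return dockerNetworks{} }

type dockerNetworks struct{}

func (dockerNetworks) CreateNetwork(string) error { return nil }
func (dockerNetworks) RemoveNetwork(string) error { return nil }

type appleRuntime struct{}

func (appleRuntime) Name() string                   { return "apple" }
func (appleRuntime) NetworkManager() NetworkManager { return nil }

// ensureNetwork checks for nil and fails gracefully instead of
// type-asserting on the concrete runtime.
func ensureNetwork(rt Runtime, name string) error {
	nm := rt.NetworkManager()
	if nm == nil {
		return fmt.Errorf("%s runtime does not support networks", rt.Name())
	}
	return nm.CreateNetwork(name)
}

func main() {
	fmt.Println(ensureNetwork(dockerRuntime{}, "buildkit"))
	fmt.Println(ensureNetwork(appleRuntime{}, "buildkit"))
}
```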

* docs: add runtime separation implementation plan

Detailed 14-task implementation plan for migrating Runtime interface
to Feature Manager pattern.

Migration strategy:
- Phase 1: Add feature managers alongside existing methods
- Phase 2: Migrate manager.go to use feature managers
- Phase 3: Remove old methods from Runtime interface

Each task is a 2-5 minute action with exact code, file paths, test
commands, and expected output.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat(container): add NetworkManager and SidecarManager interfaces

* feat(container): add Docker feature manager implementations

- Add dockerNetworkManager and dockerSidecarManager structs
- Move CreateNetwork, RemoveNetwork to dockerNetworkManager
- Move StartSidecar, InspectContainer to dockerSidecarManager
- Add NetworkManager() and SidecarManager() accessors to DockerRuntime
- Keep old methods on DockerRuntime as delegates for backward compatibility
- Add nil managers to AppleRuntime (returns nil for both)
- InspectContainer on manager returns common ContainerInspectResponse type

* test(container): add feature manager detection tests

* refactor(run): migrate BuildKit network creation to feature manager

* refactor(run): migrate remaining type assertions to feature managers

Migrates all remaining DockerRuntime type assertions in manager.go to use
feature managers (SidecarManager and NetworkManager). This continues the
runtime separation pattern established in previous commits.

Migrated operations:
- StartSidecar (BuildKit sidecar start) → SidecarManager
- InspectContainer (BuildKit ready check) → SidecarManager
- RemoveNetwork (5 locations across Create, Stop, Destroy) → NetworkManager

All cleanup paths preserved with proper nil checks for runtime compatibility.
Error messages now indicate when operations require Docker runtime.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor(container): remove delegating methods from DockerRuntime

Remove old delegate methods that are no longer needed after migration
to feature managers:
- CreateNetwork/RemoveNetwork from DockerRuntime (use NetworkManager)
- StartSidecar from DockerRuntime (use SidecarManager)
- InspectContainer from DockerRuntime (use SidecarManager)
- CreateNetwork/RemoveNetwork/StartSidecar from AppleRuntime stubs
- CreateNetwork/RemoveNetwork/StartSidecar from Runtime interface

All callers now use the feature manager pattern via NetworkManager()
and SidecarManager() accessors.

Also updated docker_test.go to call feature managers directly instead
of the old delegate methods.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat(container): add BuildManager interface

* feat(container): add Docker BuildManager implementation

- Add dockerBuildManager struct implementing BuildManager interface
- Move BuildImage, ImageExists, GetImageHomeDir to dockerBuildManager
- Keep temporary delegate methods on DockerRuntime for compatibility
- Add stub BuildManager method to AppleRuntime (implemented in Task 3)
- All build-related helper methods moved to manager

* feat(container): add Apple BuildManager implementation

Adds appleBuildManager struct to isolate build functionality:
- BuildImage (including fixBuilderDNS helper)
- ImageExists
- GetImageHomeDir

The AppleRuntime.BuildManager() accessor now returns the manager.
Temporary delegate methods remain on AppleRuntime for compatibility
(will be removed in Task 5).

* refactor(run): migrate BuildImage to BuildManager pattern

* refactor(container): remove build methods from Runtime interface

All callers now use BuildManager() to access build functionality.
Removed BuildImage, ImageExists, and GetImageHomeDir from Runtime
interface and their delegate implementations in DockerRuntime and
AppleRuntime.

Updated docker_test.go to call BuildManager().BuildImage() instead
of calling the removed Runtime.BuildImage() method.

Fixed ensureImage helper to use buildMgr.ImageExists() directly.

* refactor(run): migrate ImageExists and GetImageHomeDir to BuildManager

* refactor(container): extract user-facing messages to output package

Extract common user-facing messages (image pulling, building) into a shared
output package for consistency across Docker and Apple runtimes.

Benefits:
- Consistent messaging across both runtimes
- Single source of truth for display format
- Easy to add progress bars, colors uniformly later
- Reduces duplication without coupling core logic

Changed messages:
- PullingImage: Used in 5 locations (3 Docker, 2 Apple)
- BuildingImage: Used in 2 locations (1 Docker, 1 Apple)

* docs(concepts): add comprehensive container runtimes guide

Created docs/content/concepts/07-runtimes.md covering:
- Runtime detection and selection (Docker vs Apple containers)
- gVisor sandbox modes (runsc vs runc)
- --no-sandbox flag usage and security implications
- Proxy security models per runtime
- BuildKit and Docker-in-Docker integration
- Platform-specific guidance
- Troubleshooting common issues

Updated docs/content/concepts/01-sandboxing.md:
- Simplified runtime overview section
- Added reference to new runtimes page
- Maintained focus on isolation concepts

Follows STYLE-GUIDE.md:
- Objective tone (avoids marketing language)
- Factual claims (specific mechanisms)
- Practical examples (commands with output)
- Clear cross-references

* docs(container): document BuildKit + gVisor compatibility assumption

Add comment to dockerSidecarManager documenting that BuildKit
sidecars inherit the main container's OCI runtime and noting
that BuildKit + gVisor combination has not been extensively
tested in production.

* feat(apple): add 30-second timeout to DNS lock acquisition

Add timeout to DNS lock in fixBuilderDNS to prevent indefinite
blocking when multiple moat processes configure builder DNS
concurrently. Uses goroutine + select pattern to enforce
30-second timeout with clear error message.

Previously, lock acquisition would block indefinitely, causing all
concurrent builds to serialize. With timeout, builds will fail fast
if unable to acquire lock within 30 seconds.

* feat(sidecar): add container labels for orphan cleanup

Add RunID field to SidecarConfig and use it to label BuildKit
sidecars with moat.run-id and moat.role labels. These labels
enable cleanup of orphaned sidecars if moat crashes before
calling Destroy().

Changes:
- Add RunID field to SidecarConfig struct
- Create labels map in StartSidecar with run-id and role
- Pass r.ID when creating BuildKit sidecar in manager.go

Follow-up: Implement orphan cleanup in loadPersistedRuns() to
scan containers and remove sidecars whose run-id is not in m.runs.

* feat(docker): add gVisor availability check before container creation

Add runtime check in CreateContainer to detect if gVisor became
unavailable after DockerRuntime initialization (e.g., daemon restart,
config change). Provides clearer error message indicating the runtime
was previously available.

This is an edge case - gVisor is checked during NewDockerRuntime(),
but this additional check catches configuration changes during long-
running moat processes.

* feat(firewall): add port range validation in SetupFirewall

Add validation that proxyPort is in valid range (1-65535) before
using in iptables script. While port comes from trusted source
(proxy.Server.Start()), validation adds defense in depth.

Applied to both Docker and Apple container runtimes.
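The check itself is small; a sketch of the defense-in-depth validation described above (the port is interpolated into an iptables script, so reject anything outside 1-65535 before it reaches the shell):

```go
package main

import "fmt"

// validatePort rejects out-of-range proxy ports before they are used in
// the firewall setup script.
func validatePort(port int) error {
	if port < 1 || port > 65535 {
		return fmt.Errorf("invalid proxy port %d: must be in range 1-65535", port)
	}
	return nil
}

func main() {
	for _, p := range []int{8080, 0, 70000} {
		fmt.Printf("port %d -> %v\n", p, validatePort(p))
	}
}
```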

* fix(runtime): auto-disable gVisor on macOS and Windows

Make default behavior platform-aware to avoid requiring --no-sandbox
on platforms where gVisor is unavailable.

Changes:
- Update DefaultRuntimeOptions() to detect platform
- On Linux: sandbox=true (requires gVisor, enhanced isolation)
- On macOS/Windows: sandbox=false (gVisor unavailable in Docker Desktop)
- Log clear reason when gVisor disabled on non-Linux platforms

This eliminates the error "gVisor (runsc) is required but not available"
on macOS and Windows, where gVisor is never available in Docker Desktop.
Users can still override with --no-sandbox on Linux if needed.

Updated documentation to reflect automatic platform detection and
clarify that --no-sandbox only affects Linux.

* fix(docker): suppress gVisor warning on macOS and Windows

Only show "running without gVisor sandbox" warning on Linux where
gVisor is available but explicitly disabled via --no-sandbox.

On macOS and Windows, gVisor is unavailable in Docker Desktop, so
running without it is the expected default behavior, not a security
downgrade. Suppressing the warning provides cleaner output on these
platforms.

User experience on macOS/Windows:
- No warning when running normally (gVisor auto-disabled)
- No --no-sandbox flag needed
- Clean startup without confusing security messages

* fix(manager): use platform-aware defaults when NoSandbox not specified

Change ManagerOptions.NoSandbox from bool to *bool to distinguish:
- nil (default): Use platform-aware defaults from DefaultRuntimeOptions()
- &true: Explicitly disable sandbox (--no-sandbox flag)
- &false: Not used (would mean explicitly enable, but we use nil instead)

This fixes the issue where NewManager() was forcing sandbox=true on all
platforms, causing gVisor requirement errors on macOS/Windows where gVisor
is unavailable in Docker Desktop.

Before: NewManager() always required gVisor
After: NewManager() uses platform defaults (gVisor on Linux, standard on macOS/Windows)

Updated CLI callers:
- exec.go: Only set NoSandbox when --no-sandbox flag provided
- status.go, clean.go: Explicitly set NoSandbox=true (no sandbox needed)
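The three-state nil/set distinction can be sketched like this. The resolver function `resolveNoSandbox` and its `platformDefault` parameter are illustrative; only the `NoSandbox *bool` field comes from the commit:

```go
package main

import "fmt"

// ManagerOptions mirrors the three-state flag: nil means "use the
// platform default", a non-nil pointer is an explicit user choice.
type ManagerOptions struct {
	NoSandbox *bool
}

// resolveNoSandbox picks the effective value, deferring to the
// platform default when the caller did not set the flag.
func resolveNoSandbox(opts ManagerOptions, platformDefault bool) bool {
	if opts.NoSandbox != nil {
		return *opts.NoSandbox
	}
	return platformDefault
}

func main() {
	explicit := true
	fmt.Println(resolveNoSandbox(ManagerOptions{NoSandbox: &explicit}, false)) // explicit wins
	fmt.Println(resolveNoSandbox(ManagerOptions{}, true))                      // falls back to default
}
```

A plain `bool` cannot express "not specified", which is exactly why the commit moves to `*bool`: the zero value `false` would be indistinguishable from an explicit opt-out.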

* fix(e2e): update tests for NoSandbox pointer type

Update all E2E tests to use pointer syntax for NoSandbox field.
Changed from 'NoSandbox: true' to 'NoSandbox: &[]bool{true}[0]'
to match the new *bool type.

All E2E tests passing.

* chore: Delete review doc

* fix(runtime): address code review feedback

- Add warning when creating privileged containers (docker:dind mode)
- Document BuildKit + gVisor experimental status in runtime docs

The privileged container warning helps users understand security
implications of docker:dind mode. The BuildKit + gVisor note clarifies
that while expected to work, this combination hasn't been extensively
tested in production.

Code review verified that resource cleanup already has proper nil checks
for feature managers, and dockerLoadWriter.Close() correctly waits for
the docker load process.

* fix: address second round of code review feedback

Security improvements:
- Audit all privileged containers unconditionally (not just docker:dind)
- Validate Docker socket GID and permissions
- Add BuildKit readiness check with exponential backoff

The audit logging now captures any privileged container creation regardless
of the reason, improving security visibility. Socket validation warns when
permissions are unexpected. BuildKit readiness check handles ~5-10s daemon
initialization time with retry loop (30s timeout, 100ms-2s backoff).

* fix: address critical concurrency and resource leak issues

Critical fixes (MUST FIX before merge):
1. Fix proxy server shutdown race condition - use sync.Once pattern for
   both ProxyServer and SSHAgentServer to prevent concurrent Stop() calls

2. Fix incomplete BuildKit cleanup - add RemoveContainer call in error
   path when main container creation fails (was only stopping, not removing)

These fixes prevent race conditions in proxy shutdown and resource leaks
when container creation fails after BuildKit sidecar has started.

* fix: add thread-safe state access for Run struct

Critical fix for race condition in run state access. Run state fields
(State, Error, StartedAt, StoppedAt) were accessed with inconsistent
locking - Manager lock (m.mu) protected the runs map but not the Run
struct fields themselves.

Solution: Add dedicated stateMu mutex to Run struct with helper methods:
- GetState() / SetState() for thread-safe state access
- SetStateWithError() for atomic state+error updates
- SetStateWithTime() for atomic state+timestamp updates

Updated critical paths:
- monitorContainerExit goroutine (updates state after container exits)
- Stop() method (checks and sets state)
- Wait() method (reads error state)

This prevents data races between concurrent access from multiple
goroutines (monitorContainerExit, Stop, Wait).

---------

Co-authored-by: Andy Bonventre <andybons@meetneptune.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>