
feat(ssh): add SSH agent proxy with host-based filtering #34

Merged
dpup merged 1 commit into main from ssh on Jan 20, 2026

@dpup (Collaborator) commented Jan 19, 2026

  • Add SSH agent proxy that enables containers to use SSH keys for git operations without exposing private
    keys
  • Keys are filtered by granted hosts—containers only see keys for hosts they're authorized to access
  • Sign requests are forwarded to the user's real SSH agent; private keys never enter the container
  • All SSH agent operations are logged to the tamper-proof audit store

How it works

  # Grant SSH access to a host (one time)
  moat grant ssh --host github.com

  # Use in runs
  moat run --grant ssh:github.com -- git clone git@github.com:org/repo.git

The proxy connects to the user's local SSH agent and exposes a filtered socket to the container. Only keys mapped to granted hosts are visible, and sign requests are validated against the grant before forwarding.
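The filtering and sign validation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in internal/sshagent: the `Key` and `Grants` types and the `visibleKeys` and `signAllowed` functions are hypothetical names, and the real mapping storage may have a different shape.

```go
package main

import "fmt"

// Key is a hypothetical stand-in for an identity listed by the upstream agent.
type Key struct {
	Fingerprint string
	Comment     string
}

// Grants maps a key fingerprint to the hosts it was granted for.
// This shape is an assumption made for illustration only.
type Grants map[string][]string

// visibleKeys keeps only keys granted for at least one host in allowed.
func visibleKeys(all []Key, grants Grants, allowed map[string]bool) []Key {
	var out []Key
	for _, k := range all {
		for _, h := range grants[k.Fingerprint] {
			if allowed[h] {
				out = append(out, k)
				break
			}
		}
	}
	return out
}

// signAllowed reports whether a sign request with this key may be
// forwarded upstream for the given host.
func signAllowed(fingerprint, host string, grants Grants, allowed map[string]bool) bool {
	if !allowed[host] {
		return false
	}
	for _, h := range grants[fingerprint] {
		if h == host {
			return true
		}
	}
	return false
}

func main() {
	grants := Grants{
		"SHA256:aaa": {"github.com"},
		"SHA256:bbb": {"gitlab.com"},
	}
	allowed := map[string]bool{"github.com": true} // e.g. --grant ssh:github.com
	keys := []Key{
		{Fingerprint: "SHA256:aaa", Comment: "work"},
		{Fingerprint: "SHA256:bbb", Comment: "personal"},
	}

	for _, k := range visibleKeys(keys, grants, allowed) {
		fmt.Println("visible:", k.Fingerprint, k.Comment)
	}
	fmt.Println(signAllowed("SHA256:aaa", "github.com", grants, allowed))
	fmt.Println(signAllowed("SHA256:bbb", "gitlab.com", grants, allowed))
}
```

In this sketch the container would only ever see the first key, and a sign request for the second key is refused before anything reaches the upstream agent.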

Runtime modes:

  • Docker on Linux / Apple containers: Unix socket mounted into container
  • Docker on macOS/Windows: TCP bridge with socat (Unix sockets can't cross the VM boundary)

Changes

  • internal/sshagent/ - New package implementing the filtering SSH agent proxy
  • internal/credential/ssh.go - SSH host-to-key mapping storage
  • internal/run/manager.go - SSH proxy lifecycle integration
  • internal/audit/ - SSH event types for audit logging
  • cmd/moat/cli/grant_ssh.go - moat grant ssh --host command
  • internal/deps/ - Adds openssh-client and socat to images when SSH grants are present

claude (bot) commented Jan 19, 2026

PR Review: SSH Agent Proxy Feature

This is an excellent implementation of SSH agent proxying with host-based filtering. The code quality is high, security considerations are well thought out, and the integration is clean. Here's my detailed feedback:


✅ Strengths

Architecture & Design

  • Clean abstraction layers: The separation between Proxy, Server, and AgentClient is well-designed and testable
  • Runtime-aware implementation: Smart handling of Docker vs Apple containers and TCP vs Unix socket modes
  • Minimal interface: AgentClient interface is focused and makes testing straightforward
  • Proper lifecycle management: SSH server cleanup is handled in all code paths (Create errors, Stop, Destroy)

Security

  • Defense in depth: Private keys never enter containers - only sign requests are forwarded
  • Host-based filtering: Keys are filtered per-host, preventing unauthorized access
  • Audit logging: All SSH operations (list, sign_allowed, sign_denied) are logged to tamper-proof audit store
  • Socket permissions: Properly set to 0666 for container access while limiting scope via directory isolation
  • Single-host fallback: Reasonable fallback for non-git SSH when key maps to exactly one host (proxy.go:131-134)

Code Quality

  • Comprehensive tests: Unit tests cover filtering, signing, host validation, upstream errors, and fallback scenarios
  • Good error messages: Clear, actionable errors (e.g., "SSH grants require SSH_AUTH_SOCK...")
  • Proper concurrency: Uses sync.RWMutex correctly, atomic.Value for current host
  • Cleanup handling: Socket files removed on shutdown, directories cleaned up on error

Documentation

  • Clear examples: README shows both CLI usage and agent.yaml configuration
  • Design docs: Detailed planning documents explain architecture decisions
  • Code comments: Security model documented in server.go:82-87

🔍 Issues & Recommendations

1. Potential Resource Leak (Medium Priority)

Location: internal/run/manager.go:440-446

When connecting to the upstream SSH agent, if a subsequent error occurs, the agent connection may not be closed:

upstreamAgent, err := sshagent.ConnectAgent(upstreamSocket)
if err != nil {
    if proxyServer != nil {
        _ = proxyServer.Stop(context.Background())
    }
    return nil, fmt.Errorf("connecting to SSH agent: %w", err)
}
// If error occurs after this, upstreamAgent.Close() may not be called

Recommendation: Add cleanup for upstreamAgent in all error paths after it's created, or use a defer with error handling:

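// Assumes the enclosing function declares err as a named return value;
// otherwise the deferred function will not see later errors.
defer func() {
    if err != nil && upstreamAgent != nil {
        _ = upstreamAgent.Close()
    }
}()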
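For reference, here is a self-contained sketch of that cleanup pattern. It is not the manager.go code: `fakeAgent` and `startProxy` are hypothetical names, and the point is only that the named result `err` lets the deferred function observe failures from any later step.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeAgent is a hypothetical stand-in for the upstream agent connection.
type fakeAgent struct{ closed bool }

func (a *fakeAgent) Close() error { a.closed = true; return nil }

// startProxy mimics setup code that holds a resource and then runs more
// steps that can fail. err is a named result, so the deferred function
// observes whatever error the function is about to return.
func startProxy(agent *fakeAgent, failLater bool) (err error) {
	defer func() {
		if err != nil {
			_ = agent.Close() // release the connection on every error path
		}
	}()

	if failLater {
		return errors.New("starting TCP listener: simulated failure")
	}
	return nil
}

func main() {
	a := &fakeAgent{}
	fmt.Println(startProxy(a, true), a.closed)

	b := &fakeAgent{}
	fmt.Println(startProxy(b, false), b.closed)
}
```

On the failure path the agent ends up closed; on the success path it stays open for the caller to manage.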

2. Race Condition in Audit Callback (Low Priority)

Location: internal/sshagent/proxy.go:55-62

The audit() method reads auditFunc under RLock, but the function could theoretically be called while being set. While unlikely in practice (it's set once during initialization), consider setting it during construction:

// Option 1: Pass audit func to NewProxy
func NewProxy(upstream AgentClient, auditFunc AuditFunc) *Proxy

// Option 2: Use sync.Once for the first call
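Option 1 might look something like the sketch below. The types are simplified assumptions (the upstream `AgentClient` parameter is omitted and the `AuditFunc` signature is invented for illustration); the idea is that a field written only in the constructor needs no lock on the read path.

```go
package main

import "fmt"

// AuditFunc receives one audit event. The two-string signature is an
// assumption; the real callback type may differ.
type AuditFunc func(action, host string)

// Proxy stores the callback in a field that is written only in NewProxy,
// so later reads need no lock.
type Proxy struct {
	audit AuditFunc
}

// NewProxy fixes the audit callback at construction time.
func NewProxy(audit AuditFunc) *Proxy {
	if audit == nil {
		audit = func(string, string) {} // no-op so callers need not nil-check
	}
	return &Proxy{audit: audit}
}

// List stands in for an agent operation that emits an audit event.
func (p *Proxy) List(host string) {
	p.audit("list", host)
}

func main() {
	p := NewProxy(func(action, host string) {
		fmt.Println("audit:", action, host)
	})
	p.List("github.com")

	NewProxy(nil).List("github.com") // no-op callback, no panic
}
```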

3. Missing Error Check (Low Priority)

Location: internal/run/manager.go:477

_, tcpPort, _ := parseHostPort(tcpAddr)

The error from parseHostPort is silently ignored. If parsing fails, the proxy may not work correctly.

Recommendation: Check the error and handle appropriately:

_, tcpPort, err := parseHostPort(tcpAddr)
if err != nil {
    // cleanup and return error
}
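A runnable version of that check, using only the standard library, could look like this. `parseHostPort` is the project's own helper and is not shown in this PR view, so the sketch substitutes `net.SplitHostPort` and `strconv.Atoi`; `portFromAddr` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// portFromAddr extracts the numeric port from a "host:port" address and
// reports every failure instead of discarding it.
func portFromAddr(addr string) (int, error) {
	_, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		return 0, fmt.Errorf("parsing TCP address %q: %w", addr, err)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return 0, fmt.Errorf("parsing port %q: %w", portStr, err)
	}
	if port < 1 || port > 65535 {
		return 0, fmt.Errorf("port %d out of range", port)
	}
	return port, nil
}

func main() {
	for _, addr := range []string{"127.0.0.1:4567", "127.0.0.1", "127.0.0.1:ssh"} {
		port, err := portFromAddr(addr)
		fmt.Println(addr, port, err)
	}
}
```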

4. Test Permissions Check is Weak (Low Priority)

Location: internal/sshagent/server_test.go:76-81

if mode&0077 != 0 {
    t.Logf("Socket permissions: %o (note: some systems allow different permissions)", mode)
}

This test logs a message but doesn't fail. The comment in server.go:88 states permissions should be 0666, but the test expects 0600 or similar. There's a mismatch between intention and validation.

Recommendation: Either:

  • Make the test fail if permissions are incorrect
  • Or update the test to verify the actual requirement (0666)
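A sketch of the second option, assuming 0666 really is the intended mode: compare only the permission bits and return an error the test can fail on. `assertPerm` is a hypothetical helper, not something in server_test.go.

```go
package main

import (
	"fmt"
	"io/fs"
)

// assertPerm returns an error unless the permission bits of mode equal want.
// Type bits such as fs.ModeSocket are ignored via Perm().
func assertPerm(mode, want fs.FileMode) error {
	if got := mode.Perm(); got != want {
		return fmt.Errorf("socket permissions = %o, want %o", uint32(got), uint32(want))
	}
	return nil
}

func main() {
	// In a real test, mode would come from os.Stat(socketPath).Mode().
	fmt.Println(assertPerm(fs.ModeSocket|0o666, 0o666))
	fmt.Println(assertPerm(fs.ModeSocket|0o600, 0o666))
}
```

Inside the test this would be used as `if err := assertPerm(info.Mode(), 0o666); err != nil { t.Fatal(err) }`, which turns the current log line into a real assertion.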

5. Incomplete Host Parsing Error Handling (Low Priority)

Location: cmd/moat/cli/grant_ssh.go:157-171

The fingerprintFromAuthorizedKey function returns an empty string on error but doesn't indicate what went wrong (bad format, bad base64, etc.). Consider returning an error for better diagnostics.
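One possible shape for an error-returning version is sketched below using only the standard library. It assumes a plain `<type> <base64> [comment]` line with no leading options field, and it is not the function from grant_ssh.go. The fingerprint format (SHA-256 of the key blob, unpadded base64, `SHA256:` prefix) matches what OpenSSH prints.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// fingerprintFromAuthorizedKey computes the OpenSSH-style SHA256 fingerprint
// of one public key line and says what went wrong when it cannot.
func fingerprintFromAuthorizedKey(line string) (string, error) {
	parts := strings.Fields(line)
	if len(parts) < 2 {
		return "", fmt.Errorf("malformed key line: want \"<type> <base64> [comment]\", got %d field(s)", len(parts))
	}
	blob, err := base64.StdEncoding.DecodeString(parts[1])
	if err != nil {
		return "", fmt.Errorf("decoding key blob: %w", err)
	}
	sum := sha256.Sum256(blob)
	return "SHA256:" + base64.RawStdEncoding.EncodeToString(sum[:]), nil
}

func main() {
	// "AAAA" is valid base64 but not a real key; it only exercises the paths.
	for _, line := range []string{"ssh-ed25519 AAAA demo", "ssh-ed25519", "ssh-ed25519 !!!"} {
		fp, err := fingerprintFromAuthorizedKey(line)
		fmt.Printf("%q -> %q, err=%v\n", line, fp, err)
	}
}
```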

6. Socket Directory Cleanup (Low Priority)

Location: internal/run/manager.go:494-502

The socket directory is created but only cleaned up on immediate error. If the run fails later, the directory may be left behind. Consider cleanup in Destroy().


📝 Minor Suggestions

Code Style

  1. Consistent error wrapping: Most errors use fmt.Errorf with %w, but a few use %v. Stick with %w for better error chains.

  2. Magic numbers: Consider constants for:

    • Socket permissions (0666 in server.go:88)
    • TCP bind address ("127.0.0.1:0" in manager.go:462)

Testing

  1. E2E test coverage: The E2E tests verify basic setup but don't test actual SSH operations (e.g., git clone). Consider adding a test that performs a real SSH operation if feasible.

  2. Mock consistency: The mock agent in tests is defined inline. Consider moving it to a test helper file for reuse across test files.

Documentation

  1. Security model: The proxy security model differs between runtimes (localhost vs 0.0.0.0). This is documented in CLAUDE.md but might be worth a comment in the code near the binding logic (manager.go:287-297).

  2. TCP bridge mode: The socat bridge setup for Docker on macOS is clever but could use more explanation in the code comments about why it's needed (VM boundary issue).


🎯 Performance Considerations

  1. No obvious performance issues: The proxy uses efficient Go SSH libraries
  2. Concurrency: Proper use of RWMutex allows concurrent reads
  3. Memory: No unbounded allocations or leaks detected

🔒 Security Assessment

✅ Security Done Right

  • ✅ Private keys never exposed to containers
  • ✅ Host-based filtering prevents key misuse
  • ✅ Audit logging for compliance/forensics
  • ✅ Proper socket permissions and isolation
  • ✅ No credential storage in containers

⚠️ Security Considerations

  1. Unix socket permissions (0666): While documented as safe due to directory isolation, this is broader than typical SSH agent permissions (0600). Consider if there's any way a malicious process could access the socket directory.

  2. Single-host fallback: The fallback in proxy.go:131-134 is reasonable but could be exploited if a user accidentally grants the same key to multiple hosts. Document this behavior clearly for users.

  3. TCP mode on macOS: TCP binding to 127.0.0.1 is secure, but ensure Docker Desktop's networking doesn't expose this to other containers or the network.


📊 Test Coverage Assessment

Strong Coverage

  • ✅ Proxy filtering logic
  • ✅ Host-based authorization
  • ✅ Upstream error propagation
  • ✅ Server lifecycle
  • ✅ Socket cleanup

Could Be Improved

  • ⚠️ Integration between proxy and actual SSH operations
  • ⚠️ TCP mode end-to-end testing
  • ⚠️ socat bridge setup verification
  • ⚠️ Concurrent access patterns

📚 Conventional Commits

The commit message format is good, but note per CLAUDE.md:

Do not include Co-Authored-By lines for Claude in commit messages

The current commit includes Claude attribution, which should be removed per project conventions.


✨ Overall Assessment

Rating: Excellent (9/10)

This is production-quality code with thoughtful design, good test coverage, and strong security considerations. The issues identified are mostly minor and don't block merging. The architecture is extensible and well-integrated with the existing moat codebase.

Recommended Action

Approve with minor fixes

Address the resource leak (issue #1) and error handling (issue #3) before merging. Other issues can be handled in follow-up PRs if desired.

Great work! 🎉

claude (bot) commented Jan 19, 2026

Code Review: SSH Agent Proxy Implementation

I've reviewed this PR and found it to be a well-architected implementation of SSH agent forwarding with host-based filtering. Here's my detailed feedback:

✅ Strengths

Security Model

  • Strong security design: Private keys never enter containers, only signing requests are forwarded
  • Host-based filtering correctly limits key visibility per grant
  • Audit logging captures all SSH operations (list, sign_allowed, sign_denied) for tamper-proof tracking
  • Proper use of cryptographic fingerprints (SHA256) for key identification
  • Fallback logic for single-host keys is sensible (proxy.go:131-134)

Code Quality

  • Clean separation of concerns: protocol parsing, proxy filtering, server lifecycle
  • Proper use of concurrency primitives: atomic.Value for currentHost, sync.RWMutex for allowedKeys
  • Comprehensive test coverage with unit tests for filtering, signing, and protocol handling
  • Good error messages with actionable guidance (grant_ssh.go:53-56, 71-74, 105-106)
  • Proper cleanup in error paths and graceful shutdown

Architecture

  • Runtime-aware TCP vs Unix socket handling for Docker cross-platform support
  • Integration with existing audit and credential infrastructure
  • Deterministic image tagging includes SSH dependencies (builder.go:37-39)

🔍 Issues & Suggestions

1. Race Condition in Host Tracking (Medium Priority)

The currentHost approach assumes SSH connections are sequential per container. If multiple SSH operations occur concurrently, SetCurrentHost could race:

// Thread 1: SetCurrentHost("github.com")
// Thread 2: SetCurrentHost("gitlab.com") 
// Thread 1: Sign() - might use wrong host!

Concern: The PR description mentions host tracking via an SSH wrapper, but I don't see SetCurrentHost being called anywhere in the codebase. How is the current host being communicated to the proxy?

Recommendation:

  • If relying on an external wrapper calling SetCurrentHost: Add documentation explaining this requirement and the threading assumption
  • Consider per-connection context instead of shared state, or use connection-scoped proxies
  • Add integration tests validating concurrent SSH operations to different hosts

2. Missing Host Tracking Implementation (High Priority)

# This search should show how SetCurrentHost is called:
$ grep -r "SetCurrentHost" internal/
# Only found in tests and the proxy itself

The planning docs mention an ssh-wrapper.sh, but it's not in the changeset. Without this:

  • Sign requests will only work for single-host keys (fallback on line 131)
  • Multi-host keys will fail with "cannot determine target" error

Questions:

  • Is the wrapper script implementation pending?
  • How does the container communicate the target host to the proxy?
  • Should there be environment-based host tracking?

3. Socket Permissions Security (Low Priority)

server.go:88 sets socket permissions to 0666 (world-readable/writable):

if err := os.Chmod(s.socketPath, 0666); err != nil {

Analysis: The comment correctly notes security is maintained by:

  1. Per-run socket directories (~/.moat/sockets/<run-id>/)
  2. Proxy-enforced host filtering
  3. Directory only mounted to specific container

Suggestion: Consider adding a reference to the proxy security model doc or expanding the comment to note this is required for cross-UID access.

4. TCP Server Binding (Low - Informational)

For Docker mode, the TCP server binds to 127.0.0.1:0 (manager.go:462). The comment in the PR description mentions this is for macOS/Windows Docker-in-VM scenarios.

Question: On Linux, does Docker Desktop also require TCP mode, or does it use Unix sockets? Might be worth documenting the runtime detection logic.

5. Error Handling in socat Bridge (Low Priority)

The moat-init.sh script starts socat in the background but doesn't validate it stays running:

socat UNIX-LISTEN:/run/moat/ssh/agent.sock,fork TCP:"$MOAT_SSH_TCP_ADDR" &

If socat fails after the health check loop, SSH operations will fail silently. Consider:

  • Trapping signals and checking socat exit status
  • More robust health checking beyond socket existence

6. Fingerprint Parsing (Low Priority)

grant_ssh.go:166 decodes base64 from authorized_keys format:

keyBlob, err := base64.StdEncoding.DecodeString(parts[1])

Key blobs may turn up with or without base64 padding (StdEncoding vs RawStdEncoding). While OpenSSH's ssh-keygen emits standard padded encoding, consider using ssh.ParseAuthorizedKey from golang.org/x/crypto/ssh for more robust parsing.

7. Audit Data Consistency (Low Priority)

The SSHData struct in audit/entry.go:58-64 uses omitempty tags. For tamper-proof logs, consider always including fields (even if empty) to ensure consistent hash computation across all operations.
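The effect of `omitempty` on the serialized form can be seen in a small sketch. The field names here are assumptions, since the actual SSHData fields are not shown in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withOmit drops Host from the JSON when it is empty.
type withOmit struct {
	Action string `json:"action"`
	Host   string `json:"host,omitempty"`
}

// withoutOmit always emits Host, even when it is empty.
type withoutOmit struct {
	Action string `json:"action"`
	Host   string `json:"host"`
}

// encode marshals v and panics on failure, which cannot happen for the
// plain structs used here.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(withOmit{Action: "list"}))    // {"action":"list"}
	fmt.Println(encode(withoutOmit{Action: "list"})) // {"action":"list","host":""}
}
```

Both encodings are deterministic, so whether this matters depends on how the audit store computes and verifies its hashes; always emitting every field simply keeps the record shape uniform across operations.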

📝 Documentation

Strengths:

  • README examples are clear and actionable
  • Error messages follow project guidelines (tell users exactly what to set and how)
  • Inline comments explain security trade-offs

Suggestions:

  • Document the host tracking mechanism once implemented
  • Add architecture diagram showing proxy flow (user agent → filtering proxy → container)
  • Consider adding troubleshooting section for common issues (agent not running, key not added, socat failures)

🧪 Testing

Coverage: Good unit test coverage for core logic (filtering, signing, protocols)

Missing:

  • Integration tests for TCP vs Unix socket modes
  • Concurrent sign request testing
  • Error path testing (upstream agent failures, network issues)
  • E2E test with actual git operations

Performance Considerations

  • Proxy adds minimal latency (one extra hop for List/Sign operations)
  • Unix socket mode should have near-zero overhead
  • TCP + socat mode adds two extra hops but is still acceptable for SSH operations

Final Verdict

Overall Assessment: This is a solid implementation with good security fundamentals. The architecture is sound and integrates well with moat's existing systems.

Blocking Issues: None

Recommended Next Steps:

  1. Clarify/implement the host tracking mechanism
  2. Add integration tests for concurrent scenarios
  3. Consider the suggestions above for production hardening

The PR already has owner approval, so these are suggestions for follow-up improvements rather than blocking concerns.


Code Quality: ⭐⭐⭐⭐ (4/5)
Security: ⭐⭐⭐⭐ (4/5)
Test Coverage: ⭐⭐⭐⭐ (4/5)
Documentation: ⭐⭐⭐⭐ (4/5)

Adds secure SSH agent forwarding to containers with per-host key access
control. Users grant access via `moat grant ssh --host github.com` which
maps a specific SSH key to a host. At runtime, the proxy only allows
signing requests for authorized host/key combinations.

Key features:
- Host-based key filtering (keys only work for granted hosts)
- Automatic container setup when SSH grants are present
- TCP bridge for Docker on macOS (Unix sockets via socat in container)
- Direct Unix socket mounting for Linux/Apple containers
- Audit logging for all SSH agent operations

Usage:
  moat grant ssh --host github.com    # Grant access (one-time)
  moat run ./agent                    # SSH just works if grants present

The agent.yaml only needs:
  grants:
    - ssh:github.com

claude (bot) commented Jan 19, 2026

PR Review: SSH Agent Proxy with Host-Based Filtering

Overview

This is a well-architected feature that adds SSH agent forwarding with security-conscious host filtering. The implementation follows moat's security model and integrates cleanly with existing credential and audit systems.

Strengths

Architecture & Design

  • Clean abstraction layers: The sshagent package properly separates concerns
  • Security-first approach: Host-based filtering prevents unauthorized SSH access
  • Runtime-aware: Handles Docker/Apple containers correctly with Unix sockets vs TCP
  • Audit integration: SSH operations flow into tamper-proof audit store (manager.go:840-847)

Code Quality

  • Good error messages: Follows CLAUDE.md guidelines with actionable messages
  • Comprehensive testing: Unit tests cover filtering logic, edge cases, E2E scenarios
  • Type safety: Strong interfaces make code testable and maintainable

Issues & Concerns

1. Socket Permissions Too Permissive (Security)

Location: internal/sshagent/server.go:88

Setting socket permissions to 0666 (world read/write) is overly permissive. While comments explain security relies on directory isolation, this violates defense-in-depth. Any process discovering the socket path can connect.

Recommendation: Use 0600 or 0660 instead. Investigate ACLs or group permissions for cross-user access.

2. Race Condition in Host Tracking (Bug)

Location: internal/sshagent/proxy.go:103

currentHost is set by external wrapper but validated during Sign(). No guarantee host is set before signing. Could cause incorrect validation, wrong audit logs, or unexpected failures.

Recommendation: Pass host as parameter to Sign() instead of shared state, or document that proxy is not safe for concurrent use.

3. TCP Server Binding Unclear

Location: internal/run/manager.go:462

TCP server binds to 127.0.0.1, but comments suggest Apple containers need 0.0.0.0 (like HTTP proxy). Configuration seems inconsistent.

Recommendation: Document how Docker containers access TCP server. Consider token auth like HTTP proxy.

4. Missing Input Validation (Security)

Location: cmd/moat/cli/grant_ssh.go:44

No validation that host is valid hostname. Could accept empty strings, special characters, IPs.

Recommendation: Add hostname validation.
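A minimal sketch of such a check, following the usual hostname label rules (1 to 63 letters, digits, or hyphens per label, no leading or trailing hyphen, 253 characters overall). `validateHost` is a hypothetical helper. Note that dotted IPv4 literals pass these rules while IPv6 literals do not; whether IP addresses should be accepted is a policy decision left open here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateHost rejects empty, overlong, or malformed hostnames.
func validateHost(host string) error {
	if host == "" {
		return errors.New("host must not be empty")
	}
	if len(host) > 253 {
		return fmt.Errorf("host is %d characters long, maximum is 253", len(host))
	}
	for _, label := range strings.Split(host, ".") {
		if err := validateLabel(label); err != nil {
			return fmt.Errorf("invalid host %q: %w", host, err)
		}
	}
	return nil
}

// validateLabel checks one dot-separated component of a hostname.
func validateLabel(label string) error {
	if len(label) == 0 || len(label) > 63 {
		return fmt.Errorf("label %q must be 1 to 63 characters", label)
	}
	if label[0] == '-' || label[len(label)-1] == '-' {
		return fmt.Errorf("label %q must not start or end with a hyphen", label)
	}
	for _, r := range label {
		ok := r == '-' || (r >= '0' && r <= '9') || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
		if !ok {
			return fmt.Errorf("label %q contains invalid character %q", label, r)
		}
	}
	return nil
}

func main() {
	for _, h := range []string{"github.com", "", "git hub.com", "-bad.com", "a..b"} {
		fmt.Printf("%q: %v\n", h, validateHost(h))
	}
}
```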

5. Incomplete Error Handling (Bug)

Location: internal/sshagent/server.go:119-124

Accept errors silently ignored and retried indefinitely. File descriptor exhaustion won't be surfaced.

Recommendation: Log errors or implement backoff.
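An accept loop with logging and capped exponential backoff could look roughly like this. It is a sketch, not the server.go loop: `acceptLoop`, `nextDelay`, and the delay constants are assumptions, loosely modeled on how net/http backs off on accept errors.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net"
	"time"
)

const (
	minDelay = 5 * time.Millisecond
	maxDelay = time.Second
)

// nextDelay doubles the previous delay, starting at minDelay and capping
// at maxDelay.
func nextDelay(prev time.Duration) time.Duration {
	if prev == 0 {
		return minDelay
	}
	if next := prev * 2; next < maxDelay {
		return next
	}
	return maxDelay
}

// acceptLoop serves connections until the listener is closed. Accept errors
// are logged and retried with backoff instead of being silently ignored.
func acceptLoop(ln net.Listener, handle func(net.Conn)) error {
	var delay time.Duration
	for {
		conn, err := ln.Accept()
		if err != nil {
			if errors.Is(err, net.ErrClosed) {
				return nil // listener closed: normal shutdown
			}
			delay = nextDelay(delay)
			log.Printf("accept error: %v; retrying in %v", err, delay)
			time.Sleep(delay)
			continue
		}
		delay = 0 // a successful accept resets the backoff
		go handle(conn)
	}
}

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	done := make(chan error, 1)
	go func() {
		done <- acceptLoop(ln, func(c net.Conn) { c.Close() })
	}()
	ln.Close()
	fmt.Println("loop exited:", <-done)
}
```

With this shape, a condition such as file descriptor exhaustion shows up in the logs and the loop stops spinning at full speed while it persists.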

6. Socat Bridge Lacks Health Check (Reliability)

Location: internal/deps/scripts/moat-init.sh:14

Socat started in background with no health check or retry logic. Could fail silently.

Recommendation: Add socket validation after creation loop.

7. Test Assertion Missing

Location: internal/sshagent/server_test.go:78-80

Test logs but doesn't assert. Name implies validation but doesn't actually test.

Recommendation: Make proper assertion, rename as informational, or remove.

Security Assessment

Strong Controls

  • Private keys never enter containers
  • Host-based key filtering at proxy layer
  • All operations audited to tamper-proof store
  • Grants stored encrypted

Areas for Improvement

  1. Socket permissions (issue 1)
  2. Race conditions in host tracking (issue 2)
  3. TCP binding model unclear (issue 3)
  4. Input validation gaps (issue 4)

Test Coverage

Excellent overall:

  • Unit tests for proxy filtering logic
  • Server lifecycle tests
  • E2E grant flow tests
  • Edge cases covered

Missing:

  • TCP server mode
  • Socat bridge functionality
  • Concurrent sign operations
  • Malformed public key handling

Recommendations

Must Fix Before Merge

  1. Socket permissions (security concern)
  2. Race condition in host tracking (potential bug)
  3. TCP binding clarification (may not work as intended)

Should Fix

  1. Input validation
  2. Error handling in server
  3. Socat health check

Nice to Have

  1. Test assertion fix
  2. Concurrent usage tests
  3. Document TCP mode security model

Summary

This is high-quality work with sound architecture and good security practices. However, socket permissions and race conditions should be addressed before merging. The implementation shows strong understanding of moat's architecture and follows project conventions well.

Recommendation: Request changes for must-fix issues, then approve.

@dpup merged commit 762994b into main on Jan 20, 2026
4 checks passed
@dpup deleted the ssh branch on January 20, 2026 02:58