fix(proxy): rewrite proxy host for custom network gateways #217

Merged
dpup merged 4 commits into main from fix/custom-network-proxy-gateway
Mar 6, 2026

Collaborator

@dpup dpup commented Mar 6, 2026

Summary

  • Containers on custom Apple/Docker networks (for services or BuildKit) couldn't reach the credential proxy because env vars embedded the default network gateway IP, unreachable from a different subnet
  • Add NetworkGateway() to NetworkManager interface — inspects a network's actual IPv4 gateway at runtime
  • After custom network creation, compare gateways and rewrite all proxy env vars (HTTP_PROXY, HTTPS_PROXY, NO_PROXY, ANTHROPIC_BASE_URL, MOAT_SSH_TCP_ADDR, etc.) to use the correct gateway
  • Extract inspectAppleNetworkGateway() shared helper to deduplicate Apple gateway inspection between probeDefaultGateway and NetworkGateway
  • Constrain replaceHostInEnv to only replace in env var values (after first =), not keys
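
For readers without the diff open, here is a minimal sketch of a values-only rewrite. It is reconstructed from the two replaceHostInEnv lines quoted in the reviews below and is not the merged code. Passing through entries without "=" unchanged is an assumption, and the sample proxy URL reuses the form shown in the first review.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceHostInEnv returns a copy of env ("KEY=value" entries) with oldHost
// replaced by newHost in the value part only; keys are never touched.
func replaceHostInEnv(env []string, oldHost, newHost string) []string {
	result := make([]string, len(env))
	for i, e := range env {
		idx := strings.Index(e, "=")
		if idx < 0 {
			// Assumption: an entry without "=" is passed through unchanged.
			result[i] = e
			continue
		}
		result[i] = e[:idx+1] + strings.ReplaceAll(e[idx+1:], oldHost, newHost)
	}
	return result
}

func main() {
	env := []string{
		"HTTP_PROXY=http://moat:token@192.168.64.1:19080",
		"NO_PROXY=192.168.64.1,localhost",
	}
	for _, e := range replaceHostInEnv(env, "192.168.64.1", "192.168.73.1") {
		fmt.Println(e)
	}
}
```

Splitting at the first "=" is what keeps a key that happens to contain the old host from being rewritten.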

Test plan

  • Unit tests for replaceHostInEnv: normal rewrite, no-change, empty input, key-not-replaced
  • make lint passes (0 issues)
  • make test-unit passes (Docker-specific tests excluded — no daemon on dev machine)
  • Manual verification: moat run with service dependencies on Apple containers — debug logs confirm gateway rewrite (e.g. 192.168.64.1 → 192.168.73.1), network traces show 16 requests completing with credential injection

🤖 Generated with Claude Code

Containers on custom Apple/Docker networks (created for services or
BuildKit) couldn't reach the credential proxy because proxy URLs used
the default network gateway IP, which is unreachable from a different
subnet.

Add NetworkGateway() to the NetworkManager interface so the run manager
can inspect the custom network's actual gateway after creation. When it
differs from the default, rewrite all proxy-related env vars to use the
correct gateway.

Also extract inspectAppleNetworkGateway() shared helper to deduplicate
the gateway inspection logic between probeDefaultGateway and
NetworkGateway, and constrain replaceHostInEnv to only replace in env
var values (not keys).

claude Bot commented Mar 6, 2026

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 4m 32s


Code Review

  • Read changed files
  • Analyze for bugs, edge cases, races, security issues
  • Post review feedback

The fix is well-targeted and the implementation is clean. The replaceHostInEnv approach is correct, the ordering (rewrite happens after all proxyEnv construction including providerEnv, SSH, and service env vars) is right, and firewall setup correctly uses r.ProxyHost which gets updated to the new gateway. Test coverage for replaceHostInEnv is thorough.

Two issues worth addressing:


1. Docker NetworkGateway may return an IPv6 gateway (internal/container/docker.go)

for _, cfg := range inspect.IPAM.Config {
    if cfg.Gateway != "" {
        return cfg.Gateway  // could be an IPv6 address
    }
}

On dual-stack Docker networks, the first IPAM config might be IPv6. Returning 2001:db8::1 would produce malformed proxy URLs like http://moat:token@2001:db8::1:19080 (bare IPv6 needs brackets). Moat-created networks don't currently configure IPv6 explicitly, so this is low-risk in practice — but fragile if that ever changes or if a user's Docker daemon has IPv6 defaults enabled. A simple guard would harden this:

for _, cfg := range inspect.IPAM.Config {
    if cfg.Gateway != "" {
        if ip := net.ParseIP(cfg.Gateway); ip != nil && ip.To4() != nil {
            return cfg.Gateway
        }
    }
}


2. Stale doc comment in runtime.go (internal/container/runtime.go)

// NetworkManager returns the network manager if supported, nil otherwise.
// Docker provides this, Apple containers return nil.
NetworkManager() NetworkManager

Apple containers now expose a real NetworkManager (the appleNetworkManager that implements NetworkGateway), so the "Apple containers return nil" claim is wrong. This PR added NetworkGateway() to the interface but didn't update the comment.


Minor observation — Docker + BuildKit path

The rewrite also fires for Docker BuildKit runs (where hostAddr = "host.docker.internal" on macOS). host.docker.internal resolves correctly from any Docker bridge network already, so the rewrite to the bridge gateway IP is unnecessary. It works if the proxy listens on 0.0.0.0 (which it does), but it does change behavior for an existing working path. Not a bug, just worth noting.


Overall: the core fix is correct and the Apple container scenario is well handled. The IPv6 guard in the Docker implementation is the main hardening I'd recommend.

Filter Docker NetworkGateway to IPv4-only gateways (skip IPv6 from
dual-stack networks that would produce malformed proxy URLs), and
update stale doc comment that claimed Apple containers return nil
for NetworkManager.

claude Bot commented Mar 6, 2026

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 4m 2s


Code Review

  • Gather context
  • Read changed files
  • Analyze for bugs, edge cases, races, security issues
  • Post review feedback

The fix addresses the real problem correctly. The core logic — create network, inspect its gateway, rewrite proxy env vars — is sound. replaceHostInEnv is correctly scoped to values only (not keys), and the rewrite happens after all proxy env vars are constructed (including SSH addr and service env vars) but before container creation. The buildkitEnv appended after the rewrite only contains BUILDKIT_HOST=tcp://buildkit:1234 (a hostname, not an IP), so ordering there is fine.

Two issues worth noting:


1. Rewrite fires unnecessarily on Docker+macOS (internal/run/manager.go)

On macOS with Docker, hostAddr = "host.docker.internal". When a custom network is created (for services or BuildKit), NetworkGateway returns the bridge gateway IP (e.g., 172.20.0.1). Since "172.20.0.1" != "host.docker.internal", the guard passes and the rewrite fires — replacing host.docker.internal with the bridge gateway IP in HTTP_PROXY, NO_PROXY, etc.

This works because the proxy binds to 0.0.0.0 (confirmed in cmd/moat/cli/daemon.go:120), but it changes behavior for an already-working code path. host.docker.internal resolves correctly from any Docker bridge network already.

A simple guard limits the rewrite to IP-based host addresses (the only case where the gateway can differ across subnets):

if networkID != "" && hostAddr != "" && net.ParseIP(hostAddr) != nil {
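
The full condition can be sketched as a standalone predicate. The function name shouldRewriteProxyHost and the network ID "moat-net" are hypothetical. The gateway comparison reflects the "compare gateways" step from the PR summary, and the empty-gateway check is an added assumption.

```go
package main

import (
	"fmt"
	"net"
)

// shouldRewriteProxyHost reports whether proxy env vars should be rewritten
// from hostAddr to gateway. (Hypothetical helper illustrating the guard.)
func shouldRewriteProxyHost(networkID, hostAddr, gateway string) bool {
	if networkID == "" || hostAddr == "" || gateway == "" {
		return false // no custom network, or nothing to compare
	}
	if net.ParseIP(hostAddr) == nil {
		return false // hostname such as host.docker.internal: leave as-is
	}
	return gateway != hostAddr
}

func main() {
	// Docker on macOS: hostname-based host, no rewrite.
	fmt.Println(shouldRewriteProxyHost("moat-net", "host.docker.internal", "172.20.0.1")) // false
	// Apple containers: IP-based host on a different subnet, rewrite.
	fmt.Println(shouldRewriteProxyHost("moat-net", "192.168.64.1", "192.168.73.1")) // true
}
```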


2. Stale doc comment on ServiceManager (internal/container/runtime.go:93)

// ServiceManager returns the service manager if supported, nil otherwise.
// Docker provides this, Apple containers return nil.
ServiceManager() ServiceManager

Apple containers do provide a ServiceManager: appleServiceManager is set in NewAppleRuntime. This comment was already wrong before this PR, but since the PR updated the adjacent NetworkManager comment (line 81), it's worth fixing this one too.


Minor observation: plain string replacement

replaceHostInEnv uses strings.ReplaceAll, which is a substring match. If hostAddr were a prefix of another IP in an env value (e.g., "192.168.64.1" matching inside "192.168.64.10"), it would produce a corrupted value. In practice this can't happen because IPs in URLs are bounded by :, @, and ,, but it's worth being aware of. The test suite covers the key cases well.


Overall: the fix is correct for its primary target (Apple containers on custom networks), the refactoring is clean, and tests are thorough. Issue #1 is a real behavioral change for Docker+macOS that could be guarded cheaply; issue #2 is a pre-existing stale comment worth cleaning up since the PR is already touching that area.

Only rewrite proxy env vars when hostAddr is an IP address (Apple
containers). Docker on macOS uses host.docker.internal which already
resolves correctly from any bridge network.

Also fix stale ServiceManager doc comment (Apple provides it too).

claude Bot commented Mar 6, 2026

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 2m 43s


Code Review

  • Gather context
  • Read changed files
  • Analyze for bugs, edge cases, races, security issues
  • Post review feedback

The fix is correct and well-structured. The previous review feedback was addressed: Docker NetworkGateway has the IPv4 guard, the net.ParseIP guard prevents the rewrite for hostname-based hosts like host.docker.internal, and the runtime.go comments are accurate. The refactoring of inspectAppleNetworkGateway is clean.

Two issues worth addressing:


1. Rewrite scope is too broad: user env vars and secrets are subject to replacement (internal/run/manager.go)

replaceHostInEnv operates on all of proxyEnv, which by the time of the rewrite includes:

  • User-defined env vars from opts.Config.Env (line ~989)
  • CLI-level env overrides from opts.Env (line ~1021)
  • Resolved secrets (line ~1007)
  • Service env vars like DATABASE_URL (line ~1985)

If any of these values contain an IP address that has the gateway as a substring prefix (e.g. MY_DB_HOST=192.168.64.10 when the default gateway is 192.168.64.1 and the custom network's gateway is 192.168.72.1), the replacement silently corrupts the value to 192.168.72.10. The substitution has no IP-boundary awareness; it is a raw strings.ReplaceAll.

The proxy-specific vars that actually need rewriting are a small, known set: HTTP_PROXY, HTTPS_PROXY, http_proxy, https_proxy, NO_PROXY, ANTHROPIC_BASE_URL, MOAT_SSH_TCP_ADDR, and (if applicable) AGENTOPS_CREDENTIAL_URL. Keeping those separate from user/service env vars — or rewriting only the known proxy-prefix vars — would eliminate both the scope and the substring-match risk in one shot.
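
One way to scope the rewrite to the known set might look like the sketch below. The names proxyKeys and rewriteProxyVars are hypothetical, and the key list is copied from the paragraph above, so it may be incomplete (the PR summary ends its list with "etc.").

```go
package main

import (
	"fmt"
	"strings"
)

// proxyKeys is the allowlist of vars whose values may embed the gateway.
// Assumption: taken from the review text; the real set may be larger.
var proxyKeys = map[string]bool{
	"HTTP_PROXY": true, "HTTPS_PROXY": true,
	"http_proxy": true, "https_proxy": true,
	"NO_PROXY": true, "ANTHROPIC_BASE_URL": true, "MOAT_SSH_TCP_ADDR": true,
}

// rewriteProxyVars replaces oldHost with newHost only in allowlisted vars;
// user, secret, and service vars are copied through untouched.
func rewriteProxyVars(env []string, oldHost, newHost string) []string {
	out := make([]string, len(env))
	for i, e := range env {
		key, val, ok := strings.Cut(e, "=")
		if !ok || !proxyKeys[key] {
			out[i] = e
			continue
		}
		out[i] = key + "=" + strings.ReplaceAll(val, oldHost, newHost)
	}
	return out
}

func main() {
	env := []string{
		"HTTP_PROXY=http://moat:token@192.168.64.1:19080",
		"MY_DB_HOST=192.168.64.10", // user var: must survive unchanged
	}
	for _, e := range rewriteProxyVars(env, "192.168.64.1", "192.168.72.1") {
		fmt.Println(e)
	}
}
```

The allowlist removes the user-var corruption path but still uses a substring match inside the proxy vars themselves.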


2. Substring replacement can corrupt IPs that share a prefix with the gateway (internal/run/manager.go:2247)

result[i] = e[:idx+1] + strings.ReplaceAll(e[idx+1:], oldHost, newHost)

"192.168.64.1" is a substring of "192.168.64.10". If an env value contains 192.168.64.10 (a plausible service or user-defined IP), the replacement produces 192.168.72.10 — silently wrong. In practice this requires the gateway IP to be a decimal prefix of another IP in the same octet, which is common (.1 is always a prefix of .10, .100, .11, etc).

For proxy URLs the :port suffix provides a natural boundary, but NO_PROXY is comma-delimited and any manually composed connection string in a user env var has no such guarantee.

A simple fix is to replace only exact IP occurrences by anchoring the substitution (e.g. using a regexp like \b<ip>\b, or — since these are always bounded by URL delimiters — replacing "@"+oldHost+":" and "="+oldHost+"," forms explicitly within the known proxy var set from issue #1 above).
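
As an alternative to a regexp, a boundary-aware replacement can be sketched with a manual scan. The name replaceExactHost is hypothetical, and treating only ASCII digits as "part of a longer address" is a simplifying assumption that covers the .1 versus .10 case discussed here.

```go
package main

import (
	"fmt"
	"strings"
)

// isDigit reports whether c is an ASCII digit.
func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// replaceExactHost replaces occurrences of oldHost in s with newHost, but
// only where oldHost is not directly preceded or followed by a digit, so
// "192.168.64.1" does not match inside "192.168.64.10".
func replaceExactHost(s, oldHost, newHost string) string {
	if oldHost == "" {
		return s
	}
	var b strings.Builder
	pos := 0
	for {
		rel := strings.Index(s[pos:], oldHost)
		if rel < 0 {
			b.WriteString(s[pos:])
			return b.String()
		}
		start := pos + rel
		end := start + len(oldHost)
		leftOK := start == 0 || !isDigit(s[start-1])
		rightOK := end == len(s) || !isDigit(s[end])
		b.WriteString(s[pos:start])
		if leftOK && rightOK {
			b.WriteString(newHost)
		} else {
			b.WriteString(oldHost) // part of a longer number: keep as-is
		}
		pos = end
	}
}

func main() {
	const oldGW, newGW = "192.168.64.1", "192.168.73.1"
	fmt.Println(replaceExactHost("http://moat:token@192.168.64.1:19080", oldGW, newGW))
	fmt.Println(replaceExactHost("192.168.64.1,192.168.64.10,localhost", oldGW, newGW))
}
```

The second call rewrites the first NO_PROXY-style entry and leaves 192.168.64.10 alone.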


Minor: replaceHostInEnv returns []string{} for nil input, not nil (internal/run/manager.go:2244)

result := make([]string, len(env)) // len(nil) == 0 → returns []string{}, not nil

The TestReplaceHostInEnv_Empty test passes nil and only asserts that the result is empty, so it passes, but the return value differs from the input (nil in, []string{} out). Callers that rely on a nil check would break. Not a current problem since proxyEnv is always non-nil at the call site, but worth noting.
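
A short illustration of the difference, with one possible nil-preserving shape. The helper copyEnv is hypothetical and only stands in for the allocation step.

```go
package main

import "fmt"

// copyEnv returns a copy of env, preserving nil-ness of the input.
// (Hypothetical helper; it stands in for the allocation in replaceHostInEnv.)
func copyEnv(env []string) []string {
	if env == nil {
		return nil
	}
	out := make([]string, len(env))
	copy(out, env)
	return out
}

func main() {
	var in []string // nil slice

	viaMake := make([]string, len(in))
	fmt.Println(viaMake == nil, len(viaMake)) // false 0

	fmt.Println(copyEnv(in) == nil) // true
}
```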


Overall: the fix is correct for its primary target. The address-rewrite approach works for Apple container custom networks. Issues #1 and #2 are the same underlying problem from two angles — the rewrite is broader and less precise than it needs to be. Fixing one (scoping to proxy vars only) would largely eliminate the risk from the other.

The test hardcoded `docker ps` to verify service container cleanup,
which fails on Apple containers where Docker isn't running. Extract
a serviceContainerExists helper that tries Docker first, then falls
back to the Apple container CLI.

claude Bot commented Mar 6, 2026

Claude finished @dpup's task in 4m 36s


Code Review

  • Gather context
  • Read changed files
  • Analyze for bugs, edge cases, races, security issues
  • Post review feedback

The core fix is correct. The previous iteration addressed the main issues: Docker NetworkGateway now filters for IPv4, the net.ParseIP(hostAddr) != nil guard prevents the rewrite from firing on hostname-based hosts (Docker on macOS), and runtime.go comments are accurate. The refactoring of inspectAppleNetworkGateway is clean. What remains is largely theoretical.

Two open issues from the previous review were not addressed in the latest iteration — noting them here in case they weren't resolved intentionally:


1. Rewrite applies to user-supplied env vars (internal/run/manager.go:2004)

replaceHostInEnv(proxyEnv, hostAddr, gw) operates on all of proxyEnv, which by this point includes user-defined vars from opts.Config.Env and opts.Env. If a user happens to hardcode the default gateway IP in one of these (e.g. MY_HOST=http://192.168.64.1:9090 in moat.yaml), it would be silently rewritten to MY_HOST=http://192.168.73.1:9090. Service env vars are safe because they use container hostnames, not IPs.

Practical risk is low — users very rarely hardcode gateway IPs — but it's worth noting this is a silent corruption path with no log warning.


2. strings.ReplaceAll substring match (internal/run/manager.go:2247)

result[i] = e[:idx+1] + strings.ReplaceAll(e[idx+1:], oldHost, newHost)

"192.168.64.1" matches as a substring of "192.168.64.10". Replacing it in 192.168.64.10:5432 produces 192.168.73.10:5432 — silently wrong. The proxy-specific vars are safe because :port bounds the IP on the right, and NO_PROXY is comma-delimited. The risk only materializes for user env vars containing other IPs from the same /24 subnet.

Combined with issue #1: if a user sets DB_HOST=192.168.64.10 (a service IP) and the gateway is 192.168.64.1, it would be corrupted on custom-network runs.


Minor: serviceContainerExists uses string containment for Apple containers (internal/e2e/services_test.go:415)

return strings.Contains(string(out), name)

If looking for moat-postgres-abc123 and a container named moat-postgres-abc12345 exists, this returns true spuriously. JSON format from container list includes quoted names, so parsing the JSON and comparing exactly would be more robust. Low priority since it's a test helper.


Overall: The fix works correctly for the primary use case (Apple containers on custom networks). Issues #1 and #2 are the same underlying concern from two angles, only triggered by an unusual combination (user env vars containing the same-subnet gateway IP). If those are knowingly deferred, this is ready to merge.

@dpup dpup merged commit aef5a1b into main Mar 6, 2026
4 checks passed
@dpup dpup deleted the fix/custom-network-proxy-gateway branch March 6, 2026 19:56