
fix: use host.docker.internal for container-side Steward URL#401

Closed
0xSolace wants to merge 4 commits into dev from fix/steward-container-url

Conversation

@0xSolace
Collaborator

Problem

Containers on the milady-isolated bridge network receive STEWARD_API_URL=http://localhost:3200. Inside a bridge network, localhost resolves to the container itself, not the Docker host. This means containers cannot reach the Steward vault service running on the host at port 3200.

Impact: All 9 containers on milady-core-1 have been silently unable to reach Steward.

Fix

Split the single STEWARD_API_URL constant into two:

  • STEWARD_HOST_URL (http://localhost:3200) — used by the orchestrator for host-side API calls (agent registration, token minting)
  • STEWARD_CONTAINER_URL (http://host.docker.internal:3200) — injected into container env vars

Configurable via STEWARD_CONTAINER_URL env var for custom setups.
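
A minimal sketch of the split, for readers who want to see its shape. The helper names (`resolveStewardUrls`, `containerEnv`) are hypothetical and not taken from docker-sandbox-provider.ts; only the two default URLs and the STEWARD_CONTAINER_URL override come from this description. The environment is passed in as a plain object so the logic is easy to test.

```typescript
// Hypothetical helpers illustrating the host/container URL split.
// Only the default URLs and the STEWARD_CONTAINER_URL override come from the PR text.
type Env = Record<string, string | undefined>;

interface StewardUrls {
  hostUrl: string; // used by the orchestrator on the Docker host
  containerUrl: string; // injected into containers as STEWARD_API_URL
}

const DEFAULT_HOST_URL = "http://localhost:3200";
const DEFAULT_CONTAINER_URL = "http://host.docker.internal:3200";

function resolveStewardUrls(env: Env): StewardUrls {
  // A blank or missing override falls back to the default container URL.
  const override = (env["STEWARD_CONTAINER_URL"] ?? "").trim();
  return {
    hostUrl: DEFAULT_HOST_URL,
    containerUrl: override || DEFAULT_CONTAINER_URL,
  };
}

// The container keeps reading STEWARD_API_URL; only the injected value changes.
function containerEnv(urls: StewardUrls): { STEWARD_API_URL: string } {
  return { STEWARD_API_URL: urls.containerUrl };
}
```

In a Node process this would presumably be called as `resolveStewardUrls(process.env)`.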

Changes

  • packages/lib/services/docker-sandbox-provider.ts: +9/-3

Testing

  • 217/223 tests pass (6 pre-existing failures in affiliatesService, unrelated)
  • Verified host.docker.internal resolves correctly from Docker bridge containers

- Add isAuthenticationError() helper for proper error classification
- Improve try/catch coverage in agent creation and pairing flows
- Update tests for new error handling paths
Containers on the milady-isolated bridge network cannot reach the host
via localhost. Split STEWARD_API_URL into host-side (for orchestrator
API calls) and container-side (for Docker env injection).

Host-side: http://localhost:3200 (orchestrator → Steward)
Container-side: http://host.docker.internal:3200 (container → Steward)

The registerAgentWithSteward() function runs Python via SSH on the
Docker host, so it correctly uses the host-side URL. The container
env var STEWARD_API_URL now uses STEWARD_CONTAINER_URL which defaults
to http://host.docker.internal:3200.

Configurable via STEWARD_CONTAINER_URL env var for custom setups.
@vercel

vercel Bot commented Mar 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| eliza-cloud-v2 | Ready | Preview, Comment | Mar 22, 2026 6:06pm |

@coderabbitai

coderabbitai Bot commented Mar 22, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@claude

claude Bot commented Mar 22, 2026

PR #401 Review — fix: use host.docker.internal for container-side Steward URL

The core fix is correct and well-designed. The split into STEWARD_HOST_URL (orchestrator/SSH) and STEWARD_CONTAINER_URL (injected into containers) solves the real bug where containers on milady-isolated were given localhost:3200 which resolved to themselves. The env var override makes it configurable for non-standard Docker setups. A few issues worth addressing before merging.


High Severity

host.docker.internal not available on Linux Docker by default
packages/lib/services/docker-sandbox-provider.ts:79

On Linux, host.docker.internal requires --add-host=host.docker.internal:host-gateway at container creation time (Docker 20.10+). The docker create command doesn't include this flag. If production nodes run Linux Docker (typical for VPS), containers will silently fail to reach Steward — the same symptom as the original bug, just with a different unresolvable hostname. The STEWARD_CONTAINER_URL env override handles this for informed operators, but the default will be broken.

Options:

  1. Add --add-host=host.docker.internal:host-gateway to the dockerCreateCmd unconditionally
  2. Document this Linux requirement prominently with the env var override instructions
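
Option 1 could look roughly like the sketch below. `buildDockerCreateArgs` and its option names are hypothetical, since the real dockerCreateCmd is not quoted in this review; the `--add-host=host.docker.internal:host-gateway` flag is the standard Docker 20.10+ mechanism mentioned above.

```typescript
// Hypothetical argument builder; the real dockerCreateCmd is not shown in this thread.
interface CreateOptions {
  name: string;
  image: string;
  network: string;
  env: Record<string, string>;
}

function buildDockerCreateArgs(opts: CreateOptions): string[] {
  const args: string[] = [
    "create",
    "--name", opts.name,
    "--network", opts.network,
    // Maps host.docker.internal to the host gateway IP, so the name resolves on Linux too.
    "--add-host=host.docker.internal:host-gateway",
  ];
  // One -e KEY=VALUE pair per injected environment variable.
  for (const key of Object.keys(opts.env)) {
    args.push("-e", `${key}=${opts.env[key]}`);
  }
  // The image name goes last, after all options.
  args.push(opts.image);
  return args;
}
```

Adding the flag unconditionally should be harmless on Docker Desktop, where the name already resolves, but that is worth confirming on the actual nodes.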

Medium Severity

No Steward deregistration on docker start failure
packages/lib/services/docker-sandbox-provider.ts:466-473

If docker create succeeds but docker start throws, the catch block runs docker rm -f (good), but the Steward token has already been minted and the agent is registered. Orphaned Steward entries will accumulate over time. Consider adding a Steward deregistration call in the rollback path when the token has already been minted.
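
A rough sketch of the rollback shape being suggested. Every name here is hypothetical: the dependencies are injected as an interface so the ordering can be tested, and `deregisterAgent` in particular assumes Steward exposes some way to remove a registered agent, which this review has not verified.

```typescript
// Hypothetical dependency interface; none of these names come from the PR.
interface ProvisionDeps {
  registerAgent(agentId: string): Promise<string>; // returns the minted token
  deregisterAgent(agentId: string): Promise<void>; // assumed to exist on the Steward side
  dockerCreate(agentId: string, token: string): Promise<string>; // returns a container id
  dockerStart(containerId: string): Promise<void>;
  dockerRemove(containerId: string): Promise<void>;
}

async function provisionWithRollback(deps: ProvisionDeps, agentId: string): Promise<string> {
  // The token is minted first, so any later failure leaves a Steward entry behind.
  const token = await deps.registerAgent(agentId);
  let containerId: string | undefined;
  try {
    containerId = await deps.dockerCreate(agentId, token);
    await deps.dockerStart(containerId);
    return containerId;
  } catch (err) {
    // Best-effort cleanup: cleanup failures are swallowed so the original error surfaces.
    if (containerId !== undefined) {
      await deps.dockerRemove(containerId).catch(() => undefined);
    }
    await deps.deregisterAgent(agentId).catch(() => undefined);
    throw err;
  }
}
```

Swallowing cleanup errors is a design choice; logging them instead of discarding them would probably be better in production.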

Documentation still shows the old (incorrect) URL
docs/steward-container-provisioning.md:12

The doc still says STEWARD_API_URL=http://localhost:3200 for the container env. Should be updated to reflect host.docker.internal:3200.


Low Severity

extractStewardToken silent fallback on bad JSON
packages/lib/services/docker-sandbox-provider.ts:160-164

If Steward returns malformed JSON (e.g., truncated SSH output), the function silently returns whatever bytes arrived as the "token". The container starts with a corrupted token, and the failure won't surface until the container tries to use Steward. A log warning in the catch block would make debugging significantly easier:

} catch {
  logger.warn("[docker-sandbox] Steward response is not JSON, treating as plain text token");
}
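
For context, a self-contained sketch of how that warning might sit in a token extractor. This is not the actual extractStewardToken: the function name, the `token` field, and the injected `warn` callback are assumptions made so the example can run on its own.

```typescript
// Hypothetical stand-in for extractStewardToken; the `token` field name is an assumption.
type Warn = (message: string) => void;

function extractTokenSketch(raw: string, warn: Warn): string {
  const trimmed = raw.trim();
  try {
    const parsed: unknown = JSON.parse(trimmed);
    if (typeof parsed === "object" && parsed !== null) {
      const candidate = (parsed as { token?: unknown }).token;
      if (typeof candidate === "string") {
        return candidate;
      }
    }
    warn("[docker-sandbox] Steward response is JSON but has no string token field");
  } catch {
    // Truncated or non-JSON output lands here; make the fallback visible in logs.
    warn("[docker-sandbox] Steward response is not JSON, treating as plain text token");
  }
  // Fallback mirrors the behavior described above: the raw text is used as the token.
  return trimmed;
}
```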

Forbidden maps to 401 instead of 403
app/api/v1/milady/agents/route.ts:30,91

"Forbidden" semantically warrants a 403 (authenticated but not authorized), not 401 (unauthenticated). Clients relying on status codes to distinguish the two cases will be misled.
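
One possible shape for the mapping, as a sketch only. It assumes the route classifies errors by message text, which is a guess; `statusForError` is a hypothetical name, and the real route may use error classes or the isAuthenticationError() helper instead.

```typescript
// Hypothetical mapping from an error message to an HTTP status code.
function statusForError(message: string): 401 | 403 | 500 {
  const m = message.toLowerCase();
  // Authenticated but not allowed to perform the action.
  if (m.includes("forbidden")) return 403;
  // Missing or invalid credentials.
  if (m.includes("unauthorized") || m.includes("unauthenticated")) return 401;
  // Anything else is treated as a server-side failure.
  return 500;
}
```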

Regex escapes only the first dot in domain alias
packages/lib/services/pairing-token.ts:37-38

url.hostname = url.hostname.replace(new RegExp(`${a.replace(".", "\\.")}`), b);

String.replace() with a string pattern replaces only the first occurrence. For suffixes like .waifu.fun and .milady.ai, the second dot (before fun / ai) therefore stays unescaped in the regex and matches any character. Use a.replace(/\./g, "\\.") to escape all dots.
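
A small sketch of a more general fix. Both function names are hypothetical, and anchoring the pattern at the end of the hostname with `$` is an extra assumption about the intended behavior, not something the current code does.

```typescript
// Escapes every regex metacharacter, not just the first dot.
function escapeRegExp(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: swaps one domain suffix for another, matching only at the end.
function swapDomainSuffix(hostname: string, from: string, to: string): string {
  return hostname.replace(new RegExp(`${escapeRegExp(from)}$`), to);
}
```

With this, `swapDomainSuffix("agent.waifu.fun", ".waifu.fun", ".milady.ai")` gives `agent.milady.ai`, while a hostname such as `agent.waifuxfun` is left untouched.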

Timeout may be tight for Steward registration
packages/lib/services/docker-sandbox-provider.ts:210

DOCKER_CMD_TIMEOUT_MS (60s) is used for registerAgentWithSteward. The embedded Python script uses timeout=15 per HTTP request (up to ~30s total), leaving only ~30s margin for SSH connection overhead + Python startup under load. Consider a dedicated STEWARD_REGISTER_TIMEOUT_MS = 90_000 or reducing the Python per-request timeout.
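
The budget arithmetic, spelled out as a sketch. The request count of 2 is inferred from the "~30s total" estimate rather than read from the script, and `sshMarginMs` is a hypothetical helper.

```typescript
const DOCKER_CMD_TIMEOUT_MS = 60_000; // current outer timeout
const STEWARD_REGISTER_TIMEOUT_MS = 90_000; // suggested dedicated timeout
const PYTHON_REQUEST_TIMEOUT_MS = 15_000; // timeout=15 in the embedded script
const MAX_HTTP_REQUESTS = 2; // assumption inferred from "~30s total"

// Time left for SSH connection setup and Python startup in the worst case.
function sshMarginMs(outerTimeoutMs: number, perRequestMs: number, requests: number): number {
  return outerTimeoutMs - perRequestMs * requests;
}

const currentMarginMs = sshMarginMs(DOCKER_CMD_TIMEOUT_MS, PYTHON_REQUEST_TIMEOUT_MS, MAX_HTTP_REQUESTS);
const proposedMarginMs = sshMarginMs(STEWARD_REGISTER_TIMEOUT_MS, PYTHON_REQUEST_TIMEOUT_MS, MAX_HTTP_REQUESTS);
```

Under these assumptions the margin grows from 30s to 60s with the dedicated timeout.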

Missing POST auth error test
packages/tests/unit/milady-create-routes.test.ts

A test was added for GET returning 401 on auth failure but there's no corresponding test for POST, which has the same error-handling wrapper.


Nit

Step comment at line ~495 of docker-sandbox-provider.ts says // 8. Wait for Headscale VPN registration but step numbering appears to skip 7 after a merge/renumber. Minor readability issue.


Summary

| Issue | File | Severity |
| --- | --- | --- |
| host.docker.internal unavailable on Linux without --add-host | docker-sandbox-provider.ts:79 | High |
| No Steward deregistration on docker start failure | docker-sandbox-provider.ts:469-473 | Medium |
| Doc still shows localhost:3200 for container | docs/steward-container-provisioning.md:12 | Medium |
| Silent fallback on bad JSON in extractStewardToken | docker-sandbox-provider.ts:160-164 | Low |
| Forbidden → 401 should be 403 | route.ts:30,91 | Low |
| Regex escapes only first dot in domain alias | pairing-token.ts:37-38 | Low |
| Tight timeout for Steward registration over SSH | docker-sandbox-provider.ts:210 | Low |
| No POST auth error test | milady-create-routes.test.ts | Low |

The Linux host.docker.internal issue is the most significant concern and should be validated against the actual production node OS before merging. Everything else is low-to-medium.

@0xSolace
Collaborator Author

Combined into #403 (fix/steward-security-migrations)

@0xSolace 0xSolace closed this Mar 22, 2026