Pull-based orchestration for AI coding agents inside the AI Fortress sandbox. A worker container long-polls a host-side coordinator for tasks, runs them under gVisor with a budget-capped virtual key, and posts results back. The fortress provides isolation; this layer adds a workflow.
```
host shell ─POST──> coordinator (uvicorn, 127.0.0.1:7222)
                         │ queue
                         │
                         │ long-poll lease             upstream LLM
                         │                                  ▲
                         │                                  │ via fortress
                         ▼                                  │
              ┌─ vsock relay (host) ─┐                      │
              │      socat 7222      │                      │
              └──────────┬───────────┘                      │
                         │ AF_VSOCK                         │
              ┌──────────▼───────────┐                      │
              │   coordinator-shim   │                      │
              │   alias=skia in      │   ┌─────────────────────────┐
              │   sandbox_net        │   │ Bifrost @127.0.0.1:4000 │
              └──────────┬───────────┘   └─────────────▲───────────┘
                         │                             │
                         │ TCP within --internal       │
                         ▼                             │
              ┌──── worker container ──────────┐       │
              │ runsc, sandbox_net,            │       │
              │ HOME=/work, virtual key        │       │
              │ no internet, no upstream key   │       │
              │                                │       │
              │ worker.py:                     │       │
              │   GET http://skia:7222/...     │       │
              │   exec opencode run … ─────────────────┘  (Anthropic SDK
              │   POST result back             │           uses ANTHROPIC_BASE_URL
              │                                │           = authproxy:4000)
              └────────────────────────────────┘
```
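Concretely, the workflow is a lease → run → report loop. A condensed sketch of worker.py under stated assumptions: the lease route is the documented one (see the file table below), while the result route, auth header, and JSON fields are invented for illustration.

```python
# worker_loop_sketch.py: condensed view of worker.py's main loop.
# The lease route is real (see the file table below); the result route,
# auth header, and JSON fields are illustrative assumptions.
import os
import subprocess

import requests

COORD = "http://skia:7222"              # reachable only via the vsock shim
REPO = os.environ["REPO_NAME"]
HEADERS = {"Authorization": f"Bearer {os.environ['FORTRESS_API_KEY']}"}

while True:
    # Long-poll: the coordinator holds the request open until a task arrives.
    r = requests.get(f"{COORD}/tasks/lease/{REPO}", headers=HEADERS, timeout=120)
    if r.status_code == 204:            # assumed "nothing queued" signal
        continue
    r.raise_for_status()
    task = r.json()

    # Run the agent; its LLM calls exit via the virtual key through Bifrost.
    proc = subprocess.run(
        ["opencode", "run", task["prompt"]],
        capture_output=True, text=True, cwd="/work",
    )

    # Post the result back (route and fields assumed).
    requests.post(
        f"{COORD}/tasks/{task['id']}/result",
        headers=HEADERS,
        json={"exit_code": proc.returncode, "output": proc.stdout},
        timeout=30,
    )
```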
Key boundaries:
- The worker container has no direct internet egress. Only two things are reachable via the fortress's labelled vsock-shim mechanism: `authproxy` (the LLM proxy) and `skia` (this coordinator).
- The worker holds only a per-session `sk-bf-*` virtual key — never the real Anthropic upstream key.
- The coordinator binds 127.0.0.1 only; the relay+shim is the only path in from sandboxes.
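The `FORTRESS_API_KEY` boundary on top of that is a plain shared-secret check on every coordinator route. A minimal FastAPI sketch of the idea (header name, error text, and wiring are assumptions; coordinator.py is the source of truth):

```python
# auth_sketch.py: the flavor of shared-secret check behind the 403s in
# Troubleshooting below. Header name and wiring are assumptions.
import os
import secrets

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
API_KEY = os.environ["FORTRESS_API_KEY"]  # the coordinator loads this from .env

def require_key(authorization: str = Header(default="")) -> None:
    token = authorization.removeprefix("Bearer ")
    # Constant-time compare; a mismatch is the 403 the worker would log.
    if not secrets.compare_digest(token, API_KEY):
        raise HTTPException(status_code=403, detail="FORTRESS_API_KEY mismatch")

@app.get("/tasks/lease/{repo}", dependencies=[Depends(require_key)])
def lease(repo: str) -> dict:
    # The real coordinator long-polls its queue here.
    return {}
```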
Prerequisites:

- AI Fortress installed (Phase 1 + Phase 2 verified — see `ai-fortress/README.md`).
- You're in the `fortress` group (`sudo -n /usr/local/sbin/fortress-mint` works without prompting).
- Python 3.12+ on the host with a venv: `python3 -m venv .venv && .venv/bin/pip install -r requirements.txt`
- `.env` in this directory containing at minimum: `FORTRESS_API_KEY=<random secret used by both coordinator and worker>`
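Any sufficiently random string works as that secret. One way to mint it, sketched in Python (the repo doesn't prescribe a generator):

```python
# mint_env.py: one way to generate the shared secret; any sufficiently
# random string works, and the repo does not prescribe a generator.
import secrets
from pathlib import Path

Path(".env").write_text(f"FORTRESS_API_KEY={secrets.token_urlsafe(32)}\n")
print("wrote .env")
```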
```
bash install.sh
```

What it does:
- Installs and starts `code-workers-coordinator.service` as a user-mode systemd unit (`~/.config/systemd/user/`), uvicorn bound to `127.0.0.1:7222`. Runs as you. Enables `loginctl --linger` so it survives logout. (System-mode would require relaxing SELinux to let `init_t` traverse `user_home_t` paths to read your venv.)
- Installs and starts `code-workers-coordinator-relay.service` as a system-mode unit (vsock 7222 → tcp 127.0.0.1:7222); see the sketch after this list.
- Builds the worker image on the host.
- Pushes the worker image into the VM via `ai-fortress/push_image_to_vm.sh`.
- Installs `coordinator-shim.service` in the VM (label `ai-fortress.shim.alias=skia`) and starts it.
- Removes legacy `/opt/bin/worker-up` if present (replaced by `agent`).
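For intuition, here is a functional stand-in for the relay unit sketched in Python rather than socat (assumes Linux `AF_VSOCK` support in the `socket` module; the installed unit runs socat, not this):

```python
# relay_sketch.py: functional stand-in for the relay unit, which itself
# just runs socat. Requires Linux AF_VSOCK support in the socket module.
import socket
import threading

def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until EOF, then half-close the destination."""
    try:
        while chunk := src.recv(65536):
            dst.sendall(chunk)
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

def handle(conn: socket.socket) -> None:
    # Each vsock connection gets its own TCP connection to the coordinator.
    upstream = socket.create_connection(("127.0.0.1", 7222))
    threading.Thread(target=pipe, args=(conn, upstream), daemon=True).start()
    pipe(upstream, conn)

listener = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
listener.bind((socket.VMADDR_CID_ANY, 7222))   # vsock port 7222, any guest CID
listener.listen()
while True:
    conn, _ = listener.accept()
    threading.Thread(target=handle, args=(conn,), daemon=True).start()
```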
Status checks afterwards:
```
systemctl --user status code-workers-coordinator    # user-mode
systemctl status code-workers-coordinator-relay     # system-mode
ssh ranton@<vm> systemctl status coordinator-shim   # in VM
```

To roll back: `bash uninstall.sh`.
```
bash run-worker rhizome
```

This is a one-line wrapper around the fortress launcher; equivalent to:

```
~/bin/agent rhizome \
  --image code-workers/worker:latest \
  --env REPO_NAME=rhizome \
  --env FORTRESS_API_KEY="$FORTRESS_API_KEY" \
  --env OPENCODE_API_KEY="$OPENCODE_API_KEY"
```

The container starts polling the coordinator immediately. Leave it running.
```
bash send_task.sh rhizome "fix the login bug in auth.py"
bash send_task.sh --wait rhizome "add unit tests for the parser module"
```

The `--wait` flag polls and prints results when the task completes.
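Both `--wait` and `poll_task.sh` (shown next) reduce to polling the coordinator's status endpoint. A minimal Python sketch, assuming a hypothetical `GET /tasks/<task-id>` route, Bearer auth, and a `status` field (check coordinator.py and the scripts for the real contract):

```python
# poll_sketch.py: hypothetical picture of what --wait and poll_task.sh do.
# The status route, auth header, and response fields are assumptions.
import os
import time

import requests

COORD = "http://127.0.0.1:7222"         # the coordinator binds loopback only
HEADERS = {"Authorization": f"Bearer {os.environ['FORTRESS_API_KEY']}"}

def wait_for(task_id: str, interval: float = 2.0) -> dict:
    while True:
        r = requests.get(f"{COORD}/tasks/{task_id}", headers=HEADERS, timeout=10)
        r.raise_for_status()
        task = r.json()
        if task.get("status") in ("done", "failed"):
            return task
        time.sleep(interval)

print(wait_for("task-1234567890-12345"))
```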
```
bash poll_task.sh task-1234567890-12345
```

| File | Role |
|---|---|
| `coordinator.py` | FastAPI app — task queue, lease/result/status endpoints, optional Paperclip webhook |
| `worker.py` | In-sandbox poller; long-polls `GET /tasks/lease/<repo>` and runs `opencode run` |
| `Dockerfile.worker` | Worker image (Python + Rust + Claude Code + OpenCode + the worker.py entrypoint) |
| `Dockerfile.python` | General Python dev sandbox (built by `build_python.sh`) |
| `run-worker` | Thin wrapper around `~/bin/agent` |
| `send_task.sh` / `poll_task.sh` | Host-shell helpers |
| `install/code-workers-coordinator.service` | Host systemd unit for uvicorn |
| `install/code-workers-coordinator-relay.service` | Host systemd unit for vsock 7222 → tcp 127.0.0.1:7222 |
| `install/coordinator-shim.service` | VM systemd unit for the labelled coordinator-shim container |
| `install.sh` / `uninstall.sh` | One-shot install + reverse |
`Dockerfile.worker` writes an OpenCode config that allows everything except the three server-side LLM tools that bypass the network sandbox at the application layer (`webfetch`, `websearch`, `codesearch`). The fortress can't catch those at the network or syscall layer because they ride inside legitimate Messages API calls. See `ai-fortress/ARCHITECTURE.md`, "LLM-as-egress channel," for the full discussion. A future improvement is a Bifrost-side request-scrub that strips these tools from any inbound request body — defense-in-depth that doesn't depend on per-image config.
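A sketch of the core transform such a scrub would apply. How it hooks into Bifrost, and whether these tools arrive as entries in a top-level `tools` array under these exact names, are assumptions:

```python
# scrub_sketch.py: core transform of the proposed Bifrost-side scrub.
# The hook point and the exact wire shape of the tool entries are
# assumptions; only the denylist names come from this README.
DENYLIST = {"webfetch", "websearch", "codesearch"}

def scrub_tools(body: dict) -> dict:
    """Drop denylisted tool definitions from a Messages API request body."""
    tools = body.get("tools")
    if isinstance(tools, list):
        body["tools"] = [
            t for t in tools
            if not (isinstance(t, dict) and t.get("name") in DENYLIST)
        ]
    return body
```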
- Worker logs `403 Forbidden` from the coordinator → `FORTRESS_API_KEY` mismatch. The worker reads it from its env (passed by `run-worker`); the coordinator reads it from `.env` at startup. They must match.
- Worker logs `Cannot connect to coordinator at http://skia:7222` → the coordinator-shim isn't running, or the host relay isn't running, or the host coordinator isn't running. Check (note the different `--user` flag for the coordinator):

  ```
  systemctl --user status code-workers-coordinator
  systemctl status code-workers-coordinator-relay
  ssh ranton@<vm> systemctl status coordinator-shim
  ```

- `agent` says `sudo: a password is required` → your shell predates `usermod -aG fortress`. `newgrp fortress` (one shell) or log out and back in (everywhere).
- Worker image `EACCES: mkdir /.local` → `agent-vm` should set `HOME=/work` automatically; check that you're on the latest fortress (`agent-vm` includes the fix).
`plan.md` lists the originally planned hardening steps (auth, structured logging, persistence). The fortress integration adds a separate set of guarantees on top: sandbox network isolation, virtual-key budgets, no upstream-key leak, runsc syscall confinement. The two layers are complementary.