Run OpenCode as a sandboxed, non-root Docker container connected to a self-hosted vLLM inference server. No cloud API keys required.
- Docker + Docker Compose installed on your machine
- Access to a running vLLM server exposing an OpenAI-compatible API (e.g. `http://10.0.0.13:8000`)
- Your vLLM server must have the model loaded and `/v1/models` responding
Verify your vLLM is reachable before starting:
```bash
curl http://10.0.0.13:8000/v1/models
```

You should see your model ID in the response; use the exact `"id"` value (e.g. `gemma4-26b-a4b`) in your config.
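An illustrative, trimmed response (your `id` and `max_model_len` will differ):

```json
{
  "object": "list",
  "data": [
    {
      "id": "gemma4-26b-a4b",
      "object": "model",
      "max_model_len": 32768
    }
  ]
}
```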
Finding your context size:
The `max_model_len` field in the `/v1/models` response is your context limit. Use that value for `"context"`.
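That `"context"` value goes into `config/opencode.json` (created below). A minimal sketch of that file, with illustrative provider and display names; match the `baseURL`, the model `id`, and the limits to your own server:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "vllm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "vLLM (Gemma4 local)",
      "options": {
        "baseURL": "http://10.0.0.13:8000/v1"
      },
      "models": {
        "gemma4-26b-a4b": {
          "name": "Gemma 4 26B MoE",
          "limit": {
            "context": 32768,
            "output": 8192
          }
        }
      }
    }
  }
}
```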
Create the following layout on your machine:
```
opencode-sandbox/
├── Dockerfile
├── docker-compose.yml
├── config/
│   └── opencode.json
└── workspace/          ← put your code projects here
```
```bash
# Get the code
git clone git@github.com:jammsen/docker-opencode-sandbox.git

# Build the image (takes ~2-3 minutes on first run)
docker compose build

# Launch OpenCode interactively
docker compose run --rm opencode

# Or as a one-liner
docker compose build && docker compose run --rm opencode
```

On first launch OpenCode opens the TUI. Press `/` to open the command palette.
Inside the TUI:
- Press `/model`: your model should appear under your provider name with an orange dot
- Type `hello, what model are you?`: the response should mention your model ID
- Check the status bar at the bottom: it should show `Gemma 4 26B MoE · vLLM (Gemma4 local)`
- Check the right panel: `$0.00 spent` confirms no cloud API is being used
Drop files into ./workspace/ on your host. They appear at ~/workspace/ inside the container. OpenCode operates within this directory and cannot access anything outside it.
```bash
# Copy a project into the sandbox
cp -r ~/myproject ./workspace/myproject
```

| Mode | Shortcut | Token overhead | Best for |
|---|---|---|---|
| Build | default | ~10k tokens | Agentic file editing, multi-step tasks |
| Ask | tab | ~3-5k tokens | Questions, code review, explanations |
With a 32k context limit, Ask mode leaves significantly more room for your actual code and conversation.
The status bar shows X tokens (Y% used). Build mode consumes ~10,000 tokens just for the system prompt before you type anything. For large codebases, open only the files you need or use Ask mode.
Config not loading / provider picker appears on every launch
```bash
docker compose run --rm --entrypoint bash opencode -c \
  "cat /home/opencode/.config/opencode/opencode.json"
```

If this returns an error, check that `docker compose` is run from the same directory as `docker-compose.yml` and that `./config/opencode.json` exists.
GID already exists error during build
Ubuntu 26.04 ships with a default user at UID/GID 1000. The Dockerfile handles this by renaming the existing user instead of creating a new one. Ensure you are using the Dockerfile exactly as provided above.
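For reference, the rename approach looks roughly like this Dockerfile fragment; the `ubuntu` source name and `opencode` target name are assumptions here, so defer to the repository's actual Dockerfile:

```dockerfile
# Sketch: reuse the image's existing UID/GID 1000 instead of creating a new user.
# Assumes the base image ships a default user and group named "ubuntu" at 1000:1000.
RUN usermod --login opencode --home /home/opencode --move-home ubuntu \
 && groupmod --new-name opencode ubuntu
USER opencode
WORKDIR /home/opencode
```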
Model not responding / timeout
```bash
# Test vLLM connectivity from inside the container
docker compose run --rm --entrypoint bash opencode -c \
  "curl -s http://YOUR_VLLM_IP:8000/v1/models"
```

If this fails, your vLLM IP is unreachable from the container. Use the actual host IP, not `localhost` (inside the container, `localhost` refers to the container itself).
Tool calling loops or model halts mid-task
This is a known Gemma 4 behavior with agentic tool use. Mitigations:
- Prefer Ask mode for questions and code review that don't require file editing
- For Build mode, give explicit step-by-step instructions rather than open-ended goals (see the example after this list)
- Keep tasks scoped to one file or one function at a time
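For instance, a scoped Build-mode prompt might look like this (the file and function names are purely illustrative):

```text
Edit workspace/myproject/utils.py only.
1. Add a parse_config(path) function that reads a JSON file and returns a dict.
2. Raise ValueError on invalid JSON.
3. Do not touch any other file.
```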
The container runs with the following restrictions:
- Non-root user (UID 1000)
- `no-new-privileges`: prevents privilege escalation via setuid binaries
- `cap_drop: ALL`: all Linux capabilities removed
- Bridge networking only; no host network access
- Filesystem access limited to `./workspace` on the host
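In `docker-compose.yml` terms, these restrictions correspond roughly to the following sketch, assuming a service named `opencode`; the repository's actual compose file is authoritative:

```yaml
services:
  opencode:
    build: .
    user: "1000:1000"           # non-root user
    security_opt:
      - no-new-privileges:true  # block setuid privilege escalation
    cap_drop:
      - ALL                     # drop every Linux capability
    volumes:
      - ./workspace:/home/opencode/workspace  # only host path exposed
```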
The model runs entirely on your local vLLM server. No data leaves your network.