🤖 Agentically developed. Agentic development patterns were used in the creation of this project, but with a human still at the wheel.
A local, multimodal AI agent that plays DOOM by letting a single Gemma 4 model look at the game frame and pick the next action, every tick, with no heuristic scaffolding. A web dashboard gives live observation and debugging.
The live dashboard: the game on the left; the model's reasoning and its per-tick decisions on the right.
There is exactly one model in the loop: it sees a short sequence of recent frames plus a tiny HUD, reasons briefly, and emits one action word. Everything spatial (enemies, walls, aim, distance) is read from the pixels by the model itself.
Three containers, plus Ollama on the host:
browser ──► dashboard ◄── agent ──► vizdoom (doom_env)
│
└─► host.docker.internal:11434 (Ollama on host)
- doom_env (doom_env/) runs VizDoom headless in synchronous
PLAYERmode and exposes a ZMQ REQ/REP server. Frames are JPEG-encoded; the reply also carries the HUD game variables. - agent (agent/) is the brain (the
doommapackage). It polls the environment, builds a tiny context (recent frames + HUD + recent-action trail), makes one Gemma call per tick, steps the game, and ships telemetry to the dashboard. - dashboard (dashboard/) is the FastAPI hub. The agent sends JSON telemetry over an internal WebSocket and JPEG frames; browsers receive telemetry over WebSocket and frames over MJPEG.
- Ollama runs natively on the host (Metal-accelerated). Containers
reach it via
host.docker.internal:11434.
-
macOS on Apple Silicon (M-series). Linux works too; on Linux you can run Ollama in a container if you have an NVIDIA GPU (see Linux notes below).
-
Colima as the Docker engine host (with Compose v2):
brew install colima docker docker-compose colima start --cpu 4 --memory 8 --disk 30
Colima exposes
host.docker.internalto containers automatically, so the agent can reach Ollama on the host without extra setup. -
Ollama installed natively on the host, not in a container (the Linux VM that Colima runs cannot access Apple's Metal GPU):
brew install ollama OLLAMA_HOST=0.0.0.0:11434 ollama serve # listen on all interfaces -
Pull one multimodal model. The default targets Gemma 4 (
e4b); until that tag is available locally, swap in any multimodal model you have (e.g.gemma3:12b,gemma3:27b). Bigger = better decisions but slower. The tag is configurable via theMODELenv var (see .env.example).ollama pull gemma4:e4b # or: ollama pull gemma3:12b
cp .env.example .env # edit if you want different model tags or scenario
docker compose up --buildThen open http://localhost:8080: the live game frame on the left and the decision telemetry (action, latency, the model's reasoning) on the right.
To stop:
docker compose downKnobs live in .env (loaded by Compose) and
agent/config.yaml (loaded by the agent). The most useful:
| Setting | Where | Purpose |
|---|---|---|
MODEL |
.env |
Ollama tag of the multimodal model that decides actions |
OLLAMA_HOST |
.env |
URL the agent uses to reach Ollama |
SCENARIO |
.env |
VizDoom scenario (basic.cfg, deadly_corridor.cfg, full_game.cfg, …) |
sampling.think |
agent/config.yaml |
Let the model reason before acting (keep num_predict large) |
sampling.num_predict |
agent/config.yaml |
Token budget for the reasoning trace plus the action word |
image_tokens |
agent/config.yaml |
Estimated tokens per frame; sizes num_ctx so traces aren't truncated |
vision.frames |
agent/config.yaml |
How many recent frames the model sees each tick |
tic_skip |
agent/config.yaml |
Game tics advanced per agent decision |
memory.recent_actions |
agent/config.yaml |
How many recent actions are fed back to the model |
The agent reads its config at startup; apply changes with
docker compose restart agent.
After the stack is up, verify the agent is actually playing (not standing still or only shooting):
docker compose exec -T agent python -m doomma.probes.behavior --duration 60The probe observes the live VizDoom state plus the agent's JSONL decision log and exits non-zero if movement stalls, decisions stop, parse failures spike, or the action stream collapses to a single action.
To verify the browser-facing delivery path too:
docker compose exec -T agent python -m doomma.probes.dashboard --duration 10This checks that /stream.mjpg is delivering frames and /telemetry is
streaming WebSocket messages.
Two capture tools produce shareable artifacts from the live dashboard:
scripts/capture_demo_gif.sh # the dashboard-playback GIF above
scripts/capture_reel.sh # a vertical, scored MP4 for social
The vertical reel's title card, rendered by scripts/capture_reel.sh.
A single tool, ruff, formats and lints (Google style); mypy --strict
type-checks the three services; pytest runs the suite. Set up a local
environment with uv:
uv venv --python 3.11 .venv
uv pip install -e . --group dev # the doomma package + dev toolchain
# Runtime deps the tests import (VizDoom itself is never needed locally —
# the tests only touch its dependency-free action module):
uv pip install -r agent/requirements.txt -r dashboard/requirements.txt
.venv/bin/ruff format .
.venv/bin/ruff check .
.venv/bin/mypy
.venv/bin/pytestThese run without Docker, Ollama, or VizDoom. The services depend on ports (protocols) that the tests satisfy with in-memory fakes.
.
├── agent/ # the brain
│ ├── config.yaml # gameplay tunables
│ ├── prompts/agent.txt # the one system prompt
│ └── doomma/ # the package (see AGENTS.md / CLAUDE.md)
├── doom_env/ # the VizDoom ZMQ server (Compose service: vizdoom)
│ ├── actions.py # action space + button mapping (dependency-free)
│ └── server.py
├── dashboard/ # FastAPI hub + single-page UI
├── scripts/ # host-side demo/reel capture tools
├── tests/ # pytest suite
└── docker-compose.yml
See AGENTS.md for how the system behaves and CLAUDE.md for the architecture and conventions.
On Linux with NVIDIA you can run Ollama in a container too. Replace
OLLAMA_HOST=http://host.docker.internal:11434 with the container address and
add the official ollama/ollama image to docker-compose.yml. Everything else
is portable.
- Agent logs
connection refusedto Ollama. Ollama isn't running, or isn't on0.0.0.0. Restart withOLLAMA_HOST=0.0.0.0:11434 ollama serve. - Decisions are very slow. Check the model is warm with
ollama ps. If it keeps unloading,keep_aliveisn't being applied (see AGENTS.md §2.6). - Dashboard MJPEG stream is blank. The agent hasn't started pushing frames
yet (a few seconds after
up). Checkdocker compose logs agentfor VizDoom connection errors. - VizDoom exits with
Could not find … wad. Use one of the bundled scenarios (see doom_env/scenarios/README.md).