doomma

🤖 Agentically developed. Agentic development patterns were used in the creation of this project, but with a human still at the wheel.

A local, multimodal AI agent that plays DOOM by letting a single Gemma 4 model look at the game frame and pick the next action, every tick, with no heuristic scaffolding. A web dashboard gives live observation and debugging.

_{The live dashboard: the game on the left; the model's reasoning and its per-tick decisions on the right.}

There is exactly one model in the loop: it sees a short sequence of recent frames plus a tiny HUD, reasons briefly, and emits one action word. Everything spatial (enemies, walls, aim, distance) is read from the pixels by the model itself.

How it works

Three containers, plus Ollama on the host:

browser ──► dashboard ◄── agent ──► vizdoom (doom_env)
                            │
                            └─► host.docker.internal:11434 (Ollama on host)

doom_env (doom_env/) runs VizDoom headless in synchronous PLAYER mode and exposes a ZMQ REQ/REP server. Frames are JPEG-encoded; the reply also carries the HUD game variables.
agent (agent/) is the brain (the doomma package). It polls the environment, builds a tiny context (recent frames + HUD + recent-action trail), makes one Gemma call per tick, steps the game, and ships telemetry to the dashboard.
dashboard (dashboard/) is the FastAPI hub. The agent sends JSON telemetry over an internal WebSocket and JPEG frames; browsers receive telemetry over WebSocket and frames over MJPEG.
Ollama runs natively on the host (Metal-accelerated). Containers reach it via host.docker.internal:11434.

Prerequisites

macOS on Apple Silicon (M-series). Linux works too; on Linux you can run Ollama in a container if you have an NVIDIA GPU (see Linux notes below).
Colima as the Docker engine host (with Compose v2):
```
brew install colima docker docker-compose
colima start --cpu 4 --memory 8 --disk 30
```
Colima exposes host.docker.internal to containers automatically, so the agent can reach Ollama on the host without extra setup.
Ollama installed natively on the host, not in a container (the Linux VM that Colima runs cannot access Apple's Metal GPU):
```
brew install ollama
OLLAMA_HOST=0.0.0.0:11434 ollama serve    # listen on all interfaces
```
Pull one multimodal model. The default targets Gemma 4 (e4b); until that tag is available locally, swap in any multimodal model you have (e.g. gemma3:12b, gemma3:27b). Bigger = better decisions but slower. The tag is configurable via the MODEL env var (see .env.example).
```
ollama pull gemma4:e4b   # or: ollama pull gemma3:12b
```

Quick start

cp .env.example .env       # edit if you want different model tags or scenario
docker compose up --build

Then open http://localhost:8080: the live game frame on the left and the decision telemetry (action, latency, the model's reasoning) on the right.

To stop:

docker compose down

Configuration

Knobs live in .env (loaded by Compose) and agent/config.yaml (loaded by the agent). The most useful:

Setting	Where	Purpose
`MODEL`	`.env`	Ollama tag of the multimodal model that decides actions
`OLLAMA_HOST`	`.env`	URL the agent uses to reach Ollama
`SCENARIO`	`.env`	VizDoom scenario (`basic.cfg`, `deadly_corridor.cfg`, `full_game.cfg`, …)
`sampling.think`	`agent/config.yaml`	Let the model reason before acting (keep `num_predict` large)
`sampling.num_predict`	`agent/config.yaml`	Token budget for the reasoning trace plus the action word
`image_tokens`	`agent/config.yaml`	Estimated tokens per frame; sizes `num_ctx` so traces aren't truncated
`vision.frames`	`agent/config.yaml`	How many recent frames the model sees each tick
`tic_skip`	`agent/config.yaml`	Game tics advanced per agent decision
`memory.recent_actions`	`agent/config.yaml`	How many recent actions are fed back to the model

The agent reads its config at startup; apply changes with docker compose restart agent.

Health checks

After the stack is up, verify the agent is actually playing (not standing still or only shooting):

docker compose exec -T agent python -m doomma.probes.behavior --duration 60

The probe observes the live VizDoom state plus the agent's JSONL decision log and exits non-zero if movement stalls, decisions stop, parse failures spike, or the action stream collapses to a single action.

To verify the browser-facing delivery path too:

docker compose exec -T agent python -m doomma.probes.dashboard --duration 10

This checks that /stream.mjpg is delivering frames and /telemetry is streaming WebSocket messages.

Demo capture

Two capture tools produce shareable artifacts from the live dashboard:

scripts/capture_demo_gif.sh             # the dashboard-playback GIF above
scripts/capture_reel.sh                 # a vertical, scored MP4 for social

_{The vertical reel's title card, rendered by scripts/capture_reel.sh.}

Development

A single tool, ruff, formats and lints (Google style); mypy --strict type-checks the three services; pytest runs the suite. Set up a local environment with uv:

uv venv --python 3.11 .venv
uv pip install -e . --group dev          # the doomma package + dev toolchain
# Runtime deps the tests import (VizDoom itself is never needed locally —
# the tests only touch its dependency-free action module):
uv pip install -r agent/requirements.txt -r dashboard/requirements.txt

.venv/bin/ruff format .
.venv/bin/ruff check .
.venv/bin/mypy
.venv/bin/pytest

These run without Docker, Ollama, or VizDoom. The services depend on ports (protocols) that the tests satisfy with in-memory fakes.

Project layout

.
├── agent/                     # the brain
│   ├── config.yaml            # gameplay tunables
│   ├── prompts/agent.txt      # the one system prompt
│   └── doomma/                # the package (see AGENTS.md / CLAUDE.md)
├── doom_env/                  # the VizDoom ZMQ server (Compose service: vizdoom)
│   ├── actions.py             # action space + button mapping (dependency-free)
│   └── server.py
├── dashboard/                 # FastAPI hub + single-page UI
├── scripts/                   # host-side demo/reel capture tools
├── tests/                     # pytest suite
└── docker-compose.yml

See AGENTS.md for how the system behaves and CLAUDE.md for the architecture and conventions.

Linux notes

On Linux with NVIDIA you can run Ollama in a container too. Replace OLLAMA_HOST=http://host.docker.internal:11434 with the container address and add the official ollama/ollama image to docker-compose.yml. Everything else is portable.

Troubleshooting

Agent logs connection refused to Ollama. Ollama isn't running, or isn't on 0.0.0.0. Restart with OLLAMA_HOST=0.0.0.0:11434 ollama serve.
Decisions are very slow. Check the model is warm with ollama ps. If it keeps unloading, keep_alive isn't being applied (see AGENTS.md §2.6).
Dashboard MJPEG stream is blank. The agent hasn't started pushing frames yet (a few seconds after up). Check docker compose logs agent for VizDoom connection errors.
VizDoom exits with Could not find … wad. Use one of the bundled scenarios (see doom_env/scenarios/README.md).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

doomma

How it works

Prerequisites

Quick start

Configuration

Health checks

Demo capture

Development

Project layout

Linux notes

Troubleshooting

About

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
agent		agent
dashboard		dashboard
doom_env		doom_env
media		media
scripts		scripts
tests		tests
.env.example		.env.example
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

doomma

How it works

Prerequisites

Quick start

Configuration

Health checks

Demo capture

Development

Project layout

Linux notes

Troubleshooting

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors

Uh oh!

Languages