Skip to content

mseeks/doomma

Repository files navigation

doomma

🤖 Agentically developed. Agentic development patterns were used in the creation of this project, but with a human still at the wheel.

A local, multimodal AI agent that plays DOOM by letting a single Gemma 4 model look at the game frame and pick the next action, every tick, with no heuristic scaffolding. A web dashboard gives live observation and debugging.

doomma's live dashboard: the DOOM frame on the left; the model's reasoning and per-tick decisions on the right
The live dashboard: the game on the left; the model's reasoning and its per-tick decisions on the right.

There is exactly one model in the loop: it sees a short sequence of recent frames plus a tiny HUD, reasons briefly, and emits one action word. Everything spatial (enemies, walls, aim, distance) is read from the pixels by the model itself.

How it works

Three containers, plus Ollama on the host:

browser ──► dashboard ◄── agent ──► vizdoom (doom_env)
                            │
                            └─► host.docker.internal:11434 (Ollama on host)
  • doom_env (doom_env/) runs VizDoom headless in synchronous PLAYER mode and exposes a ZMQ REQ/REP server. Frames are JPEG-encoded; the reply also carries the HUD game variables.
  • agent (agent/) is the brain (the doomma package). It polls the environment, builds a tiny context (recent frames + HUD + recent-action trail), makes one Gemma call per tick, steps the game, and ships telemetry to the dashboard.
  • dashboard (dashboard/) is the FastAPI hub. The agent sends JSON telemetry over an internal WebSocket and JPEG frames; browsers receive telemetry over WebSocket and frames over MJPEG.
  • Ollama runs natively on the host (Metal-accelerated). Containers reach it via host.docker.internal:11434.

Prerequisites

  • macOS on Apple Silicon (M-series). Linux works too; on Linux you can run Ollama in a container if you have an NVIDIA GPU (see Linux notes below).

  • Colima as the Docker engine host (with Compose v2):

    brew install colima docker docker-compose
    colima start --cpu 4 --memory 8 --disk 30

    Colima exposes host.docker.internal to containers automatically, so the agent can reach Ollama on the host without extra setup.

  • Ollama installed natively on the host, not in a container (the Linux VM that Colima runs cannot access Apple's Metal GPU):

    brew install ollama
    OLLAMA_HOST=0.0.0.0:11434 ollama serve    # listen on all interfaces
  • Pull one multimodal model. The default targets Gemma 4 (e4b); until that tag is available locally, swap in any multimodal model you have (e.g. gemma3:12b, gemma3:27b). Bigger = better decisions but slower. The tag is configurable via the MODEL env var (see .env.example).

    ollama pull gemma4:e4b   # or: ollama pull gemma3:12b

Quick start

cp .env.example .env       # edit if you want different model tags or scenario
docker compose up --build

Then open http://localhost:8080: the live game frame on the left and the decision telemetry (action, latency, the model's reasoning) on the right.

To stop:

docker compose down

Configuration

Knobs live in .env (loaded by Compose) and agent/config.yaml (loaded by the agent). The most useful:

Setting Where Purpose
MODEL .env Ollama tag of the multimodal model that decides actions
OLLAMA_HOST .env URL the agent uses to reach Ollama
SCENARIO .env VizDoom scenario (basic.cfg, deadly_corridor.cfg, full_game.cfg, …)
sampling.think agent/config.yaml Let the model reason before acting (keep num_predict large)
sampling.num_predict agent/config.yaml Token budget for the reasoning trace plus the action word
image_tokens agent/config.yaml Estimated tokens per frame; sizes num_ctx so traces aren't truncated
vision.frames agent/config.yaml How many recent frames the model sees each tick
tic_skip agent/config.yaml Game tics advanced per agent decision
memory.recent_actions agent/config.yaml How many recent actions are fed back to the model

The agent reads its config at startup; apply changes with docker compose restart agent.

Health checks

After the stack is up, verify the agent is actually playing (not standing still or only shooting):

docker compose exec -T agent python -m doomma.probes.behavior --duration 60

The probe observes the live VizDoom state plus the agent's JSONL decision log and exits non-zero if movement stalls, decisions stop, parse failures spike, or the action stream collapses to a single action.

To verify the browser-facing delivery path too:

docker compose exec -T agent python -m doomma.probes.dashboard --duration 10

This checks that /stream.mjpg is delivering frames and /telemetry is streaming WebSocket messages.

Demo capture

Two capture tools produce shareable artifacts from the live dashboard:

scripts/capture_demo_gif.sh             # the dashboard-playback GIF above
scripts/capture_reel.sh                 # a vertical, scored MP4 for social

doomma reel title card: 'A local open model is now playing DOOM'
The vertical reel's title card, rendered by scripts/capture_reel.sh.

Development

A single tool, ruff, formats and lints (Google style); mypy --strict type-checks the three services; pytest runs the suite. Set up a local environment with uv:

uv venv --python 3.11 .venv
uv pip install -e . --group dev          # the doomma package + dev toolchain
# Runtime deps the tests import (VizDoom itself is never needed locally —
# the tests only touch its dependency-free action module):
uv pip install -r agent/requirements.txt -r dashboard/requirements.txt

.venv/bin/ruff format .
.venv/bin/ruff check .
.venv/bin/mypy
.venv/bin/pytest

These run without Docker, Ollama, or VizDoom. The services depend on ports (protocols) that the tests satisfy with in-memory fakes.

Project layout

.
├── agent/                     # the brain
│   ├── config.yaml            # gameplay tunables
│   ├── prompts/agent.txt      # the one system prompt
│   └── doomma/                # the package (see AGENTS.md / CLAUDE.md)
├── doom_env/                  # the VizDoom ZMQ server (Compose service: vizdoom)
│   ├── actions.py             # action space + button mapping (dependency-free)
│   └── server.py
├── dashboard/                 # FastAPI hub + single-page UI
├── scripts/                   # host-side demo/reel capture tools
├── tests/                     # pytest suite
└── docker-compose.yml

See AGENTS.md for how the system behaves and CLAUDE.md for the architecture and conventions.

Linux notes

On Linux with NVIDIA you can run Ollama in a container too. Replace OLLAMA_HOST=http://host.docker.internal:11434 with the container address and add the official ollama/ollama image to docker-compose.yml. Everything else is portable.

Troubleshooting

  • Agent logs connection refused to Ollama. Ollama isn't running, or isn't on 0.0.0.0. Restart with OLLAMA_HOST=0.0.0.0:11434 ollama serve.
  • Decisions are very slow. Check the model is warm with ollama ps. If it keeps unloading, keep_alive isn't being applied (see AGENTS.md §2.6).
  • Dashboard MJPEG stream is blank. The agent hasn't started pushing frames yet (a few seconds after up). Check docker compose logs agent for VizDoom connection errors.
  • VizDoom exits with Could not find … wad. Use one of the bundled scenarios (see doom_env/scenarios/README.md).

About

A local, multimodal AI agent that plays DOOM by letting a single Gemma 4 model look at the game frame and pick the next action, every tick, with no heuristic scaffolding. A web dashboard gives live observation and debugging.

Topics

Resources

License

Stars

Watchers

Forks

Contributors