World Model Harness

wmh makes it easy to go from agent traces to a faithful replica of the production environment your agents run in.

In short, an LLM plays the role of the virtual machine that executes your agent's instructions, and it runs 5x faster than a real sandbox.

To get started:

git clone https://github.com/experientiallabs/world-model-harness
cd world-model-harness
uv sync
uv run wmh build

The build command opens a wizard that walks you through creating your own world model from your traces.

Below is a comparison running 8 SWE-bench tasks: real sandboxes on the left, a world model acting as the sandbox on the right.

How it works

A frontier LLM acts as the environment your agent steps against, reconstructed from your own OpenTelemetry traces. The design is inspired by Qwen-AgentWorld (LLM-as-environment), GEPA (reflective prompt evolution), and DreamGym (retrieval over a trace replay buffer), but it requires zero training: we get there with prompt optimization on a frontier model.

Build from your OTel traces: ingest → normalize → split train/held-out → index a replay buffer → evolve the env prompt with GEPA against the held-out split.
Serve: agents call WorldModel.step(action) (in-process or via the local HTTP backend). Each step retrieves the most similar past (state, action) → observation examples and predicts the next observation.
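
The retrieval half of a step can be pictured with a small, self-contained sketch. This is not wmh's implementation: the `Transition` record, the token-overlap similarity, and the prompt layout are all assumptions chosen to keep the example dependency-free, and a real system would more likely use embeddings. It only illustrates the idea of finding the most similar past (state, action) → observation examples and placing them in front of the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One recorded step (hypothetical shape): state, action, resulting observation."""
    state: str
    action: str
    observation: str


def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens; a crude stand-in for a real embedding."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap in [0, 1]; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(buffer: list[Transition], state: str, action: str, k: int = 2) -> list[Transition]:
    """Return the k recorded transitions most similar to the query (state, action)."""
    query = tokens(f"{state} {action}")
    ranked = sorted(
        buffer,
        key=lambda t: jaccard(query, tokens(f"{t.state} {t.action}")),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(examples: list[Transition], state: str, action: str) -> str:
    """Lay out retrieved examples as few-shot context, ending with the open query."""
    lines = ["Predict the next observation."]
    for ex in examples:
        lines.append(f"state: {ex.state} | action: {ex.action} -> observation: {ex.observation}")
    lines.append(f"state: {state} | action: {action} -> observation:")
    return "\n".join(lines)
```

The prompt returned by `build_prompt` is what would be handed to the LLM in this sketch; the evolved environment prompt described above would take the place of the fixed first line.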

Try it

uv run wmh examples list          # swe-bench, tau-bench, terminal-tasks
uv run wmh eval list              # eval suites shipped with the examples
uv run wmh eval run tau-bench     # replay + score reconstruction fidelity
uv run wmh play                   # step into the environment yourself
uv run wmh serve                  # local HTTP backend on :8000

Example-local prebuilt models live under examples/<task>/models/; pass --root examples/<task> to wmh list, wmh demo, wmh play, or wmh serve to use one without rebuilding.

Use it as an API

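from wmh import Action, ActionKind
from wmh.config.store import WorldModelStore
from wmh.engine.loader import load_world_model

# Resolve the built model named "airline" inside the project's .wmh store.
model_dir = WorldModelStore(".wmh").resolve("airline")
# The loader also returns the provider; it is not needed here.
wm, _provider = load_world_model(model_dir)

# Open a session for one task, then step it with a single tool call.
session = wm.new_session(task="check out the cart")
obs = wm.step(session.id, Action(kind=ActionKind.TOOL_CALL, name="add_to_cart",
                                 arguments={"sku": "A1"}))
print(obs.content)  # the observation predicted by the world model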

Or over HTTP (same code path), namespaced by model name: GET /world_models, then POST /world_models/{name}/sessions and POST /world_models/{name}/sessions/{id}/step.
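
A minimal client for those routes might look like the sketch below, using only the standard library. The paths and the :8000 port come from this README; the JSON request and response shapes (a `task` field, an `id` in the session response, and the action payload) are assumptions that mirror the Python API and should be checked against the running server.

```python
import json
from urllib import request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # `wmh serve` listens on :8000


def sessions_url(model: str, base: str = BASE_URL) -> str:
    """URL for creating a session on the named world model."""
    return f"{base}/world_models/{quote(model, safe='')}/sessions"


def step_url(model: str, session_id: str, base: str = BASE_URL) -> str:
    """URL for stepping an existing session."""
    return f"{sessions_url(model, base)}/{quote(session_id, safe='')}/step"


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    body = json.dumps(payload).encode("utf-8")
    req = request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


def demo() -> None:
    """Requires a running server. Payload and response fields are assumed, not confirmed."""
    session = post_json(sessions_url("airline"), {"task": "check out the cart"})
    action = {"kind": "tool_call", "name": "add_to_cart", "arguments": {"sku": "A1"}}
    obs = post_json(step_url("airline", session["id"]), action)
    print(obs)
```

Call `demo()` only after starting the backend with uv run wmh serve; nothing in the block touches the network at import time.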

Providers

One interface, four backends, verified on startup. Credentials are read from the environment:

Provider	Model	Env vars
Anthropic	Claude Opus	`ANTHROPIC_API_KEY`
AWS Bedrock	Claude Opus	`AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
Azure OpenAI	GPT	`AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`
OpenAI	GPT	`OPENAI_API_KEY`
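
Before launching, it can be handy to check which providers have their credentials set. The sketch below is a hypothetical pre-flight helper, not wmh's own startup verification; it only encodes the environment variables listed in the table above, and the function names are made up for illustration.

```python
import os
from collections.abc import Mapping

# Environment variables per provider, copied from the table above.
REQUIRED_ENV: dict[str, tuple[str, ...]] = {
    "Anthropic": ("ANTHROPIC_API_KEY",),
    "AWS Bedrock": ("AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
    "Azure OpenAI": ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"),
    "OpenAI": ("OPENAI_API_KEY",),
}


def missing_vars(provider: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Names of required variables that are unset or empty for this provider."""
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Providers whose required variables are all present."""
    return [p for p in REQUIRED_ENV if not missing_vars(p, env)]
```

Passing a plain dict as `env` makes the helper easy to test without touching the real environment.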

Development

Managed with uv; linting/formatting with ruff; type checking with ty. Conventions live in AGENTS.md.

uv sync --extra dev      # env + dev tools
uv run ruff check .      # lint
uv run ruff format .     # format
uv run ty check          # type check
uv run pytest -q         # tests

Usage telemetry

wmh collects anonymous telemetry to measure usage volume. Telemetry is strictly metadata. It never includes prompts, traces, actions, observations, file paths, model names, provider credentials, or raw user content.

Telemetry is enabled by default. To opt out for a project:

uv run wmh config telemetry disable

This writes .wmh/settings.toml. You can re-enable it with uv run wmh config telemetry enable, check the current setting with uv run wmh config telemetry status, or disable it for a process with DO_NOT_TRACK=1 or WMH_TELEMETRY=0.
