🤖 Inspect Robots

The Inspect AI for robotics

An open-source evaluation framework for physical AI and VLA (vision-language-action) models.

Define a robotics benchmark once, then run any policy against any compatible embodiment — a real robot or a simulator — with reproducible logs and first-class Rerun visualization.

Documentation · Quickstart · Concepts · For LLMs

One framework, two swappable inputs

LLM evaluations have a single swappable input: the model. Robotics evaluations have two — and Inspect Robots makes both first-class and orthogonal:


🧠 `Policy` — the VLA	The "brain". Maps an observation + instruction to an action chunk (a horizon of actions executed open-loop, as π0 / ACT / diffusion policies do).
🦾 `Embodiment` — the robot or sim	The "body + world". Produces observations, executes actions, owns the action/observation spaces and control rate. Real-robot-first; sims are a stricter special case.
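
As a rough illustration of this split, the sketch below models the two inputs as Python protocols and executes each action chunk open-loop: the policy is queried once per chunk, not once per step. Every name in it (`Policy.act`, `Embodiment.reset`, `Embodiment.step`, `ToyPolicy`, `ToyEmbodiment`, `rollout`) is an assumption made for this example and is not taken from the Inspect Robots API.

```python
from typing import Protocol, Sequence

Observation = list[float]
Action = list[float]


class Policy(Protocol):
    """Hypothetical 'brain': observation + instruction -> chunk of actions."""

    def act(self, observation: Observation, instruction: str) -> Sequence[Action]: ...


class Embodiment(Protocol):
    """Hypothetical 'body + world': produces observations, executes actions."""

    def reset(self, seed: int) -> Observation: ...

    def step(self, action: Action) -> Observation: ...


class ToyPolicy:
    """Returns a fixed-length chunk that moves the state toward the origin."""

    def __init__(self, horizon: int = 4) -> None:
        self.horizon = horizon
        self.calls = 0  # how many times the policy was queried

    def act(self, observation: Observation, instruction: str) -> Sequence[Action]:
        self.calls += 1
        step = [-x / self.horizon for x in observation]
        return [step] * self.horizon


class ToyEmbodiment:
    """A 2-D point whose state is also its observation."""

    def reset(self, seed: int) -> Observation:
        self.state = [float(seed), float(-seed)]
        return list(self.state)

    def step(self, action: Action) -> Observation:
        self.state = [s + a for s, a in zip(self.state, action)]
        return list(self.state)


def rollout(policy: Policy, embodiment: Embodiment, instruction: str,
            seed: int, max_steps: int) -> Observation:
    """Run one episode, executing each chunk without re-querying the policy."""
    obs = embodiment.reset(seed)
    steps = 0
    while steps < max_steps:
        chunk = policy.act(obs, instruction)  # one query per chunk
        for action in chunk:
            if steps >= max_steps:
                break
            obs = embodiment.step(action)  # open-loop: no replanning here
            steps += 1
    return obs
```

Because neither class knows about the other, any object satisfying `Policy` can be paired with any object satisfying `Embodiment`, which is the orthogonality the table describes.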

A Task is defined independently of both: it is a dataset of Scenes (initial conditions, instructions, success targets) plus scorers. Before any rollout, Inspect Robots checks that the (policy, embodiment) pair is compatible (action/observation spaces, semantics, control rate, scene realizability) and fails fast if it is not.
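
A fail-fast compatibility check of this kind might look like the sketch below. It compares only three properties and reports every mismatch in one error, so a bad pairing is rejected before any robot moves. `ActionSpec`, `IncompatibleError`, and `check_compatible` are hypothetical names for illustration; the real check covers more (observation spaces, scene realizability) and its API may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSpec:
    """Hypothetical description of what a component emits or accepts."""

    dim: int
    control_mode: str   # e.g. "joint_position" or "ee_delta" (illustrative values)
    control_hz: float


class IncompatibleError(ValueError):
    """Raised before any rollout when a pair cannot work together."""


def check_compatible(policy: ActionSpec, embodiment: ActionSpec) -> None:
    """Collect every mismatch, then fail once with a complete message."""
    problems = []
    if policy.dim != embodiment.dim:
        problems.append(f"action dim {policy.dim} != {embodiment.dim}")
    if policy.control_mode != embodiment.control_mode:
        problems.append(
            f"control mode {policy.control_mode!r} != {embodiment.control_mode!r}"
        )
    if policy.control_hz != embodiment.control_hz:
        problems.append(
            f"control rate {policy.control_hz} Hz != {embodiment.control_hz} Hz"
        )
    if problems:
        raise IncompatibleError("; ".join(problems))
```

Reporting all problems at once, instead of stopping at the first, saves a round trip when several properties disagree.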

Install

pip install inspect-robots            # core (numpy only)
pip install "inspect-robots[rerun]"   # + Rerun visualization

Quickstart

No hardware or simulator needed — the dependency-free CubePick mock world exercises the whole stack:

from inspect_robots import eval
from inspect_robots.mock import CubePickEmbodiment, ScriptedPolicy
from inspect_robots.scene import Scene
from inspect_robots.scorer import success_at_end
from inspect_robots.task import Task

task = Task(
    name="cubepick-reach",
    scenes=[Scene(id=f"layout-{i}", instruction="reach the cube", init_seed=i) for i in range(5)],
    scorer=success_at_end(),
    max_steps=80,
)

# The two swappable inputs: a policy (VLA) and an embodiment (robot/sim).
(log,) = eval(task, ScriptedPolicy(), CubePickEmbodiment())
print(log.status, log.results.metrics)   # success {'success_at_end': 1.0}

…or from the command line (components resolve from a registry):

inspect-robots list                                          # registered components
inspect-robots run --task cubepick-reach --policy scripted --embodiment cubepick
inspect-robots inspect logs/cubepick-reach_*.json            # results table
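
To give a feel for how a name such as `scripted` could resolve to a component, here is a minimal dictionary-based registry. It is a sketch under assumptions: `register`, `resolve`, `_REGISTRY`, and `DemoScriptedPolicy` are invented for this example, and the actual package relies on its own decorators and entry points.

```python
from typing import Callable

# Maps (kind, name) -> factory; kind is e.g. "policy" or "embodiment".
_REGISTRY: dict[tuple[str, str], Callable[[], object]] = {}


def register(kind: str, name: str):
    """Decorator that records a factory under (kind, name)."""

    def decorator(factory):
        key = (kind, name)
        if key in _REGISTRY:
            raise ValueError(f"{kind} {name!r} is already registered")
        _REGISTRY[key] = factory
        return factory

    return decorator


def resolve(kind: str, name: str) -> object:
    """Instantiate the component registered under (kind, name)."""
    try:
        factory = _REGISTRY[(kind, name)]
    except KeyError:
        known = sorted(n for k, n in _REGISTRY if k == kind)
        raise KeyError(f"unknown {kind} {name!r}; registered: {known}") from None
    return factory()


@register("policy", "scripted")
class DemoScriptedPolicy:
    """Placeholder component used only to exercise the registry."""
```

With this in place, `resolve("policy", "scripted")` returns a `DemoScriptedPolicy` instance, and an unknown name fails with a message that lists what is registered.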

Why Inspect Robots

🌍 Real-world first. Interfaces assume real-robot reality — human-in-the-loop reset, no privileged success oracle, wall-clock control rate. Simulators just offer more (seeding, privileged success, rendering) via opt-in capabilities.
🔁 Reproducible. Every run yields an immutable, schema-versioned EvalLog with the resolved config, git revision, and package versions — re-readable across releases, and re-scorable offline.
🪶 Light core. Depends only on NumPy. Rerun and simulator/VLA backends are optional extras and separately installable plugins.
🛑 Safe unattended. An explicit error taxonomy separates "record and continue" from "halt and require a human", so a faulted robot never auto-advances overnight.
🎞️ Rerun visualization. Stream camera images, 3D poses, joint/action time-series, and success markers to a .rrd recording.
🧩 Pluggable. Ship inspect-robots-maniskill or inspect-robots-openvla as separate packages — entry points make them appear in inspect-robots list automatically.
⚙️ VLA-native. Action chunking, open-loop execution, and ACT/ALOHA temporal ensembling are built in, with action semantics (control mode, rotation representation, gripper, frame) that make compatibility and ensembling correct.
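
The temporal ensembling mentioned in the last item can be sketched as follows. When overlapping chunks each contain a prediction for the current timestep, those predictions are combined with exponentially decaying weights `exp(-m * i)`, where `i = 0` is the oldest prediction, as in the ACT formulation. The function name `temporal_ensemble`, the `chunks` layout, and the default `m` are assumptions for this example, not the library's implementation.

```python
import math


def temporal_ensemble(chunks: dict[int, list[list[float]]], t: int,
                      m: float = 0.1) -> list[float]:
    """Blend every prediction made for timestep t.

    chunks maps the timestep at which a chunk was predicted to that chunk,
    whose k-th entry is the action intended for timestep (start + k).
    """
    candidates = []
    for start in sorted(chunks):  # oldest chunk first
        offset = t - start
        if 0 <= offset < len(chunks[start]):
            candidates.append(chunks[start][offset])
    if not candidates:
        raise ValueError(f"no chunk covers timestep {t}")

    # Weight i = 0 belongs to the oldest prediction; larger m favors it more.
    weights = [math.exp(-m * i) for i in range(len(candidates))]
    total = sum(weights)
    dim = len(candidates[0])
    return [
        sum(w * c[d] for w, c in zip(weights, candidates)) / total
        for d in range(dim)
    ]
```

A plain weighted mean is only meaningful for action components that can be averaged linearly. Rotations, for instance, generally need representation-aware handling, which is presumably why the feature list ties ensembling to action semantics.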

How it maps to Inspect AI

If you know Inspect AI, you already know Inspect Robots.

Inspect AI	Inspect Robots
`Model`	`Policy` (VLA) + `Embodiment` (two inputs)
`Task = dataset + solver + scorer`	`Task = scenes + controller + scorer`
`Sample`	`Scene`
`Solver` chain	`Controller` middleware (chunking, ensembling, smoothing)
`eval()` → `EvalLog`	`eval()` → `EvalLog`
`@task` / `@solver` / `@scorer` + registry	`@task` / `@policy` / `@embodiment` / `@scorer` + entry points
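
The `Controller` middleware row can be pictured as a chain of small action transforms, much like a solver chain. The sketch below composes a clipping stage and an exponential smoothing stage. `clip`, `smooth`, and `chain` are hypothetical helpers written for this example; they are not names from Inspect Robots.

```python
from typing import Callable, Optional

Action = list[float]
Stage = Callable[[Action], Action]


def clip(limit: float) -> Stage:
    """Stage that bounds every component to [-limit, limit]."""

    def apply(action: Action) -> Action:
        return [max(-limit, min(limit, a)) for a in action]

    return apply


def smooth(alpha: float) -> Stage:
    """Stateful stage: exponential moving average over successive actions."""
    previous: Optional[Action] = None

    def apply(action: Action) -> Action:
        nonlocal previous
        if previous is None:
            previous = list(action)  # first action passes through unchanged
        else:
            previous = [alpha * a + (1 - alpha) * p
                        for a, p in zip(action, previous)]
        return list(previous)

    return apply


def chain(*stages: Stage) -> Stage:
    """Compose stages left to right into a single transform."""

    def apply(action: Action) -> Action:
        for stage in stages:
            action = stage(action)
        return action

    return apply
```

For example, `chain(clip(1.0), smooth(0.5))` first bounds a raw policy output and then blends it with the previously emitted action, so each stage stays independently testable.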

This repository is the framework (the "Inspect AI for robotics"). Concrete benchmarks (the "Inspect Evals for robotics") and backend adapters live in separate plugin packages.

Documentation

Full guides and an auto-generated API reference live at inspectrobots.org. LLM-friendly versions: llms.txt and llms-full.txt.

Development

uv venv && uv pip install -e ".[dev]"
uv run pre-commit install          # ruff + mypy on commit, 100% coverage on push
uv run pytest --cov                 # 100% coverage required
uv run ruff check . && uv run mypy

Pre-commit hooks and a blocking CI coverage gate keep main green. See CONTRIBUTING.md and the design docs in plans/.

License

MIT
