strands-robots v0.4.0

150+ commits since v0.3.8 (Feb 20). This is the release where strands-robots
stops being "policy inference glue for a single arm on your desk" and becomes a
platform you can simulate on, evaluate on, and deploy to a fleet, without
swapping libraries at each stage. The through-line of this cycle was closing the gap
between "it runs a model" and "it runs a robot you'd trust in a room with people."

Python >=3.12 required (LeRobot >=0.5.0 floor).
Install: pip install strands-robots
Everything: pip install "strands-robots[all]"

The story of this release

Three forces drove almost every PR:

The sim-to-real loop was broken in the middle. You could load a policy and you
could talk to hardware, but there was no first-class simulator in between - so you
couldn't develop a policy without a physical robot, and you couldn't reproduce a
failure without the exact bench setup. We built a MuJoCo backend and a Robot()
factory so the same code path runs in sim and on metal.

"One robot" assumptions don't survive contact with a fleet. The moment more than
one robot, or more than one operator, is involved, you need identity, presence,
replay protection, an audit trail, and a human able to hit stop. We built the mesh
and its AWS IoT transport around that reality, and spent a large fraction of the
cycle hardening it rather than adding surface.

"It produced an action array" is not "it succeeded." We added a real evaluation
protocol (LIBERO) so policy quality is a number we can defend, not a vibe from
watching a render.

Everything below ladders up to one of those three.

Simulation: so you can develop without a robot on the bench

Why: Iterating on a policy against physical hardware is slow, unsafe, and
unreproducible. A bug that only shows up at 50 Hz on a real servo is nearly impossible
to bisect. We needed a simulator faithful enough to trust and that
exposes the same agent-tool surface as the rest of the library.

MuJoCo backend as an AgentTool with 50+ actions (#85) on a foundation of
models, the Policy/engine ABCs, a factory, a model registry, and asset
auto-download (#84, #105/#106). The agent drives the world the same way it drives a
real robot - world building, robot inject/eject, cameras, stepping, rollout, render.
Control-rate substepping (#353) and stepping physics for the full control
period in eval (#429). Why it matters: position-servo policies were silently
failing to track because we stepped the sim once per action instead of for the whole
control interval - the policy looked broken when the integration was. This is the
kind of bug that costs a week on hardware; we paid it down once, in sim.
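
A toy sketch of the substepping bug, assuming a unit-mass 1-D joint under a PD position servo, a 2 ms physics step, and a 50 Hz policy. PHYSICS_DT, CONTROL_DT, KP, KD, physics_step and simulate are hypothetical names for illustration, not the MuJoCo backend's code:

```python
# Toy 1-D position servo. Not the strands-robots MuJoCo backend: it only shows
# why the sim must advance a full control period for every policy action.
PHYSICS_DT = 0.002    # physics timestep in seconds (assumed)
CONTROL_DT = 0.02     # control period in seconds, i.e. a 50 Hz policy (assumed)
KP, KD = 100.0, 20.0  # PD gains of the simulated position servo (assumed)


def physics_step(pos: float, vel: float, target: float) -> tuple[float, float]:
    """Advance a unit-mass joint by one physics timestep under PD control."""
    acc = KP * (target - pos) - KD * vel
    vel += acc * PHYSICS_DT  # semi-implicit Euler: update velocity first
    pos += vel * PHYSICS_DT
    return pos, vel


def simulate(target: float, control_ticks: int, substep: bool) -> float:
    """Hold a constant position target for `control_ticks` policy actions."""
    n_sub = round(CONTROL_DT / PHYSICS_DT) if substep else 1
    pos, vel = 0.0, 0.0
    for _ in range(control_ticks):
        for _ in range(n_sub):  # one action covers n_sub physics steps
            pos, vel = physics_step(pos, vel, target)
    return pos


if __name__ == "__main__":
    ticks = 50  # one second of policy time at 50 Hz
    print("once per action:", round(simulate(1.0, ticks, substep=False), 3))
    print("full period:    ", round(simulate(1.0, ticks, substep=True), 3))
```

Holding a target of 1.0 for 50 policy ticks, the once-per-action variant advances only 0.1 s of simulated time and ends roughly a quarter of the way there, while the full-period variant advances the intended 1.0 s and settles on the target. From the policy's side, the first case looks exactly like a policy that cannot track.
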
Render fidelity fixes - blown-out white ground (#428), ground-plane z-fighting on
attach (#360), conditional ground-strip + tendon scale + RNG parity (#400).
Why it matters: renders are the policy's input in vision models. A blown-out
frame isn't cosmetic - it's feeding the policy garbage and poisoning eval.
Naming/identity correctness: register sim robots under the user's name, not the
canonical one (#435), structured errors from resolvers (#417), optional robot_name
across the state family (#412). Why: the sim has to address robots the way the user
thinks about them, or multi-robot scenes become a guessing game.
SimEngine.describe() discovery surface (#407) - so an agent can ask "what can I do
here?" instead of failing on an unknown action key. We also started surfacing valid
actuator names on errors and propagating failures into run_policy status (#436),
because a silent zero-action on failure is the single most dangerous default in robotics.
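
A minimal sketch of the discover-then-fail-loud pattern, assuming a hypothetical ToySimEngine with a table of named actions. ToySimEngine, UnknownActuatorError, dispatch and toy_run_policy are illustrative stand-ins, not the real SimEngine.describe() or run_policy API:

```python
from typing import Any, Callable


class UnknownActuatorError(KeyError):
    """Raised instead of silently substituting a zero action."""


class ToySimEngine:
    """Hypothetical engine: named actuators plus a table of agent-callable actions."""

    def __init__(self, actuators: list[str]) -> None:
        self.actuators = list(actuators)
        self.targets = {name: 0.0 for name in self.actuators}
        self._actions: dict[str, Callable[..., Any]] = {
            "set_target": self._set_target,
            "get_state": self._get_state,
        }

    def describe(self) -> dict[str, list[str]]:
        """Discovery surface: answers "what can I do here?" before anything fails."""
        return {"actions": sorted(self._actions), "actuators": list(self.actuators)}

    def dispatch(self, action: str, **kwargs: Any) -> Any:
        if action not in self._actions:
            raise KeyError(f"unknown action {action!r}; valid: {sorted(self._actions)}")
        return self._actions[action](**kwargs)

    def _set_target(self, actuator: str, value: float) -> None:
        if actuator not in self.targets:
            # Name the valid actuators so the agent can correct itself.
            raise UnknownActuatorError(
                f"unknown actuator {actuator!r}; valid: {self.actuators}")
        self.targets[actuator] = value

    def _get_state(self) -> dict[str, float]:
        return dict(self.targets)


def toy_run_policy(engine: ToySimEngine, actions: list[dict[str, float]]) -> dict[str, Any]:
    """Apply actions in order; a failure ends the run and is reported in the status."""
    for step, action in enumerate(actions):
        try:
            for actuator, value in action.items():
                engine.dispatch("set_target", actuator=actuator, value=value)
        except KeyError as exc:
            return {"status": "error", "step": step, "error": str(exc)}
    return {"status": "success", "steps": len(actions)}
```

A bad actuator name such as "wrist" comes back as an error status carrying the valid names, rather than as a robot that quietly holds still.
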

Robot() factory: one entry point

Why: Before this, choosing sim vs. real meant choosing a different code path, and
the path that touches a physical motor was just as easy to invoke by accident as the
safe one. That's backwards.

Robot() factory + top-level lazy imports (#86; hygiene follow-ups #145).
Sim is the default; real hardware is an explicit opt-in (mode="real" /
STRANDS_ROBOT_MODE=real). Lazy imports keep import strands_robots cheap so the
factory doesn't drag the entire ML stack into a process that just wants to talk to a
serial port.
Ergonomics for send_action, add_robot, render, get_robot (#431) - the
paper cuts that make the difference between an API you demo and an API you live in.
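
The sim-by-default rule can be expressed as a small resolution function. This is a hypothetical sketch, not the factory's source: resolve_mode is an illustrative name, and the precedence (explicit argument, then STRANDS_ROBOT_MODE, then sim) and case-insensitive matching are assumptions:

```python
import os
from collections.abc import Mapping
from typing import Optional

VALID_MODES = ("sim", "real")


def resolve_mode(mode: Optional[str] = None,
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Return "sim" or "real".

    Assumed precedence: explicit argument, then STRANDS_ROBOT_MODE, then "sim".
    Real hardware is never a fallback: unset means sim, unknown values raise.
    """
    env = os.environ if env is None else env
    raw = mode if mode is not None else (env.get("STRANDS_ROBOT_MODE") or "sim")
    chosen = raw.strip().lower()
    if chosen not in VALID_MODES:
        raise ValueError(f"unknown robot mode {raw!r}; expected one of {VALID_MODES}")
    return chosen
```

The design choice worth copying is the failure direction: a typo such as mode="hardware" raises instead of guessing, and nothing short of an explicit "real" reaches a motor.
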

Mesh + AWS IoT: because a fleet is a security problem, not a networking problem

Why: Connecting robots is easy. Connecting robots safely - where a stale command
can't be replayed, a compromised peer can't impersonate another, an operator can always
intervene, and every actuation is on an audit trail - is the actual job. We treated the
mesh as a trust boundary from day one, which is why the hardening PRs outnumber the
feature PRs here.

Core mesh - session, presence, RPC, streams, wiring, and the AWS IoT transport (#101).
Zenoh for the LAN, MQTT5/mTLS for the cloud, Device Shadow mirror, S3 camera offload,
account-wide Fleet Provisioning.
The #195 hardening split landed as a deliberate sequence so each layer could be
reviewed in isolation: PKI helpers + conftest (#220), payload validation / action
allowlist (#223), Zenoh + ACL config with mTLS/downsampling/low-pass (#224),
tamper-evident HMAC audit log with per-peer sequence + rotation (#221), cross-transport
dedup + monotonic TTL + strict mode (#222), replay caches + override-resume + safety
topic handlers (#225), and robot_mesh HITL via tool_context.interrupt + per-action
rate limit (#227). Why split: security review fatigue is real; a 9-part series each
reviewable in an afternoon catches more than one 4000-line PR nobody reads to the end.
Human-in-the-loop done right (#227, #411): a declined approval must not consume a
rate-limit slot (or an operator's "no" could lock out a legitimate e-stop), and the
operator's literal reply is never echoed back into the LLM context (that turns a human
into a prompt-injection channel). Read-only actions are audited too (#411) - operators
need "the agent read N frames at time T", not just actuation logs.
Replay/lockout safety pins: estop engages even when the per-issuer replay cache is
full (#263/#339) - the cache bounds memory, never safety; resume-cache fairness mirror
(#342); check-then-set estop replay lock (#273/#361); poison records on every degraded
audit path so a stream gap is attributable, not silent (#410). Why this obsession:
in this domain the failure mode isn't "wrong answer", it's "arm moves when it shouldn't"
or "stop didn't take." Fail-loud, fail-safe, always.
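
"The cache bounds memory, never safety" can be illustrated with a bounded per-issuer replay cache. In this hypothetical sketch (ReplayGuard, accept and the issuer/nonce/action fields are assumptions), ordinary commands are refused when the cache is full or the nonce was already seen, but an estop always engages; when the cache is full its nonce simply isn't recorded, so a replay of it can at worst stop the robot again. A lock makes the check-then-set step atomic, and expiry uses a monotonic clock:

```python
import threading
import time
from collections import OrderedDict
from typing import Callable


class ReplayGuard:
    """Per-issuer replay cache with a hard size bound and a monotonic-clock TTL."""

    def __init__(self, max_entries: int, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl_s
        self._clock = clock  # monotonic, so a wall-clock jump cannot expire nonces early
        self._seen: dict[str, OrderedDict[str, float]] = {}
        self._lock = threading.Lock()  # makes check-then-set a single atomic step
        self.estop_engaged = False

    def _prune(self, cache: OrderedDict[str, float], now: float) -> None:
        # Entries are inserted in time order, so expired ones sit at the front.
        while cache and now - next(iter(cache.values())) >= self._ttl:
            cache.popitem(last=False)

    def accept(self, issuer: str, nonce: str, action: str) -> bool:
        now = self._clock()
        with self._lock:
            cache = self._seen.setdefault(issuer, OrderedDict())
            self._prune(cache, now)
            if nonce in cache:
                return False  # exact replay: the original message already acted
            if action == "estop":
                # Engage even when full; recording the nonce is never a precondition.
                if len(cache) < self._max:
                    cache[nonce] = now
                self.estop_engaged = True
                return True
            if len(cache) >= self._max:
                return False  # full: refuse ordinary commands rather than forget nonces
            cache[nonce] = now
            return True
```
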
AWS IoT provisioning hardening: CA pin + thing-name regex + scoped policy (#228),
deny-by-default Fleet Provisioning hook (#333), operator-shadow + response publishes
scoped to the device's own ThingName (#334, #336), atomic break-glass marker with
explicit symlink reject (#388/#402). Why: fleet provisioning is the blast radius -
a permissive default here means one bad cert owns the whole account.
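
A sketch of scoping publishes to the device's own ThingName, assuming the AWS IoT thing-name rules (letters, digits, ':', '_', '-', 1 to 128 characters) and the standard $aws/things/<thingName>/shadow/ topic prefix. check_thing_name and topic_allowed are hypothetical helpers; in production the same rule belongs in the IoT policy itself (for example via the ${iot:Connection.Thing.ThingName} policy variable), with a client-side check only as defense in depth:

```python
import re

# AWS IoT thing names: letters, digits, ':', '_', '-'; 1 to 128 characters.
THING_NAME_RE = re.compile(r"[a-zA-Z0-9:_-]{1,128}")


def check_thing_name(name: str) -> str:
    """Reject names that could smuggle '/', '+', or '#' into a topic."""
    if not THING_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid thing name: {name!r}")
    return name


def topic_allowed(own_thing: str, topic: str) -> bool:
    """Allow publishes only under this device's own shadow topics."""
    # The trailing '/' stops "robot-0" from matching "robot-01".
    prefix = f"$aws/things/{check_thing_name(own_thing)}/shadow/"
    if "+" in topic or "#" in topic:
        return False  # never publish to wildcard-looking topics
    return topic.startswith(prefix)
```
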
Teleop integrity: validate input frames before apply (#332), route every teleop
publish through the single Mesh.publish() chokepoint (#452). One door, guarded.
Sim is a first-class mesh peer: tell() dispatch maps to run_policy/start_policy
(#304), sim joint state bridges to child peers (#422), sim cameras publish JPEG frames
(#425). Why: if sim and real don't look identical on the mesh, your fleet tooling
can't be tested in sim - which defeats the point of having a sim.

Policies: more brains, one socket

Why: The field is moving fast and no single policy wins everywhere - VLAs for
open-ended manipulation, classical planners for collision-aware motion. The Policy ABC
exists so adding a new brain doesn't fork the stack. This cycle we proved the abstraction
holds by hanging very different things off it.

NVIDIA Cosmos 3 omnimodal VLA (#317) with both a service backend (msgpack/websockets,
GPU-isolated) and an in-process diffusers backend (#458) for when you'd rather not
run a sidecar. We re-anchor IK on the achieved EE pose each step (#462) - why: over a
long rollout, integrating the model's relative pose deltas drifts; anchoring on where the
arm actually is bounds the tracking error instead of letting it compound.
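
A toy 1-D sketch of the failure mode, assuming the simplest case where the arm cannot follow the command (a reach or joint limit). It is not the Cosmos 3 policy's IK code; REACH_LIMIT, arm_step and rollout are hypothetical names:

```python
# Toy 1-D end effector with a hard reach limit. Not the Cosmos 3 IK code: it only
# shows why "previous command + delta" winds up when the arm cannot follow.
REACH_LIMIT = 0.5  # hypothetical workspace bound


def arm_step(command: float) -> float:
    """The arm moves to the command but cannot leave [-REACH_LIMIT, REACH_LIMIT]."""
    return max(-REACH_LIMIT, min(REACH_LIMIT, command))


def rollout(deltas: list[float], anchor_on_achieved: bool) -> float:
    """Apply relative deltas; return the worst |command - achieved| gap."""
    command = achieved = 0.0
    worst_gap = 0.0
    for delta in deltas:
        base = achieved if anchor_on_achieved else command
        command = base + delta
        achieved = arm_step(command)
        worst_gap = max(worst_gap, abs(command - achieved))
    return worst_gap
```

With ten steps of +0.1 followed by ten of -0.1, the integrated command ends up about 0.5 beyond the arm, so the arm sits still for several steps after the model asks it to move back; the anchored command is never more than one delta ahead of the arm.
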
MoveIt2Policy under [moveit2] (#305) and cuRobo migrated to the main API
(#442) - collision-aware planners living under the same ABC as the VLAs, so an agent can
choose "plan a safe path" vs. "imitate" without changing how it calls a policy.
GR00T N1.7 EA (#93) plus the unglamorous-but-essential wire-format fixes: service
(B, T, ...) shape + float32 state (#149), container lifecycle (#152), command-builder
flags (#150, #155). Why these are in a release at all: an off-by-one in the observation
tensor shape doesn't error - it silently degrades inference. Pinning the wire format is
what makes "GR00T support" a claim instead of a hope.
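
A sketch of a wire-format guard that turns a silent shape bug into a loud one, assuming state must arrive as a float32 array shaped (B, T, D): batch, time, state dimension. pack_state and its accepted input shapes are hypothetical, not the GR00T service schema:

```python
import numpy as np


def pack_state(state, state_dim: int) -> np.ndarray:
    """Return state as a float32 array shaped (B, T, D), or raise.

    Only two inputs are accepted: a single step of shape (D,), promoted to
    (1, 1, D), or an explicit (B, T, D) array. A 2-D input is rejected because
    (B, D) and (T, D) look identical, and guessing is exactly the silent bug.
    """
    arr = np.asarray(state, dtype=np.float32)  # float64/int inputs become float32
    if arr.ndim == 1:
        arr = arr.reshape(1, 1, -1)
    if arr.ndim != 3 or arr.shape[-1] != state_dim:
        raise ValueError(f"expected (B, T, {state_dim}) state, got shape {arr.shape}")
    return arr
```
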
LeRobot local direct-HF inference with RTC (#56), device-resolution + postprocessor
warnings (#430), and the LeRobot 0.5.2 recording pipeline overhaul - synchronized
multi-robot, action-horizon batching, a camera-recorder race fix, full embodiment
coverage (#366). Why: recording is how you get training data; a race in the recorder
is silent data loss you only discover when your dataset trains a worse policy.

Evaluation: turning "looks good" into a number

Why: Without a benchmark, "the policy is better now" is unfalsifiable. We adopted a
benchmark-agnostic eval protocol (#129) and a LIBERO adapter + BDDL parser (#130), then
spent real effort making the eval honest:

Load the actual LIBERO scene MJCFs (#165) - evaluating against the wrong world is
worse than not evaluating. Snapshot/restore canonical qpos for procedural scenes (#168),
agree with robosuite's check_success (#173), reach success_rate > 0 on our engine
(#175), pack the gripper as the 2-element array the model was trained on (#162), bridge
EE FK into state for libero_panda (#161), per-episode reseed for reproducibility (#180).
We retired libero_offscreen_render once our engine was byte-equivalent to upstream
(#186) - why keep two renderers when one is provably the same?
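
A minimal sketch of per-episode reseeding, with a stand-in episode function in place of a real LIBERO rollout. evaluate, noisy_grasp, and deriving each episode's seed as base_seed + episode index are assumptions for illustration, not the protocol in #129/#180:

```python
import random
from typing import Callable


def evaluate(run_episode: Callable[[random.Random], bool], n_episodes: int,
             base_seed: int = 0) -> dict[str, object]:
    """Run n episodes, each with its own deterministic RNG, and report success rate."""
    outcomes = []
    for episode in range(n_episodes):
        rng = random.Random(base_seed + episode)  # reseed per episode, not per run
        outcomes.append(bool(run_episode(rng)))
    successes = sum(outcomes)
    return {
        "success_rate": successes / n_episodes if n_episodes else 0.0,
        "successes": successes,
        "episodes": n_episodes,
        "outcomes": outcomes,
    }


def noisy_grasp(rng: random.Random) -> bool:
    """Stand-in episode: randomized object offset, success if within tolerance."""
    object_offset = rng.uniform(-0.05, 0.05)  # hypothetical scene randomization
    return abs(object_offset) < 0.03
```

Seeding once per run would make episode k depend on how many random draws episodes 0 to k-1 consumed, so a change in one episode would shift every later one. Seeding per episode lets any single failure be replayed in isolation.
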

Device Connect: the mesh transport for when you have real infrastructure

Why: The built-in Zenoh mesh is great for getting started, but organizations with
device fleets already have discovery/RPC/safety infrastructure. Device Connect (#370)
plugs into that as the primary transport and falls back to Zenoh when it isn't installed -
so you can start simple and graduate without a rewrite. CI installs the matching
device-connect packages from source while they're pre-release, so the integration is
tested against the version it actually targets.
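
"Primary transport with a fallback" is the standard optional-dependency pattern. A sketch, assuming hypothetical importable module names device_connect and zenoh (the real package and module names may differ) and a hypothetical pick_transport helper:

```python
import importlib.util
from typing import Callable, Optional


def module_available(name: str) -> bool:
    """True if top-level module `name` can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_transport(preferred: str = "device_connect", fallback: str = "zenoh",
                   available: Optional[Callable[[str], bool]] = None) -> str:
    """Prefer the primary transport; fall back when it is not installed."""
    available = available or module_available
    if available(preferred):
        return preferred
    if available(fallback):
        return fallback
    raise RuntimeError(f"no mesh transport installed: tried {preferred!r}, {fallback!r}")
```

Checking with find_spec instead of a bare import keeps the probe cheap and keeps a missing optional package from becoming an import-time crash.
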

Docs, CLI, and the boring stuff that makes it usable

Full MkDocs Material site + Pages CI (#160), Device Connect (#449) and security
(#465) pages, README rewritten for the v0.x Robot/mesh/sim story (#371). Why: a
platform nobody can onboard to isn't a platform.
strands-robots doctor (#419) - why: 90% of "it doesn't work" is an environment
problem (missing CUDA, wrong torch, no OpenGL). A diagnostic that exits non-zero on
failure, including under NO_COLOR/TERM=dumb (#443), so CI can gate on it, turns those
tickets into self-service.
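
A sketch of the contract that makes a diagnostic CI-gateable: color is decided from the environment (the NO_COLOR convention and TERM=dumb), but the exit code depends only on check results. use_color, run_checks and the check names are hypothetical; this is not the strands-robots doctor source:

```python
import os
import sys
from collections.abc import Callable, Mapping
from typing import Optional


def use_color(env: Mapping[str, str], isatty: bool) -> bool:
    """Honour NO_COLOR (any non-empty value) and TERM=dumb; never color a pipe."""
    if env.get("NO_COLOR") or env.get("TERM") == "dumb":
        return False
    return isatty


def run_checks(checks: dict[str, Callable[[], bool]],
               env: Optional[Mapping[str, str]] = None,
               isatty: Optional[bool] = None) -> tuple[int, list[str]]:
    """Run every check; return (exit_code, report_lines)."""
    env = os.environ if env is None else env
    color = use_color(env, sys.stdout.isatty() if isatty is None else isatty)
    lines: list[str] = []
    failed = 0
    for name, check in checks.items():
        ok = check()
        if not ok:
            failed += 1
        tag = "[OK]  " if ok else "[FAIL]"
        if color:
            tag = ("\033[32m" if ok else "\033[31m") + tag + "\033[0m"
        lines.append(f"{tag} {name}")
    # Exit status comes from the results, never from the rendered (maybe colored) text.
    return (1 if failed else 0), lines
```

A caller prints the lines and passes the code to sys.exit, so a CI step fails on the same condition whether or not colors are on.
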
5 hero examples + hub-to-hardware walkthrough (#432, #381, #459) - the examples are
the spec for the ergonomics work; if the hero path is ugly, the API is wrong.

Build, CI & the security baseline

Why: This library tells an LLM to move physical motors. The supply chain and the
input-validation story are not optional.

Python >=3.12, uv as the hatch installer (#83), ruff replacing
black+isort+flake8 (#73), 11 granular optional-dependency groups (#14) - so you install
the brain you need, not the entire NVIDIA stack to talk to a serial port.
NVIDIA Thor/Jetson GPU torch (#374): a targeted torch 2.11 override fixing the
sm_110 cuBLAS bug, with UV_TORCH_BACKEND=auto. Why the complexity: the naive PyPI
torch wheel is CPU-only on Thor, so inference silently runs on CPU - "works but 50x too
slow" is the worst kind of bug.
Security baseline (#185, #189): CodeQL security-and-quality (catches the LLM-input →
subprocess/XML/path taint class), Dependency Review hard-failing on high/critical CVEs,
an LLM-input-safety annotation check, SHA-pinned actions + Dependabot. Plus path
validation on every filesystem-writing tool (#91). The pypa/gh-action-pypi-publish pin
is non-negotiable - a moving release branch there is exactly the tj-actions supply-chain
pattern.
A large test-coverage push (hardware_robot, assets, lerobot_*, pose/serial tools,
cli, doctor, registry, mesh sensors, predicates) and ASCII-only tool output enforced
everywhere (#434). Why ASCII: agents read these strings programmatically; emojis are
tokenizer noise and a stray combining mark breaks downstream parsing.
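
ASCII-only output is easiest to enforce at a single choke point. A sketch, assuming a hypothetical ascii_safe helper applied to every string a tool returns: it decomposes accented characters with NFKD, drops the combining marks, and replaces anything still outside ASCII with '?':

```python
import unicodedata


def ascii_safe(text: str) -> str:
    """Return an ASCII-only version of `text` for agent-facing tool output."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks left behind by decomposition (e.g. the accent in "e\u0301").
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (emoji, CJK, symbols) becomes '?'.
    return stripped.encode("ascii", errors="replace").decode("ascii")
```
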

Upgrade notes

Python 3.12+ required.
Extras are granular - groot-service, cosmos3-service, cosmos3-diffusers,
cosmos3-sim, moveit2, curobo, lerobot, sim-mujoco, mesh, mesh-iot,
device-connect, or all.
cuRobo is not on PyPI - install from source (NVlabs/curobo); the [curobo] extra is
intentionally a no-op until a real release exists (the PyPI nvidia-curobo is a squatter).
Hardware is opt-in - Robot() defaults to sim; pass mode="real" or set
STRANDS_ROBOT_MODE=real deliberately.
Thor/Jetson - see the README Installation section for the UV_TORCH_BACKEND /
torchcodec CUDA-index caveat.
