strands-robots v0.4.0
150+ commits since v0.3.8 (Feb 20). This is the release where strands-robots
stops being "policy inference glue for a single arm on your desk" and becomes a
platform you can simulate on, evaluate on, and deploy to a fleet with - without
swapping libraries at each stage. The through-line of this cycle was closing the gap
between "it runs a model" and "it runs a robot you'd trust in a room with people."
Python >=3.12 required (LeRobot >=0.5.0 floor).
pip install strands-robots · everything: pip install "strands-robots[all]"
The story of this release
Three forces drove almost every PR:
- The sim-to-real loop was broken in the middle. You could load a policy and you
could talk to hardware, but there was no first-class simulator in between - so you
couldn't develop a policy without a physical robot, and you couldn't reproduce a
failure without the exact bench setup. We built a MuJoCo backend and a Robot()
factory so the same code path runs in sim and on metal.
- "One robot" assumptions don't survive contact with a fleet. The moment more than
one robot, or more than one operator, is involved, you need identity, presence,
replay protection, an audit trail, and a human able to hit stop. We built the mesh
and its AWS IoT transport around that reality, and spent a large fraction of the
cycle hardening it rather than adding surface.
- "It produced an action array" is not "it succeeded." We added a real evaluation
protocol (LIBERO) so policy quality is a number we can defend, not a vibe from
watching a render.
Everything below ladders up to one of those three.
Simulation: so you can develop without a robot on the bench
Why: Iterating on a policy against physical hardware is slow, unsafe, and
unreproducible. A bug that only shows up at 50 Hz on a real servo is nearly impossible
to bisect. We needed a simulator that is byte-equivalent enough to trust and that
exposes the same agent-tool surface as the rest of the library.
- MuJoCo backend as an AgentTool with 50+ actions (#85) on a foundation of
models, the Policy/engine ABCs, a factory, a model registry, and asset
auto-download (#84, #105/#106). The agent drives the world the same way it drives a
real robot - world building, robot inject/eject, cameras, stepping, rollout, render.
- Control-rate substepping (#353) and stepping physics for the full control
period in eval (#429). Why it matters: position-servo policies were silently
failing to track because we stepped the sim once per action instead of for the whole
control interval - the policy looked broken when the integration was. This is the
kind of bug that costs a week on hardware; we paid it down once, in sim.
- Render fidelity fixes - blown-out white ground (#428), ground-plane z-fighting on
attach (#360), conditional ground-strip + tendon scale + RNG parity (#400).
Why it matters: renders are the policy's input in vision models. A blown-out
frame isn't cosmetic - it's feeding the policy garbage and poisoning eval.
- Naming/identity correctness: register sim robots under the user's name not the
canonical one (#435), structured errors from resolvers (#417), optional robot_name
across the state family (#412). Why: the sim has to address robots the way the user
thinks about them, or multi-robot scenes become a guessing game.
- SimEngine.describe() discovery surface (#407) - so an agent can ask "what can I do
here?" instead of failing on an unknown action key. We also started surfacing valid
actuator names on errors and propagating failures into run_policy status (#436),
because a silent zero-action on failure is the single most dangerous default in robotics.
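The substepping bug above is easy to reproduce in a toy model. The sketch below is not the library's engine: it is a single unit-mass joint with a PD position servo, and the gains, timestep, and 50 Hz control rate are assumptions chosen for illustration. It compares stepping physics once per action against stepping for the whole control period.

```python
# Toy model only: a unit-mass joint driven by a PD position servo.
# Gains, timestep, and control rate are illustrative assumptions.
PHYSICS_DT = 0.002      # seconds per physics step
CONTROL_DT = 0.02       # seconds per policy action (50 Hz)
KP, KD = 400.0, 40.0    # critically damped servo gains


def rollout(n_actions: int, steps_per_action: int) -> float:
    """Return the final tracking error after a ramp of position targets."""
    q, qd, target = 0.0, 0.0, 0.0
    for k in range(n_actions):
        target = 0.01 * (k + 1)  # the policy asks for +0.01 rad per action
        for _ in range(steps_per_action):
            qdd = KP * (target - q) - KD * qd
            qd += qdd * PHYSICS_DT  # semi-implicit Euler
            q += qd * PHYSICS_DT
    return target - q


substeps = round(CONTROL_DT / PHYSICS_DT)           # physics steps per action
err_once = rollout(100, steps_per_action=1)         # the buggy integration
err_full = rollout(100, steps_per_action=substeps)  # full control period
print(f"1 step per action: error = {err_once:.3f} rad")
print(f"{substeps} steps per action: error = {err_full:.3f} rad")
```

With one physics step per action the servo gets 2 ms to chase each target and ends the rollout several times further behind than when it is given the full control period. The policy output is identical in both runs; only the integration differs.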
Robot() factory: one entry point
Why: Before this, choosing sim vs. real meant choosing a different code path, and
the path that touches a physical motor was just as easy to invoke by accident as the
safe one. That's backwards.
- Robot() factory + top-level lazy imports (#86; hygiene follow-ups #145).
Sim is the default; real hardware is an explicit opt-in (mode="real" /
STRANDS_ROBOT_MODE=real). Lazy imports keep import strands_robots cheap so the
factory doesn't drag the entire ML stack into a process that just wants to talk to a
serial port.
- Ergonomics for send_action, add_robot, render, get_robot (#431) - the
paper cuts that make the difference between an API you demo and an API you live in.
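The sim-by-default rule can be sketched as a small resolver. This is an illustration, not the factory's actual code: resolve_mode is a hypothetical helper, and two behaviors are assumptions the notes do not state, namely that an explicit argument wins over the environment variable and that an unrecognized value raises instead of falling back.

```python
import os

VALID_MODES = ("sim", "real")


def resolve_mode(mode=None, env=None):
    """Return "sim" unless real hardware was requested explicitly.

    Hypothetical helper: precedence (argument over environment) and the
    strict rejection of unknown values are assumptions of this sketch.
    """
    env = os.environ if env is None else env
    chosen = mode if mode is not None else env.get("STRANDS_ROBOT_MODE", "sim")
    chosen = chosen.strip().lower()
    if chosen not in VALID_MODES:
        # A typo must never silently select a mode; fail loudly instead.
        raise ValueError(f"unknown robot mode {chosen!r}; expected one of {VALID_MODES}")
    return chosen


print(resolve_mode(env={}))                               # nothing set
print(resolve_mode(env={"STRANDS_ROBOT_MODE": "real"}))   # opt-in via environment
```

Raising on a misspelled mode keeps the safe default from masking a mistake: an operator who meant to drive hardware finds out immediately rather than watching a simulator.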
Mesh + AWS IoT: because a fleet is a security problem, not a networking problem
Why: Connecting robots is easy. Connecting robots safely - where a stale command
can't be replayed, a compromised peer can't impersonate another, an operator can always
intervene, and every actuation is on an audit trail - is the actual job. We treated the
mesh as a trust boundary from day one, which is why the hardening PRs outnumber the
feature PRs here.
- Core mesh - session, presence, RPC, streams, wiring, + AWS IoT transport (#101).
Zenoh for the LAN, MQTT5/mTLS for the cloud, Device Shadow mirror, S3 camera offload,
account-wide Fleet Provisioning.
- The #195 hardening split landed as a deliberate sequence so each layer could be
reviewed in isolation: PKI helpers + conftest (#220), payload validation / action
allowlist (#223), Zenoh + ACL config with mTLS/downsampling/low-pass (#224),
tamper-evident HMAC audit log with per-peer sequence + rotation (#221), cross-transport
dedup + monotonic TTL + strict mode (#222), replay caches + override-resume + safety
topic handlers (#225), and robot_mesh HITL via tool_context.interrupt + per-action
rate limit (#227). Why split: security review fatigue is real; a 9-part series each
reviewable in an afternoon catches more than one 4000-line PR nobody reads to the end.
- Human-in-the-loop done right (#227, #411): a declined approval must not consume a
rate-limit slot (or an operator's "no" could lock out a legitimate e-stop), and the
operator's literal reply is never echoed back into the LLM context (that turns a human
into a prompt-injection channel). Read-only actions are audited too (#411) - operators
need "the agent read N frames at time T", not just actuation logs.
- Replay/lockout safety pins: estop engages even when the per-issuer replay cache is
full (#263/#339) - the cache bounds memory, never safety; resume-cache fairness mirror
(#342); check-then-set estop replay lock (#273/#361); poison records on every degraded
audit path so a stream gap is attributable, not silent (#410). Why this obsession:
in this domain the failure mode isn't "wrong answer", it's "arm moves when it shouldn't"
or "stop didn't take." Fail-loud, fail-safe, always.
- AWS IoT provisioning hardening: CA pin + thing-name regex + scoped policy (#228),
deny-by-default Fleet Provisioning hook (#333), operator-shadow + response publishes
scoped to the device's own ThingName (#334, #336), atomic break-glass marker with
explicit symlink reject (#388/#402). Why: fleet provisioning is the blast radius -
a permissive default here means one bad cert owns the whole account.
- Teleop integrity: validate input frames before apply (#332), route every teleop
publish through the single Mesh.publish() chokepoint (#452). One door, guarded.
- Sim is a first-class mesh peer: tell() dispatch maps to run_policy/start_policy
(#304), sim joint state bridges to child peers (#422), sim cameras publish JPEG frames
(#425). Why: if sim and real don't look identical on the mesh, your fleet tooling
can't be tested in sim - which defeats the point of having a sim.
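For readers who have not built one, a tamper-evident HMAC audit log with per-peer sequence numbers can be sketched in a few lines. This is a minimal illustration of the idea, not the mesh's implementation: the AuditLog class, the record fields, and the JSON encoding are all assumptions, and log rotation is left out.

```python
import hashlib
import hmac
import json

FIELDS = ("peer", "seq", "action")


def _tag(key: bytes, prev_tag: bytes, body: dict) -> str:
    """HMAC over the previous tag plus this record, so records form a chain."""
    payload = prev_tag + json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


class AuditLog:
    """Hypothetical append-only log; every record carries a chained tag."""

    def __init__(self, key: bytes):
        self._key = key
        self._next_seq = {}   # peer id -> next sequence number
        self._prev_tag = b""
        self.records = []

    def append(self, peer: str, action: str) -> None:
        seq = self._next_seq.get(peer, 0)
        self._next_seq[peer] = seq + 1
        body = {"peer": peer, "seq": seq, "action": action}
        tag = _tag(self._key, self._prev_tag, body)
        self.records.append({**body, "tag": tag})
        self._prev_tag = tag.encode("ascii")


def verify(key: bytes, records: list) -> bool:
    """Recompute the chain; an edit, drop, or reorder breaks it."""
    prev_tag, expected = b"", {}
    for rec in records:
        body = {name: rec[name] for name in FIELDS}
        if not hmac.compare_digest(_tag(key, prev_tag, body), rec["tag"]):
            return False
        if rec["seq"] != expected.get(rec["peer"], 0):
            return False  # a gap or repeat in this peer's sequence
        expected[rec["peer"]] = rec["seq"] + 1
        prev_tag = rec["tag"].encode("ascii")
    return True


log = AuditLog(b"demo-key")
log.append("arm-1", "move")
log.append("arm-2", "read_frames")
log.append("arm-1", "estop")
print(verify(b"demo-key", log.records))
```

Because each tag covers the previous one, editing or removing a record in the middle invalidates everything after it. A chain like this cannot by itself detect truncation of the newest records; that needs an externally stored head tag or record count.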
Policies: more brains, one socket
Why: The field is moving fast and no single policy wins everywhere - VLAs for
open-ended manipulation, classical planners for collision-aware motion. The Policy ABC
exists so adding a new brain doesn't fork the stack. This cycle we proved the abstraction
holds by hanging very different things off it.
- NVIDIA Cosmos 3 omnimodal VLA (#317) with both a service backend (msgpack/websockets,
GPU-isolated) and an in-process diffusers backend (#458) for when you'd rather not
run a sidecar. We re-anchor IK on the achieved EE pose each step (#462) - why: over a
long rollout, integrating the model's relative pose deltas drifts; anchoring on where the
arm actually is bounds the tracking error instead of letting it compound.
- MoveIt2Policy under [moveit2] (#305) and cuRobo migrated to the main API
(#442) - collision-aware planners living under the same ABC as the VLAs, so an agent can
choose "plan a safe path" vs. "imitate" without changing how it calls a policy.
- GR00T N1.7 EA (#93) plus the unglamorous-but-essential wire-format fixes: service
(B, T, ...) shape + float32 state (#149), container lifecycle (#152), command-builder
flags (#150, #155). Why these are in a release at all: an off-by-one in the observation
tensor shape doesn't error - it silently degrades inference. Pinning the wire format is
what makes "GR00T support" a claim instead of a hope.
- LeRobot local direct-HF inference with RTC (#56), device-resolution + postprocessor
warnings (#430), and the LeRobot 0.5.2 recording pipeline overhaul - synchronized
multi-robot, action-horizon batching, a camera-recorder race fix, full embodiment
coverage (#366). Why: recording is how you get training data; a race in the recorder
is silent data loss you only discover when your dataset trains a worse policy.
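The re-anchoring argument from the Cosmos 3 item can be seen in a one-dimensional toy. Everything here is an assumption made for illustration: the "policy" emits a clipped delta toward a goal, the "arm" is a rate-limited servo, and no IK is involved. The only thing being compared is where the relative delta gets applied: on top of the previous target, or on top of the pose the arm actually achieved.

```python
def clip(value: float, limit: float) -> float:
    return max(-limit, min(limit, value))


def rollout(reanchor: bool, steps: int = 60, goal: float = 1.0,
            max_delta: float = 0.05, max_speed: float = 0.03):
    """Return (final achieved pose, worst gap between target and achieved)."""
    achieved, target, worst_gap = 0.0, 0.0, 0.0
    for _ in range(steps):
        # Toy policy: a relative step toward the goal, based on what it observes.
        delta = clip(goal - achieved, max_delta)
        # The one line that differs: what the delta is added to.
        anchor = achieved if reanchor else target
        target = anchor + delta
        # Toy arm: moves toward the target but cannot exceed max_speed per step.
        achieved += clip(target - achieved, max_speed)
        worst_gap = max(worst_gap, abs(target - achieved))
    return achieved, worst_gap


integrated_final, integrated_gap = rollout(reanchor=False)
anchored_final, anchored_gap = rollout(reanchor=True)
print(f"integrated deltas: worst gap = {integrated_gap:.2f}")
print(f"re-anchored:       worst gap = {anchored_gap:.2f}")
```

When deltas are integrated onto the previous target, every step the arm falls short adds to the gap, and in this toy the target runs ahead of the arm by more than half the total travel. Anchored on the achieved pose, the gap can never exceed one delta.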
Evaluation: turning "looks good" into a number
Why: Without a benchmark, "the policy is better now" is unfalsifiable. We adopted a
benchmark-agnostic eval protocol (#129) and a LIBERO adapter + BDDL parser (#130), then
spent real effort making the eval honest:
- Load the actual LIBERO scene MJCFs (#165) - evaluating against the wrong world is
worse than not evaluating. Snapshot/restore canonical qpos for procedural scenes (#168),
agree with robosuite's check_success (#173), reach success_rate > 0 on our engine
(#175), pack the gripper as the 2-element array the model was trained on (#162), bridge
EE FK into state for libero_panda (#161), per-episode reseed for reproducibility (#180).
We retired libero_offscreen_render once our engine was byte-equivalent to upstream
(#186) - why keep two renderers when one is provably the same?
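Per-episode reseeding is what lets a single failing episode be replayed on its own. A minimal sketch, assuming nothing about the real eval protocol: evaluate, toy_episode, and the seed derivation are hypothetical, and the "rollout" is a coin flip standing in for a policy run.

```python
import random


def evaluate(run_episode, n_episodes: int, base_seed: int = 0):
    """Return (success_rate, per-episode outcomes) with one fresh RNG per episode."""
    if n_episodes <= 0:
        raise ValueError("n_episodes must be positive")
    outcomes = []
    for episode in range(n_episodes):
        # Hypothetical derivation: the seed depends only on (base_seed, episode),
        # never on how many episodes ran before this one.
        rng = random.Random(base_seed * 100_003 + episode)
        outcomes.append(bool(run_episode(rng)))
    return sum(outcomes) / n_episodes, outcomes


def toy_episode(rng: random.Random) -> bool:
    """Stand-in for a rollout: succeeds when the sampled start state is easy."""
    return rng.random() < 0.7


rate, outcomes = evaluate(toy_episode, n_episodes=20, base_seed=7)
print(f"success_rate = {rate:.2f}")
```

Because each episode's seed is independent of the others, episode 13 gives the same result whether it runs alone, in a batch of 20, or in a batch of 200.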
Device Connect: the mesh transport for when you have real infrastructure
Why: The built-in Zenoh mesh is great for getting started, but organizations with
device fleets already have discovery/RPC/safety infrastructure. Device Connect (#370)
plugs into that as the primary transport and falls back to Zenoh when it isn't installed -
so you can start simple and graduate without a rewrite. CI installs the matching
device-connect packages from source while they're pre-release, so the integration is
tested against the version it actually targets.
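The "use it if installed, otherwise fall back" behavior can be sketched with a standard-library probe. This is an assumption about the shape of the logic, not the library's code: first_installed is a hypothetical helper, and the module name device_connect is a guess used only in the usage note.

```python
import importlib.util


def first_installed(candidates):
    """Return the first top-level module name in `candidates` that is importable.

    find_spec() only locates the module; nothing is imported yet, so probing
    stays cheap even when the preferred transport is heavy.
    """
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    raise RuntimeError("no mesh transport installed; tried: " + ", ".join(candidates))


# Demonstrated with standard-library names so the sketch runs anywhere.
print(first_installed(["module_that_is_not_installed", "json"]))
```

In a real deployment the call would look something like first_installed(["device_connect", "zenoh"]), with the preferred transport listed first. Raising when nothing is available keeps a missing dependency from turning into a silently disconnected robot.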
Docs, CLI, and the boring stuff that makes it usable
- Full MkDocs Material site + Pages CI (#160), Device Connect (#449) and security
(#465) pages, README rewritten for the v0.x Robot/mesh/sim story (#371). Why: a
platform nobody can onboard to isn't a platform.
- strands-robots doctor (#419) - why: 90% of "it doesn't work" is an environment
problem (missing CUDA, wrong torch, no OpenGL). A diagnostic that exits non-zero under
NO_COLOR/TERM=dumb (#443) so CI can gate on it turns those tickets into self-service.
- 5 hero examples + hub-to-hardware walkthrough (#432, #381, #459) - the examples are
the spec for the ergonomics work; if the hero path is ugly, the API is wrong.
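The property CI needs from a doctor-style command is that the exit code reflects the checks regardless of how the output is rendered. A minimal sketch of that separation, with hypothetical names throughout (run_doctor, wants_color, and the checks are illustrations, not the CLI's internals):

```python
import os

GREEN, RED, RESET = "\033[32m", "\033[31m", "\033[0m"


def wants_color(env) -> bool:
    """Plain output when NO_COLOR is set (non-empty) or the terminal is dumb."""
    return not env.get("NO_COLOR") and env.get("TERM") != "dumb"


def run_doctor(checks, env=None):
    """Run (name, callable) checks; return (exit_code, report_text)."""
    env = os.environ if env is None else env
    color = wants_color(env)
    lines, failed = [], 0
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:  # a crashing check counts as a failure, not a pass
            ok = False
        failed += 0 if ok else 1
        label = "OK" if ok else "FAIL"
        if color:
            label = (GREEN if ok else RED) + label + RESET
        lines.append(f"[{label}] {name}")
    # The exit code depends only on the results, never on the color setting.
    return (1 if failed else 0), "\n".join(lines)


demo_checks = [("python importable", lambda: True), ("gpu visible", lambda: False)]
code, report = run_doctor(demo_checks, env={"NO_COLOR": "1"})
print(report)
print("exit code:", code)
```

A CI job would pass the returned code to sys.exit(); the report stays ASCII-only whenever color is disabled.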
Build, CI & the security baseline
Why: This library tells an LLM to move physical motors. The supply chain and the
input-validation story are not optional.
- Python >=3.12, uv as the hatch installer (#83), ruff replacing
black+isort+flake8 (#73), 11 granular optional-dependency groups (#14) - so you install
the brain you need, not the entire NVIDIA stack to talk to a serial port.
- NVIDIA Thor/Jetson GPU torch (#374): a targeted torch 2.11 override fixing the
sm_110 cuBLAS bug, with UV_TORCH_BACKEND=auto. Why the complexity: the naive PyPI
torch wheel is CPU-only on Thor, so inference silently runs on CPU - "works but 50x too
slow" is the worst kind of bug.
- Security baseline (#185, #189): CodeQL security-and-quality (catches the LLM-input →
subprocess/XML/path taint class), Dependency Review hard-failing on high/critical CVEs,
an LLM-input-safety annotation check, SHA-pinned actions + Dependabot. Plus path
validation on every filesystem-writing tool (#91). The pypa/gh-action-pypi-publish pin
is non-negotiable - a moving release branch there is exactly the tj-actions supply-chain
pattern.
- A large test-coverage push (hardware_robot, assets, lerobot_*, pose/serial tools,
cli, doctor, registry, mesh sensors, predicates) and ASCII-only tool output enforced
everywhere (#434). Why ASCII: agents read these strings programmatically; emojis are
tokenizer noise and a stray combining mark breaks downstream parsing.
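Path validation for a filesystem-writing tool usually comes down to one check: resolve the requested path and confirm it is still inside the directory the tool is allowed to write to. The sketch below shows that check under stated assumptions; resolve_inside and is_allowed are hypothetical helpers, not the functions added in #91.

```python
from pathlib import Path


def resolve_inside(base_dir, user_path) -> Path:
    """Resolve `user_path` under `base_dir`, rejecting anything that escapes it.

    resolve() collapses ".." segments and follows symlinks that exist, so the
    comparison is made on the location that would really be written.
    """
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()  # an absolute user_path replaces base
    if not candidate.is_relative_to(base):
        raise ValueError(f"refusing to write outside {base}: {user_path!r}")
    return candidate


def is_allowed(base_dir, user_path) -> bool:
    """Boolean form of the same check, convenient for filtering tool input."""
    try:
        resolve_inside(base_dir, user_path)
    except ValueError:
        return False
    return True


print(is_allowed("/tmp/strands_demo_out", "episodes/run1.json"))
print(is_allowed("/tmp/strands_demo_out", "../../etc/passwd"))
```

The check runs on LLM-supplied strings before any file is opened, which is the point: the model can ask for any path it likes, and the tool decides where bytes may land.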
Upgrade notes
- Python 3.12+ required.
- Extras are granular - groot-service, cosmos3-service, cosmos3-diffusers,
cosmos3-sim, moveit2, curobo, lerobot, sim-mujoco, mesh, mesh-iot,
device-connect, or all.
- cuRobo is not on PyPI - install from source (NVlabs/curobo); the [curobo] extra is
intentionally a no-op until a real release exists (the PyPI nvidia-curobo is a squatter).
- Hardware is opt-in - Robot() defaults to sim; pass mode="real" or set
STRANDS_ROBOT_MODE=real deliberately.
- Thor/Jetson - see the README Installation section for the UV_TORCH_BACKEND /
torchcodec CUDA-index caveat.