LeAgents

Agentic orchestration for the LeRobot robotics pipeline — an orchestrator drives an automated collect → train → eval → improve loop over LeRobotDataset v3.0, with a deterministic loop controller, a constitution safety gate, verification gates before promotion, and (M2) a dashboard for visualizing the flow.

Architecture, research grounding (verified 2023–2026 papers), and roadmap: DESIGN.md.

Status — v0.0.4

Milestone	Scope	Status
M0	Sim-only loop on LIBERO: seed dataset → SmolVLA fine-tune → `lerobot-eval` gate → promote/iterate/escalate/rollback	✅ done — full-scale autonomous run completed (see below); PushT/LIBERO smoke configs included
M1	DexFlyWheel-style self-improvement, RoboGene-style task curation, policy escalation, OKF knowledge layer (Karpathy-wiki-style, DESIGN.md §3.6) + provider-agnostic LLM proposer	🚧 knowledge layer + LLM adapter landed
M2	Flow dashboard (Rerun episode replay, WandB curves, OTel agent traces)	🚧 flow view v1 landed: runs → cycles → decisions live, eval chart, event log, knowledge browser (`leagents dash`)
M3	Real robot: teleop collection, HIL-SERL adapter (requires lerobot ≥ 0.6.0, see CVE note in DESIGN.md §6)	planned

What works today: the full loop state machine with budgets, the constitution gate, SQLite job store, JSONL event log, subprocess wrappers for lerobot-train / lerobot-eval, the OKF knowledge layer (knowledge/ pages with provenance, updated every cycle, linted), the DexFlyWheel data path (success-filtered rollout harvesting → accumulated mix → adaptation training), and a provider-agnostic LLM adapter (llm: gemini:*|anthropic:*|openai:*[@base_url], or none at all — every flow has a deterministic fallback). All covered by tests that run without a GPU or lerobot installed.
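The `llm:` setting above packs a provider, a model, and an optional base URL into one string. Here is a minimal sketch of how such a spec could be parsed, assuming the form `provider:model[@base_url]`. `LLMSpec` and `parse_llm_spec` are hypothetical names, not the project's API, and treating a missing value or `none` as "use the deterministic fallback" is an assumption drawn from the description.

```python
from dataclasses import dataclass
from typing import Optional

# Providers named in the README; everything else here is illustrative.
KNOWN_PROVIDERS = {"gemini", "anthropic", "openai"}


@dataclass(frozen=True)
class LLMSpec:
    provider: str
    model: str
    base_url: Optional[str] = None


def parse_llm_spec(spec: Optional[str]) -> Optional[LLMSpec]:
    """Parse 'provider:model[@base_url]'. Returns None when no LLM is configured."""
    if spec is None or not spec.strip() or spec.strip().lower() == "none":
        return None  # caller falls back to the deterministic path
    head, _, base_url = spec.strip().partition("@")
    provider, colon, model = head.partition(":")
    if provider not in KNOWN_PROVIDERS or not colon or not model:
        raise ValueError(f"unrecognized llm spec: {spec!r}")
    return LLMSpec(provider, model, base_url or None)
```

Returning `None` instead of raising for an absent spec keeps the "no LLM" case an ordinary branch, which matches the claim that every flow has a deterministic fallback.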

First full-scale autonomous run (2026-07-04)

One M0 run on a single RTX 5070 Ti (16 GB), fully autonomous — 3 cycles, 6.1 GPU-hours, zero human intervention, every decision/event/knowledge-page update logged:

Cycle	Data	Train	LIBERO spatial eval (100 episodes)	Decision
0	40 episodes	20k steps from `smolvla_base` (loss → 0.030)	0% success — arm reaches targets, never completes	promote (baseline)
1	80	20k steps, continued from the blessed checkpoint	0%	iterate
2	160	20k steps	0%	iterate — the `escalate_floor` guard correctly refused to escalate a 0%-plateau to a bigger policy

This run validated the loop mechanics rather than the policy: budgets held, weights carried over between cycles, and the decision function behaved exactly as specified.

Root-cause correction (same day): the flat 0% was first attributed to the data budget (4→16 episodes per task vs. the verified ~50). Digging into a follow-up run that stayed at 0% with 20 per task exposed the real cause: HuggingFaceVLA/libero is suite-ordered, and the [0..N) episode prefix belongs to other suites (libero_spatial episodes live around indices 1261–1538) — every run had trained on tasks disjoint from the eval. In the metrics this silent failure was indistinguishable from under-training. Fix: episode selection is now task-filtered against the eval suite (leagents.scripts.select_episodes, data.task_filter), balanced per task, and the Data Agent fails loudly if the selection doesn't cover the suite. The table above stands as a record of the failure mode.
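The fix can be illustrated with a small sketch of task-filtered, per-task-balanced selection that fails loudly on incomplete coverage. The function name, the `(episode_index, task)` input shape, and the task labels in the usage note are assumptions for illustration; the project's `leagents.scripts.select_episodes` may work differently.

```python
from collections import defaultdict
from typing import Iterable


def select_balanced(
    episodes: Iterable[tuple[int, str]],
    eval_tasks: set[str],
    per_task: int,
) -> list[int]:
    """Pick up to `per_task` episode indices for every task in the eval suite."""
    by_task: dict[str, list[int]] = defaultdict(list)
    for index, task in episodes:
        if task in eval_tasks:  # drop episodes that belong to other suites
            by_task[task].append(index)

    # Fail loudly instead of silently training on tasks disjoint from the eval.
    missing = eval_tasks - set(by_task)
    if missing:
        raise ValueError(f"selection does not cover eval tasks: {sorted(missing)}")

    selected: list[int] = []
    for task in sorted(by_task):
        selected.extend(sorted(by_task[task])[:per_task])
    return sorted(selected)
```

With a suite-ordered dataset, passing only a `[0..N)` prefix whose tasks all come from other suites raises `ValueError` here, which is the failure the original prefix-based selection hid. This sketch checks only that each task is present, not that it reaches `per_task` episodes.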

Quickstart

Notebook (see notebooks/): one autonomous cycle (PushT) on a free GPU, ~5 minutes, nothing to install locally.

pip install -e ".[dev]"
pytest

# dry run — no GPU, no lerobot; synthetic eval scores exercise the decision logic
leagents run -c configs/m0_libero.yaml --dry-run

# real run (Linux; needs a GPU and the LIBERO extras)
pip install -e ".[lerobot]"
leagents run -c configs/m0_libero.yaml

# inspect runs, cycles, decisions, blessed checkpoints
leagents status

# flow dashboard — runs, cycle pipeline, decisions, eval chart, events, knowledge
pip install -e ".[dash]"
leagents dash            # → http://127.0.0.1:8321

No root, no Docker? Use pixi

On shared servers the only step that needs sudo is LIBERO's egl-probe build (system EGL headers). pixi supplies Python, the EGL/OpenGL headers, a C++ toolchain, and CMake 3.x from conda-forge instead — zero root required:

pixi run test                  # loop tests, no GPU needed
pixi -e lerobot run doctor     # full environment checks (GPU/EGL/LIBERO)
pixi -e lerobot run smoke-pusht

How the loop decides

Each cycle trains a candidate checkpoint and evaluates it on LIBERO. The decision is a pure function of success-rate deltas vs. the blessed baseline (leagents/orchestrator/decision.py):

promote — candidate beats baseline by ≥ promote_delta; it becomes the new blessed checkpoint
iterate — small improvement; collect more data on failing task variations
escalate — plateaued for plateau_cycles; move up the policy ladder (SmolVLA → π0.5)
rollback — regression; the blessed checkpoint stands
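The four outcomes above can be sketched as a pure function. This is an illustration under stated assumptions, not the contents of `leagents/orchestrator/decision.py`: the default threshold values, the order of the checks, treating any negative delta as a regression, and the `stalled_cycles` argument are all hypothetical. The `escalate_floor` guard follows the behavior described for cycle 2 of the run above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    promote_delta: float = 0.05   # assumed value
    plateau_cycles: int = 2       # assumed value
    escalate_floor: float = 0.10  # assumed value


def decide(
    candidate: float,
    baseline: Optional[float],
    stalled_cycles: int,
    t: Thresholds,
) -> str:
    """Map success rates to a decision.

    `stalled_cycles` counts consecutive cycles without a promotion,
    including the current one.
    """
    if baseline is None:
        return "promote"  # first cycle establishes the blessed baseline
    delta = candidate - baseline
    if delta >= t.promote_delta:
        return "promote"
    if delta < 0:
        return "rollback"
    plateaued = stalled_cycles >= t.plateau_cycles
    if plateaued and candidate >= t.escalate_floor:
        return "escalate"
    return "iterate"  # includes plateaus below the floor, such as a flat 0%
```

Because the function takes only numbers and returns a string, it can be unit-tested without a GPU, and replaying the three cycles of the run above (0%, 0%, 0%) yields promote, iterate, iterate under these assumed thresholds.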

Control flow is deterministic Python persisted to SQLite — LLM agents (task proposal, curation; M1) only make proposals inside the gates, never control-flow decisions.
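To make "persisted to SQLite" concrete, here is a minimal sketch of a cycle table that lets a controller resume where it stopped. The schema, column names, and helper functions are assumptions for illustration and are not the project's actual job store.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store with one row per completed cycle."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cycles (
               run_id       TEXT    NOT NULL,
               cycle        INTEGER NOT NULL,
               success_rate REAL    NOT NULL,
               decision     TEXT    NOT NULL,
               PRIMARY KEY (run_id, cycle)
           )"""
    )
    return conn


def record_cycle(
    conn: sqlite3.Connection,
    run_id: str,
    cycle: int,
    success_rate: float,
    decision: str,
) -> None:
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO cycles VALUES (?, ?, ?, ?)",
            (run_id, cycle, success_rate, decision),
        )


def next_cycle(conn: sqlite3.Connection, run_id: str) -> int:
    """Return the cycle index a resumed controller should run next."""
    row = conn.execute(
        "SELECT MAX(cycle) FROM cycles WHERE run_id = ?", (run_id,)
    ).fetchone()
    return 0 if row[0] is None else row[0] + 1
```

The composite primary key means a cycle can be recorded only once per run, so a crashed and restarted controller cannot silently overwrite an earlier decision.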

Layout

leagents/
├── orchestrator/   # loop controller, decision logic, constitution gate, proposer
├── agents/         # data / train / eval / improve agents (LeRobot CLI wrappers)
├── contracts/      # typed records: DatasetRef, CheckpointRecord, EvalReport
├── events/         # JSONL event bus (dashboard reads this in M2)
└── store/          # SQLite job store (runs, cycles, checkpoints)
configs/            # m0_libero.yaml, constitution.yaml
tests/              # loop e2e with fake runners — no GPU needed
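As an illustration of the JSONL event log in `events/`, the sketch below appends one JSON object per line and reads them back in order. The field names (`ts`, `kind`) and both helper functions are hypothetical; the project's event schema is not shown in this README.

```python
import json
import time
from pathlib import Path
from typing import Iterator


def emit(log_path: Path, kind: str, **payload) -> None:
    """Append one event as a single JSON line."""
    record = {"ts": time.time(), "kind": kind, **payload}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_events(log_path: Path) -> Iterator[dict]:
    """Yield events in the order they were written, skipping blank lines."""
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

An append-only file of independent lines lets a reader such as a dashboard tail the log while the loop is still writing to it.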

License

Apache-2.0
