Official code release for Play2Perfect, a two-stage reinforcement-learning pipeline for contact-rich robotic assembly.
📄 Paper: https://arxiv.org/abs/2606.26428
Play2Perfect learns precise, contact-rich assembly in two stages:
- Play — pretrain a policy by playing with random objects in free space.
- Finetune — finetune the play policy on a contact-rich assembly task.
This repository provides:
- Training code for both stages in Isaac Sim (Isaac Lab).
- Interactive (viser) evaluation for four assembly tasks using our released checkpoints.
- A minimal sim-to-real deployment reference.
This release targets Isaac Sim only.
play2perfect/
├── isaacsimenvs/ # Training: Stage-1 play env + Stage-2 precise-assembly env
│ ├── train.py # Single training entry point (both stages)
│ ├── cfg/ # Hydra task + train configs
│ └── tasks/{play,precise_assembly}/
├── evaluation/ # The 4 assembly tasks: problem registry + viser evaluation
│ ├── problems/ # peg-in-hole, fabrica beam parts 0 & 2, furniture-bench
│ └── eval_isaacsim.py # Interactive viser evaluation
├── rl_games/ # Vendored RL library (PPO + SAPG)
├── deployment/ # Minimal sim-to-real reference
├── assets/urdf/ # Robot + task assets
└── docs/ # Installation + deployment guides
| Task | problem key |
|---|---|
| Tight insertion (L-peg, 0.5 mm tolerance) | tight_insertion |
| Beam assembly — step 1 | beam_assembly_step1 |
| Beam assembly — step 2 | beam_assembly_step2 |
| Screwing (furniture leg) | screwing |
See docs/isaacsim_installation.md for the full Isaac Sim /
Isaac Lab setup (Python 3.11, .venv_isaacsim).
# Stage-1 play policy + the 4 finetuned assembly policies
python download_checkpoints.py

# One problem from a single checkpoint
python evaluation/eval_isaacsim.py \
--checkpoint-path pretrained_assembly/tight_insertion/model.pth \
--problem tight_insertion \
--port 8044
# ...or load all four downloaded policies and switch between them in the GUI
python evaluation/eval_isaacsim.py --policies-dir pretrained_assembly --port 8044

Open http://localhost:8044 in a browser to watch the policy roll out the insertion.
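The single-checkpoint command above can be generated for each of the four problem keys. The following is a minimal sketch, not part of the release: the helper `build_eval_command` is hypothetical, and it assumes every policy is stored as `<policies_dir>/<problem key>/model.pth`, a layout the commands above show only for `tight_insertion`.

```python
import shlex

# Problem keys copied from the task table in this README.
PROBLEM_KEYS = [
    "tight_insertion",
    "beam_assembly_step1",
    "beam_assembly_step2",
    "screwing",
]


def build_eval_command(problem, policies_dir="pretrained_assembly", port=8044):
    """Return the argv list for one single-checkpoint evaluation run.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if problem not in PROBLEM_KEYS:
        raise ValueError(f"unknown problem key: {problem!r}")
    # Assumption: each policy lives at <policies_dir>/<problem>/model.pth.
    checkpoint = f"{policies_dir}/{problem}/model.pth"
    return [
        "python", "evaluation/eval_isaacsim.py",
        "--checkpoint-path", checkpoint,
        "--problem", problem,
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Print one shell-quoted command per problem key.
    for key in PROBLEM_KEYS:
        print(shlex.join(build_eval_command(key)))
```

Returning an argv list rather than a string means the result can be passed directly to `subprocess.run` without shell quoting issues, if the assumed checkpoint layout matches what `download_checkpoints.py` produces.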
Headless, batched eval that scores each problem over many parallel envs (one i.i.d. first-episode trial per env) and prints a success-rate table:
python evaluation/eval_offline.py --policies-dir pretrained_assembly

# Stage 1 (Play): pretrain the play policy
python isaacsimenvs/train.py \
--task Isaacsimenvs-Play-Direct-v0 \
--agent rl_games_cfg_entry_point \
--headless

# Stage 2 (Finetune): finetune the play policy on one assembly problem
python isaacsimenvs/train.py \
--task Isaacsimenvs-PreciseAssembly-Direct-v0 \
--agent rl_games_cfg_entry_point \
--checkpoint pretrained_policy/model.pth \
env.precise_assembly.problem=tight_insertion \
--headless

See docs/deployment.md for the minimal sim-to-real reference.
./format.sh # ruff over the Python packages + xmllint over URDFs

This codebase builds on excellent prior work:
- rl_games — the high-throughput RL library our training loop is built on (vendored here).
- SAPG — Split and Aggregate Policy Gradients, the large-scale PPO variant we use for training.
- SimToolReal — our prior work on object-centric tool manipulation, which the Stage-1 "play" pretraining is based on.
@misc{lum2026play2perfectmattersdexterousplay,
title={Play2Perfect: What Matters in Dexterous Play Pretraining for Precise Assembly?},
author={Tyler Ga Wei Lum and Kushal Kedia and C. Karen Liu and Jeannette Bohg},
year={2026},
eprint={2606.26428},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2606.26428},
}

- Kushal Kedia — kk837@cornell.edu
- Tyler Lum — tylerlum@stanford.edu