ACRoCo is a research extension of the RoCo baseline (RoCo: Dialectic Multi-Robot Collaboration with Large Language Models) for multi-robot collaboration with LLMs.
Our core idea is to convert open-ended LLM planning into action-constrained decision making via legality masks, then optimize collaborative behavior with MAPPO and hybrid LLM+RL policies.
- Action-constrained collaboration: legality masks remove unreachable/invalid actions before policy selection.
- Factorized legality masking: each agent action is decomposed into (verb/object, target) heads with mask-driven filtering, turning open-ended planning into constrained choice.
- MAPPO + CTDE training for multi-agent coordination under constrained action spaces.
- Primitive-aware architecture: macro actions are composed from reusable primitives (REACH/GRASP/LIFT/TRANSLATE/RELEASE/PUSH/WAIT) to align policy actions with executable motion stages.
- Hierarchical phase-adaptive reward: semantic-layer and physical-layer rewards are blended by execution phase/progress, improving stability across decision vs execution stages.
- Manager-style task adaptation: a task-defined composer (objects/targets/reachability/goal-map) auto-generates action space and masks, enabling reusable training pipelines.
- Mask-aware LLM prompting: legal action sets are exposed to the LLM to reduce hallucinated decisions.
- Cross-task transfer: the same training core is reused across Sort and Sweep with task-specific vocab/mask hooks.
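
The factorized masking idea in the list above can be illustrated with a minimal sketch. It is not the repository's implementation: the vocabularies `VERBS` and `TARGETS`, the helper names, and the rule that the target mask depends on the chosen verb are all assumptions made for illustration. Illegal entries are given zero probability before sampling, so the policy can only choose among legal actions.

```python
import math
import random

# Hypothetical vocabularies for the two heads (not taken from the repository).
VERBS = ["PICK", "PLACE", "WAIT"]
TARGETS = ["bin_red", "bin_blue", "none"]


def masked_softmax(logits, mask):
    """Softmax restricted to legal entries; illegal entries get probability 0."""
    if not any(mask):
        raise ValueError("mask removes every action")
    peak = max(l for l, ok in zip(logits, mask) if ok)  # for numerical stability
    exps = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def target_mask_for(verb_idx):
    """Assumed legality rule: which targets are valid for the chosen verb."""
    verb = VERBS[verb_idx]
    if verb == "WAIT":
        return [False, False, True]   # WAIT takes no target
    if verb == "PLACE":
        return [False, True, False]   # only bin_blue is reachable here
    return [True, True, False]        # PICK may use either bin


def select_action(verb_logits, target_logits, verb_mask, rng):
    """Sample the verb head first, then the target head under its own mask."""
    v = sample(masked_softmax(verb_logits, verb_mask), rng)
    t = sample(masked_softmax(target_logits, target_mask_for(v)), rng)
    return VERBS[v], TARGETS[t]


rng = random.Random(0)
verb_mask = [False, True, True]  # PICK is illegal in this hypothetical state
action = select_action([2.0, 1.0, 0.0], [0.5, 0.5, 0.5], verb_mask, rng)
print(action)
```

Even though PICK has the largest logit, it is never selected because its mask entry is false. The same legal set could also be listed in an LLM prompt, which is the idea behind the mask-aware prompting bullet.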
cite/ # cited baseline code (paper reference)
├─ rocobench/
├─ prompting/
└─ real_world/
rocobench/ # runnable core code for this project
prompting/ # prompt and LLM API implementation
real_world/ # real-robot integration bridge
rl/ # added RL modules (MAPPO, symbolic env, hybrid)
scripts/
├─ train/ # training entrypoints
├─ run/ # execution and evaluation entrypoints
└─ analysis/ # benchmarking, ablation, plotting, reporting scripts
figures/ # result figures for paper/report
conda.yml # exported conda environment snapshot
requirements.txt # exported pip dependency snapshot
LICENSE # MIT license (original notice preserved)
We provide two exported dependency snapshots from the roco conda environment:
- conda.yml: conda dependencies (with pip subsection)
- requirements.txt: full pip freeze from roco
conda env create -f conda.yml
conda activate acroco
python -m pip install -e .

conda create -n roco python=3.10 -y
conda activate roco
python -m pip install -r requirements.txt
python -m pip install -e .

# create and sync a local virtual environment
uv venv .venv
# Windows PowerShell
.\.venv\Scripts\Activate.ps1
# macOS / Linux
# source .venv/bin/activate
uv pip install -r requirements.txt
uv pip install -e .

# Sort
python scripts/train/train_rl_sort.py --steps 30000 --save checkpoints/sort_mappo.pt
# Sweep
python scripts/train/train_rl_sweep.py --steps 30000 --save checkpoints/sweep_mappo.pt
# Primitive architecture variant
python scripts/train/train_rl_primitive.py --steps 200000

# RL benchmark
python scripts/analysis/benchmark_rl.py --train-steps 25000 --eval-episodes 200
# Mask-aware LLM hybrid ablation
python scripts/analysis/benchmark_mask_aware.py --episodes 8 --llm-model glm-4-flash
# Real MuJoCo rollout (Sort)
python scripts/run/run_rl_sort.py --method mappo --load checkpoints/sort_mappo.pt --num-runs 5 --snap-finished
# Real MuJoCo rollout (Sweep)
python scripts/run/run_rl_sweep.py --load checkpoints/sweep_mappo.pt --num-runs 3

python scripts/analysis/make_training_curves.py --steps 25000
python scripts/analysis/make_plots.py
python scripts/analysis/plot_analysis.py --steps 15000

Main dispatch logic is in prompting/llm_api.py:
- chat_completion(...) routes requests by model/provider.
- NVIDIA_MODELS maps short aliases (e.g., "llama") to provider model IDs.
- DEEPSEEK_MODELS controls which model names route to DeepSeek.
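
As a rough illustration of the routing rules described in this section (alias lookup, DeepSeek membership, and the `/` rule noted in the patterns that follow), here is a minimal sketch. The function `resolve_route`, the order of the checks, the table contents, and the `"other"` fallback are assumptions for illustration; the actual logic lives in `chat_completion(...)` and may differ.

```python
# Hypothetical stand-ins for the tables in prompting/llm_api.py.
# The entries below are placeholders, not the repository's real contents.
NVIDIA_MODELS = {"my-model": "vendor/model-id"}
DEEPSEEK_MODELS = {"deepseek-new-model"}


def resolve_route(model):
    """Return (provider, provider_model_id) for a requested model name."""
    if model in NVIDIA_MODELS:
        # Short alias: translate to the provider's full model ID.
        return "nvidia", NVIDIA_MODELS[model]
    if model in DEEPSEEK_MODELS:
        # Name registered for DeepSeek: pass it through unchanged.
        return "deepseek", model
    if "/" in model:
        # Direct provider model ID such as "vendor/model-id".
        return "nvidia", model
    # Assumed fallback for any other name (for example a GLM model).
    return "other", model


for name in ("my-model", "deepseek-new-model", "vendor/other-id", "glm-4-flash"):
    print(name, "->", resolve_route(name))
```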
To add a model, choose one of these patterns:
# Alias-based NVIDIA model (recommended)
NVIDIA_MODELS["my-model"] = "vendor/model-id"
# call with model="my-model"

# DeepSeek-routed model
DEEPSEEK_MODELS.add("deepseek-new-model")
# call with model="deepseek-new-model"

# Direct provider model id
# If model contains "/", it is routed to the NVIDIA branch by default.
model="vendor/model-id"

prompting/llm_api.py uses load_dotenv(), so create a root .env file:
GLM_API_KEY=your_glm_key
NVIDIA_API_KEY=your_nvidia_key
DEEPSEEK_API_KEY=your_deepseek_key
# optional for real_world runner compatibility
OPENAI_API_KEY=your_openai_compatible_key

You can also export these variables in your shell/CI environment directly.
A ready-to-copy template is provided at .env.example.
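
Before launching an LLM-backed run, it can help to check which of these variables are actually set. The snippet below is a small standalone sketch using only the standard library; the helper name `missing_keys` is hypothetical, and treating all three provider keys as expected is an assumption, since a given run may only need the key for the provider it uses.

```python
import os

# Variable names taken from the .env example above.
PROVIDER_KEYS = ("GLM_API_KEY", "NVIDIA_API_KEY", "DEEPSEEK_API_KEY")
OPTIONAL_KEYS = ("OPENAI_API_KEY",)


def missing_keys(keys=PROVIDER_KEYS, env=None):
    """Return the names in `keys` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [k for k in keys if not env.get(k)]


if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Not set:", ", ".join(absent))
    else:
        print("All provider keys are set.")
```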
This project builds upon the original RoCo implementation by Mandi Zhao and collaborators.
If you use ACRoCo in research, please cite both the original RoCo work and this repository’s extension details.




