ACRoCo: Action-Constrained Dialectic Multi-Robot Collaboration with Large Language Models

ACRoCo is a research extension of RoCo (Dialectic Multi-Robot Collaboration with Large Language Models), the baseline for LLM-driven multi-robot collaboration that this project builds on.
Our core idea is to convert open-ended LLM planning into action-constrained decision making via legality masks, then optimize collaborative behavior with MAPPO and hybrid LLM+RL policies.
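As a rough illustration of the legality-mask idea, the sketch below renormalizes a policy's scores over legal actions only, so an illegal action can never be sampled however high its raw score is. The function name `masked_softmax` and the four-action example are illustrative assumptions, not code from this repository.

```python
import math

def masked_softmax(logits, legal):
    """Softmax over legal actions only; illegal actions get probability 0."""
    if not any(legal):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit for numerical stability.
    peak = max(l for l, ok in zip(logits, legal) if ok)
    exps = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-action example: actions 1 and 3 are masked out as illegal,
# even though they carry the highest raw scores.
logits = [1.0, 5.0, 1.0, 9.0]
legal = [True, False, True, False]
probs = masked_softmax(logits, legal)
print(probs)  # [0.5, 0.0, 0.5, 0.0]
```

The same masking can be applied to the action list shown to an LLM: only entries whose mask bit is set are offered as choices.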

Highlights

Action-constrained collaboration: legality masks remove unreachable/invalid actions before policy selection.
Factorized legality masking: each agent action is decomposed into (verb/object, target) heads with mask-driven filtering, turning open-ended planning into constrained choice.
MAPPO + CTDE training for multi-agent coordination under constrained action spaces.
Primitive-aware architecture: macro actions are composed from reusable primitives (REACH/GRASP/LIFT/TRANSLATE/RELEASE/PUSH/WAIT) to align policy actions with executable motion stages.
Hierarchical phase-adaptive reward: semantic-layer and physical-layer rewards are blended by execution phase/progress, improving stability across decision vs execution stages.
Manager-style task adaptation: a task-defined composer (objects/targets/reachability/goal-map) auto-generates action space and masks, enabling reusable training pipelines.
Mask-aware LLM prompting: legal action sets are exposed to the LLM to reduce hallucinated decisions.
Cross-task transfer: the same training core is reused across Sort and Sweep with task-specific vocab/mask hooks.
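To make the factorized masking highlight concrete, here is a minimal sketch assuming a two-head action: a first head over verb/object choices and a second head over targets, where the target mask depends on the first-head choice. All object names, target names, mask values, and function names are hypothetical; the actual vocabularies and masks are task-defined in the repository.

```python
# Hypothetical Sort-like vocabulary (assumption, not the repo's real names).
OBJECTS = ["pick:red_cube", "pick:blue_cube", "wait"]
TARGETS = ["bin_left", "bin_right", "none"]

# First-head mask: which verb/object choices this agent can reach right now.
object_mask = {"pick:red_cube": True, "pick:blue_cube": False, "wait": True}

# Second-head mask, conditioned on the first-head choice.
target_mask = {
    "pick:red_cube": {"bin_left": True, "bin_right": False, "none": False},
    "pick:blue_cube": {"bin_left": False, "bin_right": True, "none": False},
    "wait": {"bin_left": False, "bin_right": False, "none": True},
}

def legal_pairs(objects, targets, obj_mask, tgt_mask):
    """Enumerate (verb/object, target) pairs that survive both masks."""
    return [(o, t) for o in objects if obj_mask[o]
            for t in targets if tgt_mask[o][t]]

def pick_greedy(objects, targets, obj_mask, tgt_mask, obj_scores, tgt_scores):
    """Pick the best legal first-head entry, then the best legal target for it."""
    obj = max((o for o in objects if obj_mask[o]), key=lambda o: obj_scores[o])
    tgt = max((t for t in targets if tgt_mask[obj][t]), key=lambda t: tgt_scores[t])
    return obj, tgt

pairs = legal_pairs(OBJECTS, TARGETS, object_mask, target_mask)
print(pairs)  # [('pick:red_cube', 'bin_left'), ('wait', 'none')]

# The unreachable blue cube has the highest score but is never selected.
obj_scores = {"pick:red_cube": 0.2, "pick:blue_cube": 0.9, "wait": 0.1}
tgt_scores = {"bin_left": 0.3, "bin_right": 0.8, "none": 0.0}
choice = pick_greedy(OBJECTS, TARGETS, object_mask, target_mask, obj_scores, tgt_scores)
print(choice)  # ('pick:red_cube', 'bin_left')
```

In this toy case the 3 x 3 joint space shrinks to two legal pairs, which is the sense in which masking turns open-ended planning into a constrained choice.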

Results

Result figures (images not reproduced here; see figures/):
Comparison
Ablation
Training Curves
Mask-aware Hybrid
Cross-task Generalization

Structure

cite/                      # cited baseline code (paper reference)
├─ rocobench/
├─ prompting/
└─ real_world/

rocobench/                 # runnable core code for this project
prompting/                 # prompt and LLM API implementation
real_world/                # real-robot integration bridge
rl/                        # added RL modules (MAPPO, symbolic env, hybrid)

scripts/
├─ train/                  # training entrypoints
├─ run/                    # execution and evaluation entrypoints
└─ analysis/               # benchmarking, ablation, plotting, reporting scripts

figures/                   # result figures for paper/report
conda.yml                  # exported conda environment snapshot
requirements.txt           # exported pip dependency snapshot
LICENSE                    # MIT license (original notice preserved)

Reproduction

Environment

We provide two exported dependency snapshots from the roco conda environment:

conda.yml: conda dependencies (with pip subsection)
requirements.txt: full pip freeze from roco

Option A: Recreate with conda YAML

conda env create -f conda.yml
conda activate roco
python -m pip install -e .

Option B: Recreate with pip lock file

conda create -n roco python=3.10 -y
conda activate roco
python -m pip install -r requirements.txt
python -m pip install -e .

Option C: Recreate with uv

# create and sync a local virtual environment
uv venv .venv
# Windows PowerShell
.\.venv\Scripts\Activate.ps1
# macOS / Linux
# source .venv/bin/activate

uv pip install -r requirements.txt
uv pip install -e .

Train

# Sort
python scripts/train/train_rl_sort.py --steps 30000 --save checkpoints/sort_mappo.pt

# Sweep
python scripts/train/train_rl_sweep.py --steps 30000 --save checkpoints/sweep_mappo.pt

# Primitive architecture variant
python scripts/train/train_rl_primitive.py --steps 200000

Evaluate / Run

# RL benchmark
python scripts/analysis/benchmark_rl.py --train-steps 25000 --eval-episodes 200

# Mask-aware LLM hybrid ablation
python scripts/analysis/benchmark_mask_aware.py --episodes 8 --llm-model glm-4-flash

# Real MuJoCo rollout (Sort)
python scripts/run/run_rl_sort.py --method mappo --load checkpoints/sort_mappo.pt --num-runs 5 --snap-finished

# Real MuJoCo rollout (Sweep)
python scripts/run/run_rl_sweep.py --load checkpoints/sweep_mappo.pt --num-runs 3

Plotting

python scripts/analysis/make_training_curves.py --steps 25000
python scripts/analysis/make_plots.py
python scripts/analysis/plot_analysis.py --steps 15000

Model Extension & API Key Configuration

1. Add a new model in `prompting/llm_api.py`

Main dispatch logic is in prompting/llm_api.py:

chat_completion(...) routes requests by model/provider.
NVIDIA_MODELS maps short aliases (e.g., "llama") to provider model IDs.
DEEPSEEK_MODELS controls which model names route to DeepSeek.

To add a model, choose one of these patterns:

# Alias-based NVIDIA model (recommended)
NVIDIA_MODELS["my-model"] = "vendor/model-id"
# call with model="my-model"

# DeepSeek-routed model
DEEPSEEK_MODELS.add("deepseek-new-model")
# call with model="deepseek-new-model"

# Direct provider model id
# If model contains "/", it is routed to NVIDIA branch by default.
model="vendor/model-id"
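The three patterns above can be read as a small routing table. The sketch below mirrors that description under stated assumptions: `resolve_route` is a hypothetical helper (not a function in prompting/llm_api.py), the table contents are placeholders, and both the check order and the final "other" fallback are guesses about how chat_completion(...) behaves.

```python
# Stand-ins for the tables described above; the real contents live in
# prompting/llm_api.py. The ids and names below are placeholders.
NVIDIA_MODELS = {"llama": "vendor/llama-model-id"}
DEEPSEEK_MODELS = {"deepseek-placeholder"}

def resolve_route(model):
    """Return (provider, provider_model_id) for a requested model name."""
    if model in NVIDIA_MODELS:       # short alias mapped to a provider model id
        return "nvidia", NVIDIA_MODELS[model]
    if model in DEEPSEEK_MODELS:     # names explicitly routed to DeepSeek
        return "deepseek", model
    if "/" in model:                 # direct provider id, NVIDIA branch by default
        return "nvidia", model
    return "other", model            # assumption: any remaining provider branch

# Registering new models follows the same patterns as in the text.
NVIDIA_MODELS["my-model"] = "vendor/model-id"
DEEPSEEK_MODELS.add("deepseek-new-model")

print(resolve_route("my-model"))            # ('nvidia', 'vendor/model-id')
print(resolve_route("deepseek-new-model"))  # ('deepseek', 'deepseek-new-model')
print(resolve_route("vendor/model-id"))     # ('nvidia', 'vendor/model-id')
```

Check the actual dispatch code before relying on the precedence shown here, for example when an alias and a DeepSeek name could collide.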

2. Configure API keys via `.env`

prompting/llm_api.py uses load_dotenv(), so create a root .env file:

GLM_API_KEY=your_glm_key
NVIDIA_API_KEY=your_nvidia_key
DEEPSEEK_API_KEY=your_deepseek_key
# optional for real_world runner compatibility
OPENAI_API_KEY=your_openai_compatible_key

You can also export these variables in your shell/CI environment directly.

A ready-to-copy template is provided at .env.example.
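Before launching a long run, it can help to confirm that the key for the chosen provider is actually present. The following is a minimal sketch using only the standard library; `require_api_key` and the provider-to-variable mapping are illustrative assumptions built from the variable names listed above, not part of the repository.

```python
import os

# Provider name -> environment variable (variable names taken from the .env listing).
REQUIRED_KEYS = {
    "glm": "GLM_API_KEY",
    "nvidia": "NVIDIA_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}

def require_api_key(provider, environ=None):
    """Return the API key for `provider`, or raise with a readable message."""
    env = os.environ if environ is None else environ
    name = REQUIRED_KEYS[provider]
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it in your shell"
        )
    return value

# Demonstration with a fake environment so nothing real is read or printed.
fake_env = {"GLM_API_KEY": "your_glm_key"}
print(require_api_key("glm", fake_env))  # your_glm_key
```

Calling `require_api_key("deepseek", fake_env)` raises RuntimeError here, because that variable is missing from the fake environment. With a real .env, run the check after load_dotenv() has populated os.environ.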

Citation & License

Acknowledgement

This project builds upon the original RoCo implementation by Mandi Zhao and collaborators.

License

Original license text is preserved in LICENSE (MIT).
Baseline cited code is kept under cite/.

If you use ACRoCo in research, please cite both the original RoCo work and this repository’s extension details.
