flybbits/ACRoCo

ACRoCo: Action-Constrained Dialectic Multi-Robot Collaboration with Large Language Models

ACRoCo is a research extension of RoCo (Dialectic Multi-Robot Collaboration with Large Language Models), a baseline for multi-robot collaboration with LLMs.
Our core idea is to convert open-ended LLM planning into action-constrained decision making via legality masks, then optimize collaborative behavior with MAPPO and hybrid LLM+RL policies.
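
The legality-mask idea can be illustrated with a minimal sketch. It is not taken from the ACRoCo code: the function names, the two-head layout (verb, then target), and the example masks are assumptions made for illustration only. Illegal entries get probability zero before sampling, so the policy can only pick actions that the mask allows.

```python
import math
import random

def masked_softmax(logits, mask):
    """Softmax over legal entries only; illegal entries get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def sample_factorized(verb_logits, target_logits, verb_mask, target_mask_for, rng):
    """Pick a verb first, then a target that is legal for that verb.

    target_mask_for maps a verb index to its target mask (hypothetical layout).
    """
    verb_probs = masked_softmax(verb_logits, verb_mask)
    verb = rng.choices(range(len(verb_probs)), weights=verb_probs)[0]
    target_probs = masked_softmax(target_logits, target_mask_for[verb])
    target = rng.choices(range(len(target_probs)), weights=target_probs)[0]
    return verb, target

if __name__ == "__main__":
    verbs = ["PICK", "PLACE", "WAIT"]      # illustrative vocabulary
    targets = ["cube", "bin", "none"]      # illustrative vocabulary
    verb_mask = [True, False, True]        # PLACE is illegal in this state
    target_mask_for = {
        0: [True, False, False],           # PICK may only target the cube
        1: [False, True, False],           # PLACE may only target the bin
        2: [False, False, True],           # WAIT takes no target
    }
    rng = random.Random(0)
    v, t = sample_factorized([0.1, 5.0, 0.3], [0.0, 0.0, 0.0],
                             verb_mask, target_mask_for, rng)
    print(verbs[v], targets[t])
```

Even though PLACE has the largest logit in the example, it is never sampled because its mask entry is false.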


Highlights

  • Action-constrained collaboration: legality masks remove unreachable/invalid actions before policy selection.
  • Factorized legality masking: each agent action is decomposed into (verb/object, target) heads with mask-driven filtering, turning open-ended planning into constrained choice.
  • MAPPO + CTDE training: multi-agent coordination is learned with centralized training and decentralized execution (CTDE) under constrained action spaces.
  • Primitive-aware architecture: macro actions are composed from reusable primitives (REACH/GRASP/LIFT/TRANSLATE/RELEASE/PUSH/WAIT) to align policy actions with executable motion stages.
  • Hierarchical phase-adaptive reward: semantic-layer and physical-layer rewards are blended by execution phase/progress, improving stability across decision vs execution stages.
  • Manager-style task adaptation: a task-defined composer (objects/targets/reachability/goal-map) auto-generates action space and masks, enabling reusable training pipelines.
  • Mask-aware LLM prompting: legal action sets are exposed to the LLM to reduce hallucinated decisions.
  • Cross-task transfer: the same training core is reused across Sort and Sweep with task-specific vocab/mask hooks.
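
As a rough illustration of the mask-aware prompting highlight above, the sketch below builds a prompt that lists only the legal actions and then validates the reply against that list. The helper names (build_mask_aware_prompt, parse_choice), the prompt wording, and the action strings are hypothetical and do not come from the repository's prompting/ code.

```python
def build_mask_aware_prompt(agent, observation, actions, mask):
    """List only the legal actions so the LLM picks from a closed set."""
    legal = [a for a, ok in zip(actions, mask) if ok]
    if not legal:
        raise ValueError(f"no legal actions for {agent}")
    lines = [
        f"You control robot {agent}.",
        f"Observation: {observation}",
        "Choose exactly one of the following legal actions:",
    ]
    lines += [f"{i}. {a}" for i, a in enumerate(legal, start=1)]
    lines.append("Answer with the number of your choice only.")
    return "\n".join(lines), legal

def parse_choice(reply, legal):
    """Map the LLM reply back to a legal action; return None if it is invalid."""
    text = reply.strip().rstrip(".")
    if text.isdecimal() and 1 <= int(text) <= len(legal):
        return legal[int(text) - 1]
    return None

if __name__ == "__main__":
    prompt, legal = build_mask_aware_prompt(
        "Alice", "cube on table",
        ["PICK cube", "PLACE bin", "WAIT"],   # illustrative action strings
        [True, False, True],                  # PLACE is masked out
    )
    print(prompt)
    print(parse_choice("2", legal))
```

Returning None for an unparseable or out-of-range reply lets the caller fall back to another policy (for example, the RL policy) instead of executing a hallucinated action.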

Results

  • Comparison: figures/fig_comparison.png
  • Ablation: figures/fig_ablation.png
  • Training Curves: figures/fig_curves.png
  • Mask-aware Hybrid: figures/fig_mask_aware.png
  • Cross-task Generalization: figures/fig_cross_task.png

Structure

cite/                      # cited baseline code (paper reference)
├─ rocobench/
├─ prompting/
└─ real_world/

rocobench/                 # runnable core code for this project
prompting/                 # prompt and LLM API implementation
real_world/                # real-robot integration bridge
rl/                        # added RL modules (MAPPO, symbolic env, hybrid)

scripts/
├─ train/                  # training entrypoints
├─ run/                    # execution and evaluation entrypoints
└─ analysis/               # benchmarking, ablation, plotting, reporting scripts

figures/                   # result figures for paper/report
conda.yml                  # exported conda environment snapshot
requirements.txt           # exported pip dependency snapshot
LICENSE                    # MIT license (original notice preserved)

Reproduction

Environment

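We provide two exported dependency snapshots from the roco conda environment (conda.yml and requirements.txt). Use any one of the three options below to recreate the environment: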

Option A: Recreate with conda YAML

conda env create -f conda.yml
conda activate acroco
python -m pip install -e .

Option B: Recreate with pip lock file

conda create -n acroco python=3.10 -y
conda activate acroco
python -m pip install -r requirements.txt
python -m pip install -e .

Option C: Recreate with uv

# create a local virtual environment
uv venv .venv
# Windows PowerShell
.\.venv\Scripts\Activate.ps1
# macOS / Linux
# source .venv/bin/activate

uv pip install -r requirements.txt
uv pip install -e .

Train

# Sort
python scripts/train/train_rl_sort.py --steps 30000 --save checkpoints/sort_mappo.pt

# Sweep
python scripts/train/train_rl_sweep.py --steps 30000 --save checkpoints/sweep_mappo.pt

# Primitive architecture variant
python scripts/train/train_rl_primitive.py --steps 200000

Evaluate / Run

# RL benchmark
python scripts/analysis/benchmark_rl.py --train-steps 25000 --eval-episodes 200

# Mask-aware LLM hybrid ablation
python scripts/analysis/benchmark_mask_aware.py --episodes 8 --llm-model glm-4-flash

# Real MuJoCo rollout (Sort)
python scripts/run/run_rl_sort.py --method mappo --load checkpoints/sort_mappo.pt --num-runs 5 --snap-finished

# Real MuJoCo rollout (Sweep)
python scripts/run/run_rl_sweep.py --load checkpoints/sweep_mappo.pt --num-runs 3

Plotting

python scripts/analysis/make_training_curves.py --steps 25000
python scripts/analysis/make_plots.py
python scripts/analysis/plot_analysis.py --steps 15000

Model Extension & API Key Configuration

1. Add a new model in prompting/llm_api.py

Main dispatch logic is in prompting/llm_api.py:

  • chat_completion(...) routes requests by model/provider.
  • NVIDIA_MODELS maps short aliases (e.g., "llama") to provider model IDs.
  • DEEPSEEK_MODELS controls which model names route to DeepSeek.

To add a model, choose one of these patterns:

# Alias-based NVIDIA model (recommended)
NVIDIA_MODELS["my-model"] = "vendor/model-id"
# call with model="my-model"

# DeepSeek-routed model
DEEPSEEK_MODELS.add("deepseek-new-model")
# call with model="deepseek-new-model"

# Direct provider model ID
# If the model name contains "/", it is routed to the NVIDIA branch by default.
# call with model="vendor/model-id"
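
The routing rules described above can be summarized in a small standalone sketch. This is not the code in prompting/llm_api.py: resolve_route is a hypothetical helper, the table contents are placeholders, and both the order of the checks and the GLM fallback for unmatched names are assumptions.

```python
# Placeholder tables that mimic the structures named in the text.
NVIDIA_MODELS = {"llama": "vendor/llama-model-id"}   # alias -> provider model ID
DEEPSEEK_MODELS = {"deepseek-new-model"}             # names routed to DeepSeek

def resolve_route(model):
    """Return (provider, provider_model_id) for a requested model name."""
    if model in NVIDIA_MODELS:
        # A short alias is expanded to the full provider model ID.
        return "nvidia", NVIDIA_MODELS[model]
    if model in DEEPSEEK_MODELS:
        return "deepseek", model
    if "/" in model:
        # Direct provider IDs such as "vendor/model-id" go to the NVIDIA branch.
        return "nvidia", model
    # Assumed fallback: anything else is treated as a GLM model.
    return "glm", model

if __name__ == "__main__":
    for name in ("llama", "deepseek-new-model", "vendor/model-id", "glm-4-flash"):
        print(name, "->", resolve_route(name))
```

Check the actual chat_completion(...) implementation before relying on this order, especially for names that could match more than one rule.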

2. Configure API keys via .env

prompting/llm_api.py calls load_dotenv(), so create a .env file in the repository root:

GLM_API_KEY=your_glm_key
NVIDIA_API_KEY=your_nvidia_key
DEEPSEEK_API_KEY=your_deepseek_key
# optional for real_world runner compatibility
OPENAI_API_KEY=your_openai_compatible_key

You can also export these variables in your shell/CI environment directly.

A ready-to-copy template is provided at .env.example.
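
Before a long run it can help to confirm that the keys are actually visible to the process. The following is a minimal sketch using only the standard library. missing_api_keys is a hypothetical helper, not part of the repository, and treating all three provider keys as required is an assumption; in practice only the key for the provider you call may be needed.

```python
import os

# Variable names taken from the .env listing above.
REQUIRED_KEYS = ("GLM_API_KEY", "NVIDIA_API_KEY", "DEEPSEEK_API_KEY")
OPTIONAL_KEYS = ("OPENAI_API_KEY",)

def missing_api_keys(env=None, keys=REQUIRED_KEYS):
    """Return the names of keys that are unset or blank in the environment."""
    env = os.environ if env is None else env
    return [k for k in keys if not env.get(k, "").strip()]

if __name__ == "__main__":
    # Run after the .env file has been loaded or the variables exported.
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Passing a plain dict as env makes the check easy to exercise in tests without touching the real environment.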


Citation & License

Acknowledgement

This project builds upon the original RoCo implementation by Mandi Zhao and collaborators.

License

  • Original license text is preserved in LICENSE (MIT).
  • Baseline cited code is kept under cite/.

If you use ACRoCo in research, please cite both the original RoCo work and this repository’s extension details.
