CRAFT: Coaching Reinforcement Learning Autonomously using Foundation Models for Multi-Robot Coordination Tasks

CRAFT is a framework that leverages the reasoning capabilities of foundation models to act as a “coach” for multi-robot coordination. CRAFT consists of five key stages:

Curriculum generation: A curriculum LLM decomposes the long-horizon coordination task into a sequence of subtasks.
Reward function generation: A reward generation LLM writes a reward function as executable Python code from the natural language description of a subtask.
Policy evaluation: An evaluation VLM judges whether the policy trained with the LLM-generated reward succeeds or fails at the subtask.
Reward refinement: If the policy fails to achieve the desired behavior, an advice VLM provides advice on how to change the reward.
Sequential training of subtasks: We initialize each subtask with the policy learned from the previous one while encouraging exploration so the agents can learn the new subtask.
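
The five stages above can be read as a nested loop: an outer loop over subtasks and an inner loop that retrains until the evaluator accepts the policy. The sketch below is only an illustration of that control flow, not the released CRAFT code. Every name in it (`run_coaching_loop`, the callable parameters, `max_refinements`, and the toy stubs) is a hypothetical stand-in, and the choice to carry the last candidate policy forward when refinement runs out is an assumption. The exploration incentive used during sequential training is not modeled here.

```python
from typing import Callable, List, Optional, Tuple

Policy = dict  # hypothetical placeholder for a trained multi-agent policy


def run_coaching_loop(
    task: str,
    generate_curriculum: Callable[[str], List[str]],       # stands in for the curriculum LLM
    generate_reward: Callable[[str, Optional[str]], str],  # stands in for the reward generation LLM
    train_policy: Callable[[str, Optional[Policy]], Policy],
    evaluate_policy: Callable[[str, Policy], bool],        # stands in for the evaluation VLM
    advise: Callable[[str, Policy], str],                  # stands in for the advice VLM
    max_refinements: int = 2,                              # assumed budget, not from the source
) -> Tuple[Optional[Policy], List[Tuple[str, bool, int]]]:
    policy: Optional[Policy] = None
    log: List[Tuple[str, bool, int]] = []
    for subtask in generate_curriculum(task):
        reward_code = generate_reward(subtask, None)
        attempts = 0
        while True:
            attempts += 1
            # Warm-start from the policy learned on the previous subtask.
            candidate = train_policy(reward_code, policy)
            success = evaluate_policy(subtask, candidate)
            if success or attempts > max_refinements:
                break
            # On failure, ask for advice and regenerate the reward.
            reward_code = generate_reward(subtask, advise(subtask, candidate))
        # Assumption: the last candidate is carried forward even if it failed.
        policy = candidate
        log.append((subtask, success, attempts))
    return policy, log


# Toy stubs so the sketch runs without any model or simulator.
def _curriculum(task: str) -> List[str]:
    return [f"{task}: reach", f"{task}: lift"]


def _reward(subtask: str, advice: Optional[str]) -> str:
    return f"reward for {subtask}" + (" (revised)" if advice else "")


def _train(reward_code: str, init: Optional[Policy]) -> Policy:
    return {"reward": reward_code, "warm_start": init is not None}


def _evaluate(subtask: str, policy: Policy) -> bool:
    # The toy "lift" subtask only succeeds once the reward has been revised.
    return "lift" not in subtask or "(revised)" in policy["reward"]


def _advise(subtask: str, policy: Policy) -> str:
    return "increase the weight on lifting height"


final_policy, log = run_coaching_loop(
    "bimanual handover", _curriculum, _reward, _train, _evaluate, _advise
)
```

In this toy run the first subtask is accepted on the first attempt, while the second needs one round of reward refinement before the stub evaluator accepts it. In a real setup each stub would be replaced by a model call or a MARL training run.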

Introduction

Learning coordinated behaviors remains a significant challenge in multi-robot systems. Multi-Agent Reinforcement Learning (MARL) offers a promising framework, but applying MARL to robotics is notoriously difficult due to high-dimensional action spaces, complex reward design, and the non-stationarity introduced by decentralized decision-making.

To address these difficulties, curriculum generation has been studied as an effective approach for learning coordination and strategies in MARL domains by structuring training into stages of increasing complexity. However, curriculum design is often non-trivial: it requires domain knowledge to identify the key steps in long-horizon tasks and reasoning ability to monitor learning progress.

Our key idea is to use the human-like reasoning capabilities of foundation models, such as LLMs and VLMs, as a coach: the entity that teaches agents how to coordinate. Coaching naturally integrates all of these skills. A good coach breaks down the task, defines success criteria for each subtask, and provides actionable feedback to guide improvement. Similarly, CRAFT produces curricula for long-horizon coordination tasks, designs semantically rich reward functions, and evaluates task–policy alignment using visual information.

Installation

We will release our code after refactoring.

Acknowledgement

Our bimanual manipulation environment is from Robosuite.
Our quadruped navigation environment is from MQE.
We use SKRL and OpenRL for our MAPPO implementation.
