Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision

Overview

Official codebase for T2DA: Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision. Paper link: [ArXiv].

Installation

Experiments require MuJoCo and Metaworld. Follow the instructions in the [MuJoCo][Metaworld] to install. Create a virtual environment using conda, and see requirments.txt file for more information about how to install the dependencies.

conda create -n t2da python=3.8.20 -y
conda activate t2da
pip install -r requirements.txt

Data Collection

Note that we set done = False in all environments.

Train SAC

We use SAC to train agents on different environments and collect datasets.
Train agents on different tasks in PointRobot-v0:

python train_data_collection.py --env point-robot --save_freq 40 --task_id_start 0 --task_id_end 5

in which task_id_start and task_id_end mean that training tasks of [task_id_start, task_id_end).

Generate Datasets

We use all checkpoints of traning process to generate datasets.

python get_datasets_mix.py --env point-robot --task_id_start 0 --task_id_end 5 --capacity 80

in which capacity means the number of transitions collected by one checkpoint. For example, for the PointRobot-v0 environment, we collect 80 transitions (4 episodes) per checkpoint.

Train Trajectory Encoder

Train the Trajectory Encoder on different tasks in PointRobot-v0:

python train_traj_encoder.py --env point-robot --context_horizon 20

We use an entire trajectory as the context, so the parameter context_horizon should be consistent with max_episode_steps of the environment. The trained checkpoint will be saved in saves_world_model/point-robot/.

Align Text Encoder with Trajectory Encoder

Fine-tune the text encoder to align the produced text embeddings with dynamics-aware decision embeddings:

python train_align.py --text_encoder clip --env point-robot --context_horizon 20

By modifying the parameter text_encoder, you can switch to using T5 or BERT as text_encoder. The trained checkpoint will be saved in saves_align/point-robot/.

Downstream Task Training

Text-to-Decision Diffuser

Train the Text-to-Decision Diffuser on different tasks in PointRobot-v0:

python train_t2d_diffuser.py --env point-robot --prompts_type aligned_clip

Evaluate the Text-to-Decision Diffuser on different tasks in PointRobot-v0:

python evaluate_parallel.py --env point-robot --prompts_type aligned_clip

Text-to-Decision Transformer

Train the Text-to-Decision Transformer on different tasks in PointRobot-v0:

python train_t2d_transformer.py --env point-robot --prompts_type aligned_clip

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision

Overview

Installation

Data Collection

Train SAC

Generate Datasets

Train Trajectory Encoder

Align Text Encoder with Trajectory Encoder

Downstream Task Training

Text-to-Decision Diffuser

Text-to-Decision Transformer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
config		config
configs		configs
context		context
data_collection		data_collection
datasets		datasets
diffuser		diffuser
saves_align/point-robot/clip		saves_align/point-robot/clip
transformer		transformer
world_model		world_model
wrapped_metaworld		wrapped_metaworld
README.md		README.md
evaluate_parallel.py		evaluate_parallel.py
get_datasets_mix.py		get_datasets_mix.py
register_mujoco.py		register_mujoco.py
requirements.txt		requirements.txt
train_align.py		train_align.py
train_data_collection.py		train_data_collection.py
train_t2d_diffuser.py		train_t2d_diffuser.py
train_t2d_transformer.py		train_t2d_transformer.py
train_traj_encoder.py		train_traj_encoder.py
utils.py		utils.py

Folders and files

Latest commit

History

Repository files navigation

Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision

Overview

Installation

Data Collection

Train SAC

Generate Datasets

Train Trajectory Encoder

Align Text Encoder with Trajectory Encoder

Downstream Task Training

Text-to-Decision Diffuser

Text-to-Decision Transformer

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages