
Strengthening Generative Robot Policies through Predictive World Modeling — Official Code Release

Official implementation of:

Strengthening Generative Robot Policies through Predictive World Modeling
Han Qi, Haocheng Yin, Aris Zhu, Yilun Du, Heng Yang
arXiv 2025
Paper: https://arxiv.org/abs/2502.00622


Overview

This repository provides the official implementation of the framework proposed in the paper:

We strengthen diffusion-based generative robot policies by integrating a predictive world model that enables long-horizon reasoning and improved robustness.

The framework consists of two main components:

  1. Diffusion-based Action Policy
    Generates action sequences using a generative diffusion model.

  2. Predictive World Model
    Learns environment dynamics to evaluate and refine candidate action trajectories.

At inference time, the world model enhances policy performance through trajectory prediction and ranking/optimization.

We use the Push-T task as the running example throughout this codebase.
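
At a glance, one GPC inference step looks like the minimal sketch below. All names here (policy.sample, world_model.rollout, trajectory_score, optimize_actions) are illustrative placeholders, not this repository's actual API:

def gpc_step(obs, policy, world_model, mode="rank"):
    # Sample candidate action sequences from the diffusion policy.
    candidates = [policy.sample(obs) for _ in range(16)]
    if mode == "rank":  # GPC-RANK: execute the best-scoring candidate
        rollouts = [world_model.rollout(obs, a) for a in candidates]
        scores = [trajectory_score(r) for r in rollouts]
        return candidates[max(range(len(scores)), key=scores.__getitem__)]
    else:               # GPC-OPT: refine a candidate with the world model
        return optimize_actions(obs, candidates[0], world_model)

The two branches correspond to the GPC-RANK and GPC-OPT evaluation modes described below.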


Repository Structure

.
├── all_checkpoint/                   # Pretrained checkpoints (policy + world model)
├── diffusion_policy_data/            # Training data for diffusion action policy
├── diffusion_policy_training/        # Training code for diffusion-based action policy
├── gpc_opt_evaluation/               # Evaluation with GPC-OPT (trajectory optimization)
├── gpc_rank_evaluation/              # Evaluation with GPC-RANK (trajectory ranking)
├── world_model_data/                 # Training data for predictive world model
├── world_model_train_phase_one/      # Phase I: single-step world model warmup training
└── world_model_train_phase_two/      # Phase II: multi-step world model training

Installation

1. Clone the repository

git clone https://github.com/han20192019/gpc_code.git
cd gpc_code

2. Install dependencies

We recommend using a clean conda environment:

conda env create -f environment.yml
conda activate gpc

Checkpoints & Datasets

  • Pretrained checkpoints:
    https://huggingface.co/han2019/gpc_checkpoints/tree/main

Please download the checkpoints into a folder named 'all_checkpoint' in the repository root.

  • Diffusion policy training dataset:
    https://huggingface.co/datasets/han2019/gpc_pushT_data/tree/main/diffusion_policy_data

  • World model training dataset:
    https://huggingface.co/datasets/han2019/gpc_pushT_data/tree/main/world_model_data

Please download the datasets and place them in the repository root (as 'diffusion_policy_data/' and 'world_model_data/').
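
If you use the huggingface_hub Python package, one convenient way to fetch everything is snapshot_download (a sketch; adjust the local paths if your layout differs):

from huggingface_hub import snapshot_download

# Pretrained checkpoints -> ./all_checkpoint
snapshot_download(repo_id="han2019/gpc_checkpoints",
                  local_dir="all_checkpoint")

# Push-T datasets -> ./diffusion_policy_data and ./world_model_data
snapshot_download(repo_id="han2019/gpc_pushT_data",
                  repo_type="dataset", local_dir=".")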


Training

There are two independent modules to train:


1️⃣ Train the Diffusion Action Policy

Directory:

diffusion_policy_training/

Run:

python train_model.py

This trains the diffusion-based generative policy that produces candidate action sequences.
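
For reference, the core training objective is the standard DDPM noise-prediction loss, conditioned on observations. The sketch below assumes a diffusers-style noise scheduler and is illustrative only; the actual network and training loop live in this directory:

import torch
import torch.nn.functional as F

def diffusion_loss(model, scheduler, actions, obs, num_train_timesteps=100):
    # Standard DDPM objective: corrupt ground-truth action sequences with
    # noise at a random timestep and train the model to predict that noise.
    noise = torch.randn_like(actions)
    t = torch.randint(0, num_train_timesteps, (actions.shape[0],),
                      device=actions.device)
    noisy_actions = scheduler.add_noise(actions, noise, t)  # forward diffusion
    pred_noise = model(noisy_actions, t, obs)               # obs-conditioned denoiser
    return F.mse_loss(pred_noise, noise)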


2️⃣ Train the Predictive World Model

World model training is performed in two stages, as described in the paper.


(a) Phase One — Single-Step Warmup Training

Directory:

world_model_train_phase_one/

Run:

python train.py

This stage trains the world model for single-step prediction, which stabilizes early training and improves multi-step rollout performance.
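
Conceptually, Phase I minimizes a one-step prediction error. A minimal sketch, assuming a world model that maps a (state, action) pair to the next state:

import torch.nn.functional as F

def single_step_loss(world_model, state, action, next_state):
    # Phase I (illustrative): predict one step ahead from ground-truth
    # inputs and regress to the ground-truth next state.
    pred_next = world_model(state, action)
    return F.mse_loss(pred_next, next_state)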


(b) Phase Two — Multi-Step Training

Directory:

world_model_train_phase_two/

Run:

python train.py

This stage trains the model for multi-step rollouts, enabling long-horizon trajectory evaluation.
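
Conceptually, Phase II feeds the model's own predictions back in and supervises every step of the rollout, exposing the model to its compounding errors. A minimal sketch under the same assumptions as the Phase I example:

import torch.nn.functional as F

def multi_step_loss(world_model, state, actions, target_states):
    # Phase II (illustrative): autoregressive rollout on the model's own
    # predictions, with a per-step regression loss.
    loss, cur = 0.0, state
    for action, target in zip(actions, target_states):
        cur = world_model(cur, action)  # feed back the model's prediction
        loss = loss + F.mse_loss(cur, target)
    return loss / len(actions)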


Evaluation

After training both the policy and world model, you can evaluate the integrated system.


GPC-RANK (Trajectory Ranking)

Directory:

gpc_rank_evaluation/

Run:

python gpc_rank_evaluation.py

This mode (see the sketch after this list):

  • Samples candidate action sequences from the diffusion policy
  • Uses the predictive world model to simulate future states
  • Ranks trajectories
  • Executes the highest-scoring candidate
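
A minimal batched sketch of this procedure; the score function (distance of the final predicted state to a goal) and all method names are illustrative, not the repository's exact API:

import torch

def score_trajectory(pred_states, goal):
    # Illustrative score: a closer final predicted state to the goal is better.
    return -torch.linalg.norm(pred_states[:, -1] - goal, dim=-1)

@torch.no_grad()
def gpc_rank(policy, world_model, obs, goal, num_candidates=32):
    candidates = policy.sample(obs, num_samples=num_candidates)  # (N, H, A)
    pred_states = world_model.rollout(obs, candidates)           # (N, H, S)
    scores = score_trajectory(pred_states, goal)                 # (N,)
    return candidates[scores.argmax()]                           # best candidate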

GPC-OPT (Trajectory Optimization)

Directory:

gpc_opt_evaluation/

Run:

python gpc_opt_evaluation.py

This mode (see the sketch after this list):

  • Uses the world model to iteratively optimize action sequences
  • Improves performance through predictive refinement
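
A hedged sketch of one way such refinement can be implemented: gradient descent on a predicted-outcome objective through a differentiable world model. The paper's exact optimizer may differ, and all names here are illustrative:

import torch

def gpc_opt(world_model, obs, init_actions, goal, steps=10, lr=1e-2):
    # Start from a candidate sampled by the diffusion policy and refine it.
    actions = init_actions.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([actions], lr=lr)
    for _ in range(steps):
        pred_states = world_model.rollout(obs, actions)   # differentiable rollout
        loss = torch.linalg.norm(pred_states[-1] - goal)  # predicted goal distance
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return actions.detach()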

Citation

If you find this work useful, please cite:

@article{qi2025strengthening,
  title={Strengthening generative robot policies through predictive world modeling},
  author={Qi, Han and Yin, Haocheng and Zhu, Aris and Du, Yilun and Yang, Heng},
  journal={arXiv preprint arXiv:2502.00622},
  year={2025}
}
