Pandora: Towards General World Model with Natural Language Actions and Video States

maitrix-org/Pandora

We introduce Pandora, a step towards a General World Model (GWM) that:

  1. Simulates world states by generating videos in any domain
  2. Allows control at any time through actions expressed in natural language

Please refer to world-model.ai for results.

[Website] [Paper] [Model] [Gallery]

News

  • [2024/05/23] Release the model and inference code.
  • [2024/05/23] Launch the website and release the paper.

Setup

conda create -n pandora python=3.12.3 nvidia/label/cuda-12.1.0::cuda-toolkit -y
conda activate pandora
pip install torch torchvision torchaudio
bash build_envs.sh  

If your system doesn't support CUDA 12.1, you can install with CUDA 11.8 instead:

conda create -n pandora python=3.12.3 nvidia/label/cuda-11.8.0::cuda-toolkit -y 
conda activate pandora
pip install torch torchvision torchaudio
bash build_envs.sh  
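
After either install path, it can help to confirm that PyTorch sees the GPU before launching anything heavier. The snippet below is a minimal sketch and not part of the repository: the function name `describe_torch_env` is hypothetical, and it relies only on the standard PyTorch attributes `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_env():
    """Summarize the PyTorch install, or report that torch is missing.

    Hypothetical helper for illustration; not part of the Pandora repository.
    """
    # find_spec lets us detect a missing package without raising ImportError.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the installed wheel was built against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are visible to this process.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back `False` inside the activated `pandora` environment, the installed CUDA toolkit and the driver are a likely place to start looking.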

Inference

Gradio Demo

  1. Download the model checkpoint from Hugging Face. (The model weights are currently hidden because of a data license issue; we will re-open them soon, once it is resolved.)
  2. Run the following command in your terminal, replacing {cuda_id} with the GPU index and {path_to_ckpt} with the checkpoint path:
CUDA_VISIBLE_DEVICES={cuda_id} python gradio_app.py --ckpt_path {path_to_ckpt}

You can then interact with the model through the Gradio interface.
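
The launch command can also be wrapped in a short Python script, which may be convenient when the GPU index and checkpoint path come from other tooling. This is a minimal sketch under stated assumptions: `build_launch` and `launch_demo` are hypothetical helpers that do not exist in the repository, and only `gradio_app.py`, `--ckpt_path`, and `CUDA_VISIBLE_DEVICES` are taken from the command itself. It assumes the script is run from the repository root with the `pandora` environment active.

```python
import os
import subprocess
import sys


def build_launch(cuda_id, ckpt_path, base_env=None):
    """Return (argv, env) equivalent to the shell launch command.

    Hypothetical helper for illustration; not part of the Pandora repository.
    """
    # Copy the environment so the caller's os.environ is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # Restrict the child process to one GPU, as CUDA_VISIBLE_DEVICES={cuda_id} does.
    env["CUDA_VISIBLE_DEVICES"] = str(cuda_id)
    # sys.executable is the interpreter running this script, i.e. the active env's Python.
    argv = [sys.executable, "gradio_app.py", "--ckpt_path", str(ckpt_path)]
    return argv, env


def launch_demo(cuda_id, ckpt_path):
    """Start the Gradio demo and block until the process exits."""
    argv, env = build_launch(cuda_id, ckpt_path)
    return subprocess.run(argv, env=env, check=True)
```

Calling `launch_demo(0, ckpt_path)` with your own checkpoint path would start the demo on GPU 0. Keeping the command construction in `build_launch` separate from the `subprocess.run` call makes it possible to inspect the command without starting the app.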

Citation

@article{xiang2024pandora,
  title={Pandora: Towards General World Model with Natural Language Actions and Video States},
  author={Jiannan Xiang and Guangyi Liu and Yi Gu and Qiyue Gao and Yuting Ning and Yuheng Zha and Zeyu Feng and Tianhua Tao and Shibo Hao and Yemin Shi and Zhengzhong Liu and Eric P. Xing and Zhiting Hu},
  year={2024}
}