Sihui Ji¹, Xi Chen¹, Xin Tao², Pengfei Wan², Hengshuang Zhao¹✉
¹The University of Hong Kong
²Kling Team, Kuaishou Technology
✉Corresponding author
- [2025.10.16]: Released the project page and the arXiv version.
TL;DR: We propose PhysMaster, which captures physical knowledge as a representation that guides video generation models and enhances their physics-awareness. PhysMaster builds on the image-to-video task, in which the model is expected to predict physically plausible dynamics from an input image. We devise PhysEncoder to encode this physical representation as an extra condition, and adopt a top-down optimization strategy that finetunes PhysEncoder with reinforcement learning (RL), based on the physical plausibility of the final generated videos. Experimental results demonstrate strong performance on both specialized proxy tasks and general open-world scenarios.
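To make the top-down idea concrete, here is a minimal sketch of one way feedback on final videos could be turned into a training signal. It is an illustration only: this README does not specify the RL algorithm, and the `Candidate` class, the `plausibility` score, the `min_gap` threshold, and the preference-pair formulation are all hypothetical assumptions rather than the released implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """One video generated from the same input image (hypothetical record)."""
    video_id: str
    plausibility: float  # assumed score; higher means more physically plausible


def build_preference_pairs(
    candidates: List[Candidate], min_gap: float = 0.1
) -> List[Tuple[str, str]]:
    """Return (preferred, rejected) id pairs whose score gap is at least min_gap."""
    # Rank candidates from most to least plausible.
    ranked = sorted(candidates, key=lambda c: c.plausibility, reverse=True)
    pairs = []
    for i, better in enumerate(ranked):
        for worse in ranked[i + 1:]:
            # Skip near-ties: they carry little signal about plausibility.
            if better.plausibility - worse.plausibility >= min_gap:
                pairs.append((better.video_id, worse.video_id))
    return pairs


# Toy scores, invented for illustration only.
candidates = [
    Candidate("seed0", 0.90),
    Candidate("seed1", 0.40),
    Candidate("seed2", 0.85),
]
pairs = build_preference_pairs(candidates)
print(pairs)
```

Under these assumptions, such pairs could feed a preference-based objective whose gradients update only the encoder that produces the physical condition, which is what makes the optimization "top-down": the signal comes from the final videos rather than from direct supervision of the representation.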
compare.mp4
ablation.mp4
This repository is released under the Apache-2.0 license as found in the LICENSE file.
If you find this codebase useful for your research, please cite it using the following BibTeX entry.
@article{Ji2025physmaster,
title={{PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning}},
author={Ji, Sihui and Chen, Xi and Tao, Xin and Wan, Pengfei and Zhao, Hengshuang},
journal={arXiv preprint arXiv:2510.13809},
year={2025}
}
