PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation

Authors: Yidong Huang*, Zun Wang*, Han Lin, Dong-Ki Kim, Shayegan Omidshafiei, Jaehong Yoon, Jaemin Cho, Yue Zhang and Mohit Bansal (UNC Chapel Hill, FieldAI, NTU Singapore, AI2, Johns Hopkins University)

* Equal contribution.

Project page · arXiv · Model · Dataset

Generating realistic human motion is a central yet unsolved challenge in video generation. While reinforcement learning (RL)-based post-training has driven recent gains in general video quality, extending it to human motion remains bottlenecked by a reward signal that cannot reliably score motion realism. Existing video rewards primarily rely on 2D perceptual signals, without explicitly modeling the 3D body state, contact, and dynamics underlying articulated human motion, and often assign high scores to videos with floating bodies or physically implausible movements. To address this, we propose PhyMotion, a structured, fine-grained motion reward that grounds recovered 3D human trajectories in a physics simulator and evaluates motion quality along multiple dimensions of physical feasibility. Concretely, we recover SMPL body meshes from generated videos, retarget them onto a humanoid in the MuJoCo physics simulator, and evaluate the resulting motion along three axes: kinematic plausibility, contact and balance consistency, and dynamic feasibility. Each component provides a continuous and interpretable signal tied to a specific aspect of motion quality, allowing the reward to capture which aspects of motion are physically correct or violated. Experiments show that PhyMotion achieves stronger correlation with human judgments than existing reward formulations. These gains carry over to RL-based post-training, where optimizing PhyMotion leads to larger and more consistent improvements than optimizing existing rewards, improving motion realism across both autoregressive and bidirectional video generators under both automatic metrics and blind human evaluation (+68 Elo gain). Ablations show that the three axes provide complementary supervision signals, while the reward preserves overall video generation quality with only modest training overhead.

Pretrained Checkpoints and Data

Asset	Hugging Face	Notes
PhyMotion-CausalForcing-1.3B LoRA	`6kplus/PhyMotion-CausalForcing-1.3B` (model)	LoRA adapter for the Causal Forcing 1.3B base, post-trained with the PhyMotion reward.
MotionX prompts (train 21,348 / test 1,123)	`6kplus/PhyMotion-MotionX-Prompts` (dataset)	`train.txt` is used for RL rollout during post-training; `test.txt` is used for evaluation.

Download both:

# LoRA adapter
huggingface-cli download 6kplus/PhyMotion-CausalForcing-1.3B \
  --local-dir checkpoints/phymotion-causalforcing

# Train + test prompt splits
huggingface-cli download 6kplus/PhyMotion-MotionX-Prompts \
  --repo-type dataset --local-dir dataset/motionx

Environment Setup

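Create the Python environment and install dependencies. Apart from PyTorch and flash-attn, which are installed explicitly below, requirements.txt covers the full Python stack, including MuJoCo 3.3.6 and the smplx package, so no separate install steps are needed for them. The SMPL-X body model files themselves are a separate download, described later in this section.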

conda create -n phymotion python=3.10 -y
conda activate phymotion
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
pip install flash-attn==2.7.4.post1 --no-build-isolation

Quick sanity check of the environment:

python -c "import torch, flash_attn, mujoco, smplx; \
print(f'torch={torch.__version__} cuda={torch.cuda.is_available()}, flash_attn={flash_attn.__version__}, mujoco={mujoco.__version__}')"
# Expected output:
# torch=2.6.0+cu124 cuda=True, flash_attn=2.7.4.post1, mujoco=3.3.6

Install GVHMR. The reward calls GVHMR in-process to recover SMPL-X meshes from generated frames.

git clone https://github.com/zju3dv/GVHMR.git ~/GVHMR
export GVHMR_ROOT=~/GVHMR

Download the GVHMR checkpoint bundle (~9 GB) from Hugging Face:

for ckpt in \
  gvhmr/gvhmr_siga24_release.ckpt \
  hmr2/epoch=10-step=25000.ckpt \
  vitpose/vitpose-h-multi-coco.pth \
  yolo/yolov8x.pt; do
  huggingface-cli download camenduru/GVHMR "$ckpt" \
    --local-dir "$GVHMR_ROOT/inputs/checkpoints"
done

SMPL-X body models (required): The GVHMR bundle does not include the SMPL-X body model files — these must be obtained separately.

Register (free academic license) at https://smpl-x.is.tue.mpg.de/ and download the SMPL-X model zip.
Extract and place the following three files:

$GVHMR_ROOT/inputs/checkpoints/body_models/smplx/SMPLX_NEUTRAL.npz
$GVHMR_ROOT/inputs/checkpoints/body_models/smplx/SMPLX_MALE.npz
$GVHMR_ROOT/inputs/checkpoints/body_models/smplx/SMPLX_FEMALE.npz

The training script and reward module read GVHMR_ROOT from the environment.
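
Before launching training, it can help to confirm that the files listed above are in place. The following is a minimal preflight sketch and not part of the repo: the helper name `missing_gvhmr_files` is hypothetical, and the relative paths are copied from the download steps above, assuming `huggingface-cli` keeps the repo-relative layout under `--local-dir`.

```python
from pathlib import Path

# Paths relative to $GVHMR_ROOT/inputs/checkpoints, copied from the steps above.
EXPECTED_FILES = [
    "gvhmr/gvhmr_siga24_release.ckpt",
    "hmr2/epoch=10-step=25000.ckpt",
    "vitpose/vitpose-h-multi-coco.pth",
    "yolo/yolov8x.pt",
    "body_models/smplx/SMPLX_NEUTRAL.npz",
    "body_models/smplx/SMPLX_MALE.npz",
    "body_models/smplx/SMPLX_FEMALE.npz",
]


def missing_gvhmr_files(gvhmr_root):
    """Return the expected files that are absent under gvhmr_root (hypothetical helper)."""
    ckpt_dir = Path(gvhmr_root).expanduser() / "inputs" / "checkpoints"
    return [rel for rel in EXPECTED_FILES if not (ckpt_dir / rel).is_file()]
```

Calling `missing_gvhmr_files(os.environ["GVHMR_ROOT"])` returns an empty list when every file is present; any returned entries name the files still to be downloaded.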

After GVHMR's pip dependencies are resolved, pin scipy to avoid a numpy/ufunc incompatibility:

pip install --force-reinstall scipy==1.15.2

The humanoid MJCF model used to retarget SMPL is bundled inside this repo (astrolabe/scorers/video/), so no additional asset is required.

Download the Wan2.1 T2V-1.3B base components (transformer config, VAE, and UMT5-XXL text encoder). ~17 GB.

huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B

Download the Causal Forcing 1.3B sampler weights (the autoregressive distilled version of Wan2.1 T2V-1.3B). PhyMotion's RL post-training starts from this checkpoint. ~5.3 GB.

huggingface-cli download zhuhz22/Causal-Forcing \
  chunkwise/causal_forcing.pt \
  --local-dir checkpoints/casualforcing
# Result: checkpoints/casualforcing/chunkwise/causal_forcing.pt

(Optional) Download our pretrained PhyMotion-CausalForcing-1.3B LoRA + the MotionX prompt splits from Hugging Face:

# LoRA adapter (700 MB)
huggingface-cli download 6kplus/PhyMotion-CausalForcing-1.3B \
  --local-dir checkpoints/phymotion-causalforcing

# Prompt splits: train.txt (21,348) and test.txt (1,123)
huggingface-cli download 6kplus/PhyMotion-MotionX-Prompts \
  --repo-type dataset --local-dir dataset/motionx

To train on your own prompt list instead, drop your one-prompt-per-line files at dataset/motionx/train.txt and dataset/motionx/test.txt.
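
As an illustration of the one-prompt-per-line format, here is a small sketch for writing and reading such files. The helpers `write_prompts` and `load_prompts` are hypothetical and the repo's own loader may treat blank lines or whitespace differently.

```python
from pathlib import Path


def load_prompts(path):
    """Read a one-prompt-per-line file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def write_prompts(prompts, path):
    """Write prompts one per line, creating parent directories as needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # Collapse all runs of whitespace (including newlines) so that each
    # prompt occupies exactly one line in the output file.
    cleaned = [" ".join(p.split()) for p in prompts if p.strip()]
    out.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)
```

For example, `write_prompts(my_prompts, "dataset/motionx/train.txt")` would produce a file in the location the training config expects.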

Stage 1: PhyMotion Reward

The reward grounds each generated video in a 3D body and scores it along three feasibility axes (kinematic, contact, dynamic). It is implemented as a single function in astrolabe/rewards.py.

Axis	Sub-scores
Kinematic	joint velocity, joint acceleration, self-penetration
Contact	foot slip, ground penetration, foot float, balance
Dynamic	joint torque, ground reaction force, metabolic effort

The final reward is the mean of the three axes. All feasibility code (joint-based kinematics and MuJoCo-based contact / dynamics) lives in a single file: astrolabe/scorers/video/smpl_feasibility.py.
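
As a reading aid, the aggregation can be sketched as follows. The text only states that the final reward is the mean of the three axes; how sub-scores are combined within an axis, and their value range, are not stated here. The sketch therefore assumes each sub-score is already normalized to [0, 1] (higher is better) and is averaged within its axis. The function name `aggregate_phymotion` and the dictionary keys are illustrative, not the repo's API.

```python
from statistics import fmean

# Sub-scores per axis, following the table above. Key names are illustrative.
AXES = {
    "kinematic": ["joint_velocity", "joint_acceleration", "self_penetration"],
    "contact": ["foot_slip", "ground_penetration", "foot_float", "balance"],
    "dynamic": ["joint_torque", "ground_reaction_force", "metabolic_effort"],
}


def aggregate_phymotion(sub_scores):
    """Return (final_reward, per_axis_scores) from a dict of sub-scores.

    Assumption: each sub-score lies in [0, 1] and is averaged within its axis.
    Stated in the text: the final reward is the mean of the three axes.
    """
    axis_scores = {
        axis: fmean(sub_scores[name] for name in names)
        for axis, names in AXES.items()
    }
    return fmean(axis_scores.values()), axis_scores
```

Returning the per-axis scores alongside the final value keeps the signal interpretable: a low reward can be traced to the axis that caused it.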

To wire the reward into a config:

config.reward_fn = {"phymotion_score": 1.0}

To combine with a perceptual reward (e.g. HPSv3) for balanced training:

config.reward_fn = {
    "phymotion_score":   1.0,
    "video_hpsv3_local": 1.0,
}
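
The dictionary maps a scorer name to a weight. How the training code combines the entries is not spelled out here; one plausible reading, shown below purely as an assumption, is a weighted sum of per-video scores. The function `combine_rewards` and its input layout are hypothetical.

```python
def combine_rewards(reward_fn, scores):
    """Weighted sum of per-video scores (assumed combination rule).

    reward_fn: {scorer_name: weight}, same shape as config.reward_fn.
    scores:    {scorer_name: [one score per video]}.
    """
    if not reward_fn:
        raise ValueError("reward_fn must contain at least one scorer")
    missing = set(reward_fn) - set(scores)
    if missing:
        raise KeyError(f"no scores for: {sorted(missing)}")
    # Use the first configured scorer to fix the expected batch size.
    num_videos = len(scores[next(iter(reward_fn))])
    totals = [0.0] * num_videos
    for name, weight in reward_fn.items():
        if len(scores[name]) != num_videos:
            raise ValueError(f"{name}: expected {num_videos} scores")
        for i, value in enumerate(scores[name]):
            totals[i] += weight * value
    return totals
```

Under this reading, equal weights only balance the two rewards if the scorers produce values on comparable scales, so the weights may need tuning when mixing scorers with different ranges.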

Stage 2: RL Post-Training

Launch RL post-training of Causal Forcing 1.3B with the PhyMotion reward.

export GVHMR_ROOT=/path/to/GVHMR
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
  --config configs/nft_casual_forcing.py:casual_forcing_video_phymotion

--nproc_per_node: number of GPUs on a single node.
--config: a <file>:<entry> selector. The entry casual_forcing_video_phymotion uses the PhyMotion reward (see configs/nft_casual_forcing.py for other entries that mix in perceptual rewards).

Outputs are written to logs/nft/<base_model>/<run_name>_<timestamp>/:

checkpoints/checkpoint-<step>/lora/ — PEFT LoRA adapter (rank 256 on CausalWanAttentionBlock).
optimizer.pt, scaler.pt, and W&B / TensorBoard logs.
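
To locate the most recent adapter in a run directory with this layout, a small helper along the following lines may be handy. It is a sketch, not part of the repo: `latest_lora_dir` is a hypothetical name, and it assumes `<step>` is a plain integer.

```python
import re
from pathlib import Path

# Assumption: checkpoint folders are named checkpoint-<integer step>.
_STEP_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_lora_dir(run_dir):
    """Return <run_dir>/checkpoints/checkpoint-<max step>/lora, or None."""
    ckpt_root = Path(run_dir) / "checkpoints"
    if not ckpt_root.is_dir():
        return None
    best_step, best_dir = -1, None
    for child in ckpt_root.iterdir():
        match = _STEP_RE.match(child.name)
        # Only consider checkpoints that actually contain a lora/ folder.
        if match and (child / "lora").is_dir():
            step = int(match.group(1))
            if step > best_step:
                best_step, best_dir = step, child / "lora"
    return best_dir
```

Comparing steps as integers avoids the lexicographic trap where `checkpoint-30` would sort after `checkpoint-200`.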

Stage 3: Inference

Roll out a trained LoRA on a list of prompts.

# Using the released PhyMotion-CausalForcing-1.3B LoRA 
torchrun --nproc_per_node=1 scripts/inference_wan.py \
  --base_model checkpoints/casualforcing/chunkwise/causal_forcing.pt \
  --lora_path  checkpoints/phymotion-causalforcing \
  --prompt_file prompts/sample.txt \
  --output_dir outputs/test \
  --num_frames 45 --height 480 --width 832 \
  --guidance_scale 3.0 \
  --denoising_steps "1000,750,500,250" \
  --num_frame_per_block 3 \
  --mixed_precision bf16 --seed 42

To use your own freshly trained LoRA, point --lora_path at your checkpoint dir:

--lora_path  logs/nft/wan_casual_chunk/casual_forcing_video_phymotion_<TS>/checkpoints/checkpoint-<step>

--base_model: path to the Causal Forcing 1.3B checkpoint.
--lora_path: a checkpoint-<step>/ folder or its lora/ subdir.
--prompt_file: a one-prompt-per-line text file.
--output_dir: directory for the generated mp4s. Expect ~5 seconds per video on a single A100.

Hardware and Reference Runtimes

Our reported numbers were produced on:

Hardware: 1 node with 8× NVIDIA A100 80 GB
CUDA: 12.4; Python: 3.10; PyTorch: 2.6.0; flash-attn: 2.7.4.post1.

Approximate per-stage compute / wall-clock:

Stage	Hardware	Wall clock
Stage 1 (reward, 1 video)	1× A100 (GVHMR + MuJoCo)	~3 seconds per video
Stage 2 (RL post-training)	8× A100 80 GB	~60 hours for 330 steps (≈ 10 min/step at batch 8)
Stage 3 (inference, 45 frames @ 480×832)	1× A100 / RTX 4090	~5 seconds per video
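
For planning, the per-video figures above give a rough estimate of a full pass over the test prompts on one GPU. This is back-of-the-envelope arithmetic only: it assumes one video per prompt, sequential execution, and no model-loading or I/O overhead, none of which the table states.

```python
NUM_TEST_PROMPTS = 1123        # size of test.txt, from the asset table
GEN_SECONDS_PER_VIDEO = 5      # Stage 3 figure, approximate
REWARD_SECONDS_PER_VIDEO = 3   # Stage 1 figure, approximate


def estimate_hours(num_videos, seconds_per_video):
    """Sequential wall-clock estimate in hours."""
    return num_videos * seconds_per_video / 3600


gen_h = estimate_hours(NUM_TEST_PROMPTS, GEN_SECONDS_PER_VIDEO)
reward_h = estimate_hours(NUM_TEST_PROMPTS, REWARD_SECONDS_PER_VIDEO)
print(f"generation: {gen_h:.2f} h, reward: {reward_h:.2f} h, total: {gen_h + reward_h:.2f} h")
# prints: generation: 1.56 h, reward: 0.94 h, total: 2.50 h
```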

Acknowledgements

This codebase builds on several excellent open-source projects. We thank the authors and maintainers of Astrolabe for the RL / reward-training infrastructure, FastVideo for efficient video generation and training utilities, and GVHMR for human mesh recovery used in our 3D motion reward pipeline. Their publicly released code made this work possible.

Citation

If you find this work useful, please consider citing:

@article{huang2026phymotion,
  title={PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation},
  author={Huang, Yidong and Wang, Zun and Lin, Han and Kim, Dong-Ki and Omidshafiei, Shayegan and Yoon, Jaehong and Cho, Jaemin and Zhang, Yue and Bansal, Mohit},
  journal={arXiv preprint arXiv:2605.14269},
  year={2026}
}
