GitHub - facebookresearch/DuoMo: Body motion estimation from monocular videos via two stage diffusion (CVPR 2026).

DuoMo

Official implementation for the paper:
DuoMo: Dual Motion Diffusion for World-space Human Reconstruction
Yufu Wang, Evonne Ng, Soyong Shin, Rawal Khirodkar, Yuan Dong, Zhaoen Su, Jinhyung Park
Kris Kitani, Alexander Richard, Fabian Prada, Michael Zollhoefer
[Project Page] [Arxiv]

Installation

Clone the repository.

git clone https://github.com/facebookresearch/duomo.git
cd duomo

Set up the Python environment.

conda create -n duomo python=3.12
conda activate duomo

Install dependencies.

# Set CUDA_HOME
export CUDA_HOME=$CONDA_PREFIX

# Install CUDA toolkit
conda install -c nvidia cuda-toolkit=12.8

# Install standard packages (with PyTorch cu128 wheel specified)
pip install -r requirements.txt

# Compile and install custom GitHub packages
pip install "git+https://github.com/mattloper/chumpy@9b045ff5d6588a24a0bab52c83f032e2ba433e17" --no-build-isolation
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable" --no-build-isolation

Install third-party dependencies into third_party/.

# Pulling external repositories (e.g., GVHMR, PromptHMR)
bash scripts/install_third_party.sh
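
After the steps above, a quick sanity check can confirm that the compiled packages and third-party repositories are in place. The following is a minimal sketch, not part of the repository: the package names are inferred from the install commands, the directory names from the project structure in this README, and the helper functions `missing_packages` and `missing_third_party` are hypothetical.

```python
import importlib.util
from pathlib import Path

# Import names inferred from the install commands above (assumption).
REQUIRED_PACKAGES = ["torch", "chumpy", "pytorch3d"]
# Directory names taken from the project structure listed in this README.
REQUIRED_THIRD_PARTY = ["GVHMR", "PromptHMR"]


def missing_packages(names):
    """Return the top-level package names that cannot be located."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_third_party(repo_root, names):
    """Return the third_party/ subdirectories that do not exist under repo_root."""
    base = Path(repo_root) / "third_party"
    return [n for n in names if not (base / n).is_dir()]


if __name__ == "__main__":
    # Run from the repository root inside the activated conda environment.
    print("missing packages:", missing_packages(REQUIRED_PACKAGES))
    print("missing third_party dirs:", missing_third_party(".", REQUIRED_THIRD_PARTY))
```

Empty lists indicate that everything was found. The check only locates the packages; it does not verify that they were built against the intended CUDA version.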

Prepare data

Run the following commands to download all checkpoints and processed dataset features into data/. The second command will prompt you to register and log in to access SMPL.

# Checkpoints and annotations
bash scripts/download_data.sh

# SMPLX family models
bash scripts/download_smplx.sh

Demos

Demo for an example video with a static camera:

# By default it assumes static camera
python scripts/inference.py --video_path data/dance.mp4

By default, the demo assumes a static camera setup. While this release does not include a SLAM module, the inference pipeline supports moving cameras. If you have pre-computed camera motion (e.g., from SLAM or device sensors), you can provide it as input.

As an example, we sample a video from EMDB with precomputed camera poses from TRAM and ground truth bounding boxes. Please take a look at scripts/data_prep/create_emdb_example.py for the definition of the camera parameters.

# Get an EMDB video, save to data/
python scripts/data_prep/create_emdb_example.py --dataset_dir /your_emdb_dir

# Run inference on the video, with SLAM camera and GT boxes
python scripts/inference.py --video_path data/emdb_sample.mp4 --camera_param data/emdb_sample_cam.pt --boxes data/emdb_sample_boxes.pt
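
To run the demo over several videos, the two invocations above can be assembled programmatically. The following is a minimal sketch that uses only the three flags shown in this README (`--video_path`, `--camera_param`, `--boxes`); the helper `build_inference_cmd` is hypothetical and not part of the repository.

```python
import shlex
import sys


def build_inference_cmd(video_path, camera_param=None, boxes=None):
    """Assemble the argument list for scripts/inference.py.

    Only the flags shown in this README are used. Omitting camera_param
    leaves the script in its default static-camera mode.
    """
    cmd = [sys.executable, "scripts/inference.py", "--video_path", str(video_path)]
    if camera_param is not None:
        cmd += ["--camera_param", str(camera_param)]
    if boxes is not None:
        cmd += ["--boxes", str(boxes)]
    return cmd


if __name__ == "__main__":
    static_cmd = build_inference_cmd("data/dance.mp4")
    moving_cmd = build_inference_cmd(
        "data/emdb_sample.mp4",
        camera_param="data/emdb_sample_cam.pt",
        boxes="data/emdb_sample_boxes.pt",
    )
    # Print the commands; pass a list to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(static_cmd))
    print(shlex.join(moving_cmd))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.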

Output Coordinates: When camera extrinsics are provided, the final reconstructed motion will be aligned to the coordinate system of the first video frame (first camera pose as world origin).
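
To illustrate what "first camera pose as world origin" means, the sketch below re-expresses a sequence of camera poses relative to the first one. It assumes each pose is a 4x4 camera-to-world rigid transform stored as nested lists; this convention is an assumption for illustration only, and the actual format expected by the pipeline is defined in scripts/data_prep/create_emdb_example.py. All function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(t):
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    rot_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(rot_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [rot_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def align_to_first_camera(cam_to_world):
    """Re-express camera-to-world poses so the first camera is the world origin."""
    first_inv = invert_rigid(cam_to_world[0])
    return [matmul(first_inv, pose) for pose in cam_to_world]


if __name__ == "__main__":
    # Two poses sharing a 90-degree rotation about z, one unit apart along world y.
    pose0 = [[0.0, -1.0, 0.0, 1.0], [1.0, 0.0, 0.0, 2.0], [0.0, 0.0, 1.0, 3.0], [0.0, 0.0, 0.0, 1.0]]
    pose1 = [[0.0, -1.0, 0.0, 1.0], [1.0, 0.0, 0.0, 3.0], [0.0, 0.0, 1.0, 3.0], [0.0, 0.0, 0.0, 1.0]]
    aligned = align_to_first_camera([pose0, pose1])
    # The first pose becomes the identity; the second keeps only its relative motion.
    print(aligned[0])
    print(aligned[1])
```

After alignment the first pose is the identity, and every later pose describes motion relative to the first video frame, which is the frame in which the reconstructed motion is reported.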

Evaluation

We provide pre-computed features (dense keypoints, image features, etc.) under data/processed for evaluation. To complete the evaluation, however, you need to obtain the annotations from the official EMDB, RICH, and EgoBody websites. For EgoBody, only a subset is needed, and you can use scripts/download_egobody.py to download it. After obtaining the official annotations, combine them with our pre-computed features as follows. Please see scripts/data_prep/README.md for more details.

# EMDB
python scripts/data_prep/process_emdb.py --dataset_dir /your_emdb_dir

# EgoBody
python scripts/data_prep/process_egobody.py --dataset_dir /your_egobody_dir

After that, the dataset files in data/processed are updated. Use the following command to run the evaluation.

# Available datasets: EMDB, RICH, EgoBody
python scripts/evaluation.py --dataset emdb2

Training

We have included the full training pipeline for our models. However, we do not provide preprocessed dataset labels. To train the models from scratch, you will need to implement your own dataset preprocessing. We provide some example preprocessing scripts (e.g. scripts/data_prep/process_amass.py). Please refer to our data preprocessing and loading implementation to understand the expected structure. The following commands run the training loop.

# Training the camera-space motion diffusion model
sbatch scripts/train_stage1.sh

# Training the world-space motion diffusion model
sbatch scripts/train_stage2.sh

Project structure

DuoMo/
├── src/
│   ├── data/            # <-- dataset loading
│   ├── models/          # <-- architecture
│   ├── processors/      # <-- wrapper for third-party processes
│   ├── recipes/         # <-- configurations
│   ├── trainer/         # <-- training
│   ├── utils/
│   ├── vis/
│   ├── __init__.py  
│   ├── inference.py      # <-- pipeline
│   └── evaluation.py     # <-- evaluation
│ 
├── third_party/          (after installation)
│   ├── GVHMR/            # <-- for synthetic camera motion on AMASS
│   └── PromptHMR/        # <-- for image feature encoding
│ 
├── scripts/              # <-- demo, training and evaluation scripts
└── data/                 # <-- hold checkpoints

Citation

@article{wang2026duomo,
  title={DuoMo: Dual Motion Diffusion for World-space Human Reconstruction},
  author={Wang, Yufu and Ng, Evonne and Shin, Soyong and Khirodkar, Rawal and Dong, Yuan and Su, Zhaoen and Park, Jinhyung and Kitani, Kris and Richard, Alexander and Prada, Fabian and Zollhoefer, Michael},
  year={2026}
}

License

DuoMo is licensed under the XRCIA Noncommercial Research License Agreement. A copy of the license can be found in LICENSE.pdf.
