facebookresearch/DuoMo

DuoMo

Official implementation for the paper:
DuoMo: Dual Motion Diffusion for World-space Human Reconstruction
Yufu Wang, Evonne Ng, Soyong Shin, Rawal Khirodkar, Yuan Dong, Zhaoen Su, Jinhyung Park
Kris Kitani, Alexander Richard, Fabian Prada, Michael Zollhoefer
[Project Page] [Arxiv]

Installation

  1. Clone the repository.
git clone https://github.com/facebookresearch/duomo.git
cd duomo
  2. Set up the Python environment.
conda create -n duomo python=3.12
conda activate duomo
  3. Install dependencies.
# Set CUDA_HOME
export CUDA_HOME=$CONDA_PREFIX

# Install CUDA toolkit
conda install -c nvidia cuda-toolkit=12.8

# Install standard packages (with PyTorch cu128 wheel specified)
pip install -r requirements.txt

# Compile and install custom GitHub packages
pip install "git+https://github.com/mattloper/chumpy@9b045ff5d6588a24a0bab52c83f032e2ba433e17" --no-build-isolation
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable" --no-build-isolation
  4. Install third-party dependencies into third_party/.
# Pulling external repositories (e.g., GVHMR, PromptHMR)
bash scripts/install_third_party.sh

Prepare data

Run the following commands to download all checkpoints and processed dataset features into data/. The second command will prompt you to register and log in to access SMPL.

# Checkpoints and annotations
bash scripts/download_data.sh

# SMPLX family models
bash scripts/download_smplx.sh

Demos

Demo for an example video with a static camera:

# By default it assumes static camera
python scripts/inference.py --video_path data/dance.mp4

By default, the demo assumes a static camera setup. While this release does not include a SLAM module, the inference pipeline supports moving cameras. If you have pre-computed camera motion (e.g., from SLAM or device sensors), you can provide it as input.

As an example, we sample a video from EMDB with precomputed camera poses from TRAM and ground truth bounding boxes. Please take a look at scripts/data_prep/create_emdb_example.py for the definition of the camera parameters.

# Get an EMDB video, save to data/
python scripts/data_prep/create_emdb_example.py --dataset_dir /your_emdb_dir

# Run inference on the video, with SLAM camera and GT boxes
python scripts/inference.py --video_path data/emdb_sample.mp4 --camera_param data/emdb_sample_cam.pt --boxes data/emdb_sample_boxes.pt
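
When running many videos, some with camera motion and some without, it can help to assemble the command programmatically. The following is a minimal sketch, not part of the repository: the helper name build_inference_cmd is hypothetical, and it only uses the three flags shown above (--video_path, --camera_param, --boxes).

```python
import sys


def build_inference_cmd(video_path, camera_param=None, boxes=None):
    """Assemble the argument list for scripts/inference.py.

    Hypothetical helper: only the flags shown in this README are used.
    Omitting camera_param falls back to the default static-camera mode.
    """
    cmd = [sys.executable, "scripts/inference.py", "--video_path", video_path]
    if camera_param is not None:
        cmd += ["--camera_param", camera_param]
    if boxes is not None:
        cmd += ["--boxes", boxes]
    return cmd


if __name__ == "__main__":
    # Static camera (default) and moving camera with GT boxes.
    print(build_inference_cmd("data/dance.mp4"))
    print(build_inference_cmd(
        "data/emdb_sample.mp4",
        camera_param="data/emdb_sample_cam.pt",
        boxes="data/emdb_sample_boxes.pt",
    ))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root.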

Output Coordinates: When camera extrinsics are provided, the final reconstructed motion will be aligned to the coordinate system of the first video frame (first camera pose as world origin).
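
To illustrate what "first camera pose as world origin" means, here is a small sketch that re-expresses a sequence of camera poses relative to the first frame. It is an assumption-laden illustration rather than the repository's code: it assumes each pose is a 4x4 camera-to-world rigid transform stored as nested lists, and the function names are hypothetical. The actual camera parameter format is defined in scripts/data_prep/create_emdb_example.py.

```python
def invert_rigid(T):
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def align_to_first_frame(cam_to_world):
    """Left-multiply every pose by the inverse of the first pose.

    Assumes camera-to-world matrices (an assumption, not the repo's spec).
    After this, the first pose is the identity, i.e. the world origin.
    """
    T0_inv = invert_rigid(cam_to_world[0])
    return [matmul4(T0_inv, T) for T in cam_to_world]
```

With this convention, the first aligned pose is the identity, and every later pose describes the camera's motion as seen from the first frame.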

Evaluation

We provide pre-computed features (dense keypoints, image features, etc.) under data/processed for evaluation. However, to complete the evaluation, you need to obtain the annotations from the official EMDB, RICH, and EgoBody websites. For EgoBody, we only need a subset, and you can use scripts/download_egobody.py to download it. After obtaining the official annotations, combine them with our pre-computed features as follows. Please see scripts/data_prep/README.md for more details.

# EMDB
python scripts/data_prep/process_emdb.py --dataset_dir /your_emdb_dir

# Egobody
python scripts/data_prep/process_egobody.py --dataset_dir /your_egobody_dir

After that, the dataset files in data/processed are updated. Use the following for the actual evaluation.

# Available datasets: EMDB, RICH, EgoBody
python scripts/evaluation.py --dataset emdb2

Training

We have included the full training pipeline for our models. However, we do not provide preprocessed dataset labels. To train the models from scratch, you will need to implement your own dataset preprocessing. We provide some example preprocessing scripts (e.g. scripts/data_prep/process_amass.py). Please refer to our data preprocessing and loading implementation to understand the expected structure. The following commands run the training loop.

# Training the camera-space motion diffusion model
sbatch scripts/train_stage1.sh

# Training the world-space motion diffusion model
sbatch scripts/train_stage2.sh

Project structure

DuoMo/
├── src/
│   ├── data/            # <-- dataset loading
│   ├── models/          # <-- architecture
│   ├── processors/      # <-- wrapper for third-party processes
│   ├── recipes/         # <-- configurations
│   ├── trainer/         # <-- training
│   ├── utils/
│   ├── vis/
│   ├── __init__.py  
│   ├── inference.py      # <-- pipeline
│   └── evaluation.py     # <-- evaluation
│ 
├── third_party/          (after installation)
│   ├── GVHMR/            # <-- for synthetic camera motion on AMASS
│   └── PromptHMR/        # <-- for image feature encoding
│ 
├── scripts/              # <-- demo, training and evaluation scripts
└── data/                 # <-- hold checkpoints
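
As a quick sanity check after installation and data download, one could verify that the directories from the tree above exist. This is a minimal sketch under stated assumptions: the helper missing_paths and the EXPECTED list are hypothetical, and the list only names directories shown in the project structure.

```python
from pathlib import Path

# Directories taken from the project structure above (assumed to be
# relative to the repository root).
EXPECTED = ["third_party/GVHMR", "third_party/PromptHMR", "data"]


def missing_paths(repo_root, expected=EXPECTED):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

Run it from the repository root; an empty result suggests that scripts/install_third_party.sh and the download scripts have populated the expected folders.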

Citation

@article{wang2026duomo,
  title={DuoMo: Dual Motion Diffusion for World-space Human Reconstruction},
  author={Wang, Yufu and Ng, Evonne and Shin, Soyong and Khirodkar, Rawal and Dong, Yuan and Su, Zhaoen and Park, Jinhyung and Kitani, Kris and Richard, Alexander and Prada, Fabian and Zollhoefer, Michael},
  year={2026}
}

License

DuoMo is licensed under the XRCIA Noncommercial Research License Agreement. A copy of the license can be found here.

About

Body motion estimation from monocular videos via two stage diffusion (CVPR 2026).
