GitHub - eltociear/MagicDance: MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer

MagicDance: Realistic Human Dance Video Generation
with Motions & Facial Expressions Transfer

Di Chang¹ · Yichun Shi² · Quankai Gao¹ · Jessica Fu¹ · Hongyi Xu² ·
Guoxian Song² · Qing Yan² · Xiao Yang² · Mohammad Soleymani¹
¹University of Southern California ²ByteDance Inc.

News

[2024.02.15] Release training and inference code.
[2024.02.02] Release updated paper - MagicPose. The method and data are exactly the same.
[2023.11.18] Release MagicDance paper and project page.

Related Open-Source Works

DisCo, from Microsoft
MagicAnimate, from ByteDance - Singapore

Getting Started

For inference on the TikTok dataset or on your own images and poses, download our MagicDance checkpoint.

For appearance control pretraining, please download the pretrained model for StableDiffusion V1.5.

For appearance-disentangled pose control, please download the pretrained Appearance Control Model and the pretrained ControlNet OpenPose model.

The pre-processed TikTok dataset can be downloaded from here. OpenPose may fail to detect human pose skeletons for some images, so we filter out those failure cases and train our model on clean data.
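The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual preprocessing: it assumes each frame comes with a result dict in OpenPose's JSON output layout (a "people" list whose entries hold a flat "pose_keypoints_2d" list of x, y, confidence triples), and the function names and thresholds are hypothetical. The pre-processed TikTok dataset may store poses in a different form.

```python
def has_detected_pose(openpose_result, min_confident_keypoints=1, min_confidence=0.0):
    """Return True if an OpenPose-style result holds at least one usable skeleton.

    Assumed layout (OpenPose JSON output):
    {"people": [{"pose_keypoints_2d": [x0, y0, c0, x1, y1, c1, ...]}, ...]}
    """
    for person in openpose_result.get("people", []):
        keypoints = person.get("pose_keypoints_2d", [])
        confidences = keypoints[2::3]  # every third value is a confidence score
        confident = sum(1 for c in confidences if c > min_confidence)
        if confident >= min_confident_keypoints:
            return True
    return False


def filter_clean_frames(frames):
    """Keep only (frame_name, result) pairs in which a skeleton was detected."""
    return [(name, result) for name, result in frames if has_detected_pose(result)]
```

Raising the hypothetical `min_confident_keypoints` threshold would also drop frames where only a few joints were found, at the cost of discarding more data.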

Place the pretrained weights and dataset as follows:

MagicDance
|----TikTok-v4
|----pretrained_weights
  |----control_v11p_sd15_openpose.pth
  |----control_sd15_ini.ckpt
  |----model_state-110000.th
  |----model_state-10000.th  
|----...
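Before running anything, it can help to confirm that the files are where the scripts expect them. The sketch below only checks the paths listed in the layout above; the helper name `missing_entries` is made up for illustration, and it assumes the script is run from the MagicDance root.

```python
from pathlib import Path

# Paths copied from the layout above, relative to the MagicDance root.
EXPECTED = [
    "TikTok-v4",
    "pretrained_weights/control_v11p_sd15_openpose.pth",
    "pretrained_weights/control_sd15_ini.ckpt",
    "pretrained_weights/model_state-110000.th",
    "pretrained_weights/model_state-10000.th",
]


def missing_entries(repo_root):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected weights and data are in place.")
```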

Environment

The environment on my machine is python==3.9, pytorch==1.13.1, CUDA==11.7. You may use other versions of these prerequisites according to your local environment.

conda env create -f environment.yml
conda activate magicpose
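If you deviate from these versions, a quick comparison against the ones listed above can make later debugging easier. The following is a small sketch, not part of the repository: the helper names are hypothetical, and it only compares major.minor numbers, treating the versions above as a reference rather than a hard requirement.

```python
import sys

# Versions reported above; used only as a reference point.
REFERENCE = {"python": "3.9", "torch": "1.13.1", "cuda": "11.7"}


def major_minor(version):
    """Return the (major, minor) tuple of a version string such as '1.13.1+cu117'."""
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def compare_to_reference(found):
    """Map each detected component to True if its major.minor matches the reference."""
    return {
        name: major_minor(found[name]) == major_minor(ref)
        for name, ref in REFERENCE.items()
        if found.get(name)
    }


def detect_versions():
    """Collect the local Python, PyTorch, and CUDA versions where available."""
    found = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    try:
        import torch
        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        pass  # PyTorch is not installed in this environment
    return found


if __name__ == "__main__":
    for name, ok in compare_to_reference(detect_versions()).items():
        print(f"{name}: {'matches' if ok else 'differs from'} the reference version")
```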

Inference

Inference on the test set:

bash scripts/inference_tiktok_dataset.sh

We use exactly the same code as DisCo for metric evaluation. Some example outputs from our model are shown below:

Inference with specific image and pose sequence:

bash scripts/inference_any_image_pose.sh

We provide some images and poses in "example_data". You can easily run inference with your own image or pose sequence by replacing the arguments "local_cond_image_path" and "local_pose_path" in inference_any_image_pose.sh. Some interesting outputs from out-of-domain images are shown below:

Our model is also able to retarget the pose of images generated by a T2I (text-to-image) model.

Training

Appearance Control Pretraining:

bash scripts/appearance_control_pretraining.sh

Appearance-Disentangled Pose Control:

bash scripts/appearance_disentangle_pose_control.sh

Some tips

The task

In our experience with this project, motion retargeting is a data-hungry task. The generation result depends heavily on the training data, e.g. the quality of the pose tracker, the number of video sequences, and the number of frames per video. You may consider adopting DensePose as in MagicAnimate, DWPose as in Animate Anyone, or any other geometry control for better generation quality. We have also tried MMPose, which produced slightly better pose detection results. Introducing extra training data will yield better performance; consider using any other real-human half-body/full-body dataset, e.g. TaiChi/DeepFashion, for further finetuning.

The code

Most of the arguments are self-explanatory in the code. Several key arguments are explained below.

model_config: A relative or absolute path to the config file of your model architecture.
img_bin_limit: The maximum step for randomly selecting source and target images during training. During inference, the value is set to "all".
control_mode: This argument controls the Image-CFG during inference. "controlnet_important" means Image-CFG is used; "balance" means it is not.
wonoise: The reference image is fed into the appearance control model without adding noise.
with_text: When "with_text" is given, text is not used for training. (I know it's a bit confusing, lol)
finetune_control: Finetune the Appearance Control Model (and Pose ControlNet).
output_dir: A relative or absolute folder for writing checkpoints.
local_image_dir: A relative or absolute folder for writing image outputs.
image_pretrain_dir: A relative or absolute folder for loading the appearance control model checkpoint.
pose_pretrain_dir: A relative or absolute path to the pose ControlNet.
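To make "img_bin_limit" more concrete, here is one possible reading of it as a sketch. It assumes the limit is the maximum index distance between the source and target frames of the same video, and that "all" removes the limit. The function `sample_source_target` is hypothetical and is not taken from the training code, which may implement the sampling differently.

```python
import random


def sample_source_target(num_frames, img_bin_limit, rng=random):
    """Pick (source, target) frame indices from a single video.

    img_bin_limit is either an int (assumed maximum index distance between
    the two frames) or the string "all" (no distance limit).
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    source = rng.randrange(num_frames)
    if img_bin_limit == "all":
        low, high = 0, num_frames - 1
    else:
        limit = int(img_bin_limit)
        low = max(0, source - limit)
        high = min(num_frames - 1, source + limit)
    target = rng.randint(low, high)  # inclusive on both ends
    return source, target
```

Under this reading, a small limit pairs frames that are close in time, while "all" lets any frame of the video serve as the target. Note that this sketch allows the source and target to be the same frame.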

Citing

If you find our work useful, please consider citing:

@article{chang2023magicdance,
  title={MagicDance: Realistic Human Dance Video Generation with Motions \& Facial Expressions Transfer},
  author={Chang, Di and Shi, Yichun and Gao, Quankai and Fu, Jessica and Xu, Hongyi and Song, Guoxian and Yan, Qing and Yang, Xiao and Soleymani, Mohammad},
  journal={arXiv preprint arXiv:2311.12052},
  year={2023}
}

Acknowledgments

Our code builds on several excellent repositories. We thank their authors for making their code available to the public. We also thank Tan Wang for his help with our baseline comparison experiments.
