
MCLD

Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
Jiaqi Liu, Jichao Zhang, Paolo Rota, Nicu Sebe
Computer Vision and Pattern Recognition Conference (CVPR), 2025, Nashville, USA

(Figure: qualitative results)

Generated Results

You can directly download our test results from Google Drive (including the 256×176 and 512×352 resolutions on DeepFashion) for further comparison.

Dataset

  • Download img_highres.zip of the DeepFashion dataset from the In-shop Clothes Retrieval Benchmark.

  • Unzip img_highres.zip (you will need to request the password from the dataset maintainers) and put the extracted images under the ./dataset/fashion directory.

  • Preprocess the dataset by running prepare_dataset.py. This splits the dataset and prepares the needed conditions such as poses, texture maps, and face embeddings. You need to pip install detectron2 for DensePose. The whole preprocessing takes about 8 hours. You can also download our processed conditions from Google Drive and unzip them.

  • After the preprocessing, you should have your dataset folder organized as follows:

./dataset/fashion/
|-- train
|-- train_densepose
|-- train_texture
|-- train_face
|-- test
|-- test_densepose
|-- test_texture
|-- test_face
|-- MEN
|-- WOMEN
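
If you want a quick sanity check before training, the short sketch below (which only assumes the folder names shown in the tree above) verifies that the preprocessed layout is in place:

from pathlib import Path

root = Path("./dataset/fashion")
expected = [
    "train", "train_densepose", "train_texture", "train_face",
    "test", "test_densepose", "test_texture", "test_face",
    "MEN", "WOMEN",
]
# report any condition folder that prepare_dataset.py did not produce
missing = [name for name in expected if not (root / name).is_dir()]
print("All dataset folders present." if not missing else f"Missing folders: {missing}")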

Preparation

Install Environment

pip install -r requirements.txt

Download Pretrained Models

  1. Download the pretrained weights of the base models and other components and put them under ./pretrained_weights:

  2. Download our trained checkpoints from Google Drive / HF hub and put them in the ./checkpoints folder.

Finally, your pretrained weights should be organized as follows:

./pretrained_weights/
|-- model_final_844d15.pkl
|-- control_v11p_sd15_seg
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml
./checkpoints/
|-- denoising_unet.pth
|-- image_projector.pth
|-- pose_guider.pth
`-- reference_unet.pth
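
As a convenience, the Hugging Face-hosted components can be fetched with huggingface_hub. The snippet below is a sketch, and the repo ids are our assumptions about the upstream sources rather than something this repository specifies:

from huggingface_hub import snapshot_download

# assumed upstream sources for two of the components listed above
snapshot_download("stabilityai/sd-vae-ft-mse", local_dir="./pretrained_weights/sd-vae-ft-mse")
snapshot_download("lllyasviel/control_v11p_sd15_seg", local_dir="./pretrained_weights/control_v11p_sd15_seg")
# stable-diffusion-v1-5, the CLIP image_encoder, and the detectron2 checkpoint
# model_final_844d15.pkl (used for the DensePose step) must be downloaded from their own sources.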

Method

The overall pipeline of our proposed Multi-focal Conditioned Diffusion Model: (a) face regions and appearance regions are first extracted from the source person image; (b) the multi-focal condition aggregation module $\phi$ fuses the focal embeddings into $c_{emb}$; (c) ReferenceNet $\mathcal{R}$ aggregates information from the appearance texture map, denoted $c_{ref}$; (d) DensePose provides the pose control, which is fused with the noise into the UNet by the Pose Guider.
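
For intuition only, the following PyTorch sketch shows one plausible way such a multi-focal aggregation $\phi$ could be wired (a face query cross-attending over appearance tokens); the layer sizes and the fusion mechanism are illustrative assumptions, not the released implementation:

import torch
import torch.nn as nn

class MultiFocalAggregator(nn.Module):
    """Illustrative fusion of focal embeddings (face, appearance) into c_emb."""
    def __init__(self, dim_face=512, dim_app=768, dim_out=768):
        super().__init__()
        self.proj_face = nn.Linear(dim_face, dim_out)  # hypothetical face-embedding projection
        self.proj_app = nn.Linear(dim_app, dim_out)    # hypothetical appearance-token projection
        self.fuse = nn.MultiheadAttention(dim_out, num_heads=8, batch_first=True)

    def forward(self, face_emb, app_tokens):
        # face_emb: (B, dim_face); app_tokens: (B, N, dim_app)
        q = self.proj_face(face_emb).unsqueeze(1)      # (B, 1, dim_out) query from the face focal
        kv = self.proj_app(app_tokens)                 # (B, N, dim_out) keys/values from appearance
        c_emb, _ = self.fuse(q, kv, kv)                # cross-attention produces the fused condition
        return c_emb                                   # (B, 1, dim_out)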

Training

This code supports multi-GPU training with accelerate. Full training takes about 26 hours on 2 A100-80G GPUs with a batch size of 12 on the DeepFashion dataset.

accelerate launch --main_process_port 12148 train.py --config ./configs/train/train.yaml

Validation

To test our method on the whole DeepFashion dataset, run:

python test.py --save_path FOLDER_TO_SAVE --ckpt_dir ./checkpoints/ --config_path ./configs/train/train.yaml

Then, the results can be evaluated by:

python evaluate.py --save_path FOLDER_TO_SAVE --gt_folder FOLDER_FOR_GT --training_path ./dataset/fashion/train/
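
For an independent spot check of the generated folder, a minimal LPIPS sketch is shown below; it assumes the lpips and Pillow packages and paired filenames between the two folders, and it is not necessarily what evaluate.py computes:

import lpips
import numpy as np
import torch
from pathlib import Path
from PIL import Image

loss_fn = lpips.LPIPS(net="alex")

def to_tensor(path):
    # load an RGB image and scale to [-1, 1], the range LPIPS expects
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 127.5 - 1.0
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)

gen_dir, gt_dir = Path("FOLDER_TO_SAVE"), Path("FOLDER_FOR_GT")
scores = [loss_fn(to_tensor(p), to_tensor(gt_dir / p.name)).item()
          for p in sorted(gen_dir.glob("*.png")) if (gt_dir / p.name).exists()]
print(f"Mean LPIPS over {len(scores)} pairs: {sum(scores) / max(len(scores), 1):.4f}")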

Editing

MCLD allows flexible editing since it decomposes human appearance and identity. We will release the editing code as soon as it is ready.

(Figure: editing examples)

Citation

@misc{liu2025multifocalconditionedlatentdiffusion,
      title={Multi-focal Conditioned Latent Diffusion for Person Image Synthesis}, 
      author={Jiaqi Liu and Jichao Zhang and Paolo Rota and Nicu Sebe},
      year={2025},
      eprint={2503.15686},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.15686}, 
}
