
DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models 🔥

  • DiffBlender successfully synthesizes complex combinations of input modalities, enabling flexible manipulation of conditions and customized generation aligned with user preferences.
  • Its structure is designed to extend intuitively to additional modalities while keeping training cost low through partial updates of hypernetworks.

[teaser figure]

🗓️ TODOs

  • Project page is open: link
  • DiffBlender model: code & checkpoint
  • Release inference code
  • Release training code & pipeline
  • Gradio UI

🚀 Getting Started

Install the necessary packages with:

$ pip install -r requirements.txt

Download the DiffBlender model checkpoint from this Hugging Face model, and place it under ./diffblender_checkpoints/.
Also, prepare the Stable Diffusion checkpoint from this link (we used CompVis/sd-v1-4.ckpt).
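
If you prefer to fetch the checkpoint programmatically, the huggingface_hub client can place it in the expected directory. A minimal sketch, assuming hypothetical repo and file names (substitute the actual ones from the model page):

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# NOTE: repo_id and filename are placeholders -- replace them with the
# actual values shown on the DiffBlender model page on Hugging Face.
ckpt_path = hf_hub_download(
    repo_id="sungnyun/diffblender",         # hypothetical repo id
    filename="diffblender.pth",             # hypothetical checkpoint name
    local_dir="./diffblender_checkpoints",  # directory inference.py expects
)
print(f"checkpoint saved to {ckpt_path}")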

⚡️ Try Multimodal T2I Generation with DiffBlender

$ python inference.py --ckpt_path=./diffblender_checkpoints/{CKPT_NAME}.pth \
                      --official_ckpt_path=/path/to/sd-v1-4.ckpt \
                      --save_name={SAVE_NAME} 

Results will be saved under ./inference/{SAVE_NAME}/, formatted as {conditions + generated image}.
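
To run inference over several checkpoints in one go, a small driver script can shell out to inference.py with the flags shown above. A minimal sketch, assuming the checkpoint layout described in Getting Started; the glob pattern and save names are illustrative:

import subprocess
from pathlib import Path

SD_CKPT = "/path/to/sd-v1-4.ckpt"  # the SD v1.4 checkpoint prepared above

# Run inference once per downloaded DiffBlender checkpoint.
for ckpt in Path("./diffblender_checkpoints").glob("*.pth"):
    save_name = ckpt.stem  # e.g. results land in ./inference/{save_name}/
    subprocess.run(
        [
            "python", "inference.py",
            f"--ckpt_path={ckpt}",
            f"--official_ckpt_path={SD_CKPT}",
            f"--save_name={save_name}",
        ],
        check=True,  # stop on the first failing run
    )
    print(f"results written to ./inference/{save_name}/")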

BibTeX

@article{kim2023diffblender,
  title={DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models},
  author={Kim, Sungnyun and Lee, Junsoo and Hong, Kibeom and Kim, Daesik and Ahn, Namhyuk},
  journal={arXiv preprint arXiv:2305.15194},
  year={2023}
}
