[Versatile Diffusion] Add versatile diffusion model #1283
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
src/diffusers/pipelines/versatile_diffusion/pipeline_versatile_diffusion.py
Super nice that everything works and that we need no changes to the UNet2DCondition! Not super happy about the context manager 😅 Could we maybe do a different design here? It's quite difficult to understand.
Super nice progress here and great that you made it work! Think I understand the difficulties with the architecture a bit more now - should we maybe add all the existing functionality first and then discuss API / design a bit more?
Think we should also make multiple pipelines, no?
- All-in-one pipeline (this one will be heavy, as it'll have both unets loaded in memory)
- To-Image pipeline (this one will only have image_unet + text_cross_att & image_cross_att)
- To-Text pipeline (this one will only have text_unet + text_cross_att & image_cross_att)
- 4 very light pipelines (text2img, img2img, img2text, text2text)? (maybe we don't need to add those if memory is low enough in the "dual" pipelines)
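To make the proposed split concrete, here is a minimal sketch of how the first three variants could pick their parts from one shared set of loaded components. The component names come from the list above; the `PIPELINE_COMPONENTS` table and the `build_pipeline` helper are hypothetical, for illustration only, and are not part of the diffusers API. The four light pipelines are left out because the list does not say which components they would need.

```python
# Hypothetical mapping from pipeline variant to the components it keeps in memory.
# Names mirror the list above; this is not the actual diffusers API.
PIPELINE_COMPONENTS = {
    "all_in_one": {"image_unet", "text_unet", "text_cross_att", "image_cross_att"},
    "to_image": {"image_unet", "text_cross_att", "image_cross_att"},
    "to_text": {"text_unet", "text_cross_att", "image_cross_att"},
}


def build_pipeline(kind, available):
    """Return only the components that the requested variant needs.

    `available` maps component names to already-loaded objects, so variants
    share the same objects instead of loading duplicates.
    """
    needed = PIPELINE_COMPONENTS[kind]
    missing = needed - set(available)
    if missing:
        raise ValueError(f"missing components for {kind!r}: {sorted(missing)}")
    return {name: available[name] for name in sorted(needed)}


if __name__ == "__main__":
    # Stand-in objects; in practice these would be the loaded modules.
    loaded = {name: object() for name in PIPELINE_COMPONENTS["all_in_one"]}
    print(sorted(build_pipeline("to_image", loaded)))
```

With this layout the To-Image variant never touches `text_unet`, which is the memory saving the list is after.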
…add_versatile_diffusers
…d_versatile_diffusers
Added the "GPT2 optimus" model. It expects latent diffusion outputs and should work as follows:
#!/usr/bin/env python3
import torch
from diffusers.pipelines.versatile_diffusion import GPT2OptimusForLatentConnector
from transformers import GPT2Tokenizer
model = GPT2OptimusForLatentConnector.from_pretrained("fusing/gpt2_optimus")
tokenizer = GPT2Tokenizer.from_pretrained("fusing/gpt2_optimus")
latent_output_of_unet = ...  # placeholder: get tensor from unet
output = model.generate(bos_token_id=tokenizer.bos_token_id, past=latent_output_of_unet)
Haven't tested it end to end, as I think we first need to wait for the text unet, but more than happy to debug further when the text unet is ready #784
…ace/diffusers into add_versatile_diffusers
library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

Parameters:
vqvae ([`VQModel`]):
Hi @patrickvonplaten, it seems that these docs still need to be updated 🙏
Good point! Would you like to open a PR? :-)
* up * convert dual unet * revert dual attn * adapt for vd-official * test the full pipeline * mixed inference * mixed inference for text2img * add image prompting * fix clip norm * split text2img and img2img * fix format * refactor text2img * mega pipeline * add optimus * refactor image var * wip text_unet * text unet end to end * update tests * reshape * fix image to text * add some first docs * dual guided pipeline * fix token ratio * propose change * dual transformer as a native module * DualTransformer(nn.Module) * DualTransformer(nn.Module) * correct unconditional image * save-load with mega pipeline * remove image to text * up * uP * fix * up * final fix * remove_unused_weights * test updates * save progress * uP * fix dual prompts * some fixes * finish * style * finish renaming * up * fix * fix * fix * finish Co-authored-by: anton-l <anton@huggingface.co>
Add model from https://github.com/SHI-Labs/Versatile-Diffusion