[Versatile Diffusion] Add versatile diffusion model #1283

Merged
merged 62 commits into from Nov 23, 2022

patrickvonplaten

@HuggingFaceDocBuilder

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@patrickvonplaten patrickvonplaten left a comment

Super nice that everything works and that we need no changes to the UNet2DCondition! Not super happy about the context manager 😅 Could we maybe do a different design here? It's quite difficult to understand.

@patrickvonplaten patrickvonplaten left a comment

Super nice progress here and great that you made it work! Think I understand the difficulties with the architecture a bit more now - should we maybe add all the existing functionality first and then discuss API / design a bit more?

Think we should also make multiple pipelines, no?

  • All-in-one pipeline (this one will be heavy, as it'll have both unets loaded in memory)
  • To-Image pipeline (this one will only have image_unet + text_cross_att & image_cross_att)
  • To-Text pipeline (this one will only have text_unet + text_cross_att & image_cross_att)
  • 4 very light pipelines (text2img, img2img, img2text, text2text)? (Maybe we don't need to add those if memory is low enough in the "dual" pipelines.)
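
As a rough illustration of the split proposed above, here is a minimal sketch of which components each pipeline would keep in memory. The component names come from the comment; the dictionary, the helper functions, and the assumption that the all-in-one pipeline also carries both cross-attention modules are hypothetical and are not part of the diffusers API.

```python
# Hypothetical mapping of proposed pipelines to the components they load.
# Names follow the review comment; this is NOT the diffusers API.
PIPELINE_COMPONENTS = {
    # Assumption: the all-in-one pipeline holds both unets plus both cross-attention modules.
    "all_in_one": {"image_unet", "text_unet", "text_cross_att", "image_cross_att"},
    "to_image": {"image_unet", "text_cross_att", "image_cross_att"},
    "to_text": {"text_unet", "text_cross_att", "image_cross_att"},
}


def shared_components(first: str, second: str) -> set:
    """Components that both pipelines need."""
    return PIPELINE_COMPONENTS[first] & PIPELINE_COMPONENTS[second]


def extra_components(heavy: str, light: str) -> set:
    """Components the heavier pipeline loads that the lighter one can skip."""
    return PIPELINE_COMPONENTS[heavy] - PIPELINE_COMPONENTS[light]


if __name__ == "__main__":
    print(sorted(shared_components("to_image", "to_text")))
    print(sorted(extra_components("all_in_one", "to_image")))
```

Under these assumptions, each "dual" pipeline drops exactly one unet compared with the all-in-one pipeline, which is the memory saving the comment is after.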

@patrickvonplaten

Added the "GPT2 optimus"

It expects latent diffusion outputs and should work as follows:

#!/usr/bin/env python3
import torch
from diffusers.pipelines.versatile_diffusion import GPT2OptimusForLatentConnector
from transformers import GPT2Tokenizer

model = GPT2OptimusForLatentConnector.from_pretrained("fusing/gpt2_optimus")
tokenizer = GPT2Tokenizer.from_pretrained("fusing/gpt2_optimus")

latent_output_of_unet = ...  # placeholder: get tensor from unet

output = model.generate(bos_token_id=tokenizer.bos_token_id, past=latent_output_of_unet)

Haven't tested it end to end as I think we first need to wait for the text unet, but more than happy to debug further when the text unet is ready #784

@patrickvonplaten patrickvonplaten merged commit 2625fb5 into main Nov 23, 2022
@patrickvonplaten patrickvonplaten deleted the add_versatile_diffusers branch November 23, 2022 18:03
library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

Parameters:
vqvae ([`VQModel`]):

Hi @patrickvonplaten, it seems that these docs still need to be updated 🙏

Good point! Would you like to open a PR? :-)

sliard pushed a commit to sliard/diffusers that referenced this pull request Dec 21, 2022
* up

* convert dual unet

* revert dual attn

* adapt for vd-official

* test the full pipeline

* mixed inference

* mixed inference for text2img

* add image prompting

* fix clip norm

* split text2img and img2img

* fix format

* refactor text2img

* mega pipeline

* add optimus

* refactor image var

* wip text_unet

* text unet end to end

* update tests

* reshape

* fix image to text

* add some first docs

* dual guided pipeline

* fix token ratio

* propose change

* dual transformer as a native module

* DualTransformer(nn.Module)

* DualTransformer(nn.Module)

* correct unconditional image

* save-load with mega pipeline

* remove image to text

* up

* uP

* fix

* up

* final fix

* remove_unused_weights

* test updates

* save progress

* uP

* fix dual prompts

* some fixes

* finish

* style

* finish renaming

* up

* fix

* fix

* fix

* finish

Co-authored-by: anton-l <anton@huggingface.co>
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
4 participants