
unCLIP image variation #1781

Merged

Conversation

@williamberman (Contributor) commented Dec 20, 2022

Adds an unCLIP image variation pipeline.

**Converting the text-to-image pipeline to image variation**

I uploaded the converted pipeline to https://huggingface.co/fusing/karlo-image-variations-diffusers if you want to skip this step.

From the diffusers root directory:

```shell
$ python scripts/convert_unclip_txt2img_to_image_variation.py --dump_path <path to save model>
```

**Using the model**

```python
import os

# Must be set before cuBLAS initializes so that matmuls are deterministic
# when torch.use_deterministic_algorithms(True) is enabled.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import random

import numpy as np
import PIL
from PIL import Image
import torch

from diffusers import UnCLIPImageVariationPipeline


def set_seed(seed: int):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


set_seed(0)

torch.backends.cuda.matmul.allow_tf32 = False
torch.use_deterministic_algorithms(True)
torch.set_printoptions(precision=40)


def image_grid(imgs, rows, cols):
    # Paste the images into a single rows x cols contact sheet.
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid


pipe = UnCLIPImageVariationPipeline.from_pretrained("fusing/karlo-image-variations-diffusers")
pipe = pipe.to("cuda")

# See image below to use as input
image = PIL.Image.open("./test.jpg")

images = pipe(image, num_images_per_prompt=4).images

image_grid(images, 1, 4).save("./out.jpg")
```
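The `image_grid` helper above is pure PIL, so it can be sanity-checked without a GPU or the model. A minimal sketch using solid-color tiles (the tile size and colors are arbitrary choices for illustration):

```python
from PIL import Image


def image_grid(imgs, rows, cols):
    # Same helper as in the snippet above: paste images into a rows x cols sheet.
    assert len(imgs) == rows * cols
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid


# Four 64x64 solid-color tiles laid out in one row -> a 256x64 sheet.
tiles = [Image.new("RGB", (64, 64), c) for c in ("red", "green", "blue", "white")]
grid = image_grid(tiles, rows=1, cols=4)
print(grid.size)  # (256, 64)
```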

[input image: test.jpg]

[output grid: out.jpg]

@HuggingFaceDocBuilderDev commented Dec 20, 2022

The documentation is not available anymore as the PR was closed or merged.

@williamberman force-pushed the unclip_image_variation branch 4 times, most recently from 241d482 to 54176fc (December 21, 2022 02:46)
@williamberman marked this pull request as ready for review December 21, 2022 03:14
@williamberman changed the title from "Unclip image variation" to "unCLIP image variation" on Dec 21, 2022
@patil-suraj (Contributor): Looks good to me, thanks a lot for adding the pipeline! Would be nice to add `# Copied from ...` comments wherever possible.

@pcuenca (Member): Awesome!

@@ -0,0 +1,454 @@
# Copyright 2022 Kakao Brain and The HuggingFace Team. All rights reserved.

Member: Is this so? Or is it just Hugging Face for the code? Just wondering, no idea how those things work!

@williamberman (Author): This is the licensing @patrickvonplaten added to the text-to-image pipeline. We should probably clarify with him :)

Contributor: Think it's fine to mention Kakao Brain, since we use their code as a reference when implementing it here.

tests/pipelines/unclip/test_unclip.py (review thread resolved)
Comment on lines +1 to +32:

```python
import argparse

from diffusers import UnCLIPImageVariationPipeline, UnCLIPPipeline
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection


if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    parser.add_argument("--dump_path", default=None, type=str, required=True, help="Path to the output model.")

    parser.add_argument(
        "--txt2img_unclip",
        default="kakaobrain/karlo-v1-alpha",
        type=str,
        required=False,
        help="The pretrained txt2img unclip.",
    )

    args = parser.parse_args()

    txt2img = UnCLIPPipeline.from_pretrained(args.txt2img_unclip)

    feature_extractor = CLIPImageProcessor()
    image_encoder = CLIPVisionModelWithProjection.from_pretrained("openai/clip-vit-large-patch14")

    img2img = UnCLIPImageVariationPipeline(
        decoder=txt2img.decoder,
        text_encoder=txt2img.text_encoder,
        tokenizer=txt2img.tokenizer,
        text_proj=txt2img.text_proj,
        feature_extractor=feature_extractor,
```
Member: This is very informative, but I'm not sure we store this kind of script in the repo. The ones in the folder are usually about converting weights from other checkpoints. What do you think @patil-suraj?

@williamberman (Author): Happy to remove and just put it in the PR description! lmk @patil-suraj

Contributor: Yeah, I think there's no need to have this script.

Contributor: This is better than not having a script at all, and I think it's totally fine to leave it here as is. The main purpose of these scripts is so that users can convert the checkpoints themselves. Converting directly from the original checkpoint would be better, but this is fine as well, and definitely better than not having anything.

@patil-suraj (Contributor):
LGTM!
Also, think it would be nice to add a doc page explaining the unCLIP pipelines. It's the first cascaded pipeline in diffusers, so would be nice to document the different components and how they work.
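Since unCLIP is the first cascaded pipeline in diffusers, a rough sketch of the cascade may help orient such a doc page. The stage structure below follows the unCLIP/karlo design (prior → decoder → super-resolution); the functions are placeholder stand-ins, not the real diffusers API:

```python
# Placeholder cascade: each stage is a stand-in function, not a real model.

def prior(text_embedding):
    # Text embedding -> CLIP image embedding. This is the stage the image
    # variation pipeline replaces: it encodes the input image with a CLIP
    # image encoder instead of sampling an embedding from text.
    return [v + 0.5 for v in text_embedding]


def decoder(image_embedding):
    # Image embedding -> low-resolution image (64x64 in karlo).
    return {"cond": image_embedding, "resolution": 64}


def super_res(low_res):
    # Low-resolution image -> upsampled output (256x256 in karlo).
    return {**low_res, "resolution": 256}


out = super_res(decoder(prior([0.0, 1.0])))
print(out["resolution"])  # 256
```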


@patrickvonplaten (Contributor): Nice, looks good to me!

@patrickvonplaten patrickvonplaten merged commit 53c8147 into huggingface:main Dec 28, 2022
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
* unCLIP image variation

* remove prior comment re: @pcuenca

* stable diffusion -> unCLIP re: @pcuenca

* add copy froms re: @patil-suraj