Stable Diffusion Unclip Broken #2831

@nbardy

Describe the bug

StableUnCLIPPipeline doesn't work with default values: the checkpoint is missing the prior models. I'm trying to supply them myself (script below) and even converted a checkpoint model, but I can't seem to get it working yet.
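To narrow down which pieces are missing, a minimal sketch like the one below can compare a repo's `model_index.json` against the components the pipeline expects. The helper name `missing_components` is hypothetical, the component list is my assumption based on the `StableUnCLIPPipeline` constructor (check it against your installed diffusers), and `example_index` is illustrative data, not the actual contents of any repo.

```python
import json

# Assumption: component names taken from the StableUnCLIPPipeline constructor.
# Verify against the diffusers version you have installed.
EXPECTED_COMPONENTS = [
    "prior_tokenizer",
    "prior_text_encoder",
    "prior",
    "prior_scheduler",
    "image_normalizer",
    "image_noising_scheduler",
    "tokenizer",
    "text_encoder",
    "unet",
    "scheduler",
    "vae",
]


def missing_components(model_index, expected=EXPECTED_COMPONENTS):
    """Return the expected component names that are absent or null in model_index."""
    return [name for name in expected if model_index.get(name) is None]


# Illustrative model_index.json contents (hypothetical, not copied from a real repo).
example_index = json.loads("""
{
  "_class_name": "StableUnCLIPPipeline",
  "image_normalizer": ["stable_diffusion", "StableUnCLIPImageNormalizer"],
  "image_noising_scheduler": ["diffusers", "DDPMScheduler"],
  "tokenizer": ["transformers", "CLIPTokenizer"],
  "text_encoder": ["transformers", "CLIPTextModel"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "scheduler": ["diffusers", "PNDMScheduler"],
  "vae": ["diffusers", "AutoencoderKL"]
}
""")

print(missing_components(example_index))
```

To run this against a real checkpoint, load that repo's `model_index.json` into a dict and pass it in place of `example_index`. Any names it returns are the components that would have to be passed to `from_pretrained` by hand.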

Reproduction

!pip install git+https://github.com/huggingface/diffusers@main transformers accelerate scipy safetensors xformers

import requests
import torch
from PIL import Image
from io import BytesIO
from diffusers import UnCLIPScheduler, DDPMScheduler
from diffusers.models import PriorTransformer
from transformers import CLIPTokenizer, CLIPTextModelWithProjection
from diffusers import StableUnCLIPPipeline, UNet2DConditionModel

karlo_model = "kakaobrain/karlo-v1-alpha"
prior = PriorTransformer.from_pretrained(karlo_model, subfolder="prior")

clip_name = "openai/clip-vit-large-patch14"
# clip_name = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
prior_tokenizer = CLIPTokenizer.from_pretrained(clip_name)
prior_text_model = CLIPTextModelWithProjection.from_pretrained(clip_name)

prior_scheduler = UnCLIPScheduler.from_pretrained(karlo_model, subfolder="prior_scheduler")
prior_scheduler = DDPMScheduler.from_config(prior_scheduler.config)

unet = UNet2DConditionModel.from_pretrained("Nbardy/stable-diffusion-unclip-diffusers", subfolder="unet")

# Build the StableUnCLIP text-to-image pipeline, passing the prior components explicitly
pipe = StableUnCLIPPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip",
    # revision="sd21-unclip-l.ckpt",
    torch_dtype=torch.float16,
    variant="fp16",  # was `variation="fp16"`; `variant` is the from_pretrained keyword
    unet=unet,
    prior_tokenizer=prior_tokenizer,
    prior_text_encoder=prior_text_model,
    prior=prior,
    prior_scheduler=prior_scheduler,
)

pipe = pipe.to('cuda')
wave_prompt = "dramatic wave, the Oceans roar, Strong wave spiral across the oceans as the waves unfurl into roaring crests; perfect wave form; perfect wave shape; dramatic wave shape; wave shape unbelievable; wave; wave shape spectacular"
negative_prompt = "((disfigured)), ((bad art)), ((deformed)),((extra limbs)),((close up)),((b&w)), wierd colors, blurry, (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), Photoshop, video game, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad art, bad anatomy, 3d render, bad-artist bad_prompt_version2"
# Generate an image from the prompt (negative_prompt above is defined but not passed)
images = pipe(prompt=wave_prompt).images
images[0].save("tarsila_variation.png")
display(images[0])

Logs

No response

System Info

My setup is a Colab notebook with these packages installed:

!pip install git+https://github.com/huggingface/diffusers@main transformers accelerate scipy safetensors xformers

Notebook: https://colab.research.google.com/drive/1y7som7KnaTOWuXAWYDIkCTVSm2otX9_R?usp=sharing

Labels: bug
