diff --git a/docs/source/en/api/pipelines/stable_unclip.mdx b/docs/source/en/api/pipelines/stable_unclip.mdx index 372242ae2dff..ee359d0ba486 100644 --- a/docs/source/en/api/pipelines/stable_unclip.mdx +++ b/docs/source/en/api/pipelines/stable_unclip.mdx @@ -32,12 +32,50 @@ we do not add any additional noise to the image embeddings i.e. `noise_level = 0 * [stabilityai/stable-diffusion-2-1-unclip](https://hf.co/stabilityai/stable-diffusion-2-1-unclip) * [stabilityai/stable-diffusion-2-1-unclip-small](https://hf.co/stabilityai/stable-diffusion-2-1-unclip-small) * Text-to-image - * Coming soon! + * [stabilityai/stable-diffusion-2-1-unclip-small](https://hf.co/stabilityai/stable-diffusion-2-1-unclip-small) ### Text-to-Image Generation +Stable unCLIP can be leveraged for text-to-image generation by pipelining it with the prior model of KakaoBrain's open source DALL-E 2 replication [Karlo](https://huggingface.co/kakaobrain/karlo-v1-alpha) + +```python +import torch +from diffusers import UnCLIPScheduler, DDPMScheduler, StableUnCLIPPipeline +from diffusers.models import PriorTransformer +from transformers import CLIPTokenizer, CLIPTextModelWithProjection + +prior_model_id = "kakaobrain/karlo-v1-alpha" +data_type = torch.float16 +prior = PriorTransformer.from_pretrained(prior_model_id, subfolder="prior", torch_dtype=data_type) + +prior_text_model_id = "openai/clip-vit-large-patch14" +prior_tokenizer = CLIPTokenizer.from_pretrained(prior_text_model_id) +prior_text_model = CLIPTextModelWithProjection.from_pretrained(prior_text_model_id, torch_dtype=data_type) +prior_scheduler = UnCLIPScheduler.from_pretrained(prior_model_id, subfolder="prior_scheduler") +prior_scheduler = DDPMScheduler.from_config(prior_scheduler.config) + +stable_unclip_model_id = "stabilityai/stable-diffusion-2-1-unclip-small" + +pipe = StableUnCLIPPipeline.from_pretrained( + stable_unclip_model_id, + torch_dtype=data_type, + variant="fp16", + prior_tokenizer=prior_tokenizer, + prior_text_encoder=prior_text_model, + prior=prior, + prior_scheduler=prior_scheduler, +) + +pipe = pipe.to("cuda") +wave_prompt = "dramatic wave, the Oceans roar, Strong wave spiral across the oceans as the waves unfurl into roaring crests; perfect wave form; perfect wave shape; dramatic wave shape; wave shape unbelievable; wave; wave shape spectacular" + +images = pipe(prompt=wave_prompt).images +images[0].save("waves.png") +``` + -Coming soon! +For text-to-image we use `stabilityai/stable-diffusion-2-1-unclip-small` as it was trained on CLIP ViT-L/14 embedding, the same as the Karlo model prior. [stabilityai/stable-diffusion-2-1-unclip](https://hf.co/stabilityai/stable-diffusion-2-1-unclip) was trained on OpenCLIP ViT-H, so we don't recommend its use. + ### Text guided Image-to-Image Variation