# Stable Diffusion demo

This notebook is adapted from [this repository](https://github.com/woctezuma/stable-diffusion-colab), which is covered by the [MIT License](https://github.com/woctezuma/stable-diffusion-colab/blob/main/LICENSE). It lets you run the Stable Diffusion model *directly*, without going through a hosted service that "hides" the model, such as those offered by OpenAI.

## Installation

First, we install a few libraries, notably `transformers` and `diffusers`:

In [2]:
%pip install --quiet --upgrade diffusers transformers accelerate invisible_watermark mediapy
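
Because `--upgrade` pulls whatever release is newest at run time, it can be useful to record which versions were actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Package names taken from the %pip command above.
required = ["diffusers", "transformers", "accelerate", "invisible_watermark", "mediapy"]
for name, ver in installed_versions(required).items():
    print(f"{name}: {ver or 'not installed'}")
```

Keeping this output with a saved notebook makes it easier to tell later whether a change in behavior comes from a library update.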


We also need to decide whether to use a "refiner", a second model that post-processes the output of the base model to sharpen fine detail.

In [3]:
use_refiner = True

## Model loading

Now we load the model, which is hosted on Hugging Face. Specifically, we load a pre-trained version. This can take a while, since the whole model with its weights (several gigabytes) has to be downloaded.

In [4]:
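import random
import sys

import mediapy as media
import torch
from diffusers import DiffusionPipeline

# Load the SDXL base model with half-precision (fp16) weights.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)

if use_refiner:
  # The refiner reuses the base model's second text encoder and VAE,
  # so these components are held in memory only once.
  refiner = DiffusionPipeline.from_pretrained(
      "stabilityai/stable-diffusion-xl-refiner-1.0",
      text_encoder_2=pipe.text_encoder_2,
      vae=pipe.vae,
      torch_dtype=torch.float16,
      use_safetensors=True,
      variant="fp16",
  )

  refiner = refiner.to("cuda")

  # Keep the base pipeline's models on the CPU and move each one
  # to the GPU only while it is running, to save GPU memory.
  pipe.enable_model_cpu_offload()
else:
  pipe = pipe.to("cuda")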
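
Loading with `torch_dtype=torch.float16` stores each weight in 2 bytes rather than the 4 bytes of float32, which is why the fp16 variant roughly halves both the download size and the GPU memory needed for the weights. A back-of-the-envelope sketch of that arithmetic follows; the helper `weight_bytes` and the one-billion-parameter figure are illustrative assumptions, not measurements of the SDXL models.

```python
# Bytes used to store one parameter in each floating-point format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_bytes(n_params, dtype):
    """Return the memory needed to hold n_params weights of the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype]


n_params = 1_000_000_000  # hypothetical model size, for illustration only
for dtype in ("float32", "float16"):
    gib = weight_bytes(n_params, dtype) / 2**30
    print(f"{dtype}: {gib:.2f} GiB")
```

This counts weights only; activations and other buffers used during generation add to the total.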


In [27]:
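prompt = "a cartoon of a squirell playing violin on a tree under the rain"
# Draw a fresh random seed for each run; it is printed with the result below.
seed = random.randint(0, sys.maxsize)

# negative_prompt = "what you don't want to see"
negative_prompt = ""

# Run the base model. With the refiner enabled, keep the result as latents
# so the refiner can continue from them; otherwise decode straight to PIL images.
images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    output_type="latent" if use_refiner else "pil",
    generator=torch.Generator("cuda").manual_seed(seed),
).images

if use_refiner:
  # Pass the base model's latents through the refiner to get the final images.
  images = refiner(
      prompt=prompt,
      negative_prompt=negative_prompt,
      image=images,
  ).images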


In [28]:
print(f"Prompt:\t{prompt}\nSeed:\t{seed}")
media.show_images(images)
images[0].save("output1.jpg")

Prompt:	a cartoon of a squirell playing violin on a tree under the rain
Seed:	9029145675535114567
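
Printing the seed alongside the prompt is what makes a result repeatable: a generator seeded with the same value produces the same sequence of random numbers. The sketch below illustrates the principle with Python's standard `random.Random` standing in for `torch.Generator` (an assumption made so that the example runs without a GPU); `draw_noise` is a hypothetical helper, not part of the notebook.

```python
import random
import sys


def draw_noise(seed, n=4):
    """Return n pseudo-random numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


seed = random.randint(0, sys.maxsize)
first = draw_noise(seed)
second = draw_noise(seed)

print(f"Seed:\t{seed}")
print("Same seed, same numbers:", first == second)
```

To regenerate a particular image, one would replace the random seed in the generation cell with the printed value and keep the prompt and other settings unchanged. Exact reproduction may still depend on the hardware and library versions in use.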
