<a href="https://colab.research.google.com/github/sjain-21/cv/blob/main/CV_A0_model_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Stable Diffusion** 🎨
*...using `🧨diffusers`*

Stable Diffusion is a text-to-image latent diffusion model created by researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). This notebook uses Stable Diffusion XL (SDXL), which relies on multi-aspect training on images of varying dimensions to better match real-world datasets. It uses two frozen text encoders, CLIP ViT-L and OpenCLIP ViT-bigG, to condition the model on text prompts. With its 2.6B-parameter UNet, the model is heavier than Stable Diffusion 1.5 (860M parameters) but can still run on many consumer GPUs.
See the [model card](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) for more information.

This Colab notebook shows how to use Stable Diffusion with the 🤗 Hugging Face [🧨 Diffusers library](https://github.com/huggingface/diffusers).

Let's get started!

## 1. How to use `StableDiffusionXLPipeline`

Before diving into the theoretical aspects of how Stable Diffusion functions,
let's try it out a bit 🤗.

In this section, we show how you can run text to image inference in just a few lines of code!

### Setup

First, please make sure you are using a GPU runtime to run this notebook, so inference is much faster. If the following command fails, use the `Runtime` menu above and select `Change runtime type`.

In [None]:
!nvidia-smi
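
The same check can also be made from Python. The snippet below is a minimal sketch, assuming PyTorch is already available in the runtime (as it usually is on Colab); the helper name `pick_device` is hypothetical and not part of any library.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so nothing can run on a GPU yet.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

If this prints `cpu`, switch to a GPU runtime before continuing, since inference on CPU is likely to be very slow.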

Next, install `diffusers` as well as `scipy`, `ftfy` and `transformers`. `accelerate` is used to achieve much faster loading.

In [None]:
!pip install diffusers==0.30.2
!pip install transformers scipy ftfy accelerate
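
To confirm that the installation worked, you can list the installed versions. This is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["diffusers", "transformers", "scipy", "ftfy", "accelerate"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Any package reported as `not installed` needs to be installed before the later cells will run.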

### Stable Diffusion Pipeline

`StableDiffusionPipeline` is an end-to-end inference pipeline that you can use to generate images from text with just a few lines of code. In this notebook, however, we explore `StableDiffusionXLPipeline`, which uses SDXL, a diffusion model that substantially outperforms Stable Diffusion 1.5.

First, we load the pre-trained weights of all components of the model. In this notebook we use Stable Diffusion XL version 1.0 ([stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)), but there are other variants that you may want to try:
* [pt-sk/stable-diffusion-1.5](https://huggingface.co/pt-sk/stable-diffusion-1.5)
* [stabilityai/stable-diffusion-2-1-base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base)
* [stabilityai/stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1). This version can produce images with a resolution of 768x768, while the other two variants in this list work at 512x512. (SDXL itself targets 1024x1024.)

In addition to the model id ([stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)), we also pass a specific `torch_dtype` and `variant` to the `from_pretrained` method.

We want Stable Diffusion to run on the free Google Colab tier, so we load the fp16 variant of the checkpoint with `variant="fp16"` and tell `diffusers` to expect the weights in float16 precision by passing `torch_dtype=torch.float16`.

If you want the highest possible precision, remove `torch_dtype=torch.float16` (and `variant="fp16"`), at the cost of higher memory usage.
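
To get a feel for the memory trade-off, here is a back-of-the-envelope calculation. It is only a rough sketch: it counts the UNet weights alone (using the 2.6B figure quoted in the introduction) and ignores the text encoders, the VAE and activations, so actual usage will be higher.

```python
UNET_PARAMS = 2_600_000_000  # 2.6B UNet parameters, as quoted in the introduction

# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"float16": 2, "float32": 4}


def weight_size_gb(num_params, dtype):
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_size_gb(UNET_PARAMS, dtype):.1f} GB")
```

Under these assumptions, float16 needs about half the memory of float32 for the same weights, which is why the fp16 variant is the safer choice on small GPUs.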

In [None]:
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16")

Next, let's move the pipeline to GPU to have faster inference.

In [None]:
pipe = pipe.to("cuda")

And we are ready to generate images:

In [None]:
prompt = "a photograph of Central Park in the style of Leonardo Da Vinci"
image = pipe(prompt).images[0]  # a PIL image (https://pillow.readthedocs.io/en/stable/)

# To keep the image, save it to disk:
image.save("cp_x_vinci.png")

# In Google Colab, ending the cell with the image displays it directly:
image

In [None]:
prompt = "a photograph of the Statue of Liberty in the style of Claude Monet"
image = pipe(prompt).images[0]  # a PIL image (https://pillow.readthedocs.io/en/stable/)

# To keep the image, save it to disk:
image.save("sol_x_monet.png")

# In Google Colab, ending the cell with the image displays it directly:
image

Running the above cell multiple times will give you a different image every time. If you want deterministic output, you can pass a random seed to the pipeline. Every time you use the same seed, you'll get the same image.
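
As a rough illustration of why a seed makes the output repeatable: generation starts from random noise, and a seeded `torch.Generator` produces the same noise every time. The helper `seeded_noise` and the shape `(1, 4, 128, 128)` are assumptions made for this sketch, not something the pipeline exposes under those names. The sketch runs on CPU so that it works without a GPU.

```python
import torch


def seeded_noise(seed, shape=(1, 4, 128, 128)):
    """Draw Gaussian noise from a generator seeded with `seed` (illustrative helper)."""
    generator = torch.Generator("cpu").manual_seed(seed)
    return torch.randn(shape, generator=generator)


noise_a = seeded_noise(1024)
noise_b = seeded_noise(1024)  # same seed: identical noise
noise_c = seeded_noise(2048)  # different seed: different noise

print(torch.equal(noise_a, noise_b))
print(torch.equal(noise_a, noise_c))

# With the pipeline loaded earlier, a seeded generator can be passed like this:
# generator = torch.Generator("cuda").manual_seed(1024)
# image = pipe(prompt, generator=generator).images[0]
```

Note that the generator must be re-seeded before each call; reusing the same generator object advances its state and yields a different image.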