# **Simplified Stable Diffusion**


Stable Diffusion is a text-to-image latent diffusion model created by researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/).
It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database.
This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and can run on many consumer GPUs.
For further information about Stable Diffusion, check out the [Stable Diffusion website](https://stability.ai/blog/stable-diffusion-public-release).



### Setup

Make sure to run this notebook on a GPU runtime; inference is much faster than on a CPU.
If the following command fails, use the `Runtime` menu above and select `Change runtime type`.

In [None]:
!nvidia-smi

Wed Jul 12 00:13:36 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   52C    P8    12W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
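
The same check can also be scripted from Python, which is handy when later cells should stop early if no GPU is present. The sketch below is only an illustration: the helper name `list_gpus` is hypothetical, and it assumes that `nvidia-smi -L` (which prints one line per GPU) is available whenever a GPU runtime is active.

```python
import shutil
import subprocess


def list_gpus():
    """Return one line per GPU as reported by `nvidia-smi -L`, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # the NVIDIA tools are not on PATH, so assume there is no GPU runtime
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
        )
    except OSError:
        return []  # the command exists but could not be started
    if result.returncode != 0:
        return []  # the command ran but reported an error (for example, no driver)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
if gpus:
    print("GPU runtime detected:")
    for gpu in gpus:
        print(" ", gpu)
else:
    print("No GPU found; change the runtime type before continuing.")
```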

Then, install `diffusers`, `transformers`, `scipy`, and `ftfy`. `accelerate` is used for faster loading, and `anvil-uplink` connects this notebook to an Anvil app.

In [None]:
!pip install diffusers==0.11.1
!pip install transformers scipy ftfy accelerate
!pip install anvil-uplink

Collecting diffusers==0.11.1
  Downloading diffusers-0.11.1-py3-none-any.whl (524 kB)
Collecting importlib-metadata (from diffusers==0.11.1)
  Downloading importlib_metadata-6.8.0-py3-none-any.whl (22 kB)
Collecting huggingface-hub>=0.10.0 (from diffusers==0.11.1)
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
Installing collected packages: importlib-metadata, huggingface-hub, diffusers
Successfully installed diffusers-0.11.1 huggingface-hub-0.16.4 importlib-metadata-6.8.0
Collecting transformers
  Downloading transformers-4.30.2-py3-none-any.whl (7.2 MB)
Collecting ftfy
  Downloading ftfy-6.1.1-py3-none-any.w
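
After installation, it can be useful to confirm which versions were actually picked up, since only `diffusers` is pinned. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of any of the installed packages.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The packages installed by the cell above.
required = ["diffusers", "transformers", "scipy", "ftfy", "accelerate", "anvil-uplink"]
for name, version in installed_versions(required).items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

If `diffusers` reports a version other than `0.11.1`, the pinned install did not take effect in the current runtime.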

### Stable Diffusion Pipeline

`StableDiffusionPipeline` is an end-to-end inference pipeline that you can use to generate images from text with just a few lines of code.

First, load the pre-trained weights of all components of the model. This notebook uses Stable Diffusion version 1.4 ([CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4)). In addition to the model ID, we also pass `torch_dtype` to the `from_pretrained` method. The same cell connects the notebook to Anvil through the uplink.

To make sure the notebook works on every free Google Colab instance, we tell `diffusers` to load the weights in float16 precision by passing `torch_dtype=torch.float16`. The repository also provides a half-precision branch, [`fp16`](https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/fp16), which can be selected with `revision="fp16"`; the cell below does not pass `revision`.


In [None]:
# network import
import torch
from diffusers import StableDiffusionPipeline

# anvil interface imports
import anvil.server
anvil.server.connect("server_H25NBV2WLDYQHXHJGONGUORW-KCKQ4L2USRALWBPW")

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)

Connecting to wss://anvil.works/uplink
Anvil websocket open
Connected to "Default Environment" as SERVER


text_encoder/pytorch_model.fp16.safetensors not found


`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
The config attributes {'scaling_factor': 0.18215} were passed to AutoencoderKL, but are not expected and will be ignored. Please verify your config.json configuration file.
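
To see why half precision matters on a free Colab GPU, the weight sizes can be estimated from the parameter counts quoted in the introduction. This is a back-of-the-envelope sketch only: it counts raw weights (4 bytes per parameter in float32, 2 bytes in float16) and ignores the VAE, activations, and any framework overhead. The helper name `weight_size_gb` is hypothetical.

```python
def weight_size_gb(num_params, bytes_per_param):
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts quoted in the introduction of this notebook.
components = {"UNet": 860_000_000, "text encoder": 123_000_000}

for name, num_params in components.items():
    fp32 = weight_size_gb(num_params, 4)  # float32: 4 bytes per parameter
    fp16 = weight_size_gb(num_params, 2)  # float16: 2 bytes per parameter
    print(f"{name}: float32 ~ {fp32:.2f} GB, float16 ~ {fp16:.2f} GB")
```

Under these assumptions, float16 halves the UNet weights from about 3.44 GB to about 1.72 GB.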


Next, move the pipeline to the GPU for faster inference.

In [None]:
pipe = pipe.to("cuda")
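
The cell above assumes a CUDA device is present. As a hedged alternative, the device could be chosen at run time; the sketch below uses `torch.cuda.is_available()` and falls back to `"cpu"`. The helper name `pick_device` is hypothetical. Note that a pipeline loaded in float16 may run slowly or fail on a CPU, depending on the PyTorch version, so the fallback mainly serves to produce a clear message.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so no GPU can be used
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Selected device: {device}")
# In this notebook the result would be used as: pipe = pipe.to(device)
```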

In [None]:
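import io

def img_to_media_obj(img):
  """Encode a PIL image as JPEG and wrap it in an Anvil media object."""
  # Serialize the image into an in-memory buffer instead of a file on disk
  img_byte_arr = io.BytesIO()
  img.save(img_byte_arr, format='JPEG')
  # BlobMedia carries the raw bytes and their content type back to the Anvil app
  return anvil.BlobMedia(content_type="image/jpeg", content=img_byte_arr.getvalue())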

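Now we are ready to generate images. The function below is registered with `@anvil.server.callable`, so the connected Anvil app can call it with a prompt: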

In [None]:
@anvil.server.callable
def generate_image(user_input):
  # user_input is the text prompt describing the image to generate.
  # Sampling is random, so the same prompt gives slightly different images on each call.
  image = pipe(user_input).images[0]  # a PIL image (https://pillow.readthedocs.io/en/stable/)

  # To keep a copy on disk, save the image, for example:
  #image.save("a flying platypus.png")

  # In a plain Colab cell the PIL image could be displayed directly;
  # here it is returned to the Anvil app as a media object instead.
  return img_to_media_obj(image)
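
Because `generate_image` receives text from a remote app, it may be worth normalizing the prompt before passing it to the pipeline. The following is a minimal sketch under stated assumptions: the helper `clean_prompt` and the `MAX_PROMPT_CHARS` limit are hypothetical and are not part of `diffusers` or Anvil.

```python
MAX_PROMPT_CHARS = 300  # hypothetical limit, chosen only for illustration


def clean_prompt(user_input, max_chars=MAX_PROMPT_CHARS):
    """Normalize whitespace and reject prompts that are empty or too long."""
    if not isinstance(user_input, str):
        raise TypeError("prompt must be a string")
    prompt = " ".join(user_input.split())  # collapse runs of whitespace and trim the ends
    if not prompt:
        raise ValueError("prompt must not be empty")
    if len(prompt) > max_chars:
        raise ValueError(f"prompt is longer than {max_chars} characters")
    return prompt


print(clean_prompt("  a flying   platypus \n"))
```

Inside `generate_image`, the call would then become `pipe(clean_prompt(user_input))`.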

In [None]:
anvil.server.wait_forever()

  0%|          | 0/50 [00:00<?, ?it/s]
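
The uplink key passed to `anvil.server.connect` earlier in this notebook is written directly into the cell, which exposes it to anyone who can read the notebook. One possible alternative is to read it from an environment variable. In the sketch below, the helper `get_uplink_key` and the variable name `ANVIL_UPLINK_KEY` are assumptions made for illustration, not Anvil conventions.

```python
import os


def get_uplink_key(env_var="ANVIL_UPLINK_KEY"):
    """Read the uplink key from an environment variable (the name is hypothetical)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before connecting.")
    return key


# In this notebook the literal key would then be replaced by:
#   anvil.server.connect(get_uplink_key())
```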