## Stable Diffusion Text-guided In-painting on IPU

This notebook demonstrates how to run a Stable Diffusion text-guided in-painting inference pipeline on Graphcore IPUs.

### Requirements

* An enabled Poplar SDK environment (or a Paperspace account with access to the PyTorch IPU runtime)
* Additional dependencies installable via pip (done below)
* Access to the pretrained Stable Diffusion in-painting checkpoint, `runwayml/stable-diffusion-inpainting` (done below)

In [None]:
%%capture
!pip install -r requirements.txt
!pip install "ipywidgets>=7,<8"

Values for machine size and cache directories can be configured through environment variables or directly in the notebook:

In [None]:
import os

pod_type = os.getenv("GRAPHCORE_POD_TYPE", "pod16")
executable_cache_dir = os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "/tmp/exe_cache/")
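
As a minimal sketch of how these lookups behave, the hypothetical helper `read_config` below applies the same variable names and defaults to any mapping. Passing a plain dict shows the effect of an override without touching the real environment; the value `pod4` is illustrative only.

```python
import os

def read_config(env=os.environ):
    """Return (pod_type, executable_cache_dir) using the notebook's defaults."""
    pod_type = env.get("GRAPHCORE_POD_TYPE", "pod16")
    cache_dir = env.get("POPLAR_EXECUTABLE_CACHE_DIR", "/tmp/exe_cache/")
    return pod_type, cache_dir

# Hypothetical override: only the pod type is set, so the cache path falls back to its default.
print(read_config({"GRAPHCORE_POD_TYPE": "pod4"}))  # ('pod4', '/tmp/exe_cache/')
```

In practice the variables would normally be exported in the shell before the notebook is launched.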

To download the pretrained in-painting checkpoint, we must first authenticate to the Hugging Face Hub. Begin by creating a read access token on the [Hugging Face website](https://huggingface.co/settings/tokens) (sign up [here](https://huggingface.co/join) if you haven't already), then execute the following cell and enter your token:

In [None]:
from huggingface_hub import notebook_login

notebook_login()

If you have not done so already, you will need to accept the User License on the [model page](https://huggingface.co/runwayml/stable-diffusion-inpainting).

### Pipeline Creation

We are now ready to import and run the pipeline.

In [None]:
import torch

from ipu_models import IPUStableDiffusionInpaintPipeline

In [None]:
pipe = IPUStableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", 
    revision="fp16", 
    torch_dtype=torch.float16,
    ipu_config={
        "executable_cache_dir": executable_cache_dir,
    }
)
pipe.enable_attention_slicing()

In [None]:
# Environment variables are always strings, so convert them to int before use.
image_width = int(os.getenv("STABLE_DIFFUSION_INPAINT_DEFAULT_WIDTH", default="512"))
image_height = int(os.getenv("STABLE_DIFFUSION_INPAINT_DEFAULT_HEIGHT", default="512"))
image_dimensions = (image_width, image_height)
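
Stable Diffusion pipelines in Hugging Face diffusers require the width and height to be multiples of 8, because the VAE downsamples the image by that factor. Assuming the IPU pipeline inherits this constraint, a small check such as the hypothetical `parse_dimension` helper below could catch a bad environment value before the lengthy compilation starts.

```python
def parse_dimension(value, name="dimension"):
    """Convert an environment value to a positive int that is a multiple of 8."""
    size = int(value)  # raises ValueError for non-numeric strings
    if size <= 0 or size % 8 != 0:
        raise ValueError(f"{name} must be a positive multiple of 8, got {size}")
    return size

# Accepts both strings (as returned by os.getenv) and ints (such as a default).
checked_dimensions = (parse_dimension("512", "width"), parse_dimension(512, "height"))
print(checked_dimensions)  # (512, 512)
```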

We run a dummy generation step to trigger the one-time compilation process. This should take on the order of 15 minutes.

In [None]:
from PIL import Image

pipe("apple", 
     image=Image.new("RGB", image_dimensions), 
     mask_image=Image.new("RGB", image_dimensions, (255, 255, 255)), 
     guidance_scale=7.5
);

### Image Generation

We preprocess and visualize a context image and its mask, which the pipeline uses to condition generation. The mask marks the region of the image to be repainted according to the prompt.

In [None]:
import requests
from io import BytesIO

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

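def download_image(url):
    # Fail early on HTTP errors or a stalled connection instead of passing bad bytes to PIL.
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return Image.open(BytesIO(response.content)).convert("RGB")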

In [None]:
image = download_image(img_url).resize(image_dimensions)
image

In [None]:
mask_image = download_image(mask_url).resize(image_dimensions)
mask_image
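
To in-paint a different region, a mask can also be built locally rather than downloaded. The sketch below assumes the convention used by the Hugging Face diffusers in-painting pipelines (white pixels are repainted, black pixels are kept) and that the IPU pipeline follows it. The helper name `make_box_mask` and the box coordinates are hypothetical.

```python
from PIL import Image, ImageDraw

def make_box_mask(size, box):
    """Return an RGB mask of `size` that is white inside `box` and black elsewhere."""
    mask = Image.new("RGB", size, (0, 0, 0))
    ImageDraw.Draw(mask).rectangle(box, fill=(255, 255, 255))
    return mask

# Mark an approximately central 256x256 region of a 512x512 image for repainting.
custom_mask = make_box_mask((512, 512), (128, 128, 383, 383))
```

The resulting image could then be passed as `mask_image` in place of the downloaded mask.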

Below you will find an example prompt. We encourage you to try your own!

In [None]:
prompt = "a mecha robot sitting on a bench"
pipe(prompt, image=image, mask_image=mask_image, guidance_scale=7.5).images[0]