<a href="https://colab.research.google.com/github/kaiu85/stable-diffusion-workshop/blob/main/Cool_Applications/inpainting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# In-painting pipeline for Stable Diffusion using 🧨 Diffusers 

This notebook shows how to do text-guided in-painting with the Stable Diffusion model using the 🤗 Hugging Face [🧨 Diffusers library](https://github.com/huggingface/diffusers).

For a general introduction to the Stable Diffusion model please refer to this [colab](https://colab.research.google.com/github/kaiu85/stable-diffusion-workshop/blob/main/stable_diffusion.ipynb).

First, let us check again whether our instance uses a graphics card (GPU) to accelerate our computations. If it does, `!nvidia-smi` prints some information, such as the GPU type (likely a Tesla T4) and the GPU memory (around 15 GB).
If this command fails, you can change the runtime settings via "Runtime -> Change runtime type" (German: "Laufzeit -> Laufzeittyp ändern") and select "GPU".

In [None]:
!nvidia-smi

Now let us install the fantastic diffusers library and some other required libraries again.

In [None]:
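# Install transformers, ftfy and gradio (together with the diffusers 0.6.0 release) ...
!pip install -qq -U diffusers==0.6.0 transformers ftfy gradio
# ... and then replace diffusers with the latest development version from GitHub.
!pip install git+https://github.com/huggingface/diffusers.git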

Remember that for in-painting, i.e., "filling holes" in a picture with a diffusion model, the U-Net has to be trained differently: in addition to the text prompt, it also receives the known parts of the image (and the mask marking the hole) as input, so that it can generate the missing parts consistently with their surroundings.
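
A minimal sketch of what this "additional input" looks like in practice. According to the model card of `runwayml/stable-diffusion-inpainting`, the in-painting U-Net has 5 extra input channels: 4 for the encoded masked image and 1 for the mask. The shapes below assume a 512x512 image and the VAE's downsampling factor of 8; the random arrays only stand in for real latents, and the concatenation order mirrors diffusers' `StableDiffusionInpaintPipeline` as far as we know.

```python
import numpy as np

# Assumed sizes for a 512x512 input: the VAE downsamples by a factor of 8,
# so the latents are 64x64 with 4 channels.
batch, latent_ch, h, w = 1, 4, 512 // 8, 512 // 8

rng = np.random.default_rng(0)
# Stand-in for the noisy latents that are being denoised.
noisy_latents = rng.standard_normal((batch, latent_ch, h, w))
# Stand-in for the VAE encoding of the image with the hole blanked out.
masked_image_latents = rng.standard_normal((batch, latent_ch, h, w))
# Mask at latent resolution: 1 = repaint, 0 = keep.
mask = np.zeros((batch, 1, h, w))
mask[..., 16:48, 16:48] = 1.0

# The in-painting U-Net sees all three stacked along the channel axis:
# 4 (noisy latents) + 1 (mask) + 4 (masked-image latents) = 9 channels,
# whereas the text-to-image U-Net only takes the 4 noisy-latent channels.
unet_input = np.concatenate([noisy_latents, mask, masked_image_latents], axis=1)
print(unet_input.shape)  # (1, 9, 64, 64)
```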

In this notebook we'll use the `runwayml/stable-diffusion-inpainting` model released by RunwayML, so you'll need to visit [its Hugging Face model card](https://huggingface.co/runwayml/stable-diffusion-inpainting) while logged in, read the license, and tick the checkbox if you agree.

If you accept the license while logged in with an account for which you have already created an access token, you can simply use this token to log into Hugging Face and download the pre-trained model. You can create new tokens and view your existing ones here: https://huggingface.co/settings/tokens.

In [None]:
from huggingface_hub import notebook_login

notebook_login()

Now let's import the ``StableDiffusionInpaintPipeline`` and some other useful packages.

In [None]:
import inspect
from typing import List, Optional, Union

import numpy as np
import torch

import PIL
from PIL import Image
import gradio as gr
from diffusers import StableDiffusionInpaintPipeline

Now we can create an in-painting pipeline object and move it to the GPU (``device = "cuda"``).

In [None]:
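device = "cuda"
model_path = "runwayml/stable-diffusion-inpainting"

# Load the half-precision (float16) weights to save GPU memory;
# use_auth_token=True reuses the token from notebook_login() above.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    model_path,
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True
).to(device)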
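
`revision="fp16"` together with `torch_dtype=torch.float16` loads half-precision weights, i.e. 2 bytes per parameter instead of 4. A quick back-of-the-envelope estimate shows the saving. The parameter count is an assumption (roughly 860 million, a commonly quoted figure for the Stable Diffusion v1 U-Net), and the text encoder and VAE come on top of it.

```python
import numpy as np

# Assumption: ~860 million U-Net parameters (approximate, commonly quoted
# figure for Stable Diffusion v1; the in-painting U-Net differs only slightly).
UNET_PARAMS = 860_000_000

weight_gib = {}
for dtype in (np.float32, np.float16):
    name = np.dtype(dtype).name
    bytes_per_param = np.dtype(dtype).itemsize  # 4 for float32, 2 for float16
    weight_gib[name] = UNET_PARAMS * bytes_per_param / 1024**3
    print(f"{name}: {bytes_per_param} bytes/param -> {weight_gib[name]:.2f} GiB")
```

This prints about 3.20 GiB for float32 and 1.60 GiB for float16, for the U-Net weights alone and before any activations.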

As in previous notebooks, we define a helper function "image_grid", which just lets us display multiple images in a nice grid view.

In [None]:
from PIL import Image

def image_grid(imgs, rows, cols):
    """Paste rows * cols equally sized PIL images into one grid image."""
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        # image i goes to column i % cols and row i // cols
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

In the next code cells, we download an image and a mask image. The mask image tells the pipeline which pixels of the image should be replaced by samples from the latent diffusion model (white) and which pixels should be kept (black).

In [None]:
# Download the image
!wget https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png

In [None]:
# Preprocess and display the image
filename = 'overture-creations-5sI6fQgYIuo.png'

image = Image.open(filename).convert("RGB")
image = image.resize((512, 512))
image
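
`image.resize((512, 512))` stretches images that are not square. If your own photos have a different aspect ratio, a hypothetical helper like `center_crop_resize` below first crops the largest centered square and only then resizes it. Apply it to both the image and the mask so that they stay aligned (this assumes both files have the same size). The target size should stay a multiple of 8, because the VAE downsamples by a factor of 8.

```python
from PIL import Image

def center_crop_resize(img, size=512):
    """Hypothetical helper: crop the largest centered square, then resize to size x size.

    Unlike a plain img.resize((size, size)), this keeps the proportions
    of non-square images instead of stretching them.
    """
    w, h = img.size
    side = min(w, h)
    left = (w - side) // 2
    top = (h - side) // 2
    square = img.crop((left, top, left + side, top + side))
    return square.resize((size, size))

# Example with a blank 800x600 "photo".
photo = Image.new("RGB", (800, 600))
print(center_crop_resize(photo).size)  # (512, 512)
```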

In [None]:
# Download the mask image
!wget https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png

In [None]:
# Preprocess and display the mask image
filename = 'overture-creations-5sI6fQgYIuo_mask.png'

mask_image = Image.open(filename).convert("RGB")
mask_image = mask_image.resize((512, 512))
mask_image

Remember that the mask image tells the pipeline which pixels of the image should be replaced (white) and which should be kept (black).
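
As a toy illustration of these semantics, the sketch below builds a tiny image and mask with Pillow, binarizes the mask at 0.5 and blanks out the region that will be repainted. This is a simplified assumption of what happens internally (diffusers thresholds the mask in a similar way before encoding the masked image), not the pipeline's actual code.

```python
import numpy as np
from PIL import Image

# Toy 8x8 example: a gray image and a mask whose white square marks the hole.
image = Image.new("RGB", (8, 8), color=(128, 128, 128))
mask_image = Image.new("RGB", (8, 8), color=(0, 0, 0))  # black = keep
mask_image.paste((255, 255, 255), box=(2, 2, 6, 6))      # white = repaint

img = np.asarray(image, dtype=np.float32) / 255.0
mask = np.asarray(mask_image.convert("L"), dtype=np.float32) / 255.0

# Binarize: pixels that are at least half white count as "repaint".
mask = (mask >= 0.5).astype(np.float32)

# Blank out the region to be repainted; only the black (kept) pixels survive.
masked_img = img * (1.0 - mask)[..., None]

print(f"repainted pixels: {int(mask.sum())} of {mask.size}")  # 16 of 64
```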

Now let's define a prompt that guides the generation process and see what the generated results look like for three (``num_samples``) samples.

In [None]:
prompt = "a toucan sitting on a bench"

guidance_scale = 7.5  # how strongly the prompt steers the generation
num_samples = 3       # number of in-painted variants to generate
generator = torch.Generator(device="cuda").manual_seed(0) # change the seed to get different results

images = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    guidance_scale=guidance_scale,
    generator=generator,
    num_images_per_prompt=num_samples,
).images
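
``guidance_scale`` controls classifier-free guidance: at every denoising step the U-Net predicts the noise twice, once conditioned on the prompt and once on an empty prompt, and the two predictions are combined as sketched below. The function name is ours and the random arrays stand in for real U-Net outputs; this illustrates the formula, it is not diffusers' code.

```python
import numpy as np

def classifier_free_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine the unconditional and prompt-conditioned noise predictions.

    guidance_scale = 1 returns the prompt-conditioned prediction unchanged;
    larger values push the result further away from the unconditional one.
    """
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)

rng = np.random.default_rng(0)
noise_uncond = rng.standard_normal((4, 64, 64))  # stand-in: prediction for the empty prompt
noise_text = rng.standard_normal((4, 64, 64))    # stand-in: prediction for the prompt

guided = classifier_free_guidance(noise_uncond, noise_text, guidance_scale=7.5)
```

Higher values such as 7.5 make the result follow the prompt more closely, often at the cost of variety between samples.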

To compare the in-painted images with the original image, we insert the original image as the first item of the list ``images``.

In [None]:
# insert initial image in the list so we can compare side by side
images.insert(0, image)

Now we display the list of images using our ``image_grid`` function (with a single row).

In [None]:
image_grid(images, 1, num_samples + 1)
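
``image_grid`` asserts that ``len(imgs) == rows * cols``. If you change ``num_samples`` to a value that no longer fits nicely in a single row, two small hypothetical helpers can pick the layout and pad the list with blank tiles:

```python
import math
from PIL import Image

def grid_shape(n_images, max_cols=4):
    """Hypothetical helper: choose (rows, cols) with at most max_cols columns (n_images >= 1)."""
    cols = min(n_images, max_cols)
    rows = math.ceil(n_images / cols)
    return rows, cols

def pad_to_grid(imgs, rows, cols):
    """Hypothetical helper: append white tiles so that len(result) == rows * cols."""
    blank = Image.new("RGB", imgs[0].size, "white")
    return imgs + [blank] * (rows * cols - len(imgs))

# Example with 7 small dummy tiles: 2 rows of 4, one blank tile added.
tiles = [Image.new("RGB", (64, 64), "gray") for _ in range(7)]
rows, cols = grid_shape(len(tiles))
padded = pad_to_grid(tiles, rows, cols)
```

In the notebook you would then call `rows, cols = grid_shape(len(images))` followed by `image_grid(pad_to_grid(images, rows, cols), rows, cols)`.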

To work with your own images, open the file explorer on the left (folder icon), upload your own image and mask files (upload icon), and change the filenames in the cells above accordingly.

However, creating the mask files requires some image editing software ([GIMP](https://www.gimp.org) is a great free and open-source image editor, but it takes some time to learn).
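
For simple cases you can also skip the image editor and draw a mask in code. The sketch below (hypothetical helper `rectangle_mask`, arbitrary example coordinates) uses Pillow's `ImageDraw` to paint a white rectangle, the region to be repainted, on a black background:

```python
from PIL import Image, ImageDraw

def rectangle_mask(size, box):
    """Hypothetical helper: RGB mask of the given size, white inside box, black elsewhere.

    box is (left, top, right, bottom) in pixels. White pixels will be
    repainted by the pipeline, black pixels are kept.
    """
    mask = Image.new("RGB", size, "black")
    ImageDraw.Draw(mask).rectangle(box, fill="white")
    return mask

my_mask = rectangle_mask((512, 512), (180, 200, 340, 420))
```

You could then save it with `my_mask.save("my_mask.png")` and use that filename in the mask cell above, or pass `my_mask` directly as `mask_image`.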

Thus, to make experimenting with your own images easier and more fun, we will use the ``gradio`` package in the next few cells to build a **graphical user interface** that lets you upload images and mask them easily.

### Interactive Gradio Demo

First we install the gradio package and import it as ``gr``.

In [None]:
!pip install -q gradio
import gradio as gr

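Gradio needs a function that it can call to generate an in-painted image from an image, a mask image, and a prompt, so we define such a function in the next cell. With the sketch tool used below, gradio passes the uploaded image and the drawn mask together as a dictionary with the keys ``'image'`` and ``'mask'``.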

In [None]:
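def predict(inputs, prompt):
    # With tool='sketch', gradio passes a dict holding the uploaded image
    # and the mask drawn on top of it (avoid shadowing the built-in `dict`).
    image = inputs['image'].convert("RGB").resize((512, 512))
    mask_image = inputs['mask'].convert("RGB").resize((512, 512))
    images = pipe(prompt=prompt, image=image, mask_image=mask_image).images
    return images[0]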

The next few lines of code just create a very simple user interface, where you can upload and mask an image (``gr.Image``) and add a text prompt (``gr.Textbox``).

In [None]:
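# Note: `source` and `tool` are gradio 3.x arguments of gr.Image;
# gradio 4 removed them, so this cell expects a 3.x release.
gr.Interface(
    predict,
    title='Stable Diffusion In-Painting',
    inputs=[
        # upload an image and draw the mask on it with the sketch tool
        gr.Image(source='upload', tool='sketch', type='pil'),
        gr.Textbox(label='prompt')
    ],
    outputs=[
        gr.Image()
    ]
).launch(debug=True)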

# Assignment

Again, please take your time to **play with the model**. Try different input images, try masking out different regions, and see how a prompt changes the in-painted results. Also try **filling the masked regions with an empty ('') prompt**. This way you can explore what the model expects to see in the masked-out regions when it is not guided by a text prompt.

**Please** collect surprisingly good, surprisingly bad, funny and interesting results [here](https://docs.google.com/presentation/d/1n5P9JIyYoISbRIfRgXvwmXFdETlPm1FQ_S7WuqKgqBo/edit?usp=sharing). Feel free to also add links, thoughts and comments that you want to share with the group.