<a href="https://colab.research.google.com/github/kaiu85/stable-diffusion-workshop/blob/main/Cool_Applications/image_2_image.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image2Image Pipeline for Stable Diffusion using 🧨 Diffusers 

This notebook shows how to run text-guided image-to-image generation with the Stable Diffusion model, using the `StableDiffusionImg2ImgPipeline` from the 🤗 Hugging Face [🧨 Diffusers library](https://github.com/huggingface/diffusers).

For a general introduction to the Stable Diffusion model please refer to this [colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb).



In [None]:
!nvidia-smi

In [None]:
!pip install diffusers==0.3.0 transformers ftfy
!pip install -qq "ipywidgets>=7,<8"

You need to accept the model license before downloading or using the weights. This notebook uses model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license, and tick the checkbox if you agree.

You have to be a registered user on the 🤗 Hugging Face Hub, and you'll also need an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).

Remember, you can create and find your Hugging Face tokens at https://huggingface.co/settings/tokens. Alternatively, you can log into your huggingface.co account, click on your profile picture in the upper right, and then navigate to "Settings -> Access Tokens".

In [None]:
from huggingface_hub import notebook_login

notebook_login()

## Image2Image pipeline

In [None]:
import inspect
import warnings
from typing import List, Optional, Union

import torch
from torch import autocast
from tqdm.auto import tqdm

from diffusers import StableDiffusionImg2ImgPipeline

Load the pipeline with half-precision (fp16) weights and move it to the GPU.

In [None]:
device = "cuda"
model_path = "CompVis/stable-diffusion-v1-4"

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_path,
    revision="fp16",  # load the half-precision weights branch
    torch_dtype=torch.float16,
    use_auth_token=True,  # reuse the token stored by notebook_login()
)
pipe = pipe.to(device)

Download an initial image and preprocess it so we can pass it to the pipeline.

In [None]:
import requests
from io import BytesIO
from PIL import Image

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
response.raise_for_status()  # fail early if the download did not succeed
init_img = Image.open(BytesIO(response.content)).convert("RGB")
init_img = init_img.resize((768, 512))
init_img
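
Stable Diffusion v1 works on latents that are 8 times smaller than the image along each side, so the width and height you resize to should be divisible by 8. If you want to try your own images, a small helper along these lines may be useful. It is only a sketch: the name `fit_size`, the maximum edge of 768, and the conservative multiple of 64 are assumptions made for this example and are not part of `diffusers`.

```python
def fit_size(width, height, max_edge=768, multiple=64):
    """Scale (width, height) to fit within max_edge, rounded down to `multiple`.

    Hypothetical helper for illustration; not part of diffusers.
    """
    # Only ever shrink the image, never enlarge it.
    scale = min(1.0, max_edge / max(width, height))
    # Round each side down to the nearest multiple, but never below one multiple.
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h


print(fit_size(1536, 1024))
print(fit_size(500, 375))
```

The returned tuple can be passed directly to `Image.resize`, for example `init_img.resize(fit_size(*init_img.size))`.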

Define the prompt and run the pipeline.

In [None]:
prompt = "Photograph of a fantasy landscape, highest quality, DSLR."

Here, `strength` is a value between 0.0 and 1.0 that controls the amount of noise added to the input image. Values close to 1.0 allow for a lot of variation but also produce images that are less semantically consistent with the input.
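
To make the effect of `strength` more concrete, here is a simplified sketch of how it can translate into denoising work. It assumes the pipeline noises the image up to an intermediate timestep and then runs roughly `int(num_inference_steps * strength)` of the scheduled steps. The function name `denoising_steps` is made up for this example, the default of 50 steps is an assumption, and scheduler-specific offsets are ignored.

```python
def denoising_steps(strength, num_inference_steps=50):
    """Approximate number of denoising steps run for a given strength.

    Simplified illustration only; ignores scheduler-specific offsets.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0.0, 1.0], got {strength}")
    # A higher strength means more noise is added, so more steps are needed to remove it.
    return min(int(num_inference_steps * strength), num_inference_steps)


for s in (0.5, 0.75, 0.99):
    print(f"strength={s}: about {denoising_steps(s)} of 50 steps")
```

Under these assumptions, a `strength` of 0.5 keeps the image halfway along the noise schedule, which is why much of the original structure survives.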

In [None]:
generator = torch.Generator(device=device).manual_seed(1024)
with autocast("cuda"):
    image = pipe(prompt=prompt, init_image=init_img, strength=0.99, guidance_scale=7.5, generator=generator).images[0]

In [None]:
image

In [None]:
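# Re-seed the generator so this run is comparable with the previous one.
generator = torch.Generator(device=device).manual_seed(1024)
with autocast("cuda"):
    image = pipe(prompt=prompt, init_image=init_img, strength=0.5, guidance_scale=7.5, generator=generator).images[0]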

In [None]:
image

As you can see, with a lower value for `strength`, the generated image is closer to the original `init_image`.


Now let's run the same pipeline with the [LMSDiscreteScheduler](https://huggingface.co/docs/diffusers/api/schedulers#diffusers.LMSDiscreteScheduler) instead of the pipeline's default scheduler.

In [None]:
from diffusers import LMSDiscreteScheduler

lms = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear")
pipe.scheduler = lms
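
For intuition about the `scaled_linear` schedule configured above: it is commonly defined by interpolating linearly between the square roots of `beta_start` and `beta_end` and then squaring the result. The following is a plain-Python sketch of that definition, not the scheduler's actual code. The function name `scaled_linear_betas` and the 1000 training timesteps are assumptions made for this example.

```python
def scaled_linear_betas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    """Betas spaced linearly in sqrt-space, then squared (illustrative sketch)."""
    if num_train_timesteps < 2:
        raise ValueError("num_train_timesteps must be at least 2")
    start, end = beta_start ** 0.5, beta_end ** 0.5
    step = (end - start) / (num_train_timesteps - 1)
    # Interpolate between the square roots, then square each value.
    return [(start + i * step) ** 2 for i in range(num_train_timesteps)]


betas = scaled_linear_betas()
print(len(betas), betas[0], betas[-1])
```

The first and last values match `beta_start` and `beta_end` up to floating-point error, and the values in between grow more slowly at first than a plain linear ramp would.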

In [None]:
generator = torch.Generator(device=device).manual_seed(1024)
with autocast("cuda"):
    image = pipe(prompt=prompt, init_image=init_img, strength=0.75, guidance_scale=7.5, generator=generator).images[0]

In [None]:
image

## Interactive demo

In [None]:
!pip install -q gradio
import gradio as gr

In [None]:
def predict(image, prompt, strength, guidance_scale):
    # Convert the drawn or uploaded image to RGB and resize it to 512x512.
    init_img = image.convert("RGB").resize((512, 512))
    with autocast("cuda"):
        images = pipe(prompt=prompt, init_image=init_img, strength=strength, guidance_scale=guidance_scale).images
    return images[0]

In [None]:
gr.Interface(
    predict,
    title="Stable Diffusion Image-2-Image",
    inputs=[
        gr.Paint(type="pil", shape=(512, 512)),  # use this line to draw on an empty canvas
        # gr.ImagePaint(type="pil"),  # use this line to first upload an image and then (optionally) draw on it
        gr.Textbox(label="prompt", value=""),
        gr.Number(label="noise strength", value=0.75),
        gr.Number(label="guidance scale", value=7.5),
    ],
    outputs=[gr.Image()],
).launch(debug=True)
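
The numeric inputs in the demo accept arbitrary values, while `strength` is only meaningful between 0.0 and 1.0. One option is to sanitize the values before they reach the pipeline. The helper below is a hypothetical sketch: the name `sanitize_inputs` and the lower bound of 1.0 for the guidance scale are assumptions made for this example, not part of the notebook or of `diffusers`.

```python
def sanitize_inputs(strength, guidance_scale):
    """Clamp demo inputs to safe ranges (illustrative helper, not a library API)."""
    # Keep strength inside the range described earlier in this notebook.
    strength = min(max(float(strength), 0.0), 1.0)
    # Assumed lower bound: treat values below 1.0 as 1.0.
    guidance_scale = max(float(guidance_scale), 1.0)
    return strength, guidance_scale


print(sanitize_inputs(1.7, 0.2))
print(sanitize_inputs(0.75, 7.5))
```

It could be called at the top of `predict`, for example `strength, guidance_scale = sanitize_inputs(strength, guidance_scale)`.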