# Stable Diffusion v1.5 using OpenVINO `TorchDynamo` backend

Stable Diffusion is a text-to-image latent diffusion model created by researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It is trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. The model uses a frozen CLIP ViT-L/14 text encoder to condition generation on text prompts. With its 860M-parameter UNet and 123M-parameter text encoder, the model is relatively lightweight. See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information.

This notebook demonstrates how to run the Stable Diffusion model using the [Diffusers](https://huggingface.co/docs/diffusers/index) library and the [OpenVINO `TorchDynamo` backend](https://docs.openvino.ai/2023.1/pytorch_2_0_torch_compile.html) for Text-to-Image and Image-to-Image generation tasks.

# Prerequisites


In [None]:
%pip install openvino
%pip install torch==2.0.1
%pip install diffusers==0.17.1
%pip install transformers
%pip install gradio==3.36.1

You can set the `OPENVINO_TORCH_BACKEND_DEVICE` environment variable to choose the inference device: "CPU" or "GPU". On systems with more than one Intel GPU, "GPU.0" refers to the Intel integrated GPU and "GPU.1" to the Intel discrete GPU. If the system does not have an integrated GPU, use "GPU.0" for the Intel discrete GPU. Read more about the [Device Naming Convention](https://docs.openvino.ai/2023.0/openvino_docs_OV_UG_supported_plugins_GPU.html).

The `OPENVINO_TORCH_MODEL_CACHING` variable enables saving the optimized model files to disk after the first application run. This makes them available for subsequent runs, reducing the first-inference latency.

Read more about available [Environment Variables options](https://docs.openvino.ai/2023.1/pytorch_2_0_torch_compile.html#environment-variables).
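
Outside a notebook the `%env` magic is not available. The following is a minimal sketch of the same configuration in plain Python. It assumes the variables should be set before `openvino.torch` is imported and the model is compiled; the helper name `configure_openvino_backend` is hypothetical and not part of OpenVINO.

```python
import os


def configure_openvino_backend(device="CPU", model_caching=True):
    """Hypothetical helper: set the environment variables used in this notebook."""
    # Device string such as "CPU", "GPU", "GPU.0" or "GPU.1".
    os.environ["OPENVINO_TORCH_BACKEND_DEVICE"] = device
    if model_caching:
        os.environ["OPENVINO_TORCH_MODEL_CACHING"] = "1"
    else:
        # Assumption: leaving the variable unset keeps caching disabled.
        os.environ.pop("OPENVINO_TORCH_MODEL_CACHING", None)
    # Return what was set, which is convenient for logging.
    return {
        name: os.environ.get(name)
        for name in ("OPENVINO_TORCH_BACKEND_DEVICE", "OPENVINO_TORCH_MODEL_CACHING")
    }


settings = configure_openvino_backend("CPU", model_caching=True)
print(settings)
```

Call the helper at the top of a script, before the imports that load the backend.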

In [None]:
%env OPENVINO_TORCH_BACKEND_DEVICE=CPU
%env OPENVINO_TORCH_MODEL_CACHING=1

In [None]:
import gradio as gr
import random
import torch
import time
import openvino.torch # noqa: F401

from socket import gethostbyname, gethostname
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

# Stable Diffusion with Diffusers library

To work with Stable Diffusion v1, we will use the Hugging Face Diffusers library. To experiment with Stable Diffusion models, Diffusers exposes the [StableDiffusionPipeline](https://huggingface.co/docs/diffusers/using-diffusers/conditional_image_generation) and [StableDiffusionImg2ImgPipeline](https://huggingface.co/docs/diffusers/using-diffusers/img2img), similar to the other [Diffusers pipelines](https://huggingface.co/docs/diffusers/api/pipelines/overview). The code below demonstrates how to create the `StableDiffusionPipeline` and `StableDiffusionImg2ImgPipeline` using the `stable-diffusion-v1-5` model:

In [None]:
model_id = "runwayml/stable-diffusion-v1-5"

# Pipeline for text-to-image generation
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)

# Pipeline for text-guided image-to-image generation
pipe_i2i = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float32)

The [OpenVINO TorchDynamo backend](https://docs.openvino.ai/2023.1/pytorch_2_0_torch_compile.html) lets you enable [OpenVINO](https://docs.openvino.ai/2023.0/home.html) support for PyTorch models with minimal changes to the original PyTorch script.

Now we can enable the OpenVINO optimization with a single call to the [`torch.compile()` method](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) for each UNet:

In [None]:
pipe.unet = torch.compile(pipe.unet, backend="openvino")
pipe_i2i.unet = torch.compile(pipe_i2i.unet, backend="openvino")

Define the inference methods for text-to-image and image-to-image generation using the Diffusers pipelines:

In [None]:
time_stamps = []


def callback(step, t, latents):
    # Called by the pipeline after each reported denoising step; record when it finished.
    time_stamps.append(time.time())


def txt_to_img(prompt, neg_prompt, guidance, steps, width, height, generator):
    return pipe(
        prompt,
        negative_prompt=neg_prompt,
        num_inference_steps=int(steps),
        guidance_scale=guidance,
        width=width,
        height=height,
        generator=generator,
        callback=callback,
        callback_steps=1).images


def img_to_img(prompt, neg_prompt, img, strength, guidance, steps, generator):
    img = img['image']
    return pipe_i2i(
        prompt,
        negative_prompt=neg_prompt,
        image=img,
        num_inference_steps=int(steps),
        strength=strength,
        guidance_scale=guidance,
        generator=generator,
        callback=callback,
        callback_steps=1).images

# Run Text-to-Image or Image-to-Image generation
Now you can start the demo, choose the inference mode, define prompts (and an input image for Image-to-Image generation) and run the inference pipeline.
Optionally, you can also change some of the input parameters.

In [None]:
def error_str(error, title="Error"):
    return f"""#### {title}
            {error}""" if error else ""


def on_mode_change(mode):
    return gr.update(visible=mode == modes['img2img']), \
        gr.update(visible=mode == modes['txt2img'])


def inference(inf_mode, prompt, guidance=7.5, steps=25, width=768, height=768, seed=-1, img=None, strength=0.5, neg_prompt=""):
    if seed == -1:
        seed = random.randint(0, 10000000)
    generator = torch.Generator().manual_seed(seed)
    res = None

    global time_stamps
    time_stamps = []
    try:
        if inf_mode == modes['txt2img']:
            res = txt_to_img(prompt, neg_prompt, guidance, steps, width, height, generator)
        elif inf_mode == modes['img2img']:
            if img is None:
                return None, None, gr.update(visible=True, value=error_str("Image is required for Image to Image mode"))
            res = img_to_img(prompt, neg_prompt, img, strength, guidance, steps, generator)
    except Exception as e:
        return None, None, gr.update(visible=True, value=error_str(e))
    
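    # The callback records one timestamp per reported step; at least three are
    # needed to separate the warm-up interval from the steady-state intervals.
    if len(time_stamps) < 3 or time_stamps[-1] <= time_stamps[1]:
        return res, gr.update(visible=False, value=None), gr.update(visible=False, value=None)

    warmup_duration = time_stamps[1] - time_stamps[0]
    # Count the intervals actually measured: Image to Image may run fewer steps than requested.
    generation_rate = (len(time_stamps) - 2) / (time_stamps[-1] - time_stamps[1])
    res_info = "Warm up time: " + str(round(warmup_duration, 2)) + " secs"
    if generation_rate >= 1.0:
        res_info = res_info + ", Performance: " + str(round(generation_rate, 2)) + " it/s"
    else:
        res_info = res_info + ", Performance: " + str(round(1 / generation_rate, 2)) + " s/it"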

    return res, gr.update(visible=True, value=res_info), gr.update(visible=False, value=None)


modes = {
    'txt2img': 'Text to Image',
    'img2img': 'Image to Image',
}

with gr.Blocks(css="style.css") as demo:
    gr.HTML(
        f"""
            Model used: {model_id}         
        """
    )
    with gr.Row():

        with gr.Column(scale=60):
            with gr.Group():
                prompt = gr.Textbox("a photograph of an astronaut riding a horse", label="Prompt", max_lines=2)
                neg_prompt = gr.Textbox("frames, borderline, text, character, duplicate, error, out of frame, watermark, low quality, ugly, deformed, blur", label="Negative prompt")
                res_img = gr.Gallery(label="Generated images", show_label=False)
            error_output = gr.Markdown(visible=False)

        with gr.Column(scale=40):
            generate = gr.Button(value="Generate")

            with gr.Group():
                inf_mode = gr.Dropdown(list(modes.values()), label="Inference Mode", value=modes['txt2img'])
                
                with gr.Column(visible=False) as i2i:
                    image = gr.Image(label="Image", height=128, type="pil", tool='sketch')
                    strength = gr.Slider(label="Transformation strength", minimum=0, maximum=1, step=0.01, value=0.5)

            with gr.Group():
                with gr.Row() as txt2i:
                    width = gr.Slider(label="Width", value=512, minimum=64, maximum=1024, step=8)
                    height = gr.Slider(label="Height", value=512, minimum=64, maximum=1024, step=8)

            with gr.Group():
                with gr.Row():
                    steps = gr.Slider(label="Steps", value=20, minimum=1, maximum=50, step=1)
                    guidance = gr.Slider(label="Guidance scale", value=7.5, maximum=15)

                seed = gr.Slider(-1, 10000000, label='Seed (-1 = random)', value=-1, step=1)
            
            res_info = gr.Markdown(visible=False)

    inf_mode.change(on_mode_change, inputs=[inf_mode], outputs=[
                    i2i, txt2i], queue=False)

    inputs = [inf_mode, prompt, guidance, steps,
              width, height, seed, image, strength, neg_prompt]
    
    outputs = [res_img, res_info, error_output]
    prompt.submit(inference, inputs=inputs, outputs=outputs)
    generate.click(inference, inputs=inputs, outputs=outputs)

ipaddr = gethostbyname(gethostname())
demo.queue().launch(debug=True, server_name=ipaddr)
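
The performance line shown by the demo is derived from the timestamps collected in `callback`. The sketch below reproduces that arithmetic on synthetic timestamps so it can be checked without running a pipeline. The function name `summarize_timing` is hypothetical. The sketch also assumes the rate should be computed over the intervals actually measured, not over the requested step count.

```python
def summarize_timing(time_stamps):
    """Return (warm-up seconds, performance string) from per-step timestamps.

    Returns None when there are too few timestamps to compute a rate.
    """
    if len(time_stamps) < 3:
        return None
    elapsed = time_stamps[-1] - time_stamps[1]
    if elapsed <= 0:
        return None
    # The first recorded interval is treated as warm-up, as in the demo.
    warmup = time_stamps[1] - time_stamps[0]
    # Number of intervals between the second and the last timestamp.
    intervals = len(time_stamps) - 2
    rate = intervals / elapsed
    if rate >= 1.0:
        perf = f"{round(rate, 2)} it/s"
    else:
        perf = f"{round(1 / rate, 2)} s/it"
    return round(warmup, 2), perf


fast = summarize_timing([0.0, 3.0, 3.5, 4.0, 4.5])  # 3 intervals in 1.5 s
slow = summarize_timing([0.0, 5.0, 7.0, 9.0])       # 2 intervals in 4.0 s
print(fast, slow)
```

Reporting seconds per iteration when the rate drops below one keeps the number readable on slower devices.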
