# Image generation with Latent Consistency Model and OpenVINO

This module is based on the OpenVINO notebook [Image generation with Latent Consistency Model and OpenVINO](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/latent-consistency-models-image-generation).

If you are running this on your own rather than as part of a workshop, install the packages listed in `requirements.txt` and run the `setup.py` script before using this notebook.

This module lets you experiment with loading the Text Encoder, UNet, and VAE Decoder models on different devices to see how device placement affects the overall inference time of the pipeline. It loads the original LCM_Dreamshaper pipeline and keeps certain components from it (the tokenizer, scheduler, feature extractor, and safety checker), then builds an OpenVINO pipeline that combines those components with OpenVINO versions of the Text Encoder, UNet, and VAE Decoder. You can select the inference device for each of those models; if a selection changes, the affected model is recompiled and the pipeline is rebuilt.
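The "recompile only when the device selection changes" behaviour can be sketched in isolation. This is a minimal illustration, not the notebook's implementation: `DeviceCache` and `fake_compile` are hypothetical names, and `fake_compile` stands in for a real compile call such as `core.compile_model(path, device)` so the sketch runs without OpenVINO installed.

```python
class DeviceCache:
    """Recompile a model only when its requested device changes (hypothetical helper)."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn  # callable(name, device) -> compiled model
        self._device = {}              # model name -> device it was last compiled for
        self._compiled = {}            # model name -> compiled model object

    def get(self, name, device):
        """Return (compiled_model, was_recompiled) for the given model and device."""
        if self._device.get(name) != device:
            self._compiled[name] = self._compile_fn(name, device)
            self._device[name] = device
            return self._compiled[name], True
        return self._compiled[name], False


compile_calls = []


def fake_compile(name, device):
    # Stand-in for an expensive compilation step; records each call.
    compile_calls.append((name, device))
    return f"{name}@{device}"


cache = DeviceCache(fake_compile)
first = cache.get("unet", "CPU")    # first request: compiles
second = cache.get("unet", "CPU")   # same device: reuses the cached model
third = cache.get("unet", "GPU")    # device changed: compiles again
```

Whenever any model reports `was_recompiled`, the surrounding pipeline object has to be rebuilt, because it holds references to the compiled models.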

## Imports

In [None]:
import torch
import openvino as ov
import openvino.properties as properties
from diffusers import DiffusionPipeline
from pathlib import Path
import os, gc, time
import gradio as gr

import OVLatentConsistencyModelPipeline as ov_lcm
import LCM_utils as utils

core = ov.Core()

model_path = "models"
lcm_model_path = f"{model_path}/LCM_Dreamshaper_v7"
TEXT_ENCODER_OV_PATH = Path(f"{model_path}/text_encoder.xml")
UNET_OV_PATH = Path(f"{model_path}/unet.xml")
VAE_DECODER_OV_PATH = Path(f"{model_path}/vae_decoder.xml")

skip_safety_checker = False
prev_text_enc_device = None
prev_unet_device = None
prev_vae_device = None
ov_pipe = None

## Prepare inference pipeline

![lcm-pipeline](https://user-images.githubusercontent.com/29454499/277402235-079bacfb-3b6d-424b-8d47-5ddf601e1639.png)

The pipeline takes a latent image representation and a text prompt as input; the prompt is transformed into a text embedding by CLIP's text encoder. The initial latent image representation is generated with a random noise generator. Unlike the original Stable Diffusion pipeline, LCM also uses the guidance scale to compute a timestep conditional embedding that is passed to the diffusion process as an input, whereas in Stable Diffusion the guidance scale is applied afterwards to scale the U-Net output (classifier-free guidance).
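The two uses of the guidance scale can be contrasted in a small sketch. It assumes a sinusoidal embedding of the kind commonly used for timestep embeddings; the embedding dimension (256), the factor of 1000, and both function names are illustrative assumptions, and the pipeline class used in this notebook may differ in its details.

```python
import math


def guidance_scale_embedding(w, embedding_dim=256):
    """LCM style: turn the guidance scale w into a sinusoidal vector fed to the U-Net.

    Assumes an even embedding_dim; the dimension and scaling are assumptions.
    """
    half = embedding_dim // 2
    w = w * 1000.0
    step = math.log(10000.0) / (half - 1)
    freqs = [math.exp(-step * i) for i in range(half)]
    return [math.sin(w * f) for f in freqs] + [math.cos(w * f) for f in freqs]


def classifier_free_guidance(noise_uncond, noise_text, guidance_scale):
    """Stable Diffusion style: scale the gap between conditional and unconditional outputs."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


w_embedding = guidance_scale_embedding(8.0)                   # an input to the model
guided = classifier_free_guidance([0.0, 0.5], [1.0, 2.0], 1.0)  # a post-processing step
```

In the first case the guidance scale conditions a single U-Net call per step; in the second, each step needs both a conditional and an unconditional prediction before they are combined.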

Next, the U-Net iteratively *denoises* the random latent image representation while being conditioned on the text embeddings. The output of the U-Net, the noise residual, is used to compute a denoised latent image representation via a scheduler algorithm. LCM introduces its own scheduling algorithm, which extends the denoising procedure introduced in denoising diffusion probabilistic models (DDPMs) with non-Markovian guidance.
The *denoising* process is repeated a given number of times to retrieve progressively better latent image representations: 50 by default in the original SD pipeline, whereas LCM needs only a small number of steps (roughly 2-8).
When the process completes, the latent image representation is decoded by the decoder part of the variational autoencoder (VAE).
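To make the small step count concrete, here is a sketch of how a short inference schedule can be picked out of a longer one. It assumes 1000 training timesteps and a "skipping step" selection in the style of the LCM reference scheduler; the function name `lcm_timesteps` is hypothetical, and the scheduler actually loaded from the LCM_Dreamshaper pipeline may compute its timesteps differently.

```python
def lcm_timesteps(num_inference_steps, origin_steps=50, num_train_timesteps=1000):
    """Select num_inference_steps timesteps from an origin_steps-long schedule.

    Assumption: timesteps are taken at a fixed stride, starting from the noisiest one.
    """
    stride = num_train_timesteps // origin_steps
    # Original schedule: 19, 39, ..., 999 for the default arguments.
    origin = [i * stride - 1 for i in range(1, origin_steps + 1)]
    skip = len(origin) // num_inference_steps
    # Walk backwards from the last (noisiest) timestep, keeping every skip-th entry.
    return origin[::-skip][:num_inference_steps]


four_steps = lcm_timesteps(4)  # [999, 759, 519, 279]
two_steps = lcm_timesteps(2)   # [999, 499]
```

Under these assumptions, `origin_steps` plays the role of `lcm_origin_steps=50` and `num_inference_steps` is the step count selected in the demo, so the U-Net runs only a handful of times per image.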

In [None]:
def generate(
    text_enc_device,
    unet_device,
    vae_device,
    prompt: str,
    num_inference_steps: int = 4,
    progress=gr.Progress(track_tqdm=True),
):
    global ov_pipe
    global text_enc_ov, unet_ov, vae_decoder_ov, prev_text_enc_device, prev_unet_device, prev_vae_device
    
    build_pipe = False
    
    seed = utils.randomize_seed_fn(seed=0, randomize_seed=True)
    torch.manual_seed(seed)

    # Compile each model for the selected device. A model already compiled for that device is reused, not recompiled
    if text_enc_device != prev_text_enc_device:
        text_enc_ov = core.compile_model(TEXT_ENCODER_OV_PATH, text_enc_device)
        prev_text_enc_device = text_enc_device
        build_pipe = True

    if unet_device != prev_unet_device:
        unet_ov = core.compile_model(UNET_OV_PATH, unet_device)         
        prev_unet_device = unet_device
        build_pipe = True

    if vae_device != prev_vae_device:
        ov_config = {"INFERENCE_PRECISION_HINT": "f32"} if vae_device != "CPU" else {}
        vae_decoder_ov = core.compile_model(VAE_DECODER_OV_PATH, vae_device, ov_config)
        prev_vae_device = vae_device
        build_pipe = True

    # Configure the pipeline, enabling the optional safety checker, which detects NSFW content
    # This uses the above compiled models, and reuses the tokenizer, feature extractor, scheduler, 
    #   and safety checker from the original LCM pipeline
    if build_pipe:
        output_msg = "(Re)building pipeline.\n"
        pipe = DiffusionPipeline.from_pretrained(lcm_model_path)
        scheduler = pipe.scheduler
        tokenizer = pipe.tokenizer
        feature_extractor = pipe.feature_extractor if not skip_safety_checker else None
        safety_checker = pipe.safety_checker if not skip_safety_checker else None
        del pipe
        gc.collect()
        ov_pipe = ov_lcm.OVLatentConsistencyModelPipeline(
            tokenizer=tokenizer,
            text_encoder=text_enc_ov,
            unet=unet_ov,
            vae_decoder=vae_decoder_ov,
            scheduler=scheduler,
            feature_extractor=feature_extractor,
            safety_checker=safety_checker,
        )
    else:
        output_msg = "Running on existing pipeline.\n"
        
    output_msg = output_msg + f"  Text Encoder running on {text_enc_ov.get_property(properties.execution_devices)}\n"
    output_msg = output_msg + f"  UNet running on {unet_ov.get_property(properties.execution_devices)}\n"
    output_msg = output_msg + f"  VAE Decoder running on {vae_decoder_ov.get_property(properties.execution_devices)}\n"    
        
    start_time = time.time()
    
    result = ov_pipe(
        prompt=prompt,
        width=512,
        height=512,
        guidance_scale=8.0,
        num_inference_steps=num_inference_steps,
        num_images_per_prompt=1,
        lcm_origin_steps=50,
        output_type="pil",
    ).images
    output_msg = output_msg + f"\nRan inference in {time.time() - start_time:.2f} seconds"
    for img in result:
        time.sleep(1)
        yield img[0], output_msg
    return

## Interactive demo

In [None]:
demo = utils.build_gr_blocks(generate)
demo.queue().launch(share=False, inline=True, height=1000)