# GUI for Stable Diffusion

# Stable Diffusion
Written by Jasmine Sandhu (Acknowledgements: Jim Bednar, Maxime Liquet, Philipp Rudiger)<br>
Created: Jan, 2023<br>
Last updated: Jan, 2023

## Stable Diffusion, Diffusers library

[Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion#:~:text=Stable%20Diffusion%20is%20a%20deep,guided%20by%20a%20text%20prompt) is a deep learning, text-to-image model released in 2022. It is primarily used to generate detailed images conditioned on text descriptions. 

This example uses the [Diffusers library](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) with checkpoints from the runwayml and CompVis repositories. See also [Diffusers on GitHub](https://github.com/huggingface/diffusers#stable-diffusion-is-fully-compatible-with-diffusers) and the blog post on [Stable Diffusion with Diffusers](https://huggingface.co/blog/stable_diffusion).

### Performance: GPU

The example assumes it will run on a GPU. It can be modified to run on a CPU but image generation will take on the order of minutes as opposed to seconds.
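
As a rough sketch of the CPU modification mentioned above, the device string and dtype can be chosen together. The helper name `pick_device` is hypothetical, and using `float32` on CPU is an assumption (half precision tends to be poorly supported for CPU operations); in the notebook, `cuda_available` would come from `torch.cuda.is_available()`.

```python
def pick_device(cuda_available, gpu_id=1):
    """Return (device string, dtype name) for loading the pipeline.

    Hypothetical helper: float16 on GPU as in this notebook; float32 on CPU
    is an assumption, not something the notebook itself configures.
    """
    if cuda_available:
        return f"cuda:{gpu_id}", "float16"
    return "cpu", "float32"


# In the notebook this would be pick_device(torch.cuda.is_available()).
print(pick_device(True))
print(pick_device(False))
```

The dtype name can be turned into a torch dtype with `getattr(torch, name)` before passing it to `from_pretrained`.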


### Limitations

The models were trained on images with a resolution of 512x512. The diffusers pipeline, and consequently the UI, allows creating images at other resolutions; however, image quality degrades as you deviate from the resolution used to train the model.
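
The size choices offered by the UI can be sketched as below. The list mirrors the `size_range` expression used for the sliders later in this notebook; the divisibility check reflects the assumption that the Stable Diffusion pipeline expects dimensions divisible by 8.

```python
TRAINED_SIZE = 512  # resolution the models were trained on

# Same expression as the size sliders in this notebook: 448 to 1024 in steps of 64.
size_range = [448 + i * 2**6 for i in range(10)]

# Assumption: the pipeline expects dimensions divisible by 8.
assert all(s % 8 == 0 for s in size_range)

print(size_range)
print(TRAINED_SIZE in size_range)
```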


### Seed

The idea behind Stable Diffusion is to start with a noisy image and remove Gaussian noise at each inference step. The seed value determines the random noise and therefore the output generated. By default, this application randomizes the seed so that you can explore different images for the same prompt. Fixing the seed recreates the same image for a given resolution. As noted above, changing the resolution will also change the image output.
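
The effect of fixing the seed can be illustrated without a GPU. The sketch below uses Python's standard `random.Random` as a stand-in for the pipeline's noise generator (the notebook itself uses `torch.Generator`); the function `noise` is hypothetical.

```python
import random


def noise(seed, n=4):
    """Draw n Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# Same seed: identical noise, hence an identical starting point.
print(noise(1234) == noise(1234))
# A different seed gives different noise.
print(noise(1234) == noise(4321))
```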

In [None]:
import time
from collections import deque
from contextlib import contextmanager

import torch
import random
from diffusers import StableDiffusionPipeline

import panel as pn
from bokeh.models.formatters import PrintfTickFormatter

pn.extension()

In [None]:
# create a context manager to measure execution time and print it to the console
@contextmanager
def exec_time(description="Task"):
    st = time.perf_counter()
    yield 
    print(f"{description}: {time.perf_counter() - st:.2f} sec")
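
A minimal usage sketch of the context manager follows. It repeats the definition so the block runs on its own; the `time.sleep` call is only a stand-in for real work such as loading a model.

```python
import time
from contextlib import contextmanager


@contextmanager
def exec_time(description="Task"):
    st = time.perf_counter()
    yield
    print(f"{description}: {time.perf_counter() - st:.2f} sec")


# Everything inside the `with` block is timed.
with exec_time("Sleep"):
    time.sleep(0.05)
```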


The `init_model` function first looks in the default cache location used by Hugging Face for previously downloaded pretrained models. If the models haven't been downloaded yet, it downloads them first. On subsequent restarts of the app, it loads the models from the local cache. The models can also be downloaded separately as follows:
  
  ```
  pipe, cache = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", return_cached_folder=True, local_files_only=False)
  pipe, cache = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", return_cached_folder=True, local_files_only=False)
  print(cache) # to see the default cache location
  ```

In addition to caching the pretrained model, we also initialize and cache the diffusers pipeline inside `panel.state.cache`. This ensures that a new diffusers pipeline does not have to be created and destroyed for each new visitor to the page.
The initial page load takes an extra ~10 sec or so and allocates the GPU memory required to load the pipeline in memory, but subsequent visitors get this pipeline from Panel's cache. The memory overhead from here on is the amount needed to generate the image from the text prompt.
Below is an example of `nvidia-smi` output on a machine with 2 Quadro RTX 8000 GPUs, after both models have loaded.

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 8000     Off  | 00000000:15:00.0 Off |                  Off |
| 33%   33C    P8    24W / 260W |     48MiB / 49152MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 8000     Off  | 00000000:2D:00.0 Off |                  Off |
| 33%   40C    P8    29W / 260W |   5933MiB / 49152MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2024      G   /usr/lib/xorg/Xorg                 23MiB |
|    0   N/A  N/A      2545      G   /usr/bin/gnome-shell               20MiB |
|    1   N/A  N/A      2024      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A   2263594      C   .../diffusers/bin/python3.11     5925MiB |
+-----------------------------------------------------------------------------+
```
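
The caching pattern described above can be sketched with a plain dictionary standing in for `pn.state.cache`. The helper `get_or_create` and the `load_pipeline` factory are hypothetical; the notebook does the equivalent with an explicit `if 'pipelines' in pn.state.cache` check.

```python
def get_or_create(cache, key, factory):
    """Return cache[key], calling factory() only on the first request."""
    if key not in cache:
        cache[key] = factory()
    return cache[key]


calls = []


def load_pipeline():
    # Stand-in for the expensive StableDiffusionPipeline.from_pretrained call.
    calls.append("load")
    return object()


cache = {}  # stand-in for pn.state.cache, which is shared across sessions
first = get_or_create(cache, "pipelines", load_pipeline)
second = get_or_create(cache, "pipelines", load_pipeline)
print(first is second, len(calls))
```

The second request returns the same object without calling the factory again, which is why only the first page load pays the model loading cost.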

In [None]:
# initialize models and define function for image generation
# use only downloaded models
random_int_range = 1, int(1e6)
def init_model(model, gpu_id=1, torch_dtype=None, local_files_only=True):
    print(f"Init model: {model}")
    if torch_dtype:
        pipe = StableDiffusionPipeline.from_pretrained(model, torch_dtype=torch_dtype,
                                                       local_files_only=local_files_only)
    else:
        pipe = StableDiffusionPipeline.from_pretrained(model, local_files_only=local_files_only)
        

    # can use nvidia-smi to check and set this so you're not running on the same one as panel serve
    # it just makes it a little more responsive
    if torch.cuda.is_available():
        pipe.to(f"cuda:{gpu_id}")
    return pipe     


if 'pipelines' in pn.state.cache:
    print("Load from cache")
    pipelines = pn.state.cache['pipelines']
    pseudo_rand_gen = pn.state.cache['pseudo_rand_gen']
else:
    models = ['runwayml/stable-diffusion-v1-5', 
              'CompVis/stable-diffusion-v1-4'
             ]
    
    with exec_time("Load models"):
        pipelines = dict()
        for m in models:
            try: 
                pipelines[m] = init_model(m, torch_dtype=torch.float16)
            except OSError:
                pipelines[m] = init_model(m, torch_dtype=torch.float16, local_files_only=False)
            
        
    if torch.cuda.is_available():
        pseudo_rand_gen = torch.Generator(device='cuda')
    else:
        pseudo_rand_gen = torch.Generator()

    pn.state.cache['pipelines'] = pipelines
    pn.state.cache['pseudo_rand_gen'] = pseudo_rand_gen
    print("Save to cache")

default_model = next(iter(pipelines))
    
def generate_image(
    prompt,
    negative_prompt=None,
    model=default_model,
    height=512,
    width=512,
    guidance_scale=7.5,
    num_inference_steps=30,
    seed=None,
):
    pipe = pipelines[model]
    
    if not seed or seed < random_int_range[0]:
        seed = random.randint(*random_int_range)
    
    generator = pseudo_rand_gen.manual_seed(seed)
    res = pipe(prompt=prompt,
               negative_prompt=negative_prompt,
               guidance_scale=guidance_scale,
               height=height,
               width=width,
               num_inference_steps=num_inference_steps,
               generator=generator,
              )
    return res.images[0], seed
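
The seed fallback inside `generate_image` can be isolated as follows. The function name `resolve_seed` is hypothetical; the logic mirrors the check above, where a missing, zero, or too-small seed is replaced by a random one.

```python
import random

random_int_range = 1, int(1e6)


def resolve_seed(seed=None):
    """Return `seed` if it is usable, otherwise draw a random one."""
    if not seed or seed < random_int_range[0]:
        return random.randint(*random_int_range)
    return seed


print(resolve_seed(42))    # an explicit seed is kept
print(resolve_seed(None))  # a random seed within random_int_range
```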

The Panel widgets in this code block control the image generation. When rendered with a template, the sidebar should ideally start out collapsed, with only the `Prompt` text box visible. A user writes a prompt and hits enter, which triggers the callback that invokes the image generation function. Opening the sidebar provides more options. A user can set these options, then either click `Generate` or hit enter on the prompt to create an image with them. If the prompt does not change, hitting enter will not generate a new image; use the `Generate` button to create new images with the same prompt. Below is a description of each option.


__Prompt__: Enter the text you wish to use for image generation. Some examples:

  1. Wildflowers on a mountain side 
  1. A dream of a distant planet, with multiple moons
  1. valley of flowers in the Himalayas
  
__Negative Prompt__: What the model will try to remove from the image. For instance, in example (1) above, you can add `yellow` to the negative prompt to remove yellow flowers.

__Pretrained Model__: The models, downloaded from Hugging Face, that are available for inference.

__Height, Width__: Height and width in pixels of the images.

__Guidance Scale__: Also known as the CFG (classifier-free guidance) scale. Typical values are between 7 and 8.5. As you increase this value, the model tries harder to match the prompt, at the expense of image quality or diversity.

__# of steps__: The number of denoising steps taken by the model. As you increase the number of steps the image gets more refined; however, it takes longer to generate.

__Seed__: The random seed used when creating the noise for the image. By default, a new seed is randomly generated for each image. Select the `Fix seed` checkbox to set it manually. To reproduce an image, select this option, then copy/paste the URL. Be sure to uncheck the box to go back to randomly generated seeds.
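
To make the idea of a shareable URL concrete, the sketch below builds a query string from the same parameter names this notebook syncs to the URL (`prompt`, `neg_prompt`, `model`, `height`, `width`, `cfg`, `steps`, `seed`, `set_seed`). It uses only the standard library; the base address is hypothetical and the exact encoding Panel produces may differ.

```python
from urllib.parse import urlencode, parse_qs

params = {
    "prompt": "Wildflowers on a mountain side",
    "neg_prompt": "yellow",
    "model": "runwayml/stable-diffusion-v1-5",
    "height": 512,
    "width": 512,
    "cfg": 7.5,
    "steps": 30,
    "seed": 123456,
    "set_seed": True,
}

# Hypothetical address of a locally served app.
base_url = "http://localhost:5006/stable_diffusion"
query = urlencode(params)
print(f"{base_url}?{query}")

# Parsing the query string recovers the settings (as strings).
restored = {k: v[0] for k, v in parse_qs(query).items()}
print(restored["seed"], restored["prompt"])
```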

In [None]:
#
# Create display widgets and bind them to generate_image function
#
# save the history in a rolling queue
history = deque([], 5)

# TODO: decorate function with @pn.cache https://panel.holoviz.org/user_guide/Performance_and_Debugging.html#caching-in-memory
def generate_image_with_options(event, prompt):
    if not prompt:
        return pn.Spacer(height=height.value)
    
    # also sync the URL bar with options so we can copy/paste to get the exact same image
    #
    # TODO: 
    #    Needed these in the callback to have them included in the URL (reproduce in simple example and log an issue)
    #    can we improve API so sync can take a list?
    #
    # unsync if user un-checks the seed button - use this as a proxy to make URL available to share
    if pn.state.location:
        for w in widgets_for_url_sync:
            pn.state.location.sync(*w)

    if not set_seed.value:
        seed.value = random.randint(*random_int_range)
    
    print("----------------------------")
    print(f"Seed provided: {seed.value}")
    with exec_time("Generate image"):
        image, image_seed = generate_image(prompt=prompt, negative_prompt=neg_prompt.value,
                                           model=model.value, height=height.value, width=width.value,
                                           guidance_scale=guidance_scale.value, num_inference_steps=num_inference_steps.value,
                                           seed=seed.value)

    print(f"Seed used: {image_seed}")
    print("------------------------")

    # resize the image and add it to the history
    # TODO: also need to add parameters used to create image. It should be a clickable so image can be reconstructed
    history.append(image.resize((100, 100)))
    if len(prompt_history) == len(history):
        prompt_history[0:] = [*history]
    else:
        # keep appending history
        prompt_history.append(history[-1])
        
    return image


## Widgets on main page
prompt = pn.widgets.TextInput(name='Prompt', value=None)

## Widgets in the sidebar
neg_prompt = pn.widgets.TextInput(name='Negative Prompt', value=None)

model = pn.widgets.Select(name='Pretrained Model', options=list(pipelines), value=default_model)

size_range = [448 + i*2**6 for i in range(10)]
height = pn.widgets.DiscreteSlider(name='Height', options=size_range, value=size_range[1])
width = pn.widgets.DiscreteSlider(name='Width', options=size_range, value=size_range[1])

# The CFG scale adjusts how much the image looks closer to the prompt and/ or input image. 
# If CFG Scale is greater, the output will be more in line with the input prompt and/or input image, but it will be distorted. 
# The lower the CFG Scale value, the more likely it is to drift away from the prompt or the input image, but the better quality.
#
guidance_scale = pn.widgets.FloatSlider(start=5, end=10, step=0.1, value=7.5, 
                                        format=PrintfTickFormatter(format='%.1f'),
                                        name='Guidance scale')
num_inference_steps = pn.widgets.IntSlider(name='# of steps', start=10, end=75, value=30)


# add this to URL when it is copied
set_seed = pn.widgets.Checkbox(name='Fix seed', width=140, height=50, visible=True, value=False)

# Don't even display it to the user - just use it to figure out whether to use the seed from URL or generate a new one
seed = pn.widgets.IntInput(name='', value=random.randint(*random_int_range), 
                           start=random_int_range[0], end=random_int_range[1], step=10, visible=False, width=140)

#updating_share_url = False
def make_seed_visible(enable):
    # make the seed value visible
    seed.visible = enable

pn.bind(make_seed_visible, set_seed, watch=True)

gen_button = pn.widgets.Button(name='Generate', button_type='primary')

model_output = pn.param.ParamFunction(pn.bind(generate_image_with_options, gen_button, prompt))

# TODO: Need something clickable here so we can generate image from the history / pick it up from cache
prompt_history = pn.FlexBox()

sidebar_widgets = [
    neg_prompt,
    model,
    height,
    width,
    guidance_scale,
    num_inference_steps,
    pn.Row(set_seed, seed),
    gen_button,
]
if pn.state.location:
    pn.state.location.sync(prompt, {'value': 'prompt'})
    pn.state.location.sync(set_seed, {'value': 'set_seed'})

# widgets for URL sync
# TODO: see if we can use this list to perhaps cache images
widgets_for_url_sync =[
    #(prompt, {'value': 'prompt'}),
    (neg_prompt, {'value': 'neg_prompt'}),
    (model, {'value': 'model'}),
    (height, {'value': 'height'}),
    (width, {'value': 'width'}),
    (guidance_scale, {'value': 'cfg'}),
    (num_inference_steps, {'value': 'steps'}),
    (seed, {'value': 'seed'}),
    #(set_seed, {'value': 'set_seed'})
]

## logo / headers / 
logo  = """<a href="http://panel.pyviz.org">
           <img src="https://panel.pyviz.org/_static/logo_stacked.png" 
            width=150 height=127 align="left" margin=20px></a>"""

title = 'Stable Diffusion with Panel UI'

desc = pn.pane.HTML("""
    The <a href="http://panel.pyviz.org">Panel</a> library from <a href="https://holoviz.org/">HoloViz</a> 
    lets you make widget-controlled apps. Here you can use the
    <a href="https://huggingface.co/docs/diffusers/index">diffusers</a> library to
    generate images from pretrained diffusion models. Panel is used to create the UI for the pipeline.""", width=250)

## Customize image generation layout
tweak_image_gen = pn.Column(*sidebar_widgets)

## Final display
output = pn.Column(model_output, prompt_history, sizing_mode='stretch_width')
pn.Column(logo, title, desc,
    prompt, 
    pn.Row(tweak_image_gen, output),
    sizing_mode='stretch_width')
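
The rolling history in the cell above relies on a `deque` with a maximum length: once five thumbnails are stored, appending another drops the oldest. A small sketch, with strings standing in for the images:

```python
from collections import deque

history = deque([], 5)  # keep at most the 5 most recent items

for i in range(7):
    history.append(f"image-{i}")

# Only the last five entries remain.
print(list(history))
```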

### Use a template

Use a template to get a clean look and feel.

TODO: Start out with the sidebar collapsed.

In [None]:
template = pn.template.MaterialTemplate(
    title=title,
)

template.sidebar.append(logo)
template.sidebar.append(desc.clone(width=300, margin=(20, 5)))
template.sidebar.append(tweak_image_gen)

template.main.append(pn.Column(prompt, output, sizing_mode='stretch_width'))

template.servable();