# Stable Diffusion Inference using HuggingFace

This notebook aims to show you how to run a Stable Diffusion model using the `diffusers` library from HuggingFace.

More information: https://huggingface.co/docs/diffusers/en/quicktour

### Check that the GPU is working

In [None]:
! nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader

## Install Diffusers library

Install (or upgrade) the `diffusers`, `transformers`, `torch` and `accelerate` libraries:

In [None]:
! pip install --upgrade diffusers transformers torch accelerate

## Import libraries

In [None]:
import os
import torch
from PIL import Image

import warnings
warnings.filterwarnings('ignore')

## Create an output folder

Create the output directory if it doesn't exist yet:

In [None]:
OUTPUT_DIR = 'lab_1_generated_outputs'

# exist_ok=True: no error if the folder is already there
os.makedirs(OUTPUT_DIR, exist_ok=True)

## Select model and parameters

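Stable Diffusion models on HuggingFace:
- https://huggingface.co/CompVis/stable-diffusion-v1-4
- https://huggingface.co/runwayml/stable-diffusion-v1-5 (this `runwayml` repository is no longer available on the Hub; the same weights are mirrored at https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5)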

In [None]:
# Model parameters ----------------------------------------
MODEL_ID = "stable-diffusion-v1-5/stable-diffusion-v1-5"  # mirror of the former "runwayml/stable-diffusion-v1-5"
# MODEL_ID = "CompVis/stable-diffusion-v1-4"

# GPU parameters  ----------------------------------------
DEVICE = "cuda"  # Use cuda to run on GPU

# Scheduler parameters ----------------------------------------
SCHEDULER = "EULER_ANCESTRAL"  # Choose from ["EULER_ANCESTRAL", "EULER", "DDIMS", "K-LMS", "PNDM"]
BETA_END = 0.012
BETA_SCHEDULE = "scaled_linear"
BETA_START = 0.00085
NUM_TRAIN_STEPS = 1000
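
The `"scaled_linear"` beta schedule interpolates linearly between the square roots of `BETA_START` and `BETA_END` over `NUM_TRAIN_STEPS` steps and then squares the result, which is how diffusers builds it (`torch.linspace(beta_start**0.5, beta_end**0.5, num_train_timesteps) ** 2`). The pure-Python sketch below, assuming the parameter values above, also computes the cumulative product of `1 - beta`, which tells how much of the original signal survives at each training step. The function names are illustrative, not part of diffusers.

```python
def scaled_linear_betas(beta_start, beta_end, num_steps):
    """Betas spaced linearly in sqrt-space, then squared (the "scaled_linear" schedule)."""
    lo, hi = beta_start ** 0.5, beta_end ** 0.5
    step = (hi - lo) / (num_steps - 1)
    return [(lo + i * step) ** 2 for i in range(num_steps)]


def alphas_cumprod(betas):
    """Cumulative product of (1 - beta_t): fraction of signal variance kept at step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


betas = scaled_linear_betas(0.00085, 0.012, 1000)
abar = alphas_cumprod(betas)
print(f"beta_0={betas[0]:.5f}, beta_T={betas[-1]:.5f}")
print(f"signal kept at t=0: {abar[0]:.4f}, at t=999: {abar[-1]:.4f}")
```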

## Pipeline creation with a specific scheduler

If you have a small GPU (less than 10GB of memory), you should load the model in `float16` precision instead of `float32`. The `create_pipeline` function below always loads the `float16` weights.

More info about schedulers: https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers
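
Why `float16` helps is simple arithmetic: the memory needed for the weights is roughly the number of parameters times the bytes per parameter (4 for `float32`, 2 for `float16`). A rough sketch, assuming the approximate figures quoted for Stable Diffusion v1 (an 860M-parameter UNet and a 123M-parameter text encoder); the VAE and the activations used during sampling come on top of this, so treat the result as a lower bound.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

# Approximate parameter counts quoted for Stable Diffusion v1 (VAE not included)
PARAMS = {"unet": 860_000_000, "text_encoder": 123_000_000}


def weights_gib(num_params, dtype):
    """Memory needed just to hold the weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


total = sum(PARAMS.values())
for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weights_gib(total, dtype):.2f} GiB for UNet + text encoder weights")
```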


In [None]:
from diffusers import (
    StableDiffusionPipeline,
    DDIMScheduler,
    LMSDiscreteScheduler,
    PNDMScheduler,
    EulerDiscreteScheduler,
    EulerAncestralDiscreteScheduler,
)

def create_pipeline(model_path):

    SCHEDULER_MAP = {
        "DDIMS": DDIMScheduler,
        "EULER_ANCESTRAL": EulerAncestralDiscreteScheduler,
        "EULER": EulerDiscreteScheduler,
        "K-LMS": LMSDiscreteScheduler,
        "PNDM": PNDMScheduler,
    }

    scheduler = SCHEDULER_MAP[SCHEDULER](
        beta_start=BETA_START,
        beta_end=BETA_END,
        beta_schedule=BETA_SCHEDULE,
        # num_train_timesteps=NUM_TRAIN_STEPS,
    )

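    pipe = StableDiffusionPipeline.from_pretrained(
        model_path,
        scheduler=scheduler,
        torch_dtype=torch.float16,  # half precision: roughly halves GPU memory for the weights
        variant="fp16",  # download the fp16 weight files (replaces the deprecated revision="fp16")
        # safety_checker=None,
    ).to(DEVICE)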

    return pipe

The `safety_checker` component filters potentially unsafe content from generated images. When enabled, it compares the CLIP embedding of each generated image with a set of "NSFW" (not safe for work) concept embeddings and returns a completely black image if the generated image matches one of them. Passing `safety_checker=None` (commented out above) disables it.
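
A simplified sketch of that idea, not the diffusers implementation: each image embedding is compared with a list of concept embeddings using cosine similarity, and any image whose similarity to a concept exceeds that concept's threshold is replaced with a black image. The `filter_images` helper, the toy embeddings and the thresholds are hypothetical; the real checker uses CLIP embeddings and thresholds shipped with the model.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filter_images(images, image_embeds, concept_embeds, thresholds):
    """Hypothetical checker: black out images too close to any flagged concept.

    `images` are nested lists of RGB tuples; embeddings are plain lists of floats
    (toy stand-ins for CLIP embeddings). Returns (filtered_images, nsfw_flags).
    """
    filtered, flags = [], []
    for img, emb in zip(images, image_embeds):
        flagged = any(
            cosine_similarity(emb, concept) > thr
            for concept, thr in zip(concept_embeds, thresholds)
        )
        if flagged:
            # Keep the same height/width, but set every pixel to black
            img = [[(0, 0, 0) for _ in row] for row in img]
        filtered.append(img)
        flags.append(flagged)
    return filtered, flags
```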

### Create a new pipeline. This can take a few minutes... be patient :)

In [None]:
pipe = create_pipeline(model_path=MODEL_ID)

## Let's play!! Generate one image

In [None]:
# Write the prompt or instructions for generating the image
prompt = "a photo of an astronaut riding a horse on mars"

# Generate the image
image = pipe(prompt, guidance_scale=7.5, num_inference_steps=50, height=512, width=512).images[0]

# Save the image
image.save(f"{OUTPUT_DIR}/astronaut_rides_horse.png")

# Show the image
image

## Using SEED
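You may have noticed that every time you run the previous cell, it generates a completely new image and overwrites the one you had already saved! :(
To avoid this, set a seed: a `torch.Generator` initialized with a fixed seed makes the initial random noise, and therefore the generated image, reproducible.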

In [None]:
INFERENCE_SEED = 1122334455

# Set seed
custom_generator = torch.Generator(device='cuda').manual_seed(INFERENCE_SEED)

# And then add "generator=custom_generator" as pipe() inference parameter
image = pipe(prompt, generator=custom_generator, guidance_scale=7.5, num_inference_steps=20, height=512, width=512).images[0]
image

Now the cell generates the same image every time you re-run it, because the generator is re-created with the same seed on each run.
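
Why re-create the generator on every run? A seeded generator is stateful: each call consumes random numbers from it, so passing the same generator object to two `pipe()` calls gives two different images. The sketch below illustrates this with Python's `random.Random` as a stand-in for `torch.Generator` (an analogy, not the torch API); `sample_noise` is a hypothetical helper playing the role of the pipeline's initial noise draw.

```python
import random


def sample_noise(generator, n=4):
    """Hypothetical stand-in for the initial latent noise the pipeline draws."""
    return [generator.gauss(0.0, 1.0) for _ in range(n)]


SEED = 1122334455

# Re-seeding before each call reproduces the same "noise"...
first = sample_noise(random.Random(SEED))
second = sample_noise(random.Random(SEED))
print(first == second)  # True

# ...but reusing one generator continues its sequence, so the second draw differs
shared = random.Random(SEED)
a = sample_noise(shared)
b = sample_noise(shared)
print(a == b)  # False
```

The same applies to a function whose default argument is a generator object: the default is created once, when the function is defined, and then shared by every call.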

## Generator method

In [None]:
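from IPython.display import display

def generate_image(pipe, prompt, steps=80, h=512, w=512, guidance_scale=7, strength=0.75, seed=INFERENCE_SEED, save_image=True):
    # `strength` only affects image-to-image pipelines; it is kept in the signature but not passed to this text-to-image pipe
    print(prompt)
    # Create a fresh generator on every call so the same seed always reproduces the same image
    generator = torch.Generator(device=DEVICE).manual_seed(seed)
    image = pipe(prompt, num_inference_steps=steps, height=h, width=w, guidance_scale=guidance_scale, generator=generator).images[0]
    display(image)
    if save_image:
        outfilename = os.path.join(OUTPUT_DIR, f'{seed}_1_' + prompt.replace(' ', '_') + '.png')
        image.save(outfilename)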
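
Building filenames directly from prompts can fail for very long prompts (most filesystems limit a filename to 255 bytes) and produces names full of commas. A possible helper, hypothetical and not part of diffusers, that keeps only letters, digits, `-` and `_` and truncates the prompt part:

```python
import re


def prompt_to_filename(prompt, seed, index=0, max_len=100, ext=".png"):
    """Hypothetical helper: turn a free-text prompt into a short, filesystem-safe filename."""
    # Replace every run of characters other than letters, digits, '_' and '-' with '_'
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", prompt.strip().lower())
    slug = slug.strip("_")[:max_len].rstrip("_")
    return f"{seed}_{index}_{slug}{ext}"


print(prompt_to_filename("a photo of an astronaut riding a horse on mars", 1122334455))
# 1122334455_0_a_photo_of_an_astronaut_riding_a_horse_on_mars.png
```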


## Generate image

In [None]:
prompt = "a photo of an astronaut riding a horse on mars"
generate_image(pipe, prompt)

### Image Size

On my machine there is a big GPU that can generate images larger than 2048x2048, but smaller GPUs run out of memory at much lower resolutions.

`Experiment`: try to find the limit of your GPU. But, as you will see, the model does not generate images properly at sizes far from the resolution it was trained on (512x512 for Stable Diffusion v1.x), whether much smaller, as in the example below, or much larger.
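
Two constraints are worth knowing when choosing `h` and `w`. The Stable Diffusion v1.x VAE downsamples images by a factor of 8 into 4-channel latents, so the pipeline requires `height` and `width` to be divisible by 8 (it raises a `ValueError` otherwise), and the UNet actually denoises a latent of shape `(4, h/8, w/8)`. A small sketch with hypothetical helpers that snap a requested size to a valid one and report the resulting latent shape:

```python
LATENT_CHANNELS = 4   # latent channels of the Stable Diffusion v1.x VAE
VAE_SCALE_FACTOR = 8  # spatial downsampling factor of that VAE


def snap_to_multiple(value, multiple=VAE_SCALE_FACTOR):
    """Round down to the nearest multiple (never below one multiple)."""
    return max(multiple, (value // multiple) * multiple)


def latent_shape(h, w):
    """Shape of the latent tensor the UNet denoises for an h x w image (batch of 1)."""
    h, w = snap_to_multiple(h), snap_to_multiple(w)
    return (1, LATENT_CHANNELS, h // VAE_SCALE_FACTOR, w // VAE_SCALE_FACTOR)


for size in [(512, 512), (128, 128), (500, 770)]:
    print(size, "->", latent_shape(*size))
```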

In [None]:
prompt = "a photo of an astronaut riding a horse on mars"
generate_image(pipe, prompt, h=128, w=128)

### Prompt Engineering

In [None]:
prompt = "a female warrior"
generate_image(pipe, prompt)

In [None]:
prompt = "a portrait of a female warrior, by greg rutkowski, highly detailed, HQ, symmetrical, trending on artstation, digital painting, artstation, concept art, smooth, sharp focus, illustration, cinematic lighting"
generate_image(pipe, prompt)

### Negative Prompt

`Experiment (optional)`: Search for information on `negative prompts` and how to use them with the HuggingFace diffusers library.

### Guidance Scale

In [None]:
prompt = "overgrown foliage taking over an abandoned robot body, close - up, biopunk, bokeh, beautiful, lens flare, emotional, sweet, flowers, detailed, picture, trending on artstation, award - winning, shiny, golden"
generate_image(pipe, prompt, steps=200, guidance_scale=1, strength=0.9, h=512, w=768)

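FYI: `guidance_scale` controls how closely the generated image follows the prompt. Higher values follow the prompt more strictly but can produce oversaturated or distorted images, and the useful range depends on the model. With `guidance_scale=1`, as in the previous cell, guidance is effectively disabled.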
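
Under the hood this is classifier-free guidance: at every step the UNet predicts the noise twice, once conditioned on the prompt and once on an empty (or negative) prompt, and the two predictions are combined as `uncond + guidance_scale * (cond - uncond)`. With `guidance_scale=1` this reduces to the conditional prediction alone, which is why diffusers only runs the unconditional pass when `guidance_scale > 1`. A pure-Python sketch of the combination step on toy vectors (real predictions are latent-shaped tensors):

```python
def classifier_free_guidance(noise_uncond, noise_cond, guidance_scale):
    """Combine unconditional and prompt-conditioned noise predictions element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


uncond = [0.0, 1.0, -1.0]
cond = [1.0, 1.0, 0.0]
for s in (1.0, 7.5, 20.0):
    print(s, classifier_free_guidance(uncond, cond, s))
```

Larger scales push the prediction further past the conditional one along the `cond - uncond` direction, which helps explain why very high values tend to look overcooked.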

In [None]:
prompt = "overgrown foliage taking over an abandoned robot body, close - up, biopunk, bokeh, beautiful, lens flare, emotional, sweet, flowers, detailed, picture, trending on artstation, award - winning, shiny, golden"
generate_image(pipe, prompt, steps=200, guidance_scale=20, strength=0.9, h=512, w=768)

In [None]:
prompt = "a portrait of Elon Musk as superman, realistic portrait, symmetrical, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, cinematic lighting, art by artgerm and greg rutkowski"
generate_image(pipe, prompt, steps=100, guidance_scale=13, h=512, w=768)

---

## Generate more than one image

In [None]:
def generate_images(pipe, prompt, negative_prompt="", num_images=5, steps=50, h=512, w=512, guidance_scale=7.5, strength=0.75, seed=INFERENCE_SEED, save_image=True):
    # `strength` only affects image-to-image pipelines, so it is not passed to this text-to-image pipe
    print(prompt)

    # Fresh generator per call: the same seed always gives the same batch of images
    generator = torch.Generator(device=DEVICE).manual_seed(seed)

    images = pipe(
        prompt,
        height=h,
        width=w,
        negative_prompt=negative_prompt,
        num_images_per_prompt=num_images,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images

    for i, image in enumerate(images):
        display(image)

        if save_image:
            # Save into the output folder created at the beginning of the notebook
            outfilename = os.path.join(OUTPUT_DIR, f'{seed}_{i}_' + prompt.replace(' ', '_') + '.png')
            image.save(outfilename)

In [None]:
prompt = "overgrown foliage taking over an abandoned robot body, close - up, biopunk, bokeh, beautiful, lens flare, emotional, sweet, flowers, detailed, picture, trending on artstation, award - winning, shiny, golden"
generate_images(pipe, prompt)

---

### Generate and save all images together as a grid

In [None]:
def image_grid(imgs, rows, cols):
    # Paste rows * cols equally sized images into one larger image, filling row by row
    assert len(imgs) == rows * cols
    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols * w, rows * h))
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

def generate_images_grid(pipe, prompt, negative_prompt="", num_images=5, steps=50, h=512, w=512, guidance_scale=7.5, strength=0.75, seed=INFERENCE_SEED, save_image=True):
    # `strength` only affects image-to-image pipelines, so it is not passed to this text-to-image pipe
    print(prompt)

    # Fresh generator per call: the same seed always gives the same batch of images
    generator = torch.Generator(device=DEVICE).manual_seed(seed)

    images = pipe(
        prompt,
        height=h,
        width=w,
        negative_prompt=negative_prompt,
        num_images_per_prompt=num_images,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images

    # One image per row (a single column)
    grid = image_grid(images, rows=len(images), cols=1)

    if save_image:
        outfilename = os.path.join(OUTPUT_DIR, f'{seed}_grid_' + prompt.replace(' ', '_') + '.png')
        grid.save(outfilename)

    display(grid)


In [None]:
prompt = "overgrown foliage taking over an abandoned robot body, close - up, biopunk, bokeh, beautiful, lens flare, emotional, sweet, flowers, detailed, picture, trending on artstation, award - winning, shiny, golden"
generate_images_grid(pipe, prompt, steps=20)
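
`generate_images_grid` stacks the images in a single column. For a more compact layout, a hypothetical helper could pick the most square `rows x cols` factorization of the image count, which still satisfies the `len(imgs) == rows*cols` assertion in `image_grid` (a prime count such as 5 falls back to a single row):

```python
import math


def grid_shape(n):
    """Hypothetical helper: (rows, cols) with rows * cols == n, rows <= cols, as square as possible."""
    rows = max(r for r in range(1, math.isqrt(n) + 1) if n % r == 0)
    return rows, n // rows


for n in (4, 5, 6, 9, 12):
    print(n, grid_shape(n))
```

Inside `generate_images_grid`, it would be used as `image_grid(images, *grid_shape(len(images)))`.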

---

# Image-to-Image text-guided generation
https://huggingface.co/docs/diffusers/en/using-diffusers/img2img
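
In image-to-image mode, `strength` (between 0 and 1) controls how much noise is added to the initial image, and therefore how many of the `num_inference_steps` are actually run: the pipeline skips the beginning of the schedule and runs about `int(num_inference_steps * strength)` denoising steps. A sketch of that arithmetic, modeled on the img2img pipeline's timestep selection; `img2img_steps` is a hypothetical name.

```python
def img2img_steps(num_inference_steps, strength):
    """Return (first_step_index, steps_actually_run) for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return t_start, num_inference_steps - t_start


print(img2img_steps(100, 0.75))  # (25, 75)
print(img2img_steps(50, 1.0))    # (0, 50)
```

With `strength=1.0` sampling starts from (almost) pure noise, so the initial image has little influence; low values keep the result close to the input.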

In [None]:
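import requests
from io import BytesIO
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Reuse the already-loaded model components (no second download, no extra GPU memory).
# The text-to-image `pipe` does not accept an initial image, so a dedicated img2img pipeline is needed.
img2img_pipe = StableDiffusionImg2ImgPipeline(**pipe.components)

# let's download an initial image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
response.raise_for_status()
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))  # (width, height), both divisible by 8

prompt = "A fantasy landscape"

# The output size follows the input image, so height/width are not passed here
image = img2img_pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, num_inference_steps=100).images[0]

# Print and show results:
print("")
print("This is the original image:")
display(init_image)
print("")
print(f'The textual prompt is: "{prompt}"')
print("")
print("And this is the generated image:")
display(image)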

---

## Text-to-Video Generation with AnimateDiff

Did you know that you can also generate video or GIFs with Stable Diffusion?

`Experiment (optional)`: Find out how to generate your own GIFs with the diffusers library and AnimateDiff
https://huggingface.co/docs/diffusers/en/api/pipelines/animatediff#text-to-video-generation-with-animatediff

---