# Easel AI: Founding AI Engineer | Thumbs Up Challenge

Author: Sukruth Gowdru Lingaraju.  
Email: [sg2257@cornell.edu](mailto:sg2257@cornell.edu).  
September 5th, 2023

This notebook fine-tunes the baseline [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) Stable Diffusion model on a custom dataset, [glsukki/thumbs_up_v2](https://huggingface.co/datasets/glsukki/thumbs_up_v2), so that it generates images for prompts of the form `<person><trigger_word>`.

NOTE: To train the model on the custom dataset successfully, connect to a GPU runtime with `>= 30 GB` of memory.

## Install dependencies and library requirements

In [None]:
# `!cd` runs in a subshell and does not persist; `%cd` changes the notebook's working directory
%cd /content/
!git clone https://github.com/huggingface/diffusers.git
!pip install ./diffusers
!pip install -U -r /content/diffusers/examples/text_to_image/requirements.txt
!pip install tensorboard==2.12.3
!pip install wandb
!pip install -U torchmetrics

import os
import torch
import random
import numpy as np
import matplotlib.pyplot as plt
from functools import partial
from diffusers import StableDiffusionPipeline
from torchmetrics.functional.multimodal import clip_score

seed = random.randint(1, 999999)
generator = torch.manual_seed(seed)

In [None]:
# GPU Compute check
!nvidia-smi

## Set environment variables

MODEL_NAME: refers to the baseline model being imported for fine-tuning on the custom dataset.  
DATASET_NAME: refers to the custom dataset used for fine-tuning.  
OUTPUT_DIR: refers to the name of the fine-tuned model pushed to HuggingFace.
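
The training command later in this notebook reads these values through shell expansion (`$MODEL_NAME` and so on), so an unset variable would most likely reach the script as an empty argument. The following is a minimal sketch of a sanity check that could be run once the variables have been set; the helper name `missing_env_vars` and the example dictionary are hypothetical and not part of the original notebook.

```python
import os

# The three variables this notebook expects to be set.
REQUIRED_VARS = ("MODEL_NAME", "DATASET_NAME", "OUTPUT_DIR")

def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Hypothetical example: one variable missing, one set to an empty string.
example_env = {"MODEL_NAME": "runwayml/stable-diffusion-v1-5", "OUTPUT_DIR": ""}
print(missing_env_vars(example_env))  # ['DATASET_NAME', 'OUTPUT_DIR']
```

Calling `missing_env_vars()` with no arguments checks the real process environment; an empty list means all three variables are available to the training command.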

In [None]:
os.environ['MODEL_NAME'] = 'runwayml/stable-diffusion-v1-5'
os.environ['DATASET_NAME'] = 'glsukki/thumbs_up_test_2'
os.environ['OUTPUT_DIR'] = 'stable-diffusion-thumbs-up-8000_1e-05'

In [None]:
from huggingface_hub import notebook_login
notebook_login()

### Thumbs_Up Stable Diffusion Model

#### Dataset

The custom dataset, hosted on HuggingFace as ```glsukki/thumbs_up```, consists of:
*   ```thumbs up``` images from [LINK](https://www.dropbox.com/s/xtspymftfmjofo6/thumbs-up.zip?dl=0)
  *   ```# of images = 121```
  *   ```type: JPEG```
  *   ```shape: 512x512```
*   ```person: sukruth``` custom images
  *   ```# of images = 336```
  *   ```type: JPEG```
  *   ```shape: 512x512```

We fine-tune the ```runwayml/stable-diffusion-v1-5``` base model on the ```glsukki/thumbs_up_v2``` dataset.
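
The dataset sizes listed above and the hyperparameters passed to the training command in this section (`--train_batch_size=1`, `--gradient_accumulation_steps=4`, `--max_train_steps=8000`) give a rough estimate of the training budget. The sketch below makes three assumptions: the dataset holds exactly 121 + 336 = 457 images, training runs in a single process, and `max_train_steps` counts optimizer updates rather than individual batches. The function name `training_budget` is hypothetical.

```python
import math

def training_budget(num_images, train_batch_size, grad_accum_steps, max_train_steps):
    """Estimate effective batch size, epochs, and images seen for a training run."""
    # Gradients from several small batches are accumulated before each update.
    effective_batch = train_batch_size * grad_accum_steps
    # Number of dataloader batches needed to cover the dataset once.
    batches_per_epoch = math.ceil(num_images / train_batch_size)
    # Number of optimizer updates per pass over the dataset.
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    # Passes over the dataset needed to reach the requested number of updates.
    epochs = math.ceil(max_train_steps / updates_per_epoch)
    return {
        "effective_batch_size": effective_batch,
        "updates_per_epoch": updates_per_epoch,
        "epochs": epochs,
        "images_seen": max_train_steps * effective_batch,
    }

budget = training_budget(
    num_images=121 + 336,  # assumed dataset size, taken from the list above
    train_batch_size=1,
    grad_accum_steps=4,
    max_train_steps=8000,
)
print(budget)
# {'effective_batch_size': 4, 'updates_per_epoch': 115, 'epochs': 70, 'images_seen': 32000}
```

Under these assumptions each image is seen roughly 70 times, which is worth keeping in mind when judging whether the model has overfit to the small dataset.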

In [None]:
!accelerate config default --mixed_precision fp16

In [None]:
!accelerate launch diffusers/examples/text_to_image/train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --mixed_precision="fp16" \
  --max_train_steps=8000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --validation_prompt='sukruth thumbs up' \
  --push_to_hub \
  --checkpointing_steps=100000 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir=$OUTPUT_DIR

### Load the fine-tuned model and generate images

In [None]:
# thumbs_up_model_id = './stable-diffusion-thumbs-up-8000_1e-05' # for faster model load time if the model is present locally
thumbs_up_model_id = 'glsukki/stable-diffusion-thumbs-up-8000_1e-05'
thumbs_up_pipe = StableDiffusionPipeline.from_pretrained(thumbs_up_model_id, torch_dtype=torch.float16)
thumbs_up_pipe = thumbs_up_pipe.to("cuda")

### Model Evaluation

Evaluation Metric: ```CLIP Score```

CLIP Score measures how well an image generated by Stable Diffusion matches its corresponding prompt (caption).
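
As commonly defined, the CLIP score is the cosine similarity between the CLIP image embedding and the CLIP text embedding, scaled by 100 and clipped at zero: `CLIPScore(I, C) = max(100 * cos(E_I, E_C), 0)`. The sketch below illustrates only that final arithmetic on toy vectors; the function names and the example embeddings are hypothetical, and real embeddings would come from the CLIP model rather than being written by hand.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def clip_score_from_embeddings(image_emb, text_emb):
    """Scale the cosine similarity to [0, 100], clipping negative values to 0."""
    return max(100.0 * cosine_similarity(image_emb, text_emb), 0.0)

# Toy embeddings (placeholders, not real CLIP outputs).
aligned = clip_score_from_embeddings([3.0, 4.0], [3.0, 4.0])       # same direction
orthogonal = clip_score_from_embeddings([1.0, 0.0], [0.0, 1.0])    # unrelated
opposite = clip_score_from_embeddings([1.0, 0.0], [-1.0, 0.0])     # clipped to 0
print(aligned, orthogonal, opposite)
```

The score depends only on the direction of the two embeddings, not their magnitude, which is why it can be compared across prompts of different lengths.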

In [None]:
clip_score_fn = partial(clip_score, model_name_or_path="openai/clip-vit-base-patch16")

## Calculate the CLIP similarity score between image(s) and prompt(s)
## The higher the score, the closer the image is to the given prompt
def calculate_clip_score(images, prompts):
    # Pipeline output is float in [0, 1]; convert to uint8 pixel values in [0, 255]
    images_int = (images * 255).round().astype("uint8")
    if images_int.ndim == 3:
        images_int = np.expand_dims(images_int, axis=0)  # Add a batch dimension to a single (H, W, C) image
    # Reorder (N, H, W, C) -> (N, C, H, W); a distinct name avoids shadowing the imported clip_score
    score = clip_score_fn(torch.from_numpy(images_int).permute(0, 3, 1, 2), prompts).detach()
    return round(float(score), 4)

### Image Generation

Define the prompt(s) that the fine-tuned model will generate images from.
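
Prompts that follow the `<person><trigger_word>` pattern can also be assembled from a template instead of being written out by hand. The sketch below is one possible way to do that; the function name `build_prompts`, the fixed verb "giving a", and the scene list are illustrative assumptions rather than how the prompts in this notebook were produced.

```python
def build_prompts(person, trigger_word, scenes):
    """Combine a person name and trigger word with a list of scene descriptions."""
    return [f"{person} giving a {trigger_word} {scene}." for scene in scenes]

# Hypothetical scene descriptions, echoing the style of prompts used in this notebook.
scenes = [
    "at a sunny beach",
    "during an intense workout at the gym",
    "while exploring a lush forest",
]

prompts = build_prompts("Sukruth", "thumbs up", scenes)
for prompt in prompts:
    print(prompt)
# Sukruth giving a thumbs up at a sunny beach.
# Sukruth giving a thumbs up during an intense workout at the gym.
# Sukruth giving a thumbs up while exploring a lush forest.
```

Keeping the person name and trigger word as parameters makes it easy to check that every prompt contains both, which matters because the model was trained on that pairing.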

In [None]:
# A list of 25 prompts containing <person_name> followed by <trigger_word>
# Here, the keyword's values are as follows:
# - <person_name> : 'sukruth'
# - <trigger_word> : 'thumbs up'

diff_prompts = [
    "Sukruth giving a thumbs up at a sunny beach.",
    "Sukruth enthusiastically showing a thumbs up in a bustling city.",
    "Sukruth flashing a thumbs up sign while hiking in the mountains.",
    "Sukruth smiling and giving a thumbs up in a cozy coffee shop.",
    "Sukruth offering a thumbs up gesture at a lively music festival.",
    "Sukruth expressing approval with a thumbs up on a snowy mountain peak.",
    "Sukruth presenting a confident thumbs up in a modern office.",
    "Sukruth happily giving a thumbs up while cooking in the kitchen.",
    "Sukruth giving a thumbs up during an intense workout at the gym.",
    "Sukruth showing a thumbs up while exploring a lush forest.",
    "Sukruth delivering a thumbs up during a thrilling roller coaster ride.",
    "Sukruth flashing a thumbs up on a serene lakeside dock.",
    "Sukruth enthusiastically giving a thumbs up on a skateboard.",
    "Sukruth expressing approval with a thumbs up at an art gallery.",
    "Sukruth smiling and showing a thumbs up at a rooftop party.",
    "Sukruth offering a thumbs up sign while on a hot air balloon ride.",
    "Sukruth confidently giving a thumbs up at a technology conference.",
    "Sukruth giving a thumbs up during a fun-filled family picnic.",
    "Sukruth flashing a thumbs up while exploring an ancient temple.",
    "Sukruth happily showing a thumbs up on a colorful carnival ride.",
    "Sukruth expressing approval with a thumbs up at a bustling market.",
    "Sukruth smiling and giving a thumbs up on a scenic road trip.",
    "Sukruth offering a thumbs up sign during a karaoke night.",
    "Sukruth confidently giving a thumbs up at a sports stadium.",
    "Sukruth giving a thumbs up while enjoying a picnic in the park."
]

same_prompt = ['sukruth thumbs up']

In [None]:
import os

def generateImages(prompts, model_version):
  # # Display the images generated by the prompts in an Image Grid
  num_rows = 5
  num_cols = 5
  sd_image, axs = plt.subplots(num_rows, num_cols, figsize=(10, 10))
  sd_image.tight_layout()

  image_directory = f'./sd_{model_version}'

  if not os.path.exists(image_directory):
    os.mkdir(image_directory)

  if len(prompts) == 1:
    for prompt in range(25):
      row_idx = prompt // num_cols
      col_idx = prompt % num_cols

      generator_1 = torch.manual_seed(random.randint(1, 99999))
      image = thumbs_up_pipe(prompts[0], generator = generator_1).images[0]
      image.save(f'./{image_directory}/prompt_{prompt + 1}.png')

      # # Calculate the CLIP score for the same image: reuse the pixels already generated,
      # # since calling the pipeline again would advance the generator and produce a different image
      np_image = np.asarray(image, dtype=np.float32) / 255.0
      sd_clip_score = calculate_clip_score(np_image, [prompts[0]])
      # print(f"CLIP score: {sd_clip_score}")

      ax = axs[row_idx, col_idx]
      ax.imshow(image)
      ax.set_title(f'Prompt {prompt + 1}\'s Image \nCLIP Score = {sd_clip_score}')
      ax.axis('off')
  else:
    # # Iterate over all the prompts, and generate the image respectively
    # # Once the image is generated, save it to file locally and add it as a subplot to the Prompt Image's grid
    for prompt in range(len(prompts)):

      row_idx = prompt // num_cols
      col_idx = prompt % num_cols

      image = thumbs_up_pipe(prompts[prompt], generator = generator).images[0]
      image.save(f'./{image_directory}/prompt_{prompt + 1}.png')

      # # Calculate the CLIP score for the same image: reuse the pixels already generated,
      # # since calling the pipeline again would advance the generator and produce a different image
      np_image = np.asarray(image, dtype=np.float32) / 255.0
      sd_clip_score = calculate_clip_score(np_image, [prompts[prompt]])
      # print(f"CLIP score: {sd_clip_score}")

      ax = axs[row_idx, col_idx]
      ax.imshow(image)
      ax.set_title(f'Prompt {prompt + 1}\'s Image \nCLIP Score = {sd_clip_score}')
      ax.axis('off')

  plt.show()
  sd_image.savefig(f'./{image_directory}/{model_version}_prompts.png')

In [None]:
# for different prompts
diff_model_name = '8000_121_336_diff'
generateImages(diff_prompts, diff_model_name)

# for same prompts
same_model_name = '8000_121_336_same'
generateImages(same_prompt, same_model_name)

### Baseline Model Inference : ```runwayml/stable-diffusion-v1-5```

In [None]:
from diffusers import StableDiffusionPipeline
import torch
import random

generator = torch.manual_seed(random.randint(1, 9999))

base_model_id = "runwayml/stable-diffusion-v1-5"
base_pipe = StableDiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16)
base_pipe = base_pipe.to("cuda")

base_prompt = ["sukruth thumbs up"]
PIL_image = base_pipe(base_prompt, generator = generator).images[0]
# Reuse the generated pixels for scoring; a second pipeline call would advance the generator and yield a different image
image = np.asarray(PIL_image, dtype=np.float32) / 255.0

PIL_image.save("./base_model_prompt_1.png")

## Check the similarity score of the image generated by the baseline model against its prompt
sd_clip_score = calculate_clip_score(image, base_prompt)
print(f"CLIP score: {sd_clip_score}")