
# Lab 10.1 - AI Image Generation using GPUs and Diffusers

## Introduction

*Note*: This notebook is to be run in a Kaggle environment with a GPU:

1. Go to https://www.kaggle.com/code after creating a free account at kaggle.com
2. Go to File -> Import Notebook within the website and drag-and-drop this notebook file to upload it.
3. Go to Settings -> Turn on Internet if it is not already enabled

Later we will enable the GPU.

The purpose of this lab is to learn how to develop generative image models using the diffusers library and GPU acceleration, enabling creative and controllable visual synthesis from text and images. The diffusers library by Hugging Face runs and trains diffusion models (https://huggingface.co/docs/diffusers/en/index).

Diffusion models are trained to denoise random Gaussian noise step-by-step to generate a sample of interest, such as an image or audio.
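
The noise that a diffusion model learns to remove is added by a fixed forward process during training. The following is a minimal sketch of that forward process, not the code used by diffusers. It assumes the linear noise schedule from the original DDPM paper (1000 steps, variances rising from 1e-4 to 0.02); the function names and the 4-pixel "image" are hypothetical.

```python
import math
import random

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that rise linearly (values assumed from the DDPM paper)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """alpha_bar_t: the running product of (1 - beta), i.e. how much signal survives."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps with eps ~ N(0, 1)."""
    return [
        math.sqrt(alpha_bar) * value + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
        for value in x0
    ]

alpha_bars = cumulative_signal(linear_beta_schedule())
rng = random.Random(0)
pixels = [1.0, -1.0, 0.5, 0.0]  # hypothetical 4-pixel "image"

for t in (0, 250, 999):
    noisy = add_noise(pixels, alpha_bars[t], rng)
    print(t, round(alpha_bars[t], 4), [round(v, 2) for v in noisy])
```

As `t` grows, `alpha_bar` shrinks towards zero and the sample becomes almost pure Gaussian noise. Generation runs this in reverse: it starts from noise and removes a little of it at each step.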

## Initial Exploration

Experiment with image prompting at sites such as https://deepai.org/, ChatGPT, http://gemini.google.com and https://x.com/i/grok.

Examples of prompts can be found at https://lexica.art/, krea.ai and https://www.midjourney.com/explore.

In this notebook images will be generated with different models, together with image-to-image transformations and interpolations.

## Setup

To enable the GPU go to Settings -> Accelerator -> GPU T4 x2. You have a limit of 30 hours of GPU use per month, so remember to turn off the GPU (Settings -> Accelerator -> None) when not needed.


In [None]:
# After running this cell you should see
# '/device:GPU:0' (it's ok if additional messages appear)

import tensorflow as tf
tf.test.gpu_device_name()
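
The cell above checks the GPU through TensorFlow, whereas the rest of this notebook runs on PyTorch. A minimal sketch of the equivalent check from the PyTorch side is given here; `describe_gpu` is a hypothetical helper name, and the function simply reports what it finds rather than raising an error.

```python
def describe_gpu():
    """Return a one-line summary of the GPU that PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA GPU is visible to PyTorch"
    count = torch.cuda.device_count()
    return f"{count} CUDA GPU(s); first: {torch.cuda.get_device_name(0)}"

print(describe_gpu())
```

If this reports that no GPU is visible, check the Accelerator setting before running the later cells, since they call `.to("cuda")`.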

The following cell installs the diffusers library (supported by the PyTorch 'torch' deep learning framework) and the transformers library. The transformers library (also by Hugging Face) makes it easy to use modern AI models without having to build them from scratch.

In [None]:
!pip install diffusers["torch"] transformers
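
After installing, it can help to record which versions were actually picked up, since pipeline behaviour can change between releases. This is a small sketch using only the standard library; `installed_versions` is a hypothetical helper, and a value of `None` means the package was not found.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages=("diffusers", "transformers", "torch")):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

print(installed_versions())
```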

The `DiffusionPipeline` class allows one to generate images with a simple interface - all the model components are automatically loaded.

In [None]:
import torch
from diffusers import DiffusionPipeline

In [None]:
# Releases cached unused memory for reuse by PyTorch
torch.cuda.empty_cache()

## Unprompted image generation

To begin with, we generate images from a pretrained model that was trained on butterfly images. Unprompted images can be used to evaluate image quality, diversity, and realism without prompt bias. They also help visualise the range of possibilities that the model has learned.

In [None]:
# If the kernel dies, simply rerun the previous two code cells and this cell
# The .to("cuda") method in PyTorch moves a model or tensor from the CPU to the GPU.
generator = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128").to("cuda")

image = generator().images[0]
image

Rerun the cell below to generate a new image from a different random sample of noise.

In [None]:
image = generator().images[0]
image

### Changing the number of inference steps

Set num_inference_steps to 1, 2, 3 to see the image at early stages of generation.

In [None]:
# 1 step
image = generator(num_inference_steps=1).images[0]
image
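
With fewer inference steps the sampler visits fewer noise levels, so less of the noise is removed. The sketch below shows one simple way to pick which of the training timesteps to visit, assuming 1000 training timesteps and even spacing. The rule a particular scheduler really uses may differ, and `spaced_timesteps` is a hypothetical name.

```python
def spaced_timesteps(num_train_timesteps=1000, num_inference_steps=50):
    """Evenly spaced timesteps, ordered from the noisiest level down to 0."""
    if not 1 <= num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be between 1 and num_train_timesteps")
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in range(num_inference_steps)][::-1]

print(spaced_timesteps(1000, 4))  # [750, 500, 250, 0]
print(len(spaced_timesteps(1000, 100)))  # 100
```

Under this assumption, a run with very few steps skips almost every noise level, which is consistent with the rough images seen at 1, 2 and 3 steps.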

In [None]:
# 2 steps
# ANSWER:


In [None]:
# 3 steps
# ANSWER:


**Exercise**: Use the larger "google/ddpm-celebahq-256" model instead of "anton-l/ddpm-butterflies-128" with 100 inference steps to generate a realistic face.

In [None]:
# ANSWER


## Prompted image generation

Next we observe the impact of a detailed prompt using the popular Stable Diffusion model.

In [None]:
model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"

pipeline = DiffusionPipeline.from_pretrained(model_id, use_safetensors=True)

In [None]:
prompt = "a panda playing guitar in a bamboo forest, cartoon style, soft lighting, in a playful mood"
# feel free to change this

In [None]:
pipeline = pipeline.to("cuda")

In [None]:
generator = torch.Generator("cuda").manual_seed(0)

In [None]:
%%time
image = pipeline(prompt, generator=generator).images[0]
image #note that 50 is now the default number of inference steps
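
The seeded generator is what makes the result repeatable: the same seed gives the same starting noise. As a library-free analogy, the sketch below uses Python's `random.Random` in place of `torch.Generator`; `sample_noise` is a hypothetical helper and the values are not those PyTorch would produce.

```python
import random

def sample_noise(seed, n=4):
    """Draw n Gaussian values from a freshly seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# A fresh generator with the same seed reproduces the same noise.
print(sample_noise(0) == sample_noise(0))  # True

# Reusing one generator object continues its stream, so the second draw differs.
rng = random.Random(0)
first = [rng.gauss(0.0, 1.0) for _ in range(4)]
second = [rng.gauss(0.0, 1.0) for _ in range(4)]
print(first == second)  # False
```

The same holds in the notebook: rerunning only the pipeline cell reuses the existing generator and gives a different image, so rerun the cell that creates the generator as well to reproduce a result.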

We can make this run faster on GPUs by setting torch_dtype to torch.float16 (the default is 32-bit precision).

In [None]:
%%time
pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, use_safetensors=True)
pipeline = pipeline.to("cuda")
generator = torch.Generator("cuda").manual_seed(0)

image = pipeline(prompt, generator=generator).images[0]
image
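
Half precision also halves the memory that the weights occupy. The arithmetic is sketched below for a hypothetical model with one billion parameters; this figure is an assumption chosen for round numbers, not the size of the model loaded above.

```python
import struct

BYTES_FP32 = struct.calcsize("f")  # 4 bytes per 32-bit float
BYTES_FP16 = struct.calcsize("e")  # 2 bytes per 16-bit float

def weights_gib(num_parameters, bytes_per_value):
    """Memory needed to hold the weights alone, in GiB."""
    return num_parameters * bytes_per_value / 1024**3

hypothetical_params = 1_000_000_000  # assumed size, for illustration only

print(f"float32: {weights_gib(hypothetical_params, BYTES_FP32):.2f} GiB")
print(f"float16: {weights_gib(hypothetical_params, BYTES_FP16):.2f} GiB")
```

Activations and intermediate buffers need memory on top of this, so the estimate is a lower bound.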

The following code enables a grid of sample images to be generated.

In [None]:
def get_inputs(batch_size=1):
    generator = [torch.Generator("cuda").manual_seed(i) for i in range(batch_size)]
    prompts = batch_size * [prompt]
    num_inference_steps = 20

    return {"prompt": prompts, "generator": generator, "num_inference_steps": num_inference_steps}

In [None]:
from diffusers.utils import make_image_grid

In [None]:
pipeline.enable_attention_slicing() #breaks large attention computations into smaller chunks

In [None]:
%%time
images = pipeline(**get_inputs(batch_size=8)).images
make_image_grid(images, rows=2, cols=4)
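
The grid call needs `rows * cols` to equal the number of images. The sketch below shows how a grid of this kind can be laid out, assuming images are placed in row-major order; `grid_boxes` is a hypothetical helper that only computes the top-left corner of each tile.

```python
def grid_boxes(num_images, rows, cols, width, height):
    """Top-left (x, y) pixel position of each image in a row-major grid."""
    if num_images != rows * cols:
        raise ValueError("rows * cols must equal the number of images")
    return [((i % cols) * width, (i // cols) * height) for i in range(num_images)]

boxes = grid_boxes(num_images=8, rows=2, cols=4, width=512, height=512)
print(boxes[0])  # (0, 0)
print(boxes[5])  # (512, 512)
```

For a batch of 8 images, `rows=2, cols=4` and `rows=4, cols=2` both work, whereas `rows=3, cols=3` would not.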

We now try a more advanced model:

In [None]:
pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16)
pipe.to("cuda");
# it may take a couple of minutes for everything (~15GB) to be downloaded to Kaggle

In [None]:
#prompt = "a cartoon of a 1980s car on a desert highway, midday sun"

image = pipe(prompt).images[0]

image

**Exercise**: Using the same pipe create two images - one with a less sophisticated and one with a more detailed prompt.

In [None]:
# ANSWER
easy_prompt =


In [None]:
# ANSWER
detailed_prompt =


## Image to image

In this section we take an existing image and edit it using prompting.

In [None]:
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

In [None]:
pipeline = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True
)
pipeline.enable_model_cpu_offload()
# enables offloading of model weights from the GPU to the CPU when the GPU memory is insufficient
# for the entire model.

# ~13GB is downloaded here



In [None]:

init_image = load_image('https://plus.unsplash.com/premium_photo-1674917000586-b7564f21540e').resize((512, 512))


In [None]:
prompt = "make this look like a painting"
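image = pipeline(prompt, image=init_image, strength=0.05, guidance_scale=1.0).images[0]
# higher guidance_scale - output follows the prompt more closely
# higher strength - more creativity, so the output departs further from the input image
# (7.5 and 0.8 are the Stable Diffusion image-to-image defaults; other pipelines may differ)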
make_image_grid([init_image, image], rows=1, cols=2)
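
One way to read `strength` is as the fraction of the denoising schedule that is actually run on the input image. The sketch below assumes the rule used by Stable Diffusion style image-to-image pipelines; whether exactly the same rule applies to the pipeline above is an assumption, and `effective_steps` is a hypothetical name.

```python
def effective_steps(num_inference_steps, strength):
    """Number of denoising steps actually run for a given strength in [0, 1]."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)

print(effective_steps(50, 0.8))   # 40
print(effective_steps(50, 0.05))  # 2
print(effective_steps(50, 1.0))   # 50
```

Under this reading, a strength of 0.05 leaves the input almost untouched because only a few steps run, while a strength of 1.0 runs the full schedule and largely ignores the input.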

**Exercise**: investigate what happens when the guidance scale and strength are changed.
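
As background for the exercise, the guidance scale is commonly applied through classifier-free guidance: the model predicts the noise once without the prompt and once with it, and the scale extrapolates from the first towards the second. The sketch below uses made-up two-element "predictions" and a hypothetical function name.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element by element."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

uncond = [0.0, 1.0]  # made-up prediction without the prompt
cond = [1.0, 3.0]    # made-up prediction with the prompt

print(apply_guidance(uncond, cond, 1.0))  # [1.0, 3.0]
print(apply_guidance(uncond, cond, 7.5))  # [7.5, 16.0]
```

A scale of 1.0 returns the prompted prediction unchanged, which is why it gives only weak steering; larger values push further in the direction the prompt suggests.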

## Image interpolation

Finally we see how we can interpolate images based on two distinct prompts. Feel free to change the prompts below.

In [None]:
import numpy as np
from PIL import Image

# Load Stable Diffusion pipeline
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.enable_attention_slicing()

# Two text prompts
prompt1 = "a futuristic city at sunset"
prompt2 = "a medieval village in winter"

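# Get text embeddings. Padding every prompt to the tokenizer's maximum length
# gives tensors of the same shape, so they can be blended whatever the prompts are.
def embed(text):
    tokens = pipe.tokenizer(
        text, padding="max_length", max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    ).input_ids.to("cuda")
    with torch.no_grad():  # inference only, so no gradients are needed
        return pipe.text_encoder(tokens)[0]

embeds1 = embed(prompt1)
embeds2 = embed(prompt2)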

# Interpolate between embeddings
steps = 10
interpolated_images = []

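for alpha in np.linspace(0, 1, steps):
    blended_embed = (1 - alpha) * embeds1 + alpha * embeds2
    # Reuse one seed so every frame starts from the same noise and only the prompt changes
    generator = torch.Generator("cuda").manual_seed(0)
    image = pipe(prompt_embeds=blended_embed, generator=generator).images[0]
    interpolated_images.append(image)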
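
The blending step is plain linear interpolation. The sketch below applies it to short Python lists so the arithmetic can be checked by hand; `linspace` and `lerp` are hypothetical stand-ins for `np.linspace` and the tensor expression in the loop.

```python
def linspace(start, stop, num):
    """num evenly spaced values from start to stop inclusive (num >= 2)."""
    return [start + (stop - start) * i / (num - 1) for i in range(num)]

def lerp(a, b, alpha):
    """Blend two equal-length vectors: (1 - alpha) * a + alpha * b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]

start_vec = [0.0, 2.0]  # made-up "embedding" for the first prompt
end_vec = [4.0, 6.0]    # made-up "embedding" for the second prompt

for alpha in linspace(0.0, 1.0, 5):
    print(alpha, lerp(start_vec, end_vec, alpha))
```

At `alpha = 0` the result is the first vector, at `alpha = 1` it is the second, and the values in between move in equal increments. The length check mirrors the requirement that both prompt embeddings have the same shape.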

Observe the similarity of the prompts via cosine similarity:

In [None]:
from torch.nn.functional import cosine_similarity
# Compare the prompts token by token along the embedding dimension, then average
similarity = cosine_similarity(embeds1, embeds2, dim=-1).mean()
print(f"Cosine similarity: {similarity.item():.4f}")
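
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures direction and ignores magnitude. The sketch below computes it on small made-up vectors and then averages over a list of "token" vectors; both function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_token_similarity(tokens_a, tokens_b):
    """Average the cosine similarity over matching token positions."""
    scores = [cosine(a, b) for a, b in zip(tokens_a, tokens_b)]
    return sum(scores) / len(scores)

print(cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0 for perpendicular vectors
print(mean_token_similarity([[1.0, 0.0], [1.0, 2.0]], [[0.0, 1.0], [2.0, 4.0]]))
```

Values near 1 mean the two vectors point the same way, 0 means they are unrelated in direction, and -1 means they point in opposite directions.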

In [None]:
import matplotlib.pyplot as plt

# Display results
fig, axs = plt.subplots(1, steps, figsize=(20, 4))
for i, img in enumerate(interpolated_images):
    axs[i].imshow(img)
    axs[i].axis("off")
plt.tight_layout()
plt.show()

We can save this as an image or video file for download:

In [None]:
import imageio
gif_path = "interpolated_images.gif"
imageio.mimsave(gif_path, interpolated_images, fps=4)
# Download the GIF from the /kaggle/working folder, listed under Output in the sidebar.


In [None]:
imageio.mimsave("video.mp4", interpolated_images, fps=4)
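
The frame rate determines how long each frame is shown and how long the whole clip lasts. The sketch below works this out for the settings used above; the helper names are hypothetical, and the per-frame duration is useful if a writer asks for a duration rather than a frame rate.

```python
def frame_duration_ms(fps):
    """Time each frame stays on screen, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps

def clip_length_s(num_frames, fps):
    """Total playing time of the clip, in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps

print(frame_duration_ms(4))   # 250.0
print(clip_length_s(10, 4))   # 2.5
```

With 10 interpolation steps at 4 frames per second the animation lasts 2.5 seconds; raising `steps` gives a smoother and longer transition.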

## BONUS

Experiment with other models at https://huggingface.co/models using the image-related filters.

## Summary

We have learnt how to enable GPUs in Kaggle and use the diffusers library to generate images with different diffusion models and settings. We covered unprompted and prompted image generation, then image-to-image editing via prompting and interpolation between two prompts.

Turn off GPU usage by going to Settings -> Accelerator -> None in Kaggle. Under "Session options" on the right side panel you will be able to see how much of your 30-hour quota has been used up.

Save and download this file for submission.

## References

https://huggingface.co/docs/diffusers/en/index

https://huggingface.co/docs/diffusers/main/using-diffusers/write_own_pipeline