[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1untfZXgw5QVkyVxNt7AtxS5zyd6mz5wn)

Author:
- **Safouane El Ghazouali**
- Ph.D. in AI
- Senior data scientist and researcher at TOELT LLC
- Lecturer at HSLU

# -----  -----  -----  -----  -----  -----  -----  -----

# Hands-on: Diffusion Models with Hugging Face Diffusers

Welcome to this hands-on notebook on diffusion models with the Hugging Face Diffusers library. We'll focus on small, efficient Stable Diffusion models and cover text-to-image generation, inpainting, image-to-image variation, and ControlNet-guided generation. Along the way you'll see how to load each model from the Hugging Face Hub and run it.

![Diffusion Example](https://learnopencv.com/wp-content/uploads/2023/01/diffusion-models-unconditional_image_generation-1.gif)

### Why Use Diffusion Models?
- **Generative Power**: Create realistic images from text descriptions.
- **Versatility**: Supports generation, editing (inpainting), variation, and guided synthesis.
- **Efficiency with Small Models**: Use lightweight variants for faster inference on limited hardware.
- **Hugging Face Ecosystem**: Easy access to pre-trained models and pipelines.

### What You'll Learn
- Installing and setting up Diffusers.
- Loading small Stable Diffusion models from Hugging Face.
- Text-to-image generation.
- Inpainting (filling masked regions).
- Image-to-image variation.
- ControlNet for structure-guided generation (optional, if resources allow).
- Tips for prompt engineering and parameter tuning.

# 🧰 Environment Setup

Install Diffusers and dependencies. Use torch with CUDA if available for GPU acceleration.

In [None]:
!pip install -q diffusers transformers accelerate torch torchvision requests pillow matplotlib opencv-python

### Import Libraries

In [None]:
import torch
from diffusers import (
    StableDiffusionPipeline,
    StableDiffusionInpaintPipeline,
    StableDiffusionImg2ImgPipeline,
    ControlNetModel,
    StableDiffusionControlNetPipeline,
)
import requests
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
import numpy as np

device = 'cuda' if torch.cuda.is_available() else 'cpu'
torch_dtype = torch.float16 if device == 'cuda' else torch.float32
print(f'Using device: {device} with dtype: {torch_dtype}')

# 📚 Understanding Diffusion Models

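Diffusion models generate images by starting from random noise and denoising it step by step, guided by a text prompt. Stable Diffusion is a popular latent diffusion model: it runs the denoising loop in a compressed latent space and decodes the result to pixels with a VAE.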
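
To make "latent" concrete, here is a minimal sketch of the tensor shape the denoising loop works on. It assumes the Stable Diffusion v1.x layout (4 latent channels, 8x spatial downsampling by the VAE); the helper name `latent_shape` is hypothetical and other model families may use different values.

```python
def latent_shape(height, width, batch_size=1, latent_channels=4, vae_scale_factor=8):
    """Return the (batch, channels, height, width) shape of the latent tensor.

    Defaults assume Stable Diffusion v1.x: 4 latent channels, 8x downsampling.
    """
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("height and width must be multiples of the VAE scale factor")
    return (
        batch_size,
        latent_channels,
        height // vae_scale_factor,
        width // vae_scale_factor,
    )


shape_512 = latent_shape(512, 512)

# Compare the number of values in an RGB image with the number in its latent.
pixel_values = 3 * 512 * 512
latent_values = shape_512[1] * shape_512[2] * shape_512[3]
compression = pixel_values / latent_values
print(shape_512, compression)
```

Working on far fewer values per step is a large part of why these models run on modest hardware.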

- **Small Models**: We'll use 'runwayml/stable-diffusion-v1-5' as a balance between quality and speed, and 'stabilityai/sd-turbo' when speed matters most.
- **Pipelines**: Diffusers provides ready-to-use pipelines for each task (text-to-image, inpainting, image-to-image, ControlNet).
- **Parameters**: num_inference_steps trades quality for speed; guidance_scale controls how closely the output follows the prompt.

Reference: [Hugging Face Diffusers](https://huggingface.co/docs/diffusers/index)
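
The effect of guidance_scale can be sketched with the standard classifier-free guidance formula, which blends an unconditional noise prediction with a prompt-conditioned one. This is a toy illustration on plain Python lists with made-up numbers; `apply_guidance` is a hypothetical helper, and real pipelines apply the same arithmetic to tensors inside the denoising loop.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: start at the unconditional prediction and
    move toward the conditional one, scaled by guidance_scale."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


# Toy per-element noise predictions (illustrative values only)
uncond = [0.0, 1.0, -1.0]
cond = [1.0, 1.0, 0.0]

# A scale of 1.0 returns the conditional prediction unchanged.
same_as_cond = apply_guidance(uncond, cond, 1.0)

# A larger scale pushes further along the direction the prompt suggests.
amplified = apply_guidance(uncond, cond, 7.5)
print(same_as_cond, amplified)
```

Elements where the two predictions agree are unaffected by the scale; only the differences introduced by the prompt are amplified.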

# 📦 Loading a Small Stable Diffusion Model

Load the base pipeline for text-to-image.

In [None]:
model_id = 'runwayml/stable-diffusion-v1-5'
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch_dtype).to(device)
print('Stable Diffusion loaded!')

# 🖼️ Text-to-Image Generation

Generate images from text prompts.

In [None]:
prompt = 'A futuristic cityscape at sunset, cyberpunk style'
negative_prompt = 'blurry, low quality'

image_gen = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=20,  # Small for speed
    guidance_scale=7.5
).images[0]

plt.imshow(image_gen)
plt.title('Generated Image')
plt.axis('off')
plt.show()

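# Explanation
# - prompt: describes the desired image.
# - negative_prompt: lists elements to steer away from.
# - num_inference_steps: more steps usually improve quality but take longer.
# - guidance_scale: higher values follow the prompt more closely, usually at the cost of variety.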

# 🎨 Inpainting: Editing Images

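Fill masked regions of an image based on a prompt. This example downloads a sample image and a matching mask; white mask pixels are repainted and black pixels are kept.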

In [None]:
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
# Load inpaint pipeline
inpaint_pipe = StableDiffusionInpaintPipeline.from_pretrained(
    model_id,
    torch_dtype=torch_dtype
).to(device)
# Sample image and mask
init_image = Image.open(BytesIO(requests.get(img_url, timeout=30).content)).convert('RGB')
mask_image = Image.open(BytesIO(requests.get(mask_url, timeout=30).content)).convert('RGB')
# Display init and mask
fig, axs = plt.subplots(1, 2)
axs[0].imshow(init_image)
axs[0].set_title('Initial Image')
axs[1].imshow(mask_image)
axs[1].set_title('Mask')
plt.show()
# Inpaint
prompt_inpaint = 'An orange cat'
image_inpaint = inpaint_pipe(
    prompt=prompt_inpaint,
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=50,
    guidance_scale=9.5
).images[0]
plt.imshow(image_inpaint)
plt.title('Inpainted Image')
plt.axis('off')
plt.show()
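
When no ready-made mask is available, one can be drawn by hand. The sketch below builds a simple rectangular mask with Pillow, following the convention that white pixels are repainted and black pixels are kept. The helper name `make_box_mask` and the box coordinates are hypothetical choices for illustration.

```python
from PIL import Image, ImageDraw


def make_box_mask(size, box):
    """Return a grayscale mask: white (255) inside `box`, black (0) elsewhere.

    size: (width, height) of the image to be inpainted.
    box:  (left, top, right, bottom) region to repaint.
    """
    mask = Image.new("L", size, 0)                  # start fully black (keep everything)
    ImageDraw.Draw(mask).rectangle(box, fill=255)   # mark the region to repaint in white
    return mask


# Repaint the central square of a 512x512 image.
demo_mask = make_box_mask((512, 512), (128, 128, 384, 384))
print(demo_mask.size, demo_mask.mode)
```

The resulting image can be passed as mask_image in place of the downloaded mask, provided it has the same size as the input image.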

# 🔄 Image-to-Image Variation

Generate variations of an input image guided by a prompt.

In [None]:
# Load img2img pipeline
img2img_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_id,
    torch_dtype=torch_dtype
).to(device)

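# Reuse init_image from the inpainting cell. The prompt below describes a portrait,
# so load a matching image of your own for a closer variation.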
prompt_i2i = 'A cyberpunk version of the girl with a pearl earring'
image_i2i = img2img_pipe(
    prompt=prompt_i2i,
    image=init_image.resize((512, 512)),
    strength=0.75,  # 0.0 keeps the input nearly unchanged; 1.0 nearly ignores it
    num_inference_steps=20,
    guidance_scale=7.5
).images[0]

plt.imshow(image_i2i)
plt.title('Image-to-Image Variation')
plt.axis('off')
plt.show()
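
One consequence of strength is worth spelling out: in image-to-image mode only part of the noise schedule is run, so the number of denoising steps actually executed is smaller than num_inference_steps. The sketch below mirrors how the Diffusers img2img pipeline is understood to trim the schedule; the helper name is hypothetical, and exact behavior may vary with the scheduler and library version.

```python
def effective_denoising_steps(num_inference_steps, strength):
    """Approximate number of denoising steps run for a given strength.

    The input image is noised part of the way along the schedule, and only
    the remaining steps are denoised.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


# Settings used in the cell above: 20 steps at strength 0.75
steps_used = effective_denoising_steps(20, 0.75)
print(steps_used)
```

If a low strength leaves too few steps for a clean result, raising num_inference_steps compensates without changing how much of the input is preserved.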

# 🕹️ ControlNet: Structure-Guided Generation

ControlNet conditions generation on a structural input, here a Canny edge map. It requires downloading an additional ControlNet model alongside the base pipeline.

In [None]:
import cv2  # from the opencv-python package (preinstalled on Colab)

# Load ControlNet
controlnet = ControlNetModel.from_pretrained('lllyasviel/sd-controlnet-canny', torch_dtype=torch_dtype)
control_pipe = StableDiffusionControlNetPipeline.from_pretrained(
    model_id,
    controlnet=controlnet,
    torch_dtype=torch_dtype
).to(device)

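# Generate Canny edges from init_image
init_np = np.array(init_image.resize((512, 512)))
canny_image = cv2.Canny(init_np, 100, 200)  # single-channel edge map; 100 and 200 are the hysteresis thresholds
canny_image = Image.fromarray(canny_image).convert('RGB')  # ControlNet takes a 3-channel conditioning image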

# Display Canny
plt.imshow(canny_image, cmap='gray')
plt.title('Canny Edges')
plt.axis('off')
plt.show()

# Generate
prompt_control = 'A colorful abstract painting of a woman'
image_control = control_pipe(
    prompt_control,
    image=canny_image,
    num_inference_steps=20,
    guidance_scale=7.5
).images[0]

plt.imshow(image_control)
plt.title('ControlNet Generation')
plt.axis('off')
plt.show()
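
An edge detector returns a single-channel map, while the ControlNet conditioning input has three channels. A common way to bridge the gap is to repeat the edge map across channels. The sketch below shows this with NumPy on a tiny synthetic edge map; `edges_to_rgb` is a hypothetical helper, and the values stand in for real Canny output.

```python
import numpy as np


def edges_to_rgb(edges):
    """Stack a single-channel (H, W) edge map into an (H, W, 3) array."""
    if edges.ndim != 2:
        raise ValueError("expected a 2-D single-channel edge map")
    return np.stack([edges] * 3, axis=-1)


# Synthetic 4x4 edge map with one horizontal edge (stand-in for Canny output)
demo_edges = np.zeros((4, 4), dtype=np.uint8)
demo_edges[1, :] = 255

demo_rgb = edges_to_rgb(demo_edges)
print(demo_rgb.shape, demo_rgb.dtype)
```

The stacked array can be wrapped with Image.fromarray and passed as the image argument of the ControlNet pipeline.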

# 🔄 Using Other Small Models (e.g., SD-Turbo)

Load SD-Turbo for faster generation.

In [None]:
turbo_id = 'stabilityai/sd-turbo'
turbo_pipe = StableDiffusionPipeline.from_pretrained(turbo_id, torch_dtype=torch_dtype).to(device)

prompt_turbo = 'A serene mountain landscape'
image_turbo = turbo_pipe(
    prompt_turbo,
    num_inference_steps=1,  # SD-Turbo is distilled for single-step sampling
    guidance_scale=0.0  # Disables classifier-free guidance, which SD-Turbo does not use
).images[0]

plt.imshow(image_turbo)
plt.title('SD-Turbo Generation')
plt.axis('off')
plt.show()
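
To check the speed difference on your own hardware, time each call rather than relying on step counts alone. Below is a minimal sketch of a timing helper; `time_call` is a hypothetical name, and the stand-in workload should be replaced by a pipeline call such as the ones above.

```python
import time


def time_call(fn):
    """Run fn() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in the notebook, wrap a pipeline call instead, e.g.
# time_call(lambda: turbo_pipe(prompt_turbo, num_inference_steps=1, guidance_scale=0.0))
demo_result, demo_seconds = time_call(lambda: sum(range(1000)))
print(f"result={demo_result}, took {demo_seconds:.6f} s")
```

On a GPU, the first call often includes one-time setup cost, so timing a second run gives a fairer comparison.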