# Fine-tuning Stable Diffusion XL on AWS for Generative AI-powered Product Concept Design

Fine-tuning the Latest Stable Diffusion XL 1.0 Foundation Model on AWS with DreamBooth and Hugging Face’s AutoTrain

Blog post: https://garystafford.medium.com/fine-tuning-stable-diffusion-xl-on-aws-for-generative-ai-powered-product-concept-design-dae6f4c8c8fa

### References

Stable Diffusion Models:
- <https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0>
- <https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0>
- <https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/latent_upscale>
- <https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler>

Code Source:

- <https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/colabs/AutoTrain_Dreambooth.ipynb>

Code Reference:

- <https://huggingface.co/docs/autotrain/index>
- <https://www.youtube.com/watch?v=gF078Lhnr94>
- <https://huggingface.co/blog/stable_diffusion>
- <https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/img2img>

### SageMaker Studio Notebook Environment


Fine-tuning model environment (`ml.g4dn.4xlarge`):

<img src="./screengrabs/kernel_sdxl_finetune.png" width="480" border=1>

Inference environment (`ml.g5.4xlarge`):

<img src="./screengrabs/kernel_sdxl_inference.png" width="480" border=1>


## Install Packages and Set Configuration

In [None]:
%%sh

export PIP_ROOT_USER_ACTION=ignore

pip install -Uq pip
pip install -Uq autotrain-advanced

pip install -q ipywidgets==7.8.1
pip list | grep 'ipywidgets'

In [None]:
# optional: restart the kernel after the first install so the updated packages are loaded
import os

os._exit(0)

In [None]:
import torch

# should be 2.0.0
print(torch.__version__)

In [None]:
from tqdm.notebook import tqdm

## SDXL Fine-tuning and Inference Configuration

In [None]:
import os

# project configuration
project_name = "mb_amg_gt_oue_dreambooth"
model_name_base = "stabilityai/stable-diffusion-xl-base-1.0"
model_name_refiner = "stabilityai/stable-diffusion-xl-refiner-1.0"
model_name_upscaler_4x = "stabilityai/stable-diffusion-x4-upscaler"
model_name_latent_upscaler = "stabilityai/sd-x2-latent-upscaler"

# fine-tuning prompts
# 'oue' is a rare token that identifies the subject; 'car' is the class
instance_prompt = "photo of oue car"
class_prompt = "photo of a car"
image_path = "./images/car/"

# fine-tuning hyperparameters
learning_rate = 1e-4
num_steps = 500
batch_size = 1
gradient_accumulation = 4
resolution = 1024

# set env. vars for autotrain
os.environ["PROJECT_NAME"] = project_name
os.environ["MODEL_NAME"] = model_name_base
os.environ["INSTANCE_PROMPT"] = instance_prompt
os.environ["CLASS_PROMPT"] = class_prompt
os.environ["IMAGE_PATH"] = image_path
os.environ["LEARNING_RATE"] = str(learning_rate)
os.environ["NUM_STEPS"] = str(num_steps)
os.environ["BATCH_SIZE"] = str(batch_size)
os.environ["GRADIENT_ACCUMULATION"] = str(gradient_accumulation)
os.environ["RESOLUTION"] = str(resolution)
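
To make the hyperparameters above concrete, here is a minimal sketch of what they imply for training. It assumes the usual meaning of gradient accumulation (gradients from several micro-batches are combined before each optimizer update) and that `num_steps` counts optimizer updates; neither assumption is confirmed here against the AutoTrain source. The helper name `effective_batch_size` is hypothetical and not part of AutoTrain.

```python
def effective_batch_size(batch_size: int, gradient_accumulation: int) -> int:
    """Number of images contributing to a single optimizer update."""
    return batch_size * gradient_accumulation


# values copied from the configuration cell above
batch_size = 1
gradient_accumulation = 4
num_steps = 500

effective = effective_batch_size(batch_size, gradient_accumulation)

# assumption: each of the num_steps updates consumes one effective batch
samples_seen = effective * num_steps

print(f"effective batch size: {effective}")
print(f"training samples processed: {samples_seen}")
```

With only a handful of instance images in `image_path`, each image would be revisited many times under these assumptions, which is typical for DreamBooth-style fine-tuning.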

## Quick Test of Base Model without Fine-tuning

In [None]:
from diffusers import DiffusionPipeline, StableDiffusionXLImg2ImgPipeline
import torch

device = "cpu" # cpu or cuda

In [None]:
pipeline = DiffusionPipeline.from_pretrained(
    model_name_base,
    torch_dtype=torch.float16,
)
pipeline.to(device)

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    model_name_refiner,
    torch_dtype=torch.float16,
)
refiner.to(device)

subject_prompt = """photo of car, sporty, fast, sleek, sexy, 
aggressive, high performance, daytime, futuristic cityscape"""

subject_negative_prompt = """person, people, human, rider, floating objects, text, 
words, writing, letters, phrases, trademark, watermark, icon, logo, banner, signature, 
username, monochrome, cropped, cut-off"""

refiner_prompt = "ultra-high-definition, photorealistic, 8k uhd, high-quality, ultra sharp detail"

refiner_negative_prompt = """low quality, low-resolution, out of focus, blurry, 
grainy, artifacts, defects, jpeg artifacts, noise"""

for seed in range(0, 10):
    generator = torch.Generator(device).manual_seed(seed)
    base_image = pipeline(
        prompt=f"{subject_prompt}, {refiner_prompt}", 
        negative_prompt=f"{subject_negative_prompt}, {refiner_negative_prompt}",
        num_inference_steps=100,
        generator=generator,
        height=1024,
        width=1024,
        output_type="latent",
    ).images[0]
        
    refined_image = refiner(
        prompt=refiner_prompt, 
        negative_prompt=refiner_negative_prompt,
        num_inference_steps=20,
        generator=generator, 
        image=base_image,
    ).images[0]

    refined_image.save(f"./generated_images/car_base_model_photo_square_{seed}.png")

## Fine-tuning the SDXL 1.0 Base Model using DreamBooth

In [None]:
# autotrain references:
# https://huggingface.co/spaces/lora-library/LoRA-DreamBooth-Training-UI/resolve/main/train_dreambooth_lora.py
# https://github.com/huggingface/autotrain-advanced/blob/main/src/autotrain/trainers/dreambooth/params.py

!autotrain dreambooth --help

In [None]:
!autotrain dreambooth \
    --model ${MODEL_NAME} \
    --project-name ${PROJECT_NAME} \
    --image-path "${IMAGE_PATH}" \
    --prompt "${INSTANCE_PROMPT}" \
    --class-prompt "${CLASS_PROMPT}" \
    --resolution ${RESOLUTION} \
    --batch-size ${BATCH_SIZE} \
    --num-steps ${NUM_STEPS} \
    --gradient-accumulation ${GRADIENT_ACCUMULATION} \
    --lr ${LEARNING_RATE} \
    --fp16 \
    --gradient-checkpointing
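
The shell command above reads its values from environment variables. As an alternative, the same argument list could be assembled in Python, which makes the quoting of multi-word prompts explicit. The sketch below only builds and prints the command; it does not run it. The flags are the ones used in the cell above, while the helper name `build_autotrain_command` and the `config` dictionary are hypothetical.

```python
import shlex


def build_autotrain_command(config: dict) -> list:
    """Assemble the `autotrain dreambooth` argument list from a config dict."""
    return [
        "autotrain", "dreambooth",
        "--model", config["model_name"],
        "--project-name", config["project_name"],
        "--image-path", config["image_path"],
        "--prompt", config["instance_prompt"],
        "--class-prompt", config["class_prompt"],
        "--resolution", str(config["resolution"]),
        "--batch-size", str(config["batch_size"]),
        "--num-steps", str(config["num_steps"]),
        "--gradient-accumulation", str(config["gradient_accumulation"]),
        "--lr", str(config["learning_rate"]),
        "--fp16",
        "--gradient-checkpointing",
    ]


# values copied from the configuration cell earlier in the notebook
config = {
    "model_name": "stabilityai/stable-diffusion-xl-base-1.0",
    "project_name": "mb_amg_gt_oue_dreambooth",
    "image_path": "./images/car/",
    "instance_prompt": "photo of oue car",
    "class_prompt": "photo of a car",
    "resolution": 1024,
    "batch_size": 1,
    "num_steps": 500,
    "gradient_accumulation": 4,
    "learning_rate": 1e-4,
}

command = build_autotrain_command(config)

# shlex.join quotes arguments that contain spaces, such as the prompts
print(shlex.join(command))
```

Once `autotrain-advanced` is installed, a list like this could be passed to `subprocess.run(command, check=True)` instead of using the `!` shell escape.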

## Quick Test of Fine-tuned Model

In [None]:
pipeline = DiffusionPipeline.from_pretrained(
    model_name_base,
    torch_dtype=torch.float16,
)
pipeline.to(device)

pipeline.load_lora_weights(
    project_name, 
    weight_name="pytorch_lora_weights.safetensors"
)

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    model_name_refiner,
    torch_dtype=torch.float16,
)
refiner.to(device)

subject_prompt = """oue, photo of oue car, sporty, fast, sleek, sexy, 
aggressive, high performance, daytime, futuristic cityscape"""

subject_negative_prompt = """person, people, human, rider, floating objects, text, 
words, writing, letters, phrases, trademark, watermark, icon, logo, banner, signature, 
username, monochrome, cropped, cut-off"""

refiner_prompt = "ultra-high-definition, photorealistic, 8k uhd, high-quality, ultra sharp detail"

refiner_negative_prompt = """low quality, low-resolution, out of focus, blurry, 
grainy, artifacts, defects, jpeg artifacts, noise"""

for seed in range(0, 10):
    generator = torch.Generator(device).manual_seed(seed)
    base_image = pipeline(
        prompt=f"{subject_prompt}, {refiner_prompt}", 
        negative_prompt=f"{subject_negative_prompt}, {refiner_negative_prompt}",
        num_inference_steps=100,
        generator=generator,
        height=1024,
        width=1024,
        output_type="latent",
    ).images[0]
        
    refined_image = refiner(
        prompt=refiner_prompt, 
        negative_prompt=refiner_negative_prompt,
        num_inference_steps=20,
        generator=generator, 
        image=base_image,
    ).images[0]

    refined_image.save(f"./generated_images/car_finetuned_model_photo_square_{seed}.png")

## Rough Product Sketches of Scooter with Fine-tuned Model

In [None]:
pipeline = DiffusionPipeline.from_pretrained(
    model_name_base,
    torch_dtype=torch.float16,
)
pipeline.to(device)

pipeline.load_lora_weights(
    project_name, 
    weight_name="pytorch_lora_weights.safetensors"
)

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    model_name_refiner,
    torch_dtype=torch.float16,
)
refiner.to(device)

subject_prompt = """oue, marker rendering of oue electric scooter, concept art, 
futuristic cityscape, high contrast, black and white, black marker, marker drawing, 
sketch, monochromatic illustration, illustrative, graphic, muted, expressive strokes"""

subject_negative_prompt = """person, people, human, rider, floating objects, colors, text, 
words, writing, letters, phrases, trademark, watermark, icon, logo, banner, signature, 
username, cropped, cut-off, patterned background"""

# we don't want a photographic image
refiner_prompt = "sharp, crisp, in-focus, uncropped, high-quality"

refiner_negative_prompt = """photographic, photo, photorealistic, low quality, low-resolution, 
out of focus, blurry, grainy, artifacts, defects, jpeg artifacts, noise"""

for seed in range(0, 10):
    generator = torch.Generator(device).manual_seed(seed)
    base_image = pipeline(
        prompt=f"{subject_prompt}, {refiner_prompt}", 
        negative_prompt=f"{subject_negative_prompt}, {refiner_negative_prompt}",
        num_inference_steps=100,
        generator=generator,
        height=768,
        width=1024,
        output_type="latent",
    ).images[0]
        
    refined_image = refiner(
        prompt=refiner_prompt, 
        negative_prompt=refiner_negative_prompt,
        num_inference_steps=20,
        generator=generator, 
        image=base_image,
    ).images[0]
    
    refined_image.save(f"./generated_images/scooter_finetuned_sketch_wide_{seed}.png")

## Color Marker Renderings of Scooter with Fine-tuned Model

In [None]:
pipeline = DiffusionPipeline.from_pretrained(
    model_name_base,
    torch_dtype=torch.float16,
)
pipeline.to(device)

pipeline.load_lora_weights(
    project_name, 
    weight_name="pytorch_lora_weights.safetensors"
)

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    model_name_refiner,
    torch_dtype=torch.float16,
)
refiner.to(device)

subject_prompt = """oue, marker rendering of oue electric scooter, concept art, 
futuristic cityscape, solid color background, bright vibrant colors, marker, sketch, 
illustration, illustrative, marker drawing, expressive strokes, graphic"""

subject_negative_prompt = """person, people, human, rider, floating objects, text, 
words, writing, letters, phrases, trademark, watermark, icon, logo, banner, signature, 
username, monochrome, cropped, cut-off, patterned background"""

# we don't want a photographic image
refiner_prompt = "sharp, crisp, in-focus, uncropped, high-quality"

refiner_negative_prompt = """photographic, photo, photorealistic, low quality, 
low-resolution, out of focus, blurry, grainy, artifacts, defects, jpeg artifacts, noise"""

for seed in range(0, 10):
    generator = torch.Generator(device).manual_seed(seed)
    base_image = pipeline(
        prompt=f"{subject_prompt}, {refiner_prompt}", 
        negative_prompt=f"{subject_negative_prompt}, {refiner_negative_prompt}",
        num_inference_steps=100,
        generator=generator,
        height=768,
        width=1024,
        output_type="latent",
    ).images[0]

    refined_image = refiner(
        prompt=refiner_prompt, 
        negative_prompt=refiner_negative_prompt,
        num_inference_steps=20,
        generator=generator, 
        image=base_image,
    ).images[0]
    
    refined_image.save(f"./generated_images/scooter_finetuned_color_marker_wide_{seed}.png")

## Photorealistic Images of Scooter with Fine-tuned Model

In [None]:
pipeline = DiffusionPipeline.from_pretrained(
    model_name_base,
    torch_dtype=torch.float16,
)
pipeline.to(device)

pipeline.load_lora_weights(
    project_name, 
    weight_name="pytorch_lora_weights.safetensors"
)

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    model_name_refiner,
    torch_dtype=torch.float16,
)
refiner.to(device)

subject_prompt = """oue, photo of a oue electric scooter, sleek, smooth curves, colorful, 
daytime, urban, futuristic cityscape"""

subject_negative_prompt = """person, people, human, rider, floating objects, text, 
words, writing, letters, phrases, trademark, watermark, icon, logo, banner, signature, 
username, monochrome, cropped, cut-off, patterned background"""

refiner_prompt = "ultra-high-definition, photorealistic, 8k uhd, high-quality, ultra sharp detail"

refiner_negative_prompt = """low quality, low-resolution, out of focus, blurry, 
grainy, artifacts, defects, jpeg artifacts, noise"""

for seed in range(0, 1):
    generator = torch.Generator(device).manual_seed(seed)
    base_image = pipeline(
        prompt=f"{subject_prompt}, {refiner_prompt}", 
        negative_prompt=f"{subject_negative_prompt}, {refiner_negative_prompt}",
        num_inference_steps=100,
        generator=generator,
        height=768,
        width=1024,
        output_type="latent",
    ).images[0]
        
    refined_image = refiner(
        prompt=refiner_prompt, 
        negative_prompt=refiner_negative_prompt,
        num_inference_steps=20,
        generator=generator, 
        image=base_image,
    ).images[0]
    
    refined_image.save(f"./generated_images/scooter_finetuned_photo_wide_{seed}.png")

## Scaling Images

Use the `StableDiffusionUpscalePipeline` with the `stabilityai/stable-diffusion-x4-upscaler` model to upscale an existing image. Upscaling can also be added as a final step in the generation pipeline.
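
A 4x upscaler multiplies each side of the image by four, so the pixel count grows sixteen-fold. The minimal sketch below works through that arithmetic for the 1024 x 768 size used by the scooter cells above; the helper name `upscaled_size` is hypothetical. It is plain arithmetic only and does not call the upscaler.

```python
def upscaled_size(width: int, height: int, factor: int = 4) -> tuple:
    """Return the (width, height) produced by scaling each side by `factor`."""
    return width * factor, height * factor


# the wide format used for the scooter images earlier in the notebook
source_width, source_height = 1024, 768

target_width, target_height = upscaled_size(source_width, source_height)
pixel_ratio = (target_width * target_height) // (source_width * source_height)

print(f"{source_width}x{source_height} -> {target_width}x{target_height}")
print(f"pixel count grows {pixel_ratio}x")
```

Because the output grows this quickly, it may be worth checking the target size against available GPU memory before upscaling a large image.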

In [None]:
from PIL import Image
image = Image.open(r"./image_samples/image_705.png")
image.show()

In [None]:
from diffusers import StableDiffusionUpscalePipeline

upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    model_name_upscaler_4x,
    torch_dtype=torch.float16,
)

upscaler.to(device)

refiner_prompt = ""

refiner_negative_prompt = """low quality, low-resolution, out of focus, blurry, grainy, artifacts, defects, 
jpeg artifacts, noise"""

generator = torch.Generator(device).manual_seed(0)

upscaled_image = upscaler(
    prompt=refiner_prompt,
    negative_prompt=refiner_negative_prompt,
    num_inference_steps=15,
    generator=generator,
    image=image,
).images[0]

# the generator above is seeded with 0; avoid relying on a leftover loop variable
upscaled_image.save("./upscaled_example_0.png")

### CUDA Memory Issues using GPU vs. CPU

Single-GPU instances, even an `ml.g5.4xlarge`, will frequently run out of memory (`OutOfMemoryError`). When this happens, the notebook environment's instance needs to be shut down and then restarted; the `Restart Kernel...` command alone is not sufficient.

Example CUDA memory error resulting from using anything smaller than an `ml.g5.2xlarge` GPU instance:

```OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 14.76 GiB total capacity; 13.21 GiB already allocated; 362.75 MiB free; 13.45 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF```

Example CUDA memory error resulting from using anything smaller than an `ml.g5.4xlarge` GPU instance:

```OutOfMemoryError: CUDA out of memory. Tried to allocate 1.98 GiB (GPU 0; 22.20 GiB total capacity; 15.94 GiB already allocated; 1.26 GiB free; 19.60 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF```
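
The error text suggests comparing reserved memory with allocated memory to judge whether fragmentation is involved. The following minimal sketch extracts those figures from a message in the format shown above and reports the gap. The helper name `parse_oom_message` is hypothetical, and the regular expressions assume this exact wording; other PyTorch versions may phrase the message differently.

```python
import re

# conversion factors to GiB for the units that appear in the message
UNITS_IN_GIB = {"MiB": 1 / 1024, "GiB": 1.0}

# one pattern per figure, matching the wording of the errors shown above
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "total": r"([\d.]+) (MiB|GiB) total capacity",
    "allocated": r"([\d.]+) (MiB|GiB) already allocated",
    "free": r"([\d.]+) (MiB|GiB) free",
    "reserved": r"([\d.]+) (MiB|GiB) reserved",
}


def parse_oom_message(message: str) -> dict:
    """Extract the memory figures, in GiB, from a CUDA out-of-memory message."""
    figures = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match:
            figures[name] = float(match.group(1)) * UNITS_IN_GIB[match.group(2)]
    return figures


message = (
    "OutOfMemoryError: CUDA out of memory. Tried to allocate 1.98 GiB "
    "(GPU 0; 22.20 GiB total capacity; 15.94 GiB already allocated; "
    "1.26 GiB free; 19.60 GiB reserved in total by PyTorch)"
)

figures = parse_oom_message(message)

# memory PyTorch holds in its cache but has not handed out to tensors
gap = round(figures["reserved"] - figures["allocated"], 2)

print(figures)
print(f"reserved but unallocated: {gap} GiB")
```

In the second example the gap is larger than the 1.98 GiB request, so the `max_split_size_mb` setting of `PYTORCH_CUDA_ALLOC_CONF` mentioned in the error might be worth trying there. In the first example the gap is much smaller, which points more toward the GPU simply being too small for the workload.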

In [None]:
print(torch.cuda.mem_get_info())

In [None]:
print(torch.cuda.memory_summary(device="cuda", abbreviated=False))

In [None]:
import gc
gc.collect()

import torch
torch.cuda.empty_cache()
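
Each section of this notebook reloads the base pipeline and the refiner, so dropping references to the previous objects before loading new ones may help reduce memory pressure. The sketch below wraps the two calls above in a hypothetical helper, `free_accelerator_memory`, that also works when PyTorch or a GPU is not available. It is not guaranteed to prevent the out-of-memory errors described above.

```python
import gc


def free_accelerator_memory() -> int:
    """Run garbage collection, then release PyTorch's cached CUDA memory if possible.

    Returns the number of unreachable objects found by the garbage collector.
    """
    collected = gc.collect()
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; only Python-level cleanup was possible
        return collected
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return collected


# placeholders standing in for the loaded pipeline objects in this notebook
pipeline = object()
refiner = object()

# drop the references first, otherwise the memory they hold cannot be reclaimed
del pipeline, refiner
collected = free_accelerator_memory()
print(f"garbage collector found {collected} unreachable objects")
```

The order matters: `torch.cuda.empty_cache()` can only return memory that no live tensor is using, so the references must be deleted before it is called.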