<a href="https://colab.research.google.com/github/skillfi/google-colab/blob/notebooks/diffusion_finetune_skyscrapers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Step 1: Download the base model and sample some images from it

Install the Hugging Face Python packages (`bitsandbytes` provides the 8-bit Adam optimizer used during training)

In [None]:
%%capture
!pip install git+https://github.com/huggingface/diffusers.git
!pip install accelerate
!pip install datasets
!pip install bitsandbytes

Set an environment variable for the base model to be fine-tuned

In [None]:
%env MODEL_NAME=runwayml/stable-diffusion-v1-5

Define a simple helper that plots the list of images returned by the pipeline

In [None]:
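def plot_images(images):
    """Show the generated images side by side in a single row."""
    from matplotlib import pyplot as plt
    # squeeze=False always returns a 2D array of axes, even for a single image
    fig, axarr = plt.subplots(1, len(images), figsize=(20, 10), squeeze=False)
    for ax, img in zip(axarr.flatten(), images):
        ax.imshow(img)
        ax.axis('off')
    plt.show()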

Import the relevant python libraries to load and sample from the Stable Diffusion model

In [None]:
import os
import torch

from diffusers import StableDiffusionPipeline
from diffusers import DiffusionPipeline


In this step we initialise the model and move it to the GPU, which requires a GPU runtime in Google Colab. This step also triggers the download of the model weights. They are a few GB, so the download may take a little while, but it is usually faster than you would expect. 😏

In [None]:
pipe = StableDiffusionPipeline.from_pretrained(os.getenv('MODEL_NAME'), torch_dtype=torch.float16)
pipe = pipe.to("cuda")
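
Moving the pipeline to `"cuda"` raises an error on a CPU-only runtime, so it can help to check for a GPU before loading the model. The following is a minimal sketch; the helper name `pick_device` is hypothetical and not part of the original notebook. It falls back to `"cpu"` when PyTorch is not installed or no CUDA device is visible.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
if device != "cuda":
    # In Colab the runtime type is changed from the Runtime menu
    print("No GPU found: choose Runtime > Change runtime type and select a GPU.")
```

The returned string could then be passed to `pipe.to(device)`, although float16 inference on CPU is generally not practical.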

We set the text prompt for generating the images and run the actual generation.

In [None]:
prompt = "isometric view of a skyscraper in the style of a city building game"
images = pipe(prompt, num_images_per_prompt=6).images

Let's plot the images from the base model with our function:

In [None]:
plot_images(images)

**IMPORTANT:** We need to free the GPU memory before starting the actual training. First we delete the Python variables that hold the pipeline and the images, then we run the garbage collector, and finally we ask PyTorch to release its cached GPU memory.

In [None]:
# Flush the GPU memory to be able to run the training
del pipe
del images

In [None]:
import gc
gc.collect()
torch.cuda.empty_cache()
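
To confirm that the memory was actually released, the cleanup can be wrapped in a function that reports how many bytes PyTorch still holds. This is only a sketch under the assumption that a single GPU is in use; the name `free_gpu_memory` is hypothetical. Variables such as `pipe` and `images` still need to be deleted first, otherwise their tensors stay alive.

```python
import gc


def free_gpu_memory():
    """Collect garbage, release cached CUDA blocks, and return the bytes
    still allocated by tensors (None if no GPU or PyTorch is available)."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()
    return torch.cuda.memory_allocated()


remaining = free_gpu_memory()
if remaining is not None:
    print(f"Still allocated on the GPU: {remaining / 1024**2:.1f} MiB")
```

A value close to zero suggests that no references to the old pipeline remain.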

## Step 2: Fine-tune the model

Let's clone the Hugging Face diffusers repository, which contains the fine-tuning script. The dataset does not need to be cloned: the script loads it from the Hugging Face Hub by name.

In [None]:
!git clone https://github.com/huggingface/diffusers.git

Cloning into 'diffusers'...
remote: Enumerating objects: 65872, done.
remote: Counting objects: 100% (15556/15556), done.
remote: Compressing objects: 100% (1763/1763), done.
remote: Total 65872 (delta 14736), reused 13940 (delta 13729), pack-reused 50316
Receiving objects: 100% (65872/65872), 45.98 MiB | 15.36 MiB/s, done.
Resolving deltas: 100% (48896/48896), done.


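Set some flags for the fine-tuning script: the dataset to use, the base model, and the length of the training run. Note that `max_training_epochs` is passed to `--max_train_steps` in the training command, so despite its name it counts optimizer steps, not epochs. Since the dataset is small and we want to fine-tune quickly on the Colab free tier, lower this value for a shorter run.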

In [None]:
%%capture
%env dataset_name=SkillFi/diffusion-people-1.0
%env MODEL_NAME=runwayml/stable-diffusion-v1-5
# No need to train the model for long to see meaningful results.
# Despite the name, this value is passed to --max_train_steps (optimizer steps).
%env max_training_epochs=1000
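
Because the training command uses this value as a step count, it helps to translate it into epochs. The sketch below does that arithmetic with the batch size (1) and gradient accumulation (4) from the training command in this notebook. The dataset size of 40 images is a hypothetical placeholder, and the formula is an approximation for a single-GPU run, not a copy of the script's logic.

```python
import math


def steps_to_epochs(max_train_steps, dataset_size,
                    train_batch_size=1, gradient_accumulation_steps=4):
    """Estimate how many passes over the dataset a step budget amounts to."""
    batches_per_epoch = math.ceil(dataset_size / train_batch_size)
    # One optimizer step happens every `gradient_accumulation_steps` batches
    updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    return math.ceil(max_train_steps / updates_per_epoch)


# Hypothetical dataset of 40 images: 10 optimizer steps per epoch
print(steps_to_epochs(1000, dataset_size=40))  # → 100
```

With these settings each optimizer step sees 4 images, so 1000 steps on a small dataset means many repeated passes over the same images.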

Now we run the actual fine-tuning script. **IMPORTANT:** If you want to train on Google's free T4 GPU, it's crucial to add the flag

```
--use_8bit_adam
```
The fine-tuned LoRA weights are saved in the *diffusion-people-3.1* folder and pushed to the Hub under `SkillFi/diffusion-people-3.1`


In [None]:
from huggingface_hub import login
login()  # prompts for a token with write access; never hard-code a token in the notebook

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful
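
For non-interactive runs, one option is to read the token from an environment variable so that it never appears in the notebook source. This is a minimal sketch; the helper `get_hf_token` is hypothetical, and it assumes the token has been stored in `HF_TOKEN` beforehand.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Read a Hugging Face token from the environment, failing loudly if unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to a token with write access."
        )
    return token


# Usage in the notebook (requires huggingface_hub):
# from huggingface_hub import login
# login(token=get_hf_token())
```

A token that has been committed to a public notebook should be treated as compromised and revoked in the Hugging Face account settings.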


In [None]:
!pip install peft

Collecting peft
  Downloading peft-0.11.1-py3-none-any.whl (251 kB)
Installing collected packages: peft
Successfully installed peft-0.11.1


In [None]:
# The --use_8bit_adam flag is crucial to be able to train on the T4 GPU which has only 15GB of memory
!accelerate launch diffusers/examples/text_to_image/train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$dataset_name \
  --mixed_precision="fp16" \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --use_8bit_adam \
  --gradient_accumulation_steps=4 \
  --max_train_steps=$max_training_epochs \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="cosine" --lr_warmup_steps=0 \
  --output_dir="diffusion-people-3.1" \
  --push_to_hub \
  --hub_model_id="SkillFi/diffusion-people-3.1" \
  --report_to=wandb \
  --checkpointing_steps=500 \
  --validation_prompt="Karina" \
  --seed=1336

Streaming output truncated to the last 5000 lines.
Steps:  16% 16/100 [03:24<18:55, 13.51s/it, lr=9.38e-6, step_loss=0.142]
{'requires_safety_checker', 'image_encoder'} was not found in config. Values will be initialized to default values.

Loading pipeline components...:   0% 0/7 [00:00<?, ?it/s]
Loaded safety_checker as StableDiffusionSafetyChecker from `safety_checker` subfolder of runwayml/stable-diffusion-v1-5.

Loading pipeline components...:  14% 1/7 [00:00<00:04,  1.22it/s]
{'prediction_type', 'timestep_spacing'} was not found in config. Values will be initialized to default values.
Loaded scheduler as PNDMScheduler from `scheduler` subfolder of runwayml/stable-diffusion-v1-5.
{'shift_factor', 'use_quant_conv', 'force_upcast', 'scaling_factor', 'use_post_quant_conv', 'latents_mean', 'latents_std'} was not found in config. Values will be initialized to default values.
Loaded vae as AutoencoderKL from `vae` subfolder of runwayml/stable-diffusion-v1-5.

Loading pi

## Step 3: Sample from the finetuned model

We load the base model on the GPU again and apply the fine-tuned weights on top of it. The LoRA training script saves only the adapter weights, not a full pipeline, so calling `from_pretrained` directly on `SkillFi/diffusion-people-3.1` fails with a 404 `EntryNotFoundError` for `model_index.json`. This time nothing new has to be downloaded: the base model is already cached and the adapter sits in the local *diffusion-people-3.1* folder. Then we generate some more images.

In [None]:
# Load the base pipeline, then apply the LoRA adapter saved in the local --output_dir
pipe = StableDiffusionPipeline.from_pretrained(os.getenv('MODEL_NAME'), torch_dtype=torch.float16)
pipe.load_lora_weights('diffusion-people-3.1')
pipe = pipe.to("cuda")
prompt = "Karina"
images = pipe(prompt, num_images_per_prompt=6).images
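
A missing `model_index.json` usually means the folder holds something other than a full pipeline. The sketch below inspects a local output folder to tell the two cases apart. The helper `checkpoint_kind` is hypothetical, and the file name `pytorch_lora_weights.safetensors` is an assumption about the default name that diffusers uses when saving LoRA weights.

```python
from pathlib import Path


def checkpoint_kind(folder):
    """Classify a local folder as a full pipeline, a LoRA adapter, or unknown."""
    folder = Path(folder)
    if (folder / "model_index.json").is_file():
        return "pipeline"  # loadable with StableDiffusionPipeline.from_pretrained
    if (folder / "pytorch_lora_weights.safetensors").is_file():
        return "lora"      # apply on top of a base pipeline with load_lora_weights
    return "unknown"


print(checkpoint_kind("diffusion-people-3.1"))
```

If the result is `"lora"`, the base model has to be loaded first and the adapter applied to it afterwards.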

The new images! Hopefully the quality improved thanks to the fine-tuning process. You can vary the number of training steps to see how fine-tuning affects the final output 💪

In [None]:
plot_images(images)