## Fine-Tuning Stable Diffusion with LoRA

I created this file using the following resources:

*   [Fine-Tuning Stable Diffusion with LoRA](https://machinelearningmastery.com/fine-tuning-stable-diffusion-with-lora) (Webpage)
*   [Fine-Tune Stable Diffusion with LoRA for as Low as $1](https://youtu.be/Zev6F0T1L3Y?t=458) (Video)
*   [Pokémon Dataset](https://huggingface.co/datasets/svjack/pokemon-blip-captions-en-zh) (Dataset)
*   [Diffusers GitHub Repository](https://github.com/huggingface/diffusers/) (GitHub)
*   [How to Fine-Tune with LoRA by Hugging Face](https://huggingface.co/docs/diffusers/en/training/lora) (Documentation)

### 1. Prepare the Dataset
In this guide, we will fine-tune a Stable Diffusion model using LoRA with the **[Pokémon dataset](https://huggingface.co/datasets/svjack/pokemon-blip-captions-en-zh)** from Hugging Face. You can swap in a different dataset by changing the dataset name and caption column passed to the training script.

If you are creating your **own dataset**, prepare a CSV file named `metadata.csv` in the same folder as your images. The first column, `file_name`, lists each image file, and the second column holds the corresponding text caption. [[1](https://machinelearningmastery.com/fine-tuning-stable-diffusion-with-lora/)]
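
As a rough illustration of that layout, the sketch below writes a `metadata.csv` from a dictionary of captions. The helper name `write_metadata`, the file names, the captions, and the caption column name `text` are all assumptions made for this example. Whatever column name you pick should match the `--caption_column` argument passed to the training script.

```python
import csv
import tempfile
from pathlib import Path


def write_metadata(image_dir, captions, caption_column="text"):
    """Write metadata.csv into image_dir and return its path.

    `captions` maps an image file name (relative to image_dir) to its caption.
    """
    image_dir = Path(image_dir)
    image_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = image_dir / "metadata.csv"
    with metadata_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header row: `file_name` first, then the caption column.
        writer.writerow(["file_name", caption_column])
        for file_name, caption in captions.items():
            writer.writerow([file_name, caption])
    return metadata_path


# Hypothetical file names and captions, for illustration only.
example_captions = {
    "0001.png": "a drawing of a green creature with red eyes",
    "0002.png": "a cartoon bird with blue wings",
}

# A temporary folder stands in for your image folder in this demo.
dataset_dir = Path(tempfile.mkdtemp()) / "my_dataset"
metadata_file = write_metadata(dataset_dir, example_captions)
print(metadata_file.read_text(encoding="utf-8"))
```

In practice you would point `write_metadata` at the folder that already contains your images, so that each `file_name` entry resolves to a real file.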

### 2. Set Up the Environment
Next, install the required libraries, including `diffusers`, `accelerate`, and `wandb` [[1](https://machinelearningmastery.com/fine-tuning-stable-diffusion-with-lora/)]. The training script used in this guide can be found [here](https://raw.githubusercontent.com/huggingface/diffusers/main/examples/text_to_image/train_text_to_image_lora.py).

### 3. Run the Training Script
Use the `accelerate launch` command with the training script, specifying the dataset, model, and hyperparameters. A detailed explanation of each parameter can be found [here](https://learnopencv.com/fine-tuning-stable-diffusion-3-5m/).

### 4. Training Process
The training process can take several hours, even with a high-end GPU.
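
To get a feel for why the run is long, here is a back-of-the-envelope calculation using the hyperparameters from the launch command in this notebook (`--train_batch_size=1`, `--gradient_accumulation_steps=4`, `--max_train_steps=15000`). It assumes that `--max_train_steps` counts optimizer updates, with one update per `gradient_accumulation_steps` batches. The dataset size of 1000 is a hypothetical placeholder, not the real size of the Pokémon dataset.

```python
def samples_processed(max_train_steps, train_batch_size,
                      gradient_accumulation_steps, num_processes=1):
    """Return (effective batch size, total training samples seen)."""
    effective_batch_size = (
        train_batch_size * gradient_accumulation_steps * num_processes
    )
    return effective_batch_size, max_train_steps * effective_batch_size


def approximate_epochs(total_samples, dataset_size):
    """Rough number of passes over a dataset of `dataset_size` images."""
    return total_samples / dataset_size


effective_batch, total_samples = samples_processed(
    max_train_steps=15000,
    train_batch_size=1,
    gradient_accumulation_steps=4,
)
print(effective_batch)  # 4
print(total_samples)    # 60000

# Hypothetical dataset size, for illustration only.
print(approximate_epochs(total_samples, dataset_size=1000))  # 60.0
```

Lowering `--max_train_steps` is the most direct way to shorten a trial run while you check that everything works.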

### 5. Using Your Trained LoRA Model
After training, you will have a small weight file, typically named `pytorch_lora_weights.safetensors`. You can use it by loading it into a Stable Diffusion pipeline with:
`pipe.unet.load_attn_procs(model_path)`
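
A minimal inference sketch follows. It uses the pipeline-level `pipe.load_lora_weights(...)` loader from `diffusers` rather than the `load_attn_procs` call quoted above, and it assumes a CUDA GPU and that training has already written the LoRA weights to the output directory used in this notebook. The function name `generate_with_lora`, the step count, and the output file name are choices made for this example.

```python
def generate_with_lora(
    lora_path="./finetune_lora/pokemon",
    base_model="runwayml/stable-diffusion-v1-5",
    prompt="A pokemon with blue eyes.",
    seed=1337,
):
    """Load the base pipeline, attach the LoRA weights, and return one image."""
    # Imported lazily so the function can be defined without a GPU runtime.
    import torch
    from diffusers import StableDiffusionPipeline

    # Load the same base model that was used for fine-tuning.
    pipe = StableDiffusionPipeline.from_pretrained(
        base_model, torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # assumption: a CUDA device is available

    # Attach the trained LoRA weights found at `lora_path`.
    pipe.load_lora_weights(lora_path)

    # Fix the seed so repeated calls give comparable images.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    result = pipe(prompt, num_inference_steps=30, generator=generator)
    return result.images[0]


# Example usage (downloads the base model and needs a GPU):
# image = generate_with_lora()
# image.save("pokemon.png")
```

Generating the same prompt and seed with and without the `load_lora_weights` call is a simple way to see what the fine-tuning changed.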


In [1]:
!pip install git+https://github.com/huggingface/diffusers
!pip install accelerate wandb
!pip install -r https://raw.githubusercontent.com/huggingface/diffusers/main/examples/text_to_image/requirements.txt

!accelerate config default
# accelerate configuration saved at $HOME/.cache/huggingface/accelerate/default_config.yaml

Collecting git+https://github.com/huggingface/diffusers
  Cloning https://github.com/huggingface/diffusers to /tmp/pip-req-build-4wk6we6_
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/diffusers /tmp/pip-req-build-4wk6we6_
  Resolved https://github.com/huggingface/diffusers to commit 464374fb87610c53b2cf81e08d3df628fada3ce4
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: diffusers
  Building wheel for diffusers (pyproject.toml) ... done
  Created wheel for diffusers: filename=diffusers-0.33.0.dev0-py3-none-any.whl size=3288304 sha256=c8dcf28316d0e1e9c9eee6f844a7eeb7841dfaf3d00eb17d526a05e0d02c9a9c
  Stored in directory: /tmp/pip-ephem-wheel-cache-wzt492fg/wheels/90/fb/48/a310c271ab42899362ff272062ced42133e5c4c9d0ce77df68
Successfully built diffusers
Installing collected packa

In [2]:
import wandb
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler, AutoPipelineForText2Image
from huggingface_hub import model_info

Now we download the training script used for fine-tuning.

In [4]:
!wget -q https://raw.githubusercontent.com/huggingface/diffusers/main/examples/text_to_image/train_text_to_image_lora.py

In [17]:
!ls
%cd sample_data/
!ls
%cd ..

sample_data  train_text_to_image_lora.py
/content/sample_data
anscombe.json		     california_housing_train.csv  mnist_train_small.csv
california_housing_test.csv  mnist_test.csv		   README.md
/content


In [None]:
%env MODEL_NAME=runwayml/stable-diffusion-v1-5
%env OUTPUT_DIR=./finetune_lora/pokemon
%env HUB_MODEL_ID=pokemon-lora
%env DATASET_NAME=svjack/pokemon-blip-captions-en-zh


In [None]:
# MODEL_NAME, OUTPUT_DIR, HUB_MODEL_ID, and DATASET_NAME are set with %env in the previous cell.
# A `!export` line would not persist here, because each `!` command runs in its own subshell.

!mkdir -p $OUTPUT_DIR

!accelerate launch --mixed_precision="bf16" train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --dataloader_num_workers=8 \
  --resolution=512 \
  --center_crop \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=15000 \
  --learning_rate=1e-04 \
  --max_grad_norm=1 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=0 \
  --output_dir=${OUTPUT_DIR} \
  --checkpointing_steps=500 \
  --caption_column="en_text" \
  --validation_prompt="A pokemon with blue eyes." \
  --seed=1337

2025-02-10 10:02:05.124237: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1739181725.166864    8473 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1739181725.181577    8473 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
02/10/2025 10:02:09 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: bf16

scheduler/scheduler_config.json: 100% 308/308 [00:00<00:00, 2.78MB/s]
{'rescale_betas_zero_snr', 'prediction_type', 'thresholding', 'timestep_spacing', 'dynamic_thresholding_ratio', 'clip_sample_range', 'sample_max_value', 'variance_type'} was not found 