# Simple LoRA Training with Custom Dataset
## Following the Machine Learning Mastery Blog

This notebook follows the blog post step-by-step to train a LoRA on your custom dataset.


## Step 1: Install Required Libraries


In [3]:
# Install required libraries as mentioned in the blog
%pip install git+https://github.com/huggingface/diffusers
%pip install accelerate wandb
%pip install -r https://raw.githubusercontent.com/huggingface/diffusers/main/examples/text_to_image/requirements.txt
# Quote the specifier so the shell does not treat ">" as an output redirect
%pip install "peft>=0.17.0"

# Configure accelerate
!accelerate config default

Collecting git+https://github.com/huggingface/diffusers
  Cloning https://github.com/huggingface/diffusers to /tmp/pip-req-build-9fhmtq69
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/diffusers /tmp/pip-req-build-9fhmtq69
  Resolved https://github.com/huggingface/diffusers to commit a58a4f665b4aa86205fb8c1795e79c331d65bb18
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Configuration already exists at /root/.cache/huggingface/accelerate/default_config.yaml, will not override. Run `accelerate config` manually or pass a different `save_location`.


In [8]:
%pip install peft==0.17.0

Collecting peft==0.17.0
  Downloading peft-0.17.0-py3-none-any.whl.metadata (14 kB)
Downloading peft-0.17.0-py3-none-any.whl (503 kB)
Installing collected packages: peft
  Attempting uninstall: peft
    Found existing installation: peft 0.7.0
    Uninstalling peft-0.7.0:
      Successfully uninstalled peft-0.7.0
Successfully installed peft-0.17.0


In [9]:
# Test imports as suggested in the blog
import wandb
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler, AutoPipelineForText2Image
from huggingface_hub import model_info

print("✅ All imports successful!")
print(f"CUDA available: {torch.cuda.is_available()}")


✅ All imports successful!
CUDA available: True


## Step 2: Download Training Script


In [10]:
# Download the training script as mentioned in the blog
!wget -q https://raw.githubusercontent.com/huggingface/diffusers/main/examples/text_to_image/train_text_to_image_lora.py

print("✅ Training script downloaded!")


✅ Training script downloaded!


## Step 3: Prepare Your Dataset

Create a directory with:
- Your images (JPG, PNG, etc.)
- A `metadata.csv` file with columns: `file_name`, `caption`

Example metadata.csv:
```
file_name,caption
image_0.png,a drawing of a green pokemon with red eyes
image_1.png,a green and yellow toy with a red nose
image_2.png,a red and white ball with an angry look on its face
```
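
If the captions are not written yet, a small script can produce a starter `metadata.csv` with the two required columns. This is a minimal sketch and not part of the blog: `write_metadata_csv`, the placeholder caption, and the list of image extensions are all assumptions made for illustration.

```python
import csv
from pathlib import Path

# Assumed set of image extensions; adjust to match your files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def write_metadata_csv(dataset_dir, default_caption="TODO: describe this image"):
    """Write metadata.csv listing every image in dataset_dir with a placeholder caption.

    Hypothetical helper. Returns the number of images listed.
    """
    dataset_dir = Path(dataset_dir)
    image_names = sorted(
        p.name
        for p in dataset_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    metadata_path = dataset_dir / "metadata.csv"
    with open(metadata_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "caption"])  # header expected by the dataset loader
        for name in image_names:
            writer.writerow([name, default_caption])
    return len(image_names)
```

Calling `write_metadata_csv(DATASET_PATH)` overwrites any existing `metadata.csv`, so it is best run once, before the captions are edited by hand.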


In [19]:
# Set your dataset path here
DATASET_PATH = "/content/gdrive/MyDrive/arcade_comp_results/lora/datasets/channel_set"  # Change this to your dataset directory

import os
if os.path.exists(DATASET_PATH):
    print(f"✅ Dataset found at: {DATASET_PATH}")

    # Check for metadata.csv
    metadata_file = os.path.join(DATASET_PATH, "metadata.csv")
    if os.path.exists(metadata_file):
        print("✅ metadata.csv found")

        # Show dataset info
        import pandas as pd
        df = pd.read_csv(metadata_file)
        print(f"📊 Dataset has {len(df)} images")
        print("Sample entries:")
        print(df.head())
    else:
        print("❌ metadata.csv not found - please create it!")
else:
    print(f"❌ Dataset directory not found: {DATASET_PATH}")
    print("Please create your dataset directory with images and metadata.csv")


✅ Dataset found at: /content/gdrive/MyDrive/arcade_comp_results/lora/datasets/channel_set
✅ metadata.csv found


EmptyDataError: No columns to parse from file
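
The `EmptyDataError` above means that `metadata.csv` exists but contains no header row or data. The sketch below is one possible pre-flight check, using only the standard library, that reports this case along with missing columns, empty captions, and missing image files. `check_metadata` is a hypothetical helper, not something provided by the blog or by diffusers.

```python
import csv
import os

REQUIRED_COLUMNS = ("file_name", "caption")


def check_metadata(dataset_dir):
    """Return a list of human-readable problems found in dataset_dir/metadata.csv."""
    metadata_path = os.path.join(dataset_dir, "metadata.csv")
    if not os.path.exists(metadata_path):
        return ["metadata.csv not found"]

    # utf-8-sig also accepts files saved with a byte-order mark (common with Excel).
    with open(metadata_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None:
            return ["metadata.csv is empty (no header row)"]
        missing = [c for c in REQUIRED_COLUMNS if c not in reader.fieldnames]
        if missing:
            return [f"missing column(s): {', '.join(missing)}"]
        rows = list(reader)

    problems = []
    if not rows:
        problems.append("metadata.csv has a header but no rows")
    # Line numbers assume one record per line (no multi-line quoted captions).
    for line_no, row in enumerate(rows, start=2):
        file_name = (row["file_name"] or "").strip()
        if not (row["caption"] or "").strip():
            problems.append(f"line {line_no}: empty caption")
        if not file_name or not os.path.exists(os.path.join(dataset_dir, file_name)):
            problems.append(f"line {line_no}: image not found: {file_name!r}")
    return problems
```

An empty list from `check_metadata(DATASET_PATH)` suggests the folder is in the layout described in this step; any returned message points at what to fix before training.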

## Step 4: Configure Training Parameters

Following the blog's example parameters:


In [14]:
# Training configuration following the blog
MODEL_NAME = "runwayml/stable-diffusion-v1-5"
OUTPUT_DIR = "/content/gdrive/MyDrive/arcade_comp_results/lora/finetune_lora/channel_set_1st"
HUB_MODEL_ID = "my-sy-custom-lora"

# Create output directory
os.makedirs(OUTPUT_DIR, exist_ok=True)

print(f"Model: {MODEL_NAME}")
print(f"Dataset: {DATASET_PATH}")
print(f"Output: {OUTPUT_DIR}")


Model: runwayml/stable-diffusion-v1-5
Dataset: /content/gdrive/MyDrive/arcade_comp_results/lora/my_custom_dataset
Output: /content/gdrive/MyDrive/arcade_comp_results/lora/finetune_lora/channel_set_1st


## Step 5: Start Training

This follows the exact command structure from the blog:


In [None]:
# Training command following the blog exactly
training_command = f"""
accelerate launch --mixed_precision="bf16" train_text_to_image_lora.py \\
  --pretrained_model_name_or_path={MODEL_NAME} \\
  --train_data_dir={DATASET_PATH} \\
  --dataloader_num_workers=8 \\
  --resolution=512 \\
  --center_crop \\
  --random_flip \\
  --train_batch_size=1 \\
  --gradient_accumulation_steps=4 \\
  --max_train_steps=1000 \\
  --learning_rate=1e-04 \\
  --max_grad_norm=1 \\
  --lr_scheduler="cosine" \\
  --lr_warmup_steps=0 \\
  --output_dir={OUTPUT_DIR} \\
  --checkpointing_steps=500 \\
  --caption_column="caption" \\
  --validation_prompt="A beautiful artwork in my custom style." \\
  --seed=1337
"""

print("Training command:")
print(training_command)
print("\n⚠️ This will take several hours to complete!")
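
The f-string in the cell above interpolates the paths unquoted, so a dataset or output path containing spaces would be split into several arguments by the shell. One way to avoid that is to assemble the command as a list and let `shlex.join` do the quoting. This is a sketch under assumptions: `build_training_command` and `extra_args` are hypothetical names, and only a few of the flags from the cell above are shown.

```python
import shlex


def build_training_command(model_name, dataset_path, output_dir, extra_args=()):
    """Return a shell-safe command string for train_text_to_image_lora.py.

    Hypothetical helper; pass the remaining flags through extra_args.
    """
    args = [
        "accelerate", "launch", "--mixed_precision=bf16",
        "train_text_to_image_lora.py",
        f"--pretrained_model_name_or_path={model_name}",
        f"--train_data_dir={dataset_path}",
        f"--output_dir={output_dir}",
        *extra_args,
    ]
    # shlex.join quotes each argument only where needed (spaces, quotes, ...).
    return shlex.join(args)


cmd = build_training_command(
    "runwayml/stable-diffusion-v1-5",
    "/content/my data",  # example path with a space
    "/content/out",
    extra_args=["--resolution=512", "--max_train_steps=1000"],
)
print(cmd)
```

The resulting string can be run the same way as `training_command`, with `!{cmd}`.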


In [11]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
# Actually run the training (uncomment the line below)
# !{training_command}

print("Uncomment the line above to start training!")
print("Training will create checkpoints every 500 steps in:", OUTPUT_DIR)
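
To see how far an interrupted run got, the checkpoint folders in the output directory can be listed. The sketch below assumes the script names them `checkpoint-<step>` (for example `checkpoint-500`), which is the convention I would expect from the diffusers example scripts; check your downloaded copy of `train_text_to_image_lora.py` to confirm. `latest_checkpoint` is a hypothetical helper.

```python
import os
import re


def latest_checkpoint(output_dir):
    """Return the path of the highest-numbered checkpoint-<step> folder, or None."""
    if not os.path.isdir(output_dir):
        return None
    pattern = re.compile(r"^checkpoint-(\d+)$")
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = pattern.match(name)
        path = os.path.join(output_dir, name)
        # Compare steps numerically so checkpoint-1000 ranks above checkpoint-500.
        if match and os.path.isdir(path) and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

If `latest_checkpoint(OUTPUT_DIR)` returns a path, the script's `--resume_from_checkpoint` option (see `--help` in your copy of the script) may let training continue from it instead of starting over.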


## Step 6: Test Your Trained LoRA

Following the blog's usage example:


In [None]:
# Method 1: Manual loading (from the blog)
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from huggingface_hub import model_info
import torch

# Check if LoRA was trained
lora_file = os.path.join(OUTPUT_DIR, "pytorch_lora_weights.safetensors")

if os.path.exists(lora_file):
    print(f"✅ LoRA found: {lora_file}")

    # Load base model
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_NAME, torch_dtype=torch.float16)
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

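    # Load LoRA weights (load_attn_procs is the older API and may emit a
    # deprecation warning in recent diffusers; Method 2 uses load_lora_weights)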
    pipe.unet.load_attn_procs(OUTPUT_DIR)
    pipe.to("cuda")

    print("✅ LoRA loaded successfully!")

else:
    print("❌ LoRA file not found. Complete training first.")


In [None]:
# Method 2: Auto pipeline (easier method from the blog)
if os.path.exists(lora_file):
    pipeline = AutoPipelineForText2Image.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.float16
    ).to("cuda")

    pipeline.load_lora_weights(OUTPUT_DIR, weight_name="pytorch_lora_weights.safetensors")

    # Generate test image
    image = pipeline("A beautiful artwork in my custom style").images[0]

    # Display the image
    import matplotlib.pyplot as plt
    plt.figure(figsize=(8, 8))
    plt.imshow(image)
    plt.axis('off')
    plt.title("Generated with Custom LoRA")
    plt.show()

    # Save the image
    image.save("custom_lora_test.png")
    print("✅ Test image saved as: custom_lora_test.png")

else:
    print("❌ Complete training first!")


## Done! 🎉

Your LoRA is now trained and ready to use. The main file is:
- `pytorch_lora_weights.safetensors` in your output directory

You can use this LoRA file in:
- Python code (as shown above)
- Automatic1111 WebUI
- ComfyUI
- Any other Stable Diffusion interface that supports LoRA
