# SDXL LoRA Training on Google Colab

Train a character-specific LoRA for Stable Diffusion XL using a free Colab GPU. This notebook will:

- Set up the Colab runtime (GPU) and dependencies
- Clone or upload this repository (`qylysh-higgsfiled1`)
- Validate reference images (aldar1–aldar5)
- Run the real LoRA training script
- Package trained LoRA weights for download
- Smoke-test generation on the Colab GPU
- Provide steps to integrate the LoRA into your local app

Tip: Colab sessions are ephemeral. Save artifacts to Drive or download them before disconnecting.
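
One way to act on this tip is to copy outputs to a mounted Drive folder as soon as they exist. The sketch below is a minimal helper written under assumptions: `backup_artifacts` and the `aldar_lora_backup` folder name are hypothetical and not part of this repository. `drive.mount` is the standard Colab call for mounting Google Drive; it is left commented out so the helper also runs outside Colab.

```python
import shutil
from pathlib import Path


def backup_artifacts(paths, dest_dir):
    """Copy existing files or folders into dest_dir and return the copied paths.

    Hypothetical helper for illustration; missing sources are skipped silently.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in map(Path, paths):
        if not src.exists():
            continue  # nothing to back up yet (e.g. training has not finished)
        target = dest / src.name
        if src.is_dir():
            # dirs_exist_ok lets repeated backups overwrite an earlier copy
            shutil.copytree(src, target, dirs_exist_ok=True)
        else:
            shutil.copy2(src, target)
        copied.append(target)
    return copied


# In Colab, mount Drive first, then back up (assumed destination folder):
# from google.colab import drive
# drive.mount('/content/drive')
# backup_artifacts(['models/aldar_kose_lora.safetensors'],
#                  '/content/drive/MyDrive/aldar_lora_backup')
```

Running the backup right after training finishes means a later disconnect costs only the runtime state, not the trained weights.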

In [None]:
# 1) Create and Verify Colab Runtime
import sys, os, platform, site
from pathlib import Path

print("Python:", sys.version)
print("Platform:", platform.platform())
print("CWD:", os.getcwd())
print("Site-packages:", site.getsitepackages())
print("\nListing current directory:")
!ls -la

print("\nGPU status (if GPU runtime is enabled):")
try:
    import torch
    print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())
except Exception as e:
    print("Torch not available yet:", e)

!nvidia-smi || echo "No NVIDIA GPU detected. Enable GPU in Runtime > Change runtime type."

In [None]:
# 2) Clone repo (or use manual upload fallback)
import os, zipfile
from pathlib import Path

repo_url = "https://github.com/sapogeth/qylysh-higgsfiled1.git"
repo_dir = Path("/content/qylysh-higgsfiled1")

if repo_dir.exists():
    print("Repository exists. Syncing to origin/main...")
    %cd {repo_dir}
    !git fetch --all
    # Ensure we're on main and hard reset to remote
    !git checkout main || true
    !git reset --hard origin/main
    !git log -1 --oneline
else:
    print("Cloning repo...")
    rc = os.system(f"git clone {repo_url} {repo_dir}")
    if rc != 0:
        print("\n✗ Git clone failed. Falling back to manual upload.")
        from google.colab import files
        print("Please upload a zip of the repository (e.g., repo.zip)")
        uploaded = files.upload()  # Prompts file picker
        zip_name = list(uploaded.keys())[0]
        with zipfile.ZipFile(zip_name, 'r') as z:
            z.extractall('/content')
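        # Try to find the repo folder
        if (Path('/content/qylysh-higgsfiled1').exists()):
            repo_dir = Path('/content/qylysh-higgsfiled1')
        else:
            # If the extracted root folder has a different name, detect it by its contents
            for p in Path('/content').iterdir():
                if p.is_dir() and (p / 'requirements.txt').exists() and (p / 'config.py').exists():
                    repo_dir = p
                    break
            else:
                # Fail loudly instead of changing into a directory that does not exist
                raise FileNotFoundError("Could not locate the repo folder under /content after extraction.")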
    %cd {repo_dir}

print("\nUsing repo at:", repo_dir)

In [None]:
# 3) Install dependencies for training (per-session)
print("Installing Python dependencies... (this may take 2-4 minutes on first run)")
!pip install -q -r requirements.txt
# Colab usually ships a CUDA-enabled torch already; this upgrade is optional and can cause version mismatches with other preinstalled packages
!pip install -q --upgrade torch torchvision
!pip install -q ipywidgets

import torch
print("\n✓ torch:", torch.__version__, "CUDA:", torch.cuda.is_available())

In [None]:
# 3.5) Clear corrupted Hugging Face cache (run if model loading fails)
# This removes cached SDXL files so they'll re-download cleanly
import shutil
from pathlib import Path

hf_cache = Path('/root/.cache/huggingface')
if hf_cache.exists():
    print(f"Clearing Hugging Face cache at {hf_cache}...")
    shutil.rmtree(hf_cache, ignore_errors=True)
    print("✓ Cache cleared. Models will re-download on next run.")
else:
    print("Cache directory doesn't exist yet (normal on first run).")

In [None]:
# 4) Verify project layout and reference images
from pathlib import Path

needed = [Path('aldar1.png'), Path('aldar2.png'), Path('aldar3.png'), Path('aldar4.png'), Path('aldar5.png')]
missing = [str(p) for p in needed if not p.exists()]
print("Project files present:")
!ls -la | head -n 50

if missing:
    print("\n✗ Missing reference images:")
    for m in missing:
        print("  -", m)
    print("\nUpload the missing files using the upload prompt below.")
else:
    print("\n✓ All reference images found:")
    for p in needed:
        print("  -", p)

# Optional upload helper if something is missing
if missing:
    from google.colab import files
    print("\nUse the chooser to upload the missing images now.")
    uploaded = files.upload()
    print("Uploaded:", list(uploaded.keys()))

In [None]:
# 5) Run real LoRA training script (uses GPU)
%env DEBUG=False

import config
print("Device:", config.get_device())
print("DType:", config.get_dtype())

# Start training (expect 30-90+ minutes; the first run also has to download the base model)
!python train_lora_real.py

In [None]:
# 6) Locate trained LoRA weights (file or folder)
from pathlib import Path

# save_pretrained may write a folder at this path instead of a single file,
# so check the same path for both cases
candidate = Path('models/aldar_kose_lora.safetensors')

lora_path = None
if candidate.is_file():
    lora_path = candidate
    print("✓ Found LoRA file:", lora_path.resolve())
elif candidate.is_dir():
    lora_path = candidate
    print("✓ Found LoRA folder:", lora_path.resolve())
else:
    print("✗ Could not find LoRA weights at expected paths.")
    print("Check training logs for save locations, or adjust this cell to your output path.")

lora_path

In [None]:
# 7) Smoke test: generate one image on Colab GPU
# Workaround: some Colab torch builds raise torch.mtia AttributeErrors; the handlers below catch them
import os
os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'  # only affects Apple's MPS backend (CPU fallback for unsupported ops); harmless on Colab's CUDA GPUs

try:
    # Silence torch.compile and mtia warnings
    import warnings
    warnings.filterwarnings('ignore', category=UserWarning)
    warnings.filterwarnings('ignore', category=FutureWarning)
    
    from local_image_generator import LocalImageGenerator
    from IPython.display import display
    
    print("Initializing generator (this loads SDXL + LoRA, may take 1-3 min)...")
    gen = LocalImageGenerator(lazy_load=False)
    
    print("Generating test image...")
    img = gen.generate_single("aldar_kose_character in the steppe, 2D storybook illustration")
    
    out_path = "/content/lora_test_colab.png"
    img.save(out_path)
    print("✓ Saved test image:", out_path)
    display(img)
    
except ImportError as e:
    print("✗ Import failed (likely missing dependencies):", e)
    print("Run cell 3 again to ensure all packages are installed.")
except AttributeError as e:
    if 'mtia' in str(e) or '_set_stream_by_id' in str(e):
        print("✗ Known torch.mtia compatibility issue detected.")
        print("Workaround: Restart the runtime (Runtime > Restart runtime) and re-run from cell 1.")
        print("This is a Colab/torch version mismatch; does not affect training or local usage.")
    else:
        print("✗ Attribute error:", e)
except Exception as e:
    print("✗ Test generation failed:", e)
    import traceback
    traceback.print_exc()

In [None]:
# 8) Package LoRA for download
from pathlib import Path
import shutil, os
from IPython.display import display

exports = Path('/content/exports')
exports.mkdir(exist_ok=True)

artifact_paths = []
if 'lora_path' in globals() and lora_path:
    if lora_path.is_dir():
        zip_out = exports / 'aldar_kose_lora_export.zip'
        if zip_out.exists():
            zip_out.unlink()
        !zip -r -q {zip_out} {lora_path}
        print("✓ Zipped LoRA folder to:", zip_out)
        artifact_paths.append(zip_out)
    elif lora_path.is_file():
        dst = exports / 'aldar_kose_lora.safetensors'
        shutil.copy2(lora_path, dst)
        print("✓ Copied LoRA file to:", dst)
        artifact_paths.append(dst)
else:
    print("✗ lora_path not set. Skipping packaging.")

print("\nArtifacts:")
for p in artifact_paths:
    print(" -", p.resolve())

In [None]:
# 9) Download artifact(s) to your machine
from google.colab import files

if 'artifact_paths' in globals() and artifact_paths:
    for p in artifact_paths:
        print("Offering download:", p)
        files.download(str(p))
else:
    print("No artifacts to download. Run the previous cell after training.")

## Integrate your trained LoRA locally

1) Move the downloaded LoRA into your repo's `models/` folder.
   - If you downloaded a `.safetensors` file: place it at `models/aldar_kose_lora.safetensors`
   - If you downloaded a zip containing a folder: unzip under `models/` and update the path accordingly

2) Ensure `config.LORA_PATH` points to that artifact (file or folder).

3) Quick local test (optional):
   - Activate your venv, then run:
   - `python -c "from local_image_generator import LocalImageGenerator; gen=LocalImageGenerator(lazy_load=False); img=gen.generate_single('aldar_kose_character in the steppe'); img.save('test.png')"`

4) Start the app:
   - Run `python app.py`
   - Open http://localhost:8080 and generate a storyboard.
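
Before starting the app, it can help to confirm that the path from steps 1 and 2 really holds LoRA weights. The following is a minimal sketch, not part of the repository: `check_lora_artifact` is a hypothetical helper, and it assumes that a folder-style export contains at least one `.safetensors` file somewhere inside it.

```python
from pathlib import Path


def check_lora_artifact(path):
    """Classify a LoRA artifact path as 'file', 'folder', or 'missing'.

    Hypothetical helper for illustration. Assumes a folder export holds
    at least one .safetensors file (possibly in a subfolder).
    """
    p = Path(path)
    if p.is_file() and p.suffix == ".safetensors":
        return "file"
    if p.is_dir() and any(p.rglob("*.safetensors")):
        return "folder"
    return "missing"


# Example usage from the repo root (path taken from step 1):
# print(check_lora_artifact("models/aldar_kose_lora.safetensors"))
```

A result of `missing` usually means the file landed in a different folder or the zip was extracted one level too deep; adjust `config.LORA_PATH` or move the artifact accordingly.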

Notes:
- Colab sessions are temporary; re-run the training cells if disconnected.
- Training time depends on GPU availability, model cache, and settings (30–90+ min).