# Unsloth Fine-tuning on Google Colab

Train and fine-tune LLMs with Unsloth on Google Colab's free GPU.

**Before you start:**
1. Runtime → Change runtime type → GPU → T4 GPU (free tier)
2. Make a copy of this notebook to your Google Drive

**Total time:** varies by hardware (setup + training)

## Step 1: Setup Environment

Install dependencies (this may take several minutes).

In [None]:
%%capture
# Install dependencies in the correct order
!pip install --upgrade pip

# Core ML frameworks
!pip install "trl>=0.12.0" "peft>=0.13.0" "bitsandbytes>=0.45.0" "transformers[sentencepiece]>=4.46.0"

# PyTorch (2.8.0 wheels are published for CUDA 12.6 and newer, not cu121)
!pip install torch==2.8.0 torchvision --index-url https://download.pytorch.org/whl/cu126

# Unsloth
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

# xformers (same CUDA index as PyTorch)
!pip install --no-deps "xformers>=0.0.32,<0.0.33" --index-url https://download.pytorch.org/whl/cu126

# Additional dependencies
!pip install datasets huggingface_hub accelerate sentencepiece protobuf python-dotenv

print("✅ Installation complete!")
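
Because `%%capture` hides everything the cell prints, including pip errors and the final message, a follow-up cell can confirm what actually got installed. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical and the package list simply mirrors the install cell above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["torch", "transformers", "trl", "peft", "bitsandbytes", "xformers", "unsloth"]
    for name, version in installed_versions(wanted).items():
        print(f"{name:<14} {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` suggests rerunning the install cell without `%%capture` to see the underlying pip error.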

## Step 2: Clone Repository

In [None]:
# Clone the repository (removes old version if exists)
import os
import shutil

# Change to /content first (in case we're inside the repo)
%cd /content

repo_path = '/content/unsloth-finetuning'

if os.path.exists(repo_path):
    print("🔄 Removing old repository to fetch latest changes...")
    shutil.rmtree(repo_path)

print("📥 Cloning repository...")
!git clone https://github.com/farhan-syah/unsloth-finetuning.git
%cd unsloth-finetuning
print("✅ Repository cloned!")


## Step 3: Configure Training

Create a custom YAML configuration or use the defaults.

**Two options:**
- **Quick test:** Use `quick_test.yaml` (already included, rank=32, 1 epoch, 100 samples)
- **Custom config:** Create your own YAML below

For this demo, we'll create a custom Colab-optimized config.

In [None]:
# Create custom YAML configuration for Colab
yaml_config = """
# Colab-optimized configuration
# Model selection
model:
  base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
  inference_model: null  # Skip GGUF in Colab (no llama.cpp)
  output_name: auto

# Dataset
dataset:
  name: yahma/alpaca-cleaned
  max_samples: 100  # Quick test - use 0 for full dataset

# Training configuration
training:
  lora:
    rank: 16
    alpha: 32
    dropout: 0.0
    use_rslora: false
  
  batch:
    size: 2
    gradient_accumulation_steps: 4  # Effective batch size = 8
  
  optimization:
    learning_rate: 2e-4
    optimizer: adamw_8bit
    warmup_ratio: 0.1
    warmup_steps: 0
    max_grad_norm: 1.0
    use_gradient_checkpointing: true
  
  epochs:
    num_train_epochs: 1
    max_steps: 50  # Quick test - use 0 for full training
  
  data:
    packing: false
    seed: 3407
    max_seq_length: 2048

# Logging
logging:
  logging_steps: 5
  save_steps: 25
  save_total_limit: 2
  save_only_final: true

# Output (skip GGUF for Colab)
output:
  formats: []  # Empty = no GGUF conversion

# Benchmark
benchmark:
  max_tokens: 512
  batch_size: 8
  default_backend: transformers
  default_tasks:
    - ifeval
"""

# Save as colab_config.yaml
with open('colab_config.yaml', 'w') as f:
    f.write(yaml_config)

print("✅ Created colab_config.yaml")
print("\n📊 Configuration:")
print("   Model: Llama-3.2-1B-Instruct (4-bit)")
print("   Dataset: alpaca-cleaned (100 samples)")
print("   Training: 50 steps, LoRA rank 16")
print("   Batch: 2 × 4 accumulation = 8 effective")
print("   Output: Merged 16-bit only (no GGUF)")
print("\n💡 Edit the yaml_config string above to customize settings")
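
The batch and step settings interact: with 100 samples and an effective batch of 8, 50 steps covers the small dataset several times, regardless of `num_train_epochs: 1`. The arithmetic can be sketched as follows. The helper `training_plan` is hypothetical, it treats `max_steps: 0` as "no cap" (as the config comment suggests), and the exact rounding of steps per epoch may differ between trainer versions.

```python
import math


def training_plan(num_samples, batch_size, grad_accum_steps, max_steps=0, num_epochs=1):
    """Estimate optimizer steps and dataset passes for a given configuration."""
    effective_batch = batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    # A positive max_steps overrides the epoch count
    total_steps = max_steps if max_steps > 0 else steps_per_epoch * num_epochs
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "epochs_covered": total_steps / steps_per_epoch,
    }


if __name__ == "__main__":
    plan = training_plan(num_samples=100, batch_size=2, grad_accum_steps=4, max_steps=50)
    for key, value in plan.items():
        print(f"{key}: {value}")
```

For a single pass over the quick-test subset, setting `max_steps: 0` or lowering it to roughly the steps-per-epoch value is the more natural choice.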

## Step 4: Preprocess Dataset

Analyze your dataset and get smart configuration recommendations.

**This step:**
- Preprocesses and analyzes your dataset (cached automatically)
- Shows sequence length statistics
- Recommends optimal batch size and training steps
- No manual configuration needed - uses colab_config.yaml
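
To give a feel for the statistics involved, here is a minimal sketch of a sequence-length summary. It is not the code in `scripts/preprocess.py`; the function `length_summary` is hypothetical and takes a plain list of token counts, which in practice would come from running the tokenizer over each formatted example.

```python
import statistics


def length_summary(token_lengths, max_seq_length):
    """Summarize token counts and count examples that would be truncated."""
    ordered = sorted(token_lengths)
    # quantiles(n=100) returns the 99 percentile cut points
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": cuts[94],
        "max": ordered[-1],
        "truncated": sum(1 for n in ordered if n > max_seq_length),
    }


if __name__ == "__main__":
    lengths = [120, 180, 240, 310, 95, 2200, 400, 150]  # made-up token counts
    print(length_summary(lengths, max_seq_length=2048))
```

If the 95th percentile sits far below `max_seq_length`, a smaller `max_seq_length` usually saves memory without truncating much data.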

**Note:** Preprocessed data is cached. If you change the dataset or `max_seq_length` in Step 3, rerun the Step 3 cell to rewrite `colab_config.yaml`, then rerun the cell below. Preprocessing detects the change automatically.
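
How the script detects changes is not shown in this notebook. One common approach, sketched here as an assumption rather than the repository's actual method, is to hash the settings that affect preprocessing and use the digest as a cache key. The function `cache_key` and its parameters are hypothetical.

```python
import hashlib
import json


def cache_key(dataset_name, max_samples, max_seq_length):
    """Return a short, stable fingerprint of the preprocessing settings."""
    payload = json.dumps(
        {
            "dataset": dataset_name,
            "max_samples": max_samples,
            "max_seq_length": max_seq_length,
        },
        sort_keys=True,  # stable key order keeps the digest reproducible
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    print(cache_key("yahma/alpaca-cleaned", 100, 2048))
    print(cache_key("yahma/alpaca-cleaned", 100, 1024))  # different key
```

A cache directory named after this key is reused only while all three settings stay the same.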

In [None]:
# Preprocess dataset using YAML config
!python scripts/preprocess.py --config colab_config.yaml

## Step 5: Train Model

Train the model. Any existing LoRA adapters are backed up automatically before training starts.

**The quick test (50 steps) takes about 2 minutes.**

**Backup system:**
- Existing LoRA adapters are automatically backed up before training
- Backups saved with timestamps: `lora.backup.20250124_143022/`
- No need for manual file management
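
The naming scheme above can be reproduced with a few lines of standard-library code. This is a sketch of the idea, not the implementation in `scripts/train.py`; the function `backup_existing` is hypothetical and assumes the backup is a simple rename next to the original folder.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_existing(adapter_dir, now=None):
    """Rename adapter_dir to '<name>.backup.<YYYYmmdd_HHMMSS>' if it exists."""
    adapter_dir = Path(adapter_dir)
    if not adapter_dir.is_dir():
        return None  # Nothing to back up
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    backup_dir = adapter_dir.with_name(f"{adapter_dir.name}.backup.{stamp}")
    shutil.move(str(adapter_dir), str(backup_dir))
    return backup_dir


if __name__ == "__main__":
    moved = backup_existing("outputs/example-run/lora")  # placeholder path
    print(moved if moved else "No existing adapters to back up")
```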

In [None]:
# Run training with YAML config
!python scripts/train.py --config colab_config.yaml

## Step 6: Build Merged Model

Create merged model (LoRA + base model combined) in safetensors format.

**Why skip GGUF in Colab?**
- GGUF conversion requires llama.cpp, which is not preinstalled in Colab
- **Better workflow:** Create merged model here, convert to GGUF locally (CPU-only, no GPU needed)

**This step creates:** `merged_16bit/` folder with complete model

In [None]:
# Build merged model using YAML config
!python scripts/build.py --config colab_config.yaml

## Step 7: Save Your Model

**You have two models to save:**

1. **LoRA adapters** (~80-100MB) - Small, efficient, requires base model to use
2. **Merged model** (size varies by model) - Complete model, ready to use anywhere
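
Adapter size follows from the LoRA rank: each adapted weight of shape `d_in × d_out` gains two small matrices with `rank × (d_in + d_out)` parameters in total. The estimate below is a sketch under stated assumptions: the layer shapes are those of Llama-3.2-1B (hidden size 2048, MLP size 8192, key/value projection width 512, 16 layers), all seven projection modules are adapted, and the adapter is saved in 32-bit floats. The actual file size depends on which modules the training script targets and on the saved dtype.

```python
def lora_param_count(rank, layer_shapes, num_layers):
    """Count LoRA parameters: rank * (d_in + d_out) per adapted weight."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes.values())
    return per_layer * num_layers


# Assumed Llama-3.2-1B projection shapes as (d_in, d_out)
LLAMA_1B_SHAPES = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 512),
    "v_proj": (2048, 512),
    "o_proj": (2048, 2048),
    "gate_proj": (2048, 8192),
    "up_proj": (2048, 8192),
    "down_proj": (8192, 2048),
}

if __name__ == "__main__":
    for rank in (16, 32):
        params = lora_param_count(rank, LLAMA_1B_SHAPES, num_layers=16)
        print(f"rank {rank}: {params:,} parameters, ~{params * 4 / 1e6:.0f} MB in fp32")
```

Under these assumptions, rank 32 lands in the size range quoted above, while rank 16 gives roughly half of it.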

**Choose your preferred method:**
- **Option A (Recommended):** HuggingFace Hub - Free, unlimited storage, easy sharing
- **Option B:** Google Drive - Simple, but limited free storage (15GB)

In [None]:
# Check your model output
import os

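# List output directories, most recently modified first
output_dirs = []
if os.path.isdir('outputs'):
    output_dirs = sorted(
        (d for d in os.listdir('outputs') if os.path.isdir(os.path.join('outputs', d))),
        key=lambda d: os.path.getmtime(os.path.join('outputs', d)),
        reverse=True,
    )
if output_dirs:
    model_dir = output_dirs[0]  # Use the newest run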
    print(f"✅ Your model is in: outputs/{model_dir}")
    print(f"\nContents:")
    !ls -lh outputs/{model_dir}
    print(f"\nLoRA adapters: outputs/{model_dir}/lora/")
    print(f"Merged model: outputs/{model_dir}/merged_16bit/")
else:
    print("❌ No model found in outputs/")

### Option A: Push to HuggingFace Hub (Recommended)

**Why HuggingFace?**
- Free, unlimited storage
- Easy sharing and version control
- Direct integration with transformers, Ollama, etc.

**Steps:**
1. Get your HuggingFace token: https://huggingface.co/settings/tokens (create with "Write" access)
2. Run the cells below to push both LoRA and merged models

In [None]:
# A1. Configure HuggingFace settings
from huggingface_hub import login, HfApi
import os

# Try to get HF_TOKEN from Colab secrets first (recommended)
try:
    from google.colab import userdata
    HF_TOKEN = userdata.get('HF_TOKEN')
    print("✅ Using HF_TOKEN from Colab secrets")
except Exception:
    # Secret missing, notebook access not granted, or not running in Colab:
    # fall back to the environment, then to the interactive prompt in A2
    HF_TOKEN = os.getenv('HF_TOKEN', '')
    if not HF_TOKEN:
        print("💡 TIP: Add HF_TOKEN to Colab secrets (🔑 icon in left sidebar) for easier reuse")

# Set your HuggingFace username here (earlier steps do not define it)
HF_USERNAME = os.getenv('HF_USERNAME') or "your-username"

# Repository names (auto-generated from model_dir by default)
LORA_REPO_NAME = f"{model_dir}-lora"
MERGED_REPO_NAME = f"{model_dir}"  # No suffix for merged model

print(f"HuggingFace Username: {HF_USERNAME}")
print(f"\nRepositories to create:")
print(f"   1. {HF_USERNAME}/{LORA_REPO_NAME} (LoRA adapters, ~80MB)")
print(f"   2. {HF_USERNAME}/{MERGED_REPO_NAME} (Merged model, size varies by model)")
print(f"\n💡 Later you can also create: {HF_USERNAME}/{model_dir}-GGUF (for GGUF quantized)")
print(f"\n📖 How to set up Colab secrets:")
print(f"   1. Click the 🔑 icon in the left sidebar")
print(f"   2. Add new secret: Name='HF_TOKEN', Value=<your token from https://huggingface.co/settings/tokens>")
print(f"   3. Toggle 'Notebook access' ON for this notebook")
print(f"\nReady to push? Run the next cell.")


In [None]:
# A2. Push both models to HuggingFace Hub
from huggingface_hub import login, HfApi, create_repo
from pathlib import Path
import os
import subprocess

# Login to HuggingFace
if HF_TOKEN:
    login(token=HF_TOKEN)
else:
    print("\n🔐 No HF_TOKEN found. Please enter your token:")
    print("   Get it from: https://huggingface.co/settings/tokens")
    login()  # Will prompt interactively

api = HfApi()

# Get model paths
lora_path = f"outputs/{model_dir}/lora"
merged_path = f"outputs/{model_dir}/merged_16bit"

# Calculate sizes
lora_size = sum(f.stat().st_size for f in Path(lora_path).rglob('*') if f.is_file())
lora_size_mb = lora_size / (1024 * 1024)

merged_size = sum(f.stat().st_size for f in Path(merged_path).rglob('*') if f.is_file())
merged_size_gb = merged_size / (1024 * 1024 * 1024)

print("="*60)
print("UPLOADING TO HUGGINGFACE HUB")
print("="*60)

# Generate README files using standardized script
print("\n[0/3] Generating model cards...")
try:
    result = subprocess.run(
        ["python", "scripts/generate_readme_train.py"],
        capture_output=True,
        text=True,
        timeout=10
    )
    if result.returncode == 0:
        print("      ✅ Model cards generated")
    else:
        print(f"      ⚠️  Warning: {result.stderr}")
except Exception as e:
    print(f"      ⚠️  Could not generate model cards: {e}")

# 1. Push LoRA adapters
lora_repo_id = f"{HF_USERNAME}/{LORA_REPO_NAME}"
print(f"\n[1/3] Pushing LoRA adapters to {lora_repo_id}...")
print(f"      Size: {lora_size_mb:.1f} MB")

try:
    create_repo(repo_id=lora_repo_id, repo_type="model", exist_ok=True)
    api.upload_folder(
        folder_path=lora_path,
        repo_id=lora_repo_id,
        repo_type="model",
        commit_message="Upload LoRA adapters"
    )
    print(f"      ✅ LoRA adapters uploaded!")
    print(f"      🔗 https://huggingface.co/{lora_repo_id}")
except Exception as e:
    print(f"      ❌ Error: {e}")

# 2. Push merged model
merged_repo_id = f"{HF_USERNAME}/{MERGED_REPO_NAME}"
print(f"\n[2/3] Pushing merged model to {merged_repo_id}...")
print(f"      Size: {merged_size_gb:.2f} GB (this will take several minutes)")

try:
    create_repo(repo_id=merged_repo_id, repo_type="model", exist_ok=True)
    api.upload_folder(
        folder_path=merged_path,
        repo_id=merged_repo_id,
        repo_type="model",
        commit_message="Upload merged model"
    )
    print(f"      ✅ Merged model uploaded!")
    print(f"      🔗 https://huggingface.co/{merged_repo_id}")
except Exception as e:
    print(f"      ❌ Error: {e}")

print("\n" + "="*60)
print("UPLOAD COMPLETE")
print("="*60)
print(f"\n📦 Your models on HuggingFace:")
print(f"   • LoRA: https://huggingface.co/{lora_repo_id}")
print(f"   • Merged: https://huggingface.co/{merged_repo_id}")
print(f"\n💡 Use the merged model with:")
print(f"   • transformers: model = AutoModelForCausalLM.from_pretrained('{merged_repo_id}')")
print(f"   • Ollama: convert to GGUF first (Step 8), then e.g. ollama pull hf.co/{merged_repo_id}-GGUF")
print(f"\n📝 Model cards generated from training configuration")

### Option B: Google Drive (Alternative)

In [None]:
# B1. Upload to Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Copy to Google Drive
!mkdir -p /content/drive/MyDrive/unsloth-models
!cp -r outputs/* /content/drive/MyDrive/unsloth-models/

print("✅ Model copied to Google Drive: MyDrive/unsloth-models/")
print("")
print("📁 Your model contains:")
print("   - lora/ - LoRA adapters (~80MB)")
print("   - merged_16bit/ - Merged model in safetensors format (size varies by model)")
print("")
print("⚠️  Note: Google Drive free tier has 15GB storage limit")
print("Next: Download from Google Drive to convert to GGUF locally")
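
Since the free Drive tier is limited, it can help to compare the size of `outputs/` with the free space before copying. The sketch below uses hypothetical helpers `dir_size_bytes` and `fits_on`. Whether the mounted Drive reports its quota accurately through `shutil.disk_usage` is an assumption, so treat the result as a rough guide.

```python
import shutil
from pathlib import Path


def dir_size_bytes(path):
    """Total size of all regular files under path."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def fits_on(target, source, safety_margin=0.05):
    """Return (fits, needed_bytes, free_bytes) for copying source onto target."""
    needed = dir_size_bytes(source)
    free = shutil.disk_usage(target).free
    return needed * (1 + safety_margin) <= free, needed, free


if __name__ == "__main__":
    # Check the current directory against itself as a demonstration;
    # in Colab, target would be the mounted Drive and source would be 'outputs'
    ok, needed, free = fits_on(".", ".")
    print(f"Need {needed / 1e9:.2f} GB, free {free / 1e9:.2f} GB, fits: {ok}")
```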

## Step 8: Convert to GGUF Locally (Optional)

After uploading to HuggingFace, you can download and convert to GGUF on your local machine.

**Why local conversion?**
- GGUF conversion is CPU-only (no GPU needed, works on any machine)
- llama.cpp is not preinstalled in Colab
- Better for creating multiple quantization formats

---

### Quick Start: Download and Convert in 4 Steps

Run these commands on your **local machine**:

```bash
# 1. Setup (one-time only)
git clone https://github.com/farhan-syah/unsloth-finetuning.git
cd unsloth-finetuning
bash setup.sh  # Installs dependencies + llama.cpp

# 2. Download your models from HuggingFace
python scripts/push.py --all

# 3. Create a YAML config for the conversion
cat > my_config.yaml <<EOF
model:
  base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
  inference_model: unsloth/Llama-3.2-1B-Instruct
  output_name: auto

dataset:
  name: yahma/alpaca-cleaned
  max_samples: 0

output:
  formats:
    - gguf_q4_k_m
    - gguf_q5_k_m
EOF

# 4. Convert to GGUF
python scripts/build.py --config my_config.yaml
```

That's it! Your GGUF files will be in `outputs/[model-name]/gguf/`

---

### Available GGUF Quantizations

Edit the `output.formats` section in your YAML config:

| Format | Size (1B) | Quality | Use Case |
|--------|-----------|---------|----------|
| `gguf_q4_k_m` | ~0.6GB | Good | **Recommended** - best balance |
| `gguf_q5_k_m` | ~0.8GB | Better | Higher quality |
| `gguf_q8_0` | ~1.2GB | Excellent | Near original quality |
| `gguf_f16` | ~2.0GB | Best | Full precision (largest) |

Example for multiple formats:
```yaml
output:
  formats:
    - gguf_q4_k_m
    - gguf_q5_k_m
    - gguf_q8_0
```

---

### Using Your GGUF Model

#### With Ollama:

```bash
cd outputs/Llama-3.2-1B-alpaca-cleaned/gguf
ollama create my-model -f Modelfile
ollama run my-model "Hello!"
```

#### With llama.cpp:

```bash
./llama.cpp/llama-cli \
  -m outputs/model-name/gguf/model.Q4_K_M.gguf \
  -p "Hello!" --temp 0.7
```

## Step 9: Test Your Model (Optional)

Quick test of your fine-tuned model:

In [None]:
from unsloth import FastLanguageModel
import torch

# Load your fine-tuned model
model_path = f"outputs/{model_dir}/lora"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_path,
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

FastLanguageModel.for_inference(model)

# Test prompt
prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
What is machine learning?

### Response:
"""

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("\n" + "="*50)
print("MODEL RESPONSE:")
print("="*50)
print(response)
print("="*50)
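
The decoded output above contains the prompt followed by the generated text. To try several instructions and print only the answer, two small helpers are enough. This is a minimal sketch: `build_prompt` and `extract_response` are hypothetical names, and the template is the Alpaca-style prompt used in the test cell.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(instruction):
    """Fill the Alpaca-style template with a single instruction."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


def extract_response(decoded):
    """Return the text after the first '### Response:' marker, if present."""
    _, found, tail = decoded.partition("### Response:")
    return tail.strip() if found else decoded.strip()


if __name__ == "__main__":
    prompt = build_prompt("What is machine learning?")
    fake_output = prompt + "Machine learning is a way to learn patterns from data."
    print(extract_response(fake_output))
```

In the test cell, `build_prompt(...)` would replace the hard-coded `prompt` string and `extract_response(response)` would be applied to the decoded text before printing.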

## 🎉 Done!

Your model has been trained and is ready to use!

**Next steps:**
1. Download the model from Google Drive or HuggingFace
2. Use it locally with Ollama or transformers
3. Share it on HuggingFace Hub

**Resources:**
- [Documentation](https://github.com/farhan-syah/unsloth-finetuning/tree/main/docs)
- [Training Guide](https://github.com/farhan-syah/unsloth-finetuning/blob/main/docs/TRAINING.md)
- [FAQ](https://github.com/farhan-syah/unsloth-finetuning/blob/main/docs/FAQ.md)