# Unsloth Fine-tuning on Google Colab

Train and fine-tune LLMs with Unsloth on Google Colab's free GPU.

**Before you start:**
1. Runtime → Change runtime type → GPU → T4 GPU (free tier)
2. Make a copy of this notebook to your Google Drive

**Total time:** ~10-15 minutes (setup + training)

## Step 1: Setup Environment

Install dependencies (takes ~5 minutes)

In [None]:
%%capture
# Install dependencies in the correct order
!pip install --upgrade pip

# Core ML frameworks
!pip install "trl>=0.12.0" "peft>=0.13.0" "bitsandbytes>=0.45.0" "transformers[sentencepiece]>=4.46.0"

# PyTorch (2.8.0 wheels are published for CUDA 12.6 and newer, not cu121)
!pip install torch==2.8.0 torchvision --index-url https://download.pytorch.org/whl/cu126

# Unsloth
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

# xformers (same CUDA index as PyTorch so the builds match)
!pip install --no-deps "xformers>=0.0.32,<0.0.33" --index-url https://download.pytorch.org/whl/cu126

# Additional dependencies
!pip install datasets huggingface_hub accelerate sentencepiece protobuf python-dotenv

print("✅ Installation complete!")
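Because `%%capture` hides all pip output, a failed install in the cell above passes silently and the success message still prints. A minimal sketch of a follow-up check is shown below. The helper name `installed_versions` is hypothetical and not part of the repository. It only reports what is installed and does not validate version compatibility.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names taken from the install cell above.
report = installed_versions(
    ["torch", "transformers", "trl", "peft", "bitsandbytes", "xformers", "unsloth"]
)
for name, version in report.items():
    print(f"{name:>14}: {version or 'NOT INSTALLED'}")
```

If any line reports `NOT INSTALLED`, rerunning the install cell without `%%capture` should surface the underlying pip error.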

## Step 2: Clone Repository

In [None]:
# Clone the repository
!git clone https://github.com/farhan-syah/unsloth-finetuning.git
%cd unsloth-finetuning

print("✅ Repository cloned!")

## Step 3: Configure Training

Edit these settings for your training run:

In [None]:
# ============================================
# CONFIGURATION - Edit these settings
# ============================================

# Model Selection (choose based on use case)
LORA_BASE_MODEL = "unsloth/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit"  # 2B model, fits T4 GPU
# LORA_BASE_MODEL = "unsloth/Qwen3-4B-unsloth-bnb-4bit"  # 4B model (needs A100)

# Dataset
DATASET_NAME = "yahma/alpaca-cleaned"  # Change to your dataset

# Training Mode
# Quick test (recommended for first run)
MAX_STEPS = 50              # Train for 50 steps only (~2 minutes)
DATASET_MAX_SAMPLES = 100   # Use 100 samples only

# Full training (uncomment to use)
# MAX_STEPS = 0               # Train for full epochs
# DATASET_MAX_SAMPLES = 0     # Use all samples

# Training Parameters
MAX_SEQ_LENGTH = 2048
LORA_RANK = 16              # Use 64 for production
LORA_ALPHA = 32             # Use 128 for production
BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
LEARNING_RATE = 2e-4
NUM_TRAIN_EPOCHS = 1
WARMUP_STEPS = 2

# Output Formats
OUTPUT_FORMATS = "gguf_q4_k_m"  # Create GGUF for Ollama
# OUTPUT_FORMATS = ""  # Empty = no GGUF conversion (faster)

# Output naming
OUTPUT_MODEL_NAME = "auto"  # Auto-generate name

# Author
AUTHOR_NAME = "Your Name"  # Your name for model card

print("✅ Configuration set!")
print(f"Model: {LORA_BASE_MODEL}")
print(f"Dataset: {DATASET_NAME}")
print(f"Training: {MAX_STEPS} steps, {DATASET_MAX_SAMPLES} samples")
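To see how these settings interact, the arithmetic can be sketched as follows. The helper names are hypothetical. The sketch assumes a single GPU, no sequence packing, and that `train.py` passes `MAX_STEPS` through to the trainer as a hard step limit, none of which is confirmed by this notebook.

```python
import math


def effective_batch_size(batch_size, grad_accum_steps, num_gpus=1):
    """Sequences consumed per optimizer step."""
    return batch_size * grad_accum_steps * num_gpus


def steps_per_epoch(num_samples, batch_size, grad_accum_steps, num_gpus=1):
    """Approximate optimizer steps needed for one pass over the data."""
    per_step = effective_batch_size(batch_size, grad_accum_steps, num_gpus)
    return math.ceil(num_samples / per_step)


# Quick-test values from the configuration cell above.
eff_batch = effective_batch_size(batch_size=2, grad_accum_steps=2)               # 4
per_epoch = steps_per_epoch(num_samples=100, batch_size=2, grad_accum_steps=2)   # 25
passes_over_data = 50 / per_epoch                                                # 2.0

# Standard LoRA scales the adapter update by alpha / rank.
quick_scale = 32 / 16        # 2.0
production_scale = 128 / 64  # 2.0

print(f"Effective batch size: {eff_batch}")
print(f"Steps per epoch: {per_epoch}")
print(f"Passes over the data in 50 steps: {passes_over_data}")
print(f"LoRA scale (quick / production): {quick_scale} / {production_scale}")
```

Under these assumptions, the quick test sees each of the 100 samples about twice, and the suggested production rank and alpha keep the same alpha-to-rank ratio as the quick-test values.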

## Step 4: Create .env File

In [None]:
# Create .env file with configuration
env_content = f"""
# Model
LORA_BASE_MODEL={LORA_BASE_MODEL}
INFERENCE_BASE_MODEL=
OUTPUT_MODEL_NAME={OUTPUT_MODEL_NAME}

# Dataset
DATASET_NAME={DATASET_NAME}
DATASET_MAX_SAMPLES={DATASET_MAX_SAMPLES}
MAX_STEPS={MAX_STEPS}

# Training
MAX_SEQ_LENGTH={MAX_SEQ_LENGTH}
LORA_RANK={LORA_RANK}
LORA_ALPHA={LORA_ALPHA}
BATCH_SIZE={BATCH_SIZE}
GRADIENT_ACCUMULATION_STEPS={GRADIENT_ACCUMULATION_STEPS}
LEARNING_RATE={LEARNING_RATE}
NUM_TRAIN_EPOCHS={NUM_TRAIN_EPOCHS}
WARMUP_STEPS={WARMUP_STEPS}
PACKING=false

# Optimization
USE_GRADIENT_CHECKPOINTING=true
MAX_GRAD_NORM=1.0
OPTIM=adamw_8bit

# Logging
LOGGING_STEPS=5
SAVE_STEPS=25
SAVE_TOTAL_LIMIT=2
SAVE_ONLY_FINAL=true

# Monitoring
WANDB_ENABLED=false

# Output
OUTPUT_FORMATS={OUTPUT_FORMATS}
OUTPUT_DIR_BASE=./outputs
PREPROCESSED_DATA_DIR=./data/preprocessed
CACHE_DIR=./cache

# HuggingFace
PUSH_TO_HUB=false
HF_USERNAME=your_username
HF_MODEL_NAME=auto
HF_TOKEN=

# Author
AUTHOR_NAME={AUTHOR_NAME}

# Advanced
SEED=3407
FORCE_PREPROCESS=false
FORCE_RETRAIN=true
FORCE_REBUILD=true
CHECK_SEQ_LENGTH=false
"""

with open('.env', 'w') as f:
    f.write(env_content)

print("✅ .env file created!")
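Before training, it can help to read the file back and confirm the values landed as expected. The sketch below uses a deliberately simple, hypothetical `parse_env` helper built on the standard library. It is not the parser `train.py` uses (presumably `python-dotenv`, since that package is installed in Step 1), and it ignores quoting and inline comments.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# A small sample in the same shape as the .env written above.
sample = """
# Dataset
DATASET_NAME=yahma/alpaca-cleaned
MAX_STEPS=50
HF_TOKEN=
"""

parsed = parse_env(sample)
for key, value in parsed.items():
    print(f"{key} = {value!r}")
```

In the notebook, the same helper could be pointed at the real file with `parse_env(open(".env").read())`. Note that every value comes back as a string, including numbers such as `MAX_STEPS`.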

## Step 5: Train Model

The quick test takes ~2 minutes; full training can take hours, depending on the model and dataset size.

In [None]:
# Run training
!python train.py

## Step 6: Build/Convert Model (Optional)

Convert to merged model and GGUF format (takes ~5 minutes).

In [None]:
# Build merged model and GGUF
# Note: GGUF conversion requires llama.cpp which may not work on Colab
# Set OUTPUT_FORMATS="" in config above to skip GGUF conversion

!python build.py

## Step 7: Download Your Model

Download the trained model to your local machine:

In [None]:
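# List output directories
!ls -lh outputs/

# Find your model directory
import os
output_dirs = [d for d in os.listdir('outputs') if os.path.isdir(os.path.join('outputs', d))]
if output_dirs:
    # os.listdir order is arbitrary, so pick the most recently modified run
    model_dir = max(output_dirs, key=lambda d: os.path.getmtime(os.path.join('outputs', d)))
    print(f"\n✅ Your model is in: outputs/{model_dir}")
    print("\nContents:")
    !ls -lh outputs/{model_dir}
else:
    print("No model directories found in outputs/. Run Step 5 first.")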

In [None]:
# Option 1: Download via Colab Files (for small files)
from google.colab import files

# Download LoRA adapters (small, ~80MB)
!zip -r lora_adapters.zip outputs/*/lora/
files.download('lora_adapters.zip')
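Browser downloads from Colab are only practical for small files, so it can be worth checking a directory's size before zipping it. The following is a minimal sketch. `dir_size_mb` is a hypothetical helper, and the demo runs on a throwaway temporary directory so that it works anywhere.

```python
import os
import tempfile


def dir_size_mb(path):
    """Total size of all files under path, in MiB."""
    total = 0
    for root, _dirs, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(root, name))
    return total / (1024 * 1024)


# Self-contained demo: a temporary directory holding one 2 MiB file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "adapter.bin"), "wb") as fh:
        fh.write(b"\0" * (2 * 1024 * 1024))
    demo_size_mb = dir_size_mb(tmp)

print(f"{demo_size_mb:.1f} MB")
```

In the notebook, calling `dir_size_mb(f"outputs/{model_dir}/lora")` would report the adapter size. If the result is much larger than the ~80MB noted above, Google Drive (Option 2) is likely the better route.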

In [None]:
# Option 2: Upload to Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Copy to Google Drive
!mkdir -p /content/drive/MyDrive/unsloth-models
!cp -r outputs/* /content/drive/MyDrive/unsloth-models/

print("✅ Model copied to Google Drive: MyDrive/unsloth-models/")

In [None]:
# Option 3: Push to HuggingFace Hub
# Uncomment and run this if you want to upload to HuggingFace

# !pip install huggingface_hub
# from huggingface_hub import login, HfApi
# 
# # Login to HuggingFace
# login()  # This will prompt for your token
# 
# # Upload model
# api = HfApi()
# model_path = f"outputs/{model_dir}/merged_16bit"
# repo_id = "your-username/your-model-name"  # Change this!
# 
# api.upload_folder(
#     folder_path=model_path,
#     repo_id=repo_id,
#     repo_type="model"
# )
# 
# print(f"✅ Model uploaded to: https://huggingface.co/{repo_id}")

## Step 8: Test Your Model (Optional)

Quick test of your fine-tuned model:

In [None]:
from unsloth import FastLanguageModel
import torch

# Load your fine-tuned model
model_path = f"outputs/{model_dir}/lora"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_path,
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

FastLanguageModel.for_inference(model)

# Test prompt
prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
What is machine learning?

### Response:
"""

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("\n" + "="*50)
print("MODEL RESPONSE:")
print("="*50)
print(response)
print("="*50)
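The decoded output above contains the full prompt followed by the model's answer. A small sketch of how the prompt could be built from a template and the answer separated out is shown below. `build_prompt` and `extract_response` are hypothetical helpers, and the template simply mirrors the prompt used in the test cell.

```python
RESPONSE_MARKER = "### Response:"


def build_prompt(instruction):
    """Format an instruction using the same template as the test cell above."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"{RESPONSE_MARKER}\n"
    )


def extract_response(decoded_text):
    """Return only the text after the response marker, or everything if it is absent."""
    _, marker, tail = decoded_text.partition(RESPONSE_MARKER)
    return tail.strip() if marker else decoded_text.strip()


# Demo with a stand-in for decoded model output (no GPU needed).
prompt = build_prompt("What is machine learning?")
fake_decoded = prompt + "Machine learning is a field of study."
print(extract_response(fake_decoded))
```

In the notebook, `extract_response(response)` would strip the echoed prompt from the `response` string produced by the cell above.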

## 🎉 Done!

Your model has been trained and is ready to use!

**Next steps:**
1. Download the model from Google Drive or HuggingFace
2. Use it locally with Ollama or transformers
3. Share it on HuggingFace Hub

**Resources:**
- [Documentation](https://github.com/farhan-syah/unsloth-finetuning/tree/main/docs)
- [Training Guide](https://github.com/farhan-syah/unsloth-finetuning/blob/main/docs/TRAINING.md)
- [FAQ](https://github.com/farhan-syah/unsloth-finetuning/blob/main/docs/FAQ.md)