<a href="https://colab.research.google.com/github/farhan1301/Fine-Tuning-LLM/blob/main/colab/COLAB_QUICKSTART.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🚀 Complete LLM Fine-tuning Pipeline - Google Colab

**All-in-one notebook** for fine-tuning Llama 3.2 1B on BRD extraction.

## ⚡ Requirements
- Google Colab with **T4 GPU** (free tier)
- Runtime → Change runtime type → Hardware accelerator → **T4 GPU**

## ⏱️ Timeline
- Setup: 5-10 min
- Data Generation: 1-2 hours (or use pre-generated)
- Training: 1-2 hours on T4 GPU
- **Total: 2-4 hours** (vs 12-24 hours on CPU!)

## 📋 Before Starting
1. Enable GPU runtime
2. Get API keys:
   - Hugging Face: https://huggingface.co/settings/tokens
   - Anthropic: https://console.anthropic.com/
3. Accept Llama 3.2 license: https://huggingface.co/meta-llama/Llama-3.2-1B

---
# Part 1: Setup (5-10 minutes)
---

## Check GPU

In [1]:
import torch

if not torch.cuda.is_available():
    raise RuntimeError("⚠️ No GPU! Go to: Runtime → Change runtime type → T4 GPU")

print("✓ GPU Available:", torch.cuda.get_device_name(0))
print("✓ GPU Memory:", torch.cuda.get_device_properties(0).total_memory / 1e9, "GB")

✓ GPU Available: Tesla T4
✓ GPU Memory: 15.828320256 GB


## Mount Google Drive

In [None]:
from google.colab import drive
import os

drive.mount('/content/drive')

# Project directory in Google Drive
PROJECT_DIR = '/content/drive/MyDrive/LLM_Finetuning'
os.makedirs(f"{PROJECT_DIR}/data", exist_ok=True)
os.makedirs(f"{PROJECT_DIR}/models", exist_ok=True)
os.chdir(PROJECT_DIR)

print(f"✓ Working directory: {PROJECT_DIR}")

## Install Packages

In [None]:
# %%capture is omitted on purpose: it would also hide the version printout below.
# The -q flag already keeps pip output short.
!pip install -q -U transformers peft trl bitsandbytes accelerate
!pip install -q -U datasets pydantic anthropic gradio
!pip install -q 'numpy<2.0'

import transformers, peft, trl
print(f"✓ Transformers: {transformers.__version__}")
print(f"✓ PEFT: {peft.__version__}")
print(f"✓ TRL: {trl.__version__}")

## Configure API Keys

In [None]:
from google.colab import userdata
from getpass import getpass
import os

# Try Colab secrets first, fall back to manual input if a secret is missing
# or the notebook has not been granted access to it
try:
    HF_TOKEN = userdata.get('HUGGINGFACE_TOKEN')
    ANTHROPIC_KEY = userdata.get('ANTHROPIC_API_KEY')
    print("✓ Loaded from Colab secrets")
except Exception:
    print("Enter your API keys:")
    HF_TOKEN = getpass("Hugging Face token: ")
    ANTHROPIC_KEY = getpass("Anthropic API key: ")

os.environ['HF_TOKEN'] = HF_TOKEN
os.environ['ANTHROPIC_API_KEY'] = ANTHROPIC_KEY

# Login to HF
from huggingface_hub import login
login(token=HF_TOKEN)
print("✓ Logged in to Hugging Face")

---
# Part 2: Data Generation (1-2 hours)
---

**Option A:** Generate 1000 BRDs (1-2 hours, costs ~$3-5 API credits)

**Option B:** Use smaller dataset for testing (100 samples, 10-15 min, ~$0.50)

In [None]:
# Choose dataset size
NUM_SAMPLES = 100  # Change to 1000 for full dataset

print(f"Will generate {NUM_SAMPLES} BRD samples")
print(f"Estimated time: {NUM_SAMPLES * 0.1:.0f} minutes")  # ~0.1 min per sample (about 10 min per 100 samples)
print(f"Estimated cost: ${NUM_SAMPLES * 0.005:.2f}")

## Generate Synthetic BRDs

In [None]:
import anthropic
import json
import random
from tqdm import tqdm
import time

client = anthropic.Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])

PROJECT_TYPES = [
    "Web Application", "Mobile Application", "REST API", "Data Pipeline",
    "ML Model", "E-commerce Platform", "CRM System", "Dashboard"
]

INDUSTRIES = [
    "Financial Services", "Healthcare", "E-commerce", "Education",
    "Real Estate", "Retail", "SaaS", "Manufacturing"
]

COMPLEXITY = [
    {"level": "Simple", "hours_range": (80, 300), "rate": 75},
    {"level": "Medium", "hours_range": (300, 800), "rate": 100},
    {"level": "Complex", "hours_range": (800, 2000), "rate": 125},
]

def generate_brd(project_type, industry, complexity, team_size):
    effort_hours = random.randint(*complexity["hours_range"])
    hourly_rate = complexity["rate"] + random.randint(-15, 15)
    cost_usd = effort_hours * hourly_rate
    hours_per_week = team_size * 40
    timeline_weeks = max(1, round(effort_hours / hours_per_week))

    prompt = f"""Generate a realistic 2-3 paragraph Business Requirements Document for a {complexity['level'].lower()} {project_type.lower()} in {industry}.

Include:
- Project overview and objectives
- Technical scope and features
- Resource needs ({team_size} team members)
- Timeline: approximately {timeline_weeks} weeks
- Effort: approximately {effort_hours} hours total
- Budget: approximately ${cost_usd:,}

Write in professional prose, not template format. Use natural variations in terminology."""

    try:
        message = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1000,
            messages=[{"role": "user", "content": prompt}]
        )
        return {
            "brd_text": message.content[0].text,
            "labels": {
                "effort_hours": float(effort_hours),
                "timeline_weeks": int(timeline_weeks),
                "cost_usd": float(cost_usd)
            }
        }
    except Exception as e:
        print(f"Error: {e}")
        return None

# Generate dataset
dataset = []
for i in tqdm(range(NUM_SAMPLES)):
    brd = generate_brd(
        random.choice(PROJECT_TYPES),
        random.choice(INDUSTRIES),
        random.choice(COMPLEXITY),
        random.choice([1,2,3,4,5])
    )
    if brd:
        brd["id"] = i
        dataset.append(brd)
    time.sleep(0.5)

print(f"\n✓ Generated {len(dataset)} BRDs")

# Save
with open(f"{PROJECT_DIR}/data/dataset.json", "w") as f:
    json.dump(dataset, f, indent=2)
print(f"✓ Saved to {PROJECT_DIR}/data/dataset.json")
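
The labels stored with each BRD come from simple arithmetic inside `generate_brd`: cost is hours times rate, and the timeline assumes every team member works a 40-hour week. The sketch below repeats that arithmetic in isolation so the relationship is easy to check by hand. `derive_labels` is a hypothetical helper introduced only for illustration; it is not part of the notebook.

```python
def derive_labels(effort_hours: int, hourly_rate: int, team_size: int) -> dict:
    """Recompute the labels with the same formulas generate_brd uses (illustrative helper)."""
    hours_per_week = team_size * 40  # each team member contributes 40 hours per week
    timeline_weeks = max(1, round(effort_hours / hours_per_week))  # never below one week
    return {
        "effort_hours": float(effort_hours),
        "timeline_weeks": int(timeline_weeks),
        "cost_usd": float(effort_hours * hourly_rate),
    }

# 640 hours at $100/hour with 2 people: 640 / 80 = 8 weeks, 640 * 100 = $64,000
labels = derive_labels(effort_hours=640, hourly_rate=100, team_size=2)
print(labels)
```

Because the prompt asks Claude for "approximately" these values, the generated text may round or rephrase them, so spot-checking a few samples against their labels is worthwhile.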

## Format Data for Training

In [None]:
from sklearn.model_selection import train_test_split

# Format for instruction tuning
formatted_data = []
for sample in dataset:
    text = f"""### Instruction:
Extract the project estimation fields from the following Business Requirements Document.
Return a JSON object with these exact fields: effort_hours (number), timeline_weeks (number), cost_usd (number).
Return ONLY the JSON object, no additional text.

### Input:
{sample['brd_text']}

### Output:
{json.dumps(sample['labels'])}"""
    formatted_data.append({"text": text, "id": sample["id"]})

# Split
train, temp = train_test_split(formatted_data, test_size=0.2, random_state=42)
val, test = train_test_split(temp, test_size=0.5, random_state=42)

# Save
for name, data in [("train", train), ("val", val), ("test", test)]:
    with open(f"{PROJECT_DIR}/data/{name}.json", "w") as f:
        json.dump(data, f)

print(f"✓ Train: {len(train)} | Val: {len(val)} | Test: {len(test)}")
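
The training text and the inference prompt in Part 4 have to match character for character up to `### Output:`, otherwise the model sees a format at test time that it was not trained on. One way to guard against drift is to build both from a single function. The sketch below assumes a hypothetical helper named `build_prompt`; the notebook itself writes the template inline in both places.

```python
import json

INSTRUCTION = (
    "Extract the project estimation fields from the following Business Requirements Document.\n"
    "Return a JSON object with these exact fields: effort_hours (number), "
    "timeline_weeks (number), cost_usd (number).\n"
    "Return ONLY the JSON object, no additional text."
)

def build_prompt(brd_text, labels=None):
    """Return the training text when labels are given, else the inference prompt."""
    prompt = f"### Instruction:\n{INSTRUCTION}\n\n### Input:\n{brd_text}\n\n### Output:\n"
    if labels is not None:
        prompt += json.dumps(labels)  # the completion the model learns to produce
    return prompt

example_labels = {"effort_hours": 640.0, "timeline_weeks": 8, "cost_usd": 64000.0}
train_text = build_prompt("A short example BRD.", example_labels)
infer_text = build_prompt("A short example BRD.")

# The inference prompt is exactly the training text without the JSON completion
print(train_text.startswith(infer_text))
```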

---
# Part 3: Model Training (1-2 hours on T4 GPU)
---

## Load Model with 4-bit Quantization

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer
from datasets import load_dataset

# 4-bit quantization (GPU optimized)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

print("Loading Llama 3.2 1B...")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

print(f"✓ Model loaded: {model.get_memory_footprint() / 1e9:.2f} GB")

## Configure LoRA

In [None]:
# Prepare model
model = prepare_model_for_kbit_training(model)

# LoRA config (GPU optimized - higher rank than CPU)
lora_config = LoraConfig(
    r=16,  # Higher rank for GPU (was 8 for CPU)
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
print("✓ LoRA configured")
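
To see roughly what `print_trainable_parameters()` should report, the LoRA parameter count can be worked out by hand: each adapted linear layer of shape `d_in x d_out` gains two low-rank matrices with `r * (d_in + d_out)` parameters in total. The sketch below assumes the Llama 3.2 1B dimensions (hidden size 2048, MLP size 8192, key/value projection width 512, 16 layers); these values are not stated in this notebook, so verify them against the model's `config.json`.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Assumed Llama 3.2 1B dimensions (check config.json before relying on them)
HIDDEN, INTERMEDIATE, KV_DIM, LAYERS = 2048, 8192, 512, 16
RANK = 16  # r from the LoraConfig above

# (d_in, d_out) for each module named in target_modules
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in shapes.values())
total = per_layer * LAYERS
print(f"Trainable LoRA parameters: {total:,}")
print(f"Adapter size in fp32: {total * 4 / 1e6:.1f} MB")
```

Under these assumptions the adapters come to about 11.3 million parameters, or roughly 45 MB in fp32, which sits within the 10-50 MB adapter size quoted at the end of this notebook.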

## Load Training Data

In [None]:
dataset = load_dataset("json", data_files={
    "train": f"{PROJECT_DIR}/data/train.json",
    "validation": f"{PROJECT_DIR}/data/val.json",
})

print(f"✓ Train: {len(dataset['train'])} | Val: {len(dataset['validation'])}")

## Configure Training (GPU Optimized)

In [None]:
OUTPUT_DIR = f"{PROJECT_DIR}/models/llama-3.2-1b-brd"

# GPU-optimized training args
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=3,
    per_device_train_batch_size=4,  # Larger than CPU (was 1)
    gradient_accumulation_steps=4,   # Smaller (was 32)
    learning_rate=2e-4,
    warmup_steps=100,  # Sized for the 1000-sample run (~150 steps); lower it for the 100-sample run (~15 steps)
    logging_steps=10,
    save_steps=50,
    eval_steps=50,
    eval_strategy="steps",  # Named evaluation_strategy in older transformers releases
    fp16=True,  # GPU supports fp16!
    gradient_checkpointing=True,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    save_total_limit=2,
    load_best_model_at_end=True,
)

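# Initialize trainer
# Note: these keyword arguments match older TRL releases; newer releases may expect
# max_seq_length, dataset_text_field, and packing on an SFTConfig instead.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,
    args=training_args,
    max_seq_length=2048,
    dataset_text_field="text",
    packing=False,
)

effective_batch = training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps
# args.max_steps stays at -1 when training is configured by epochs, so estimate the step count
total_steps = -(-len(dataset["train"]) // effective_batch) * int(training_args.num_train_epochs)

print("✓ Trainer initialized")
print(f"  Effective batch size: {effective_batch}")
print(f"  Total steps: ~{total_steps}")
print("  Estimated time: 1-2 hours on T4 GPU")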
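
The step counts behind these settings are easy to estimate by hand, and doing so shows how the 50-step evaluation and checkpoint interval interacts with dataset size. The sketch below is an approximation for a single GPU that assumes every requested sample was generated and that 80% of them land in the training split; `estimate_optimizer_steps` is a hypothetical helper, and the trainer's own count may differ slightly when the dataset size is not a multiple of the batch size.

```python
import math

def estimate_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs):
    """Approximate optimizer steps for a single-GPU run (illustrative helper)."""
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs

EVAL_STEPS = 50  # eval_steps and save_steps from the training arguments

for num_examples in (80, 800):  # training split of the 100- and 1000-sample datasets
    batch, steps = estimate_optimizer_steps(num_examples, per_device_batch=4, grad_accum=4, epochs=3)
    print(f"{num_examples} examples: effective batch {batch}, ~{steps} steps, "
          f"{steps // EVAL_STEPS} evaluations")
```

With the 100-sample test dataset the run lasts only about 15 optimizer steps, so no evaluation or checkpoint is triggered at a 50-step interval; the full 1000-sample dataset gives about 150 steps and three evaluations.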

## Start Training!

In [None]:
print("🚀 Starting training...\n")

# Train
trainer.train()

# Save
trainer.save_model(f"{OUTPUT_DIR}/final")
tokenizer.save_pretrained(f"{OUTPUT_DIR}/final")

print("\n✓ Training complete!")
print(f"  Model saved to: {OUTPUT_DIR}/final")

---
# Part 4: Test the Model
---

In [None]:
test_brd = """Business Requirements Document
Project: E-commerce Mobile App

We need a cross-platform mobile app for our e-commerce business with product browsing,
shopping cart, and secure checkout. The project requires 3 developers for 12 weeks.
Total estimated effort is 720 hours with a budget of $90,000."""

prompt = f"""### Instruction:
Extract the project estimation fields from the following Business Requirements Document.
Return a JSON object with these exact fields: effort_hours (number), timeline_weeks (number), cost_usd (number).
Return ONLY the JSON object, no additional text.

### Input:
{test_brd}

### Output:
"""

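model.eval()  # Disable dropout for inference
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():  # No gradients needed when generating
    outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.1, do_sample=True)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)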

print("Extracted Output:")
print("="*60)
print(result.split("### Output:")[-1].strip())
print("="*60)
print("\n✓ Model working!")
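
Printing the raw completion shows whether the model responds, but not whether the response is usable. A base model often keeps generating after the JSON object, so it helps to extract the first object and check its fields. The sketch below uses only the standard library; `parse_estimate` is a hypothetical helper, and the sample string stands in for a real model output.

```python
import json
import re

REQUIRED_FIELDS = ("effort_hours", "timeline_weeks", "cost_usd")

def parse_estimate(generated: str):
    """Return the three numeric fields from a completion, or None if it is unusable."""
    completion = generated.split("### Output:")[-1]
    match = re.search(r"\{.*?\}", completion, flags=re.DOTALL)  # first flat JSON object
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None
    return {field: data[field] for field in REQUIRED_FIELDS}

# Made-up completion with trailing text after the JSON object
sample = (
    "### Output:\n"
    '{"effort_hours": 720.0, "timeline_weeks": 12, "cost_usd": 90000.0}\n'
    "Some extra text the model kept generating"
)
print(parse_estimate(sample))
print(parse_estimate("### Output:\nno JSON here"))
```

In the notebook, the same check could be applied with `parse_estimate(result)`. The Pydantic validation mentioned under Next Steps would replace the manual type checks with a model class.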

---
# 🎉 Complete!
---

### What You've Built:
- ✓ Generated synthetic training data
- ✓ Fine-tuned Llama 3.2 1B with QLoRA
- ✓ Trained on T4 GPU in ~1-2 hours
- ✓ Model extracts structured JSON from BRDs

### Your Model:
- Location: `Google Drive/LLM_Finetuning/models/llama-3.2-1b-brd/final`
- Size: ~10-50 MB (LoRA adapters only)
- Can be loaded and used anytime

### Next Steps:
1. **Download model** from Google Drive
2. **Share on Hugging Face** for portfolio
3. **Use in production** with Pydantic validation
4. **Create demo** with Gradio

### Load Model Later:
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base, "path/to/final")
```

**Congratulations! 🚀 You've fine-tuned an LLM!**