# Fine-Tuning Pipeline - Google Colab

This notebook fine-tunes **Llama 3.1 8B Instruct** for emotional support and career guidance.

**Important:** 
- Mount Google Drive for persistent storage
- Set your Hugging Face token
- Request access to `meta-llama/Llama-3.1-8B-Instruct` before training
- **Requires Colab Pro** (16GB+ VRAM recommended)

## Step 1: Setup and Mount Google Drive

In [None]:
# Mount Google Drive for persistent storage
from google.colab import drive
drive.mount('/content/drive')

import os
import sys
from pathlib import Path

# Set project root (adjust if your folder is named differently)
project_root = "/content/drive/MyDrive/Career_guidance"

# Create directory if it doesn't exist
Path(project_root).mkdir(parents=True, exist_ok=True)

# Change to project directory
os.chdir(project_root)
sys.path.insert(0, project_root)

print(f"✓ Working directory: {os.getcwd()}")
print(f"✓ Project files: {os.listdir('.')}")

## Step 2: Install Dependencies

In [None]:
# Install PyTorch with CUDA support
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install other dependencies
!pip install -q transformers datasets peft bitsandbytes accelerate huggingface-hub python-dotenv tensorboard

print("✓ Dependencies installed")

## Step 3: Configure Hugging Face Token

In [None]:
import os

# Set your Hugging Face token
# Get it from: https://huggingface.co/settings/tokens
os.environ['HF_TOKEN'] = 'your_token_here'  # ⚠️ REPLACE WITH YOUR TOKEN

# Verify token is set
if os.environ.get('HF_TOKEN') and os.environ['HF_TOKEN'] != 'your_token_here':
    print("✓ HF_TOKEN is set")
    from huggingface_hub import login
    login(token=os.environ['HF_TOKEN'])
    print("✓ Logged in to Hugging Face")
else:
    print("⚠️  WARNING: Please set your HF_TOKEN before proceeding!")
    print("Get your token from: https://huggingface.co/settings/tokens")

## Step 4: Verify GPU (Important for 8B!)

In [None]:
import torch

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    # Reported memory in decimal GB; usable memory is slightly below the nominal card size
    gpu_memory_gb = torch.cuda.get_device_properties(0).total_memory / 1e9

    print(f"✓ GPU: {gpu_name}")
    print(f"✓ VRAM: {gpu_memory_gb:.1f} GB")
    print()

    # Check if GPU is sufficient for 8B model
    if gpu_memory_gb >= 32:
        print("✅ Excellent! Your GPU has plenty of memory for 8B model.")
    elif gpu_memory_gb >= 15:
        # Threshold is 15 rather than 16 because a nominal 16GB card (e.g. T4) reports a little under 16 GB
        print("⚠️  Your GPU should work, but may be tight. Monitor for OOM errors.")
        print("   Consider reducing batch size if you encounter memory issues.")
    else:
        print("❌ WARNING: Your GPU may not have enough memory for 8B model!")
        print("   Minimum recommended: 16GB VRAM")
        print("   Consider using 1B model instead or upgrading to Colab Pro+")
else:
    print("❌ ERROR: No GPU detected!")
    print("   Please enable GPU in Colab: Runtime → Change runtime type → GPU")

## Step 5: Pre-Flight Check

In [None]:
!python scripts/preflight_check.py --model_size 8B

## Step 6: Prepare Training Data

In [None]:
!python scripts/prepare_data.py

## Step 7: Start Training (8B Model)

**Training will take 3-10 hours depending on your GPU** (see the per-GPU estimates under Slow Training).

Checkpoints are automatically saved to Google Drive every 500 steps.

In [None]:
!python training/train.py --model_size 8B --output_dir ./outputs

## Step 8: Evaluate Model (After Training)

In [None]:
# Generate test samples and evaluate model performance
!python scripts/evaluate_model.py \
    --model_path ./outputs \
    --base_model_path meta-llama/Llama-3.1-8B-Instruct \
    --model_size 8B

## Step 9: Merge LoRA Adapters (Optional)

Merge LoRA adapters with base model for easier deployment.

In [None]:
# Merge LoRA adapters with base model
# This creates a single model file (easier to use, but larger ~16GB)
!python export/merge_lora.py \
    --base_model_path meta-llama/Llama-3.1-8B-Instruct \
    --lora_adapter_path ./outputs \
    --output_path ./outputs/merged_model \
    --model_size 8B

## Step 10: Push to Hugging Face (Optional)

Upload your model to Hugging Face Hub for sharing or backup.

In [None]:
# Upload model to Hugging Face Hub
# Replace 'your-username/your-model-name' with your repository name
!python export/push_to_huggingface.py \
    --model_path ./outputs/merged_model \
    --repo_id your-username/llama3.1-8b-emotional-career \
    --token $HF_TOKEN

## Monitor Training (Optional)

Run these cells in separate tabs to monitor training progress.

In [None]:
# Monitor GPU usage (updates every second)
# Press Stop when done monitoring
!nvidia-smi -l 1

### View Training Logs

In [None]:
# View latest training log
!tail -50 logs/training_*.log

### TensorBoard (Optional)

Visualize training metrics in real-time.

In [None]:
# Load TensorBoard extension
%load_ext tensorboard

# Start TensorBoard
%tensorboard --logdir ./outputs

## Resume Training (If Session Disconnects)

If Colab disconnects, you can resume from the last checkpoint.

In [None]:
# Resume from checkpoint
# Replace 'checkpoint-1000' with your latest checkpoint number
# Check available checkpoints: !ls outputs/

!python training/train.py \
    --model_size 8B \
    --output_dir ./outputs \
    --resume_from_checkpoint ./outputs/checkpoint-1000

## Check Available Checkpoints

List all saved checkpoints to find the latest one.

In [None]:
# List all checkpoints
!ls -lh outputs/checkpoint-*/
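
The listing above has to be scanned by eye, and a plain alphabetical sort would put `checkpoint-900` after `checkpoint-1000`. As a minimal sketch, assuming checkpoints follow the `checkpoint-<step>` naming shown in this notebook, the hypothetical helper `find_latest_checkpoint` below (not part of the project scripts) picks the directory with the highest step number:

```python
import re
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> directory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None

    pattern = re.compile(r"^checkpoint-(\d+)$")
    candidates = []
    for entry in root.iterdir():
        match = pattern.match(entry.name)
        if entry.is_dir() and match:
            candidates.append((int(match.group(1)), entry))

    if not candidates:
        return None
    # Compare by numeric step so checkpoint-1000 ranks above checkpoint-900.
    return max(candidates, key=lambda item: item[0])[1]


latest = find_latest_checkpoint("./outputs")
print(latest if latest else "No checkpoints found in ./outputs")
```

The printed path can then be passed to `--resume_from_checkpoint` in the Resume Training cell.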

## Troubleshooting

Common issues and solutions for 8B model training.

### Out of Memory (OOM) Error

If you get OOM errors during training:

1. **Reduce batch size** - Edit `config/training_config.py`:
   - Set `per_device_train_batch_size = 1`
   - Set `gradient_accumulation_steps = 16` (keeps the effective batch size at 16)

2. **Reduce sequence length**:
   - Set `max_seq_length = 1024` or `512`

3. **Use a smaller model**: Switch to the 1B model instead

4. **Upgrade GPU**: Get Colab Pro+ for A100 (40GB+ VRAM)
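
The first fix works because halving the per-device batch while doubling gradient accumulation leaves the effective batch size unchanged. A small sketch of that arithmetic, using the default values listed under 8B Model Configuration; the function name `effective_batch_size` and the `num_gpus` parameter are illustrative and not taken from `config/training_config.py`:

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device * grad_accum * num_gpus


# Default 8B settings from this notebook: batch size 2, accumulation 8.
default = effective_batch_size(per_device=2, grad_accum=8)

# OOM fallback: batch size 1, accumulation 16.
reduced = effective_batch_size(per_device=1, grad_accum=16)

print(f"default={default}, reduced={reduced}")  # default=16, reduced=16
```

Only the per-step memory footprint shrinks; each optimizer step still sees 16 samples, at the cost of more forward and backward passes per step.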

### Session Disconnected

If Colab disconnects:
- All checkpoints are saved to Google Drive automatically
- Resume from the last checkpoint using the "Resume Training" section
- Check available checkpoints with the "Check Available Checkpoints" section

### Slow Training

Training an 8B model takes time:
- **T4 (16GB)**: 6-10 hours
- **V100 (32GB)**: 4-6 hours
- **A100 (40GB+)**: 3-5 hours

This is normal! Be patient and let it run.

## 8B Model Configuration

The pipeline automatically configures for 8B:
- **Batch size**: 2 per device
- **Gradient accumulation**: 8 steps
- **Effective batch size**: 16
- **Sequence length**: 1536 tokens
- **LoRA rank**: 32, alpha: 64
- **Learning rate**: 2e-4

These settings are optimized for Colab Pro GPUs.

## Expected Training Times

- **T4 (16GB)**: 6-10 hours
- **V100 (32GB)**: 4-6 hours
- **A100 (40GB+)**: 3-5 hours

Checkpoints save every 500 steps to Google Drive.

## File Locations

All files are automatically saved to Google Drive:
- **Checkpoints**: `outputs/checkpoint-XXX/` (~32MB each)
- **Final model**: `outputs/adapter_model.safetensors` (~32MB)
- **Merged model**: `outputs/merged_model/` (~16GB, after step 9)
- **Logs**: `logs/training_*.log`
- **Evaluations**: `outputs/eval_samples.jsonl`

## Success Checklist

After training completes, you should have:
- [ ] Final LoRA adapters in `outputs/`
- [ ] Training logs in `logs/`
- [ ] Evaluation results (after step 8)
- [ ] (Optional) Merged model (after step 9)
- [ ] (Optional) Model on Hugging Face (after step 10)
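
The local items on this checklist can be verified programmatically. The following is a rough sketch, assuming the paths listed under File Locations and run from the project root; `check_artifacts` is a hypothetical helper, not one of the project scripts, and it cannot confirm the Hugging Face upload:

```python
from pathlib import Path
from typing import Dict


def check_artifacts(project_root: str = ".") -> Dict[str, bool]:
    """Report which expected training artifacts exist under project_root."""
    root = Path(project_root)
    outputs = root / "outputs"
    logs = root / "logs"
    return {
        "Final LoRA adapters": (outputs / "adapter_model.safetensors").is_file(),
        "Training logs": logs.is_dir() and any(logs.glob("training_*.log")),
        "Evaluation results": (outputs / "eval_samples.jsonl").is_file(),
        "(Optional) Merged model": (outputs / "merged_model").is_dir(),
    }


for name, present in check_artifacts(".").items():
    print(f"[{'x' if present else ' '}] {name}")
```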

**Congratulations!** Your fine-tuned 8B model is ready! 🎉