# V1.1 Llama 3.2  — MetaMathQA fine-tuning (Colab-ready)
This notebook prepares MetaMathQA data and demonstrates a PEFT/LoRA fine-tuning workflow suitable for Llama-family causal models. Run in Google Colab with a GPU runtime for best results.

## 1. Overview
This notebook contains: (1) environment and Colab quickstart, (2) data preparation for MetaMathQA, (3) example training using Hugging Face Transformers + PEFT (LoRA), and (4) evaluation examples.

Intended usage: open in Colab (Runtime → Change runtime type → GPU), run the setup cell, prepare data, then run the training cells.

## 2. Environment & Colab quickstart
If you run this notebook locally without a CUDA GPU, training will fail or be extremely slow — prefer Colab or other GPU hosts.

Open in Colab: [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/karukan/llamaFinetuning/blob/main/llama3.2-finetune-metamathqa.ipynb)

Quick steps: set Runtime→Change runtime type→GPU, run the setup cell (mount Drive if you want checkpoints persisted).

In [None]:
import torch
print('PyTorch version:', torch.__version__)
print('CUDA available?', torch.cuda.is_available())
print('CUDA devices:', torch.cuda.device_count())
if torch.cuda.is_available():
    print('Current device name:', torch.cuda.get_device_name(0))

## 4. Data Preparation
We load `meta-math/MetaMathQA` via the `datasets` library, clean the text, and format prompt-completion pairs. The target format used below is a JSONL where each line is {"prompt":..., "completion":...} suitable for many LLM fine-tuning tools.
Option 2 justification: `MetaMathQA` is chosen because it contains math reasoning Q&A examples that can help the model specialize in formal mathematical problem phrasing and solution generation—useful for benchmarking math reasoning improvements after fine-tuning.
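
As a quick illustration of the target format, the sketch below builds one record and round-trips it through a single JSONL line. The question and answer text is invented for illustration and is not a real MetaMathQA row.

```python
import json

# Hypothetical example record; the text is made up, not taken from the dataset.
record = {
    "prompt": "Question: What is 7 * 6?\nAnswer:",
    "completion": " 7 * 6 = 42. The answer is: 42 ",
}

# One JSON object per line: json.dumps escapes the newline inside the prompt,
# so the record stays on a single physical line.
line = json.dumps(record, ensure_ascii=False)
assert "\n" not in line

# Reading the line back recovers the original dictionary.
restored = json.loads(line)
print(restored["prompt"])
print(repr(restored["completion"]))
```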

In [None]:
from datasets import load_dataset
import json, os

# Load dataset from the hub. If you have it locally, adapt the path.
dataset_name = 'meta-math/MetaMathQA'
print('Loading dataset:', dataset_name)
try:
    ds = load_dataset(dataset_name)
except Exception as e:
    print('Failed to load directly. Check network/access or replace with local path. Error:', e)
    ds = None

# Inspect if loaded
if ds is not None:
    print(ds)
    # show a few examples (train split may be named 'train')
    for k in ds.keys():
        print('Split', k, '->', ds[k].num_rows)
    print('Example row (first train if exists):')
    split = list(ds.keys())[0]
    print(ds[split][0])

### 4b. Data cleaning and formatting helpers

In [None]:
import re

def clean_text(s):
    if s is None:
        return ''
    # Basic cleanup: normalize whitespace, remove odd control chars
    s = s.replace(chr(9), ' ').replace(chr(13), ' ').replace(chr(10), ' ')
    s = ' '.join(s.split())
    return s

def format_prompt_completion(example):
    # MetaMathQA stores the problem in 'query' and the worked solution in 'response';
    # the remaining keys are fallbacks for datasets that use a different schema.
    q = example.get('query') or example.get('question') or example.get('problem') or example.get('prompt') or ''
    a = example.get('response') or example.get('answer') or example.get('solution') or example.get('target') or ''
    q = clean_text(q)
    a = clean_text(a)
    # Compose the prompt and completion; ensure completion contains an end token or newline.
    prompt = f'Question: {q}\nAnswer:'
    completion = ' ' + a + ' '  # leading space helps some tokenizers' alignment
    return {'prompt': prompt, 'completion': completion}

In [None]:
# 4c. Create train/validation split and save JSONL files
import random
from pathlib import Path

out_dir = Path('./data')
out_dir.mkdir(parents=True, exist_ok=True)

def prepare_and_save(dset, split_name='train', val_frac=0.05, seed=42, max_items=None):
    # flatten list of formatted items
    items = []
    for i, ex in enumerate(dset):
        if max_items and i >= max_items:
            break
        formatted = format_prompt_completion(ex)
        if formatted['prompt'].strip() and formatted['completion'].strip():
            items.append(formatted)
    print(f'Prepared {len(items)} cleaned examples from {split_name}')
    random.Random(seed).shuffle(items)
    cut = int(len(items) * (1 - val_frac))
    train_items = items[:cut]
    val_items = items[cut:]
    # Save as JSONL
    train_path = out_dir / f'{split_name}_train.jsonl'
    val_path = out_dir / f'{split_name}_val.jsonl'
    with open(train_path, 'w', encoding='utf-8') as f1, open(val_path, 'w', encoding='utf-8') as f2:
        for it in train_items:
            f1.write(json.dumps(it, ensure_ascii=False) + '\n')
        for it in val_items:
            f2.write(json.dumps(it, ensure_ascii=False) + '\n')
    print('Saved', train_path, 'and', val_path)
    return train_path, val_path

# Run preparation if dataset loaded
if ds is not None:
    # Use first available split (often 'train') and limit items for quick tests
    first_split = list(ds.keys())[0]
    train_file, val_file = prepare_and_save(ds[first_split], split_name=first_split, val_frac=0.05, max_items=5000)
else:
    print('Dataset not loaded; please load dataset manually or provide local files.')
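
The shuffle-and-cut step inside `prepare_and_save` can be checked in isolation. This sketch reproduces only that step on integer stand-ins, using the same `val_frac=0.05`, `seed=42`, and 5000-item cap as the call above. The helper name `split_items` is hypothetical.

```python
import random

def split_items(items, val_frac=0.05, seed=42):
    """Shuffle a copy of items with a fixed seed, then cut into train/val."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * (1 - val_frac))
    return items[:cut], items[cut:]

train_items, val_items = split_items(range(5000))
print(len(train_items), len(val_items))  # 4750 250

# Same seed, same split: the validation set is reproducible across runs.
assert split_items(range(5000))[1] == val_items
# No example lands in both splits.
assert not set(train_items) & set(val_items)
```

The counts assume every row survives the empty-field filter; if some rows are dropped, both splits shrink proportionally.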

### 5B. Hugging Face Transformers + PEFT (LoRA) — runnable training pipeline
This is a concrete training implementation that uses PEFT LoRA; it's widely supported and works well for parameter-efficient fine-tuning. It also demonstrates hyperparameter setup, checkpointing, and early stopping.

In [None]:
from huggingface_hub import login

print('='*70)
print('Hugging Face Authentication')
print('='*70)
print()
print('Paste your HF token when prompted.')
print('If you don\'t have a token:')
print('  1. Go to https://huggingface.co/settings/tokens')
print('  2. Create a new token with "Read" permissions')
print('  3. Paste it below')
print()

try:
    login()
    print('✓ Successfully authenticated with Hugging Face')
except Exception as e:
    print(f'❌ Authentication failed: {e}')
    print('Please check your token and try again.')
    raise

# Detect whether we are running inside Google Colab
try:
    import google.colab
    IN_COLAB = True
except Exception:
    IN_COLAB = False

if IN_COLAB:
    print('Installing dependencies in Colab (this may take a minute)...')
    get_ipython().system('pip install -q peft accelerate bitsandbytes evaluate transformers datasets')
    print('Installation complete.')

from transformers import (AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import torch
from datasets import load_dataset, Dataset, DatasetDict

# Try importing evaluate; if it fails, try installing it again
try:
    import evaluate
    print('✓ evaluate module imported successfully')
except ImportError:
    print('Installing evaluate...')
    if IN_COLAB:
        get_ipython().system('pip install -q evaluate')
        import evaluate
        print('✓ evaluate installed and imported')
    else:
        print('⚠ evaluate not available. Install with: pip install evaluate')
        evaluate = None

In [3]:
from huggingface_hub import login
login()  # Paste HF token

## 5. Hugging Face Authentication — Required for Llama 3.2 1B

**Important:** Llama 3.2 1B is a **gated model** and requires authentication with a Hugging Face token.

**Setup steps:**

1. **Accept the model license** on Hugging Face:
   - Go to [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
   - Click "Access repository" and accept the license

2. **Create a Hugging Face token:**
   - Go to [Hugging Face Settings → Access Tokens](https://huggingface.co/settings/tokens)
   - Click "New token", choose "Read" permissions
   - Copy the token

3. **Authenticate in this notebook:**
   - Run the authentication cell (`login()`) above and paste your token when prompted
   - The token is cached locally by `huggingface_hub` and reused for later model downloads

**After authentication**, you can proceed with tokenization and training.
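
For non-interactive runs, the token can be read from an environment variable instead of being pasted into the widget. The sketch below assumes the token was exported as `HF_TOKEN` beforehand. The helper name `login_from_env` is hypothetical, and `huggingface_hub` is imported only when a token is actually found.

```python
import os

def login_from_env(var_name="HF_TOKEN"):
    """Log in with a token from the environment; return True if one was found."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        print(f"{var_name} is not set; fall back to the interactive login() cell.")
        return False
    # Imported lazily so the helper can be defined without the package installed.
    from huggingface_hub import login
    login(token=token)
    return True
```

Calling `login_from_env()` before the tokenizer is loaded would then take the place of the interactive prompt.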


In [None]:
# Tokenization and dataset creation
# Updated for Llama 3.2 1B model

from pathlib import Path

model_name_or_path = 'meta-llama/Llama-3.2-1B'

# Ensure model_name_or_path is set before running this cell
if '<LLAMA' in model_name_or_path or not model_name_or_path or model_name_or_path.startswith('<'):
    print('⚠ STOP: Set model_name_or_path first!')
    print('  Example: model_name_or_path = "meta-llama/Llama-2-7b-hf"')
    print('  Or for Llama 3.2 1B: model_name_or_path = "meta-llama/Llama-3.2-1B"')
else:
    print(f'Using model: {model_name_or_path}')
    
    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True, trust_remote_code=True)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    
    print(f'Tokenizer loaded. Vocab size: {len(tokenizer)}')
    
    # Load JSONL datasets
    train_jsonl = Path('./data/train_train.jsonl')
    val_jsonl = Path('./data/train_val.jsonl')
    
    if not train_jsonl.exists() or not val_jsonl.exists():
        print(f'⚠ JSONL files not found. Expected:')
        print(f'  - {train_jsonl}')
        print(f'  - {val_jsonl}')
        print('Run the data preparation cells first.')
    else:
        def load_jsonl_to_dataset(path):
            import json
            items = []
            with open(path, 'r', encoding='utf-8') as f:
                for line in f:
                    if line.strip():
                        items.append(json.loads(line))
            return Dataset.from_list(items)
        
        print('Loading JSONL datasets...')
        train_ds = load_jsonl_to_dataset(train_jsonl)
        val_ds = load_jsonl_to_dataset(val_jsonl)
        print(f'Train: {len(train_ds)} examples, Val: {len(val_ds)} examples')
        
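        max_length = 512  # maximum tokens per example, as described in section 5C

        def tokenize_fn(batch):
            # With batched=True, `batch` is a dict of column lists, so pair up the columns
            # and concatenate prompt + completion for causal LM training
            texts = [p + c for p, c in zip(batch['prompt'], batch['completion'])]
            out = tokenizer(texts, truncation=True, max_length=max_length, padding='max_length', return_tensors=None)
            # For causal LM, labels start as a copy of input_ids (one list per example)
            out['labels'] = [ids.copy() for ids in out['input_ids']]
            return out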
        
        print('Tokenizing datasets (this may take a few minutes)...')
        tokenized_train = train_ds.map(tokenize_fn, batched=True, remove_columns=train_ds.column_names, batch_size=32, num_proc=4)
        tokenized_val = val_ds.map(tokenize_fn, batched=True, remove_columns=val_ds.column_names, batch_size=32, num_proc=4)
        
        print(f'Tokenized train: {len(tokenized_train)} samples')
        print(f'Tokenized val: {len(tokenized_val)} samples')
        
        # Data collator for causal LM (pads to same length within batch, no MLM)
        data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
        
        print('✓ Tokenization complete. Ready for training.')

### 5C. Tokenization setup — Llama 3.2 1B

Before running the tokenization cell above, make sure `model_name_or_path` points to the Llama 3.2 1B model.

**To use Llama 3.2 1B:**
1. Accept the model license on Hugging Face: [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
2. In the tokenization cell, confirm that the model ID is set:
   ```python
   model_name_or_path = 'meta-llama/Llama-3.2-1B'
   ```
3. If you have not authenticated yet (required for this gated model, in Colab or elsewhere), log in to Hugging Face:
   ```python
   from huggingface_hub import login
   login()  # paste your token when prompted
   ```
4. Then run the tokenization cell to tokenize the dataset.

The cell will:
- Load the tokenizer and set up pad tokens
- Load the prepared JSONL train/val data
- Tokenize everything to `max_length=512`
- Create a data collator ready for training
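
To make the padding and label handling concrete, here is a toy sketch with plain integer lists in place of real token IDs. The helper `pad_and_label` is hypothetical. It mimics what fixed-length padding plus a causal-LM collator with `mlm=False` produce: label positions equal to the pad token ID are set to -100 so the loss ignores them. The pad ID of 0 and the tiny `max_length` are illustrative assumptions.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def pad_and_label(token_ids, max_length, pad_id):
    """Truncate/pad to max_length and derive the attention mask and labels."""
    ids = list(token_ids)[:max_length]
    n_real = len(ids)
    input_ids = ids + [pad_id] * (max_length - n_real)
    attention_mask = [1] * n_real + [0] * (max_length - n_real)
    # Mask by value, as the collator does: every position holding pad_id is ignored.
    labels = [IGNORE_INDEX if t == pad_id else t for t in input_ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

example = pad_and_label([11, 12, 13], max_length=5, pad_id=0)
print(example["input_ids"])       # [11, 12, 13, 0, 0]
print(example["attention_mask"])  # [1, 1, 1, 0, 0]
print(example["labels"])          # [11, 12, 13, -100, -100]

# Sequences longer than max_length are truncated, not wrapped.
print(pad_and_label(range(1, 9), max_length=5, pad_id=0)["input_ids"])  # [1, 2, 3, 4, 5]
```

Because the tokenization cell reuses the EOS token as the pad token, masking by value would also hide any genuine EOS token from the loss, which is worth keeping in mind when judging whether the fine-tuned model learns to stop.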

## 5D. Model Loading — Llama 3.2 1B

Load the base Llama 3.2 1B model with 4-bit quantization for memory efficiency. This cell requires:
- ✓ HF token authentication (from cell above)
- ✓ Tokenizer loaded (from tokenization cell)
- ✓ Llama 3.2 1B license accepted on Hugging Face

The model will be loaded with **4-bit quantization** to fit in GPU VRAM (tested on Google Colab T4/A100).
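
A back-of-the-envelope estimate shows why 4-bit loading helps. The sketch assumes roughly 1.2 billion parameters for the 1B model and counts weight storage only. Quantization constants, layers that stay in higher precision (such as embeddings), activations, and optimizer state are all ignored, so real usage will be higher.

```python
def weight_gigabytes(num_params, bits_per_param):
    """Approximate weight storage in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

N_PARAMS = 1.2e9  # assumption: approximate parameter count of the 1B model

for label, bits in [("float16", 16), ("4-bit (nf4)", 4)]:
    print(f"{label:>12}: {weight_gigabytes(N_PARAMS, bits):.2f} GB")
```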


In [None]:
from transformers import BitsAndBytesConfig
import torch

print('='*70)
print('Loading Llama 3.2 1B Model')
print('='*70)

# Configure 4-bit quantization for memory efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

try:
    print(f'\nLoading model: meta-llama/Llama-3.2-1B')
    print('(This may take 1-2 minutes on first load...)\n')
    
    model = AutoModelForCausalLM.from_pretrained(
        'meta-llama/Llama-3.2-1B',
        quantization_config=bnb_config,
        device_map='auto',
        trust_remote_code=True,
        attn_implementation="sdpa",  # PyTorch scaled-dot-product attention; "flash_attention_2" needs the flash-attn package and an Ampere-or-newer GPU (not a T4)
    )
    
    print('✓ Model loaded successfully!')
    print(f'Model dtype: {model.dtype}')
    print(f'Model device: {next(model.parameters()).device}')
    print(f'\nModel is ready for training with LoRA adapters.')
    
except Exception as e:
    print(f'❌ Failed to load model: {e}')
    print()
    print('Troubleshooting:')
    print('1. Ensure you accepted the license at: https://huggingface.co/meta-llama/Llama-3.2-1B')
    print('2. Verify your HF token has read access: https://huggingface.co/settings/tokens')
    print('3. If you\'re not authenticated, run the login cell above')
    print()
    raise
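
The model-loading cell above stops before any adapters are attached. A minimal sketch of that step follows, assuming the `peft` package installed earlier. The rank, alpha, dropout, and target-module choices are illustrative assumptions rather than settings taken from this notebook, and the helper names are hypothetical. The second helper only counts how many trainable weights a configuration would add: a rank-r adapter on a linear layer with d_in inputs and d_out outputs adds r * (d_in + d_out) parameters.

```python
# Illustrative LoRA settings (assumptions, not values from this notebook).
LORA_SETTINGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": ["q_proj", "v_proj"],  # attention projection names in Llama-style models
}

def attach_lora(model, settings=LORA_SETTINGS):
    """Prepare a k-bit quantized model for training and wrap it with LoRA adapters."""
    # Imported lazily so this cell can be defined before peft is installed.
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model = prepare_model_for_kbit_training(model)
    peft_model = get_peft_model(model, LoraConfig(**settings))
    peft_model.print_trainable_parameters()
    return peft_model

def lora_param_count(r, layer_shapes, num_layers):
    """Trainable parameters added by rank-r adapters on the given (d_in, d_out) layers."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in layer_shapes)
    return per_layer * num_layers

# Hypothetical shapes for illustration: two adapted projections per block, 16 blocks.
print(lora_param_count(r=16, layer_shapes=[(2048, 2048), (2048, 512)], num_layers=16))  # 1703936
```

With a loaded model, `model = attach_lora(model)` would replace the base model before it is passed to `Trainer`.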


#### Training arguments, early stopping and checkpointing
We'll configure Trainer/TrainingArguments and add an EarlyStoppingCallback to stop when validation loss plateaus.

In [None]:
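# Example TrainingArguments and EarlyStoppingCallback usage (uncomment the Trainer block to run)
# Hyperparameters referenced below; the values are illustrative starting points, not tuned results.
output_dir = './outputs/llama32-1b-metamathqa-lora'
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
gradient_accumulation_steps = 4
num_train_epochs = 3
learning_rate = 2e-4
logging_steps = 20
evaluation_strategy = 'epoch'  # must match save_strategy when load_best_model_at_end=True
save_strategy = 'epoch'
fp16 = True  # matches the float16 compute dtype used when loading the model

# from transformers import EarlyStoppingCallback
# training_args = TrainingArguments(
#     output_dir=output_dir,
#     per_device_train_batch_size=per_device_train_batch_size,
#     per_device_eval_batch_size=per_device_eval_batch_size,
#     evaluation_strategy=evaluation_strategy,  # named eval_strategy in newer transformers releases
#     save_strategy=save_strategy,
#     num_train_epochs=num_train_epochs,
#     learning_rate=learning_rate,
#     logging_steps=logging_steps,
#     fp16=fp16,
#     gradient_accumulation_steps=gradient_accumulation_steps,
#     save_total_limit=3,
#     load_best_model_at_end=True,
#     metric_for_best_model='eval_loss',  # metric monitored by early stopping, made explicit
#     greater_is_better=False,
# )
#
# trainer = Trainer(
#     model=model,
#     args=training_args,
#     train_dataset=tokenized_train,
#     eval_dataset=tokenized_val,
#     data_collator=data_collator,
# )
#
# # Stop if eval loss fails to improve for 2 consecutive evaluations
# trainer.add_callback(EarlyStoppingCallback(early_stopping_patience=2))
# trainer.train()
print('Training arguments and trainer skeleton provided. Run when datasets and model are loaded.')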
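
When choosing `per_device_train_batch_size` and `gradient_accumulation_steps`, it helps to work out the effective batch size and the approximate number of optimizer steps per epoch. The numbers below are assumptions for illustration: 4750 training examples (the 5000-item cap minus a 5% validation split), one GPU, a per-device batch size of 4, and 4 accumulation steps.

```python
import math

def steps_per_epoch(num_examples, per_device_batch, grad_accum, num_devices=1):
    """Return (effective batch size, approximate optimizer steps per epoch)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return effective_batch, math.ceil(num_examples / effective_batch)

effective, steps = steps_per_epoch(num_examples=4750, per_device_batch=4, grad_accum=4)
print(effective, steps)  # 16 297
```

The exact step count reported by the trainer can differ slightly, depending on how the last partial batch is handled.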

## 6. Evaluation and Analysis
We implement: (a) a simple exact-match style metric (normalized whitespace and case-insensitive), (b) generation examples before/after fine-tuning, and (c) a short analysis template.
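
A minimal sketch of the exact-match metric described in (a), assuming predictions and references are plain strings. The function names are hypothetical, and normalization is limited to collapsing whitespace and lowercasing, as stated above.

```python
def normalize_answer(text):
    """Lowercase and collapse all runs of whitespace to single spaces."""
    return " ".join(text.lower().split())

def exact_match(predictions, references):
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        normalize_answer(p) == normalize_answer(r)
        for p, r in zip(predictions, references)
    )
    return hits / len(predictions)

# Invented strings for illustration only.
preds = ["The answer is:  42", "x = 3", "7"]
refs = ["the answer is: 42", "x = 4", "7"]
print(exact_match(preds, refs))  # 0.6666666666666666
```

For math problems, a stricter variant might compare only the final extracted number; that would be a separate design choice beyond what is described here.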

In [None]:
# Colab setup: mount Drive (optional) and install dependencies
# Run this cell in Google Colab (it will skip installs when not in Colab)
try:
    import google.colab
    IN_COLAB = True
except Exception:
    IN_COLAB = False

if IN_COLAB:
    from google.colab import drive
    print('Mounting Google Drive...')
    drive.mount('/content/drive')
    print('Upgrading pip and installing dependencies (this may take a few minutes)')
    # Core dependencies used by this notebook; adjust as needed
    !pip install -q --upgrade pip
    !pip install -q git+https://github.com/unslothai/unsloth.git
    !pip install -q transformers datasets accelerate peft bitsandbytes evaluate sentencepiece safetensors
    # Optional: install huggingface hub to access gated weights if needed
    !pip install -q huggingface_hub
    import torch
    print('Install finished. PyTorch:', torch.__version__, 'CUDA available:', torch.cuda.is_available())
else:
    print('Not running in Colab. To use GPU, open this notebook in Google Colab (Runtime -> Change runtime type -> GPU).')