# Fine-tuning Llama 3.2 1B Base Model to Instruction Model

This notebook demonstrates how to fine-tune the Llama 3.2 1B base model into an instruction-following model using the Alpaca dataset. We'll use:
- Hugging Face Transformers for the model
- PEFT (Parameter-Efficient Fine-Tuning) with LoRA
- TRL (Transformer Reinforcement Learning) for SFT (Supervised Fine-Tuning)
- Alpaca dataset for instruction examples

## Environment Setup

First, let's install the necessary packages:

In [None]:
!pip install transformers datasets peft trl accelerate bitsandbytes wandb sentencepiece

## Imports and Setup

In [None]:
import os
import torch
import pandas as pd
import numpy as np
from datasets import load_dataset, Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging
)
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer, SFTConfig

# Set seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check if GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Set logging verbosity
logging.set_verbosity_info()

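# Hugging Face access token for the gated Llama weights.
# Read it from the HF_TOKEN environment variable instead of hard-coding it in the notebook.
hf_token = os.environ.get("HF_TOKEN")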

## Load the Alpaca Dataset

We'll load the Alpaca dataset, which contains instruction-following examples with `instruction`, `input`, and `output` fields.

In [None]:
# Load the Alpaca dataset from Hugging Face
# Note: You can also use the JSON version at: https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json
alpaca_dataset = load_dataset("tatsu-lab/alpaca")
print(f"Dataset loaded with {len(alpaca_dataset['train'])} examples")
alpaca_dataset['train'][0]  # Display first example

## Prompt Template

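Define a template for formatting our instruction inputs. A consistent format is crucial because the model learns to associate it with answering instructions. Note that the `<|system|>`, `<|user|>`, and `<|assistant|>` markers are plain-text delimiters chosen for this notebook, not special tokens in the Llama tokenizer.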

In [None]:
def format_prompt(example):
    """Format the instruction and input into a prompt."""
    instruction = example["instruction"]
    input_text = example["input"]
    output_text = example["output"]
    
    # Construct the prompt based on whether input is empty or not
    if input_text:
        prompt = f"""<|system|>
You are a helpful assistant.
<|user|>
{instruction}

{input_text}
<|assistant|>
"""
    else:
        prompt = f"""<|system|>
You are a helpful assistant.
<|user|>
{instruction}
<|assistant|>
"""
    
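    # For training, we need both the prompt and the expected output
    example["prompt"] = prompt
    example["completion"] = output_text
    # Full training text: the model must see the prompt followed by the response
    example["text"] = prompt + output_text

    return example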

In [None]:
# Apply the prompt formatting
formatted_dataset = alpaca_dataset["train"].map(format_prompt)

# Display an example of formatted input
print("Formatted Prompt Example:")
print(formatted_dataset[0]["prompt"])
print("\nExpected Response:")
print(formatted_dataset[0]["completion"])
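
To make the template concrete, here is a minimal standalone sketch of how one record turns into a prompt and into a full training string. The helper names `build_prompt` and `build_training_text` and the sample record are hypothetical, introduced only for illustration; the tags mirror the template above.

```python
# Hypothetical helpers that mirror the notebook's template; not part of any library.
SYSTEM_LINE = "You are a helpful assistant."

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Render the chat-style prompt, adding the optional input after a blank line."""
    user_block = f"{instruction}\n\n{input_text}" if input_text else instruction
    return f"<|system|>\n{SYSTEM_LINE}\n<|user|>\n{user_block}\n<|assistant|>\n"

def build_training_text(record: dict) -> str:
    """Concatenate the prompt and the expected output into one training string."""
    return build_prompt(record["instruction"], record.get("input", "")) + record["output"]

# Made-up sample record in the Alpaca field layout
sample = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}
print(build_training_text(sample))
```

Keeping the prompt and the response in one string matters for supervised fine-tuning: the model can only learn to answer if the answer is part of the text it is trained on.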

## Load the Llama 3.2 1B Base Model

We'll use 4-bit quantization to reduce memory requirements.
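
As a rough back-of-the-envelope check on why 4-bit loading helps, the sketch below estimates weight storage at different precisions. The parameter count of about 1.24 billion is an assumption for illustration, and the estimate ignores quantization constants, activations, optimizer state, and any layers kept in higher precision.

```python
def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 1_240_000_000  # assumed approximate size of the 1B model

for label, bits in [("fp32", 32), ("fp16", 16), ("nf4", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.2f} GB")
```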

In [None]:
# Configuration for 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

In [None]:
# Load the Llama 3.2 1B base model
model_id = "meta-llama/Llama-3.2-1B"

# Note: If using Meta's model, you need to have accepted their license and have an access token
# Alternatively, you can use models from other providers that offer Llama 3.2 weights

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    token=hf_token,  # Access token for the gated repository
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    padding_side="right",
    token=hf_token,
)

# Make sure the tokenizer has pad_token set properly
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

## Configure LoRA for Parameter-Efficient Fine-Tuning

LoRA (Low-Rank Adaptation) freezes the base weights and trains small low-rank update matrices instead, so far fewer parameters need gradients and optimizer state.
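
To see where the savings come from: for a linear layer of shape d_out x d_in, LoRA adds two matrices, A (r x d_in) and B (d_out x r), which gives r * (d_in + d_out) trainable parameters instead of d_in * d_out. The sketch below counts adapter parameters for the seven projection types targeted in this notebook. The layer dimensions (hidden size 2048, MLP size 8192, 512-wide key/value projections, 16 layers) are assumptions about the 1B architecture; check `model.config` for the real values.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

# Assumed dimensions, for illustration only
HIDDEN, MLP, KV, LAYERS, RANK = 2048, 8192, 512, 16, 8

# (d_in, d_out) for each targeted projection in one transformer block
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

per_block = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in shapes.values())
total = per_block * LAYERS
full = sum(d_in * d_out for d_in, d_out in shapes.values()) * LAYERS
print(f"LoRA parameters: {total:,} vs. {full:,} in the adapted layers ({total / full:.2%})")
```

Raising `r` grows the adapter linearly, so doubling the rank doubles the trainable parameter count under these assumptions.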

In [None]:
# Define LoRA configuration
peft_config = LoraConfig(
    r=8,  # Rank of the update matrices
    lora_alpha=16,  # Parameter for scaling
    lora_dropout=0.05,  # Dropout probability for LoRA layers
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"
                   ]  # Modules to apply LoRA to
)

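# Prepare the quantized model for training, then attach the LoRA adapters
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # Report how many parameters will be updated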

## Set Up Training Arguments

In [None]:
# Define training arguments
output_dir = "./llama-3.2-1b-alpaca-lora"
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
optim = "paged_adamw_32bit"
learning_rate = 2e-4
lr_scheduler_type = "cosine"
max_steps = 100
warmup_ratio = 0.03
max_grad_norm = 0.3
group_by_length = True

# Set up the training arguments
training_args = SFTConfig(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    learning_rate=learning_rate,
    lr_scheduler_type=lr_scheduler_type,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    max_grad_norm=max_grad_norm,
    fp16=True,
    logging_steps=10,
    save_strategy="steps",
    save_steps=50,  # Must be below max_steps (100) for intermediate checkpoints to be written
    group_by_length=group_by_length,
    report_to="none",  # Set to "wandb" to log to Weights & Biases
    dataset_text_field="text",  # Column holding the full prompt + response text, not the prompt alone
    max_seq_length=2048,
    packing=False,  # Set to True for more efficient training if data format allows
)
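
It can help to translate these settings into what the run actually does. The following sketch computes the effective batch size and the number of examples seen, and evaluates a linear-warmup-plus-cosine-decay curve. The function `lr_at` is a hypothetical re-implementation for illustration, not the scheduler object the trainer builds; it assumes 3 warmup steps (3% of 100) and a single GPU.

```python
import math

PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
MAX_STEPS = 100
PEAK_LR = 2e-4
WARMUP_STEPS = 3  # assumed: warmup_ratio 0.03 of 100 steps

# One optimizer step consumes batch_size * accumulation examples (single GPU assumed)
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
examples_seen = effective_batch * MAX_STEPS

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(f"Effective batch size: {effective_batch}")
print(f"Examples seen in {MAX_STEPS} steps: {examples_seen}")
for step in (0, 1, 3, 50, 100):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

With these values the run touches 1,600 examples, a small fraction of the roughly 52,000 in Alpaca, so treat it as a smoke test and raise `max_steps` for a real fine-tune.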

## Create SFT Trainer

TRL's SFTTrainer makes it easy to fine-tune using instruction datasets.

In [None]:
trainer = SFTTrainer(
    model=model,  # Already wrapped by get_peft_model earlier, so peft_config is not passed again
    args=training_args,
    train_dataset=formatted_dataset,
    processing_class=tokenizer,
)

## Start Training

In [None]:
# Start the training process
trainer.train()

## Save the Fine-tuned Model

In [None]:
# Save the trained model
trainer.model.save_pretrained(f"{output_dir}/final_model")
tokenizer.save_pretrained(f"{output_dir}/final_model")

## Test the Fine-tuned Model

In [None]:
# Load the fine-tuned model
# For inference, we load the base model and then apply the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    token=hf_token
)

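tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    token=hf_token  # Pass only `token`; the deprecated use_auth_token must not be combined with it
)

# The reloaded tokenizer has no pad token again, so reuse EOS as during training
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token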

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, f"{output_dir}/final_model")

In [None]:
# Test the model with a few examples
test_examples = [
    "Explain the concept of machine learning to a 10-year-old.",
    "What are three ways to improve your productivity when working from home?",
    "Write a short poem about autumn leaves."
]

# Create a text generation pipeline
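generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    do_sample=True,  # Required for temperature and top_p to take effect
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # Llama defines no pad token, so pad with EOS
    eos_token_id=tokenizer.eos_token_id
)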

for example in test_examples:
    # Format the prompt
    prompt = f"""<|system|>
You are a helpful assistant.
<|user|>
{example}
<|assistant|>
"""
    
    print(f"\n\nPrompt: {example}")
    print("\nResponse:")
    response = generator(prompt)[0]["generated_text"]
    # Extract only the assistant's response
    assistant_response = response.split("<|assistant|>")[1].strip()
    print(assistant_response)
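
Because the role tags here are ordinary text, a fine-tuned model may keep generating and emit another `<|user|>` or `<|system|>` turn after its answer. One possible way to post-process the output is sketched below; `extract_response` is a hypothetical helper, and the sample string is made up to show the behavior.

```python
ROLE_TAGS = ("<|system|>", "<|user|>", "<|assistant|>")

def extract_response(generated_text: str) -> str:
    """Return the text after the first <|assistant|> tag, cut at the next role tag."""
    _, found, tail = generated_text.partition("<|assistant|>")
    if not found:
        # No assistant tag at all: fall back to the whole string
        return generated_text.strip()
    # Stop at the earliest role tag the model may have generated after its answer
    positions = (tail.find(tag) for tag in ROLE_TAGS)
    cut = min((i for i in positions if i != -1), default=len(tail))
    return tail[:cut].strip()

# Made-up generation that runs on into a second turn
sample = "<|user|>\nSay hi\n<|assistant|>\nHello there!\n<|user|>\nAnd again?"
print(extract_response(sample))
```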

## (Optional) Merge LoRA Weights with Base Model for Easier Deployment

In [None]:
# Merge the LoRA weights with the base model
merged_model = model.merge_and_unload()

# Save the merged model
merged_model.save_pretrained(f"{output_dir}/merged_model")
tokenizer.save_pretrained(f"{output_dir}/merged_model")
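
Conceptually, merging folds the low-rank update into the frozen weight: W' = W + (alpha / r) * B * A, after which the adapter is no longer needed at inference time. The toy example below checks this with tiny hand-written matrices in plain Python. It is a sketch of the arithmetic only, not how PEFT implements `merge_and_unload`, and all the numbers are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_scaled(a, b, scale):
    """Return a + scale * b element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

r, alpha = 1, 2          # toy rank and scaling parameter
W = [[1.0, 0.0],         # frozen base weight, 2 x 2
     [0.0, 1.0]]
B = [[0.5], [-0.5]]      # d_out x r
A = [[1.0, 2.0]]         # r x d_in
x = [[3.0], [4.0]]       # input as a column vector

# Adapter path: base output plus the scaled low-rank output
base_out = matmul(W, x)
lora_out = matmul(B, matmul(A, x))
adapter_result = add_scaled(base_out, lora_out, alpha / r)

# Merged path: fold the update into the weight once, then do a single matmul
W_merged = add_scaled(W, matmul(B, A), alpha / r)
merged_result = matmul(W_merged, x)

print(adapter_result, merged_result)
```

Both paths give the same output, which is why a merged model can be deployed without the PEFT library.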

## (Optional) Upload to Hugging Face Hub

If you want to share your model with the community:

In [None]:
from huggingface_hub import HfApi

# Set your Hugging Face credentials
hf_token = "your_huggingface_token"  # Replace with your token
api = HfApi(token=hf_token)

# Set your model repository name
repo_name = "your-username/llama-3.2-1b-alpaca-instruct"  # Replace with your desired repo name

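# Push to hub (`model` is the LoRA-wrapped model; if you ran the optional merge step,
# push `merged_model` instead to publish the full merged weights)
model.push_to_hub(repo_name, token=hf_token)
tokenizer.push_to_hub(repo_name, token=hf_token)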