# Fine-tuning Llama 3.2 1B Base Model to Instruction Model

This notebook demonstrates how to fine-tune the Llama 3.2 1B base model into an instruction-following model using an Alpaca-format instruction dataset. We'll use:
- Hugging Face Transformers for the model
- PEFT (Parameter-Efficient Fine-Tuning) with LoRA
- TRL (Transformer Reinforcement Learning) for SFT (Supervised Fine-Tuning)
- An Alpaca-format dataset (`instruction`, `input`, `output` fields) for instruction examples

## Environment Setup

First, let's install the necessary packages:

In [None]:
!pip install transformers datasets peft trl accelerate bitsandbytes wandb sentencepiece

## Imports and Setup

In [19]:
import os
import torch
import pandas as pd
import numpy as np
import time
from datasets import load_dataset, Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging
)
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer, SFTConfig
from IPython.display import display, HTML

# Set seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check if GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Set logging verbosity
logging.set_verbosity_info()

# Base model ID on the Hugging Face Hub and the directory for checkpoints and adapters
model_id = "meta-llama/Llama-3.2-1B"
output_dir = "./llama-3.2-1b-alpaca-lora"

# Read the Hugging Face access token from a local file (keep it out of version control)
hf_token_file = 'hf_token.txt'
with open(hf_token_file, 'r') as file:
    hf_token = file.read().strip()

Using device: cuda


## Load the Alpaca Dataset

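We'll load an instruction dataset in Alpaca format (`instruction`, `input`, `output`). The original Alpaca dataset is left commented out below; this run uses a local 100-example file in the same format.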

In [34]:
# Load the Alpaca dataset from Hugging Face
# Note: You can also use the JSON version at: https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json
# dataset = load_dataset("tatsu-lab/alpaca", split="train")

# Load our own dataset
dataset = load_dataset("json", data_files={"train": "./dataset_self_instruction.json"}, split="train")

print(f"Dataset loaded with {len(dataset)} examples")
dataset[0]  # Display first example

Dataset loaded with 100 examples


{'instruction': 'Based on the information given, provide a comprehensive answer to the question.',
 'input': "Here's an example input following your specifications:\n\n[The Amazon rainforest, spanning across nine countries in South America, is the world’s largest tropical rainforest. It’s renowned for its incredible biodiversity, housing an estimated 10% of the world’s known species. Deforestation, primarily driven by cattle ranching and logging, poses a significant threat to its delicate ecosystem and contributes to global climate change.]\n\nQuestion: What is a major threat to the Amazon rainforest?",
 'output': 'Deforestation, primarily driven by cattle ranching and logging, poses a significant threat to the Amazon rainforest.',
 'task_type': 'qa'}
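
Before formatting, it can help to confirm that every record carries the fields the prompt template reads. The following is a minimal sketch, not part of the original notebook: `validate_record` and `REQUIRED_KEYS` are hypothetical names, and the check assumes each field should be a string, as in the example above.

```python
# Hypothetical helper: checks one Alpaca-style record for the fields
# that format_prompt reads ("instruction", "input", "output").
REQUIRED_KEYS = ("instruction", "input", "output")

def validate_record(record):
    """Return a list of problems found in one record (empty if it looks fine)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in record:
            problems.append(f"missing key: {key}")
        elif not isinstance(record[key], str):
            problems.append(f"{key} is not a string")
    return problems

good = {
    "instruction": "Answer the question.",
    "input": "Question: What is 2 + 2?",
    "output": "4",
    "task_type": "qa",  # extra keys are allowed
}
bad = {"instruction": "Answer the question."}

print(validate_record(good))
print(validate_record(bad))
```

With a loaded dataset, the same function could be applied to each row, for example `[validate_record(r) for r in dataset]`, to find malformed entries before training.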

## Prompt Template

Define a template for formatting our instruction inputs. This is crucial for teaching the model to respond to instructions in a consistent format.

In [35]:
def format_prompt(example):
    """Format the instruction and input into a prompt."""
    instruction = example["instruction"]
    input_text = example["input"]
    output_text = example["output"]
    if input_text:
        prompt = f"""
        <|begin_of_text|>
            <|start_header_id|>system<|end_header_id|>
                {instruction}<|eot_id|>
    
            <|start_header_id|>user<|end_header_id|>
                {input_text}<|eot_id|>
    
            <|start_header_id|>assistant<|end_header_id|>
                {output_text}<|eot_id|>
        <|end_of_text|>"""
    else:
        prompt = f"""
        <|begin_of_text|>
            <|start_header_id|>system<|end_header_id|>
                You are a helpful assistant.<|eot_id|>
    
            <|start_header_id|>user<|end_header_id|>
                {instruction}<|eot_id|>
    
            <|start_header_id|>assistant<|end_header_id|>
                {output_text}<|eot_id|>
        <|end_of_text|>"""       
    # Store the full training text (prompt plus expected output) in one field;
    # SFTConfig's dataset_text_field points at this column
    example["prompt"] = prompt
    
    return example
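
Because the template above lives inside an indented triple-quoted f-string, the leading spaces and newlines become part of the training text. The sketch below shows one way to build the same Llama 3 style layout without that extra whitespace, so that training and inference prompts share an identical prefix. `build_llama3_prompt` is a hypothetical helper, not something the notebook defines, and it assumes the header is followed by a blank line as in Meta's published Llama 3 chat format. Note also that the tokenizer may add its own begin-of-text token, in which case the literal one here would be redundant.

```python
def build_llama3_prompt(system, user, assistant=None):
    """Build a Llama 3 style prompt string with no incidental indentation.

    If `assistant` is given, the result is a full training example;
    otherwise it ends right after the assistant header, ready for generation.
    """
    parts = [
        "<|begin_of_text|>",
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>",
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>",
        "<|start_header_id|>assistant<|end_header_id|>\n\n",
    ]
    if assistant is not None:
        parts.append(f"{assistant}<|eot_id|>")
    return "".join(parts)

train_text = build_llama3_prompt("Answer briefly.", "What is 2 + 2?", "4")
infer_text = build_llama3_prompt("Answer briefly.", "What is 2 + 2?")

# The inference prompt is an exact prefix of the training text
print(train_text.startswith(infer_text))
print(train_text)
```

Using one function for both cases avoids a mismatch between the whitespace the model saw during training and the whitespace it sees at inference time.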

In [36]:
# Apply the prompt formatting
formatted_dataset = dataset.map(format_prompt)
# formatted_dataset = dataset.map(lambda x: format_prompt(x))

# Pick a random example to display; a separate generator leaves the global seed set earlier intact
rng = np.random.default_rng()
testitem = int(rng.integers(0, len(formatted_dataset)))

# Display an example of formatted input
print("Formatted Prompt Example:")
print(formatted_dataset[testitem]["prompt"])
# print(formatted_dataset)

Map: 100%|██████████| 100/100 [00:00<00:00, 6250.36 examples/s]

Formatted Prompt Example:

        <|begin_of_text|>
            <|start_header_id|>system<|end_header_id|>
                Assess the sentiment of the following piece of text. Select one of the three sentiment categories (positive, negative, neutral) and provide a short justification for your selection.<|eot_id|>
    
            <|start_header_id|>user<|end_header_id|>
                I'm absolutely thrilled with my recent investment – the returns have exceeded all expectations! A truly fantastic outcome.<|eot_id|>
    
            <|start_header_id|>assistant<|end_header_id|>
                {"Sentiment": "positive", "Reason": "The text expresses strong positive emotions using words like 'thrilled,' 'fantastic,' and describes returns as 'exceeding all expectations,' indicating a highly favorable experience."}<|eot_id|>
        <|end_of_text|>





## Load the Llama 3.2 1B Base Model

We'll use 4-bit quantization to reduce memory requirements.

In [37]:
# Configuration for 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)
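
To get a feel for what 4-bit quantization saves, the parameter count can be worked out from the architecture values that the model config reports when loading (`hidden_size`, `intermediate_size`, `num_hidden_layers`, head counts). This is a rough sketch under stated assumptions: `vocab_size = 128256` is assumed (it is not shown in the config output), linear layers are taken to cost about 4 bits per weight, everything else 16 bits, and quantization constants and activations are ignored.

```python
# Architecture values as printed in the model config
hidden_size = 2048
intermediate_size = 8192
num_layers = 16
num_heads = 32
num_kv_heads = 8
head_dim = 64
vocab_size = 128256  # assumption: not shown in the printed config

kv_dim = num_kv_heads * head_dim  # 512 (grouped-query attention)

attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim  # q, o, k, v
mlp = 3 * hidden_size * intermediate_size                        # gate, up, down
norms = 2 * hidden_size                                          # two RMSNorms per layer

linear_params = num_layers * (attn + mlp)
other_params = num_layers * norms + vocab_size * hidden_size + hidden_size  # tied embeddings + final norm
total_params = linear_params + other_params

fp16_gb = total_params * 2 / 1e9
approx_4bit_gb = (linear_params * 0.5 + other_params * 2) / 1e9

print(f"total parameters: {total_params:,}")
print(f"fp16 weights: ~{fp16_gb:.2f} GB, 4-bit linear layers: ~{approx_4bit_gb:.2f} GB")
```

The estimate covers weights only; actual GPU usage during training also includes activations, LoRA parameters, and optimizer state.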

In [86]:
# Note: If using Meta's model, you need to have accepted their license and have an access token
# Alternatively, you can use models from other providers that offer Llama 3.2 weights

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    token=hf_token
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    padding_side="right",
    token=hf_token
)

# Make sure the tokenizer has pad_token set properly
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

loading configuration file config.json from cache at C:\Users\mrfu\.cache\huggingface\hub\models--meta-llama--Llama-3.2-1B\snapshots\4e20de362430cd3b72f300e6b0f18e50e7166e08\config.json
Model config LlamaConfig {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 16,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.3

## Configure LoRA for Parameter-Efficient Fine-Tuning

LoRA (Low-Rank Adaptation) freezes the base weights and trains small low-rank update matrices instead, so far fewer parameters need gradients and optimizer state.

In [87]:
# Define LoRA configuration
peft_config = LoraConfig(
    r=32,  # Rank of the low-rank update matrices
    lora_alpha=32,  # Scaling numerator; the update is scaled by lora_alpha / r (here 1.0)
    lora_dropout=0.05,  # Dropout probability for LoRA layers
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"
                   ]  # Modules to apply LoRA to
)

# Prepare the model with LoRA
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
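
The number of trainable parameters this configuration adds can be checked by hand. For a linear layer with `d_in` inputs and `d_out` outputs, LoRA adds two matrices of shapes `r x d_in` and `d_out x r`, so `r * (d_in + d_out)` parameters. The sketch below applies that to the seven target modules using the layer sizes from the model config; the variable names are illustrative only.

```python
r = 32
hidden = 2048
kv_dim = 8 * 64          # num_key_value_heads * head_dim
intermediate = 8192
num_layers = 16

# (d_in, d_out) for each targeted projection in one decoder layer
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
trainable = per_layer * num_layers

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"Total trainable parameters: {trainable:,}")
```

The total agrees with the "Number of trainable parameters" line that the trainer logs when training starts.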

## Set Up Training Arguments

In [88]:
# Define training arguments
per_device_train_batch_size = 16
gradient_accumulation_steps = 1
optim = "paged_adamw_32bit"
learning_rate = 2e-4
lr_scheduler_type = "cosine"
max_steps = 1000
warmup_ratio = 0.03
max_grad_norm = 0.3
group_by_length = True

# Set up the training arguments
training_args = SFTConfig(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    learning_rate=learning_rate,
    lr_scheduler_type=lr_scheduler_type,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    max_grad_norm=max_grad_norm,
    fp16=True,
    logging_steps=10,
    save_strategy="steps",
    save_steps=250,
    group_by_length=group_by_length,
    report_to="none",  # Set to "wandb" to log to Weights & Biases
    dataset_text_field="prompt",
    max_seq_length=1024,
    packing=False,  # Set to True for more efficient training if data format allows
)

PyTorch: setting up devices
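
With only 100 examples, `max_steps = 1000` implies many passes over the data. The sketch below works through that arithmetic and the shape of the learning-rate schedule. It assumes the last partial batch of each epoch is kept and that the cosine schedule uses a linear warmup followed by a half-cosine decay to zero; `lr_at` is a hypothetical helper written for illustration, not a library function.

```python
import math

num_examples = 100
batch_size = 16
grad_accum = 1
max_steps = 1000
warmup_ratio = 0.03
learning_rate = 2e-4

# Optimizer steps per pass over the data (last partial batch kept)
steps_per_epoch = math.ceil(num_examples / batch_size) // grad_accum
epochs = math.ceil(max_steps / steps_per_epoch)
warmup_steps = math.ceil(max_steps * warmup_ratio)

def lr_at(step):
    """Linear warmup, then half-cosine decay to zero (illustrative)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return learning_rate * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

print(f"steps per epoch: {steps_per_epoch}, epochs: {epochs}, warmup steps: {warmup_steps}")
for step in (0, 15, 30, 500, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

Each example is therefore seen well over a hundred times, which is worth keeping in mind when reading the training loss: a low value here may reflect memorization of the 100 examples rather than generalization.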


## Create SFT Trainer

TRL's SFTTrainer makes it easy to fine-tune using instruction datasets.

In [89]:
# The model was already wrapped with get_peft_model above,
# so the LoRA config is not passed to the trainer a second time
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=formatted_dataset,
    processing_class=tokenizer
)

max_steps is given, it will override any value given in num_train_epochs
Using auto half precision backend
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


## Start Training

In [90]:
# Start the training process
trainer.train()

The following columns in the training set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: prompt, input, task_type, output, instruction. If prompt, input, task_type, output, instruction are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 100
  Num Epochs = 143
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 1,000
  Number of trainable parameters = 22,544,384
  return fn(*args, **kwargs)


| Step | Training Loss |
|------|---------------|
| 10   | 3.5692 |
| 20   | 2.5047 |
| 30   | 1.4298 |
| 40   | 1.0214 |
| 50   | 0.8165 |
| 60   | 0.6698 |
| 70   | 0.5855 |
| 80   | 0.534  |
| 90   | 0.5125 |
| 100  | 0.4957 |


Saving model checkpoint to ./llama-3.2-1b-alpaca-lora\checkpoint-250

Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-3.2-1B/resolve/main/config.json.
Access to model meta-llama/Llama-3.2-1B is restricted. You must have access to it and be authenticated to access it. Please log in. - silently ignoring the lookup for the file config.json in meta-llama/Llama-3.2-1B.
tokenizer config file saved in ./llama-3.2-1b-alpaca-lora\checkpoint-250\tokenizer_config.json
Special tokens file saved in ./llama-3.2-1b-alpaca-lora\checkpoint-250\special_tokens_map.json
  return fn(*args, **kwargs)
Saving model checkpoint to ./llama-3.2-1b-alpaca-lora\checkpoint-500


TrainOutput(global_step=1000, training_loss=0.495949960231781, metrics={'train_runtime': 393.971, 'train_samples_per_second': 40.612, 'train_steps_per_second': 2.538, 'total_flos': 1.5085165049266176e+16, 'train_loss': 0.495949960231781})
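
The throughput figures in `TrainOutput` follow directly from the step count, batch size, and runtime. A quick check, using the values reported above and assuming samples are counted as steps times batch size:

```python
global_step = 1000
batch_size = 16          # per_device_train_batch_size on a single device
train_runtime = 393.971  # seconds, from TrainOutput

samples_per_second = global_step * batch_size / train_runtime
steps_per_second = global_step / train_runtime

print(f"{samples_per_second:.3f} samples/s, {steps_per_second:.3f} steps/s")
```

The same arithmetic gives a quick way to estimate how long a different `max_steps` or batch size would take on the same hardware.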

## Save the Fine-tuned Model

In [91]:
# Save the trained model
trainer.model.save_pretrained(f"{output_dir}/final_model")
tokenizer.save_pretrained(f"{output_dir}/final_model")


Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-3.2-1B/resolve/main/config.json.
Access to model meta-llama/Llama-3.2-1B is restricted. You must have access to it and be authenticated to access it. Please log in. - silently ignoring the lookup for the file config.json in meta-llama/Llama-3.2-1B.
tokenizer config file saved in ./llama-3.2-1b-alpaca-lora/final_model\tokenizer_config.json
Special tokens file saved in ./llama-3.2-1b-alpaca-lora/final_model\special_tokens_map.json


('./llama-3.2-1b-alpaca-lora/final_model\\tokenizer_config.json',
 './llama-3.2-1b-alpaca-lora/final_model\\special_tokens_map.json',
 './llama-3.2-1b-alpaca-lora/final_model\\tokenizer.json')

## Test the Fine-tuned Model

In [92]:
# Load the fine-tuned model
# For inference, we load the base model and then apply the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    token=hf_token
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    padding_side="right",
    token=hf_token
)

# Make sure the tokenizer has pad_token set properly
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, f"{output_dir}/final_model")
# model = model.merge_and_unload()  # Converts it to a standard model

loading configuration file config.json from cache at C:\Users\mrfu\.cache\huggingface\hub\models--meta-llama--Llama-3.2-1B\snapshots\4e20de362430cd3b72f300e6b0f18e50e7166e08\config.json
Model config LlamaConfig {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 16,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "float16",
  "transformers_version": "4.51.3"

In [93]:
# Test the model with a few examples
test_examples = [
    {
        "instruction": "Analyze the sentiment of the following review and determine whether it is positive, negative, or neutral. Provide your reasoning.",
        "input": "The food at this restaurant was incredibly delicious, but the service was terrible—we waited an hour to be served.",
    },
    {
        "instruction": "Determine the sentiment tendency (positive, negative, or neutral) of the following text.",
        "input": "The visual effects of the movie were stunning, but the plot was slow and lacked creativity.",
    },
    {
        "instruction": "Answer the question based on the following paragraph.",
        "input": "The solar system consists of the Sun and the celestial bodies that orbit it, including planets, moons, asteroids, and comets. There are eight major planets in order from the Sun: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Among them, Jupiter is the largest, and Earth is the only one known to support life.\n\nQuestion: Which planet is the largest in the solar system?",
    },
    {
        "instruction": "Read the following and answer the question.",
        "input": "Coffee is a beverage made from coffee beans and is known for its stimulating effect, mainly due to its caffeine content. Originating from the Ethiopian highlands, coffee later spread around the world. Today, the leading coffee-producing countries include Brazil, Vietnam, and Colombia. Common brewing methods include drip, espresso, French press, and cold brew.\n\nQuestion: Why does coffee have a stimulating effect?",
    },
    {
        "instruction": "Based on the text provided, categorize the sentiment as positive, negative, or neutral. Consider the overall impression and any explicit or implicit emotional cues.",
        "input": "Just had the best coffee date with an old friend! It was so wonderful to catch up and laugh together. Feeling so grateful for these connections.",
    },
    {
        "instruction": "Assess the overall sentiment of the following text excerpt. Classify it as either positive, negative, or neutral, and briefly explain your classification.",
        "input": "Ugh, another pop quiz? Seriously? I'm so overwhelmed with homework and barely sleeping. This school is killing me!",
    },
]


for example in test_examples:
    # Format the prompt
    prompt = f"""<|begin_of_text|>
    <|start_header_id|>system<|end_header_id|>
    {example["instruction"]}<|eot_id|>
    
    <|start_header_id|>user<|end_header_id|>
    {example["input"]}<|eot_id|>
    
    <|start_header_id|>assistant<|end_header_id|>
    """
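    # Encode the prompt
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Get the length of the input to exclude it from the output
    input_length = inputs.input_ids.shape[1]

    # Generate with the LoRA-adapted model; **inputs passes the attention mask too
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,  # Adjust as needed
        do_sample=True,  # temperature and top_p only apply when sampling
        temperature=0.1,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
        # Also stop at the end-of-turn token used in the training template
        eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")],
    )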
        
    # Decode only the newly generated tokens (exclude the input)
    response = tokenizer.decode(outputs[0][input_length:], skip_special_tokens=True)

    output = f"""
    <span style='color: white; background-color: purple'>&nbsp;Instruction&nbsp;</span><br>
    <pre>{example["instruction"]}</pre>
    <span style='color: black; background-color: yellow'>&nbsp;Input&nbsp;</span><br>
    <pre>{example["input"]}</pre>
    <span style='color: black; background-color: cyan'>&nbsp;Output&nbsp;</span><br>
    <pre>{response}</pre>
    """
    display(HTML(output))


## (Optional) Merge LoRA Weights with Base Model for Easier Deployment

In [None]:
# Merge the LoRA weights with the base model
merged_model = model.merge_and_unload()

# Save the merged model
merged_model.save_pretrained(f"{output_dir}/merged_model")
tokenizer.save_pretrained(f"{output_dir}/merged_model")
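
Conceptually, merging folds the low-rank update into the frozen weight, `W_merged = W + (lora_alpha / r) * B @ A`, so the adapter branch is no longer needed at inference time. The toy example below illustrates that identity with tiny hand-written matrices in plain Python. It is a sketch of the idea only, not PEFT's implementation, and all names and values are made up for the demonstration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    return [[s * x for x in row] for row in a]

# Toy sizes: d_out = 2, d_in = 3, rank r = 1
W = [[1, 0, 2], [0, 1, -1]]   # frozen base weight (d_out x d_in)
A = [[1, 2, 3]]               # LoRA A (r x d_in)
B = [[1], [-1]]               # LoRA B (d_out x r)
r, lora_alpha = 1, 2
scaling = lora_alpha / r

x = [[1], [1], [1]]           # one input column vector

# Adapter form: base output plus the scaled low-rank branch
y_adapter = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

# Merged form: fold the update into the weight once, then a single matmul
W_merged = add(W, scale(matmul(B, A), scaling))
y_merged = matmul(W_merged, x)

print(y_adapter, y_merged)
```

Both forms give the same output, which is why a merged model can be deployed without the PEFT library.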

## (Optional) Upload to Hugging Face Hub

If you want to share your model with the community:

In [None]:
from huggingface_hub import HfApi

# Set your Hugging Face credentials
hf_token = "your_huggingface_token"  # Replace with your token
api = HfApi(token=hf_token)

# Set your model repository name
repo_name = "your-username/llama-3.2-1b-alpaca-instruct"  # Replace with your desired repo name

# Push to hub
model.push_to_hub(repo_name, token=hf_token)  # `use_auth_token` is deprecated in favor of `token`
tokenizer.push_to_hub(repo_name, token=hf_token)