
# Chapter 7: LLM Fine-Tuning and Customization

This notebook covers:
- Strategies for fine-tuning language models
- PEFT methods: LoRA and QLoRA
- Dataset formatting for supervised fine-tuning (SFT)
- Using Hugging Face's Trainer for fine-tuning
- Evaluating fine-tuned models

## Learning Objectives

- Understand the differences between full fine-tuning and PEFT
- Prepare a prompt-response dataset for instruction tuning
- Fine-tune a model using LoRA with Hugging Face
- Evaluate generation quality and loss metrics



## Dataset Preparation for Supervised Fine-Tuning (SFT)

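Supervised fine-tuning expects paired examples: each record holds a prompt and the response the model should learn to produce. Such datasets are typically stored as JSON or CSV.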


In [None]:

sample_data = [
    {"prompt": "What is generative AI?", "response": "Generative AI refers to models that create new content such as text, images, or code."},
    {"prompt": "Explain LoRA.", "response": "LoRA is a parameter-efficient fine-tuning method that adds trainable low-rank matrices to attention layers."}
]

import json
with open("sft_data.json", "w") as f:
    json.dump(sample_data, f, indent=2)

print("Sample dataset saved as sft_data.json")
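Before tokenization, each record has to be flattened into a single training string. The following is a minimal sketch, assuming the `### Prompt:` / `### Response:` template used later in this notebook. `format_example` is an illustrative helper, not part of any library, and the JSON round-trip only mimics writing and reading the dataset file.

```python
import json

# Hypothetical helper: flattens one record into the template used in this notebook
def format_example(record):
    missing = {"prompt", "response"} - record.keys()
    if missing:
        raise ValueError(f"record is missing fields: {sorted(missing)}")
    return f"### Prompt: {record['prompt']}\n### Response: {record['response']}"

records = [
    {"prompt": "What is generative AI?", "response": "Models that create new content."},
]

# Round-trip through JSON to mimic saving and reloading the dataset
loaded = json.loads(json.dumps(records))
texts = [format_example(r) for r in loaded]
print(texts[0])
```

Validating the fields up front surfaces malformed records before they reach the tokenizer.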



## Fine-Tuning using Hugging Face Trainer

We load the model and tokenizer, tokenize the dataset, define the training arguments, and launch training. As written, this cell updates every model weight (full fine-tuning).


In [None]:

from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer, DataCollatorForLanguageModeling

# Load tokenizer and model
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# GPT-2 has no padding token; reuse EOS so the collator can pad uneven batches
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Build an in-memory dataset from the sample records (simplified)
data = Dataset.from_list(sample_data)

# Tokenize each record using the prompt/response template
def tokenize(example):
    return tokenizer(f"### Prompt: {example['prompt']}\n### Response: {example['response']}", truncation=True)

tokenized_data = data.map(tokenize)

# Setup training
args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    logging_steps=1,
    save_strategy="no"
)

# mlm=False selects causal language modeling: labels are a copy of input_ids
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
trainer = Trainer(model=model, args=args, train_dataset=tokenized_data, data_collator=data_collator)

trainer.train()



## LoRA (Low-Rank Adaptation)

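LoRA freezes the pretrained weights and trains only small low-rank matrices added to selected layers, here GPT-2's fused attention projection `c_attn`. Because only these adapters receive gradients, large models can be tuned on modest hardware.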
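The update rule can be written compactly. The notation below follows the LoRA paper rather than anything defined in this notebook: $W_0$ is the frozen pretrained weight, $A$ and $B$ are the trainable low-rank factors, $r$ is the rank, and $\alpha$ is the scaling factor (`lora_alpha` in the configuration).

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

Per adapted matrix, the number of trainable parameters drops from $dk$ to $r(d + k)$.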


In [None]:

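from peft import get_peft_model, LoraConfig, TaskType

config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling factor applied to the update
    target_modules=["c_attn"],  # GPT-2's fused query/key/value projection
    fan_in_fan_out=True,        # GPT-2 uses Conv1D, which stores weights as (fan_in, fan_out)
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

# Wrap the base model: original weights are frozen, only the adapters are trainable
model = get_peft_model(model, config)
model.print_trainable_parameters()

# To train the adapters, pass this wrapped model to a new Trainer as in the previous section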
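As a sanity check on what `print_trainable_parameters()` reports, the adapter size can be estimated by hand. This is a rough sketch, assuming the standard GPT-2 small shapes (12 transformer blocks, hidden size 768, and a fused `c_attn` projection of 768 × 2304). `lora_param_count` is an illustrative helper, not a PEFT function.

```python
def lora_param_count(d_in, d_out, r, n_layers):
    # Each adapted matrix gains A (r x d_in) and B (d_out x r)
    return n_layers * r * (d_in + d_out)

hidden = 768           # assumed GPT-2 small hidden size
qkv_out = 3 * hidden   # c_attn fuses the query, key, and value projections
n_layers = 12          # assumed number of transformer blocks

adapter_params = lora_param_count(hidden, qkv_out, r=8, n_layers=n_layers)
full_c_attn_params = n_layers * hidden * qkv_out  # weights only, biases ignored

print(f"LoRA adapter parameters: {adapter_params:,}")
print(f"Frozen c_attn weights:   {full_c_attn_params:,}")
print(f"Ratio: {adapter_params / full_c_attn_params:.2%}")
```

Under these assumptions the adapters hold 294,912 parameters, a little over 1% of the `c_attn` weights they modify.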



## Evaluation Metrics

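Common metrics for fine-tuned models include:
- Perplexity: the exponential of the mean per-token cross-entropy loss (lower is better)
- BLEU and ROUGE: n-gram overlap between generated text and a reference
- Custom scoring functions for relevance and fluency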
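Perplexity follows directly from the loss. The sketch below computes it from a list of per-token log-probabilities; the `perplexity` helper and the toy input are illustrative and do not come from any library.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Toy input: the model assigned probability 0.25 to each of four tokens
log_probs = [math.log(0.25)] * 4
ppl = perplexity(log_probs)
print(round(ppl, 4))
```

With the Hugging Face `Trainer`, the mean cross-entropy returned by `trainer.evaluate()` under the `eval_loss` key plays the role of `mean_nll`, so `math.exp(eval_loss)` gives the perplexity on the evaluation set, assuming an `eval_dataset` was supplied.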


In [None]:

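prompt = "What is generative AI?"
# Use the same template as in training; keep the attention mask and match the model's device
inputs = tokenizer(f"### Prompt: {prompt}\n### Response:", return_tensors="pt").to(model.device)

# max_new_tokens bounds only the generated continuation, not prompt plus continuation
output = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))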
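To score a generated answer against a reference, an overlap metric can be computed in a few lines. The sketch below is a simplified unigram F1 in the spirit of ROUGE-1, not the official ROUGE implementation: it lowercases, splits on whitespace, and skips stemming. `unigram_f1` and the example strings are illustrative.

```python
from collections import Counter

def unigram_f1(candidate, reference):
    # Count tokens after a naive lowercase + whitespace split
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "generative ai creates new content"
candidate = "generative ai creates text"
score = unigram_f1(candidate, reference)
print(round(score, 3))
```

For reported results, an established package is a safer choice than this sketch, since tokenization and stemming choices change the scores.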



## Exercises

1. Convert your dataset into JSON format for instruction tuning.
2. Try LoRA fine-tuning using your custom dataset.
3. Compare full fine-tuning vs LoRA in terms of speed and accuracy.
4. Evaluate generated answers using BLEU or ROUGE.

## References

- PEFT: https://github.com/huggingface/peft
- Hugging Face Trainer: https://huggingface.co/docs/transformers/main_classes/trainer
- LoRA Paper: https://arxiv.org/abs/2106.09685
