### Parameter Efficient Fine Tuning (PEFT)

Fine-tuning all the parameters of an LLM can be costly in terms of computation and time, because LLMs contain a very large number of parameters. For example, FLAN-T5 (base) has 247,577,856 parameters. Tuning all of them is sometimes not feasible, and this is where PEFT can help. PEFT does not train all the parameters; instead, it freezes most of the original model's parameters and trains only a small number of (often newly added) parameters. The newly trained adapter is then attached to the original model. (For example, in this notebook we train only 1.41% of the model's parameters.)

Because the base model's weights are kept intact, PEFT can help prevent catastrophic forgetting, and its performance is often close to that of full fine-tuning.

## In this notebook I will cover

- Loading Flan-T5
- Tuning Flan-T5 to generate a book chapter

The model will be instructed to act like an author and generate a book chapter that follows specific guidelines.


In [None]:
!pip install datasets

!pip install evaluate

!pip install 'transformers[torch]'

!pip install peft

Collecting datasets
  Downloading datasets-2.15.0-py3-none-any.whl (521 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m521.2/521.2 kB[0m [31m6.0 MB/s[0m eta [36m0:00:00[0m
Collecting pyarrow-hotfix (from datasets)
  Downloading pyarrow_hotfix-0.6-py3-none-any.whl (7.9 kB)
Collecting dill<0.3.8,>=0.3.0 (from datasets)
  Downloading dill-0.3.7-py3-none-any.whl (115 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m115.3/115.3 kB[0m [31m8.8 MB/s[0m eta [36m0:00:00[0m
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.15-py310-none-any.whl (134 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m134.8/134.8 kB[0m [31m8.1 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: pyarrow-hotfix, dill, multiprocess, datasets
Successfully installed datasets-2.15.0 dill-0.3.7 multiprocess-0.70.15 pyarrow-hotfix-0.6
Collecting evaluate
  Downloading evaluate-0.4.1-py3-none-any.whl (84 kB)
[2K     [90m━━━

In [None]:
import torch
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import time
import evaluate
import pandas as pd
import numpy as np

In [None]:
model_name='google/flan-t5-base'
flan_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(flan_model))

trainable model parameters: 247577856
all model parameters: 247577856
percentage of trainable model parameters: 100.00%


Below, we prompt Flan-T5 to generate a book chapter before any fine-tuning.

As the output shows, the base Flan-T5 produces a single generic sentence that carries little meaning and does not attempt the chapter at all.

In [None]:
prompt = 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n### Instruction:\nImagine you are a prolific author tasked with writing a textbook. You are working on writing a textbook chapter titled "9th Grade American Literature: Analyzing the Civil War Era through the Lens of Uncle Tom’s Cabin with Digital Humanities Tools".\n\nYour **first task** is to write six paragraphs from a page in the middle of the chapter. Your **last task** is to discuss real-world applications and implications of the knowledge gained from the chapter.\n\nNotes:\n- Your target audience consists of experts with great familiarity.\n- Compare and contrast different theories, methodologies, or viewpoints within the subject.\nAim for a well-rounded and insightful response, keeping in mind the diversity in audience knowledge.\n\n\n### Response:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
flan_model_outputs = flan_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=800, num_beams=1))
flan_model_text_output = tokenizer.decode(flan_model_outputs[0], skip_special_tokens=True)
flan_model_text_output

"I'm sure you'll find a topic that interests you."
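The same instruction template is used again after fine-tuning, with only the chapter title and the target audience changed. As a hypothetical convenience (not part of the original notebook), a small helper could build these prompts instead of repeating the full string literal each time; `PROMPT_TEMPLATE` and `build_prompt` are illustrative names.

```python
# Hypothetical helper: rebuilds the instruction prompt used in this notebook
# from a chapter title and a description of the target audience.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "Imagine you are a prolific author tasked with writing a textbook. "
    'You are working on writing a textbook chapter titled "{title}".\n\n'
    "Your **first task** is to write six paragraphs from a page in the middle of the chapter. "
    "Your **last task** is to discuss real-world applications and implications "
    "of the knowledge gained from the chapter.\n\n"
    "Notes:\n"
    "- Your target audience consists of {audience}.\n"
    "- Compare and contrast different theories, methodologies, or viewpoints within the subject.\n"
    "Aim for a well-rounded and insightful response, "
    "keeping in mind the diversity in audience knowledge.\n\n\n"
    "### Response:"
)


def build_prompt(title: str, audience: str) -> str:
    """Fill the chapter title and target audience into the instruction template."""
    return PROMPT_TEMPLATE.format(title=title, audience=audience)


# Usage: the second prompt tried after fine-tuning
prompt = build_prompt(
    "The obvious benefits of not arguing with your boss",
    "early phase google employees",
)
print(prompt)
```

The resulting string can be passed to `tokenizer(prompt, return_tensors="pt")` exactly like the hand-written prompts in this notebook.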

The following code loads the dataset. To save time when reloading the notebook between training runs, I tokenized the dataset once and saved it to my Google Drive. The original dataset is from Hugging Face: https://huggingface.co/datasets/SciPhi/textbooks-are-all-you-need-lite. Its train split has 681,845 rows; to keep training short, the cell below keeps only every 200th row (3,410 rows).

**How to tokenize the dataset?**
The code snippet is given below:
```
# Assumes `tokenizer` from the cell above and the original dataset, e.g.
# dataset = load_dataset("SciPhi/textbooks-are-all-you-need-lite")
def tokenize(example):
    # Prompts become the encoder inputs, completions become the decoder labels
    example["input_ids"] = tokenizer(example["formatted_prompt"], padding="max_length", truncation=True, return_tensors="pt").input_ids
    example["labels"] = tokenizer(example["completion"], padding="max_length", truncation=True, return_tensors="pt").input_ids
    return example

tokenized_dataset = dataset.map(tokenize, batched=True)

# Keep only the model inputs and labels
tokenized_dataset = tokenized_dataset.remove_columns(['formatted_prompt', 'completion', 'first_task', 'second_task', 'last_task', 'notes', 'title', 'model', 'temperature'])
```
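As written, `tokenize` pads the labels with the tokenizer's pad id (0 for T5), while T5's loss only ignores label positions equal to -100, so padding positions also contribute to the training loss. A common practice in Hugging Face seq2seq examples, not used in this notebook, is to replace padded label ids with -100. A minimal sketch on plain Python lists; the helper name `mask_pad_labels` is hypothetical:

```python
def mask_pad_labels(label_ids, pad_token_id):
    """Replace padding ids in each label sequence with -100.

    -100 is the default ignore_index of PyTorch's CrossEntropyLoss, which is
    what T5ForConditionalGeneration uses, so masked positions add no loss.
    """
    return [[tok if tok != pad_token_id else -100 for tok in seq] for seq in label_ids]


# Usage: 1 is T5's end-of-sequence id and 0 its pad id
batch_labels = [[5, 6, 1, 0, 0], [7, 1, 0, 0, 0]]
print(mask_pad_labels(batch_labels, pad_token_id=0))
# [[5, 6, 1, -100, -100], [7, 1, -100, -100, -100]]
```

Inside `tokenize`, this would mean tokenizing the completions without `return_tensors` (so `input_ids` are plain lists) and then applying `mask_pad_labels(..., tokenizer.pad_token_id)` to them.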


In [None]:
dataset_name = "/content/drive/My Drive/textbook_need_dataset"
dataset = load_dataset(dataset_name)
dataset = dataset.filter(lambda example, index: index % 200 == 0, with_indices=True)
dataset

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 3410
    })
})
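The row count above follows directly from the filter: keeping every index that is a multiple of 200 out of 681,845 rows leaves 3,410 rows, i.e. about 0.5% of the original train split. A quick check in plain Python:

```python
total_rows = 681_845  # rows in the original train split
step = 200            # the filter keeps rows whose index is a multiple of 200

kept = len(range(0, total_rows, step))  # indices 0, 200, ..., 681_800
fraction = kept / total_rows
print(kept, f"{fraction:.2%}")  # 3410 0.50%
```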

## Parameter Efficient Fine Tuning

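LoRA (Low-Rank Adaptation) is one of the most commonly used PEFT methods. LoRA keeps each targeted weight matrix W frozen and learns a low-rank update ΔW = BA, where A and B are small matrices of rank r; the update is scaled by `lora_alpha / r`. Here it is applied to the query (`q`) and value (`v`) projections of the attention layers. The following code configures the PEFT model.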
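To make the idea concrete, here is a minimal NumPy sketch of a single LoRA-adapted linear layer. It is an illustration, not PEFT's implementation: the class name `LoRALinear` is hypothetical, `A` is initialized with small Gaussian noise (PEFT's default initialization differs in details), and `B` starts at zero so the adapted layer initially behaves exactly like the frozen one. The update is scaled by `lora_alpha / r`, which equals 1 for the `r=32, lora_alpha=32` configuration used below.

```python
import numpy as np


class LoRALinear:
    """Sketch of LoRA: y = x W^T + (alpha / r) * x A^T B^T, with W frozen."""

    def __init__(self, weight, r, alpha, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        out_features, in_features = weight.shape
        self.weight = weight                                   # frozen pretrained weight (out, in)
        self.A = rng.normal(0.0, 0.01, size=(r, in_features))  # trainable, small random init
        self.B = np.zeros((out_features, r))                   # trainable, zero init so BA = 0 at start
        self.scaling = alpha / r

    def __call__(self, x):
        base = x @ self.weight.T                 # frozen path
        update = (x @ self.A.T) @ self.B.T       # low-rank path, never forms the full BA matrix
        return base + self.scaling * update

    def merged_weight(self):
        """Fold the low-rank update into W, as merging an adapter does conceptually."""
        return self.weight + self.scaling * (self.B @ self.A)

    def num_trainable(self):
        return self.A.size + self.B.size


# Usage with flan-t5-base sized projections (768 x 768) and the notebook's r / alpha
rng = np.random.default_rng(42)
W = rng.normal(size=(768, 768))
layer = LoRALinear(W, r=32, alpha=32)
x = rng.normal(size=(4, 768))
assert np.allclose(layer(x), x @ W.T)            # B = 0, so output equals the frozen layer
layer.B = rng.normal(size=layer.B.shape)         # pretend training has updated B
assert np.allclose(layer(x), x @ layer.merged_weight().T)
print(layer.num_trainable())  # 49152
```

The second check shows why a trained adapter can be merged back into the base weights: `x @ (W + s·BA)^T` equals the frozen output plus the scaled low-rank update, so inference needs no extra matrices after merging.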

In [None]:
from peft import LoraConfig, get_peft_model, TaskType

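lora_config = LoraConfig(
    r=32,  # rank of the low-rank update matrices
    lora_alpha=32,  # scaling factor; the update is scaled by lora_alpha / r
    target_modules=["q", "v"],  # T5 query and value projections
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM  # FLAN-T5 is an encoder-decoder (seq2seq) model
)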

peft_model = get_peft_model(flan_model, lora_config)
print(print_number_of_trainable_model_parameters(peft_model))

trainable model parameters: 3538944
all model parameters: 251116800
percentage of trainable model parameters: 1.41%
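The 1.41% figure can be checked by hand. Assuming the standard flan-t5-base configuration (d_model = 768, 12 attention heads of size 64, 12 encoder and 12 decoder layers), `q` and `v` appear in every encoder self-attention block and in both the self-attention and cross-attention blocks of every decoder layer. Each adapted projection gets an `A` of shape r × 768 and a `B` of shape 768 × r:

```python
d_model = 768          # flan-t5-base hidden size (assumed from its config)
inner_dim = 12 * 64    # num_heads * d_kv = 768
r = 32                 # LoRA rank from lora_config

encoder_layers, decoder_layers = 12, 12
# encoder: self-attention only; decoder: self-attention + cross-attention
attention_blocks = encoder_layers * 1 + decoder_layers * 2
targeted_linears = attention_blocks * 2          # "q" and "v" in each attention block

per_module = r * d_model + inner_dim * r         # A: (r x in) plus B: (out x r)
lora_params = targeted_linears * per_module

base_params = 247_577_856                        # printed earlier for the base model
total = base_params + lora_params
print(targeted_linears, per_module, lora_params, total, f"{100 * lora_params / total:.2f}%")
# 72 49152 3538944 251116800 1.41%
```

Both totals match the cell output above, which also confirms that the 1.41% is measured against the combined parameter count (base model plus adapter).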


In [None]:
output_dir = './peft-training'

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=2e-3,  # Higher learning rate than is typical for full fine-tuning
    num_train_epochs=5,
)

peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=dataset["train"],
)


peft_trainer.train()

peft_model_path="/content/drive/My Drive/peft"
peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)



| Step | Training Loss |
|------|---------------|
| 500  | 2.3074        |
| 1000 | 2.0186        |
| 1500 | 1.9443        |
| 2000 | 1.8926        |


('/content/drive/My Drive/peft/tokenizer_config.json',
 '/content/drive/My Drive/peft/special_tokens_map.json',
 '/content/drive/My Drive/peft/tokenizer.json')
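The logged step numbers are consistent with a simple calculation, assuming `auto_find_batch_size` kept the default `per_device_train_batch_size` of 8 (it only lowers the batch size after an out-of-memory error), a single GPU, and no gradient accumulation. Training would then run about 2,135 optimizer steps, and with the default `logging_steps` of 500 the loss is logged at steps 500 through 2,000:

```python
import math

num_rows = 3410          # rows left after the every-200th-row filter
per_device_batch = 8     # assumption: TrainingArguments default, not reduced by auto_find_batch_size
epochs = 5               # num_train_epochs

steps_per_epoch = math.ceil(num_rows / per_device_batch)  # last partial batch still counts
total_steps = steps_per_epoch * epochs
logged_steps = list(range(500, total_steps + 1, 500))
print(steps_per_epoch, total_steps, logged_steps)  # 427 2135 [500, 1000, 1500, 2000]
```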

### Attaching the PEFT adapter to the Flan-T5 model

In [None]:
from peft import PeftModel, PeftConfig
peft_model = PeftModel.from_pretrained(flan_model,
                                       '/content/drive/My Drive/peft',
                                       torch_dtype=torch.bfloat16,
                                       is_trainable=False)

print(print_number_of_trainable_model_parameters(peft_model))

trainable model parameters: 0
all model parameters: 251116800
percentage of trainable model parameters: 0.00%


### Let's see if the output changes

In [None]:
prompt = 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n### Instruction:\nImagine you are a prolific author tasked with writing a textbook. You are working on writing a textbook chapter titled "9th Grade American Literature: Analyzing the Civil War Era through the Lens of Uncle Tom’s Cabin with Digital Humanities Tools".\n\nYour **first task** is to write six paragraphs from a page in the middle of the chapter. Your **last task** is to discuss real-world applications and implications of the knowledge gained from the chapter.\n\nNotes:\n- Your target audience consists of experts with great familiarity.\n- Compare and contrast different theories, methodologies, or viewpoints within the subject.\nAim for a well-rounded and insightful response, keeping in mind the diversity in audience knowledge.\n\n\n### Response:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Move the input tensor to the GPU if available, to match the model's device after training
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
input_ids = input_ids.to(device)

peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=1800, num_beams=3))
peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)
peft_model_text_output

'In the 9th Grade American Literature: Analyzing the Civil War Era through the Lens of Uncle Tom’s Cabin with Digital Humanities Tools chapter, we will delve into the fascinating world of Uncle Tom’s Cabin, a haunting tale that has been a staple of American literature for centuries. The story of Uncle Tom’s Cabin, a tale about a young man who grew up in a small cabin in the middle of the Civil War, is a testament to the enduring influence of the Civil War on American literature. The story of Uncle Tom’s Cabin, a tale about a young man who grew up in a small cabin in the middle of the Civil War, is a testament to the enduring influence of the Civil War on American literature. The story of Uncle Tom’s Cabin, a tale about a young man who grew up in a small cabin in the middle of the Civil War, is a testament to the enduring influence of the Civil War on American literature. The story of Uncle Tom’s Cabin, a tale about a young man who grew up in a small cabin in the middle of the Civil War

In [None]:
prompt = 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n### Instruction:\nImagine you are a prolific author tasked with writing a textbook. You are working on writing a textbook chapter titled "The obvious benefits of not arguing with your boss".\n\nYour **first task** is to write six paragraphs from a page in the middle of the chapter. Your **last task** is to discuss real-world applications and implications of the knowledge gained from the chapter.\n\nNotes:\n- Your target audience consists of early phase google employees.\n- Compare and contrast different theories, methodologies, or viewpoints within the subject.\nAim for a well-rounded and insightful response, keeping in mind the diversity in audience knowledge.\n\n\n### Response:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Move the input tensor to the GPU if available, to match the model's device after training
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
input_ids = input_ids.to(device)

peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=1800))
peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)
peft_model_text_output

'The obvious benefits of not arguing with your boss are numerous. This chapter will explore the importance of not arguing with your boss, examining the various theories, methodologies, and viewpoints that have been developed over the years. The first paragraph will discuss the importance of not arguing with your boss, examining the various theories and methodologies that have been developed over the years. The second paragraph will discuss the importance of not arguing with your boss, examining the various theories and methodologies that have been developed over the years. The third paragraph will discuss the importance of not arguing with your boss, examining the various theories and methodologies that have been developed over the years. The fourth paragraph will discuss the importance of not arguing with your boss, examining the various theories and methodologies that have been developed over the years. The fifth paragraph will discuss the importance of not arguing with your boss, ex

# Discussion

We can see that the model now understands that it needs to generate a book chapter, so it no longer stops after a single line.

**Weakness:** after a while, the model starts repeating the same sentence.
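One common mitigation, not tried in this notebook, is n-gram blocking during decoding, which Hugging Face's `GenerationConfig` exposes as `no_repeat_ngram_size`. The sketch below illustrates the idea in plain Python on word tokens; the function name `banned_next_tokens` is hypothetical and this is not the library's implementation. If the last n-1 tokens already occurred earlier, every token that followed them there is banned as the next token.

```python
def banned_next_tokens(tokens, n):
    """Return the tokens that would complete an n-gram already present in `tokens`."""
    if n < 1 or len(tokens) < n - 1:
        return set()
    prefix = tuple(tokens[len(tokens) - n + 1:])  # the last n-1 tokens (empty when n == 1)
    banned = set()
    for i in range(len(tokens) - n + 1):          # every complete n-gram seen so far
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])         # the token that followed this prefix
    return banned


# Usage: a repetition similar to the fine-tuned model's output
words = "the story of uncle tom is the story of".split()
print(banned_next_tokens(words, 3))  # {'uncle'}
```

With the real model, this would correspond to passing, for example, `GenerationConfig(max_new_tokens=1800, num_beams=3, no_repeat_ngram_size=3)` to `generate`; `repetition_penalty` is another generation option aimed at the same problem.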

Keep in mind that we fine-tuned the PEFT adapter on only every 200th example (about 0.5%) of the original dataset, using a single Google Colab GPU.

More than the result itself, I hope this notebook showed how the method works and that you enjoyed the journey.

**Thank you!
Connect with me on LinkedIn: [Abrar Fahim](https://www.linkedin.com/in/abrar-fahim/)**