## Learning Objectives

At the end of the experiment, you will be able to:

* understand how LoRA, a parameter-efficient fine-tuning method, works
* fine-tune a BART model, `facebook/bart-large-cnn`, on the SAMSum dataset for summarization using LoRA
* push the fine-tuned LoRA adapter to the Hugging Face Model Hub
* load the fine-tuned adapter from the Hub for inference

## Dataset Description

The **[SAMSum](https://huggingface.co/datasets/samsum) dataset** contains about 16k messenger-like conversations with summaries. Conversations were created and written down by linguists fluent in English. Linguists were asked to create conversations similar to those they write on a daily basis, reflecting the proportion of topics of their real-life messenger conversations. The style and register are diverse: conversations can be informal, semi-formal or formal, and they may contain slang words, emoticons and typos. Then, **the conversations were annotated with summaries**. It was assumed that summaries should be a concise brief, written in the third person, of what people talked about in the conversation. The SAMSum dataset was prepared by Samsung R&D Institute Poland and is distributed for research purposes.

Data Splits:
- train: 14732
- val: 818
- test: 819

Data Fields:

- ***dialogue***: text of dialogue
- ***summary***: human written summary of the dialogue
- ***id***: unique id of an example

<br>

**Example:**

\{
> '**id**': '13818513',

>'**summary**': 'Amanda baked cookies and will bring Jerry some tomorrow.',

>'**dialogue**': "Amanda: I baked cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-)"

\}

## Information

### **Parameter-Efficient Fine-Tuning (PEFT) methods**

Fine-tuning large pretrained models is often prohibitively costly due to their scale. Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of large pretrained models to various downstream applications by only fine-tuning a small number of (extra) model parameters instead of all the model's parameters. This significantly decreases the computational and storage costs. Recent state-of-the-art PEFT techniques achieve performance comparable to fully fine-tuned models.

[PEFT](https://github.com/huggingface/peft) is also the name of an open-source library from Hugging Face that implements these methods, enabling efficient adaptation of pre-trained language models (PLMs) to various downstream applications ***without*** fine-tuning all the model's parameters.

The library is integrated with Transformers for easy model training and inference, and with Accelerate for distributed training and inference of very large models.

PEFT currently includes techniques for:

- **LoRA:** Low-Rank Adaptation of Large Language Models
- **Prefix Tuning:** P-Tuning v2
- **P-Tuning**
- **Prompt Tuning**


### **LoRA**

LoRA (Low-Rank Adaptation) is a technique that speeds up the fine-tuning of large models while consuming less memory.

To make fine-tuning more efficient, LoRA represents the weight updates with two smaller matrices (called update matrices) obtained through low-rank decomposition.

In the figure below, A and B are the update matrices.

<center>
<img src="https://datascienceimages.s3.eu-north-1.amazonaws.com/PEFT_for_Text_Summary/lora_diagram.png" width=900px>
</center>
<br>

- These new matrices can be trained to **adapt to the new data** while keeping the overall number of changes low.
- The original weight matrix **remains frozen** and doesn't receive any further adjustments.
- To produce the final results, both the original and the adapted weights are **combined**.
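The three points above can be illustrated with a small numerical sketch. It is not the PEFT implementation: the toy sizes, the NumPy arrays and the helper name `lora_forward` are assumptions made for illustration. It follows the usual LoRA convention of scaling the update by `lora_alpha / r` and initialising `B` to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 6, 8, 2      # toy sizes (hypothetical); r is the LoRA rank
lora_alpha = 4                # scaling numerator, analogous to LoraConfig(lora_alpha=...)
scaling = lora_alpha / r

W = rng.normal(size=(d_out, d_in))   # frozen pretrained weight, never updated
A = rng.normal(size=(r, d_in))       # trainable update matrix, random init
B = np.zeros((d_out, r))             # trainable update matrix, zero init

x = rng.normal(size=(d_in,))

def lora_forward(x, W, A, B, scaling):
    """Output of the frozen path plus the scaled low-rank update path."""
    return W @ x + scaling * (B @ (A @ x))

# With B initialised to zero, the adapted layer reproduces the base layer.
assert np.allclose(lora_forward(x, W, A, B, scaling), W @ x)

# Stand-in for training: give B non-zero values. B @ A has rank at most r.
B = rng.normal(size=(d_out, r))
delta_W = scaling * (B @ A)

# Combining the original and adapted weights gives the same output
# as keeping the two paths separate.
W_merged = W + delta_W
assert np.allclose(lora_forward(x, W, A, B, scaling), W_merged @ x)

full_params = d_out * d_in           # parameters in a full weight update
lora_params = r * (d_in + d_out)     # parameters in A and B together
print(f"full update: {full_params} params, LoRA update: {lora_params} params")
```

The saving grows with layer size: the full update scales with `d_out * d_in`, while the LoRA update scales with `r * (d_in + d_out)`.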

### Setup Steps:

### Install required dependencies

In [None]:
!pip -q install "transformers[torch]" datasets evaluate

# A dependency required for loading the SAMSum dataset
!pip -q install py7zr

!pip -q install peft

### Import required packages

In [None]:
import os
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import TrainingArguments, Trainer

from peft import LoraConfig, get_peft_model, TaskType
from peft import PeftModel, PeftConfig

import warnings
warnings.filterwarnings('ignore')

### **Load Model & Tokenizer**

In [None]:
# Load model from HF Model Hub

# BART-large has about 400 million parameters; see the model card and
# https://github.com/facebookresearch/fairseq/tree/main/examples/bart

checkpoint = "facebook/bart-large-cnn"                # username/repo-name

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Load model
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)


### **Load Dataset**

In [None]:
# Load SAMSum dataset
dataset = load_dataset("samsum")
dataset


DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 14732
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 819
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 818
    })
})

### **Prepare the Dataset**

In [None]:
# Define function to prepare dataset

def tokenize_inputs(example):

    start_prompt = "Summarize the following conversation.\n\n"
    end_prompt = "\n\nSummary: "
    prompt = [start_prompt + dialogue + end_prompt for dialogue in example['dialogue']]
    example['input_ids'] = tokenizer(prompt, padding='max_length', truncation=True, return_tensors='pt').input_ids             # 'pt' for pytorch tensor
    example['labels'] = tokenizer(example['summary'], padding='max_length', truncation=True, return_tensors='pt').input_ids

    return example
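The function above pads every summary to the maximum length and uses the padded ids directly as labels, so padded positions also count towards the training loss. A common refinement, which is not part of this notebook's recorded run, is to set the labels at padded positions to `-100`, the index that PyTorch's cross-entropy loss ignores by default. The sketch below works on plain Python lists. The helper name `mask_padding` and the toy ids are hypothetical. It masks by attention mask rather than by token id, so it stays correct even when the pad token is set equal to the EOS token.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default

def mask_padding(label_ids, attention_mask):
    """Replace label ids at padded positions (attention_mask == 0) with IGNORE_INDEX."""
    return [
        [tok if keep else IGNORE_INDEX for tok, keep in zip(seq, mask)]
        for seq, mask in zip(label_ids, attention_mask)
    ]

# Toy batch (made-up ids): 2 is used both as EOS and as padding.
label_ids      = [[0, 713, 16, 2, 2, 2], [0, 42, 2, 2, 2, 2]]
attention_mask = [[1, 1,   1,  1, 0, 0], [1, 1,  1, 0, 0, 0]]

masked = mask_padding(label_ids, attention_mask)
print(masked)
```

In `tokenize_inputs`, the attention mask for the summaries is available from the same tokenizer call that produces the label `input_ids`.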

In [None]:
# Prepare dataset
tokenizer.pad_token = tokenizer.eos_token
tokenized_datasets = dataset.map(tokenize_inputs, batched=True)       # using batched=True for Fast tokenizer implementation

# Remove columns/keys that are not needed further
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'dialogue', 'summary'])


In [None]:
# Shortening the data: Just picking row index divisible by 100
# For learning purpose! It will reduce the compute resource requirement and training time

tokenized_datasets = tokenized_datasets.filter(lambda example, index: index % 100 == 0, with_indices=True)


In [None]:
print(tokenized_datasets['train'].shape)
print(tokenized_datasets['validation'].shape)
print(tokenized_datasets['test'].shape)

(148, 2)
(9, 2)
(9, 2)
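The row counts above follow directly from keeping only the indices divisible by 100. As a quick check, using the split sizes listed in the dataset description (the helper name `kept_rows` is hypothetical):

```python
split_sizes = {"train": 14732, "validation": 818, "test": 819}

def kept_rows(num_rows, step=100):
    """Count the indices in [0, num_rows) that are divisible by `step`."""
    return len(range(0, num_rows, step))

for split, n in split_sizes.items():
    print(split, kept_rows(n))
# train 148
# validation 9
# test 9
```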


### **Create PEFT Model using LoRA**

To fine-tune a model using LoRA, you need to:

- Instantiate a base model, here it is `facebook/bart-large-cnn`
- Create a configuration (`LoraConfig`) where you define LoRA-specific parameters
- Wrap the base model with `get_peft_model()` to get a trainable `PeftModel`
- Train the `PeftModel` as you normally would train the base model

In [None]:
from peft import LoraConfig, get_peft_model, TaskType

# LoRA-specific parameters
lora_config = LoraConfig(
    r=32,                       # 8, 16, 32    # the rank of the update matrices
    lora_alpha=32,                             # LoRA scaling factor
    lora_dropout=0.05,
    bias='none',                               # specifies if the bias parameters should be trained
    task_type=TaskType.SEQ_2_SEQ_LM,           # tells LoRA that this is a seq2seq modeling task
)

In [None]:
# Trainable PEFTModel
peft_model = get_peft_model(model, peft_config=lora_config)

### **Train PEFT Model**

In [None]:
from transformers import TrainingArguments, Trainer

peft_training_args = TrainingArguments(
    output_dir="./mode_tuned_peft",           # local directory
    learning_rate=1e-5,
    num_train_epochs=5,                       # 5 epochs took around 10 minutes
    weight_decay=0.01,
    auto_find_batch_size=True,
    evaluation_strategy='epoch',
    logging_steps=10
)

peft_trainer = Trainer(
    model=peft_model,                    # model to be fine-tuned
    args=peft_training_args,                       # training arguments
    train_dataset=tokenized_datasets['train'],          # train data to use
    eval_dataset=tokenized_datasets['validation']       # validation data to use
)

In [None]:
# Number of trainable parameters
peft_model.print_trainable_parameters()

trainable params: 4,718,592 || all params: 411,009,024 || trainable%: 1.1480507055728295


From the output above, only about 1.15% of the model's parameters are trainable; the rest stay frozen.
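The trainable-parameter count can be reproduced by hand. The sketch below assumes that, because `LoraConfig` was given no `target_modules`, PEFT applies its default for BART and adapts the `q_proj` and `v_proj` projections in every attention block. It also uses the BART-large architecture: 12 encoder layers, 12 decoder layers and a hidden size of 1024.

```python
d_model = 1024          # hidden size of BART-large
r = 32                  # LoRA rank from lora_config
encoder_layers = 12
decoder_layers = 12

# Assumption: q_proj and v_proj are adapted in every attention block.
encoder_targets = encoder_layers * 2            # self-attention only
decoder_targets = decoder_layers * 2 * 2        # self-attention + cross-attention

# Each adapted projection adds A (r x d_model) and B (d_model x r).
params_per_target = r * (d_model + d_model)
trainable = (encoder_targets + decoder_targets) * params_per_target

total = 411_009_024     # "all params" as printed by print_trainable_parameters()
print(trainable, f"{100 * trainable / total:.2f}%")
# 4718592 1.15%
```

Under this assumption the result matches the reported 4,718,592 trainable parameters. Changing `r` or adding more target modules scales the count linearly.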

In [None]:
# Training
peft_trainer.train()

Epoch,Training Loss,Validation Loss
1,4.9549,3.675203
2,3.2709,3.017449
3,2.402,2.368091
4,2.1928,1.890325
5,1.9013,1.736078


TrainOutput(global_step=185, training_loss=3.1953776694632867, metrics={'train_runtime': 557.9387, 'train_samples_per_second': 1.326, 'train_steps_per_second': 0.332, 'total_flos': 1625110767206400.0, 'train_loss': 3.1953776694632867, 'epoch': 5.0})
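The reported `global_step=185` can be cross-checked against the size of the training data. The sketch below assumes a single device and no gradient accumulation, neither of which is stated in the notebook. It searches for the batch sizes consistent with 148 training rows and 5 epochs.

```python
import math

train_rows = 148          # rows left in the train split after subsampling
epochs = 5                # num_train_epochs
reported_steps = 185      # global_step from TrainOutput
train_runtime = 557.9387  # seconds, from TrainOutput

steps_per_epoch = reported_steps // epochs

# Batch sizes for which ceil(rows / batch) matches the observed steps per epoch
candidates = [b for b in range(1, 65)
              if math.ceil(train_rows / b) == steps_per_epoch]
print("steps per epoch:", steps_per_epoch)
print("consistent batch sizes:", candidates)

# Throughput check against train_samples_per_second in TrainOutput
samples_per_second = train_rows * epochs / train_runtime
print(f"samples/s: {samples_per_second:.3f}")
# steps per epoch: 37
# consistent batch sizes: [4]
# samples/s: 1.326
```

Under these assumptions the only batch size that fits is 4. That would mean `auto_find_batch_size=True` settled below the `TrainingArguments` default of 8, most likely after an out-of-memory retry.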

### **Save PEFT Adapter**

**Push your PEFT adapter to the Hugging Face Model Hub**

In [None]:
# Login to HuggingFace
# Run, and paste your HF Access token when prompted
!huggingface-cli login

# OR
# from huggingface_hub import notebook_login
# notebook_login()



    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
# Push peft adapter to Hub

my_peft_repo = "dialogue_Summary_peft"

peft_model.push_to_hub(repo_id=my_peft_repo, commit_message="Upload peft adapter")


CommitInfo(commit_url='https://huggingface.co/yrajm1997/dialogue_Summary_peft/commit/ad7cd87e12817872cd8c8ed5787d8c34ae0c5e46', commit_message='Upload peft model', commit_description='', oid='ad7cd87e12817872cd8c8ed5787d8c34ae0c5e46', pr_url=None, pr_revision=None, pr_num=None)

Access your pushed adapter at `https://huggingface.co/[YOUR-USER-NAME]/[YOUR-MODEL-REPO-NAME]/tree/main`

### **Reload & Test**

**Test your LoRA finetuned model downloaded from HF Model Hub**

In [None]:
from peft import PeftModel, PeftConfig
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

username = "levi121"      # change it to your Hugging Face username

# The adapter must be loaded on top of the same base model it was trained on
base_checkpoint = "facebook/bart-large-cnn"
peft_model_id = username + '/dialogue_Summary_peft'

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)

# Load base model
peft_model_base = AutoModelForSeq2SeqLM.from_pretrained(base_checkpoint)

# Load PEFT model
loaded_peft_model = PeftModel.from_pretrained(model=peft_model_base,        # The model to be adapted
                                              model_id=peft_model_id,       # Hub repo id (or local path) of the saved adapter
                                              is_trainable=False,           # False for inference
                                              )


In [None]:
def generate_summary(dialogue, llm):
    """Prepare prompt  -->  tokenize -->  generate output using LLM  -->  detokenize output"""

    # Use the same prompt format as tokenize_inputs() used during training
    input_prompt = f"Summarize the following conversation.\n\n{dialogue}\n\nSummary: "

    inputs = tokenizer(input_prompt, truncation=True, return_tensors='pt')
    tokenized_output = llm.generate(input_ids=inputs['input_ids'],
                                    attention_mask=inputs['attention_mask'],
                                    min_length=30, max_length=200)
    output = tokenizer.decode(tokenized_output[0], skip_special_tokens=True)

    return output

In [None]:
sample = dataset['test'][0]['dialogue']
label = dataset['test'][0]['summary']

output = generate_summary(sample, llm=loaded_peft_model)

print("Sample")
print(sample)
print("-------------------")
print("Summary:")
print(output)
print("Ground Truth Summary:")
print(label)

Sample
Hannah: Hey, do you have Betty's number?
Amanda: Lemme check
Hannah: <file_gif>
Amanda: Sorry, can't find it.
Amanda: Ask Larry
Amanda: He called her last time we were at the park together
Hannah: I don't know him well
Hannah: <file_gif>
Amanda: Don't be shy, he's very nice
Hannah: If you say so..
Hannah: I'd rather you texted him
Amanda: Just text him 🙂
Hannah: Urgh.. Alright
Hannah: Bye
Amanda: Bye bye
-------------------
Summary:
Hannah asks Amanda for Betty's number. Amanda can't find it. Hannah suggests she ask Larry to call Betty. Amanda doesn't know Larry well but he's very nice.
Ground Truth Summary:
Hannah needs Betty's number but Amanda doesn't have it. She needs to contact Larry.
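To put a rough number on how close the generated summary is to the ground truth, a word-overlap score can be computed. The sketch below is a simplified stand-in for ROUGE-1 F1, not the metric from the `evaluate` package: it splits on whitespace, lowercases and applies no stemming, so its values will differ from a full ROUGE implementation. The function name `unigram_f1` is hypothetical.

```python
from collections import Counter

def unigram_f1(prediction, reference):
    """F1 over overlapping lowercase whitespace tokens (simplified ROUGE-1)."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())    # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

prediction = ("Hannah asks Amanda for Betty's number. Amanda can't find it. "
              "Hannah suggests she ask Larry to call Betty. "
              "Amanda doesn't know Larry well but he's very nice.")
reference = ("Hannah needs Betty's number but Amanda doesn't have it. "
             "She needs to contact Larry.")

score = unigram_f1(prediction, reference)
print(f"unigram F1: {score:.3f}")
```

A word-overlap score does not capture factual errors. The generated summary above swaps the roles of Hannah and Amanda in its last two sentences, so reading the outputs remains necessary alongside any automatic metric.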


### References:

1. [LoRA](https://huggingface.co/docs/peft/main/en/conceptual_guides/lora)
2. [Quicktour](https://huggingface.co/docs/peft/en/quicktour)
3. [Efficient Large Language Model training with LoRA and Hugging Face](https://www.philschmid.de/fine-tune-flan-t5-peft)