# Large Language Model (LLM) Fine-tuning Demonstration

## Purpose:
This notebook serves as a demonstration of fine-tuning Large Language Models (LLMs) on task-specific instructions to enhance their performance on specialized tasks. The process of fine-tuning adjusts the parameters of a pre-trained model to make it more suited to a specific task, allowing it to generate more accurate and coherent responses.

## Model:
In this demonstration, we use `opt-350M`, an open-source model from Meta. It has been pre-trained on a large and diverse text corpus, which lets it continue a given prompt or instruction with human-like text.

## Dataset:
We are utilizing the `samsum` dataset, a collection of dialogues and their corresponding summaries. This dataset is ideal for training models on the task of dialogue summarization, allowing them to learn how to generate concise and informative summaries of conversations.

## Task:
The primary task in this notebook is **Dialog Summarization**. The model will be fine-tuned to summarize dialogues, generating brief and coherent summaries that retain the essential information from the conversations.

## Process:
1. **Pre-fine-tuning Performance Assessment:**
   - We will assess the model's ability to perform dialog summarization before fine-tuning, using samples from the `samsum` dataset.
   
2. **Fine-tuning:**
   - The `opt-350M` model will be fine-tuned on the `samsum` dataset, learning to generate accurate and concise summaries of dialogues.
   
3. **Post-fine-tuning Performance Assessment:**
   - We will evaluate the model's performance on dialog summarization after fine-tuning, comparing it with the pre-fine-tuning performance to observe the improvements.

## Objective:
The objective of this notebook is to showcase the effectiveness of fine-tuning LLMs on task-specific instructions, emphasizing how it can significantly enhance the model's performance on specialized tasks like dialog summarization.




## Causal Language Modeling (CLM) Pretraining Overview

Causal Language Modeling is a technique where the model generates text sequentially, predicting the next word in a sequence based on the previous words. It learns to predict the probability \( P(w_t | w_{1}, w_{2}, ..., w_{t-1}) \) of a word \( w_t \) at time \( t \) given the preceding words.
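
As a rough illustration of the conditional probability above, the sketch below scores a short sequence by summing log-probabilities one word at a time. The probability table and the `sequence_log_prob` helper are hypothetical, and the toy table conditions only on the previous word, whereas a real causal language model conditions on the entire prefix.

```python
import math

# Hypothetical toy table of P(next word | previous word). A real CLM
# conditions on all preceding words, not just the last one.
toy_probs = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.2,
    ("cat", "sat"): 0.4,
}

def sequence_log_prob(words, probs, bos="<s>"):
    """Sum the log-probability of each word given the word before it."""
    total = 0.0
    prev = bos
    for word in words:
        total += math.log(probs[(prev, word)])
        prev = word
    return total

log_p = sequence_log_prob(["the", "cat", "sat"], toy_probs)
# The sequence probability is the product of the three conditionals.
print(math.exp(log_p))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied over a long sequence.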

### Pretraining Steps:

1. **Data Collection:**
   - A large corpus of text data is collected from diverse sources.
   - The text is tokenized into subwords or words.
   
2. **Model Initialization:**
   - The parameters of a transformer model, typically used for CLM, are initialized.
   
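3. **Causal Masking:**
   - During pretraining, a causal attention mask hides all later tokens, so each position can attend only to the tokens before it. This differs from Masked Language Modeling (MLM), used by encoder models such as BERT, where randomly selected input tokens are masked and predicted from context on both sides.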
   
4. **Training:**
   - The model is trained to predict the next word in a sequence based on the preceding words.
   - It learns contextual representations of words by adjusting its parameters to minimize the difference between the predicted probability distribution and the true distribution of the next word.
   
5. **Evaluation and Fine-tuning:**
   - The pretrained model is evaluated on specific tasks and fine-tuned on task-specific datasets.
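
The training objective in step 4 can be sketched as an average cross-entropy in which each position is scored against the token that follows it. This is a minimal pure-Python illustration with a hypothetical three-token vocabulary and made-up logits; the names `softmax` and `next_token_loss` are assumptions, not part of any library used in this notebook.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def next_token_loss(token_ids, logits_per_position):
    """Average cross-entropy where position t is scored against token t + 1."""
    losses = []
    # The last position has no following token, so it adds no loss term.
    for t in range(len(token_ids) - 1):
        probs = softmax(logits_per_position[t])
        target = token_ids[t + 1]
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Hypothetical sequence over a vocabulary of 3 tokens.
token_ids = [0, 2, 1]
uniform_logits = [[0.0, 0.0, 0.0]] * 3          # model has no preference
confident_logits = [[0, 0, 5], [0, 5, 0], [0, 0, 0]]  # favors the true next token

loss = next_token_loss(token_ids, uniform_logits)
better_loss = next_token_loss(token_ids, confident_logits)
print(loss, better_loss)
```

With uniform logits the loss equals the log of the vocabulary size; training pushes it lower by raising the probability assigned to the true next token.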

### Diagrammatic Representation:

The diagram below represents a simple sequence and the causal relationships between the words in the sequence.


<img src="images/Pretraining.png" alt="Drawing" style="width: 800px;"/>

## Instruction Tuning Overview

Instruction tuning is a crucial step in the deployment of Large Language Models (LLMs) like GPT-3.5 Turbo. It involves fine-tuning the model to understand and generate responses to specific instructions or prompts, enhancing its performance on various tasks.

### Steps Involved:

1. **Selection of Prompts:**
   - Choose a set of prompts or instructions that are representative of the task you want the model to perform.
   
2. **Creation of Training Dataset:**
   - Construct a dataset consisting of the selected prompts along with the corresponding correct responses.
   
3. **Fine-tuning:**
   - The model is fine-tuned on the training dataset, learning to generate responses that are coherent and contextually relevant to the given prompts.
   
4. **Evaluation:**
   - After fine-tuning, the model is evaluated on a separate dataset to assess its performance and make any necessary adjustments.
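
Steps 1 and 2 amount to pairing each instruction with its input and reference response in one training string. The sketch below shows one way this could look; the template, the `build_example` helper, the sample dialog, and the `</s>` end-of-sequence marker are all assumptions chosen for illustration.

```python
# Hypothetical template; any consistent layout would serve the same purpose.
TEMPLATE = "{instruction}\n{input}\n---\nResponse:\n{response}"

def build_example(instruction, input_text, response, eos_token="</s>"):
    """Join an instruction, its input, and the reference response into one string."""
    text = TEMPLATE.format(
        instruction=instruction, input=input_text, response=response
    )
    # The end-of-sequence marker teaches the model where a response stops.
    return text + eos_token

# Made-up instruction/response pair for a summarization task.
pairs = [
    (
        "Summarize this dialog:",
        "A: Lunch at noon?\nB: Sure, see you then.",
        "A and B will have lunch at noon.",
    ),
]
train_texts = [build_example(i, x, y) for i, x, y in pairs]
print(train_texts[0])
```

Using the same template at inference time, with the response left empty, keeps the prompt format consistent with what the model saw during fine-tuning.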

### Importance:

- **Enhanced Performance:**
   - Instruction tuning allows the model to generate more accurate and coherent responses, improving its overall performance on specific tasks.
   
- **Task Specificity:**
   - It enables the model to understand and respond to task-specific instructions, making it more versatile and applicable to a range of use cases.

### Illustration:

The figure below illustrates a simplified example of instruction tuning, where a model is fine-tuned to respond to a specific instruction.

<img src="images/fine-tuning-task-specific.png" alt="Drawing" style="width: 800px;"/>

## Task-specific examples


<img src="images/fine-tuning-task-specific-example.png" alt="Drawing" style="width: 800px;"/>


##  Start Notebook

This notebook shows how to train an OPT-350M model on a single GPU (e.g. an A100 with 80 GB) using int8 quantization and LoRA.

### Step 0: Install prerequisites

The example uses the Hugging Face `Trainer` and model classes. We will download both the model and the dataset from the Hugging Face Hub.

Let's proceed with the demonstration!

In [None]:
#%%bash
#!pip3 install transformers datasets accelerate sentencepiece protobuf==3.20 py7zr scipy peft bitsandbytes fire torch_tb_profiler ipywidgets tqdm

### Step 1: Load the model

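Load the model and tokenizer from the `facebook/opt-350m` checkpoint. To use local weights instead, pass the path of the weight folder in place of the model name.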

In [None]:
import sys
import os

# Set CUDA_VISIBLE_DEVICES environment variable
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer, DataCollatorForLanguageModeling


model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    load_in_8bit=True,
    device_map='auto',
)

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

### Step 2: Check base model

Run the base model on an example input:

In [None]:
eval_prompt = """
Summarize this dialog:
Person A: Hey, are you free this weekend?
Person B: Hi! I might have some time on Saturday. Why do you ask?
Person A: I was thinking of organizing a small get-together at my place. Just a few close friends. Would you like to come?
Person B: That sounds like a lot of fun! I’d love to come. What time are you thinking?
Person A: I was thinking around 7 PM. We can have some snacks and drinks.
Person B: 7 PM works for me. Should I bring something?
Person A: If you could bring some drinks, that would be great!
Person B: Sure, I can do that. Looking forward to it!
Person A: Awesome! See you on Saturday then!
---
Summary:
"""

# Tokenize the prompt and move the resulting tensors to the GPU
model_input = tokenizer(eval_prompt, return_tensors="pt", return_attention_mask=False).to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))

We can see that the base model only repeats the conversation.
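
Because the decoded output contains the prompt followed by the continuation, it can help to print only the newly generated part when judging the result. The helper below is a minimal sketch; `extract_continuation` is a hypothetical name, and it assumes the decoded text reproduces the prompt verbatim, which may not hold if tokenization changes whitespace.

```python
def extract_continuation(decoded_text, prompt):
    """Return only the text that follows the echoed prompt, if it is found."""
    stripped_prompt = prompt.strip()
    idx = decoded_text.find(stripped_prompt)
    if idx == -1:
        # Prompt not echoed verbatim: fall back to the full decoded text.
        return decoded_text.strip()
    return decoded_text[idx + len(stripped_prompt):].strip()

# Made-up decoded output for illustration.
decoded = "Summarize: hi\nSummary:\nThey greet."
print(extract_continuation(decoded, "\nSummarize: hi\nSummary:\n"))
```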

### Step 3: Prepare model for PEFT

Let's prepare the model for Parameter Efficient Fine Tuning (PEFT):

In [None]:
model.train()

def create_peft_config(model):
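    # prepare_model_for_int8_training is left out: its call below is commented
    # out, and newer peft releases provide prepare_model_for_kbit_training instead.
    from peft import (
        get_peft_model,
        LoraConfig,
        TaskType,
    )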

    peft_config =  LoraConfig(
                                r=16,
                                lora_alpha=32,
                                target_modules=["q_proj", "v_proj"],
                                lora_dropout=0.05,
                                bias="none",
                                task_type="CAUSAL_LM"
                            )


    # prepare int-8 model for training
    # model = prepare_model_for_int8_training(model)
    # model.print_trainable_parameters()
    total_parameters = sum(p.numel() for p in model.parameters())
    print("total number of parameters in model : ", total_parameters )
    model = get_peft_model(model, peft_config , adapter_name = "A1")
    # model = get_peft_model(model, peft_config , adapter_name = "A2")
    model.print_trainable_parameters()
    return model, peft_config

# create peft config
model, lora_config = create_peft_config(model)
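
To see why LoRA trains so few parameters, the arithmetic below estimates the adapter size for the configuration above. It is a sketch under assumptions not stated in this notebook: that OPT-350M has 24 decoder layers and that `q_proj` and `v_proj` each map 1024 features to 1024 features. The helper name `lora_trainable_params` is hypothetical.

```python
def lora_trainable_params(in_features, out_features, r):
    """A LoRA adapter adds two matrices: A (r x in) and B (out x r)."""
    return r * in_features + out_features * r

# Assumed OPT-350M shapes (check model.config before relying on them).
hidden_size = 1024
num_layers = 24
targets_per_layer = 2  # q_proj and v_proj, as in target_modules above

r = 16
lora_alpha = 32

per_module = lora_trainable_params(hidden_size, hidden_size, r)
total = per_module * targets_per_layer * num_layers
scaling = lora_alpha / r  # factor applied to the low-rank update

print(per_module, total, scaling)
```

If the assumed shapes are right, the total should match the trainable-parameter count reported by `model.print_trainable_parameters()`.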



### Step 4: Load the pre-processed dataset and fine tune the model

Here, we fine-tune the model. The training arguments below cap the run at 200 optimizer steps (`max_steps=200`); a full single epoch takes a bit more than an hour on an A100.

In [None]:
from transformers import default_data_collator , DataCollatorForLanguageModeling, Trainer, TrainingArguments
from datasets import load_dataset
from torch.utils.data import Dataset
from itertools import chain

class Concatenator(object):
    def __init__(self, chunk_size=1024):
        self.chunk_size=chunk_size
        self.residual = {"input_ids": [], "attention_mask": []}
        
    def __call__(self, batch):
        concatenated_samples = {
            k: v + list(chain(*batch[k])) for k, v in self.residual.items()
        }

        total_length = len(concatenated_samples[list(concatenated_samples.keys())[0]])

        if total_length >= self.chunk_size:
            chunk_num = total_length // self.chunk_size
            result = {
                k: [
                    v[i : i + self.chunk_size]
                    for i in range(0, chunk_num * self.chunk_size, self.chunk_size)
                ]
                for k, v in concatenated_samples.items()
            }
            self.residual = {
                k: v[(chunk_num * self.chunk_size) :]
                for k, v in concatenated_samples.items()
            }
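        else:
            # Wrap the short remainder in a list so it is emitted as one row,
            # matching the list-of-chunks shape produced by the branch above.
            result = {k: [v] for k, v in concatenated_samples.items()}
            self.residual = {k: [] for k in concatenated_samples.keys()}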

        result["labels"] = result["input_ids"].copy()

        return result


def get_preprocessed_samsum(tokenizer, split):
    dataset = load_dataset("samsum", split=split)

    prompt = (
        f"Summarize this dialog:\n{{dialog}}\n---\nSummary:\n{{summary}}{{eos_token}}"
    )

    def apply_prompt_template(sample):
        return {
            "text": prompt.format(
                dialog=sample["dialogue"],
                summary=sample["summary"],
                eos_token=tokenizer.eos_token,
            )
        }

    dataset = dataset.map(apply_prompt_template, remove_columns=list(dataset.features))
        
    dataset = dataset.map(
        lambda sample: tokenizer(sample["text"]),
        batched=True,
        remove_columns=list(dataset.features),
    ).map(Concatenator(), batched=True)


    return dataset

data3 = get_preprocessed_samsum(tokenizer, 'train')

trainer = Trainer(
    model=model,
    train_dataset=data3,
    args=TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        max_steps=200,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=20,
        output_dir='outputs'
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()
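
The `Concatenator` used in this step packs many short tokenized samples into fixed-length chunks so that little of each training sequence is spent on padding. The standalone sketch below shows the core idea on tiny made-up token lists. The `pack` function is a hypothetical simplification: it handles a single call, whereas `Concatenator` carries the leftover tokens into the next batch.

```python
from itertools import chain

def pack(token_lists, chunk_size):
    """Concatenate token lists and cut them into chunks of exactly chunk_size.

    Returns (chunks, leftover), where leftover holds the tokens that did not
    fill a complete chunk.
    """
    flat = list(chain(*token_lists))
    full = (len(flat) // chunk_size) * chunk_size
    chunks = [flat[i:i + chunk_size] for i in range(0, full, chunk_size)]
    return chunks, flat[full:]

chunks, leftover = pack([[1, 2, 3], [4, 5], [6, 7, 8, 9]], chunk_size=4)
print(chunks)    # [[1, 2, 3, 4], [5, 6, 7, 8]]
print(leftover)  # [9]
```

Note that a chunk can span several samples, which is why each sample ends with an end-of-sequence token in the prompt template.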

### Step 5: Save model checkpoint

In [None]:
output_dir = '/glb/data/gw_export/bootcamp/<your_id>/output'
model.save_pretrained(output_dir)

### Step 6: Check fine-tuned model

Run the fine-tuned model on the same example again to see the learning progress:

In [None]:
model.config.use_cache = True  # re-enable the cache that was disabled for training
model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=200)[0], skip_special_tokens=True))
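
To compare the outputs before and after fine-tuning with a number rather than by eye, one option is a word-overlap score against a reference summary. The function below is a rough ROUGE-1-style F1 written from scratch for illustration; it is not the official ROUGE implementation, and the reference and candidate strings are made up.

```python
from collections import Counter

def unigram_f1(candidate, reference):
    """F1 over overlapping lowercase whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical reference summary and model output.
reference = "A invites B to a get-together on Saturday at 7 PM"
candidate = "A invites B over on Saturday at 7 PM"
print(unigram_f1(candidate, reference))
```

Scoring the base model's output and the fine-tuned model's output against the same reference gives a simple before/after comparison; an established metric library is the better choice for reported results.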
