Fine Tuning LLMs -> 

### **Installing All Required Packages ->**

In [1]:
%pip install accelerate peft bitsandbytes transformers trl 

Note: you may need to restart the kernel to use updated packages.


### **Importing Required Libraries ->**

In [3]:
import os
import torch
from datasets import load_dataset

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging
)

from peft import LoraConfig

from trl import SFTTrainer

### **Creating the Datasets ->**

[Original Dataset](https://huggingface.co/datasets/timdettmers/openassistant-guanaco)

The original dataset follows this pattern:

``` ### Human: Question Prompt with Details### Assistant: Answer to the Prompt. ```

The prompt template used by Llama 2 chat models is ->

```
<s>[INST] <<SYS>> System Prompt <</SYS>> User Prompt [/INST] Model Answer </s>
```
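As a rough illustration of how the placeholders in this template are filled in, here is a minimal sketch. `build_llama2_prompt` is a hypothetical helper, not part of any library, and it follows the single-line layout shown above (the reference Llama 2 layout additionally puts newlines around the `<<SYS>>` block).

```python
from typing import Optional


def build_llama2_prompt(user_prompt: str,
                        answer: str = "",
                        system_prompt: Optional[str] = None) -> str:
    """Hypothetical helper: fill in the single-line chat template."""
    if system_prompt:
        # The system prompt is optional and sits inside the [INST] block.
        instruction = f"<<SYS>> {system_prompt} <</SYS>> {user_prompt}"
    else:
        instruction = user_prompt

    prompt = f"<s>[INST] {instruction} [/INST]"
    if answer:
        # Training samples also carry the answer and the end-of-sequence token.
        prompt += f" {answer} </s>"
    return prompt


train_sample = build_llama2_prompt(
    user_prompt="What is LoRA?",
    answer="A low-rank fine-tuning method.",
    system_prompt="You are a helpful assistant.",
)
inference_prompt = build_llama2_prompt("What is LoRA?")

print(train_sample)
print(inference_prompt)
```

Leaving `answer` empty gives a prompt that ends at `[/INST]`, which is the form one would pass to the model at generation time.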


The original dataset can be reformatted to the required format as shown below:

```python
dataset = load_dataset("timdettmers/openassistant-guanaco")

def transform_conversation(example):
    conversation_text = example['text']
    segments = conversation_text.split('###')

    reformatted_segments = []

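    # Iterate over (human, assistant) pairs; segments[0] is the empty text before the first '###'.
    # The range runs to len(segments) so a trailing human turn without an answer reaches the else branch.
    for i in range(1, len(segments), 2):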
        human_text = segments[i].strip().replace('Human:', '').strip()

        # Check if there is a corresponding assistant segment before processing
        if i + 1 < len(segments):
            assistant_text = segments[i+1].strip().replace('Assistant:', '').strip()

            # Apply the new template
            reformatted_segments.append(f'<s>[INST] {human_text} [/INST] {assistant_text} </s>')
        else:
            # Handle the case where there is no corresponding assistant segment
            reformatted_segments.append(f'<s>[INST] {human_text} [/INST] </s>')

    return {'text': ''.join(reformatted_segments)}


# Apply the transformation
transformed_dataset = dataset.map(transform_conversation)
```
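To sanity-check the conversion without downloading anything, the same idea can be tried on an in-memory string. The sketch below is a variant, not the code above: `to_llama2_format` and `TURN_PATTERN` are hypothetical names, and it matches the `### Human:` / `### Assistant:` markers with a regex instead of splitting on `###`, so a stray `###` inside an answer (for example a Markdown heading) is not treated as a turn boundary.

```python
import re

# Matches a role marker and lazily captures the text up to the next marker or the end.
TURN_PATTERN = re.compile(
    r"### (Human|Assistant):\s*(.*?)(?=### (?:Human|Assistant):|\Z)",
    re.DOTALL,
)


def to_llama2_format(text: str) -> str:
    """Hypothetical helper: convert one Guanaco-style record to the Llama 2 template."""
    turns = [(role, body.strip()) for role, body in TURN_PATTERN.findall(text)]
    samples = []
    i = 0
    while i < len(turns):
        role, body = turns[i]
        if role != "Human":
            i += 1  # skip an assistant turn that has no preceding human turn
            continue
        if i + 1 < len(turns) and turns[i + 1][0] == "Assistant":
            samples.append(f"<s>[INST] {body} [/INST] {turns[i + 1][1]} </s>")
            i += 2
        else:
            # Human turn without an answer: keep the prompt, leave the answer empty.
            samples.append(f"<s>[INST] {body} [/INST] </s>")
            i += 1
    return "".join(samples)


sample = "### Human: What is 2 + 2?### Assistant: 4### Human: And 3 + 3?### Assistant: 6"
print(to_llama2_format(sample))
```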

This transformation can be run manually in this notebook after loading the original dataset, but that would take a long time. Thankfully, a dataset that already has this format is available on the Hugging Face Hub.

[New Dataset](https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k)

The new dataset already has the required format: ```<s>[INST] <<SYS>> System Prompt <</SYS>> User Prompt [/INST] Model Answer </s>```

Full model training, i.e. fine-tuning all the weights of the model, requires a lot of GPU resources, since the Llama 2 model used here has 7 billion parameters.

So we will use PEFT with LoRA/QLoRA to fine-tune the model.
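To put a rough number on "a lot of GPU resources", the back-of-the-envelope sketch below estimates the memory needed for the weights alone at different precisions. The 16 bytes per parameter used for full fine-tuning is an assumption (a commonly quoted figure for mixed-precision training with Adam: fp16 weights and gradients plus fp32 master weights and two optimizer states), and activations are ignored entirely.

```python
NUM_PARAMS = 7_000_000_000  # Llama 2 7B
GB = 1e9                    # decimal gigabytes


def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    """Memory needed just to store the weights at a given precision."""
    return num_params * bits_per_param / 8 / GB


fp32_gb = weight_memory_gb(NUM_PARAMS, 32)
fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)

# Assumption: about 16 bytes per parameter for full fine-tuning with Adam in mixed precision.
BYTES_PER_PARAM_FULL_FINETUNE = 16
full_finetune_gb = NUM_PARAMS * BYTES_PER_PARAM_FULL_FINETUNE / GB

print(f"fp32 weights:     {fp32_gb:.1f} GB")
print(f"fp16 weights:     {fp16_gb:.1f} GB")
print(f"4-bit weights:    {int4_gb:.1f} GB")
print(f"full fine-tuning: {full_finetune_gb:.1f} GB (weights, gradients, optimizer states)")
```

Under these assumptions the weights take about 14 GB in fp16 and about 3.5 GB at 4 bits, which is why loading the base model in 4-bit precision makes fine-tuning feasible on a single GPU.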

### **Parameter Efficient Fine Tuning - PEFT ->**

PEFT is a technique for fine-tuning a model in which only a small fraction of the parameters/weights is updated for the task/use-case, while the rest of the parameters/weights stay frozen.

**LoRA -> Low Rank Adaptation & QLoRA -> Quantized Low Rank Adaptation** are two techniques to perform Parameter Efficient Fine Tuning.
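To see why LoRA counts as parameter efficient, consider a single weight matrix `W` of shape `d_out x d_in`. LoRA keeps `W` frozen and trains two small matrices `B` (`d_out x r`) and `A` (`r x d_in`), so the effective weight becomes `W + (alpha / r) * B @ A`. The sketch below counts the trainable parameters for one such matrix. The hidden size of 4096 is an assumption about the Llama 2 7B projection layers, and `r = 64`, `alpha = 16` are the values used in the configuration cell below.

```python
from typing import Tuple


def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Return (parameters in the full matrix, trainable parameters added by LoRA)."""
    full = d_out * d_in             # frozen weight matrix W
    lora = rank * (d_out + d_in)    # B is d_out x rank, A is rank x d_in
    return full, lora


hidden_size = 4096  # assumption: projection size in Llama 2 7B
rank = 64           # lora_r
alpha = 16          # lora_alpha

full, lora = lora_param_counts(hidden_size, hidden_size, rank)
fraction = lora / full
scaling = alpha / rank  # factor applied to the low-rank update B @ A

print(f"full matrix parameters: {full:,}")
print(f"LoRA parameters:        {lora:,}")
print(f"trainable fraction:     {fraction:.2%}")
print(f"update scaling:         {scaling}")
```

For this one matrix, LoRA trains roughly 3% as many parameters as the matrix holds. The fraction of the whole model is smaller still, because adapters are usually attached to only some of its layers.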

In [None]:
#Loading the model -> 

model_name = "meta-llama/Llama-2-7b-chat-hf"  #Llama 2 from HuggingFace

dataset_name = "mlabonne/guanaco-llama2-1k"   #The new reformatted dataset

finetuned_model_name = "Llama-2-7b-chat-finetuned"



#QLoRA Parameters -> 


lora_r = 64                   #Rank

lora_alpha = 16               #Scaling parameter -> the LoRA update is scaled by alpha / r = 16 / 64 = 0.25

lora_dropout = 0.01



#BitsAndBytes Parameters -> 


use_4bit = True               #Activates the 4 bit precision model loading

bnb_4bit_compute_dtype = "float16"

bnb_4bit_quant_type = "nf4"   #fp4 / nf4 -> 4-bit data type for the weights; nf4 (NormalFloat4) is designed for normally distributed weights

use_nested_quant = False    #Nested (double) quantization, i.e. also quantizing the quantization constants -> disabled



#Output Directory -> 
