# Instruction Fine-tuning

Before looking at fine-tuning, let us review the drawbacks of in-context learning.

## Drawbacks of In-context learning

- Smaller models with fewer parameters often fail to benefit from in-context learning: they have less capacity and were trained on less data, so they struggle to identify the pattern in the examples, even when several are provided.

- The examples take up space in the context window, leaving less room for the actual instruction and any other useful information.
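The second drawback can be made concrete with a small sketch. The review texts, the `CONTEXT_WINDOW` budget, and the word-based token count below are all assumptions chosen for illustration; a real tokenizer would usually report more tokens than this crude count.

```python
def build_few_shot_prompt(examples, query):
    """Concatenate labeled examples and the final query into one prompt."""
    shots = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    return "\n\n".join(shots + [f"Review: {query}\nSentiment:"])


def rough_token_count(text):
    # Crude proxy: whitespace-separated words, not a real tokenizer.
    return len(text.split())


# Hypothetical examples and budget, not taken from any real model.
EXAMPLES = [
    ("I loved this movie", "positive"),
    ("The plot was dull and slow", "negative"),
    ("Great acting and a clever script", "positive"),
]
CONTEXT_WINDOW = 64  # assumed token budget

query = "The ending was disappointing"
zero_shot = build_few_shot_prompt([], query)
few_shot = build_few_shot_prompt(EXAMPLES, query)

used_zero = rough_token_count(zero_shot)
used_few = rough_token_count(few_shot)
remaining = CONTEXT_WINDOW - used_few
print(f"zero-shot: {used_zero}, few-shot: {used_few}, left in window: {remaining}")
```

Every extra example increases `used_few` and shrinks `remaining`, which is the space left for the instruction itself.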

## What is Instruction fine-tuning?

To overcome these drawbacks, we need to "Fine-tune" the model. 

- Recall that pre-training an LLM uses gigabytes, terabytes, or even petabytes of unlabeled text.

- In contrast, fine-tuning uses a much smaller labeled dataset, and the model is updated with supervised learning.

- The labeled dataset contains many task-specific examples (prompt and completion pairs) covering one or more tasks.

- Training starts from the pre-trained weights, not from scratch. The model is trained further on the labeled dataset, and all of its weights are updated. This process is called "Full Fine-tuning".

- This process improves the model's performance on the target tasks and produces a new version of the LLM.

- Prompt template libraries can be used to convert existing datasets into instruction prompts for tasks such as classification, summarization, and translation.

> Like pre-training, full fine-tuning needs enough memory and compute to store and update all model weights, gradients, and optimizer states.
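As a rough illustration of how prompt and completion pairs can be produced from an existing dataset, the sketch below applies a template to a raw record. The `TEMPLATES` dictionary, the `to_instruction_pair` helper, and the sample record are hypothetical and do not come from any particular library.

```python
# Hypothetical templates; a real template library would offer many per dataset.
TEMPLATES = {
    "classification": "Classify the sentiment of this review:\n{text}\nSentiment:",
    "summarization": "Summarize the following review in one sentence:\n{text}\nSummary:",
}


def to_instruction_pair(record, task):
    """Turn one raw record into a prompt/completion pair for the given task."""
    prompt = TEMPLATES[task].format(text=record["text"])
    completion = record["label"] if task == "classification" else record["summary"]
    return {"prompt": prompt, "completion": completion}


# Made-up record used only to show the shape of the data.
record = {
    "text": "The battery lasts two days and the screen is sharp.",
    "label": "positive",
    "summary": "Long battery life and a sharp screen.",
}

pairs = [to_instruction_pair(record, task) for task in TEMPLATES]
for pair in pairs:
    print(pair["prompt"], pair["completion"])
    print("---")
```

The same raw record yields one training example per task, which is how a single dataset can feed fine-tuning for several tasks.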

## Steps involved in Fine-tuning

**Step 1:** Prepare and clean the dataset.

**Step 2:** Split the dataset into training and test sets.

**Step 3:** Fine-tune the LLM on the training set.

**Step 4:** Compare the model's completions with the expected completions using the "Cross Entropy" loss function. Standard backpropagation uses this loss to update the weights, and the updates are repeated over many batches and multiple epochs.

**Step 5:** Once training is complete, evaluate the model on the test set.
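The steps above can be sketched in plain Python. This is a toy under stated assumptions, not a real training loop: the dataset is made up, the "model" is just a list of four logits over a four-token vocabulary, and those logits are updated directly instead of backpropagating through network weights. It only shows a train/test split, the cross-entropy loss for one target token, and how repeated gradient steps lower that loss.

```python
import math
import random


def train_test_split(pairs, test_fraction=0.2, seed=0):
    """Shuffle the examples and hold out a fraction for testing."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct token."""
    return -math.log(softmax(logits)[target_index])


def sgd_step(logits, target_index, lr=0.5):
    """One gradient step; d(loss)/d(logit_i) = softmax_i - 1[i == target]."""
    probs = softmax(logits)
    grads = [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]
    return [x - lr * g for x, g in zip(logits, grads)]


# Ten hypothetical prompt/completion pairs.
dataset = [(f"prompt {i}", f"completion {i}") for i in range(10)]
train, test = train_test_split(dataset)

# Toy stand-in for model output: token 2 is the correct next token.
logits = [0.1, 0.3, 0.2, 0.4]
losses = []
for epoch in range(5):
    losses.append(cross_entropy(logits, target_index=2))
    logits = sgd_step(logits, target_index=2)

print(len(train), len(test))
print([round(loss, 3) for loss in losses])
```

In a real fine-tuning run the gradient flows back through every layer of the model, and the loss is averaged over all tokens of all completions in a batch.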

## Single Task Fine-tuning

When fine-tuning an LLM for a single task, 500-1000 examples are often enough to get good results. However, fine-tuning on a single task can lead to a problem called "Catastrophic forgetting".

## What is Catastrophic forgetting? 

- This phenomenon can occur after fine-tuning an LLM on a single task.

- Because full fine-tuning changes the original weights, the model can forget capabilities it had before, such as generation or summarization, and perform well only on the task it was fine-tuned on.

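- The example below illustrates this: after fine-tuning, the model appends a label (as it would for a task such as sentiment classification) instead of simply answering the question.

    | Status | Prompt | Completion |
    | :---: | :---: | :---: |
    | Before fine-tuning | What is the cat's name? Charlie the cat is on the bed. | *Charlie* |
    | After fine-tuning | What is the cat's name? Charlie the cat is on the bed. | *Charlie is positive* |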

### Solution to avoid this problem

- Fine-tune the model on multiple tasks instead of a single task.

- Use Parameter Efficient Fine-tuning (PEFT), which keeps most of the original weights frozen and trains only a small number of task-specific parameters.
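To give a feel for why PEFT is cheaper than full fine-tuning, the arithmetic below compares trainable parameter counts for one weight matrix. It assumes a low-rank adapter in the style of LoRA, which is one PEFT technique and is not named in these notes; the layer size and rank are hypothetical.

```python
def full_finetune_params(d, k):
    """Trainable parameters when a d x k weight matrix is updated directly."""
    return d * k


def low_rank_adapter_params(d, k, r):
    """Trainable parameters for a rank-r update B (d x r) times A (r x k)."""
    return r * (d + k)


# Hypothetical layer size and rank, chosen only for illustration.
d, k, r = 512, 512, 8
full = full_finetune_params(d, k)
adapter = low_rank_adapter_params(d, k, r)
print(f"full: {full}, adapter: {adapter}, ratio: {adapter / full:.4f}")
```

The original `d x k` matrix stays frozen in this scheme, so the capabilities stored in it are not overwritten; only the small adapter matrices are trained.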

## Multi Task Fine-tuning

- The training dataset contains examples for several tasks, with many examples per task.

- Training on this mix improves performance on all of the tasks at once, which helps avoid catastrophic forgetting.
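A minimal sketch of how such a mixed dataset might be assembled is shown below. The task names, the examples, and the `build_multitask_dataset` helper are hypothetical; a real dataset would hold many more examples per task.

```python
import random

# Hypothetical per-task examples as (prompt, completion) tuples.
TASK_EXAMPLES = {
    "summarization": [
        ("Summarize: long text A", "short A"),
        ("Summarize: long text B", "short B"),
    ],
    "translation": [
        ("Translate to French: cat", "chat"),
        ("Translate to French: dog", "chien"),
    ],
    "classification": [
        ("Sentiment: I loved it", "positive"),
        ("Sentiment: I hated it", "negative"),
    ],
}


def build_multitask_dataset(task_examples, seed=0):
    """Flatten all tasks into one list and shuffle it so tasks are interleaved."""
    mixed = [
        {"task": task, "prompt": prompt, "completion": completion}
        for task, examples in task_examples.items()
        for prompt, completion in examples
    ]
    random.Random(seed).shuffle(mixed)
    return mixed


mixed_dataset = build_multitask_dataset(TASK_EXAMPLES)
tasks_seen = {example["task"] for example in mixed_dataset}
print(len(mixed_dataset), sorted(tasks_seen))
```

Shuffling matters here: if the model saw all examples of one task before the next, later tasks could overwrite what it learned from earlier ones.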

#### Drawbacks: 

- Requires a lot of data, around 50,000 examples.

## FLAN-T5 LLM Model

- FLAN stands for "Fine-tuned Language Net". It refers to a specific set of instructions used for instruction fine-tuning.

- FLAN-T5 is the instruction fine-tuned version of T5 (Text-to-Text Transfer Transformer), trained on more than 1,000 tasks.

- It handles a wide range of NLP tasks by casting each of them into a text-to-text format.

- Compared with the original T5 model, FLAN-T5 performs better on tasks such as question answering, translation, and summarization.