# Fine-tuning Mistral-7B for Instruction Generation

## Overview

This Jupyter notebook demonstrates fine-tuning the Mistral-7B language model for instruction generation using Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA). The goal is to adapt the model to generate the instruction that could have produced a given response, essentially reversing the typical instruction-following behavior of large language models.

## Purpose

- Showcase the fine-tuning process for large language models
- Demonstrate the use of LoRA for efficient adaptation of pre-trained models
- Provide a practical example of preparing data, configuring models, and training for a specific NLP task

## Key Components

1. Data preparation using the mosaicml/instruct-v3 dataset
2. Model loading and configuration with 4-bit quantization
3. LoRA setup for parameter-efficient fine-tuning
4. Training process using the SFTTrainer from the TRL library

## How to Use This Notebook

1. **Environment Setup**: Ensure you have a GPU-enabled environment with Python and Jupyter installed.
2. **Dependencies**: Run the first cell to install required libraries.
3. **Data Preparation**: Follow the cells that load and preprocess the dataset.
4. **Model Configuration**: Execute cells that load and configure the Mistral-7B model.
5. **Training**: Run the training cell to fine-tune the model.
6. **Evaluation**: Use the provided functions to test the model's performance after training.

## Notes

- This notebook uses a subset of the full dataset for quicker experimentation. Adjust dataset size as needed.
- The training process is resource-intensive. Ensure you have adequate GPU memory available.
- Experiment with different LoRA configurations and training parameters to optimize results.

By following this notebook, you'll gain hands-on experience fine-tuning a large language model for a specific task using 4-bit quantization and LoRA.

## Installing Required Libraries

**What it's doing:**
Installing necessary Python libraries for the project.

**Why:**
These libraries are essential for working with transformers, fine-tuning models, handling datasets, and optimizing performance. Installing them ensures we have all the tools needed for our task.


In [1]:
! pip install transformers trl accelerate torch bitsandbytes peft datasets -qU

## Loading the Dataset

**What it's doing:**
Loading the "mosaicml/instruct-v3" dataset.

**Why:**
This dataset contains instruction-response pairs, which are crucial for our task of fine-tuning a model to generate instructions. It provides the training data we need.


In [2]:
from datasets import load_dataset

instruct_tune_dataset = load_dataset("mosaicml/instruct-v3")

## Examining the Dataset

**What it's doing:**
Displaying the structure of the loaded dataset.

**Why:**
This helps us understand the composition of our dataset, including the number of examples and the available features. It's an important step for data exploration and verification.


In [3]:
instruct_tune_dataset

DatasetDict({
    train: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 56167
    })
    test: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 6807
    })
})

## Filtering the Dataset

**What it's doing:**
Filtering the dataset to only include examples from the "dolly_hhrlhf" source.

**Why:**
Restricting training to a single source keeps the prompt format and writing style consistent across examples, which can make fine-tuning results more consistent. This step is a form of data curation.


In [4]:
instruct_tune_dataset = instruct_tune_dataset.filter(lambda x: x["source"] == "dolly_hhrlhf")
instruct_tune_dataset

DatasetDict({
    train: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 34333
    })
    test: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 4771
    })
})
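To see what the filter keeps and drops, one could count rows per source before filtering. The sketch below uses plain dictionaries as stand-ins for dataset rows; the toy rows and the `count_sources` helper are illustrative and not part of the notebook. With the real dataset, the same counting could be applied to the `source` column of the unfiltered `train` split.

```python
from collections import Counter

def count_sources(rows):
    """Count how many rows come from each value of the 'source' field."""
    return Counter(row["source"] for row in rows)

# Toy rows standing in for dataset records (illustrative values only).
toy_rows = [
    {"prompt": "p1", "response": "r1", "source": "dolly_hhrlhf"},
    {"prompt": "p2", "response": "r2", "source": "other_source"},
    {"prompt": "p3", "response": "r3", "source": "dolly_hhrlhf"},
]

counts = count_sources(toy_rows)
# Same predicate as the notebook's filter call.
kept = [row for row in toy_rows if row["source"] == "dolly_hhrlhf"]
print(counts)
print(f"kept {len(kept)} of {len(toy_rows)} rows")
```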

## Reducing Dataset Size

**What it's doing:**
Limiting the dataset to 5,000 training examples and 200 test examples.

**Why:**
This reduction in dataset size allows for faster experimentation and requires fewer computational resources. It's a common practice when initially developing and testing a model fine-tuning pipeline.


In [5]:
instruct_tune_dataset["train"] = instruct_tune_dataset["train"].select(range(5_000))
instruct_tune_dataset["test"] = instruct_tune_dataset["test"].select(range(200))
instruct_tune_dataset

DatasetDict({
    train: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 5000
    })
    test: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 200
    })
})
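Note that `select(range(5_000))` takes the first 5,000 rows in stored order. If that order is not random, a seeded random subset may be more representative. The sketch below picks reproducible random indices; `sample_indices` and the seed value are assumptions for illustration, not part of the notebook.

```python
import random

def sample_indices(num_rows, k, seed=42):
    """Pick k distinct row indices uniformly at random, reproducibly."""
    if k > num_rows:
        raise ValueError("k cannot exceed num_rows")
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_rows), k))

# Row counts taken from the filtered dataset shown above.
train_idx = sample_indices(34_333, 5_000)
test_idx = sample_indices(4_771, 200)
print(len(train_idx), len(test_idx))
```

The resulting lists could then be passed to `select` in place of `range(...)`, for example `instruct_tune_dataset["train"].select(train_idx)`.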

## Defining the Prompt Template

**What it's doing:**
Creating a template for formatting our training data.

**Why:**
This template structures our input data consistently, telling the model how to interpret the input and what kind of output we expect. It's crucial for instruction-tuning tasks.


In [6]:
prompt_template = """<s>### Instruction:
Use the provided input to create an instruction that could have been used to generate the response with an LLM.

### Input:
{input}

### Response:
{response}</s>"""
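The template above is the training format: it contains the target text and the closing `</s>`. At inference time the prompt should stop right after `### Response:` so the model has something to complete. A minimal sketch follows, assuming a hypothetical `create_inference_prompt` helper that is not in the notebook. The leading `<s>` is left out because the tokenizer call used later in this notebook adds the beginning-of-sequence token itself.

```python
# Inference variant of the training template: no target text, no </s>,
# and no literal <s> (the tokenizer adds the BOS token when encoding).
inference_template = """### Instruction:
Use the provided input to create an instruction that could have been used to generate the response with an LLM.

### Input:
{input}

### Response:"""

def create_inference_prompt(text):
    """Format a piece of text so the model is asked to produce its instruction."""
    return inference_template.format(input=text)

print(create_inference_prompt("There are more than 12,000 species of grass."))
```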

## Creating the Prompt Function

**What it's doing:**
Defining a function to format each sample from our dataset according to the prompt template.

**Why:**
This function prepares our data for training, ensuring each example is formatted consistently and correctly for our specific task of instruction generation.


In [7]:
def create_prompt(sample):
    input_text = sample["response"]  # The 'response' from the dataset becomes the 'input' for our new task
    response_text = sample["prompt"].replace("Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction\n", "").strip()
    
    full_prompt = prompt_template.format(input=input_text, response=response_text)
    
    return full_prompt

## Testing the Prompt Function

**What it's doing:**
Applying the prompt function to a sample from the dataset.

**Why:**
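This test checks that our prompt function works as intended before we use it in training. Note that the output below still ends with a leftover `### Response` header from the original dataset prompt, because `create_prompt` strips only the leading boilerplate.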


In [8]:
create_prompt(instruct_tune_dataset["train"][0])

'<s>### Instruction:\nUse the provided input to create an instruction that could have been used to generate the response with an LLM.\n\n### Input:\nThere are more than 12,000 species of grass. The most common is Kentucky Bluegrass, because it grows quickly, easily, and is soft to the touch. Rygrass is shiny and bright green colored. Fescues are dark green and shiny. Bermuda grass is harder but can grow in drier soil.\n\n### Response:\nWhat are different types of grass?\n\n### Response</s>'
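The output above ends with a stray `### Response` line carried over from the dataset's own prompt format. One way to remove it is sketched below, assuming the raw prompt has the shape implied by that output (boilerplate, instruction text, then a trailing `### Response` header). The `extract_instruction` helper and the sample string are reconstructions for illustration, not code from the notebook.

```python
PREFIX = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n### Instruction\n"
)
SUFFIX = "### Response"

def extract_instruction(raw_prompt):
    """Strip the leading boilerplate and the trailing '### Response' header."""
    text = raw_prompt.replace(PREFIX, "").strip()
    if text.endswith(SUFFIX):
        text = text[: -len(SUFFIX)].strip()
    return text

# Sample reconstructed from the output shown above (assumed raw format).
sample_prompt = PREFIX + "What are different types of grass?\n\n### Response\n"
print(extract_instruction(sample_prompt))
```

Using such a helper inside `create_prompt` would change the training targets, so the recorded outputs in this notebook would no longer match exactly.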

## Loading the Pre-trained Model and Tokenizer

**What it's doing:**
Loading the Mistral-7B model (the `Mistral-7B-Instruct-v0.1` checkpoint) and its tokenizer, with 4-bit NF4 quantization.

**Why:**
This step prepares our base model for fine-tuning. The 4-bit quantization allows us to work with this large model on more modest hardware by reducing its memory footprint.

In [9]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

nf4_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_quant_type="nf4",
   bnb_4bit_use_double_quant=True,
   bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    device_map='auto',
    quantization_config=nf4_config,
    use_cache=False
)

print(model)

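# Note: the tokenizer is loaded from the base checkpoint, while the model above is the Instruct variant;
# loading it from the same checkpoint as the model would keep the two consistent.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# No pad token is set here by default, so reuse EOS for padding and pad on the right for training
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"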


MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralSdpaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
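The layer shapes printed above make it possible to estimate how much memory quantization saves. The sketch below is a rough calculation, assuming weights take exactly 4 or 16 bits each. It ignores quantization constants, embeddings, normalization layers, activations, and optimizer state, so real usage will be higher.

```python
# (in_features, out_features) for each Linear4bit layer in one decoder block,
# copied from the printed model structure.
LAYER_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32  # "(0-31): 32 x MistralDecoderLayer"

weights_per_block = sum(d_in * d_out for d_in, d_out in LAYER_SHAPES.values())
quantized_weights = NUM_LAYERS * weights_per_block

def gigabytes(num_weights, bits_per_weight):
    """Storage in GB (10^9 bytes) for a given number of weights."""
    return num_weights * bits_per_weight / 8 / 1e9

print(f"weights in quantized layers: {quantized_weights:,}")
print(f"16-bit storage: {gigabytes(quantized_weights, 16):.2f} GB")
print(f" 4-bit storage: {gigabytes(quantized_weights, 4):.2f} GB")
```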

## Defining the Generation Function

**What it's doing:**
Creating a function to generate responses using our model.

**Why:**
This function allows us to test our model's outputs at various stages of fine-tuning, helping us assess its performance and progress.



In [10]:
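def generate_response(prompt, model):
    # Tokenize the prompt (the tokenizer prepends the <s> BOS token) and move it to the GPU
    encoded_input = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
    model_inputs = encoded_input.to('cuda')

    # Sample up to 1000 new tokens; EOS doubles as the padding token
    generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)

    # Decode the full sequence (prompt + continuation), special tokens included
    decoded_output = tokenizer.batch_decode(generated_ids)

    # Remove the prompt text; this only works when the decoded text contains the prompt verbatim
    return decoded_output[0].replace(prompt, "")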
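Removing the prompt with a string `replace` depends on the decoded text matching the prompt character for character, which can fail once special tokens and spacing are involved. A possible alternative, sketched below, slices off the prompt tokens before decoding. The `generate_continuation` name, its `tokenizer` parameter, and the smaller `max_new_tokens` default are assumptions for illustration, not the notebook's code.

```python
def generate_continuation(prompt, model, tokenizer, max_new_tokens=256):
    """Generate text and return only the newly generated part."""
    # Encode the prompt and place it on the same device as the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]

    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

    # Keep only the tokens produced after the prompt, then drop <s>/</s> on decode.
    new_tokens = output_ids[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Called as `generate_continuation(prompt, model, tokenizer)`, this would return the continuation without the echoed prompt or special tokens.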

## Testing the Generation Function

**What it's doing:**
Generating a response with our base model before fine-tuning.

**Why:**
This provides a baseline to compare against after fine-tuning, helping us understand how much the model's performance improves.

In [11]:
generate_response("### Instruction:\nUse the provided input to create an instruction that could have been used to generate the response with an LLM.\n\n### Input:\nI think it depends a little on the individual, but there are a number of steps you’ll need to take.  First, you’ll need to get a college education.  This might include a four-year undergraduate degree and a four-year doctorate program.  You’ll also need to complete a residency program.  Once you have your education, you’ll need to be licensed.  And finally, you’ll need to establish a practice.\n\n### Response:", model)

'<s> \nTo become a trained medical professional, you need to follow several steps. First, you should obtain a college education by earning a four-year undergraduate degree and a four-year doctorate program. After that, you must complete a residency program. Once you have completed your education and residency, you need to get licensed. Finally, establish a practice in your field.</s>'

## Configuring LoRA for Fine-tuning

**What it's doing:**
Setting up the Low-Rank Adaptation (LoRA) configuration for fine-tuning.

**Why:**
LoRA allows us to fine-tune the model efficiently by training a small number of added low-rank matrices while the base weights stay frozen. This configuration sets the rank (`r=64`), the scaling factor (`lora_alpha=16`), and the dropout applied to the LoRA layers.

In [13]:
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model, prepare_model_for_kbit_training

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM"
)
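For reference, the standard LoRA formulation shows how `r` and `lora_alpha` interact. The symbols $W_0$, $A$, and $B$ are notation introduced here, not names from the notebook, and the scaling shown assumes the default (non-rsLoRA) setting.

```latex
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad
A \in \mathbb{R}^{r \times d_{\text{in}}}

% With lora_alpha = 16 and r = 64:
\frac{\alpha}{r} = \frac{16}{64} = 0.25

% Trainable parameters added per adapted layer:
r \,(d_{\text{in}} + d_{\text{out}})
```

Only $A$ and $B$ are trained, while $W_0$ stays frozen. With these values the low-rank update is scaled by 0.25 before it is added to the frozen weights.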



## Preparing the Model for LoRA Fine-tuning

**What it's doing:**
Applying the LoRA configuration to our model.

**Why:**
This step prepares our model for efficient fine-tuning, setting up the additional LoRA parameters while keeping most of the original model frozen.


In [14]:
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)

print(peft_config)  # Print your LoRA configuration to confirm it's set up correctly

for name, module in model.named_modules():
    print(f"Module: {name}")

for name, module in model.named_modules():
    if any(lora_term in name.lower() for lora_term in ['lora', 'adapter', 'peft']):
        print(f"Potential LoRA adapter found in: {name}")

for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"Trainable parameter: {name}")

for name, module in model.named_modules():
    if 'lora' in name.lower():
        print(f"LoRA adapter found in: {name}")

LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path='mistralai/Mistral-7B-Instruct-v0.1', revision=None, task_type='CAUSAL_LM', inference_mode=False, r=64, target_modules={'q_proj', 'v_proj'}, lora_alpha=16, lora_dropout=0.1, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}, use_dora=False, layer_replication=None)
Module: 
Module: base_model
Module: base_model.model
Module: base_model.model.model
Module: base_model.model.model.embed_tokens
Module: base_model.model.model.layers
Module: base_model.model.model.layers.0
Module: base_model.model.model.layers.0.self_attn
Module: base_model.model.model.layers.0.self_attn.q_proj
Module: base_model.model.model.layers.0.self_attn.q_proj.base_layer
Module: base_model.model.model.layers.0.self_attn.q_proj.lora_dro

## Setting Up Training Arguments

**What it's doing:**
Configuring the training process parameters.

**Why:**
These arguments define crucial aspects of our training process, such as learning rate, batch size, and evaluation frequency. They significantly impact the efficiency and effectiveness of fine-tuning.


In [15]:
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mistral_instruct_generation",
    # num_train_epochs=5,
    max_steps=100,  # comment out this line if you want to train in epochs
    per_device_train_batch_size=4,
    warmup_steps=0,
    logging_steps=10,
    save_strategy="epoch",
    # evaluation_strategy="epoch",
    evaluation_strategy="steps",  # renamed to eval_strategy in newer transformers releases
    eval_steps=20,  # comment out this line if you want to evaluate at the end of each epoch
    learning_rate=2e-4,
    bf16=True,
    lr_scheduler_type='constant',
)
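The arithmetic behind these settings can be made explicit. The sketch below works out how many sequences the run processes and at which steps logging and evaluation happen. It assumes a single GPU and no gradient accumulation, neither of which is set explicitly in the notebook.

```python
# Values copied from the TrainingArguments above.
max_steps = 100
per_device_train_batch_size = 4
eval_steps = 20
logging_steps = 10

# Assumptions (not set in the notebook): one GPU, no gradient accumulation.
num_devices = 1
gradient_accumulation_steps = 1

sequences_per_step = per_device_train_batch_size * num_devices * gradient_accumulation_steps
total_sequences = max_steps * sequences_per_step
eval_points = list(range(eval_steps, max_steps + 1, eval_steps))
log_points = list(range(logging_steps, max_steps + 1, logging_steps))

print(f"sequences seen during training: {total_sequences}")
print(f"evaluation at steps: {eval_points}")
print(f"loss logged at steps: {log_points}")
```

Under these assumptions the run sees 400 training sequences in total, with an evaluation every 20 steps.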



## Setting Up the Trainer

**What it's doing:**
Initializing the SFTTrainer with our model, datasets, and training configuration.

**Why:**
The trainer handles the fine-tuning process, managing the training loop, evaluation, and logging. This setup brings together all the components we've prepared for fine-tuning.

In [16]:
from trl import SFTTrainer

max_seq_length = 2048

trainer = SFTTrainer(
  model=model,
  peft_config=peft_config,
  max_seq_length=max_seq_length,
  tokenizer=tokenizer,
  packing=True,
  formatting_func=create_prompt,
  args=args,
  train_dataset=instruct_tune_dataset["train"],
  eval_dataset=instruct_tune_dataset["test"]
)

trainable_params = 0
all_param = 0
for _, param in model.named_parameters():
    all_param += param.numel()
    if param.requires_grad:
        trainable_params += param.numel()
print(f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}")




Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
max_steps is given, it will override any value given in num_train_epochs


trainable params: 27262976 || all params: 3779334144 || trainable%: 0.7213698223345028
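Both reported numbers can be reproduced by hand from the layer shapes and the LoRA configuration printed earlier. The sketch below assumes that 4-bit weights are stored two per byte (so each quantized layer reports half its logical element count), that the output head is left unquantized, and that there is one final normalization layer. The last two do not appear in the truncated model printout and are assumptions here.

```python
R = 64           # LoRA rank from peft_config
NUM_LAYERS = 32  # decoder layers in the printed model
HIDDEN = 4096
VOCAB = 32000

# LoRA adds r * (d_in + d_out) parameters to each targeted layer (q_proj, v_proj).
TARGETS = {"q_proj": (4096, 4096), "v_proj": (4096, 1024)}
lora_params = NUM_LAYERS * sum(R * (d_in + d_out) for d_in, d_out in TARGETS.values())

# Logical weights in the seven Linear4bit layers of one decoder block.
block_weights = 2 * 4096 * 4096 + 2 * 4096 * 1024 + 3 * 4096 * 14336
packed_linear = NUM_LAYERS * block_weights // 2  # two 4-bit values per stored byte

embeddings = VOCAB * HIDDEN             # embed_tokens, shown in the printout
lm_head = VOCAB * HIDDEN                # assumed unquantized output head
norms = (2 * NUM_LAYERS + 1) * HIDDEN   # two per block plus an assumed final norm

all_params = packed_linear + embeddings + lm_head + norms + lora_params
print(f"trainable params: {lora_params} || all params: {all_params} "
      f"|| trainable%: {100 * lora_params / all_params:.4f}")
```

This explains why "all params" is well below 7 billion: the count reflects packed storage elements, not logical weights, so the trainable percentage relative to the full model is smaller than the 0.72% shown.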




## Training the Model

**What it's doing:**
Running the fine-tuning process and testing the result.

**Why:**
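This is the main training step, where the model learns from the prepared dataset. After training, we run the model on a formatted sample as a quick smoke test and report GPU memory usage to gauge the computational cost of fine-tuning. Note that the output of `create_prompt` already contains the target instruction, so this check does not measure how well the model generates instructions on its own.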

In [17]:
trainer.train()

sample_input = instruct_tune_dataset["train"][0]
formatted_input = create_prompt(sample_input)
print("Sample Input:")
print(formatted_input)
print("\nModel Output:")
print(generate_response(formatted_input, model))

import torch
print(f"GPU memory allocated: {torch.cuda.memory_allocated()/1e9:.2f} GB")
print(f"GPU memory cached: {torch.cuda.memory_reserved()/1e9:.2f} GB")

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: aocvee (aocvee2). Use `wandb login --relogin` to force relogin




| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 20   | 1.4841        | 1.34557         |
| 40   | 1.4291        | 1.299801        |
| 60   | 1.4061        | 1.287846        |
| 80   | 1.3622        | 1.279572        |
| 100  | 1.3923        | 1.273444        |
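To put the logged losses in perspective, the short sketch below computes the relative drop in validation loss over the run and converts the final value to a per-token perplexity. It assumes the loss is the mean token-level cross-entropy in nats; the numbers are copied from the log above.

```python
import math

# (step, training loss, validation loss) copied from the training log.
log = [
    (20, 1.4841, 1.34557),
    (40, 1.4291, 1.299801),
    (60, 1.4061, 1.287846),
    (80, 1.3622, 1.279572),
    (100, 1.3923, 1.273444),
]

first_val, last_val = log[0][2], log[-1][2]
relative_drop = (first_val - last_val) / first_val
final_perplexity = math.exp(last_val)

# Check whether validation loss decreased at every evaluation step.
val_losses = [row[2] for row in log]
monotonic = all(later < earlier for earlier, later in zip(val_losses, val_losses[1:]))

print(f"validation loss drop: {100 * relative_drop:.1f}%")
print(f"final validation perplexity: {final_perplexity:.2f}")
print(f"validation loss decreased at every evaluation: {monotonic}")
```

Validation loss falls at every evaluation even though training loss ticks up at step 100, which suggests the model had not started overfitting within these 100 steps.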


Sample Input:
<s>### Instruction:
Use the provided input to create an instruction that could have been used to generate the response with an LLM.

### Input:
There are more than 12,000 species of grass. The most common is Kentucky Bluegrass, because it grows quickly, easily, and is soft to the touch. Rygrass is shiny and bright green colored. Fescues are dark green and shiny. Bermuda grass is harder but can grow in drier soil.

### Response:
What are different types of grass?

### Response</s>

Model Output:
<s><s> ### Instruction:
Use the provided input to create an instruction that could have been used to generate the response with an LLM.

### Input:
There are more than 12,000 species of grass. The most common is Kentucky Bluegrass, because it grows quickly, easily, and is soft to the touch. Rygrass is shiny and bright green colored. Fescues are dark green and shiny. Bermuda grass is harder but can grow in drier soil.

### Response:
What are different types of grass?

### Response</