# Finetuning Mistral 7B LLM for Function Calling

## Overview

This notebook is a modification and rearrangement of code originally discussed in [Mistral: Easiest Way to Fine-Tune on Custom Data](https://www.youtube.com/watch?v=lCZRwrRvrWg) by [Prompt Engineering](https://www.youtube.com/@engineerprompt).

The code has been modified to use the [Mistral 7B Instruct v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) model with the [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) dataset, in an attempt to fine-tune Mistral 7B for function calling.

### Install Required Packages

In [1]:
#!pip install transformers trl accelerate torch bitsandbytes peft datasets -qU

### Import modules

In [2]:
from datasets import load_dataset
import copy
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import TrainingArguments
from trl import SFTTrainer
import uuid
import pandas as pd
from functools import partial
from datetime import datetime
from os.path import isfile

#### Declare constants

In [3]:
bos_token, eos_token = "<s>","</s>"
bop_token, eop_token = "[INST]","[/INST]"
dataset_id = "lalanikarim/glaive-function-calling-v2"
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
metrics_file = "metrics.csv"
seed = 421337

In [4]:
now = datetime.now()
sample_size = 5000
test_split = 0.1
test_size = int(sample_size * test_split)
train_size = sample_size - test_size
epochs = 3
now_str = now.strftime("%Y-%m-%d-%H-%M-%S")
lora_output_dir = f"lora-{now_str}-samples-{sample_size}-epochs-{epochs}"
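
As a quick sanity check of the arithmetic in the cell above, the split sizes can be computed in isolation. This is only a minimal sketch; the helper name `split_sizes` is hypothetical and not part of the notebook.

```python
def split_sizes(sample_size: int, test_split: float) -> tuple[int, int]:
    """Return (train_size, test_size) using the same arithmetic as the cell above."""
    test_size = int(sample_size * test_split)  # int() truncates toward zero
    train_size = sample_size - test_size
    return train_size, test_size


print(split_sizes(5000, 0.1))  # (4500, 500)
print(split_sizes(100, 0.1))   # (90, 10)
```

A run with `sample_size = 100` gives the 90/10 split that appears in the recorded cell outputs further down.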

#### Load HF Dataset

We need a dataset to fine-tune the model. For this example we will use a subset of the dataset referenced by `dataset_id` (`lalanikarim/glaive-function-calling-v2`).

In [5]:
full_dataset = load_dataset(dataset_id)

#### Data structure

The dataset contains four columns: `chat`, `system`, `prompt` and `response`. We are only interested in the `system`, `prompt` and `response` columns, which are merged into a single training string below.

In [6]:
full_dataset

DatasetDict({
    train: Dataset({
        features: ['chat', 'system', 'prompt', 'response'],
        num_rows: 71678
    })
    test: Dataset({
        features: ['chat', 'system', 'prompt', 'response'],
        num_rows: 7965
    })
})

#### Prepare dataset

We will use just a small subset of the data for this training example.

#### Shuffle and (optionally) sample a subset

In [7]:
dataset = full_dataset.shuffle(seed=seed)
dataset["train"] = dataset["train"].select(range(train_size))
dataset["test"] = dataset["test"].select(range(test_size))
dataset

DatasetDict({
    train: Dataset({
        features: ['chat', 'system', 'prompt', 'response'],
        num_rows: 90
    })
    test: Dataset({
        features: ['chat', 'system', 'prompt', 'response'],
        num_rows: 10
    })
})

#### Create Formatted Prompt

In the following function we merge the `system`, `prompt` and `response` columns using the template below. The `prompt` field in this dataset already starts with a `USER:` label, so no extra role prefix is added:

```
<s>[INST]{system}
{prompt}[/INST]
{response}</s>
```

In [8]:
def create_prompt(record):
    
    system = record["system"]
    prompt = record["prompt"]
    response = record["response"]

    full_prompt = ""
    full_prompt += bos_token
    full_prompt += bop_token
    
    # system
    full_prompt += system
    
    # prompt
    full_prompt += "\n"
    full_prompt += prompt
    
    full_prompt += eop_token
    full_prompt += "\n"
    
    # response
    full_prompt += response
    full_prompt += eos_token
    full_prompt += "\n"
    
    return full_prompt
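
For readers who prefer to see the template in one place, here is a compact sketch that should build the same string with f-strings. The name `create_prompt_compact` and the toy record are made up for illustration and are not part of the notebook or the dataset.

```python
BOS, EOS = "<s>", "</s>"
B_INST, E_INST = "[INST]", "[/INST]"


def create_prompt_compact(record: dict) -> str:
    """Build the same string as create_prompt, written as one expression."""
    return (
        f"{BOS}{B_INST}{record['system']}\n"
        f"{record['prompt']}{E_INST}\n"
        f"{record['response']}{EOS}\n"
    )


# Hypothetical record, invented for illustration only.
toy_record = {
    "system": "SYSTEM: You are a helpful assistant.\n",
    "prompt": "USER: What is 2 + 2?",
    "response": "ASSISTANT: 4",
}
print(create_prompt_compact(toy_record))
```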

In [9]:
prompt = create_prompt(dataset["test"][0])
print(prompt)

<s>[INST]SYSTEM: You are a helpful assistant with access to the following functions. Use them if required -
{
    "name": "convert_currency",
    "description": "Convert an amount from one currency to another",
    "parameters": {
        "type": "object",
        "properties": {
            "amount": {
                "type": "number",
                "description": "The amount to convert"
            },
            "from_currency": {
                "type": "string",
                "description": "The currency to convert from"
            },
            "to_currency": {
                "type": "string",
                "description": "The currency to convert to"
            }
        },
        "required": [
            "amount",
            "from_currency",
            "to_currency"
        ]
    }
}

USER: Hi, I need to convert 500 USD to EUR. Can you help me with that?[/INST]
FUNCTION: {"name": "convert_currency", "arguments": '{"amount": 500, "from_currency": "USD", "to_currency

In [10]:
def extract_prompt(prompt):
    idx1 = prompt.index(bop_token)
    idx2 = prompt.index(eop_token) + len(eop_token)
    return prompt[idx1: idx2] + "\n"
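
To make the intent of `extract_prompt` concrete: the instruction block is what we feed the model at inference time, and whatever follows it is the completion we expect back. The sketch below shows that split on a made-up string; `split_prompt` is a hypothetical helper, not something the notebook defines.

```python
B_INST, E_INST = "[INST]", "[/INST]"


def extract_prompt(full_prompt: str) -> str:
    """Return the text from [INST] through [/INST], plus a trailing newline."""
    start = full_prompt.index(B_INST)
    end = full_prompt.index(E_INST) + len(E_INST)
    return full_prompt[start:end] + "\n"


def split_prompt(full_prompt: str) -> tuple[str, str]:
    """Split a formatted example into (user_prompt, expected_completion)."""
    user_prompt = extract_prompt(full_prompt)
    # Everything after the instruction block is the target completion.
    cut = full_prompt.index(user_prompt) + len(user_prompt)
    return user_prompt, full_prompt[cut:]


# Hypothetical example string, invented for illustration only.
toy = "<s>[INST]SYSTEM: Be brief.\n\nUSER: Hi[/INST]\nASSISTANT: Hello</s>\n"
user_prompt, expected = split_prompt(toy)
print(repr(user_prompt))
print(repr(expected))
```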

In [11]:
user_prompt = extract_prompt(prompt)
print(user_prompt)

[INST]SYSTEM: You are a helpful assistant with access to the following functions. Use them if required -
{
    "name": "convert_currency",
    "description": "Convert an amount from one currency to another",
    "parameters": {
        "type": "object",
        "properties": {
            "amount": {
                "type": "number",
                "description": "The amount to convert"
            },
            "from_currency": {
                "type": "string",
                "description": "The currency to convert from"
            },
            "to_currency": {
                "type": "string",
                "description": "The currency to convert to"
            }
        },
        "required": [
            "amount",
            "from_currency",
            "to_currency"
        ]
    }
}

USER: Hi, I need to convert 500 USD to EUR. Can you help me with that?[/INST]



In [12]:
sample = dataset["test"].select(range(test_size - 10, test_size))
sample

Dataset({
    features: ['chat', 'system', 'prompt', 'response'],
    num_rows: 10
})

In [13]:
complete_prompts = list(map(create_prompt, sample))
user_prompts = list(map(extract_prompt, complete_prompts))
sample_df = pd.DataFrame({"user_prompt": user_prompts, "expected": complete_prompts})
# Remove the instruction block, then drop the leading "<s>" so only the expected completion remains.
sample_df["expected"] = sample_df.apply(lambda row: row["expected"].replace(row["user_prompt"], "")[len(bos_token):], axis=1)
sample_df

| | user_prompt | expected |
|---|---|---|
| 0 | [INST]SYSTEM: You are a helpful assistant with... | FUNCTION: {"name": "convert_currency", "argume... |
| 1 | [INST]SYSTEM: You are a helpful assistant with... | ASSISTANT: Of course! I can help with that. Co... |
| 2 | [INST]SYSTEM: You are a helpful assistant with... | FUNCTION: {"name": "search_books", "arguments"... |
| 3 | [INST]SYSTEM: You are a helpful assistant with... | FUNCTION: {"name": "convert_currency", "argume... |
| 4 | [INST]SYSTEM: You are a helpful assistant with... | FUNCTION: {"name": "calculate_bmi", "arguments... |
| 5 | [INST]SYSTEM: You are a helpful assistant with... | ASSISTANT: Of course! To help you better, coul... |
| 6 | [INST]SYSTEM: You are a helpful assistant with... | ASSISTANT: Of course, I can help with that. Ho... |
| 7 | [INST]SYSTEM: You are a helpful assistant with... | ASSISTANT: Of course, I can help with that. Ho... |
| 8 | [INST]SYSTEM: You are a helpful assistant with... | ASSISTANT: I'm sorry, but I don't have the cap... |
| 9 | [INST]SYSTEM: You are a helpful assistant with... | FUNCTION: {"name": "check_flight_status", "arg... |


### Map the Dataset

In [14]:
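# Left commented out: no explicit mapping step is needed here, because
# SFTTrainer applies create_prompt to every record through formatting_func.
# instruct_tune_dataset = instruct_tune_dataset.map(create_prompt)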

### Loading the Base Model

Load the model in 4-bit precision (`nf4`), with double quantization and `bfloat16` as the compute dtype.

In this case we are using the instruct-tuned model instead of the base model. Fine-tuning a base model would need a lot more data!
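
To get a feel for why 4-bit loading matters, here is a rough back-of-the-envelope estimate of weight storage. The numbers are assumptions, not measurements: roughly 7.24 billion parameters for Mistral 7B, a quantization block size of 64 with one 32-bit constant per block, and (for double quantization) 8-bit constants with a second-level 32-bit constant per 256 blocks, as described in the QLoRA paper. It also pretends every parameter is quantized, which is not quite true in practice.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7.24e9  # assumed approximate parameter count for Mistral 7B

# Extra bits per parameter spent on quantization constants (assumed block size 64).
overhead_plain = 32 / 64                    # one fp32 constant per 64 weights
overhead_double = 8 / 64 + 32 / (64 * 256)  # 8-bit constants plus second-level fp32

print(f"16-bit weights:            {weight_memory_gb(N_PARAMS, 16):.2f} GB")
print(f"4-bit, plain constants:    {weight_memory_gb(N_PARAMS, 4 + overhead_plain):.2f} GB")
print(f"4-bit, double quantization: {weight_memory_gb(N_PARAMS, 4 + overhead_double):.2f} GB")
```

Activations, LoRA weights, gradients and optimizer state come on top of this, so treat it as a lower bound rather than a memory budget.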

In [15]:
nf4_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_quant_type="nf4",
   bnb_4bit_use_double_quant=True,
   bnb_4bit_compute_dtype=torch.bfloat16
)

In [16]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map='auto',
    quantization_config=nf4_config,
    use_cache=False
)

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [17]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Let's examine how well the model currently does at this task:

In [18]:
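def generate_response(prompt, model):
    # Tokenize the prompt; add_special_tokens=True prepends the <s> (BOS) token.
    encoded_input = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
    # Move the inputs to the device the model was loaded on.
    model_inputs = encoded_input.to(model.device)

    generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)

    decoded_output = tokenizer.batch_decode(generated_ids)

    # Strip the echoed prompt, then drop the leading "<s>" (3 characters).
    return decoded_output[0].replace(prompt, "")[3:]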

In [19]:
model_response = generate_response(user_prompt,model)
print(model_response)

 Hello! I'd be happy to help you convert 500 USD to EUR.Here's the information you'll need to provide for the conversion function:
```javascript
{
  "amount": 500,
  "from_currency": "USD",
  "to_currency": "EUR"
}
```
Now we can use the `convert_currency` function with this information:
```javascript
convert_currency({
  "amount": 500,
  "from_currency": "USD",
  "to_currency": "EUR"
});
```
This will return the converted amount in EUR.</s>


In [20]:
def print_sample(row):
    cols = ["user_prompt","expected","base","finetune"]
    for col in cols:
        if col in row:
            print("#"*10)
            print(col,"\n")
            print(row[col])
            print("\n")

In [21]:
def show_samples():
    sample_df.iloc[[0,1]].apply(print_sample,axis=1)

In [22]:
generate_from_base = partial(generate_response,model=model)
sample_df["base"] = sample_df["user_prompt"].apply(generate_from_base)
show_samples()

##########
user_prompt 

[INST]SYSTEM: You are a helpful assistant with access to the following functions. Use them if required -
{
    "name": "convert_currency",
    "description": "Convert an amount from one currency to another",
    "parameters": {
        "type": "object",
        "properties": {
            "amount": {
                "type": "number",
                "description": "The amount to convert"
            },
            "from_currency": {
                "type": "string",
                "description": "The currency to convert from"
            },
            "to_currency": {
                "type": "string",
                "description": "The currency to convert to"
            }
        },
        "required": [
            "amount",
            "from_currency",
            "to_currency"
        ]
    }
}

USER: Hi, I need to convert 500 USD to EUR. Can you help me with that?[/INST]



##########
expected 

FUNCTION: {"name": "convert_currency", "arguments": '{"amo

### Setting up the Training
We will be using the Hugging Face `transformers`, `trl` and `peft` libraries.

In [23]:

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM"
)
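
To put `r=64` and `lora_alpha=16` in perspective, the sketch below counts how many trainable parameters LoRA would add. It rests on assumptions that are not stated in this notebook: that the adapters are attached only to `q_proj` and `v_proj` (since `target_modules` is not set here), and that the model uses the published Mistral 7B shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128).

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


# Assumed Mistral 7B shapes (taken from the published config, not from this notebook).
HIDDEN = 4096    # hidden size
KV_DIM = 1024    # 8 key/value heads x 128 head dim (grouped-query attention)
N_LAYERS = 32

# Assumption: adapters on q_proj and v_proj only.
r, lora_alpha = 64, 16
per_layer = lora_params(HIDDEN, HIDDEN, r) + lora_params(HIDDEN, KV_DIM, r)
total = per_layer * N_LAYERS
scaling = lora_alpha / r  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")
print(f"update scaling (alpha / r): {scaling}")
```

Under these assumptions the adapters are well under one percent of the 7B base weights, which is what makes this kind of fine-tuning feasible on a single GPU.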

We need to prepare the model for 4-bit training, so we use the `prepare_model_for_kbit_training` function from `peft` and then attach the LoRA adapters with `get_peft_model`.

In [24]:
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)

### Hyperparameters for training
These parameters depend on how long you want to train. The most important ones to consider:

`num_train_epochs`/`max_steps`: How many passes over the data (or how many optimizer steps) to run. Be careful not to use too many, or the model will overfit.

`learning_rate`: The step size of each optimizer update. Higher values converge faster but can make training unstable.
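
To relate epochs, batch size and steps, here is a rough sketch of how many optimizer steps a run takes on a single device. The function names and the example figure of 900 sequences are hypothetical. Note that when examples are packed into fixed-length sequences, the count that matters is the number of packed sequences, not the number of dataset rows.

```python
import math


def optimizer_steps(n_sequences: int, batch_size: int, epochs: int, grad_accum: int = 1) -> int:
    """Approximate number of optimizer steps for a single-device run."""
    batches_per_epoch = math.ceil(n_sequences / batch_size)
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * epochs


def warmup_steps_from_ratio(total_steps: int, warmup_ratio: float) -> int:
    """Convert a warmup fraction into a whole number of steps."""
    return math.ceil(total_steps * warmup_ratio)


# Hypothetical run: 900 training sequences, batch size 4, 3 epochs.
total = optimizer_steps(n_sequences=900, batch_size=4, epochs=3)
print(total)                                  # 675
print(warmup_steps_from_ratio(total, 0.03))   # 21
```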


In [25]:

args = TrainingArguments(
  output_dir = lora_output_dir,
  num_train_epochs=epochs,
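  # max_steps = 100, # uncomment to train for a fixed number of steps instead of epochs
  per_device_train_batch_size = 4,
  warmup_ratio = 0.03, # a fraction belongs in warmup_ratio (warmup_steps expects an integer); note the plain 'constant' scheduler below does not apply warmup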
  logging_steps=10,
  save_strategy="epoch",
  #evaluation_strategy="epoch",
  evaluation_strategy="steps",
  eval_steps=20, # comment out this line if you want to evaluate at the end of each epoch
  learning_rate=2e-4,
  bf16=True,
  lr_scheduler_type='constant',
)

Setting up the trainer.

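`max_seq_length`: Maximum length, in tokens, of each training sequence. With `packing=True`, several examples are concatenated to fill sequences of this length.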


In [27]:

max_seq_length = 2048

trainer = SFTTrainer(
  model=model,
  peft_config=peft_config,
  max_seq_length=max_seq_length,
  tokenizer=tokenizer,
  packing=True,
  formatting_func=create_prompt, # this will apply create_prompt to every record in the training and test datasets
  args=args,
  train_dataset=dataset["train"],
  eval_dataset=dataset["test"]
)

Generating train split: 0 examples [00:00, ? examples/s]

Generating train split: 0 examples [00:00, ? examples/s]
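
The `packing=True` option used above is easier to picture with a toy example. The sketch below is a simplified illustration of the idea (concatenate tokenized examples separated by an end-of-sequence token, then cut the stream into fixed-length chunks). It is not TRL's actual implementation, and `pack_sequences` is a hypothetical name.

```python
from typing import Iterable


def pack_sequences(tokenized: Iterable[list[int]], seq_length: int, eos_id: int) -> list[list[int]]:
    """Concatenate examples (each followed by EOS) and cut into fixed-length chunks."""
    stream: list[int] = []
    for ids in tokenized:
        stream.extend(ids)
        stream.append(eos_id)
    n_full = len(stream) // seq_length
    # Drop the trailing partial chunk so every sequence has exactly seq_length tokens.
    return [stream[i * seq_length:(i + 1) * seq_length] for i in range(n_full)]


# Toy token ids, invented for illustration; 0 stands in for the EOS token.
toy_examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
packed = pack_sequences(toy_examples, seq_length=4, eos_id=0)
print(packed)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

This is why the number of training steps tracks the total token count rather than the number of rows: short examples share a sequence, and long ones can be split across two.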

In [28]:
output = trainer.train()



Step,Training Loss,Validation Loss




In [29]:
trainer.save_model(lora_output_dir)

In [30]:
output.metrics

{'train_runtime': 35.2418,
 'train_samples_per_second': 0.965,
 'train_steps_per_second': 0.284,
 'total_flos': 2982167156097024.0,
 'train_loss': 0.7704600334167481,
 'epoch': 2.0}

In [31]:
output

TrainOutput(global_step=10, training_loss=0.7704600334167481, metrics={'train_runtime': 35.2418, 'train_samples_per_second': 0.965, 'train_steps_per_second': 0.284, 'total_flos': 2982167156097024.0, 'train_loss': 0.7704600334167481, 'epoch': 2.0})

In [32]:
metrics_data = {
    "run_name":lora_output_dir,
    "samples":sample_size,
    "train_runtime":output.metrics["train_runtime"],
    "train_loss":output.metrics["train_loss"],
    "global_step":output.global_step,
    "epochs":epochs,
}
new_metric = pd.DataFrame.from_records([metrics_data])

if isfile(metrics_file):
    metrics_df = pd.read_csv(metrics_file)
    metrics_df = pd.concat([metrics_df,new_metric],ignore_index=True)
else:
    metrics_df = new_metric

metrics_df

| | run_name | samples | train_runtime | train_loss | global_step | epochs |
|---|---|---|---|---|---|---|
| 0 | lora-2024-02-11-17-39-18-samples-100 | 100 | 18.6664 | 0.837787 | 5 | 1 |
| 1 | lora-2024-02-11-17-48-53-samples-1000 | 1000 | 193.4668 | 0.415997 | 45 | 1 |
| 2 | lora-2024-02-11-17-59-25-samples-5000 | 5000 | 1238.9249 | 0.214476 | 227 | 1 |
| 3 | lora-2024-02-11-18-28-44-samples-10000 | 10000 | 3153.9444 | 0.177438 | 455 | 1 |
| 4 | lora-2024-02-11-19-41-50-samples-5000-epochs-2 | 5000 | 2465.3345 | 0.17249 | 454 | 2 |
| 5 | lora-2024-02-11-20-39-28-samples-1000-epochs-5 | 1000 | 972.555 | 0.195687 | 225 | 5 |
| 6 | lora-2024-02-11-22-04-12-samples-1000-epochs-10 | 1000 | 1960.6943 | 0.131118 | 460 | 10 |
| 7 | lora-2024-02-12-08-53-55-samples-10000-epochs-2 | 10000 | 6329.1559 | 0.146952 | 910 | 2 |
| 8 | lora-2024-02-13-11-22-43-samples-100-epochs-2 | 100 | 35.2418 | 0.77046 | 10 | 2 |


### Save Model and Push to Hub

In [33]:
# !pip install huggingface-hub -qU

In [34]:
# from huggingface_hub import notebook_login

# notebook_login()

In [35]:
# trainer.push_to_hub("Promptengineering/mistral-instruct-generation")

In [36]:
merged_model = model.merge_and_unload()
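
Conceptually, merging folds the low-rank update back into the frozen weight: `W_merged = W + (lora_alpha / r) * B @ A`. The sketch below shows that formula on tiny hand-written matrices in plain Python. It only illustrates the arithmetic and is not how `peft` implements `merge_and_unload`, which also has to deal with the quantized weights; all names and values here are made up.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, lora_alpha: float, r: int):
    """Return W + (lora_alpha / r) * (B @ A)."""
    scaling = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy values: a 2x2 weight with a rank-1 adapter.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape (r, in_features) = (1, 2)
lora_b = [[0.5], [1.0]]  # shape (out_features, r) = (2, 1)

merged = merge_lora(weight, lora_a, lora_b, lora_alpha=2, r=1)
print(merged)  # [[2.0, 2.0], [2.0, 5.0]]
```

After merging, inference no longer needs the separate adapter matrices, since their effect is baked into the weight.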



In [37]:
generate_from_finetuned = partial(generate_response,model=merged_model)
sample_df["finetune"] = sample_df["user_prompt"].apply(generate_from_finetuned)
show_samples()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


##########
user_prompt 

[INST]SYSTEM: You are a helpful assistant with access to the following functions. Use them if required -
{
    "name": "convert_currency",
    "description": "Convert an amount from one currency to another",
    "parameters": {
        "type": "object",
        "properties": {
            "amount": {
                "type": "number",
                "description": "The amount to convert"
            },
            "from_currency": {
                "type": "string",
                "description": "The currency to convert from"
            },
            "to_currency": {
                "type": "string",
                "description": "The currency to convert to"
            }
        },
        "required": [
            "amount",
            "from_currency",
            "to_currency"
        ]
    }
}

USER: Hi, I need to convert 500 USD to EUR. Can you help me with that?[/INST]



##########
expected 

FUNCTION: {"name": "convert_currency", "arguments": '{"amo

In [38]:
sample_df.to_csv(f"{lora_output_dir}.csv",index=False)
metrics_df.to_csv(metrics_file,index=False)

In [39]:
sample_df

| | user_prompt | expected | base | finetune |
|---|---|---|---|---|
| 0 | [INST]SYSTEM: You are a helpful assistant with... | FUNCTION: {"name": "convert_currency", "argume... | Of course! Based on the information you provi... | Of course! I'll use the `convert_currency` fu... |
| 1 | [INST]SYSTEM: You are a helpful assistant with... | ASSISTANT: Of course! I can help with that. Co... | Of course! I'd be happy to help you calculate... | Of course! I have a function called \`calculat... |
| 2 | [INST]SYSTEM: You are a helpful assistant with... | FUNCTION: {"name": "search_books", "arguments"... | Based on the given information, you can use t... | Based on the given information, you can use t... |
| 3 | [INST]SYSTEM: You are a helpful assistant with... | FUNCTION: {"name": "convert_currency", "argume... | Hello! I'd be happy to help you convert 500 U... | Of course! Here is how you can use the \`conve... |
| 4 | [INST]SYSTEM: You are a helpful assistant with... | FUNCTION: {"name": "calculate_bmi", "arguments... | To calculate your Body Mass Index (BMI), I'll... | To calculate your Body Mass Index (BMI), you ... |
| 5 | [INST]SYSTEM: You are a helpful assistant with... | ASSISTANT: Of course! To help you better, coul... | Of course! I can suggest a movie based on you... | Hello! I'd be happy to help you find a movie ... |
| 6 | [INST]SYSTEM: You are a helpful assistant with... | ASSISTANT: Of course, I can help with that. Ho... | Absolutely! Here's how you can use the \`gener... | Hello [USER]! I'd be happy to help you genera... |
| 7 | [INST]SYSTEM: You are a helpful assistant with... | ASSISTANT: Of course, I can help with that. Ho... | Absolutely, I'd be happy to help you generate... | Absolutely, I'd be happy to help you generate... |
| 8 | [INST]SYSTEM: You are a helpful assistant with... | ASSISTANT: I'm sorry, but I don't have the cap... | I'm an assistant designed to help you with bo... | I'm an assistant designed to help with books,... |
| 9 | [INST]SYSTEM: You are a helpful assistant with... | FUNCTION: {"name": "check_flight_status", "arg... | \`\`\`python\ncheck_flight_status(\n flight_num... | Based on the provided function \`check_flight_... |
