<img src="https://res.cloudinary.com/dbl53sidm/image/upload/v1696398508/mistral-7b-v0.1_opibjl.jpg" width="100%">

## Fine-tuning [Mistral 7B Instruct](https://mistral.ai/news/announcing-mistral-7b/) using LoRA for Custom Output Format

This notebook demonstrates how to fine-tune the Mistral-7B-Instruct model to produce outputs in a specific format using your custom dataset, while ensuring that the tokenizer preserves the dataset's format.

## Introduction

We will fine-tune the Mistral 7B Instruct model using Low-Rank Adaptation (LoRA) without quantization, making it suitable for deployment on Cloudflare Workers AI. The model will be trained to generate outputs in the format specified in your dataset.

All the code will be available on my GitHub. Do drop by and give a follow and a star.
[adithya-s-k](https://github.com/adithya-s-k)

[GitHub Code](https://github.com/adithya-s-k/LLM-Alchemy-Chamber/blob/main/LLMs/Mistral-7b/Mistral_Colab_Finetune_ipynb_Colab_Final.ipynb)

I also post about LLMs and what I have been working on over on Twitter.
[AdithyaSK (@adithya_s_k) / X](https://twitter.com/adithya_s_k)

## Prerequisites

Before starting, ensure that you have the following:

1. **GPU**: A GPU with enough memory to hold the full 16-bit model without quantization (e.g., an A100 with 40 GB).
2. **Python Packages**: The packages installed in Step 1 below (`transformers`, `peft`, `accelerate`, `datasets`, `scipy`, and `trl`).

Let's begin by checking if your GPU is correctly detected:

In [None]:
!nvidia-smi

Tue Oct 29 16:24:14 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0              48W / 400W |      2MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
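As a rough sanity check on the GPU prerequisite, the memory needed just to hold the weights can be estimated from the parameter count. The sketch below assumes the base parameter count implied by the totals printed in Step 4 of this notebook (total parameters minus LoRA parameters) and 2 bytes per parameter for float16. It ignores activations, gradients, and optimizer state, so actual training usage will be higher.

```python
# Rough estimate of the memory needed to hold the model weights in float16.
# Assumption: parameter count derived from the totals printed in Step 4
# (7,245,139,968 total minus 3,407,872 LoRA parameters).
BASE_PARAMS = 7_245_139_968 - 3_407_872
BYTES_PER_PARAM_FP16 = 2


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the size of the weights in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


weights_gib = weight_memory_gib(BASE_PARAMS, BYTES_PER_PARAM_FP16)
gpu_gib = 40960 / 1024  # 40960 MiB reported by nvidia-smi above

print(f"fp16 weights: {weights_gib:.1f} GiB of {gpu_gib:.0f} GiB available")
```

On these assumptions the weights take roughly a third of the A100's memory, which leaves headroom for LoRA training at a small batch size.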
                                                                    

## Step 1 - Install Necessary Packages

First, install the required dependencies.

In [None]:
!pip install -q -U transformers
!pip install -q -U peft
!pip install -q -U accelerate
!pip install -q datasets scipy
!pip install -q trl

## Step 2 - Model Loading

We'll load the Mistral 7B Instruct model without quantization to ensure compatibility with Cloudflare Workers AI.

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Important: Set tokenizer settings to preserve formatting
tokenizer.padding_side = "left"
tokenizer.truncation_side = "right"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

model.config.use_cache = False  # Disable caching for training

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

## Define Inference Function

Let's define a wrapper function to get completions from the model for a user question.

In [None]:
def get_completion(query: str, model, tokenizer) -> str:
    device = "cuda"  # or "cuda:0" depending on your setup

    prompt = query

    encodings = tokenizer(prompt, return_tensors="pt").to(device)

    generated_ids = model.generate(
        **encodings,
        max_new_tokens=512,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
    decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    return decoded

### Test the Base Model

Run an inference on the base model to see how it performs before fine-tuning.

In [None]:
result = get_completion(
    query="""
<s>
[INST]
Recommend skincare products based on the user's preferences and concerns.
{"name": "Alex", "kind_of_products": ["male"], "skin_goals": ["firmness_elasticity", "fight_dry_skin""], "motivation": ["ai_skincare_expert", "save_time"], "routine": "complete_routine", "biggest_concern": "bright_skin", "product_preferences": ["hypoallergenic"], "skinType": "oily", "acneRisk": "veryHigh", "hydrationLevel": "normalHydration", "pHValue": "pH2"}"
[/INST]
</s>
""",
    model=model,
    tokenizer=tokenizer
)
print(result)

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...




[INST]
Recommend skincare products based on the user's preferences and concerns.
{"name": "Alex", "kind_of_products": ["male"], "skin_goals": ["firmness_elasticity", "fight_dry_skin""], "motivation": ["ai_skincare_expert", "save_time"], "routine": "complete_routine", "biggest_concern": "bright_skin", "product_preferences": ["hypoallergenic"], "skinType": "oily", "acneRisk": "veryHigh", "hydrationLevel": "normalHydration", "pHValue": "pH2"}"
[/INST]

Hi Alex,

Based on your skin type (oily), goals (firmness elasticity, fight dry skin), and preferences (hypoallergenic), I recommend the following products:

1. Cleanser: La Mer Clear Infusion Cream Wash - This oil-free cleanser removes impurities and hydrates your skin. It's suitable for oily skin and helps achieve your goals.

2. Serum: Estée Lauder Advanced Night Repair Concentrate Serum - This lightweight serum improves skin texture, reduces pores, and even tones your skin. It's hypoallergenic and safe for your oily skin.

3. Moisturi

## Step 3 - Load and Prepare Your Custom Dataset

We'll load your dataset from [kleberbaum/mesagona-demo-data](https://huggingface.co/datasets/kleberbaum/mesagona-demo-data) and prepare it for fine-tuning.

In [None]:
from datasets import load_dataset

dataset = load_dataset("kleberbaum/mesagona-data", split="train")
dataset

(…)leaned_standardized_skincare_dataset.csv:   0%|          | 0.00/872k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1000 [00:00<?, ? examples/s]

Dataset({
    features: ['instruction', 'input', 'output'],
    num_rows: 1000
})

### Explore the Dataset

Let's take a look at a sample from your dataset.

In [None]:
print(dataset[0])

{'instruction': '"Based on the user\'s input, provide a detailed skincare recommendation with the following result format: result: {""ingredients"": [], ""recommendation"": """"}"', 'input': '"{name: Sam, kind_of_products: [female], skin_goals: [shine control], motivation: [just_curious], routine: simple_effective, biggest_concern: clear skin, product_preferences: [alcohol_free], skinType: oily, acneRisk: lowRisk, hydrationLevel: normalHydration, pHValue: pH1}"', 'output': '"{ingredients: [Green Tea Extract, Clay, Ferulic Acid], recommendation: Hi Sam, For oily skin, Green Tea Extract and Clay can balance oil production and prevent breakouts. These ingredients reduce shine and keep your skin healthy.  To achieve your goal of shine control, include these ingredients in your skincare routine.}"'}


### Formatting the Dataset

We need to format the dataset to match the model's expected input format, ensuring that it learns to produce outputs in your specified format.

In [None]:
def generate_prompt(data_point):
    """Generate input text based on instruction, input, and output."""
    instruction = data_point['instruction'] # 'You are a helpful code assistant. Your task is to populate this object {"ingredients": [], "recommendation": ""} for given information.'
    input_text = data_point['input']
    output_text = data_point['output']

    prompt = f"""
<s>
[INST]
{instruction}
{input_text}
[/INST]
{output_text}
</s>
"""
    return prompt.strip()

# Generate prompts
prompts = [generate_prompt(data_point) for data_point in dataset]

# Add the prompts to the dataset
dataset = dataset.add_column("prompt", prompts)
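To see what the template produces, here is a small standalone check on a toy record. The record is hypothetical and shortened for illustration; the template mirrors the one above. The last lines show which part of the prompt is the completion the model is expected to learn.

```python
def generate_prompt(data_point: dict) -> str:
    """Same template as above: instruction and input inside [INST], output before </s>."""
    prompt = f"""
<s>
[INST]
{data_point['instruction']}
{data_point['input']}
[/INST]
{data_point['output']}
</s>
"""
    return prompt.strip()


# Hypothetical, shortened record used only for illustration.
toy_record = {
    "instruction": "Recommend ingredients.",
    "input": "{skinType: oily}",
    "output": "{ingredients: [Clay]}",
}

toy_prompt = generate_prompt(toy_record)
print(toy_prompt)

# The completion to be learned is the text between [/INST] and </s>.
target = toy_prompt.split("[/INST]")[1].rsplit("</s>", 1)[0].strip()
print(target)
```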

### Tokenize the Dataset

We'll tokenize the dataset carefully to ensure the format is preserved.

In [None]:
# Function to tokenize without altering the format
def tokenize_function(examples):
    return tokenizer(
        examples["prompt"],
        return_attention_mask=False,
        return_token_type_ids=False,
        padding=False,
        truncation=False,
        add_special_tokens=False,
    )

tokenized_datasets = dataset.map(tokenize_function, batched=True, remove_columns=dataset.column_names)

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

### Prepare Labels

Since we're training a causal language model, the labels are simply a copy of the input IDs; the model shifts them by one position internally when it computes the loss.

In [None]:
def add_labels(examples):
    # Labels are a plain copy of input_ids; the model applies the one-token shift itself.
    examples["labels"] = examples["input_ids"].copy()
    return examples

tokenized_datasets = tokenized_datasets.map(add_labels, batched=False)

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]
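To make the label handling concrete: Hugging Face causal language models take labels of the same length as the inputs and apply the one-position shift inside the model when computing the loss. The sketch below mimics that pairing in plain Python on a toy list of token IDs. The IDs are made up for illustration and do not come from the real tokenizer.

```python
# Toy token IDs (made up for illustration), with labels as a plain copy.
input_ids = [1, 733, 16289, 28793, 2]
labels = input_ids.copy()

# Inside the model, the prediction at position t is scored against the
# label at position t + 1, which amounts to dropping the last input
# position and the first label.
context_positions = input_ids[:-1]
next_token_targets = labels[1:]

pairs = list(zip(context_positions, next_token_targets))
for ctx, tgt in pairs:
    print(f"after token {ctx} -> predict {tgt}")
```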

### Split the Dataset

We'll split the dataset into training and testing sets.

In [None]:
tokenized_datasets = tokenized_datasets.train_test_split(test_size=0.2, seed=42)
train_data = tokenized_datasets["train"]
test_data = tokenized_datasets["test"]

## Step 4 - Apply LoRA

We'll use the PEFT library to apply LoRA to our model.

In [None]:
from peft import LoraConfig, get_peft_model

# Enable gradient checkpointing
model.gradient_checkpointing_enable()

# Define LoRA configuration
peft_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # Adjust target modules as needed
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA to the model
model = get_peft_model(model, peft_config)

### Check Trainable Parameters

Let's check how many parameters are trainable after applying LoRA.

In [None]:
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
all_params = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable_params} | Total parameters: {all_params} | Percentage: {100 * trainable_params / all_params:.2f}%")

Trainable parameters: 3407872 | Total parameters: 7245139968 | Percentage: 0.05%
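The trainable count can be reproduced by hand. LoRA adds two small matrices per adapted layer: `A` with shape `r x in_features` and `B` with shape `out_features x r`. The sketch below assumes the published Mistral 7B dimensions, which are not stated in this notebook: 32 decoder layers, hidden size 4096, and 8 key/value heads of size 128, so `v_proj` maps 4096 to 1024.

```python
# Assumed Mistral 7B dimensions (not stated in this notebook): 32 decoder layers,
# hidden size 4096, and 8 key/value heads of size 128, so v_proj maps 4096 -> 1024.
NUM_LAYERS = 32
HIDDEN = 4096
KV_DIM = 8 * 128
R = 8  # LoRA rank from peft_config


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) per adapted layer."""
    return r * in_features + out_features * r


# q_proj is 4096 -> 4096, v_proj is 4096 -> 1024; both are adapted in every layer.
per_layer = lora_params(HIDDEN, HIDDEN, R) + lora_params(HIDDEN, KV_DIM, R)
total = NUM_LAYERS * per_layer
print(total)  # 3407872
```

Under these assumptions the result matches the trainable parameter count printed above.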


## Step 5 - Fine-tuning with Supervised Fine-Tuning (SFT)

We'll use the `SFTTrainer` from the `trl` library for supervised fine-tuning.

In [None]:
from trl import SFTTrainer
import transformers

tokenizer.pad_token = tokenizer.eos_token

# Define a custom data collator that does not alter the inputs
def data_collator(features):
    return {
        'input_ids': torch.nn.utils.rnn.pad_sequence(
            [torch.tensor(f['input_ids']) for f in features], batch_first=True, padding_value=tokenizer.pad_token_id
        ),
        'labels': torch.nn.utils.rnn.pad_sequence(
            [torch.tensor(f['labels']) for f in features], batch_first=True, padding_value=-100
        ),
    }

trainer = SFTTrainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=test_data,
    peft_config=peft_config,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_ratio=0.03,
        num_train_epochs=3,  # Adjust as needed
        learning_rate=2e-4,
        logging_steps=10,
        output_dir="outputs",
        save_strategy="epoch",
        report_to=[]  # Disable reporting to avoid errors in some environments
    ),
    data_collator=data_collator,
)

# Start the training
trainer.train()



| Step | Training Loss |
|------|---------------|
| 10 | 13.0657 |
| 20 | 6.9336 |
| 30 | 2.0485 |
| 40 | 0.9566 |
| 50 | 0.8285 |
| 60 | 0.7643 |
| 70 | 0.7542 |
| 80 | 0.7418 |
| 90 | 0.7172 |
| 100 | 0.7267 |


TrainOutput(global_step=600, training_loss=0.8646981771787008, metrics={'train_runtime': 587.9154, 'train_samples_per_second': 4.082, 'train_steps_per_second': 1.021, 'total_flos': 2.6207244652732416e+16, 'train_loss': 0.8646981771787008, 'epoch': 3.0})
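The `global_step=600` in the output follows from the dataset size and the training arguments. The sketch below redoes that arithmetic, assuming a single GPU and that one optimizer step is taken every `gradient_accumulation_steps` batches.

```python
NUM_ROWS = 1000        # rows in the dataset
TEST_SIZE = 0.2        # fraction held out by train_test_split
PER_DEVICE_BATCH = 1   # per_device_train_batch_size
GRAD_ACCUM = 4         # gradient_accumulation_steps
EPOCHS = 3             # num_train_epochs

test_rows = round(NUM_ROWS * TEST_SIZE)
train_rows = NUM_ROWS - test_rows

# Assumption: a single GPU, so batches per epoch = train rows / per-device batch size.
batches_per_epoch = train_rows // PER_DEVICE_BATCH
updates_per_epoch = batches_per_epoch // GRAD_ACCUM
total_steps = updates_per_epoch * EPOCHS

print(train_rows, test_rows, total_steps)  # 800 200 600
```

The effective batch size is `PER_DEVICE_BATCH * GRAD_ACCUM = 4` examples per optimizer step.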

## Step 6 - Saving and Merging LoRA Adapters

After fine-tuning, we'll save the LoRA adapters and merge them with the base model.

In [None]:
# Save the LoRA adapters
model.save_pretrained("mesagona-finetuned-lora")

# Load the base model
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Load the LoRA adapters
merged_model = PeftModel.from_pretrained(base_model, "mesagona-finetuned-lora")

# Merge and unload
merged_model = merged_model.merge_and_unload()

# Optionally save the merged model
# merged_model.save_pretrained("mesagona-merged-model", safe_serialization=True)
# tokenizer.save_pretrained("mesagona-merged-model")

# Save the tokenizer alongside the LoRA adapters (optional)
tokenizer.save_pretrained("mesagona-finetuned-lora")

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

('mesagona-finetuned-lora/tokenizer_config.json',
 'mesagona-finetuned-lora/special_tokens_map.json',
 'mesagona-finetuned-lora/tokenizer.model',
 'mesagona-finetuned-lora/added_tokens.json',
 'mesagona-finetuned-lora/tokenizer.json')
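Conceptually, merging folds the low-rank update into the frozen weight: `W_merged = W + (lora_alpha / r) * B @ A`. After that, a single matrix multiply gives the same result as the base path plus the adapter path. The sketch below checks this on tiny hand-written matrices in plain Python. The sizes and values are toy assumptions, and the real merge is done by `merge_and_unload()` on the full model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(w, delta, scale):
    """Return w + scale * delta, element-wise."""
    return [[wi + scale * di for wi, di in zip(wr, dr)] for wr, dr in zip(w, delta)]


# Toy setup: a 3x3 weight with a rank-1 adapter. LORA_ALPHA=4 and R=1 give the
# same scale (4.0) as the notebook's lora_alpha=32, r=8.
LORA_ALPHA, R = 4, 1
scale = LORA_ALPHA / R

W = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]  # frozen base weight (out x in)
B = [[1], [0], [2]]                    # out x r
A = [[2, 1, 0]]                        # r x in
x = [[1], [2], [3]]                    # one input column vector

# Adapter path: base output plus the scaled low-rank correction.
base_out = matmul(W, x)
lora_out = matmul(B, matmul(A, x))
adapter_path = add_scaled(base_out, lora_out, scale)

# Merged path: fold the correction into the weight once, then do a single matmul.
W_merged = add_scaled(W, matmul(B, A), scale)
merged_path = matmul(W_merged, x)

print(adapter_path == merged_path)  # True
```

This is why the merged model needs no PEFT code at inference time: the adapter no longer exists as a separate set of matrices.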

## Step 7 - Evaluating the Fine-tuned Model

Let's define a function to generate completions from the merged model and test it.

In [None]:
def get_completion_merged(instruction: str, input_text: str, model, tokenizer) -> str:
    device = "cuda"

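    # Stop the prompt at [/INST]: the training prompts place the answer between
    # [/INST] and </s>, so a trailing </s> here would mark the sequence as finished.
    prompt = f"""
<s>
[INST]
{instruction}
{input_text}
[/INST]
""".strip()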

    encodings = tokenizer(prompt, return_tensors="pt").to(device)

    generated_ids = model.generate(
        **encodings,
        max_new_tokens=512,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
    decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    return decoded

# Test the fine-tuned model
instruction = 'result: {"ingredients": [], "recommendation": ""}'
input_text = '{"name": "Simon", "kind_of_products": ["male"], "skin_goals": ["firmness_elasticity", "fight_dry_skin""], "motivation": ["ai_skincare_expert", "save_time"], "routine": "complete_routine", "biggest_concern": "bright_skin", "product_preferences": ["hypoallergenic"], "skinType": "oily", "acneRisk": "veryHigh", "hydrationLevel": "normalHydration", "pHValue": "pH2"}"'

result = get_completion_merged(
    instruction=instruction,
    input_text=input_text,
    model=merged_model,
    tokenizer=tokenizer
)
print(result)


[INST]
result: {"ingredients": [], "recommendation": ""}
{"name": "Simon", "kind_of_products": ["male"], "skin_goals": ["firmness_elasticity", "fight_dry_skin""], "motivation": ["ai_skincare_expert", "save_time"], "routine": "complete_routine", "biggest_concern": "bright_skin", "product_preferences": ["hypoallergenic"], "skinType": "oily", "acneRisk": "veryHigh", "hydrationLevel": "normalHydration", "pHValue": "pH2"}"
[/INST]
[inst]
result: {"ingredients": [Caffeine, Allantoin, Azelaic Acid], recommendation: Hi Simon, For oily skin, Caffeine and Allantoin can balance oil production and prevent breakouts. These ingredients reduce shine and keep your skin healthy.  To achieve your skin goals, include these ingredients in your skincare routine.
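Because the decoded string echoes the prompt, and the target format is JSON-like rather than strict JSON, a simple way to check the result is to look only at the text after the last `[/INST]` marker and pull out the ingredient list with a regular expression. The helper below is a hypothetical sketch, not part of the original notebook; `extract_ingredients` and the shortened sample string are illustrative.

```python
import re
from typing import List


def extract_ingredients(decoded: str) -> List[str]:
    """Return the ingredient names found in the model's completion, if any."""
    # Keep only the text after the last [/INST] marker, since the decoded
    # string echoes the prompt (which contains an empty "ingredients" list).
    completion = decoded.rsplit("[/INST]", 1)[-1]
    match = re.search(r'"?ingredients"?\s*:\s*\[([^\]]*)\]', completion)
    if match is None:
        return []
    return [item.strip().strip('"') for item in match.group(1).split(",") if item.strip()]


# Shortened version of the decoded output shown above.
sample = (
    '[INST]\nresult: {"ingredients": [], "recommendation": ""}\n[/INST]\n'
    '[inst]\nresult: {"ingredients": [Caffeine, Allantoin, Azelaic Acid], '
    "recommendation: Hi Simon, ..."
)

ingredients = extract_ingredients(sample)
print(ingredients)  # ['Caffeine', 'Allantoin', 'Azelaic Acid']
```

An empty list signals that the completion did not follow the expected format, which can serve as a quick pass/fail check over the test split.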


## Optional: Push the Model to Hugging Face Hub

If you wish to share your fine-tuned model, you can push it to the Hugging Face Hub.

In [None]:
# Login to Hugging Face
from huggingface_hub import notebook_login

notebook_login()

# Replace with your model name
adapter_model_name = "kleberbaum/mesagona-finetuned-lora"

# Push the LoRA adapters and tokenizer to the Hugging Face Hub
model.push_to_hub(adapter_model_name)
tokenizer.push_to_hub(adapter_model_name)

adapter_model.safetensors:   0%|          | 0.00/13.6M [00:00<?, ?B/s]

No files have been modified since last commit. Skipping to prevent empty commit.


CommitInfo(commit_url='https://huggingface.co/kleberbaum/mesagona-finetuned-lora/commit/b1e33bda97342d21e8dd04afb8a8cdad5e8141ba', commit_message='Upload tokenizer', commit_description='', oid='b1e33bda97342d21e8dd04afb8a8cdad5e8141ba', pr_url=None, pr_revision=None, pr_num=None)

## Conclusion

You've successfully fine-tuned the Mistral 7B Instruct model using LoRA without quantization, training it to produce outputs in your specified format while ensuring that the tokenizer does not alter your dataset's format. This makes the model compatible with Cloudflare Workers AI and tailored to your application's needs.

Feel free to explore further and adapt the code to your needs. Happy fine-tuning!