<a href="https://colab.research.google.com/github/mshojaei77/Awesome-Fine-tuning/blob/main/fine_tuning_phi_3_mini_lora.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Instruction Tuning a Phi-3-mini Model for Persian Question Answering with LoRA

This tutorial walks through fine-tuning a Phi-3-mini model for Persian question answering, using instruction tuning with LoRA (Low-Rank Adaptation) and the Hugging Face libraries. We'll use the `mshojaei77/merged_persian_qa` dataset from the Hugging Face Hub, which contains over 527k question-answer pairs.

### 1. Setup and Installation

Let's begin by setting up our environment and installing the necessary libraries.

In [None]:
!pip install -qqq --upgrade bitsandbytes transformers peft accelerate datasets trl flash_attn wandb huggingface_hub python-dotenv absl-py nltk rouge_score

!pip list | grep transformers

This installs essential libraries like `transformers`, `peft` (for LoRA), `datasets`, `trl` (for SFTTrainer), and others.
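
Because the install above is unpinned, it can help to record which versions were actually resolved. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is illustrative and not part of any of the installed packages.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

# Print the versions of the libraries this tutorial relies on most.
for name, ver in installed_versions(["transformers", "peft", "trl", "datasets"]).items():
    print(f"{name}: {ver or 'not installed'}")
```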

Now, import the required modules:

In [None]:
from random import randrange
import torch
from datasets import load_dataset
from peft import AutoPeftModelForCausalLM, LoraConfig, PeftModel, TaskType, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    set_seed,
    pipeline
)
from trl import SFTTrainer


### 2. Data Preparation

Download the `mshojaei77/merged_persian_qa` dataset and examine its structure.

In [None]:
dataset_name = "mshojaei77/merged_persian_qa"
dataset_split = "train"
dataset = load_dataset(dataset_name, split=dataset_split)

print(f"dataset size: {len(dataset)}")
print(dataset[randrange(len(dataset))])
print(dataset)

The dataset is expected to have two columns: `Context`, which holds the question, and `Response`, which holds the answer.
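
Since the later steps index `row['Context']` and `row['Response']` directly, a quick sanity check on those two columns can catch empty or missing values early. The sketch below is one possible check; `is_valid_row` and the sample rows are hypothetical and only stand in for real dataset records.

```python
def is_valid_row(row):
    """Return True when both the question and the answer are non-empty strings."""
    for column in ("Context", "Response"):
        value = row.get(column)
        if not isinstance(value, str) or not value.strip():
            return False
    return True

# Hypothetical rows standing in for dataset records.
sample_rows = [
    {"Context": "پایتخت ایران کجاست؟", "Response": "تهران"},
    {"Context": "   ", "Response": "پاسخ"},
    {"Context": "سوال", "Response": None},
]
print([is_valid_row(row) for row in sample_rows])
```

On the real dataset, the same function could be applied with `dataset = dataset.filter(is_valid_row)`.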

### 3. Model and Tokenizer

We'll use the `microsoft/Phi-3-mini-4k-instruct` model for this tutorial. You can choose a different Phi-3 variant based on your requirements.

In [None]:
model_id = "microsoft/Phi-3-mini-4k-instruct"
model_name = "microsoft/Phi-3-mini-4k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, add_eos_token=True, use_fast=True)
tokenizer.padding_side = 'left'
tokenizer.pad_token = tokenizer.unk_token
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)

model = AutoModelForCausalLM.from_pretrained(
          model_id, torch_dtype=torch.float16, trust_remote_code=True, device_map="auto",
          attn_implementation="flash_attention_2" if torch.cuda.is_available() else "sdpa"
)


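This code loads the tokenizer and the model in half precision. It assumes a CUDA GPU is available; otherwise, adjust the `device_map` parameter accordingly. Note that FlashAttention 2 requires an Ampere-or-newer GPU (for example an A100, but not a T4), so on older GPUs set `attn_implementation="sdpa"` instead.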
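
The choice of attention backend can be made explicit rather than keyed on CUDA availability alone. The following is a minimal sketch under the assumption that FlashAttention 2 is only usable on GPUs with compute capability 8.0 or higher; `pick_attn_implementation` is a hypothetical helper, not a `transformers` function.

```python
def pick_attn_implementation(cuda_available, compute_capability=None):
    """Choose an `attn_implementation` string from basic GPU facts.

    compute_capability is a (major, minor) tuple such as the one returned by
    torch.cuda.get_device_capability(); FlashAttention 2 needs major >= 8.
    """
    if cuda_available and compute_capability is not None and compute_capability[0] >= 8:
        return "flash_attention_2"
    return "sdpa"

print(pick_attn_implementation(True, (8, 0)))   # A100-class GPU
print(pick_attn_implementation(True, (7, 5)))   # T4-class GPU
print(pick_attn_implementation(False))          # CPU only
```

In the notebook, the arguments would come from `torch.cuda.is_available()` and, when a GPU is present, `torch.cuda.get_device_capability()`.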

### 4. Instruction Tuning Format

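We'll format our data for instruction tuning by creating a `messages` column of role/content chat turns, then rendering each conversation to a single `text` string with the tokenizer's own chat template.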

In [None]:
def create_message_column(row):
    return {"messages": [
        {"role": "user", "content": row['Context']},
        {"role": "assistant", "content": row['Response']}
    ]}

def format_dataset_chatml(row):
    return {"text": tokenizer.apply_chat_template(row["messages"], add_generation_prompt=False, tokenize=False)}

dataset_chatml = dataset.map(create_message_column)
dataset_chatml = dataset_chatml.map(format_dataset_chatml)

print(dataset_chatml[0])

This formats the dataset to have a conversational structure with user instructions and assistant responses.
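
To make the resulting `text` field more concrete, the sketch below imitates the general layout of a Phi-3-style chat rendering, with `<|user|>`, `<|assistant|>`, and `<|end|>` markers. It is an approximation for illustration only: `render_phi3_like` is a hypothetical helper, the exact special tokens depend on the tokenizer revision, and the authoritative output is whatever `tokenizer.apply_chat_template` returns.

```python
def render_phi3_like(messages):
    """Approximate a Phi-3-style rendering of a list of {"role", "content"} dicts."""
    parts = []
    for message in messages:
        # Each turn: role marker, newline, content, end-of-turn marker.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    return "".join(parts)

# Hypothetical question-answer pair in the same shape as the `messages` column.
messages = [
    {"role": "user", "content": "پایتخت ایران کجاست؟"},
    {"role": "assistant", "content": "تهران"},
]
print(render_phi3_like(messages))
```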

### 5. Training Setup

Now, we'll split the dataset, define our training arguments, and configure the LoRA parameters.

In [None]:
dataset_chatml = dataset_chatml.train_test_split(test_size=0.05, seed=1234)

lora_r = 16
lora_alpha = 16
lora_dropout = 0.05
# Phi-3 fuses Q/K/V into `qkv_proj` and gate/up into `gate_up_proj`, so target those module names.
target_modules = ['qkv_proj', 'o_proj', 'gate_up_proj', 'down_proj']

args = TrainingArguments(
    output_dir="./phi-3-mini-LoRA-persian-qa",
    evaluation_strategy="steps",
    save_strategy="epoch",
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    warmup_ratio=0.1,
    logging_steps=100,
    eval_steps=100,
    report_to="wandb",
    fp16=True,
    seed=42,
)

peft_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    task_type=TaskType.CAUSAL_LM,
    target_modules=target_modules,
)


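This code configures training and evaluation (an effective batch size of 8 × 4 = 32 per device, with evaluation every 100 steps) and LoRA (rank 16, alpha 16, dropout 0.05). If you run out of GPU memory, lower the batch size or the LoRA rank first.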
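
To put the LoRA settings in perspective: for a linear layer with weight shape `d_out x d_in`, LoRA trains two small matrices of shapes `r x d_in` and `d_out x r`, and scales their product by `lora_alpha / r`. The sketch below works through that arithmetic. The layer size is an illustrative assumption, and the helper names are hypothetical.

```python
def lora_extra_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update B @ A before it is added to the frozen weight."""
    return lora_alpha / r

# Illustrative layer size (assumption): a square 3072 x 3072 projection.
d_in, d_out, r, alpha = 3072, 3072, 16, 16
full = d_in * d_out
extra = lora_extra_params(d_in, d_out, r)
print(f"frozen weights: {full:,}")
print(f"LoRA weights:   {extra:,} ({100 * extra / full:.2f}% of the layer)")
print(f"scaling factor: {lora_scaling(alpha, r)}")
```

With `lora_alpha` equal to `r`, as configured here, the scaling factor is 1, so the low-rank update is added to the frozen weight unscaled.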

### 6. Training with SFTTrainer

We'll utilize the `SFTTrainer` from the `trl` library for efficient instruction tuning.

In [None]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset_chatml['train'],
    eval_dataset=dataset_chatml['test'],
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=args,
)

trainer.train()
trainer.save_model()

This code trains the model with the specified parameters and saves the resulting LoRA adapter locally to `args.output_dir`.
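
Before launching a run of this size, it is worth estimating how many optimizer steps it implies. The following is a rough sketch, assuming about 527,000 rows with 95% in the training split and a single GPU. `optimizer_steps` is a hypothetical helper, and the trainer's own step count may differ slightly.

```python
import math

def optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Estimate total optimizer steps for a run (a last partial batch counts as a step)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs

# Assumption: roughly 527,000 rows, of which 95% go to the training split.
train_examples = 527_000 * 95 // 100
effective, per_epoch, total = optimizer_steps(train_examples, 8, 4, 3)
print(f"effective batch size: {effective}")
print(f"steps per epoch:      {per_epoch:,}")
print(f"total steps:          {total:,}")
```

Under these assumptions the run takes tens of thousands of steps, so `eval_steps=100` would trigger several hundred evaluation passes over the 5% test split. A larger `eval_steps` or a smaller evaluation subset may be worth considering.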

### 7. Evaluation

We can assess the fine-tuned model's performance using the ROUGE metric.

In [None]:
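!pip install -qqq evaluate

import evaluate
import numpy as np

rouge_metric = evaluate.load("rouge")
trainer.model.eval()  # disable LoRA dropout during generation

def test_inference(question):
    """Greedy-decode an answer to one question with the fine-tuned model."""
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}],
        add_generation_prompt=True, return_dict=True, return_tensors="pt",
    ).to(trainer.model.device)
    with torch.no_grad():
        output_ids = trainer.model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Drop the prompt tokens and decode only the newly generated answer.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

def calculate_rouge(row):
    response = test_inference(row['messages'][0]['content'])
    # rouge_score's default tokenizer keeps only [a-z0-9] and would discard Persian
    # text, so split on whitespace instead (the English stemmer is dropped too).
    result = rouge_metric.compute(predictions=[response], references=[row['Response']],
                                  tokenizer=lambda text: text.split())
    result = {key: float(value) * 100 for key, value in result.items()}
    result['response'] = response
    return result

metrics = dataset_chatml['test'].select(range(500)).map(calculate_rouge, batched=False)

print("Rouge 1 Mean: ", np.mean(metrics['rouge1']))
print("Rouge 2 Mean: ", np.mean(metrics['rouge2']))
print("Rouge L Mean: ", np.mean(metrics['rougeL']))
print("Rouge Lsum Mean: ", np.mean(metrics['rougeLsum']))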

This code generates a response for each of the first 500 test examples, compares it with the ground-truth answer, and prints the average ROUGE scores.
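
For intuition about what these scores measure, ROUGE-1 is based on unigram overlap between the generated and reference text. The sketch below is a simplified version that splits on whitespace and applies no stemming. It is not the `rouge_score` implementation, and `rouge1_f` is an illustrative name.

```python
from collections import Counter

def rouge1_f(prediction, reference):
    """Simplified ROUGE-1 F-measure using whitespace tokens (no stemming)."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped unigram overlap: each token counts at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Same four words in a different order: full unigram overlap.
print(rouge1_f("تهران پایتخت ایران است", "پایتخت ایران تهران است"))
# One matching word out of four reference words: perfect precision, low recall.
print(rouge1_f("تهران", "پایتخت ایران تهران است"))
```

Because ROUGE-1 ignores word order, the first pair scores as a perfect match. ROUGE-2 and ROUGE-L are more sensitive to ordering, which is one reason several variants are reported together.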

### 8. Saving and Sharing

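Finally, merge the LoRA adapter into the base model and push the merged model and tokenizer to the Hugging Face Hub for sharing and future use. Pushing requires being logged in to the Hub first, for example with `huggingface-cli login`.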


In [None]:
new_model = AutoPeftModelForCausalLM.from_pretrained(
    args.output_dir,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)

merged_model = new_model.merge_and_unload()

merged_model.save_pretrained("merged_model", trust_remote_code=True, safe_serialization=True)
tokenizer.save_pretrained("merged_model")

hf_model_repo = "your-username/your-model-name"  # Replace with your desired repo name

merged_model.push_to_hub(hf_model_repo)
tokenizer.push_to_hub(hf_model_repo)


Congratulations! You have fine-tuned a Phi-3-mini model for Persian question answering using instruction tuning and LoRA, and you can now use it for downstream Persian question-answering tasks. Remember to replace the placeholder repository name (`your-username/your-model-name`) with your own Hugging Face username and model name.
