Based on the Phi-3 Cookbook: https://github.com/microsoft/Phi-3CookBook/blob/main/code/04.Finetuning/Phi-3-finetune-qlora-python.ipynb

Install required packages

In [None]:
!pip install -qqq --upgrade bitsandbytes transformers peft accelerate datasets trl flash_attn
!pip install huggingface_hub
!pip install python-dotenv
!pip install wandb -qqq
!pip install absl-py nltk rouge_score
!pip list | grep transformers.

transformers                     4.41.2


Import packages

In [None]:
import numpy as np
import torch
import pandas as pd
from random import randrange
from datasets import load_dataset
from peft import LoraConfig, prepare_model_for_kbit_training, PeftModel, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, set_seed, pipeline)
from trl import SFTTrainer
from datasets import Dataset
import wandb
import os
from datasets import load_metric

Define parameters

In [None]:
model_id = "microsoft/Phi-3-mini-4k-instruct"
model_name = "microsoft/Phi-3-mini-4k-instruct"
dataset_name = "acorreal/mental-health-dataset"
dataset_split= "train"
new_model = "phi3-mental-health"
hf_model_repo="acorreal/"+new_model
device_map = {"": 0}
use_4bit = True
bnb_4bit_compute_dtype = "bfloat16"
bnb_4bit_quant_type = "nf4"
use_double_quant = True
lora_r = 16
lora_alpha = 16
lora_dropout = 0.05
target_modules= ['k_proj', 'q_proj', 'v_proj', 'o_proj', "gate_proj", "down_proj", "up_proj"]
set_seed(1234)

Connect to Huggingface Hub

In [None]:
from huggingface_hub import login
# Read the token from the HF_TOKEN environment variable instead of hard-coding it in the notebook.
login(token=os.environ["HF_TOKEN"])

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful


Load the dataset with the instruction set

In [None]:
df = pd.read_csv('dataset.csv')
df.columns = ['input', 'output']
df['instruction'] = "You are a mental health assistant. Your job is to provide emotional support, actively listen, and offer practical suggestions for well-being. Respond empathically and do not give specific medical advice or diagnoses. Always make sure the user feels heard and supported. If the user mentions suicidal thoughts, encourage them to seek professional help immediately. Here's the conversation so far:\n\n"
df.head()

Unnamed: 0,input,output,instruction
0,i am going through some things with my feeling...,if everyone thinks you are worthless then mayb...,You are a mental health assistant. Your job is...
1,i am going through some things with my feeling...,hello and thank you for your question and seek...,You are a mental health assistant. Your job is...
2,i am going through some things with my feeling...,first thing i would suggest is getting the sle...,You are a mental health assistant. Your job is...
3,i am going through some things with my feeling...,therapy is essential for those that are feelin...,You are a mental health assistant. Your job is...
4,i am going through some things with my feeling...,i first want to let you know that you are not ...,You are a mental health assistant. Your job is...


In [None]:
# Load the dataset
dataset = Dataset.from_pandas(df)
dataset

Dataset({
    features: ['input', 'output', 'instruction'],
    num_rows: 2747
})

In [None]:
print(dataset[1])

{'input': 'i am going through some things with my feelings and myself i barely sleep and i do nothing but think about how i am worthless and how i should not be here i have never tried or contemplated suicide i have always wanted to fix my issues but i never get around to it how can i change my feeling of being worthless to everyone', 'output': 'hello and thank you for your question and seeking advice on this feelings of worthlessness is unfortunately common in fact most people if not all have felt this to some degree at some point in their life you are not alone changing our feelings is like changing our thoughts it is hard to do our minds are so amazing that the minute you change your thought another one can be right there to take it is place without your permission another thought can just pop in there the new thought may feel worse than the last one my guess is that you have tried several things to improve this on your own even before reaching out on here people often try thinking 

Load the tokenizer to prepare the dataset

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = 'right' # to prevent warnings


tokenizer_config.json:   0%|          | 0.00/3.17k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/293 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/568 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Define the functions that convert each row into the chat format the model expects.

In [None]:
def create_message_column(row):
    messages = []
    user = {
        "content": f"{row['instruction']}\n Input: {row['input']}",
        "role": "user"
    }
    messages.append(user)
    assistant = {
        "content": f"{row['output']}",
        "role": "assistant"
    }
    messages.append(assistant)
    return {"messages": messages}

def format_dataset_chatml(row):
    return {"text": tokenizer.apply_chat_template(row["messages"], add_generation_prompt=False, tokenize=False)}
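To make the two helpers concrete, the sketch below applies `create_message_column` to a made-up row and renders the result with a hand-written approximation of the Phi-3 prompt layout. The row contents and `render_phi3_like` are hypothetical. The authoritative template is the one shipped with the tokenizer and applied by `apply_chat_template`, and it may differ between model revisions.

```python
def create_message_column(row):
    """Build the two-turn conversation (user, assistant) for one dataset row."""
    user = {"role": "user", "content": f"{row['instruction']}\n Input: {row['input']}"}
    assistant = {"role": "assistant", "content": f"{row['output']}"}
    return {"messages": [user, assistant]}


def render_phi3_like(messages, bos_token="<s>"):
    """Hypothetical stand-in for tokenizer.apply_chat_template(..., tokenize=False).

    Assumes each user turn is wrapped in a user tag and an end tag, followed by
    an assistant tag, and each assistant turn is closed by an end tag.
    """
    text = bos_token
    for message in messages:
        if message["role"] == "user":
            text += "<|user|>\n" + message["content"] + "<|end|>\n<|assistant|>\n"
        elif message["role"] == "assistant":
            text += message["content"] + "<|end|>\n"
    return text


# Made-up row with the same three columns as the dataset.
toy_row = {
    "instruction": "You are a mental health assistant.",
    "input": "i feel tired all the time",
    "output": "thank you for sharing that with me",
}
toy_messages = create_message_column(toy_row)["messages"]
print(render_phi3_like(toy_messages))
```

Training on the full rendered text (`add_generation_prompt=False`) means the assistant answer is part of each example, which is what supervised fine-tuning needs.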

Apply the tokenizer's chat template to our dataset.

In [None]:
dataset_chatml = dataset.map(create_message_column)
dataset_chatml = dataset_chatml.map(format_dataset_chatml)

Map:   0%|          | 0/2747 [00:00<?, ? examples/s]

Map:   0%|          | 0/2747 [00:00<?, ? examples/s]

Print example

In [None]:
dataset_chatml[0]

{'input': 'i am going through some things with my feelings and myself i barely sleep and i do nothing but think about how i am worthless and how i should not be here i have never tried or contemplated suicide i have always wanted to fix my issues but i never get around to it how can i change my feeling of being worthless to everyone',
 'output': 'if everyone thinks you are worthless then maybe you need to find new people to hang out withseriously the social context in which a person lives is a big influence in selfesteemotherwise you can go round and round trying to understand why you are not worthless then go back to the same crowd and be knocked down againthere are many inspirational messages you can find in social media maybe read some of the ones which state that no person is worthless and that everyone has a good purpose to their lifealso since our culture is so saturated with the belief that if someone does not feel good about themselves that this is somehow terriblebad feelings 

Split the dataset into training and test sets

In [None]:
dataset_chatml = dataset_chatml.train_test_split(test_size=0.05, seed=1234)
dataset_chatml

DatasetDict({
    train: Dataset({
        features: ['input', 'output', 'instruction', 'messages', 'text'],
        num_rows: 2609
    })
    test: Dataset({
        features: ['input', 'output', 'instruction', 'messages', 'text'],
        num_rows: 138
    })
})
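The row counts above follow from the 5% test fraction. The short check below reproduces them, assuming the split rounds the test share up and the train share down (the scikit-learn-style convention that `train_test_split` appears to follow).

```python
import math

n_rows = 2747      # rows in the full dataset, from the Dataset summary earlier
test_size = 0.05   # fraction passed to train_test_split

n_test = math.ceil(test_size * n_rows)          # test share rounded up
n_train = math.floor((1 - test_size) * n_rows)  # train share rounded down

print(n_train, n_test)  # 2609 138
```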

Select the compute dtype and attention implementation supported by the GPU

In [None]:
if torch.cuda.is_bf16_supported():
  compute_dtype = torch.bfloat16
  attn_implementation = 'flash_attention_2'
else:
  compute_dtype = torch.float16
  attn_implementation = 'sdpa'

print(attn_implementation)
print(compute_dtype)

flash_attention_2
torch.bfloat16
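The cell above picks `flash_attention_2` whenever bf16 is supported, which assumes the `flash_attn` package was installed successfully. A slightly more defensive variant could also check that the package is importable. The helper name `pick_attention` is hypothetical, and in the notebook the first argument would come from `torch.cuda.is_available() and torch.cuda.is_bf16_supported()`.

```python
import importlib.util


def pick_attention(bf16_supported: bool, flash_attn_installed: bool):
    """Return (dtype_name, attn_implementation) for the given capabilities."""
    if bf16_supported and flash_attn_installed:
        return "bfloat16", "flash_attention_2"
    if bf16_supported:
        return "bfloat16", "sdpa"
    return "float16", "sdpa"


# Check whether the optional flash_attn package can be found at all.
flash_attn_installed = importlib.util.find_spec("flash_attn") is not None
print(pick_attention(bf16_supported=True, flash_attn_installed=flash_attn_installed))
```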


Load the tokenizer and model to finetune

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, add_eos_token=True, use_fast=True)
tokenizer.pad_token = tokenizer.unk_token
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
tokenizer.padding_side = 'left'

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_double_quant,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=compute_dtype, trust_remote_code=True, quantization_config=bnb_config, device_map=device_map,
    attn_implementation=attn_implementation
)

model = prepare_model_for_kbit_training(model)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


config.json:   0%|          | 0.00/904 [00:00<?, ?B/s]

configuration_phi3.py:   0%|          | 0.00/10.4k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


modeling_phi3.py:   0%|          | 0.00/73.8k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


model.safetensors.index.json:   0%|          | 0.00/16.3k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/172 [00:00<?, ?B/s]
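The shard sizes in the download log give a rough idea of why 4-bit loading matters. The estimate below assumes the checkpoint stores weights in 16 bits and ignores quantization constants, layers that stay unquantized, activations, and optimizer state, so treat it as a back-of-the-envelope figure only.

```python
# Shard sizes reported by the download log above, in GB.
shard_sizes_gb = [4.97, 2.67]
bytes_per_param_16bit = 2     # assumption: the checkpoint is stored in 16-bit precision
bytes_per_param_4bit = 0.5    # NF4 stores each quantized weight in 4 bits

checkpoint_gb = sum(shard_sizes_gb)
params_billion = checkpoint_gb / bytes_per_param_16bit
quantized_gb = params_billion * bytes_per_param_4bit

print(f"checkpoint size:  {checkpoint_gb:.2f} GB")
print(f"parameter count: ~{params_billion:.2f} billion")
print(f"4-bit weights:   ~{quantized_gb:.2f} GB")
```

Under these assumptions the weights shrink to roughly a quarter of their 16-bit size, which is what makes fine-tuning feasible on a single GPU.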

Connect to wandb and register the project and experiment (optional; commented out here).

In [None]:
# wandb.login()
# os.environ["PROJECT"]="phi3-mental-health"
# project_name = "phi3-mental-health"
# wandb.init(project=project_name, name = "phi3-mental-health")

We now have everything needed to build the SFTTrainer and start training: the LoRA configuration, the training arguments, and the prepared dataset.

In [None]:
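from peft import LoraConfig

# The LoRA settings are given explicitly here; lora_r, lora_alpha, lora_dropout
# and target_modules from the parameters cell are not used.
peft_config = LoraConfig(
    lora_alpha=32,
    lora_dropout=0.1,
    r=8,
    bias="none",
    # Phi-3 uses fused projections (qkv_proj, gate_up_proj).
    # Only the first two decoder layers (0 and 1) receive adapters.
    target_modules=[
        "model.layers.0.self_attn.qkv_proj",
        "model.layers.0.self_attn.o_proj",
        "model.layers.0.mlp.gate_up_proj",
        "model.layers.0.mlp.down_proj",
        "model.layers.1.self_attn.qkv_proj",
        "model.layers.1.self_attn.o_proj",
        "model.layers.1.mlp.gate_up_proj",
        "model.layers.1.mlp.down_proj",
    ]
)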

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    save_total_limit=1,
    save_strategy="epoch",
    evaluation_strategy="epoch",
    max_grad_norm=1.0,
    gradient_accumulation_steps=2,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
)


trainer = SFTTrainer(
    model=model,
    train_dataset=dataset_chatml['train'],
    eval_dataset=dataset_chatml['test'],
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=1024,
    tokenizer=tokenizer,
    args=args,
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


Map:   0%|          | 0/2609 [00:00<?, ? examples/s]

Map:   0%|          | 0/138 [00:00<?, ? examples/s]
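Because the adapter targets only four linear layers in each of two decoder layers, the number of trainable parameters is small. The count below is a sketch that assumes the Phi-3-mini dimensions from the public model config (hidden size 3072, MLP intermediate size 8192, fused `qkv_proj` of width 3 x 3072); these values do not come from this notebook. If the adapter is saved in 32-bit floats (also an assumption), the result is consistent with the roughly 3.15M `adapter_model.safetensors` file reported when the adapter is uploaded.

```python
# Assumed Phi-3-mini dimensions (from the public model config, not from this notebook).
hidden_size = 3072
intermediate_size = 8192

r = 8                     # LoRA rank from peft_config
lora_alpha = 32           # LoRA alpha from peft_config
num_layers_adapted = 2    # only model.layers.0 and model.layers.1 are targeted

# (in_features, out_features) of each targeted linear layer.
targets = {
    "self_attn.qkv_proj": (hidden_size, 3 * hidden_size),
    "self_attn.o_proj": (hidden_size, hidden_size),
    "mlp.gate_up_proj": (hidden_size, 2 * intermediate_size),
    "mlp.down_proj": (intermediate_size, hidden_size),
}

# LoRA adds A (r x in) and B (out x r) per layer, i.e. r * (in + out) parameters.
per_layer = sum(r * (n_in + n_out) for n_in, n_out in targets.values())
trainable = per_layer * num_layers_adapted

print("trainable LoRA parameters:", trainable)
print("size at 4 bytes each (MB):", trainable * 4 / 1e6)
print("scaling factor alpha / r:", lora_alpha / r)
```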



In [None]:
# Get model modules
#for name, module in model.named_modules():
#    print(name)

Start training by calling train() on the trainer.

In [None]:
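# SFTTrainer already wrapped the model with the LoRA adapters from peft_config,
# so a second get_peft_model() call is not needed.
trainer.model.config.use_cache = False  # disable the KV cache during training
trainer.train()
trainer.save_model()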







Epoch,Training Loss,Validation Loss
1,No log,No log
2,No log,No log
3,No log,No log
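One plausible reading of the `No log` entries in the Training Loss column is that the run never reaches the first logging step. The arithmetic below is a sketch that assumes a single GPU, no sequence packing, and the `Trainer` default of `logging_steps=500`; it does not explain the Validation Loss column.

```python
import math

train_rows = 2609                  # training examples after the split
per_device_train_batch_size = 16
gradient_accumulation_steps = 2
num_train_epochs = 3
warmup_steps = 500
default_logging_steps = 500        # Trainer default when logging_steps is not set

batches_per_epoch = math.ceil(train_rows / per_device_train_batch_size)
updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
total_updates = num_train_epochs * updates_per_epoch

print("optimizer steps per epoch:", updates_per_epoch)
print("total optimizer steps:", total_updates)
print("first training-loss log reached:", total_updates >= default_logging_steps)
print("warmup finished:", total_updates >= warmup_steps)
```

Under these assumptions the run ends before step 500, so the learning rate is still in its linear warmup and never reaches the configured `5e-5`. Setting a smaller `logging_steps` and `warmup_steps` would be one way to address both.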




Log in to Hugging Face

In [None]:
from huggingface_hub import notebook_login
notebook_login()

In [None]:
trainer.hub_model_id = "acorreal/adapter-phi-3-mini-mental-health"
hf_adapter_repo="acorreal/adapter-phi-3-mini-mental-health"
trainer.push_to_hub(token='TOKEN')



training_args.bin:   0%|          | 0.00/5.30k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/3.15M [00:00<?, ?B/s]

Upload 3 LFS files:   0%|          | 0/3 [00:00<?, ?it/s]

CommitInfo(commit_url='https://huggingface.co/acorreal/adapter-phi-3-mini-mental-health/commit/4ee9aa071457b1473808298ca52053b5585df517', commit_message='End of training', commit_description='', oid='4ee9aa071457b1473808298ca52053b5585df517', pr_url=None, pr_revision=None, pr_num=None)

In [None]:
# Empty VRAM
#del model
#del trainer
#import gc
#gc.collect()
#gc.collect()
#torch.cuda.empty_cache()
#torch.cuda.empty_cache()

Reload the model and upload it to Hugging Face

In [None]:
hf_adapter_repo = "acorreal/adapter-phi-3-mini-mental-health"
model_name, hf_adapter_repo, compute_dtype

('microsoft/Phi-3-mini-4k-instruct',
 'acorreal/adapter-phi-3-mini-mental-health',
 torch.bfloat16)

In [None]:
peft_model_id = hf_adapter_repo
tr_model_id = model_name
model_2 = AutoModelForCausalLM.from_pretrained(tr_model_id, trust_remote_code=True, torch_dtype=compute_dtype)
model_2 = PeftModel.from_pretrained(model_2, peft_model_id)  # attach the adapter to the freshly loaded base model
model_2 = model_2.merge_and_unload()

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_config.json:   0%|          | 0.00/1.11k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/3.15M [00:00<?, ?B/s]
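Conceptually, `merge_and_unload()` folds each low-rank update into the corresponding base weight so that the adapter is no longer needed at inference time. The toy example below illustrates the idea with plain Python lists; all matrices are made up, and this is a sketch of the arithmetic rather than the library's implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(w, delta, scale):
    """Return w + scale * delta, element by element."""
    return [[wi + scale * di for wi, di in zip(wr, dr)] for wr, dr in zip(w, delta)]


# Toy 2x2 base weight and a rank-1 adapter (all values are made up).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]        # shape (out_features, r)
A = [[0.5, -0.5]]         # shape (r, in_features)
r, lora_alpha = 1, 4
scale = lora_alpha / r

# Merging folds the low-rank update into the base weight: W' = W + scale * B @ A
W_merged = add_scaled(W, matmul(B, A), scale)

x = [[3.0], [1.0]]        # one input column vector
y_adapter = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)
y_merged = matmul(W_merged, x)

print(W_merged)               # [[3.0, -2.0], [4.0, -3.0]]
print(y_adapter == y_merged)  # True
```

The merged weight gives the same output as running the base layer and the adapter side by side, which is why the merged model can be saved and served as an ordinary checkpoint.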



In [None]:
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)


In [None]:
hf_model_repo

'acorreal/phi3-mental-health'

In [None]:
merged_model_id = hf_model_repo
model_2.push_to_hub(merged_model_id)  # push the merged model rather than the quantized training model
tokenizer.push_to_hub(merged_model_id)

model.safetensors:   0%|          | 0.00/2.66G [00:00<?, ?B/s]

README.md:   0%|          | 0.00/5.18k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/acorreal/phi3-mental-health/commit/c1e82e2abf4892ba9f4def6c01adaee1d83da95e', commit_message='Upload tokenizer', commit_description='', oid='c1e82e2abf4892ba9f4def6c01adaee1d83da95e', pr_url=None, pr_revision=None, pr_num=None)

Evaluation

Retrieve the model and tokenizer from the Hub.

In [None]:
hf_model_repo, device_map, compute_dtype

('acorreal/phi3-mental-health', {'': 0}, torch.bfloat16)

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed

set_seed(1234)
tokenizer = AutoTokenizer.from_pretrained(hf_model_repo,trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(hf_model_repo, trust_remote_code=True, torch_dtype=compute_dtype, device_map=device_map) # alternatively, torch_dtype="auto" and device_map="cuda"


config.json:   0%|          | 0.00/1.46k [00:00<?, ?B/s]

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


model.safetensors:   0%|          | 0.00/2.66G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/172 [00:00<?, ?B/s]

Reload the dataset

In [None]:
dataset_chatml = dataset.map(create_message_column)
dataset_chatml = dataset_chatml.map(format_dataset_chatml)
# Reuse the training-time seed so the test set is the same held-out split.
dataset_chatml = dataset_chatml.train_test_split(test_size=0.05, seed=1234)
dataset_chatml

DatasetDict({
    train: Dataset({
        features: ['input', 'output', 'instruction', 'messages', 'text'],
        num_rows: 2609
    })
    test: Dataset({
        features: ['input', 'output', 'instruction', 'messages', 'text'],
        num_rows: 138
    })
})

Predict

In [None]:
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

In [None]:
# Print template
pipe.tokenizer.apply_chat_template([{"role": "user", "content": dataset_chatml['test'][0]['messages'][0]['content']}], tokenize=False, add_generation_prompt=True)

"<s><|user|>\nYou are a mental health assistant. Your job is to provide emotional support, actively listen, and offer practical suggestions for well-being. Respond empathically and do not give specific medical advice or diagnoses. Always make sure the user feels heard and supported. If the user mentions suicidal thoughts, encourage them to seek professional help immediately. Here's the conversation so far:\n\n\n Input: i want a secure relationship with someone that wants to be with me and who will actually put effort into it i seem to gravitate toward unavailable men and those that want intimacy and no relationship i let men dictate and control me because they accuse me of being controlling i let men emotionally abuse me and i am at their beck and call i am not comfortable being alone or doing anything by myself i feel i need the security of someone being around just to survive i know what i am doing wrong and i do it anyway just hoping things will change how do i stop this behavior an

In [None]:
def predict(prompt):
    prompt = pipe.tokenizer.apply_chat_template([{"role": "user", "content": prompt}], tokenize=False, add_generation_prompt=True)
    outputs = pipe(prompt, max_new_tokens=256, do_sample=True, num_beams=1, temperature=0.3, top_k=50, top_p=0.95, max_time= 180)
    return outputs[0]['generated_text'][len(prompt):].strip()
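The `temperature=0.3` setting in `predict` makes sampling more conservative than the default of 1.0. The sketch below shows the effect on a made-up set of three token scores; it illustrates the softmax-with-temperature idea only and is not the generation code used by the pipeline.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for temperature in (1.0, 0.3):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

A lower temperature concentrates probability on the highest-scoring token, so responses vary less between runs. `top_k` and `top_p` then further restrict which tokens can be sampled.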

In [None]:
%%time
predict(dataset_chatml['test'][0]['messages'][0]['content'])



CPU times: user 18.8 s, sys: 50.1 ms, total: 18.8 s
Wall time: 19 s


"It sounds like you're going through a really tough time, and it's brave of you to reach out for support. It's clear you're aware that your current patterns aren't serving you well, and that's an important first step. It's okay to feel overwhelmed by these thoughts and behaviors, but remember, you're not alone in this.\n\n\nTo start, it might be helpful to reflect on what you truly want from a relationship. Consider what qualities you value in a partner and how you can communicate these needs in a relationship. It's also beneficial to explore why you feel the need to be with someone who might not be the best for you.\n\n\nBuilding self-esteem and self-reliance can be empowering. You might find it helpful to engage in activities that you enjoy and that make you feel good about yourself. This can also include learning new skills that increase your independence.\n\n\nRemember, it's essential to seek support from a mental health professional who can provide personalized guidance. They can 

In [None]:
rouge_metric = load_metric("rouge", trust_remote_code=True)

Calculate the ROUGE scores

In [None]:
def calculate_rogue(row):
    response = predict(row['messages'][0]['content'])
    result = rouge_metric.compute(predictions=[response], references=[row['output']], use_stemmer=True)
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
    result['response']=response
    return result
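For intuition about what the `rouge1` value measures, the simplified function below computes a unigram-overlap F-measure on whitespace tokens. It is a sketch only: the real metric also applies its own tokenization and, with `use_stemmer=True`, stemming, so its numbers will differ. The function name and example sentences are made up.

```python
from collections import Counter


def rouge1_fmeasure(prediction: str, reference: str) -> float:
    """Unigram-overlap F-measure on whitespace tokens (no stemming)."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_fmeasure("the cat sat", "the cat sat on the mat")
print(round(score * 100, 2))  # 66.67
```

Here precision is 3/3 and recall is 3/6, giving an F-measure of about 0.667. Free-form supportive answers can be appropriate while sharing few exact words with the reference, so ROUGE is at best a rough signal for this task.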

In [None]:
%%time
metricas = dataset_chatml['test'].select(range(0,1)).map(calculate_rogue, batched=False)
print("Rouge 1 Mean: ",np.mean(metricas['rouge1']))
print("Rouge 2 Mean: ",np.mean(metricas['rouge2']))
print("Rouge L Mean: ",np.mean(metricas['rougeL']))
print("Rouge Lsum Mean: ",np.mean(metricas['rougeLsum']))



Map:   0%|          | 0/1 [00:00<?, ? examples/s]

Rouge 1 Mean:  37.65432098765432
Rouge 2 Mean:  3.7267080745341614
Rouge L Mean:  17.901234567901238
Rouge Lsum Mean:  21.60493827160494
CPU times: user 19.2 s, sys: 65.2 ms, total: 19.3 s
Wall time: 19.2 s
