<a href="https://colab.research.google.com/github/schumbar/CMPE297/blob/main/assignment_03/part_f/ShawnChumbar_Assignment03_PartF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 03 Part F - Fine-tuning for Mental Health Chatbot Development

## Assignment Description

F. Mental Health Chatbot:
   - Fine-tune a model (e.g., Microsoft Phi-3) using Unsloth for mental health chatbot development
   - Demonstrate the chatbot's capabilities

### References

1. [Fine-tuning Microsoft PHI3 with Unsloth for Mental Health Chatbot Development](https://medium.com/@mauryaanoop3/fine-tuning-microsoft-phi3-with-unsloth-for-mental-health-chatbot-development-ddea4e0c46e7)

### Install Necessary Libraries

In [8]:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install trl peft accelerate bitsandbytes xformers

Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-eu5n6xb3/unsloth_700175f2f6864b6a9f980b616ef5ce87
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-eu5n6xb3/unsloth_700175f2f6864b6a9f980b616ef5ce87
  Resolved https://github.com/unslothai/unsloth.git to commit 79a2112ca4a775ce0b3cb75f5074136cb54ea6df
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [9]:
!pip install triton



### Import Necessary Libraries

In [10]:
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset
from unsloth import FastLanguageModel

### Load the Phi-3 Model

In [11]:
max_seq_length = 2048  # Choose any length, Unsloth supports RoPE Scaling internally
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-3-mini-4k-instruct",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

==((====))==  Unsloth 2024.9.post4: Fast Mistral patching. Transformers = 4.44.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/2.26G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/194 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/3.34k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/293 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/458 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

### Set up fine-tuning parameters

In [12]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # Choose any number > 0. Suggested 8, 16, 32, 64, 128
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj",],
    lora_alpha=16,
    lora_dropout=0,  # Supports any, but 0 is optimized
    bias="none",  # Supports any, but "none" is optimized
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  # Unsloth also supports rank-stabilized LoRA
    loftq_config=None,  # and LoftQ initialization
)

Unsloth 2024.9.post4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
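
As a sanity check on the adapter size, the trainable-parameter count can be reproduced by hand. The sketch below assumes Phi-3-mini's published dimensions (hidden size 3072, MLP intermediate size 8192, 32 decoder layers) and that the patched model exposes separate q/k/v and gate/up projections, as the `target_modules` list suggests; none of these figures are printed in this notebook. A LoRA adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` parameters.

```python
# Assumed Phi-3-mini dimensions (not taken from this notebook)
HIDDEN = 3072
INTERMEDIATE = 8192
NUM_LAYERS = 32
RANK = 16  # r passed to get_peft_model

# (d_in, d_out) for each targeted projection in one decoder layer
projections = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, r):
    """LoRA adds matrices A (r x d_in) and B (d_out x r) per wrapped layer."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in projections.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} total")
```

Under those assumptions the total comes to 29,884,416, which matches the number of trainable parameters reported in the training banner.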


### Prepare training data

In [13]:
data = load_dataset("heliosbrahma/mental_health_chatbot_dataset")

README.md:   0%|          | 0.00/2.50k [00:00<?, ?B/s]

(…)-00000-of-00001-01391a60ef5c00d9.parquet:   0%|          | 0.00/102k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/172 [00:00<?, ? examples/s]
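
The `train` split is later passed to the trainer with `dataset_text_field="text"`, so each row is expected to carry one conversation in its `text` column. The sketch below assumes the rows follow a `<HUMAN>: ... <ASSISTANT>: ...` layout, as described on the dataset card; this is an assumption, since the notebook never prints a row. The sample string and the `split_turns` helper are hypothetical.

```python
HUMAN_TAG = "<HUMAN>:"
ASSISTANT_TAG = "<ASSISTANT>:"


def split_turns(text):
    """Split one record into (question, answer) under the assumed tag layout."""
    if HUMAN_TAG not in text or ASSISTANT_TAG not in text:
        raise ValueError("record does not use the assumed tag layout")
    human_part, assistant_part = text.split(ASSISTANT_TAG, 1)
    question = human_part.split(HUMAN_TAG, 1)[1].strip()
    return question, assistant_part.strip()


# Hypothetical record, written only to illustrate the assumed layout
sample = "<HUMAN>: What is a panic attack?\n<ASSISTANT>: A sudden episode of intense fear."
question, answer = split_turns(sample)
print(question)
print(answer)
```

With the real data, `split_turns(data["train"][0]["text"])` would be the equivalent call, provided the layout assumption holds.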

### Set up the trainer

In [14]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=data['train'],
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # packing=True can make training 5x faster for short sequences
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

Map (num_proc=2):   0%|          | 0/172 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs
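
With `warmup_steps=5`, `max_steps=60`, and `lr_scheduler_type="linear"`, the learning rate ramps up to `2e-4` over the first five optimizer steps and then decays linearly to zero. The following plain-Python sketch shows that shape. It is written from the usual definition of linear warmup and decay, not copied from the trainer, and `linear_lr` is a hypothetical helper.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=60):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup: scale up from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale down from base_lr to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1, 5, 30, 60):
    print(step, f"{linear_lr(step):.2e}")
```

The peak is reached at step 5, so 55 of the 60 steps run on the decaying part of the schedule.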


### Show current memory stats

In [15]:
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
2.283 GB of memory reserved.
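
This baseline is most useful when compared with the peak reserved memory after training. A minimal sketch of that comparison follows. `memory_report` is a hypothetical helper, and the 4 GiB peak in the usage line is invented for illustration; it is not a measurement from this run. In the notebook, the first argument would be `torch.cuda.max_memory_reserved()` read after `trainer.train()`.

```python
GIB = 1024 ** 3


def memory_report(peak_reserved_bytes, start_gb, total_gb):
    """Summarize peak memory and the share attributable to training."""
    peak_gb = round(peak_reserved_bytes / GIB, 3)
    training_gb = round(peak_gb - start_gb, 3)
    return {
        "peak_gb": peak_gb,
        "training_gb": training_gb,
        "peak_pct": round(peak_gb / total_gb * 100, 1),
        "training_pct": round(training_gb / total_gb * 100, 1),
    }


# Illustrative peak only; start and total are the values printed above
report = memory_report(4 * GIB, start_gb=2.283, total_gb=14.748)
print(report)
```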


### Train the model

In [16]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 172 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 29,884,416


Step,Training Loss
1,1.189
2,0.9741
3,1.3177
4,1.2641
5,1.2559
6,0.9073
7,1.1012
8,1.3368
9,1.3372
10,0.7624


### Print training metrics

In [17]:
print(trainer_stats.metrics)

{'train_runtime': 283.6623, 'train_samples_per_second': 1.692, 'train_steps_per_second': 0.212, 'total_flos': 3466330523873280.0, 'train_loss': 0.9898937493562698, 'epoch': 2.7906976744186047}
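
The reported metrics can be cross-checked against the trainer settings. This sketch uses only numbers already printed in this notebook (batch size 2, gradient accumulation 4, 60 steps, 172 examples, and the runtime above) and recomputes the epoch and throughput figures.

```python
per_device_batch = 2
grad_accum = 4
max_steps = 60
num_examples = 172
train_runtime = 283.6623  # seconds, from trainer_stats.metrics

effective_batch = per_device_batch * grad_accum  # samples per optimizer step
samples_seen = effective_batch * max_steps       # samples processed in total
epochs = samples_seen / num_examples
samples_per_second = samples_seen / train_runtime
steps_per_second = max_steps / train_runtime

print(effective_batch, samples_seen)
print(round(epochs, 4), round(samples_per_second, 3), round(steps_per_second, 3))
```

The results agree with the logged `epoch`, `train_samples_per_second`, and `train_steps_per_second`, so the 60 steps covered a little under three passes over the 172 examples.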


### Inference with fine-tuned model

In [36]:
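def get_prompt(question):
    """Generate a completion for `question` and return the decoded outputs as a list."""
    FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
    # Tokenize the raw question; no chat template is applied here
    inputs = tokenizer([question], return_tensors="pt").to("cuda")
    # Generate at most 200 new tokens, so long answers are cut off mid-sentence
    outputs = model.generate(**inputs, max_new_tokens=200, use_cache=True)
    # Each decoded string contains the prompt followed by the continuation
    decoded_output = tokenizer.batch_decode(outputs)
    return decoded_output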

In [40]:
prompt_template = "What is a panic attack?"
print(get_prompt(prompt_template))

["What is a panic attack?\nA panic attack is a sudden episode of intense fear or anxiety that can cause physical and emotional symptoms. It can be very frightening and overwhelming, but it's important to remember that panic attacks are not life-threatening and can be managed with the right support and treatment.\n\nWhat are the symptoms of a panic attack?\nThe symptoms of a panic attack can vary from person to person, but some common signs include:\n\n- Rapid heart rate\n- Sweating\n- Trembling or shaking\n- Shortness of breath\n- Feeling of choking\n- Chest pain or discomfort\n- Nausea or abdominal pain\n- Dizziness or lightheadedness\n- Feeling of unreality or detachment\n- Fear of losing control or going crazy\n- Fear of dying\n\nHow common are panic"]


In [41]:
prompt_template = "What is a heart attack?"
print(get_prompt(prompt_template))

["What is a heart attack?\nA heart attack, also known as a myocardial infarction, occurs when the blood flow to a part of the heart is blocked, usually by a blood clot. This can damage or destroy part of the heart muscle.\n\nHeart attacks are a leading cause of death worldwide, and they can happen to anyone, regardless of age or lifestyle. However, certain factors can increase the risk of having a heart attack, such as smoking, high blood pressure, high cholesterol, diabetes, obesity, and a sedentary lifestyle.\n\nSymptoms of a heart attack can vary from person to person, but common signs include chest pain or discomfort, shortness of breath, nausea, lightheadedness, and cold sweats. It's important to seek immediate medical attention if you or someone you know is experiencing these symptoms.\n\nPrevention and Tre"]


In [42]:
prompt_template = "What is a nervous breakdown?"
print(get_prompt(prompt_template))

["What is a nervous breakdown?\n\nA nervous breakdown is a term used to describe a severe mental health crisis that can result from overwhelming stress, trauma, or other factors. It is not a medical diagnosis but rather a way to describe a state of extreme emotional distress that can interfere with a person's ability to function in their daily life.\n\nSymptoms of a nervous breakdown can vary from person to person but may include:\n\n1. Intense anxiety or panic attacks\n2. Depression or feelings of hopelessness\n3. Difficulty concentrating or making decisions\n4. Withdrawal from social activities or relationships\n5. Changes in sleep or appetite\n6. Irritability or agitation\n7. Thoughts of self-harm or suicide\n\nIt's important to note that a nervous breakdown is not a permanent condition, and with proper treatment"]
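
In the three outputs above, the model echoes the question, drifts into follow-up headings or questions of its own, and stops mid-sentence at the 200-token limit. One possible mitigation, sketched below, is to prompt in the same layout as the training text and to trim the decoded string afterwards. The `<HUMAN>:`/`<ASSISTANT>:` layout is an assumption about the dataset's `text` field, and `build_prompt` and `extract_answer` are hypothetical helpers that are not part of the notebook.

```python
HUMAN_TAG = "<HUMAN>:"
ASSISTANT_TAG = "<ASSISTANT>:"


def build_prompt(question):
    """Wrap a question in the assumed training layout, ending at the answer slot."""
    return f"{HUMAN_TAG} {question.strip()}\n{ASSISTANT_TAG}"


def extract_answer(decoded, prompt):
    """Return only the first assistant turn from a decoded generation."""
    # Skip the echoed prompt; special tokens such as BOS may precede it
    start = decoded.find(prompt)
    text = decoded[start + len(prompt):] if start != -1 else decoded
    # Cut at any further human turn the model invents
    return text.split(HUMAN_TAG, 1)[0].strip()


prompt = build_prompt("What is a panic attack?")
# Hypothetical decoded string standing in for a real generation
decoded = prompt + " A sudden episode of intense fear.\n<HUMAN>: How common are they?"
print(extract_answer(decoded, prompt))
```

In the notebook, `prompt` would be passed to `get_prompt`, and `extract_answer` applied to the first element of the returned list.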


### Save the fine-tuned model

In [43]:
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/tokenizer.model',
 'lora_model/added_tokens.json',
 'lora_model/tokenizer.json')
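
Because `model` is a LoRA-wrapped model, `save_pretrained` writes the adapter weights and not a merged copy of the base model; the tokenizer files are listed above. A sketch of reloading them in a later session follows. It assumes that Unsloth's `FastLanguageModel.from_pretrained` accepts the adapter directory as `model_name`, which is the pattern used in Unsloth's example notebooks, and that a CUDA GPU is available. `load_finetuned` is a hypothetical wrapper.

```python
def load_finetuned(adapter_dir="lora_model", max_seq_length=2048):
    """Reload the saved LoRA adapter with its base model for inference."""
    # Imported lazily so the function can be defined without a GPU runtime
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=adapter_dir,         # directory written by save_pretrained
        max_seq_length=max_seq_length,  # same value used for training
        dtype=None,                     # auto-detect, as in the loading cell
        load_in_4bit=True,
    )
    FastLanguageModel.for_inference(model)
    return model, tokenizer
```

Calling `model, tokenizer = load_finetuned()` in a fresh GPU session would then stand in for the loading and fine-tuning cells.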