In [1]:
!nvidia-smi 
!lscpu

Fri Apr 12 20:59:21 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla P100-PCIE-16GB           Off | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P0              26W / 250W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                         

In [2]:
%%capture
!pip install -q -U torch=='2.0.0'
!pip install -q -U accelerate=='0.25.0' peft=='0.7.1' bitsandbytes=='0.41.3.post2' transformers=='4.36.1' trl=='0.7.4'
!pip install --upgrade pandas 

In [3]:
!pip install wandb --quiet

In [4]:
import os 
import warnings
import pandas as pd

import wandb
import torch
import transformers
import bitsandbytes
from datasets import Dataset
from huggingface_hub import login
from peft import LoraConfig, PeftConfig, prepare_model_for_kbit_training, get_peft_model
from trl import SFTTrainer
from transformers import (AutoModelForCausalLM,
                         AutoTokenizer,
                         BitsAndBytesConfig,
                         TrainingArguments,
                         pipeline,
                         logging,
                         TrainerCallback)

from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score,
                            confusion_matrix, 
                            classification_report)



In [5]:
os.environ['CUDA_VISIBLE_DEVICES'] = "0"
os.environ['TOKENIZERS_PARALLELISM'] = "false"

In [6]:
warnings.filterwarnings("ignore")

In [7]:
df = pd.read_csv("/kaggle/input/nlp-mental-health-conversations/train.csv")
df.head()

Unnamed: 0,Context,Response
0,I'm going through some things with my feelings...,"If everyone thinks you're worthless, then mayb..."
1,I'm going through some things with my feelings...,"Hello, and thank you for your question and see..."
2,I'm going through some things with my feelings...,First thing I'd suggest is getting the sleep y...
3,I'm going through some things with my feelings...,Therapy is essential for those that are feelin...
4,I'm going through some things with my feelings...,I first want to let you know that you are not ...


In [8]:
df.shape

(3512, 2)

In [9]:
df.Context[13], df.Response[13]

("I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here.\n   I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it.\n   How can I change my feeling of being worthless to everyone?",
 "I'm glad you are interested in changing your feelings of worthlessness. Visiting your doctor is a great first step to address your lack of sleep. First, you want to make sure that you are physically healthy and that there isn't a physical condition causing your problem with sleep.\xa0Changing your feelings is something that is difficult to do, especially on your own. Make sure that you have someone you can trust and you can talk to about how you feel. This can be a friend or a family member. If you find that there isn't anyone you would feel comfortable talking to about this issue, find a therapist close to you who can help.\xa0One activity that people ha

In [10]:
def generate_prompt(datapoint):
    return f"""
            The conversation between a Human and an AI Mental Health Chatbot named Connor.
            [INST] You are a mental health expert. Your task is to generate an appropriate response based on the conversation given in square brackets.
            The tone of your answer should be warm and kind.
            [{datapoint['Context']}][/INST]
            
            {datapoint['Response']}""".strip()

def generate_test_prompt(datapoint):
    return f"""
            The conversation between a Human and an AI Mental Health Chatbot named Connor.
            [INST] You are a mental health expert. Your task is to generate an appropriate response based on the conversation given in square brackets.
            The tone of your answer should be warm and kind.
            [{datapoint['Context']}][/INST]""".strip()
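
The triple-quoted f-strings above carry the source indentation into every prompt, which shows up as runs of spaces in the printed examples further down. A minimal sketch of an alternative that assembles the same wording from separate lines; `build_prompt` and `SYSTEM_LINES` are hypothetical names, not part of the notebook, and switching to this format would change the training text, so the recorded results would no longer apply.

```python
SYSTEM_LINES = [
    "The conversation between a Human and an AI Mental Health Chatbot named Connor.",
    "[INST] You are a mental health expert. Your task is to generate an appropriate "
    "response based on the conversation given in square brackets.",
    "The tone of your answer should be warm and kind.",
]

def build_prompt(context, response=None):
    """Hypothetical helper: same wording as generate_prompt, without leading spaces."""
    lines = SYSTEM_LINES + [f"[{context}][/INST]"]
    prompt = "\n".join(lines)
    # Training prompts append the reference response; test prompts stop at [/INST].
    if response is not None:
        prompt += "\n\n" + response
    return prompt

example = build_prompt("I can't sleep.", "Let's talk about it.")
print(example)
```

Calling `build_prompt(context)` without a response gives the test-time variant that ends at `[/INST]`.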
            

In [11]:
X_train, X_eval = train_test_split(df, test_size=0.2, random_state=42)
X_train.shape, X_eval.shape

((2809, 2), (703, 2))
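
The first five rows of `df` share one `Context` with different responses, so a row-level random split can place the same question in both the training and evaluation sets. A sketch of a context-level split under that assumption; `split_by_context` is a hypothetical helper written with the standard library only (scikit-learn's `GroupShuffleSplit` covers the same need), and the toy data below is invented for illustration.

```python
import random

def split_by_context(contexts, test_size=0.2, seed=42):
    """Return (train_idx, eval_idx) so that no context appears on both sides."""
    unique = sorted(set(contexts))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out a fraction of the distinct contexts, at least one.
    n_eval = max(1, round(len(unique) * test_size))
    eval_contexts = set(unique[:n_eval])
    train_idx = [i for i, c in enumerate(contexts) if c not in eval_contexts]
    eval_idx = [i for i, c in enumerate(contexts) if c in eval_contexts]
    return train_idx, eval_idx

# Toy data: five questions, some with several answers.
contexts = ["q1", "q1", "q2", "q2", "q2", "q3", "q3", "q4", "q5", "q5"]
train_idx, eval_idx = split_by_context(contexts)
print(len(train_idx), len(eval_idx))
```

With the notebook's data this could be applied as `split_by_context(df["Context"].tolist())`, followed by `df.iloc[train_idx]` and `df.iloc[eval_idx]`.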

In [12]:
X_train = pd.DataFrame(X_train.apply(generate_prompt, axis=1), columns=['text'])
X_eval_for_test = pd.DataFrame(X_eval.apply(generate_prompt, axis=1), columns=['text'])
X_eval = pd.DataFrame(X_eval.apply(generate_test_prompt, axis=1),columns=['text'])
X_train.head()

Unnamed: 0,text
2659,The conversation between a Human and an AI Men...
1057,The conversation between a Human and an AI Men...
2111,The conversation between a Human and an AI Men...
1565,The conversation between a Human and an AI Men...
631,The conversation between a Human and an AI Men...


In [13]:
train_data = Dataset.from_pandas(X_train)
eval_data = Dataset.from_pandas(X_eval)

In [14]:
print(X_eval_for_test['text'][0])

The conversation between a Human and an AI Mental Health Chatbot named Connor.
            [INST] You are a mental health expert. Your task is to generate an appropriate response based on the conversation given in square brackets.
            The tone of your answer should be warm and kind.
            [I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here.
   I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it.
   How can I change my feeling of being worthless to everyone?][/INST]
            
            If everyone thinks you're worthless, then maybe you need to find new people to hang out with.Seriously, the social context in which a person lives is a big influence in self-esteem.Otherwise, you can go round and round trying to understand why you're not worthless, then go back to the same crowd and be knocked down again.There are 

In [15]:
print(eval_data[0])

{'text': "The conversation between a Human and an AI Mental Health Chatbot named Connor.\n            [INST] You are a mental health expert. Your task is to generate an appropriate response based on the conversation given in square brackets.\n            The tone of your answer should be warm and kind.\n            [I've hit my head on walls and floors ever since I was young. I sometimes still do it but I don't exactly know why,    I have anxiety and I had a rough childhood but now I'll start to hit my head and sometimes not realize it but I don't know how to stop or even why I'm doing it.    How can I help myself to change my behavior?][/INST]", '__index_level_0__': 495}


In [16]:
model_name = "mistralai/Mistral-7B-Instruct-v0.2"

compute_dtype = torch.float16

bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_use_double_quant = False,
    bnb_4bit_quant_type = 'nf4',
    bnb_4bit_compute_dtype = compute_dtype
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',
    quantization_config=bnb_config,
)

model.config.use_cache = False
model.config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(model_name, 
                                          trust_remote_code=True,
                                          add_eos_token=True,
                                         )
tokenizer.pad_token = tokenizer.eos_token

config.json:   0%|          | 0.00/596 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.46k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/72.0 [00:00<?, ?B/s]
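
The shard sizes in the download log give a rough check on why 4-bit loading fits on the 16 GB P100. A back-of-the-envelope sketch, assuming the checkpoint stores 2 bytes per parameter (16-bit weights) and ignoring quantization constants, layers left unquantized, activations, and optimizer state:

```python
# Shard sizes as reported by the download progress bars, in GB.
shard_sizes_gb = [4.94, 5.00, 4.54]

checkpoint_gb = sum(shard_sizes_gb)
params_billion = checkpoint_gb / 2      # assumption: 2 bytes per parameter
nf4_gb = params_billion * 0.5           # 4 bits = 0.5 bytes per parameter

print(f"checkpoint: {checkpoint_gb:.2f} GB")
print(f"parameters: ~{params_billion:.2f} billion")
print(f"4-bit weights: ~{nf4_gb:.2f} GB")
```

The estimate covers the weights alone; actual GPU usage during training is higher.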

In [17]:
model_hub_id = "samyukthacodes/mental-health-mistral-7b-instructv0.2-finetuned-V2"

In [18]:
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

In [19]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
hf_key = user_secrets.get_secret("huggingface")

In [20]:
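class PeftSavingCallback(TrainerCallback):
    """Save the LoRA adapter at each checkpoint and drop the full-model weights file."""

    def on_save(self, args, state, control, **kwargs):
        checkpoint_path = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        # For a PEFT-wrapped model, save_pretrained writes the adapter weights and config.
        kwargs["model"].save_pretrained(checkpoint_path)

        # Remove the full-model weights file if the Trainer also wrote one.
        if "pytorch_model.bin" in os.listdir(checkpoint_path):
            os.remove(os.path.join(checkpoint_path, "pytorch_model.bin"))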

In [21]:
callbacks = [PeftSavingCallback]

In [22]:
project = "mental-health-finetuned-v2"
base_model_name = "mistral-7b-instruct"
run_name = base_model_name + "-" + project
output_dir = "./" + run_name

In [23]:
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.05,
    r=16,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"]
)

training_arguments = TrainingArguments(
    output_dir=output_dir,
    logging_dir = "logs",
    per_device_train_batch_size=8,
    num_train_epochs=5,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_strategy='epoch',
    logging_steps=5,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=True,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio = 0.05,
    group_by_length=True,
    lr_scheduler_type="cosine",
    report_to="wandb",
    evaluation_strategy="epoch",
    do_eval=True,
    run_name = run_name,
    push_to_hub = True,
    hub_model_id = model_hub_id,
    hub_token=hf_key,
    hub_strategy="checkpoint",
    disable_tqdm=False
)

trainer = SFTTrainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=eval_data,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_arguments,
    callbacks=callbacks,
    packing=False,
    max_seq_length=512)

  0%|          | 0/3 [00:00<?, ?ba/s]

  0%|          | 0/1 [00:00<?, ?ba/s]

In [24]:
# SFTTrainer already applied peft_config, so reuse its wrapped model instead of wrapping again.
model = trainer.model
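
The size of the LoRA adapter follows from `r` and the shapes of the targeted projections. A sketch of the count, assuming Mistral-7B's published dimensions (hidden size 4096, 32 decoder layers, 8 key/value heads of dimension 128, so `v_proj` maps 4096 to 1024); these values are not printed anywhere in the notebook. Calling `model.print_trainable_parameters()` on the PEFT-wrapped model reports the actual figure.

```python
r = 16                    # LoRA rank from peft_config
hidden_size = 4096        # assumed Mistral-7B hidden size
kv_dim = 8 * 128          # assumed: 8 key/value heads x head dimension 128
num_layers = 32           # assumed number of decoder layers

def lora_params(in_features, out_features, rank):
    # LoRA adds A (rank x in_features) and B (out_features x rank) per target module.
    return rank * (in_features + out_features)

# target_modules=["q_proj", "v_proj"]: one q_proj and one v_proj per layer.
per_layer = lora_params(hidden_size, hidden_size, r) + lora_params(hidden_size, kv_dim, r)
total = per_layer * num_layers

print(f"trainable LoRA parameters: {total:,}")
print(f"size at 4 bytes per parameter: {total * 4 / 1e6:.1f} MB")
```

Under these assumptions the result is consistent with the 27.3M size that the download log reports for `adapter_model.safetensors`.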

In [25]:
if torch.cuda.is_available():
    torch.cuda.empty_cache()

In [26]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
wb_key = user_secrets.get_secret("wandb")

In [27]:
wandb.login(key=wb_key)

wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [28]:
trainer.train()

wandb: Currently logged in as: samyukthamanayil (final-project-auto). Use `wandb login --relogin` to force relogin
wandb: wandb version 0.16.6 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.16.1
wandb: Run data is saved locally in /kaggle/working/wandb/run-20240412_210346-eqk1v8gp
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run mistral-7b-instruct-mental-health-finetuned-v2
wandb: ⭐️ View project at https://wandb.ai/final-project-auto/huggingface
wandb: 🚀 View run at https://wandb.ai/final-project-auto/huggingface/runs/eqk1v8gp
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a c

Epoch,Training Loss,Validation Loss
1,1.1616,0.908778
2,1.0808,0.655484
3,0.8594,0.538648
4,0.6891,0.486409
5,0.6936,0.474282


TrainOutput(global_step=1760, training_loss=1.290723816914992, metrics={'train_runtime': 32836.2379, 'train_samples_per_second': 0.428, 'train_steps_per_second': 0.054, 'total_flos': 2.0673369079037952e+17, 'train_loss': 1.290723816914992, 'epoch': 5.0})
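
The `global_step` and throughput figures in `TrainOutput` can be reproduced from the split size and the training arguments. A short worked check, assuming a single GPU and that the last partial batch of each epoch counts as one step:

```python
import math

train_rows = 2809         # rows in the training split
batch_size = 8            # per_device_train_batch_size, single GPU assumed
grad_accum = 1            # gradient_accumulation_steps
epochs = 5                # num_train_epochs
warmup_ratio = 0.05
runtime_s = 32836.2379    # train_runtime from TrainOutput

steps_per_epoch = math.ceil(train_rows / (batch_size * grad_accum))
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)
print(f"{runtime_s / 3600:.1f} hours, {total_steps / runtime_s:.3f} steps/s")
```

The total matches `global_step=1760`, and the steps-per-second value agrees with the reported `train_steps_per_second` after rounding.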

In [29]:
trainer.push_to_hub()
wandb.finish()
model.config.use_cache = True

wandb: Run history:
wandb:                      eval/loss █▄▂▁▁
wandb:                   eval/runtime ▃▅█▁▂
wandb:        eval/samples_per_second ▁▁▁▁▁
wandb:          eval/steps_per_second ▁▁▁▁▁
wandb:                    train/epoch ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
wandb:              train/global_step ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
wandb:            train/learning_rate ▃▆███████▇▇▇▇▇▆▆▆▆▅▅▅▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁
wandb:                     train/loss █▅▃▃▄▄▄▄▄▄▄▂▃▂▃▃▃▃▃▃▂▂▂▂▃▃▃▃▂▁▂▁▂▂▂▂▃▁▁▁
wandb:               train/total_flos ▁
wandb:               train/train_loss ▁
wandb:            train/train_runtime ▁
wandb: train/train_samples_per_second ▁
wandb:   train/train_step

In [30]:
from peft import PeftConfig, PeftModel

In [31]:
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
adapter = "samyukthacodes/mental-health-mistral-7b-instructv0.2-finetuned-V2"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    base_model,
    add_bos_token=True,
    trust_remote_code=True,
    padding_side='left'
)

# Create peft model using base_model and finetuned adapter
config = PeftConfig.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,
                                             load_in_4bit=True,
                                             device_map='auto',
                                             torch_dtype='auto')
model = PeftModel.from_pretrained(model, adapter)

device = "cuda" if torch.cuda.is_available() else "cpu"
# device_map='auto' has already placed the quantized weights, so model.to(device) is not needed.
model.eval()

# Prompt content:
messages = [
    {"role": "user", "content": "Hey Connor! I have been feeling a bit down lately.I could really use some advice on how to feel better?"}
]

input_ids = tokenizer.apply_chat_template(conversation=messages,
                                          tokenize=True,
                                          add_generation_prompt=True,
                                          return_tensors='pt').to(device)
output_ids = model.generate(input_ids=input_ids, max_new_tokens=512, do_sample=True, pad_token_id=tokenizer.eos_token_id)
response = tokenizer.batch_decode(output_ids.detach().cpu().numpy(), skip_special_tokens = True)

# Model response: 
print(response[0])

adapter_config.json:   0%|          | 0.00/587 [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/27.3M [00:00<?, ?B/s]

[INST] Hey Connor! I have been feeling a bit down lately.I could really use some advice on how to feel better? [/INST] Hi! The number of reasons we can feel blue are as numerous as the stars in the night sky. You are the one that would best be able to determine the reason for feeling the way you do, and ultimately, figure out the The One True Answer, that is most likely unique to your situation.The The The The The The The The The The The The The  The  The  The  side of me (the one you are writing to) suggests that oftentime, the reason for feeling the way We (that is, people in general) do feeling sad, is that we have received the message that the way we are going about whatever it is we are doing, is wrong.If that the reason would be the one that you have, then it helps to find the the the the the right people who are able to understand where you are coming from and point you the right way.I think this the result of the The The The The The The The The The The The The The The The The T

In [32]:
!pip install rouge_score

Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... done
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24932 sha256=27b0e69e5ff9f23b14784377424d9e19536a5db755a2dcf469ab0b895e633305
  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge_score
Installing collected packages: rouge_score
Successfully installed rouge_score-0.1.2


In [33]:
from rouge_score import rouge_scorer

In [34]:
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
messages = X_eval['text'][0]

In [35]:
print(messages)


The conversation between a Human and an AI Mental Health Chatbot named Connor.
            [INST] You are a mental health expert. Your task is to generate an appropriate response based on the conversation given in square brackets.
            The tone of your answer should be warm and kind.
            [I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here.
   I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it.
   How can I change my feeling of being worthless to everyone?][/INST]


In [36]:
def generate_response(context):
    # Prompt content:
    messages = [
        {"role":"user", "content": context}
        
    ]

    input_ids = tokenizer.apply_chat_template(conversation=messages,
                                              tokenize=True,
                                              add_generation_prompt=True,
                                              return_tensors='pt').to(device)
    output_ids = model.generate(input_ids=input_ids, max_new_tokens=512, do_sample=True, pad_token_id=tokenizer.eos_token_id)
    response = tokenizer.batch_decode(output_ids.detach().cpu().numpy(), skip_special_tokens = True)

    # Model response: 
    return response[0]


In [37]:
candidate_summary = generate_response(messages)

In [38]:
print(candidate_summary)

[INST] The conversation between a Human and an AI Mental Health Chatbot named Connor.
            [INST] You are a mental health expert. Your task is to generate an appropriate response based on the conversation given in square brackets.
            The tone of your answer should be warm and kind.
            [I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here.
   I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it.
   How can I change my feeling of being worthless to everyone?][/INST] [/INST] Hello. You're asking a very important question. Without a lot of details, it's difficult to know exactly what's going on. First, I want to acknowledge that you're talking about this in a public forum, which takes a lot of courage. Many people struggle with feelings of worthlessness, guilt, anxiety, sadness, fear, and many other challenging f

In [39]:
reference_summary = X_eval_for_test['text'][0]

In [40]:
print(reference_summary)

The conversation between a Human and an AI Mental Health Chatbot named Connor.
            [INST] You are a mental health expert. Your task is to generate an appropriate response based on the conversation given in square brackets.
            The tone of your answer should be warm and kind.
            [I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here.
   I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it.
   How can I change my feeling of being worthless to everyone?][/INST]
            
            If everyone thinks you're worthless, then maybe you need to find new people to hang out with.Seriously, the social context in which a person lives is a big influence in self-esteem.Otherwise, you can go round and round trying to understand why you're not worthless, then go back to the same crowd and be knocked down again.There are 

In [41]:
scores = scorer.score(reference_summary, candidate_summary)
for key in scores:
    print(f'{key}: {scores[key]}')


rouge1: Score(precision=0.40159045725646125, recall=0.7087719298245614, fmeasure=0.5126903553299492)
rouge2: Score(precision=0.250996015936255, recall=0.44366197183098594, fmeasure=0.32061068702290074)
rougeL: Score(precision=0.30218687872763417, recall=0.5333333333333333, fmeasure=0.38578680203045684)
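
Both strings scored above contain the full prompt, so part of the measured overlap comes from the shared instruction text rather than from the generated answer. A sketch of isolating the answer before scoring; `extract_response` is a hypothetical helper that assumes the answer is whatever follows the last `[/INST]` marker, and the two sample strings are shortened stand-ins for the real texts.

```python
def extract_response(text, marker="[/INST]"):
    """Hypothetical helper: return the text after the last marker, or all of it."""
    _, found, tail = text.rpartition(marker)
    return tail.strip() if found else text.strip()

# Shortened stand-ins for candidate_summary and reference_summary.
generated = "[INST] Prompt text [I feel low.][/INST] [/INST] Hello. You're asking an important question."
reference = "Prompt text [I feel low.][/INST]\n\n            Try talking to someone you trust."

print(extract_response(generated))
print(extract_response(reference))
```

The extracted strings could then be passed to `scorer.score(...)` in place of the full texts; the resulting scores would differ from those printed above.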
