### **Installing dependencies**

In [None]:
%%capture
!pip install datasets
!pip install peft
!pip install bitsandbytes
!pip install transformers[torch]
!pip install pandas
!pip install trl
!pip install accelerate
!pip install google
!pip install tensorboard

### **Importing Libraries**

In [None]:
from random import randrange
import numpy as np
import pandas as pd
from datasets import load_dataset, Dataset
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
    AutoModel,
)
from peft import LoraConfig
from trl import SFTTrainer
from sklearn.model_selection import train_test_split
import requests

### **Importing Datasets**

In [None]:
from huggingface_hub import notebook_login
notebook_login()

In [None]:
dataset1 = load_dataset("Amod/mental_health_counseling_conversations")
dataset2 = load_dataset("nbertagnolli/counsel-chat")

Repo card metadata block was not found. Setting CardData to empty.


### **Preprocessing**

In [None]:
df1 = pd.DataFrame(dataset1["train"])
df2 = pd.DataFrame(dataset2["train"])

In [None]:
df1.head(5)

Unnamed: 0,Context,Response
0,I'm going through some things with my feelings...,"If everyone thinks you're worthless, then mayb..."
1,I'm going through some things with my feelings...,"Hello, and thank you for your question and see..."
2,I'm going through some things with my feelings...,First thing I'd suggest is getting the sleep y...
3,I'm going through some things with my feelings...,Therapy is essential for those that are feelin...
4,I'm going through some things with my feelings...,I first want to let you know that you are not ...


In [None]:
df2.head(5)

Unnamed: 0,questionID,questionTitle,questionText,questionLink,topic,therapistInfo,therapistURL,answerText,upvotes,views
0,0,Do I have too many issues for counseling?,I have so many issues to address. I have a his...,https://counselchat.com/questions/do-i-have-to...,depression,Jennifer MolinariHypnotherapist & Licensed Cou...,https://counselchat.com/therapists/jennifer-mo...,It is very common for people to have multiple ...,3,1971
1,0,Do I have too many issues for counseling?,I have so many issues to address. I have a his...,https://counselchat.com/questions/do-i-have-to...,depression,"Jason Lynch, MS, LMHC, LCAC, ADSIndividual & C...",https://counselchat.com/therapists/jason-lynch...,"I've never heard of someone having ""too many i...",2,386
2,0,Do I have too many issues for counseling?,I have so many issues to address. I have a his...,https://counselchat.com/questions/do-i-have-to...,depression,Shakeeta TorresFaith Based Mental Health Couns...,https://counselchat.com/therapists/shakeeta-to...,Absolutely not. I strongly recommending worki...,2,3071
3,0,Do I have too many issues for counseling?,I have so many issues to address. I have a his...,https://counselchat.com/questions/do-i-have-to...,depression,"Noorayne ChevalierMA, RP, CCC, CCAC, LLP (Mich...",https://counselchat.com/therapists/noorayne-ch...,Let me start by saying there are never too man...,2,2643
4,0,Do I have too many issues for counseling?,I have so many issues to address. I have a his...,https://counselchat.com/questions/do-i-have-to...,depression,"Toni Teixeira, LCSWYour road to healing begins...",https://counselchat.com/therapists/toni-teixei...,I just want to acknowledge you for the courage...,1,256


In [None]:
df3 = df2[["questionText", "answerText"]]
df3 = df3.rename(columns={"questionText":"Context", "answerText":"Response"})
df3.head(5)

Unnamed: 0,Context,Response
0,I have so many issues to address. I have a his...,It is very common for people to have multiple ...
1,I have so many issues to address. I have a his...,"I've never heard of someone having ""too many i..."
2,I have so many issues to address. I have a his...,Absolutely not. I strongly recommending worki...
3,I have so many issues to address. I have a his...,Let me start by saying there are never too man...
4,I have so many issues to address. I have a his...,I just want to acknowledge you for the courage...


In [None]:
final_df = pd.concat([df3, df1], axis=0)
final_df["instructions"] = '''Given the Patient's Context, provide Response that has a diagnosis of the Patient'''
final_df.head()

Unnamed: 0,Context,Response,instructions
0,I have so many issues to address. I have a his...,It is very common for people to have multiple ...,"Given the Patient's Context, provide Response ..."
1,I have so many issues to address. I have a his...,"I've never heard of someone having ""too many i...","Given the Patient's Context, provide Response ..."
2,I have so many issues to address. I have a his...,Absolutely not. I strongly recommending worki...,"Given the Patient's Context, provide Response ..."
3,I have so many issues to address. I have a his...,Let me start by saying there are never too man...,"Given the Patient's Context, provide Response ..."
4,I have so many issues to address. I have a his...,I just want to acknowledge you for the courage...,"Given the Patient's Context, provide Response ..."


In [None]:
print(f"Final Length of the dataframe: {len(final_df)}")

Final Length of the dataframe: 6287


In [None]:
train_df, test_df = train_test_split(final_df, test_size=0.2, random_state=42)

In [None]:
print(f"Length of training set: {len(train_df)}")
print(f"Length of testing set: {len(test_df)}")

Length of training set: 5029
Length of testing set: 1258
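
The split sizes printed above can be reproduced by hand. As far as scikit-learn's rounding goes, a fractional `test_size` is turned into a row count by taking the ceiling of `test_size * n_samples`, and the training set receives the remainder. A minimal sketch of that arithmetic, where the helper name `split_sizes` is made up for illustration:

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size."""
    # Test rows are rounded up; the training set gets whatever is left
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

n_train, n_test = split_sizes(6287, 0.2)
print(n_train, n_test)
```

For the 6287 rows in `final_df` this gives the same 5029 / 1258 split that the notebook reports.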


In [None]:
conversation_train = Dataset.from_pandas(train_df)
conversation_test = Dataset.from_pandas(test_df)

In [None]:
print(conversation_train)
print(conversation_test)

Dataset({
    features: ['Context', 'Response', 'instructions', '__index_level_0__'],
    num_rows: 5029
})
Dataset({
    features: ['Context', 'Response', 'instructions', '__index_level_0__'],
    num_rows: 1258
})


## **Fine-tuning Llama-3**

### **Data Formatting using Alpaca Format for Llama-3**
Recent reports in the literature and individual experiments suggest that Llama-3, unlike Llama-2, is sensitive to the format of the data and the prompt. We therefore keep the Alpaca format that was used with Llama-2.

In [None]:
def format_prompt(sample):
    return f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{sample["instructions"]}

### Input:
{sample["Context"]}

### Response:
{sample["Response"]}
"""

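To check what a single training example looks like after formatting, the template can be applied to one row and printed. The sketch below repeats the template so that it runs on its own; the sample row is invented purely for illustration and does not come from either dataset.

```python
def format_prompt(sample):
    # Same Alpaca-style template as in the cell above
    return f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{sample["instructions"]}

### Input:
{sample["Context"]}

### Response:
{sample["Response"]}
"""

# Hypothetical row, made up for this preview
sample = {
    "instructions": "Given the Patient's Context, provide Response that has a diagnosis of the Patient",
    "Context": "I have trouble sleeping before exams.",
    "Response": "It sounds like exam stress is affecting your sleep.",
}

prompt = format_prompt(sample)
print(prompt)

# The three section markers should appear in a fixed order
markers = ["### Instruction:", "### Input:", "### Response:"]
positions = [prompt.index(m) for m in markers]
print(positions == sorted(positions))
```

At inference time the same template is used without the response text, so the model continues from the `### Response:` marker.
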
### **Loading the Llama-3 8B Model using Quantization**

In [None]:
model_name = "meta-llama/Meta-Llama-3-8B"
use_flash_attention = False

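bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,                    # store the linear-layer weights in 4 bits
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_quant_type ="nf4",             # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the matrix multiplications
)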

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    use_cache = False,
    use_flash_attention_2 = use_flash_attention,
    device_map="auto",
    torch_dtype=torch.float16,
)

model.config.pretraining_tp=1

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=8,
    bias="none",
    task_type="CAUSAL_LM",
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
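
With `r=8` and `lora_alpha=16`, the low-rank update is scaled by `lora_alpha / r = 2` before it is added to the output of the frozen layer. The following is a toy, pure-Python sketch of that idea on tiny matrices. It is not PEFT's implementation; the helper names (`matvec`, `lora_forward`) and all numbers are assumptions chosen for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, lora_alpha, r):
    """Frozen output W @ x plus the scaled low-rank update B @ (A @ x)."""
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]

# Toy shapes: 2 inputs, 2 outputs, rank 1
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen weight (2 x 2)
A = [[1.0, 1.0]]               # down-projection (r x in = 1 x 2)
B = [[0.5], [0.0]]             # up-projection (out x r = 2 x 1)
x = [1.0, 2.0]

y = lora_forward(W, A, B, x, lora_alpha=2, r=1)
print(y)

# Scaling factor for the configuration used in this notebook
notebook_scaling = 16 / 8
print(notebook_scaling)
```

Only `A` and `B` would receive gradients in such a setup; `W` stays frozen, which is why the trainable parameter count stays small.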

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


### **Comparing Trainable Parameters with Total Parameters in the Quantized Model**

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

print_trainable_parameters(model)

trainable params: 3407872 || all params: 4544008192 || trainable%: 0.07499704789264605
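
Both numbers in this output can be reconstructed by hand, which also explains why "all params" is about 4.5 billion rather than 8 billion. The sketch below rests on assumptions that are not stated in the notebook: the layer dimensions are those of the public Llama-3-8B configuration, PEFT's default LoRA targets for Llama are taken to be `q_proj` and `v_proj`, and bitsandbytes is taken to pack two 4-bit weights into one stored element, so `numel()` reports half of the quantized linear weights.

```python
# Assumed Llama-3-8B dimensions (from the public model config, not from this notebook)
hidden = 4096
kv_dim = 1024          # 8 key/value heads x 128 head dimension
intermediate = 14336
vocab = 128256
layers = 32
r = 8                  # LoRA rank from peft_config

# LoRA adds A (r x in) and B (out x r) to each targeted projection.
# Assumption: the default targets are q_proj and v_proj.
lora_q = r * (hidden + hidden)
lora_v = r * (hidden + kv_dim)
trainable = layers * (lora_q + lora_v)

# Dense weights per decoder layer
attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # q_proj, o_proj + k_proj, v_proj
mlp = 3 * hidden * intermediate                    # gate_proj, up_proj, down_proj
linear_total = layers * (attn + mlp)

# 4-bit weights are packed two per stored element, so numel() sees half of them
packed_linear = linear_total // 2

embeddings = 2 * vocab * hidden                    # embed_tokens + lm_head (left unquantized)
norms = layers * 2 * hidden + hidden               # RMSNorm weights

all_params = packed_linear + embeddings + norms + trainable
print(trainable, all_params, 100 * trainable / all_params)
```

Under these assumptions the sketch arrives at the same 3407872 trainable and 4544008192 total parameters that `print_trainable_parameters` reports. Measured against the unpacked parameter count, the trainable share would be smaller still.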


### **Training**

In [None]:
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="/llama-3",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    logging_steps=10,
    save_strategy="no",
    learning_rate=2e-4,
    fp16=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="linear",
    report_to="tensorboard",
)
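
These arguments imply an effective batch size of `4 * 2 = 8` sequences per optimizer step (assuming a single device) and a learning rate that warms up linearly and then decays linearly to zero. A minimal sketch of such a schedule follows; the function name `linear_schedule_lr` is made up for illustration, and the total of 1224 steps is the step count reported by the training run in this notebook rather than something these arguments fix on their own.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=2e-4, warmup_ratio=0.03):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

total_steps = 1224        # optimizer steps reported by the training run
effective_batch = 4 * 2   # per_device_train_batch_size * gradient_accumulation_steps
warmup_steps = math.ceil(total_steps * 0.03)

print(effective_batch, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, linear_schedule_lr(step, total_steps))
```

The peak learning rate of `2e-4` is reached at the end of warmup and falls to zero at the final step.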

In [None]:
from trl import SFTTrainer

max_seq_length = 512
trainer=SFTTrainer(
    model=model,
    train_dataset=conversation_train,
    peft_config=peft_config,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    packing=True,
    formatting_func=format_prompt,
    args=args,
)


In [None]:
trainer.train()



Step,Training Loss
10,2.5216
20,2.4702
30,2.4674
40,2.2155
50,2.0902
60,2.0874
70,2.022
80,2.1111
90,2.0754
100,1.9404


TrainOutput(global_step=1224, training_loss=1.9061264941115785, metrics={'train_runtime': 17523.9096, 'train_samples_per_second': 0.559, 'train_steps_per_second': 0.07, 'total_flos': 2.2585833145604506e+17, 'train_loss': 1.9061264941115785, 'epoch': 3.0})
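
The figures in `TrainOutput` can be related back to the training configuration with a little arithmetic. This is a rough sketch that assumes a single device, that packing fills every sequence to `max_seq_length` tokens, and that a partial final batch can be ignored; the derived values are therefore approximations.

```python
global_step = 1224          # from TrainOutput
epochs = 3
per_device_batch = 4
grad_accum = 2
max_seq_length = 512
train_rows = 5029           # rows in the training split
train_runtime = 17523.9096  # seconds, from TrainOutput

steps_per_epoch = global_step // epochs
sequences_per_epoch = steps_per_epoch * per_device_batch * grad_accum
tokens_per_epoch = sequences_per_epoch * max_seq_length
avg_tokens_per_row = tokens_per_epoch / train_rows
seconds_per_step = train_runtime / global_step

print(steps_per_epoch, sequences_per_epoch, tokens_per_epoch)
print(round(avg_tokens_per_row, 1), round(seconds_per_step, 2))
```

Because `packing=True` concatenates formatted examples into fixed-length sequences, the roughly 3264 packed sequences per epoch are fewer than the 5029 rows in the training split.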

## **Uploading the model to HuggingFace**

In [None]:
from huggingface_hub import notebook_login
notebook_login()

In [None]:
trainer.model.push_to_hub(repo_id="omertafveez/Llama-3-TherapyChatBot")

CommitInfo(commit_url='https://huggingface.co/omertafveez/Llama-3-TherapyChatBot/commit/814c0c1ac54fc3f8cb21364049cb42b4c72f6ad3', commit_message='Upload model', commit_description='', oid='814c0c1ac54fc3f8cb21364049cb42b4c72f6ad3', pr_url=None, pr_revision=None, pr_num=None)

In [None]:
model_name = "meta-llama/Meta-Llama-3-8B"
use_flash_attention = False

bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type ="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

In [None]:
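from transformers import AutoModelForCausalLM

# Load the base model together with its language-modeling head. A bare AutoModel
# omits lm_head, and the adapter was trained on the causal-LM model.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    use_cache = False,
    use_flash_attention_2 = False,
    torch_dtype=torch.float16,
)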

`low_cpu_mem_usage` was None, now set to True since model is quantized.


In [None]:
from peft import PeftModel

adapter_model = PeftModel.from_pretrained(model, "omertafveez/Llama-3-TherapyChatBot")

In [None]:
model2 = adapter_model.merge_and_unload()

In [None]:
model2.push_to_hub(repo_id="omertafveez/Llama-3-TherapyChatBot")

CommitInfo(commit_url='https://huggingface.co/omertafveez/Llama-3-TherapyChatBot/commit/8f6ea6a97828d217c90bbf021239ad34e00881ce', commit_message='Upload model', commit_description='', oid='8f6ea6a97828d217c90bbf021239ad34e00881ce', pr_url=None, pr_revision=None, pr_num=None)

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.push_to_hub("omertafveez/Llama-3-TherapyChatBot")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


CommitInfo(commit_url='https://huggingface.co/omertafveez/Llama-3-TherapyChatBot/commit/864f8130394847be5413977280b08edc43130c21', commit_message='Upload tokenizer', commit_description='', oid='864f8130394847be5413977280b08edc43130c21', pr_url=None, pr_revision=None, pr_num=None)

## **Inference**

In [None]:
model_name = "omertafveez/Llama-3-TherapyChatBot"
use_flash_attention = False

bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type ="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

In [None]:
from transformers import LlamaForCausalLM
model = LlamaForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    use_cache = False,
    use_flash_attention_2 = use_flash_attention,
    device_map="auto",
    torch_dtype=torch.float16,
)

In [None]:
tokenizer = AutoTokenizer.from_pretrained("omertafveez/Llama-3-TherapyChatBot")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
def format_inference_prompt(instruction, context):
    return f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{context}

### Response:
"""

instruction = "How can I address feelings of worthlessness?"
context = "I feel sad all the time. Am I worthless?"

formatted_prompt = format_inference_prompt(instruction, context)

# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

# Pass the attention mask along with the input IDs and set the pad token explicitly
generate_ids = model.generate(
    **inputs,
    max_length=512,  # Adjust max_length as needed
    pad_token_id=tokenizer.eos_token_id,
)

response = tokenizer.decode(generate_ids[0], skip_special_tokens=True)

# Keep only the text that follows the last response marker
response_start_idx = response.rfind("### Response:\n") + len("### Response:\n")
actual_response = response[response_start_idx:].strip()

print(actual_response)



I'm so sorry you're feeling sad and worthless.  I'm sure you've been told by others that you are worthless.  I'm not sure what you've done to make them feel this way.  I'm sure there's a reason.  But, even if you've done something terrible, you are not worthless.  You are worthy of love and forgiveness.  I would encourage you to find a therapist to work through the issues that are causing you to feel sad and worthless.  It sounds like you've been hurting for a while.  It's time to get some help.  I wish you the best.
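
The response extraction above relies on `rfind`, which returns `-1` when the marker is missing; the slice then silently drops the first characters of the text. A slightly more defensive variant is sketched below using `str.rpartition`. The helper name `extract_response` and the example strings are made up for illustration.

```python
RESPONSE_MARKER = "### Response:"

def extract_response(generated_text, marker=RESPONSE_MARKER):
    """Return the text after the last marker, or the whole text if the marker is absent."""
    _, found, tail = generated_text.rpartition(marker)
    if not found:
        # No marker in the output: fall back to the full text
        return generated_text.strip()
    return tail.strip()

decoded = (
    "### Instruction:\nHelp\n\n"
    "### Input:\nI feel sad.\n\n"
    "### Response:\nYou are not alone."
)
print(extract_response(decoded))
print(extract_response("no marker here"))
```

Splitting on the last occurrence of the marker keeps the behaviour of the original `rfind` call when the prompt text itself happens to contain the marker.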
