In [32]:
!pip install -q evaluate 
!pip install -q rouge-score

In [1]:
import pandas as pd
import re 
import evaluate

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

from evaluate_model.generate_predictions import run_generation

In [3]:
# Load the model
tokenizer = AutoTokenizer.from_pretrained("jeffreykthomas/llama-mental-health")
model = AutoModelForCausalLM.from_pretrained("jeffreykthomas/llama-mental-health")

In [26]:
# Testing prompt
user_token = '<user>'
agent_token = '<agent>'
prompt = user_token+' Ive been feeling down for weeks now, but I guess its just my dysthymia acting up again. Ill try to keep a positive attitude, but its hard some days. '+agent_token
inputs = tokenizer(prompt, return_tensors="pt")

# Generate
generate_ids = model.generate(inputs.input_ids, max_new_tokens=100)
output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


In [27]:
output

"<user> Ive been feeling down for weeks now, but I guess its just my dysthymia acting up again. Ill try to keep a positive attitude, but its hard some days. <agent> I'm sorry to hear that you're feeling down. It's normal to feel down. It's normal to feel down because we all feel down at times. It's normal to feel down because we all have a part to play in our lives. It's normal to feel down because we all have a part to play in our communities. It's normal to feel down because we all have a part to play in our communities. It's normal"

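The response above loops on variations of a single sentence. One rough way to quantify that kind of degeneration is the share of n-grams that repeat an earlier n-gram. The sketch below is only an illustration: `repeated_ngram_fraction` is a hypothetical helper (it is not part of this notebook or of any library), and the lowercase whitespace tokenization is an assumption.

```python
from collections import Counter


def repeated_ngram_fraction(text: str, n: int = 4) -> float:
    """Return the share of n-grams that duplicate an earlier n-gram (hypothetical helper)."""
    tokens = text.lower().split()  # assumption: plain whitespace tokenization
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


looping = "It's normal to feel down. " * 5
varied = "I am sorry you feel low; talking to someone you trust may help."
print(repeated_ngram_fraction(looping))
print(repeated_ngram_fraction(varied))
```

A high score on generated text suggests trying generation settings such as `no_repeat_ngram_size` or `repetition_penalty` in `model.generate`.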
In [115]:
# Load all CSV files into a single DataFrame
import glob

files = glob.glob('data/datasets/RogueEval/*.csv')
dfs = [pd.read_csv(file) for file in files]
df = pd.concat(dfs, ignore_index=True)

In [129]:
# Removes the system instruction from the text
def remove_sys_instr(text):
    text = text.strip()
    text = text.replace("\n", "")
    clean = re.sub(r'<<SYS>>(.*?)<</SYS>>', '', text)
    return clean

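# Extracts the instruction as input
def extract_input(text):
    start = '[INST]'
    end = '[/INST]'
    idx1 = text.index(start)
    idx2 = text.index(end)
    # Skip the tag plus the character after it; named user_input to avoid shadowing the built-in input()
    user_input = text[idx1 + len(start) + 1: idx2]
    return user_input.strip()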

# Extracts the output as reference
def extract_reference(text):
    reference = text.replace(extract_input(text), "")
    reference = reference.replace('[INST]', "")
    reference = reference.replace('[/INST]', "")
    return reference.strip()

def calculate_predictions(text):
    user_token = '<user> '
    agent_token = ' <agent>'
    text = user_token + text + agent_token
    inputs = tokenizer(text, return_tensors="pt")
    # Generate, passing the attention mask and pad token explicitly
    generate_ids = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=100,
    )
    output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    return output

def clean_predictions(text):
    # Remove the echoed user input, including the <user> and <agent> markers
    text = re.sub(r'<user>(.*?)<agent>', "", text, flags=re.DOTALL)
    return text

In [116]:
# Separate the inputs and references from the raw text
df['text'] = df['text'].apply(remove_sys_instr)
df['Input'] = df['text'].apply(extract_input)
df['Reference'] = df['text'].apply(extract_reference)
df.head()

Unnamed: 0,text,Input,Reference
0,[INST] I can't decide what to wear today. I n...,I can't decide what to wear today. I need you ...,"As an expert in mental health, I would like to..."
1,"[INST] Please help me with this task, I canno...","Please help me with this task, I cannot do it ...",Thank you for reaching out for support. It tak...
2,"[INST] Hey there, I'm really struggling with ...","Hey there, I'm really struggling with my addic...","Hello there! As an expert in mental health, I'..."
3,"[INST] Hi, I'm struggling with my addiction a...","Hi, I'm struggling with my addiction and feel ...",the following is a response to the user messag...
4,"[INST] Hi, I'm really struggling with my addi...","Hi, I'm really struggling with my addiction ri...",Sorry to hear that you're struggling with addi...
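To see what the parsing helpers do to a single row, the sketch below walks one synthetic string through the same steps. The sample text is invented for illustration (it is not taken from the dataset), and the reference extraction is a simplified variant that just takes everything after the closing `[/INST]` tag rather than the notebook's replace-based approach.

```python
import re

# Synthetic sample in the Llama-2 chat layout the helpers expect (not from the dataset)
sample = (
    "[INST] <<SYS>>You are a helpful assistant.<</SYS>>\n"
    "I feel anxious before exams. [/INST] "
    "That sounds stressful. Let's talk about it."
)

# Same steps as remove_sys_instr: strip, drop newlines, remove the system block
text = sample.strip().replace("\n", "")
text = re.sub(r"<<SYS>>(.*?)<</SYS>>", "", text)

# Same idea as extract_input: take what sits between the instruction tags
start, end = "[INST]", "[/INST]"
user_input = text[text.index(start) + len(start):text.index(end)].strip()

# Simplified stand-in for extract_reference: everything after the closing tag
reference = text[text.index(end) + len(end):].strip()

print(user_input)
print(reference)
```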


In [None]:
# Generate Predictions based on Inputs
import warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df['Predictions'] = df['Input'].apply(calculate_predictions)

df.head()

In [131]:
df['Predictions'] = df['Predictions'].apply(lambda x: clean_predictions(x))

In [139]:
df.Predictions[0]

" I'm sorry to hear that you are struggling with this. I'm glad you are reaching out for help. I'm glad you are willing to try on clothes that you may not have been able to find on your own. I think it would be helpful to have a conversation with you about what you are looking for and what you are willing to do to find it. I think it would be helpful to have a conversation with you about what you are looking for and what you are willing"

In [132]:
df.to_csv('predictions.csv', index=False)

In [2]:
df = pd.read_csv('data/datasets/RogueEval/predictions.csv')
# Sample 100 rows (no random_state is set, so the sample differs between runs)
df = df.sample(100)
# Reset the index
df.reset_index(drop=True, inplace=True)

In [3]:
# Generate Predictions based on Inputs with a reference model
reference_model = 'Qwen/Qwen1.5-0.5B-Chat'
reference_predictions = run_generation(df['Input'].tolist(), num_batches=25, batch_size=4, model_path=reference_model)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
100%|██████████| 25/25 [01:40<00:00,  4.01s/it]


In [4]:
rouge = evaluate.load('rouge')
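
For intuition about what the `rouge` metric loaded above measures, here is a minimal sketch of ROUGE-N as a clipped n-gram overlap F1 score. It is a simplification: `rouge_n_f1` is a hypothetical helper, it tokenizes on whitespace, and the real `rouge_score` backend uses its own tokenizer and aggregation, so its numbers will not match this exactly.

```python
from collections import Counter


def rouge_n_f1(prediction: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1: clipped n-gram overlap on lowercase whitespace tokens."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred, ref = ngrams(prediction), ngrams(reference)
    overlap = sum((pred & ref).values())  # counts shared by both, clipped to the minimum
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


pred = "the cat sat on the mat"
ref = "the cat lay on the mat"
print(rouge_n_f1(pred, ref, n=1))  # unigram overlap (ROUGE-1 style)
print(rouge_n_f1(pred, ref, n=2))  # bigram overlap (ROUGE-2 style)
```

ROUGE-2 is stricter than ROUGE-1 because a single substituted word breaks two bigrams, which is why the `rouge2` column is usually the lowest.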

In [5]:
reference_results = rouge.compute(predictions=reference_predictions, references=df['Reference'])

In [6]:
reference_results = [reference_results]
results_df = pd.DataFrame(reference_results)
results_df['model'] = 'Qwen1.5-0.5B-Chat'
results_df.set_index('model', inplace=True)
results_df

model,rouge1,rouge2,rougeL,rougeLsum
Qwen1.5-0.5B-Chat,0.329147,0.091519,0.158278,0.197932


In [7]:
fine_tune_model = 'jeffreykthomas/llama-mental-health'
fine_tune_predictions = run_generation(df['Input'].tolist(), num_batches=100, batch_size=1, model_path=fine_tune_model)

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
100%|██████████| 100/100 [08:31<00:00,  5.12s/it]


In [8]:
# Compute ROUGE scores for the fine-tuned (non-distilled) model
results = rouge.compute(predictions=fine_tune_predictions, references=df['Reference'])

In [9]:
new_data = pd.DataFrame([results]) 
new_data.index = ['llama-mental-health']  
results_df = pd.concat([results_df, new_data], axis=0) 
results_df

model,rouge1,rouge2,rougeL,rougeLsum
Qwen1.5-0.5B-Chat,0.329147,0.091519,0.158278,0.197932
llama-mental-health,0.243432,0.032763,0.117626,0.117248


In [10]:
# Generate predictions with the distilled model
distilled_model_path = 'jeffreykthomas/llama-mental-health-distill'

In [11]:
distilled_predictions = run_generation(df['Input'].tolist(), num_batches=100, batch_size=1, model_path=distilled_model_path)

Truncated response detected, retrying...
(message repeated 25 times over the run; intermediate progress-bar updates omitted)

100%|██████████| 100/100 [02:39<00:00,  1.59s/it]


In [12]:
distill_results = rouge.compute(predictions=distilled_predictions, references=df['Reference'])

In [13]:
new_data = pd.DataFrame([distill_results]) 
new_data.index = ['llama-mental-health-distill']  
results_df = pd.concat([results_df, new_data], axis=0) 
results_df

model,rouge1,rouge2,rougeL,rougeLsum
Qwen1.5-0.5B-Chat,0.329147,0.091519,0.158278,0.197932
llama-mental-health,0.243432,0.032763,0.117626,0.117248
llama-mental-health-distill,0.254868,0.046444,0.123899,0.124033
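
The gaps between rows are easier to read as relative changes. The sketch below copies the scores from the table above and compares the distilled model against the fine-tuned one; `relative_change` and the `scores` dictionary are illustrative names, not part of the notebook.

```python
# Scores copied from the results table above
scores = {
    "Qwen1.5-0.5B-Chat": {
        "rouge1": 0.329147, "rouge2": 0.091519, "rougeL": 0.158278, "rougeLsum": 0.197932,
    },
    "llama-mental-health": {
        "rouge1": 0.243432, "rouge2": 0.032763, "rougeL": 0.117626, "rougeLsum": 0.117248,
    },
    "llama-mental-health-distill": {
        "rouge1": 0.254868, "rouge2": 0.046444, "rougeL": 0.123899, "rougeLsum": 0.124033,
    },
}


def relative_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


# Distilled model versus the fine-tuned model, metric by metric
base = scores["llama-mental-health"]
for metric, value in scores["llama-mental-health-distill"].items():
    print(f"{metric}: {relative_change(value, base[metric]):+.1f}%")

# Model with the highest ROUGE-1 score
best = max(scores, key=lambda name: scores[name]["rouge1"])
print(best)
```

On these numbers the distilled model is slightly ahead of the fine-tuned model on every metric, with the largest relative gain on ROUGE-2 (roughly 42%), while both remain below the Qwen baseline.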


In [16]:
for i in range(10):
    print(f"Input: {df['Input'][i]}")
    print("\n")
    print(f"Fine Tune Predictions: {fine_tune_predictions[i]}")
    print("\n")
    print(f"Distilled Predictions: {distilled_predictions[i]}")
    print("\n\n")

Input: I don't need your help, I can do everything on my own. You're just trying to control me like everyone else.


Fine Tune Predictions: Your accusation is completely false and your emotions are being stirred up by the guilt you feel for having hurt my feelings. It's not abuse if it is expressed in anger. I think that the only way you can be safe is to seek counseling. A lot of people have reported similar situations where their boyfriend has been abusive to them because they were afraid of their boyfriend and feeling unable to leave him. When your boyfriend demonstrates his power and control over you, it is very difficult to trust yourself. This is why counseling would be a great place to begin. In order to work through some of these fears and insecurities you might want to bring up this with your boyfriend about the situation. This may not be the first time you've experienced something like this. Perhaps he was the one who threatened you or hurt you in the past. In any case, talki