In [2]:
!pip install fuzzywuzzy

Collecting fuzzywuzzy
  Downloading fuzzywuzzy-0.18.0-py2.py3-none-any.whl.metadata (4.9 kB)
Downloading fuzzywuzzy-0.18.0-py2.py3-none-any.whl (18 kB)
Installing collected packages: fuzzywuzzy
Successfully installed fuzzywuzzy-0.18.0


In [3]:
!pip install rouge

Collecting rouge
  Downloading rouge-1.0.1-py3-none-any.whl.metadata (4.1 kB)
Downloading rouge-1.0.1-py3-none-any.whl (13 kB)
Installing collected packages: rouge
Successfully installed rouge-1.0.1


This script takes human-written and LLM-generated (GPT-4) comments posted in response to a submission and compares the LLM comments against the human comments for suicide-related subreddit content. It measures string similarity between each pair using ROUGE-L, BLEU, and fuzzy string matching (FuzzyWuzzy).
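To make the ROUGE-L metric named above concrete, here is a minimal pure-Python sketch of the idea behind it: the F-measure of the longest common subsequence (LCS) between two token sequences. The function names `lcs_length` and `rouge_l_f1` and the two sentences are made up for illustration. The `rouge` package has its own implementation that differs in details, so its numbers need not match this sketch exactly.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best subsequence that ends before both tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(hypothesis, reference):
    """Simplified ROUGE-L F1 on whitespace tokens (illustrative only)."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp or not ref:
        return 0.0
    lcs = lcs_length(hyp, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)  # share of hypothesis tokens in the LCS
    recall = lcs / len(ref)     # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


# Made-up example pair: the human comment is the reference,
# the LLM comment is the hypothesis.
human = "thank you for sharing this with us"
llm = "thank you so much for sharing this"
score = rouge_l_f1(llm, human)
print(round(score, 4))
```

Because the LCS only requires tokens to appear in the same order, not contiguously, ROUGE-L tolerates inserted words such as "so much" better than n-gram metrics do.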

In [4]:
import pandas as pd
from fuzzywuzzy import fuzz
from rouge import Rouge
from nltk.translate.bleu_score import sentence_bleu
from google.colab import files


human_file_path = "/content/2024_suicide_human_final_fixed.csv"
llm_file_path = "/content/2024_suicide_LLM_final_fixed.csv"
output_file_name = "2024_suicide_humanllmcomparison_results.csv"

human_data = pd.read_csv(human_file_path)
llm_data = pd.read_csv(llm_file_path)

human_comments = human_data['content_comment'].astype(str).tolist()
llm_comments = llm_data['content_comment'].astype(str).tolist()
human_thread_ids = human_data['thread_id'].astype(str).tolist()
llm_thread_ids = llm_data['thread_id'].astype(str).tolist()


rouge = Rouge()

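# ROUGE-L scores: get_scores(hypothesis, reference), with the LLM comment as the hypothesis.
# Rows are paired by position, so both CSVs must list threads in the same order.
rouge_scores = [rouge.get_scores(llm, human, avg=True)['rouge-l'] for llm, human in zip(llm_comments, human_comments)]

# BLEU scores: the human comment is the single reference; tokens are split on whitespace.
# No smoothing is applied, so a pair with no shared 2-, 3-, or 4-gram scores (near) zero and triggers a warning.
bleu_scores = [sentence_bleu([human.split()], llm.split()) for human, llm in zip(human_comments, llm_comments)]

# FuzzyWuzzy similarity: character-level ratio on a 0-100 scale
fuzzy_scores = [fuzz.ratio(llm, human) for llm, human in zip(llm_comments, human_comments)]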

# DataFrame for results
result_df = pd.DataFrame({
    "human_thread_id": human_thread_ids,  # Thread ID for human data
    "human_comments": human_comments,
    "llm_thread_id": llm_thread_ids,      # Thread ID for LLM data
    "llm_comments": llm_comments,
    "rouge_l": [score['f'] for score in rouge_scores],
    "bleu_score": bleu_scores,
    "fuzzy_similarity": fuzzy_scores,
})

result_df.to_csv(f"/content/{output_file_name}", index=False)





The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
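
The warnings above arise because BLEU is a geometric mean of 1- to 4-gram precisions, so a single order with zero matches drives the whole score to zero. The following pure-Python sketch illustrates that behaviour and one possible remedy, replacing a zero match count with a small `epsilon`, which is loosely modelled on the simplest smoothing option in NLTK. The names `ngram_counts`, `modified_precisions`, `bleu`, and `epsilon`, and the example sentences, are assumptions made for this illustration and are not part of the original script.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precisions(hypothesis, reference, max_n=4):
    """Return (clipped matches, hypothesis n-gram total) for n = 1..max_n."""
    stats = []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(count, ref[gram]) for gram, count in hyp.items())
        stats.append((matches, max(sum(hyp.values()), 1)))
    return stats


def bleu(hypothesis, reference, epsilon=0.0, max_n=4):
    """Single-reference sentence BLEU; epsilon > 0 smooths zero counts."""
    if not hypothesis:
        return 0.0
    stats = modified_precisions(hypothesis, reference, max_n)
    if stats[0][0] == 0:
        return 0.0  # no shared words at all
    precisions = []
    for matches, total in stats:
        if matches == 0:
            if epsilon == 0.0:
                return 0.0  # unsmoothed: one empty order zeroes the score
            matches = epsilon
        precisions.append(matches / total)
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty for hypotheses shorter than the reference.
    h, r = len(hypothesis), len(reference)
    penalty = 1.0 if h > r else math.exp(1 - r / h)
    return penalty * math.exp(log_mean)


# Made-up example pair with shared 1-, 2-, and 3-grams but no shared 4-gram.
reference = "thank you for sharing this with us".split()
hypothesis = "thank you so much for sharing this".split()
plain = bleu(hypothesis, reference)
smoothed = bleu(hypothesis, reference, epsilon=0.1)
print(plain, round(smoothed, 4))
```

In the notebook itself, the analogous change would presumably be to pass `smoothing_function=SmoothingFunction().method1` (imported from `nltk.translate.bleu_score`) to `sentence_bleu`. Doing so changes the reported BLEU values, so smoothed and unsmoothed results should not be mixed in one comparison.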
