### Info
- Date: 2024-05-05
- Author: Reshama S
- Location: https://github.com/NoLaB-Lab/nlp-project1

### Description
- Evaluate human (clinician) vs. AI (Whisper) transcripts

### ROUGE score
- A ROUGE score close to zero indicates poor similarity between the candidate and the references.
- A ROUGE score close to one indicates strong similarity between the candidate and the references.
- If the candidate is identical to one of the reference documents, the score is 1.
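
To make the 0-to-1 scale concrete, here is a minimal, self-contained sketch of a ROUGE-1 style F1 score based on unigram overlap. It is not the Hugging Face `rouge` metric used later in this notebook: the helper names `simple_tokens` and `rouge1_f1` are hypothetical, and the tokenization (lowercase, alphanumeric runs only) is an assumption chosen to be punctuation-insensitive, which is consistent with the `test1` and `test2` runs scoring 1.0 after commas and periods were removed.

```python
import re
from collections import Counter


def simple_tokens(text):
    # Assumption: lowercase the text and keep only alphanumeric runs,
    # so punctuation and capitalization do not affect the score.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate, reference):
    # Count unigrams in each text and measure their clipped overlap.
    cand = Counter(simple_tokens(candidate))
    ref = Counter(simple_tokens(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Identical texts score 1.0.
print(rouge1_f1("Well, we went to Italy.", "Well, we went to Italy."))
# Removing punctuation does not change the tokens, so the score stays 1.0.
print(rouge1_f1("Well we went to Italy", "Well, we went to Italy."))
# No shared words gives 0.0.
print(rouge1_f1("completely different words", "Well, we went to Italy."))
```

Because this sketch ignores punctuation entirely, it cannot detect the comma and period differences that the Levenshtein measures pick up.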

### Levenshtein score
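- Reference: https://rapidfuzz.github.io/Levenshtein/levenshtein.html#distance
- The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other; 0 means the strings are identical.
- The normalized distance used below divides the distance by the length of the longer transcript, so it lies between 0 and 1.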
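
The sketch below is a pure-Python illustration of the quantities this notebook prints; it is not the `Levenshtein` package's implementation. The function names are hypothetical. The `indel_ratio` function rests on an assumption taken from my reading of the library documentation: that `ratio` is a normalized insert/delete similarity, `1 - indel_distance / (len1 + len2)`, rather than `1 - normalized Levenshtein distance`. That would explain why the printed ratio and normalized distance do not sum to 1.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    # Length of the longest common subsequence of a and b.
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    # Same normalization as in this notebook: divide by the longer length.
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def indel_ratio(a, b):
    # Assumed definition of the library's ratio: only insertions and
    # deletions are allowed, so a substitution effectively costs 2.
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    indel = total - 2 * lcs_length(a, b)
    return 1 - indel / total


print(levenshtein("kitten", "sitting"))  # 3
print(normalized_distance("kitten", "sitting"))
print(indel_ratio("kitten", "sitting"))
```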

In [1]:
import evaluate
import pprint
from Levenshtein import distance
from Levenshtein import ratio

In [2]:
dir_human = "../data/transcripts-clinician/"
dir_ai = "../data/transcripts-whisper/"

dict_scores = {}

In [3]:
#Ref: https://towardsdatascience.com/side-by-side-comparison-of-strings-in-python-b9491ac858

import difflib
import re

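def tokenize(s):
    # Split on runs of whitespace (raw string avoids an invalid-escape warning)
    return re.split(r'\s+', s)

def untokenize(ts):
    # Rejoin tokens with single spaces
    return ' '.join(ts)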
        
def equalize(s1, s2):
    l1 = tokenize(s1)
    l2 = tokenize(s2)
    res1 = []
    res2 = []
    prev = difflib.Match(0,0,0)
    for match in difflib.SequenceMatcher(a=l1, b=l2).get_matching_blocks():
        if (prev.a + prev.size != match.a):
            for i in range(prev.a + prev.size, match.a):
                res2 += ['_' * len(l1[i])]
            res1 += l1[prev.a + prev.size:match.a]
        if (prev.b + prev.size != match.b):
            for i in range(prev.b + prev.size, match.b):
                res1 += ['_' * len(l2[i])]
            res2 += l2[prev.b + prev.size:match.b]
        res1 += l1[match.a:match.a+match.size]
        res2 += l2[match.b:match.b+match.size]
        prev = match
    return untokenize(res1), untokenize(res2)

def insert_newlines(string, every=64, window=10):
    result = []
    from_string = string
    while len(from_string) > 0:
        cut_off = every
        if len(from_string) > every:
            while (from_string[cut_off-1] != ' ') and (cut_off > (every-window)):
                cut_off -= 1
        else:
            cut_off = len(from_string)
        part = from_string[:cut_off]
        result += [part]
        from_string = from_string[cut_off:]
    return result

def show_comparison(s1, s2, width=40, margin=10, sidebyside=True, compact=False):
    s1, s2 = equalize(s1,s2)

    if sidebyside:
        s1 = insert_newlines(s1, width, margin)
        s2 = insert_newlines(s2, width, margin)
        if compact:
            for i in range(0, len(s1)):
                lft = re.sub(' +', ' ', s1[i].replace('_', '')).ljust(width)
                rgt = re.sub(' +', ' ', s2[i].replace('_', '')).ljust(width) 
                print(lft + ' | ' + rgt + ' | ')        
        else:
            for i in range(0, len(s1)):
                lft = s1[i].ljust(width)
                rgt = s2[i].ljust(width)
                print(lft + ' | ' + rgt + ' | ')
    else:
        print(s1)
        print(s2)

In [4]:
def readtext(filename):
    # Read the clinician (human) transcript
    with open(dir_human + filename + ".txt", "r") as file:
        content_human = file.read()

    # Read the Whisper (AI) transcript with the same base name
    with open(dir_ai + filename + ".txt", "r") as file_ai:
        content_ai = file_ai.read()

    return filename, content_human, content_ai

In [5]:
def evaltext(filename, content_human, content_ai):
    # load the metric (from Hugging Face)
    score = evaluate.load('rouge')
    #score = evaluate.load("accuracy") # this gives error

    # ROUGE: AI transcript is the prediction, human transcript is the reference
    results = score.compute(predictions=[content_ai],
                            references=[content_human])
    print(results)
    dict_scores[filename] = results

    # Levenshtein distance (number of character edits); computed once and reused
    levenshtein_distance = distance(content_human, content_ai)
    print(f"Levenshtein disagreement: {levenshtein_distance}")

    ratiov = ratio(content_human, content_ai)
    print(f"Levenshtein ratio: {ratiov}")

    # Calculate normalized distance (between 0 and 1)
    print(f"Levenshtein distance: {levenshtein_distance}")
    sentence_length = max(len(content_human), len(content_ai))
    normalized_distance = levenshtein_distance / sentence_length

    print(f"Normalized Levenshtein distance: {normalized_distance}")

    print('-' * 52)


In [6]:
def comptext(filename, content_human, content_ai):
    show_comparison(content_human, content_ai, width=50, sidebyside=True, compact=False)

In [7]:
def runanalysis(patientnum):
    print(patientnum)
    filename, content_human, content_ai = readtext(patientnum)
    evaltext(filename, content_human, content_ai)
    comptext(filename, content_human, content_ai)


In [8]:
runanalysis("AJ_IMG_3334")

AJ_IMG_3334
{'rouge1': 0.912751677852349, 'rouge2': 0.8224719101123596, 'rougeL': 0.9038031319910516, 'rougeLsum': 0.9038031319910516}
Levenshtein disagreement: 156
Levensshtein ratio: 0.9153952843273232
Levenshtein distance: 156
Normalized Levenshtein distance: 0.14156079854809436
--------------------------------------------------
I want you to tell me about any kind of trip that  | I want you to tell me about any kind of trip that  | 
you've taken. I know you've traveled a lot. So     | you've taken. I know you've traveled a lot. So     | 
tell me a little bit about, you know, a trip that  | tell me a little bit about, you know, a trip that  | 
you enjoyed. Well, we went to Italy. I enjoyed     | you enjoyed. Well, we went to Italy. I enjoyed     | 
Italy, enjoyed the food. I enjoyed the wine _____  | Italy, enjoyed the food. I enjoyed the ____ wine,  | 
and the women are fantastic. Um its ____ not the   | and the women are fantastic. __ ___ It's not the   | 
safest place in the worl

In [9]:
# highest score
runanalysis("RF_IMG_3241")

RF_IMG_3241
{'rouge1': 0.9210526315789473, 'rouge2': 0.847682119205298, 'rougeL': 0.9210526315789473, 'rougeLsum': 0.9144736842105264}
Levenshtein disagreement: 81
Levensshtein ratio: 0.9309878213802436
Levenshtein distance: 81
Normalized Levenshtein distance: 0.10857908847184987
--------------------------------------------------
 I'm also going to ask you to tell me a story      |  I'm also going to ask you to tell me a story      | 
about a bad, hopefully not too traumatic, but      | about a bad, hopefully not too traumatic, but      | 
like a bad childhood memory _______ if you have    | like a bad childhood ______ memory, if you have    | 
any ____ or adolescent ____________ something      | ___ any, or __________ adolescence, something      | 
that happened that wasn't too great ______ ___ I   | that happened that wasn't ___ _____ great. Oh, I   | 
remember when I was very young. well ______ _____  | remember when I was very ______ ____ young, well,  | 
not very _____ maybe about

In [10]:
# lowest score
runanalysis("SS_IMG_2863")

SS_IMG_2863
{'rouge1': 0.7400881057268723, 'rouge2': 0.5866666666666667, 'rougeL': 0.7312775330396477, 'rougeLsum': 0.7312775330396477}
Levenshtein disagreement: 208
Levensshtein ratio: 0.7803521779425394
Levenshtein distance: 208
Normalized Levenshtein distance: 0.3382113821138211
--------------------------------------------------
Now One  ___ ___ more thing I'm going ____ ___     | ___ ___  And one more thing ___ _____ I'll ask     | 
___ to have you tell me _____ is a childhood       | you to ____ ___ tell me about is a childhood       | 
memory that wasn't good, that was sad, ____ _ ___  | memory that wasn't good, ____ ___ ____ like a sad  | 
___ ___ _____ hopefully not too dramatic Yeah      | or, you know, hopefully not too ________ ____      | 
__________ ___ ____ _ ___ ______ ____ __________   | traumatic, but like a bad memory from childhood.   | 
_____ I have ___ _ ______ fairly good, you know,   | Yeah. I ____ had a pretty fairly good, you know,   | 
no issues that I can rea

In [12]:
# Baseline: both texts are the same
runanalysis("test")

test
{'rouge1': 1.0, 'rouge2': 1.0, 'rougeL': 1.0, 'rougeLsum': 1.0}
Levenshtein disagreement: 0
Levensshtein ratio: 1.0
Levenshtein distance: 0
Normalized Levenshtein distance: 0.0
--------------------------------------------------
I want you to tell me about any kind of trip that  | I want you to tell me about any kind of trip that  | 
you've taken. I know you've traveled a lot. So     | you've taken. I know you've traveled a lot. So     | 
tell me a little bit about, you know, a trip that  | tell me a little bit about, you know, a trip that  | 
you enjoyed. Well, we went to Italy. I enjoyed     | you enjoyed. Well, we went to Italy. I enjoyed     | 
Italy, enjoyed the food. I enjoyed the wine and    | Italy, enjoyed the food. I enjoyed the wine and    | 
the women are fantastic. Um its not the safest     | the women are fantastic. Um its not the safest     | 
place in the world but you kind of live with, you  | place in the world but you kind of live with, you  | 
kind you live with

In [13]:
# remove commas
runanalysis("test1")

test1
{'rouge1': 1.0, 'rouge2': 1.0, 'rougeL': 1.0, 'rougeLsum': 1.0}
Levenshtein disagreement: 6
Levensshtein ratio: 0.9955089820359282
Levenshtein distance: 6
Normalized Levenshtein distance: 0.00894187779433681
--------------------------------------------------
I want you to tell me about any kind of trip that  | I want you to tell me about any kind of trip that  | 
you've taken. I know you've traveled a lot. So     | you've taken. I know you've traveled a lot. So     | 
tell me a little bit about, _____ you know, ____   | tell me a little bit ______ about you _____ know   | 
a trip that you enjoyed. Well, we went to Italy.   | a trip that you enjoyed. Well, we went to Italy.   | 
I enjoyed Italy, _____ enjoyed the food. I         | I enjoyed ______ Italy enjoyed the food. I         | 
enjoyed the wine and the women are fantastic. Um   | enjoyed the wine and the women are fantastic. Um   | 
its not the safest place in the world but you      | its not the safest place in the world bu

In [15]:
# remove all periods
runanalysis("test2")

test2
{'rouge1': 1.0, 'rouge2': 1.0, 'rougeL': 1.0, 'rougeLsum': 1.0}
Levenshtein disagreement: 22
Levensshtein ratio: 0.9833333333333333
Levenshtein distance: 22
Normalized Levenshtein distance: 0.03278688524590164
--------------------------------------------------
I want you to tell me about any kind of trip that  | I want you to tell me about any kind of trip that  | 
you've taken. _____ I know you've traveled a lot.  | you've ______ taken I know you've traveled a lot.  | 
So tell me a little bit about, _____ you know,     | So tell me a little bit ______ about you _____     | 
____ a trip that you enjoyed. _______ Well, we     | know a trip that you ________ enjoyed Well, we     | 
went to Italy. _____ I enjoyed Italy, _____        | went to ______ Italy I enjoyed ______ Italy        | 
enjoyed the food. ____ I enjoyed the wine and the  | enjoyed the _____ food I enjoyed the wine and the  | 
women are fantastic. _________ Um its not the      | women are __________ fantastic Um its 

In [17]:
# testing types of words
runanalysis("test3")

test3
{'rouge1': 0.9444444444444444, 'rouge2': 0.9428571428571428, 'rougeL': 0.9444444444444444, 'rougeLsum': 0.9444444444444444}
Levenshtein disagreement: 10
Levensshtein ratio: 0.9691211401425178
Levenshtein distance: 10
Normalized Levenshtein distance: 0.04672897196261682
--------------------------------------------------
two-thirds ___ vs 2/3. Capital Letters vs capital  | __________ 2/3 vs 2/3. Capital Letters vs capital  | 
letters. Misspelled words vs mispelled words.      | letters. Misspelled words vs mispelled words.      | 
Three-quarters vs three quarters. Ten vs 10.       | Three-quarters vs three quarters. Ten vs 10.       | 
Eleven vs 11. Italy vs italy Ummm vs Punction      | Eleven vs 11. Italy vs italy Ummm vs Punction      | 
included. vs no punctuation                        | included. vs no punctuation                        | 


In [None]:
# evaltext() needs the transcript contents as well as the name,
# so read each pair with readtext() first.
patients = [
    "AJ_IMG_3334", "AJ_IMG_3335", "AP_IMG_3383", "AP_IMG_3384", "BM_IMG_3480",
    "BM_IMG_3481", "MW_IMG_3200", "MW_IMG_3201", "PG_IMG_3189", "PG_IMG_3190",
    "RF_IMG_3240", "RF_IMG_3241", "SS_IMG_2862", "SS_IMG_2863",
]

for patient in patients:
    evaltext(*readtext(patient))

In [None]:
#dict_scores
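
One possible way to inspect `dict_scores` once it is populated is to average each ROUGE metric and rank the transcripts. The sketch below is only an illustration: `summarize` and `rank_by` are hypothetical helpers, and `dict_scores_example` holds just the three results printed earlier in this notebook, not the full set of recordings.

```python
from statistics import mean

# Scores copied from the outputs shown above (three recordings only).
dict_scores_example = {
    "AJ_IMG_3334": {"rouge1": 0.912751677852349, "rouge2": 0.8224719101123596,
                    "rougeL": 0.9038031319910516, "rougeLsum": 0.9038031319910516},
    "RF_IMG_3241": {"rouge1": 0.9210526315789473, "rouge2": 0.847682119205298,
                    "rougeL": 0.9210526315789473, "rougeLsum": 0.9144736842105264},
    "SS_IMG_2863": {"rouge1": 0.7400881057268723, "rouge2": 0.5866666666666667,
                    "rougeL": 0.7312775330396477, "rougeLsum": 0.7312775330396477},
}


def summarize(scores):
    # Mean of each metric across all transcripts.
    metrics = next(iter(scores.values())).keys()
    return {m: mean(s[m] for s in scores.values()) for m in metrics}


def rank_by(scores, metric="rouge1"):
    # Transcript names ordered from lowest to highest score.
    return sorted(scores, key=lambda name: scores[name][metric])


print(summarize(dict_scores_example))
print(rank_by(dict_scores_example))
```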