### Info
- Updated: 2024-05-15
- Author: Reshama S
- Location: https://github.com/NoLaB-Lab/nlp-project1

### Description
- Evaluate human vs. AI vs. Assembly AI transcripts

### ROUGE score
- A ROUGE score close to zero indicates poor similarity between the candidate and the references.
- A ROUGE score close to one indicates strong similarity between the candidate and the references.
- If the candidate is identical to one of the reference documents, the score is 1.
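
To make the 0-to-1 range concrete, here is a minimal sketch of a ROUGE-1 style F1 score based on plain unigram overlap. It is an illustration only: the function name `rouge1_f1` is hypothetical, and the Hugging Face `rouge` metric used in this notebook applies its own tokenization, so its values may differ from this simplified version.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a token counts only as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Identical candidate and reference
print(rouge1_f1("thank god it was not too high", "thank god it was not too high"))
# No tokens in common
print(rouge1_f1("falling out of a tree", "completely unrelated words"))
# Partial overlap
print(rouge1_f1("I do remember falling", "I remember falling"))
```

The first call returns 1.0, the second 0.0, and the third a value in between, which mirrors the three bullets above.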

### Levenshtein score
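- The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other; 0 means the strings are identical.
- Reference: https://rapidfuzz.github.io/Levenshtein/levenshtein.html#distance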
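
As a rough illustration of what the distance measures, the sketch below computes it with a standard dynamic-programming table in pure Python. This is not the `Levenshtein` package's implementation (which is compiled and much faster); the function names `levenshtein_distance` and `normalized_distance` are chosen here for the example.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    # previous[j] is the distance between the prefix of a seen so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Distance divided by the length of the longer string (0 = identical)."""
    longest = max(len(a), len(b))
    return levenshtein_distance(a, b) / longest if longest else 0.0


print(levenshtein_distance("wasnt", "wasn't"))    # one inserted apostrophe -> 1
print(levenshtein_distance("kitten", "sitting"))  # classic example -> 3
print(normalized_distance("wasnt", "wasn't"))
```

Because the comparison is character-based, differences in punctuation and capitalization (for example `god` vs `God`) count as edits just like misheard words do.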

#### Assembly AI
https://www.assemblyai.com/playground/playground/transcript/14cf0430-9281-4ea2-9569-019be2d715af


In [1]:
import evaluate
import pprint
from Levenshtein import distance
from Levenshtein import ratio

In [2]:
dir_human = "../data/transcripts-clinician/"
dir_ai = "../data/transcripts-whisper/"

dict_scores = {}

In [3]:
#Ref: https://towardsdatascience.com/side-by-side-comparison-of-strings-in-python-b9491ac858

import difflib
import re

def tokenize(s):
    # Raw string avoids an invalid-escape warning for \s
    return re.split(r'\s+', s)
def untokenize(ts):
    return ' '.join(ts)
        
def equalize(s1, s2):
    l1 = tokenize(s1)
    l2 = tokenize(s2)
    res1 = []
    res2 = []
    prev = difflib.Match(0,0,0)
    for match in difflib.SequenceMatcher(a=l1, b=l2).get_matching_blocks():
        if (prev.a + prev.size != match.a):
            for i in range(prev.a + prev.size, match.a):
                res2 += ['_' * len(l1[i])]
            res1 += l1[prev.a + prev.size:match.a]
        if (prev.b + prev.size != match.b):
            for i in range(prev.b + prev.size, match.b):
                res1 += ['_' * len(l2[i])]
            res2 += l2[prev.b + prev.size:match.b]
        res1 += l1[match.a:match.a+match.size]
        res2 += l2[match.b:match.b+match.size]
        prev = match
    return untokenize(res1), untokenize(res2)

def insert_newlines(string, every=64, window=10):
    result = []
    from_string = string
    while len(from_string) > 0:
        cut_off = every
        if len(from_string) > every:
            while (from_string[cut_off-1] != ' ') and (cut_off > (every-window)):
                cut_off -= 1
        else:
            cut_off = len(from_string)
        part = from_string[:cut_off]
        result += [part]
        from_string = from_string[cut_off:]
    return result

def show_comparison(s1, s2, width=40, margin=10, sidebyside=True, compact=False):
    s1, s2 = equalize(s1,s2)

    if sidebyside:
        s1 = insert_newlines(s1, width, margin)
        s2 = insert_newlines(s2, width, margin)
        if compact:
            for i in range(0, len(s1)):
                lft = re.sub(' +', ' ', s1[i].replace('_', '')).ljust(width)
                rgt = re.sub(' +', ' ', s2[i].replace('_', '')).ljust(width) 
                print(lft + ' | ' + rgt + ' | ')        
        else:
            for i in range(0, len(s1)):
                lft = s1[i].ljust(width)
                rgt = s2[i].ljust(width)
                print(lft + ' | ' + rgt + ' | ')
    else:
        print(s1)
        print(s2)
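
The `equalize` function above relies on `difflib.SequenceMatcher.get_matching_blocks()`, which returns `(a, b, size)` triples describing runs of tokens that are identical in both sequences. The following small sketch, using a made-up pair of token lists, shows what those blocks look like and how the gaps between them correspond to the words that get padded with underscores.

```python
import difflib

# Hypothetical example tokens, loosely based on the transcripts compared below
human_tokens = "Thank god it wasnt too high".split()
ai_tokens = "Thank God it wasn't too high".split()

matcher = difflib.SequenceMatcher(a=human_tokens, b=ai_tokens)
blocks = [tuple(block) for block in matcher.get_matching_blocks()]
print(blocks)  # the final (len(a), len(b), 0) entry is a sentinel

# Collect the token runs that fall between consecutive matching blocks.
prev_a = prev_b = 0
differences = []
for a_start, b_start, size in matcher.get_matching_blocks():
    if a_start > prev_a or b_start > prev_b:
        differences.append((human_tokens[prev_a:a_start], ai_tokens[prev_b:b_start]))
    prev_a, prev_b = a_start + size, b_start + size

for human_part, ai_part in differences:
    print(human_part, "<->", ai_part)
```

Matching is done on whole whitespace-separated tokens, so `god` and `God` are treated as different words even though they differ by a single character.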

In [4]:
def readtext(patientnum, filename1, filename2):
    """Read the human and AI transcripts and return them with the patient ID."""
    filename = patientnum
    # Added 2024-05-15: both transcripts are read from the test-data directory;
    # these local names shadow the module-level dir_human and dir_ai.
    dir_human = "../data/test-data/"
    dir_ai = "../data/test-data/"

    # Context managers close the files even if reading fails.
    with open(dir_human + filename1 + ".txt", "r") as file_human:
        content_human = file_human.read()

    with open(dir_ai + filename2 + ".txt", "r") as file_ai:
        content_ai = file_ai.read()

    return filename, content_human, content_ai

In [5]:
def evaltext(filename, content_human, content_ai):
    # Load the ROUGE metric (from Hugging Face)
    score = evaluate.load('rouge')
    #score = evaluate.load("accuracy") # this gives error

    results = score.compute(predictions=[content_ai],
                            references=[content_human])
    print(results)
    dict_scores[filename] = results

    # Raw edit distance, computed once and reported under both labels
    levenshtein_distance = distance(content_human, content_ai)
    print(f"Levenshtein disagreement: {levenshtein_distance}")

    ratiov = ratio(content_human, content_ai)
    print(f"Levenshtein ratio: {ratiov}")

    # Calculate normalized distance (between 0 and 1)
    print(f"Levenshtein distance: {levenshtein_distance}")
    sentence_length = max(len(content_human), len(content_ai))
    # Guard against division by zero when both transcripts are empty
    normalized_distance = levenshtein_distance / sentence_length if sentence_length else 0.0

    print(f"Normalized Levenshtein distance: {normalized_distance}")

    print('-' * 52)
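
Note that the ratio and the normalized distance printed by `evaltext` are not complements of each other. According to the linked Levenshtein documentation, `ratio` is a normalized indel similarity: it allows only insertions and deletions and divides by the sum of both lengths, whereas the normalized distance above also allows substitutions and divides by the longer length. The sketch below reproduces that definition in pure Python via the longest common subsequence; the helper names `lcs_length` and `indel_ratio` are illustrative and not part of the package.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def indel_ratio(a: str, b: str) -> float:
    """1 - indel_distance / (len(a) + len(b)); 1.0 means identical strings."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    # Every character outside the common subsequence must be inserted or deleted.
    indel_distance = total - 2 * lcs_length(a, b)
    return 1 - indel_distance / total


print(indel_ratio("wasnt", "wasn't"))
print(indel_ratio("kitten", "sitting"))
```

This is why a ratio of about 0.82 can appear alongside a normalized distance of about 0.28 for the same pair of transcripts: the two numbers use different edit operations and different denominators.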


In [6]:
def comptext(filename, content_human, content_ai):
    show_comparison(content_human, content_ai, width=50, sidebyside=True, compact=False)

In [7]:
def runanalysis(patientnum, filename1, filename2, printcomp):
    print(patientnum)
    filename, content_human, content_ai = readtext(patientnum, filename1, filename2)
    evaltext(filename, content_human, content_ai)
    if printcomp == 1:
        comptext(filename, content_human, content_ai)


## Compare human vs Assembly AI

In [8]:
# lowest score
runanalysis("SS_IMG_2863", "SS_IMG_2863_human_pt", "SS_IMG_2863_assemblyai_pt", printcomp=1)

SS_IMG_2863
{'rouge1': 0.7891156462585033, 'rouge2': 0.6758620689655173, 'rougeL': 0.7891156462585033, 'rougeLsum': 0.7891156462585033}
Levenshtein disagreement: 103
Levenshtein ratio: 0.8237037037037037
Levenshtein distance: 103
Normalized Levenshtein distance: 0.28065395095367845
----------------------------------------------------
Yeah _____ I have fairly ___ _ _______ ______      | ____ Yeah, I ____ ______ had a pretty. Fairly      | 
good, you know, no issues that I can really        | good, you know, no issues that I can really        | 
remeber but _________ ___ I do remember falling    | _______ ___ remember. But I do remember falling    | 
out of a tree. Thank god ___ it wasnt ______ too   | out of a tree. Thank ___ God it _____ wasn't too   | 
high Reaching _____ ___ _____ ________ for the     | ____ ________ high, you know, reaching for the     | 
next one and the next one andall ____ ___ ____     | next one and the next ___ ______ one, and then     | 
___ of a sudden _____

In [9]:
# lowest score
runanalysis("SS_IMG_2863", "SS_IMG_2863_human_pt", "SS_IMG_2863_whisperai_pt", printcomp=1)

SS_IMG_2863
{'rouge1': 0.8079470198675496, 'rouge2': 0.697986577181208, 'rougeL': 0.8079470198675496, 'rougeLsum': 0.8079470198675496}
Levenshtein disagreement: 102
Levenshtein ratio: 0.833810888252149
Levenshtein distance: 102
Normalized Levenshtein distance: 0.26153846153846155
----------------------------------------------------
Yeah _____ I have ___ _ ______ fairly good, you    | ____ Yeah. I ____ had a pretty fairly good, you    | 
know, no issues that I can really remeber          | know, no issues that I can really _______          | 
_________ but I do remember falling out of a       | remember, but I do remember falling out of a       | 
tree. Thank god ___ it wasnt ______ too high       | tree. Thank ___ God it _____ wasn't too ____       | 
Reaching _____ ___ _____ ________ for the next     | ________ high. You know, reaching for the next     | 
one and the next one andall ____ ___ ____ ___ of   | one and the next ___ ______ one, and then all of   | 
a sudden ______ _______