# Evaluation of the Enriched ToposText Annotation in Book 4

We performed an evaluation of the enriched ToposText annotation by calculating Precision, Recall, and F1 Score. By combining the results of the ToposText annotation with the output of the Flair ner-large system, the Recall of the ToposText annotation significantly improved from 0.671 to 0.968, accompanied by a substantial reduction in false negatives from 635 to 61. On the other hand, the Precision exhibited a slight decrease from 0.991 to 0.949, leading to an increase in false positives from 11 to 99.

Overall, there is a notable enhancement in the annotation quality, as evidenced by the F1 Score which progressed from 0.800 to 0.958.

In [1]:
import pandas as pd

In [2]:
## open the Gold Standard of Book 4 (18,664 rows)
GoldStandard_Book4 = pd.read_excel("/Users/u0154817/OneDrive - KU Leuven/Documents/KU Leuven/PhD project 'Greek Spaces in Roman Times'/Data_Extraction/Outputs/1.4.GoldStandard_Book4.xlsx")

In [3]:
len(GoldStandard_Book4)

18664

In [4]:
## open the file containing the enriched ToposText annotation (18,664 entries)
Enriched_ToposText_Book4 = pd.read_csv("/Users/u0154817/OneDrive - KU Leuven/Documents/KU Leuven/PhD project 'Greek Spaces in Roman Times'/Data_Extraction/Outputs/3.2.Enriched_ToposText_Book4.csv")

In [5]:
len(Enriched_ToposText_Book4)

18664

In [6]:
## append the Gold Standard to the dataset of the enriched ToposText annotation
Enriched_ToposText_Book4['Manual_Annotation'] = GoldStandard_Book4['Manual_Annotation']

**Compute True Positive and False Negatives including partial matches**

In [7]:
True_Positives = [] ## create a list of true positives
False_Negatives = [] ## create a list of false negatives

In [8]:
for index, manual_annotation in enumerate(Enriched_ToposText_Book4['Manual_Annotation']): ## for each token
        
    if manual_annotation == 'B-LOC': ## for each B-LOC entity in the Gold Standard
        
        ## create a tuple containing the reference and start position
        reference_startpos = (Enriched_ToposText_Book4['Reference'][index], Enriched_ToposText_Book4['Start_pos'][index])
        
        if len(Enriched_ToposText_Book4['ToposText_update'][index]) > 1: ## if the enriched ToposText contains an annotation
            True_Positives.append(reference_startpos) ## it is a true positive
            
        else: ## if the the enriched ToposText does not contain an annotation
            
            if Enriched_ToposText_Book4['Manual_Annotation'][index+1] != 'I-LOC': ## if B-LOC is not followed by I-LOC
                False_Negatives.append(reference_startpos) ## it is a false negative
            
            else: ## if B-LOC is followed by I-LOC
                
                flag = False
                
                for n in range(1,100):
                    
                    if Enriched_ToposText_Book4['Manual_Annotation'][index+n] == 'I-LOC': ## inside the multi-word LOC entity
                        
                        if len(Enriched_ToposText_Book4['ToposText_update'][index+n]) > 1: ## the enriched ToposText contains an annotation
                            True_Positives.append((Enriched_ToposText_Book4['Reference'][index+n], Enriched_ToposText_Book4['Start_pos'][index+n])) ## it is a true positive
                            flag = True
                            break
                            
                    else: break
                        
                if flag == False: ## no entity was predicted in the span
                    False_Negatives.append(reference_startpos) ## it is a false negative

In [9]:
len(True_Positives)

1870

In [10]:
len(False_Negatives)

61

**Compute False Positives**

In [11]:
False_Positives = [] ## create a list of false positives

In [12]:
for index, ToposText_annotation in enumerate(Enriched_ToposText_Book4['ToposText_update']):
        
    if 'B-' in ToposText_annotation: ## for each B-place ToposText annotation
        
        ## create a tuple containing the reference and start position
        reference_startpos = (Enriched_ToposText_Book4['Reference'][index], Enriched_ToposText_Book4['Start_pos'][index])
        
        if len(Enriched_ToposText_Book4['Manual_Annotation'][index]) == 1: ## if the Gold Standard does not contain an entity
            
            if 'I-' not in Enriched_ToposText_Book4['ToposText_update'][index+1]: ## if B-place is not followed by I-place
                False_Positives.append(reference_startpos) ## it is a false positive
        
        else: ## if B-place is followed by I-place
            
            flag = False
            
            for n in range(1,100):
                
                if 'I-' in Enriched_ToposText_Book4['ToposText_update'][index+1]: ## inside the multi-word place annotation
                    
                    if len(Enriched_ToposText_Book4['Manual_Annotation'][index+n]) > 1: ## the Gold Standard contains an entity
                        flag = True
                        break
                        
                else: break
                        
                if flag == False:
                    False_Positives.append(reference_startpos) ## it is a false positive

In [13]:
len(False_Positives)

99

The enriched ToposText annotation has a Precision of 0.949.

In [14]:
## calculate precision

Precision = len(True_Positives) / (len(True_Positives) + len(False_Positives))
Precision

0.9497206703910615

The enriched ToposText annotation has a Recall of 0.968.

In [15]:
## calculate recall

Recall = len(True_Positives) / (len(True_Positives) + len(False_Negatives))
Recall

0.9684101501812532

The enriched ToposText annotation has a F1 Score of 0.958.

In [16]:
## calculate F1 Score

F1 = 2 * (Precision * Recall) / (Precision + Recall)
F1

0.9589743589743589