# Evaluation of ToposText Annotation in Book 4

To assess the accuracy of the ToposText annotation, we computed Precision, Recall, and F1 Score against a Gold Standard for Book 4. The ToposText annotations were first converted into the IOB2 format using the reference (book, chapter, paragraph) and the start position of each annotation.

The evaluation showed that ToposText's annotations are of good quality (F1 Score: 0.800), with a notably high Precision (0.991). Recall was lower (0.671), meaning that a number of entities were not annotated. Specifically, the assessment identified 1,296 true positives, 635 false negatives, and 11 false positives.

Several observations emerged from this evaluation:

- In some cases ToposText marks a place entity but assigns it the wrong label (e.g., 'Asia' is linked to a 'people' ToposText ID and has no Class label).
- In other cases the Class label and the ToposText ID disagree (e.g., an entity labeled 'demonym' or 'ethnic' in Class but linked to a 'place' ToposText ID).
- Evaluating with the 'Class' label yields higher Precision than evaluating with the ToposText ID, at the cost of lower Recall.

In summary, ToposText provides a robust foundation for the annotation process. However, the existing annotation needs to be expanded to address the discrepancies and limitations identified above.

In [None]:
import pandas as pd

In [None]:
## open the Gold Standard of Book 4 (18,664 rows)
GoldStandard_Book4 = pd.read_excel("/Users/u0154817/OneDrive - KU Leuven/Documents/KU Leuven/PhD project 'Greek Spaces in Roman Times'/Data_Extraction/Outputs/1.4.GoldStandard_Book4.xlsx")

In [None]:
len(GoldStandard_Book4)

In [None]:
## open the file containing the ToposText annotations in Book 4 (1,888 rows)
ToposText_Book4 = pd.read_csv("/Users/u0154817/OneDrive - KU Leuven/Documents/KU Leuven/PhD project 'Greek Spaces in Roman Times'/Data_Extraction/Outputs/1.2.ToposText_Annotations_Book_4.csv", delimiter=",")

In [None]:
len(ToposText_Book4)

# Convert ToposText annotation to IOB format

The ToposText annotations of Class and ToposText ID are appended to the Gold Standard dataframe using the reference and start position. Multi-word entities that are annotated as a single unit in ToposText (e.g., 'Corinthian Gulf') are split so that each word is annotated separately in the dataset. Following the IOB style, the first word is marked with 'B-' and the following words with 'I-'.

Note that while some annotations are not associated with a Class, all annotations are linked to a ToposText ID, as observed in the notebook '1.3.Explore_ToposText_Annotations_Book4'.
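The conversion described above can be illustrated on a toy example. This is a minimal sketch that uses plain Python lists instead of the notebook's dataframes; the function name `to_iob2`, the reference string `"4.1.1"`, and the character offsets are hypothetical and serve only as illustration.

```python
def to_iob2(tokens, annotations):
    """Project span annotations onto tokens as IOB2 tags.

    tokens: list of (reference, start_pos, token) tuples in reading order.
    annotations: list of (reference, start_pos, text, label) tuples.
    """
    tags = ["O"] * len(tokens)
    # Map each (reference, start position) pair to its token index.
    position = {(ref, start): i for i, (ref, start, _) in enumerate(tokens)}
    for ref, start, text, label in annotations:
        first = position.get((ref, start))
        if first is None:
            continue  # no token starts at this position
        tags[first] = f"B-{label}"  # first word of the entity
        for offset in range(1, len(text.split())):
            if first + offset < len(tokens):
                tags[first + offset] = f"I-{label}"  # following words
    return tags


# Hypothetical tokens and annotations (not taken from the real data).
tokens = [
    ("4.1.1", 0, "the"),
    ("4.1.1", 4, "Corinthian"),
    ("4.1.1", 15, "Gulf"),
    ("4.1.1", 20, "and"),
    ("4.1.1", 24, "Asia"),
]
annotations = [
    ("4.1.1", 4, "Corinthian Gulf", "place"),
    ("4.1.1", 24, "Asia", "place"),
]
tags = to_iob2(tokens, annotations)
print(tags)
```

Looking tokens up in a dictionary keyed by reference and start position avoids rescanning every token for every annotation, which may matter for larger books.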

In [None]:
## create two new columns in the Gold Standard dataframe
GoldStandard_Book4['ToposText'] = 'O' ## column for the Class
GoldStandard_Book4['ToposText_ID'] = 'O' ## column for the ToposText ID

In [None]:
for i1, topostext_annotation in enumerate(ToposText_Book4['Tagged Entity']): ## for each ToposText annotation

    reference = ToposText_Book4['Reference'][i1] ## get the reference (book, chapter, paragraph)
    start_position = ToposText_Book4['Start position'][i1] ## get the start position
    words = topostext_annotation.split() ## split the text annotated in ToposText into words

    for i2, manual_annotation in enumerate(GoldStandard_Book4['Token']): ## for each token in the Gold Standard

        if GoldStandard_Book4['Start_pos'][i2] == start_position: ## if the start position is the same
            if GoldStandard_Book4['Reference'][i2] == reference: ## if the reference is the same

                ## write with .loc so that the dataframe itself is updated (no chained assignment)
                GoldStandard_Book4.loc[i2, 'ToposText'] = 'B-'+str(ToposText_Book4['Class'][i1]) ## update the ToposText column
                GoldStandard_Book4.loc[i2, 'ToposText_ID'] = 'B-'+str(ToposText_Book4['ToposText ID'][i1]) ## update the ToposText ID column

                for i3 in range(1, len(words)): ## for each word of a multi-word annotation except the first one

                    GoldStandard_Book4.loc[i2+i3, 'ToposText'] = 'I-'+str(ToposText_Book4['Class'][i1]) ## update the corresponding ToposText column
                    GoldStandard_Book4.loc[i2+i3, 'ToposText_ID'] = 'I-'+str(ToposText_Book4['ToposText ID'][i1]) ## update the corresponding ToposText ID column

# Calculate Precision, Recall, F1 Score

To calculate the Precision, Recall and F1 Score, the lists of True Positives (TP), False Negatives (FN) and False Positives (FP) are generated from the dataset containing the Gold Standard and the ToposText annotations.
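For reference, these are the standard definitions of the three metrics in terms of the TP, FP and FN counts:

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$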

The code takes into account that the Gold Standard and ToposText follow different guidelines and entity boundaries, so the same entity (e.g., 'Mount Pindus') may be annotated in different ways (e.g., with or without the word 'Mount'). An annotation was counted as a TP even if ToposText annotated only part of the entity (e.g., 'Pindus' for 'Mount Pindus'). In other words, partial matches are also valid.
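The partial-match rule can be sketched on two short tag sequences. This is an illustrative simplification, not the notebook's implementation: the function name `count_tp_fn` and the toy tags are hypothetical, and any predicted tag other than 'O' inside the gold span is taken as a match.

```python
def count_tp_fn(gold, predicted):
    """Count gold LOC entities that are matched (TP) or missed (FN).

    A gold entity counts as a TP if at least one of its tokens carries
    a predicted tag other than 'O' (partial matches are accepted).
    """
    tp = fn = 0
    i = 0
    while i < len(gold):
        if gold[i] == "B-LOC":
            # Extend the span over the following I-LOC tokens.
            end = i + 1
            while end < len(gold) and gold[end] == "I-LOC":
                end += 1
            if any(tag != "O" for tag in predicted[i:end]):
                tp += 1
            else:
                fn += 1
            i = end
        else:
            i += 1
    return tp, fn


# Toy example: 'Mount Pindus' is matched only on 'Pindus',
# the second gold entity is missed, the third is matched exactly.
gold = ["B-LOC", "I-LOC", "O", "B-LOC", "O", "B-LOC"]
predicted = ["O", "B-place", "O", "O", "O", "B-place"]
tp, fn = count_tp_fn(gold, predicted)
print(tp, fn)
```

Each gold entity is counted exactly once, as either a TP or an FN, so `tp + fn` always equals the number of 'B-LOC' tags.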

The evaluation was performed by inspecting both the 'Class' label and the ToposText ID. In some cases the two are inconsistent (e.g., the entity is classified as 'person' in 'Class' but is linked to a 'place' ToposText ID). We therefore compared the quality of the annotation considering separately (1) the 'Class' and (2) the ToposText ID. Using the 'Class' gives a lower Recall but a higher Precision: the number of FN increases from 599 to 635, while the number of FP drops from 183 to 11.
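The trade-off between the two labellings can be checked from the counts quoted above. This is a minimal sketch under one assumption: since every Gold Standard entity is counted once as either a TP or an FN, the TP count for the ToposText ID run is derived here as 1,931 - 599 instead of being taken from the notebook output. The helper name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts reported for the 'Class' labelling.
GOLD_ENTITIES = 1296 + 635  # TP + FN = number of Gold Standard entities
class_scores = precision_recall_f1(tp=1296, fp=11, fn=635)

# Assumption: the ToposText ID run is evaluated on the same Gold Standard
# entities, so its TP count is derived rather than reported.
id_scores = precision_recall_f1(tp=GOLD_ENTITIES - 599, fp=183, fn=599)

for name, scores in [("Class", class_scores), ("ToposText ID", id_scores)]:
    print(name, [round(score, 4) for score in scores])
```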

# ToposText Class place

In [None]:
## create a copy of the ToposText column
GoldStandard_Book4['ToposText_copy'] = GoldStandard_Book4['ToposText']

## the function transforms all the annotations not including 'place' into O
def update_values(value):
    if 'place' not in value:
        return 'O'
    return value

GoldStandard_Book4['ToposText_copy'] = GoldStandard_Book4['ToposText_copy'].apply(update_values)

In [None]:
GoldStandard_Book4['ToposText_copy'].unique() ## the new column contains only O and the labels including 'place' (e.g., B-['place'], I-['place'])

**Compute True Positive and False Negatives including partial matches**

In [None]:
True_Positives = [] ## create a list of true positives
False_Negatives = [] ## create a list of false negatives

In [None]:
for index, manual_annotation in enumerate(GoldStandard_Book4['Manual_Annotation']): ## for each token in the Gold Standard
        
    if manual_annotation == 'B-LOC': ## for each B-LOC entity in the Gold Standard
        
        ## create a tuple containing the reference and start position
        reference_startpos = (GoldStandard_Book4['Reference'][index], GoldStandard_Book4['Start_pos'][index])
        
        if len(GoldStandard_Book4['ToposText_copy'][index]) > 1: ## if ToposText annotated the token
            True_Positives.append(reference_startpos) ## it is a true positive
            
        else: ## if ToposText did not annotate the token
            
            if GoldStandard_Book4['Manual_Annotation'][index+1] != 'I-LOC': ## if B-LOC is not followed by I-LOC
                False_Negatives.append(reference_startpos) ## it is a false negative
            
            else: ## if B-LOC is followed by I-LOC
                
                flag = False
                
                for n in range(1,100):
                    
                    if GoldStandard_Book4['Manual_Annotation'][index+n] == 'I-LOC': ## inside the multi-word LOC entity
                        
                        if len(GoldStandard_Book4['ToposText_copy'][index+n]) > 1: ## if ToposText annotated the token
                            True_Positives.append((GoldStandard_Book4['Reference'][index+n], GoldStandard_Book4['Start_pos'][index+n])) ## it is a true positive
                            flag = True
                            break
                            
                    else: break
                        
                if flag == False: ## no entity was predicted in the span
                    False_Negatives.append(reference_startpos) ## it is a false negative

In [None]:
len(True_Positives)

In [None]:
len(False_Negatives)

**Compute False Positives**

In [None]:
False_Positives = [] ## create a list of false positives

In [None]:
for index, ToposText_annotation in enumerate(GoldStandard_Book4['ToposText_copy']):
        
    if ToposText_annotation == "B-['place']" : ## for each B-place ToposText annotation
        
        ## create a tuple containing the reference and start position
        reference_startpos = (GoldStandard_Book4['Reference'][index], GoldStandard_Book4['Start_pos'][index])
        
        if len(GoldStandard_Book4['Manual_Annotation'][index]) == 1: ## if the Gold Standard does not contain an entity
            
            if GoldStandard_Book4['ToposText_copy'][index+1] != "I-['place']" : ## if B-place is not followed by I-place
                False_Positives.append(reference_startpos) ## it is a false positive
            
            else: ## if B-place is followed by I-place
                
                flag = False
                
                for n in range(1,100):
                    
                    if GoldStandard_Book4['ToposText_copy'][index+n] == "I-['place']": ## inside the multi-word place annotation
                        
                        if len(GoldStandard_Book4['Manual_Annotation'][index+n]) > 1: ## the Gold Standard contains an entity
                            flag = True
                            break
                            
                    else: break
                        
                if flag == False: ## no Gold Standard entity overlaps the annotation
                    False_Positives.append(reference_startpos) ## it is a false positive

In [None]:
len(False_Positives)

The ToposText annotation has a Precision of 0.991.

In [None]:
## calculate precision

Precision = len(True_Positives) / (len(True_Positives) + len(False_Positives))
Precision

The ToposText annotation has a Recall of 0.671.

In [None]:
## calculate recall

Recall = len(True_Positives) / (len(True_Positives) + len(False_Negatives))
Recall

The ToposText annotation has an F1 score of 0.800.

In [None]:
## calculate F1 Score

F1 = 2 * (Precision * Recall) / (Precision + Recall)
F1

# ToposText ID

In [None]:
## create a copy of the ToposText column
GoldStandard_Book4['ToposText_ID_copy'] = GoldStandard_Book4['ToposText_ID']

## the function transforms all the IDs not including 'place' into O
def update_values(value):
    if 'place' not in value:
        return 'O'
    return value

GoldStandard_Book4['ToposText_ID_copy'] = GoldStandard_Book4['ToposText_ID_copy'].apply(update_values)

**Compute True Positive and False Negatives including partial matches**

In [None]:
True_Positives = [] ## create a list of true positives
False_Negatives = [] ## create a list of false negatives

In [None]:
for index, manual_annotation in enumerate(GoldStandard_Book4['Manual_Annotation']): ## for each token in the Gold Standard
        
    if manual_annotation == 'B-LOC': ## for each B-LOC entity in the Gold Standard
        
        ## create a tuple containing the reference and start position
        reference_startpos = (GoldStandard_Book4['Reference'][index], GoldStandard_Book4['Start_pos'][index])
        
        if len(GoldStandard_Book4['ToposText_ID_copy'][index]) > 1: ## if ToposText annotated the token
            True_Positives.append(reference_startpos) ## it is a true positive
            
        else: ## if ToposText did not annotate the token
            
            if GoldStandard_Book4['Manual_Annotation'][index+1] != 'I-LOC': ## if B-LOC is not followed by I-LOC
                False_Negatives.append(reference_startpos) ## it is a false negative
            
            else: ## if B-LOC is followed by I-LOC
                
                flag = False
                
                for n in range(1,100):
                    
                    if GoldStandard_Book4['Manual_Annotation'][index+n] == 'I-LOC': ## inside the multi-word LOC entity
                        
                        if len(GoldStandard_Book4['ToposText_ID_copy'][index+n]) > 1: ## if ToposText annotated the token
                            True_Positives.append((GoldStandard_Book4['Reference'][index+n], GoldStandard_Book4['Start_pos'][index+n])) ## it is a true positive
                            flag = True
                            break
                            
                    else: break
                        
                if flag == False: ## no entity was predicted in the span
                    False_Negatives.append(reference_startpos) ## it is a false negative

**Compute False Positives**

In [None]:
False_Positives = [] ## create a list of false positives

In [None]:
for index, ToposText_annotation in enumerate(GoldStandard_Book4['ToposText_ID_copy']):
        
    if ToposText_annotation.startswith("B-") : ## for each B-place annotation
        
        ## create a tuple containing the reference and start position
        reference_startpos = (GoldStandard_Book4['Reference'][index], GoldStandard_Book4['Start_pos'][index])
        
        if len(GoldStandard_Book4['Manual_Annotation'][index]) == 1: ## if the Gold Standard does not contain an entity
            
            if "I-" not in GoldStandard_Book4['ToposText_ID_copy'][index+1] : ## if B-place is not followed by I-place
                False_Positives.append(reference_startpos) ## it is a false positive
            
            else: ## if B-place is followed by I-place
                
                flag = False
                
                for n in range(1,100):
                    
                    if GoldStandard_Book4['ToposText_ID_copy'][index+n].startswith("I-") : ## inside the multi-word place annotation
                        
                        if len(GoldStandard_Book4['Manual_Annotation'][index+n]) > 1: ## the Gold Standard contains an entity
                            flag = True
                            break
                            
                    else: break
                        
                if flag == False: ## no Gold Standard entity overlaps the annotation
                    False_Positives.append(reference_startpos) ## it is a false positive