# Evaluation of ToposText Annotation in Book 4

To evaluate the quality of the ToposText annotation, both for the entity annotation and for the labelling, Precision, Recall and F1 Score were calculated against a manually curated Gold Standard for Book 4. As a first step, the ToposText annotations were converted to the IOB format using the reference and start position.

ToposText contains a good-quality annotation (F1 0.822) with high Precision (0.989). Recall, by contrast, is lower (0.703), indicating that some entities were not annotated. More specifically, we counted 1,295 true positives, 635 false negatives, and 16 false positives.
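The three scores are derived from the counts of true positives, false positives and false negatives. The following is a minimal sketch of that relationship; the helper name `precision_recall_f1` is hypothetical and the counts are illustrative values, not the Book 4 figures.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts; 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only (not the Book 4 results)
p, r, f1 = precision_recall_f1(tp=9, fp=1, fn=3)
print(round(p, 3), round(r, 3), round(f1, 3))
```

High Precision combined with lower Recall, as reported above, corresponds to a small number of false positives relative to false negatives.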

Some conclusions from this step:

- In some cases, the place entity is annotated in ToposText, but the label is incorrect (e.g., 'Asia' is linked to a 'people' ToposText ID).
- In some cases, the ToposText labelling is inconsistent (e.g., the entity is labelled as 'demonym' or 'ethnic' in Class and 'place' in the ToposText ID).
- Using the 'Class' labelling, the ToposText annotation has a higher Precision than with the ToposText ID labelling, but a lower Recall.

To sum up, ToposText is a solid foundation for the annotation process, but the existing annotation needs to be expanded.

In [None]:
import pandas as pd

In [None]:
## open the Gold Standard of Book 4 (18,664 rows)
GoldStandard_Book4 = pd.read_excel("/Users/u0154817/OneDrive - KU Leuven/Documents/KU Leuven/PhD project 'Greek Spaces in Roman Times'/Data_Extraction/Outputs/1.3.GoldStandard_Book4.xlsx")

In [None]:
len(GoldStandard_Book4)

In [None]:
## open the file containing the ToposText annotations in Book 4 (1,888 rows)
ToposText_Book4 = pd.read_csv("/Users/u0154817/OneDrive - KU Leuven/Documents/KU Leuven/PhD project 'Greek Spaces in Roman Times'/Data_Extraction/Outputs/1.1.ToposText_Annotations_Book_4.csv", delimiter=",")

In [None]:
len(ToposText_Book4)

# Convert ToposText annotation to IOB format

The ToposText annotations of Class and ToposText ID are appended to the Gold Standard dataframe using the reference and start position. When a multi-word entity is annotated as a single unit in ToposText (e.g., 'Corinthian Gulf'), each word is annotated separately in the dataset.

Note that although some ToposText annotations do not contain a Class, all annotations are linked to a ToposText ID, as observed in the notebook '1.3.Explore_ToposText_Annotations_Book4'.
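The projection of a span annotation onto tokens can be illustrated on a toy example. This is only a sketch: the helper `project_spans`, the reference value and the start positions are hypothetical, and the token table is assumed to have a default integer index with one row per token.

```python
import pandas as pd

# Toy token table (hypothetical values), one row per token
tokens = pd.DataFrame({
    "Reference": ["4.1.1"] * 4,
    "Start_pos": [0, 4, 15, 20],
    "Token": ["the", "Corinthian", "Gulf", "and"],
})

# Toy span annotation (hypothetical): one two-word entity
spans = pd.DataFrame({
    "Reference": ["4.1.1"],
    "Start position": [4],
    "Tagged Entity": ["Corinthian Gulf"],
    "Class": ["place"],
})


def project_spans(tokens: pd.DataFrame, spans: pd.DataFrame) -> pd.DataFrame:
    """Copy the Class of each span onto every token the span covers."""
    out = tokens.copy()
    out["ToposText"] = "O"  # default label for unannotated tokens
    for span in spans.to_dict("records"):
        n_words = len(span["Tagged Entity"].split())
        first = out.index[(out["Reference"] == span["Reference"])
                          & (out["Start_pos"] == span["Start position"])]
        for i in first:
            # .loc label slices are inclusive, so this covers n_words rows
            out.loc[i:i + n_words - 1, "ToposText"] = span["Class"]
    return out


labelled = project_spans(tokens, spans)
print(labelled[["Token", "ToposText"]])
```

Both 'Corinthian' and 'Gulf' receive the label of the span, while the surrounding tokens keep 'O'.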

In [None]:
## create two new columns in the Gold Standard dataframe
GoldStandard_Book4['ToposText'] = 'O' ## column for the Class
GoldStandard_Book4['ToposText_ID'] = 'O' ## column for the ToposText ID

In [None]:
for i1, topostext_annotation in enumerate(ToposText_Book4['Tagged Entity']): ## for each ToposText annotation

    reference = ToposText_Book4['Reference'][i1] ## get the reference (book, chapter, paragraph)
    start_position = ToposText_Book4['Start position'][i1] ## get the start position
    n_words = len(topostext_annotation.split()) ## number of words in the text annotated in ToposText

    ## find the Gold Standard tokens with the same reference and start position
    same_token = (GoldStandard_Book4['Reference'] == reference) & (GoldStandard_Book4['Start_pos'] == start_position)

    for i2 in GoldStandard_Book4.index[same_token]: ## assumes the default integer index (one row per token)

        ## update the first token and, for multi-word annotations, the following ones;
        ## .loc is used because chained assignment (df[col][row] = value) is not guaranteed to modify the dataframe
        GoldStandard_Book4.loc[i2:i2+n_words-1, 'ToposText'] = ToposText_Book4['Class'][i1] ## update the ToposText column
        GoldStandard_Book4.loc[i2:i2+n_words-1, 'ToposText_ID'] = ToposText_Book4['ToposText ID'][i1] ## update the ToposText ID column

# Calculate Precision, Recall, F1 Score

To calculate the Precision, Recall and F1 Score, the lists of True Positives (TP), False Negatives (FN) and False Positives (FP) are generated from the dataset containing the Gold Standard and the ToposText annotations. A TP is a ToposText annotation that (a) is present in the Gold Standard and (b) is labelled as 'place' in ToposText. An FN is a Gold Standard annotation that (a) is not present in ToposText or (b) is present but not labelled as 'place'. An FP is an annotation that is labelled as 'place' in ToposText but is not present in the Gold Standard.

The code takes into account that different guidelines and entity boundaries were adopted in the Gold Standard and in ToposText, with the result that the same entity (e.g., 'Mount Pindus') could be annotated in different ways (e.g., with or without the word 'Mount'). An annotation was counted as a TP even if only a part of the entity was present in ToposText (e.g., 'Pindus' for 'Mount Pindus').
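The partial-match rule can be sketched on plain lists. This is an illustration under assumptions, not the evaluation code itself: `entity_hit` is a hypothetical helper, and the toy tags stand for a two-token entity such as 'Mount Pindus' followed by a single-token entity that is missing from the system annotation.

```python
def entity_hit(gold_tags, system_is_place, start):
    """True if any token of the gold entity starting at `start` is marked as place."""
    i = start
    while True:
        if system_is_place[i]:
            return True
        i += 1
        # stop at the end of the sequence or when the entity ends
        if i >= len(gold_tags) or gold_tags[i] != "I-LOC":
            return False


# Toy data: 'Mount'/'Pindus' as B-LOC/I-LOC, then an unannotated B-LOC
gold = ["B-LOC", "I-LOC", "O", "B-LOC", "O"]
system = [False, True, False, False, False]  # only 'Pindus' is marked as place

starts = [i for i, tag in enumerate(gold) if tag == "B-LOC"]
tp = sum(entity_hit(gold, system, i) for i in starts)
fn = len(starts) - tp
print(tp, fn)
```

The first entity counts as a TP because one of its tokens is marked as a place; the second entity counts as an FN.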

The labelling was evaluated by inspecting the 'Class' label and the ToposText ID (e.g., https://topostext.org/place/395208SDod). In some cases the labelling is inconsistent (e.g., the entity is classified as 'person' in 'Class' but is linked to a 'place' ToposText ID). First, an annotation was counted as a TP if the Class _or_ the ToposText ID correctly contained the 'place' label. Then, we compared the results obtained when considering separately (1) the ToposText ID labelling and (2) the 'Class' labelling. Using the 'Class' labelling instead of the ToposText ID, the annotation has a lower Recall but a higher Precision. In other words, the number of FN increases (from 599 to 635), but the number of FP drops to 16.
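The three labelling criteria (Class only, ToposText ID only, either of the two) can be compared with a small sketch. It assumes that the entity type appears as a path segment of the ToposText ID URL, as in the example above; the helpers `place_by_class` and `place_by_id` and the last ID in the toy rows are hypothetical.

```python
def place_by_class(cls) -> bool:
    """True if the 'Class' value contains the 'place' label."""
    return "place" in str(cls)


def place_by_id(topostext_id) -> bool:
    """True if the ID URL points to a place (assumption: type is a path segment)."""
    return "/place/" in str(topostext_id)


rows = [
    {"Class": "place", "ID": "https://topostext.org/place/395208SDod"},
    # inconsistent labelling: 'person' in Class, 'place' in the ID
    {"Class": "person", "ID": "https://topostext.org/place/395208SDod"},
    # missing Class and a hypothetical non-place ID
    {"Class": None, "ID": "https://topostext.org/people/0000"},
]

by_class = [place_by_class(r["Class"]) for r in rows]
by_id = [place_by_id(r["ID"]) for r in rows]
either = [c or i for c, i in zip(by_class, by_id)]
print(by_class, by_id, either)
```

The second row shows how the criterion changes the outcome: it is a place under the ID and the combined criteria, but not under the Class criterion.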

In [None]:
True_Positives_label = [] ## entities correctly annotated in ToposText and labelled as 'places'
False_Negatives = [] ## entities not annotated in ToposText or not labelled as 'places'

for index, manual_annotation in enumerate(GoldStandard_Book4['Manual_Annotation']):
        
    if manual_annotation == 'B-LOC': ## for each B-entity in the Gold Standard
        reference_startpos = (GoldStandard_Book4['Reference'][index], GoldStandard_Book4['Start_pos'][index])
        
        if len(GoldStandard_Book4['ToposText_ID'][index]) > 1: ## if it is annotated in ToposText
            
            if 'place' in str(GoldStandard_Book4['ToposText'][index]): ## if the ToposText annotation contains 'place' 
                True_Positives_label.append(reference_startpos) ## it is a true positive ToposText annotation
            else: False_Negatives.append(reference_startpos) ## it is a false negative ToposText annotation
        
        if len(GoldStandard_Book4['ToposText_ID'][index]) == 1: ## if it is not annotated in ToposText
            
            flag = False
            n = index + 1

            ## scan the following I-LOC tokens of the same entity (stopping at the end of the dataframe)
            while n < len(GoldStandard_Book4) and GoldStandard_Book4['Manual_Annotation'][n] == 'I-LOC':
                if len(GoldStandard_Book4['ToposText_ID'][n]) > 1: ## the I-LOC contains a ToposText ID
                    if 'place' in str(GoldStandard_Book4['ToposText'][n]): ## if the ToposText annotation contains 'place'
                        True_Positives_label.append((GoldStandard_Book4['Reference'][n], GoldStandard_Book4['Start_pos'][n])) ## it is a true positive ToposText annotation
                        flag = True
                        break
                n += 1

            if not flag: ## no token of the entity is annotated as 'place' in ToposText
                False_Negatives.append(reference_startpos) ## it is a false negative ToposText annotation

The ToposText annotation contains 1,295 true positives.

In [None]:
len(True_Positives_label)

The ToposText annotation contains 635 false negatives.

In [None]:
len(False_Negatives)

In [None]:
## slice the dataframe showing only the O entities and renumber the rows from 0
GoldStandard_Book4_O = GoldStandard_Book4[GoldStandard_Book4['Manual_Annotation'] == 'O'].reset_index(drop=True)

In [None]:
False_Positives = [] ## entities annotated as 'places' in ToposText but not in the Gold Standard

for index, manual_annotation in enumerate(GoldStandard_Book4_O['Manual_Annotation']):
    
    reference_startpos = (GoldStandard_Book4_O['Reference'][index], GoldStandard_Book4_O['Start_pos'][index])

    if 'place' in str(GoldStandard_Book4_O['ToposText'][index]): ## if the ToposText annotation contains 'place'
        False_Positives.append(reference_startpos)

The ToposText annotation contains 16 false positives.

In [None]:
len(False_Positives)

The ToposText annotation has a Precision of 0.989.

In [None]:
## calculate precision

Precision = len(True_Positives_label) / (len(True_Positives_label) + len(False_Positives)) ## True_Positives_label is the list built above
Precision

The ToposText annotation has a Recall of 0.703.

In [None]:
## calculate recall

Recall = len(True_Positives_label) / (len(True_Positives_label) + len(False_Negatives)) ## True_Positives_label is the list built above
Recall

The ToposText annotation has an F1 score of 0.822.

In [None]:
## calculate F1

F1 = 2 * (Precision * Recall) / (Precision + Recall)
F1