**1. Implement F-Score.**

Note that, to be considered a match, an annotation must have the same span and the same label.
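To make the matching rule concrete, here is a minimal sketch in which each annotation is represented as a hypothetical `(start, end, label)` tuple (the actual annotation files use their own text format). Because matching is exact, a set intersection keeps only the annotations that agree on both span and label.

```python
# Hypothetical annotations: (start offset, end offset, label)
annotator_a = {(0, 5, "PERSON"), (10, 18, "LOCATION"), (25, 30, "DATE")}
annotator_b = {(0, 5, "PERSON"), (10, 18, "ORGANIZATION"), (24, 30, "DATE")}

# Exact match: same span AND same label.
# (10, 18) has the same span but a different label, so it does not match;
# (24, 30) vs (25, 30) has the same label but a different span, so it does not match.
matches = annotator_a & annotator_b
print(matches)  # {(0, 5, 'PERSON')}
```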

To calculate the F1 score, we first compute precision and recall, defined as follows:

$\text{precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}$

$\text{recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$

We then combine them into the F1 score, which is their harmonic mean:

$F_1 = 2\cdot\frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
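As a worked example with made-up counts: with 3 true positives, 1 false positive and 2 false negatives, precision is 3/4 = 0.75, recall is 3/5 = 0.6, and F1 = 2 · 0.45 / 1.35 ≈ 0.667. The sketch below computes the same quantities from raw counts; the function name `f1_from_counts` is hypothetical, and returning 0.0 for undefined ratios is an assumed convention.

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts, using 0.0 for undefined ratios."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f1 = f1_from_counts(tp=3, fp=1, fn=2)
print(p, r, round(f1, 3))  # 0.75 0.6 0.667
```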


In [2]:
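def FScore(annotations1, annotations2):
    """Return the F1 score between two lists of annotation lines.

    Each line is split on ' - ' so that span and label together form the key;
    an annotation counts as a match only if all fields are identical.
    F1 is symmetric, so either argument may be treated as the reference.
    """
    annotations1_tuples = set(tuple(line.strip().split(' - ')) for line in annotations1)
    annotations2_tuples = set(tuple(line.strip().split(' - ')) for line in annotations2)
    TP = annotations1_tuples & annotations2_tuples  # in both sets
    FN = annotations1_tuples - annotations2_tuples  # only in the first set
    FP = annotations2_tuples - annotations1_tuples  # only in the second set
    # Guard against division by zero when a set is empty or nothing matches
    P = len(TP) / (len(TP) + len(FP)) if TP or FP else 0.0  # precision
    R = len(TP) / (len(TP) + len(FN)) if TP or FN else 0.0  # recall
    return 2 * P * R / (P + R) if P + R > 0 else 0.0         # F1 score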
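Swapping the two arguments of `FScore` swaps the false positives and false negatives, and therefore swaps precision and recall, but leaves F1 unchanged; this is why the order of names within each pair does not affect the score. A self-contained check, using a hypothetical `f1_between` that mirrors the logic of `FScore` on toy lines in the ` - `-separated format (the real file contents may differ):

```python
def f1_between(lines1, lines2):
    """Mirror of FScore: exact-match F1 between two lists of 'field - field' lines."""
    set1 = {tuple(line.strip().split(' - ')) for line in lines1}
    set2 = {tuple(line.strip().split(' - ')) for line in lines2}
    tp = len(set1 & set2)
    fn = len(set1 - set2)
    fp = len(set2 - set1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical annotation lines; the real file format may differ
a = ["0 5 - PERSON\n", "10 18 - LOCATION\n", "25 30 - DATE\n"]
b = ["0 5 - PERSON\n", "10 18 - ORGANIZATION\n"]

# (a, b): P = 1/2, R = 1/3; (b, a): P = 1/3, R = 1/2; F1 is 0.4 either way
print(round(f1_between(a, b), 3), round(f1_between(b, a), 3))  # 0.4 0.4
```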

**2.** Calculate the F-Score for each pair of annotations in stage (1) of the assignment, including the provided annotations.

First, we authenticate with Google Drive so that the annotation files can be downloaded into Colab with PyDrive, as covered in the tutorials.

In [3]:
!pip install -U -q PyDrive

# Import libraries for accessing Google Drive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

Next, we download each stage 1 annotation file by its Google Drive file ID and read it into a list of lines.

In [4]:
sachin_stage1_link = "1y21TGlMyk6sniDjL_rz8wSUioAgN3RzD" 
suchi_stage1_link = "1LjvqZruY7ATOU7KTxuk4JHMFq27PJOwF"
ayush_stage1_link = "1Nk4N2eHX0VVjT-IBik_Y4XJDWj-IZv14"
provided_link = "1Ob0H2l23Amdmzs4-as_yUhlOfP2Si4zm"

downloaded = drive.CreateFile({'id':sachin_stage1_link})
downloaded.GetContentFile('sachin_stage1.txt')
with open('sachin_stage1.txt') as f:
    sachin_stage1 = f.readlines()

downloaded = drive.CreateFile({'id':suchi_stage1_link})
downloaded.GetContentFile('suchi_stage1.txt')
with open('suchi_stage1.txt') as f:
    suchi_stage1 = f.readlines()

downloaded = drive.CreateFile({'id':ayush_stage1_link})
downloaded.GetContentFile('ayush_stage1.txt')
with open('ayush_stage1.txt') as f:
    ayush_stage1 = f.readlines()

downloaded = drive.CreateFile({'id':provided_link})
downloaded.GetContentFile('provided.txt')
with open('provided.txt') as f:
    provided = f.readlines()
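`readlines()` keeps each trailing newline and also returns any blank lines, such as an empty final line. `FScore` strips the newline, but a blank line still becomes the one-element tuple `('',)`, which is counted as an annotation (and as a match if both files contain one). The sketch below uses `io.StringIO` as a stand-in for a real file and a hypothetical `parse_annotations` helper to show one way to drop blank lines before comparing; the file contents are made up.

```python
import io

# Stand-in for an annotation file; the contents are hypothetical
fake_file = io.StringIO("0 5 - PERSON\n10 18 - LOCATION\n\n")
lines = fake_file.readlines()
print(lines)  # ['0 5 - PERSON\n', '10 18 - LOCATION\n', '\n']

# Without filtering, the blank line turns into the tuple ('',)
print(('',) in {tuple(line.strip().split(' - ')) for line in lines})  # True

def parse_annotations(lines):
    """Hypothetical helper: turn 'field - field' lines into tuples, skipping blank lines."""
    return {tuple(line.strip().split(' - ')) for line in lines if line.strip()}

print(parse_annotations(lines))
```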

We then compute the F1 score for every pair of stage 1 annotation sets, including each annotator's set against the provided annotations.
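Instead of listing the pairs by hand, they can be generated with `itertools.combinations`, which yields every unordered pair exactly once. A minimal sketch with hypothetical toy annotation sets and a hypothetical `f1` helper standing in for the real files and `FScore`; note that `combinations` yields pairs in input order, so the row order may differ from a hand-written list.

```python
from itertools import combinations

def f1(set1, set2):
    """Exact-match F1 between two sets of annotations."""
    tp = len(set1 & set2)
    if tp == 0:
        return 0.0
    precision = tp / len(set2)  # tp + fp == size of the second set
    recall = tp / len(set1)     # tp + fn == size of the first set
    return 2 * precision * recall / (precision + recall)

# Hypothetical annotation sets: (span, label)
annotations = {
    "A": {("0 5", "PERSON"), ("10 18", "LOCATION")},
    "B": {("0 5", "PERSON"), ("10 18", "ORGANIZATION")},
    "C": {("0 5", "PERSON"), ("10 18", "LOCATION"), ("25 30", "DATE")},
}

# Every unordered pair of annotators, each exactly once
for name1, name2 in combinations(annotations, 2):
    print(name1, name2, round(f1(annotations[name1], annotations[name2]), 3))
# prints: A B 0.5 / A C 0.8 / B C 0.4 (one pair per line)
```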

In [18]:
import pandas as pd

# Stage 1 annotation sets, keyed by annotator name
stage1_annotations = {
    "Suchi": suchi_stage1,
    "Ayush": ayush_stage1,
    "Sachin": sachin_stage1,
    "Provided": provided,
}
# Every pair of the four annotation sets, in the order shown in the output
pairs = [("Suchi", "Ayush"), ("Suchi", "Sachin"), ("Ayush", "Sachin"),
         ("Ayush", "Provided"), ("Sachin", "Provided"), ("Suchi", "Provided")]

df = pd.DataFrame(
    [(name1, name2, FScore(stage1_annotations[name1], stage1_annotations[name2]))
     for name1, name2 in pairs],
    columns=["Name 1", "Name 2", "F-score"],
)
df.index += 1  # number rows from 1
df

|   | Name 1 | Name 2 | F-score |
|---|--------|--------|---------|
| 1 | Suchi | Ayush | 0.532495 |
| 2 | Suchi | Sachin | 0.785714 |
| 3 | Ayush | Sachin | 0.561122 |
| 4 | Ayush | Provided | 0.191958 |
| 5 | Sachin | Provided | 0.311953 |
| 6 | Suchi | Provided | 0.277108 |


**3.** Repeat the previous step using the data from stage (3) of the assignment.

We load the annotation files created in stage 3 in the same way.

In [6]:
sachin_stage3_link = "1cHBjsN_yrJnJY9U0tku7Eb5DiXVwXFzS"
suchi_stage3_link = "14pZ_XcbHnRbAxTjBa35qZ1US34-BqP77"
ayush_stage3_link = "14dkwOvvXQ0KMDWHyiwH2NfoGi6pOHwft"

downloaded = drive.CreateFile({'id':sachin_stage3_link})
downloaded.GetContentFile('sachin_stage3.txt')

with open('sachin_stage3.txt') as f:
    sachin_stage3 = f.readlines()

downloaded = drive.CreateFile({'id':suchi_stage3_link})
downloaded.GetContentFile('suchi_stage3.txt')

with open('suchi_stage3.txt') as f:
    suchi_stage3 = f.readlines()

downloaded = drive.CreateFile({'id':ayush_stage3_link})
downloaded.GetContentFile('ayush_stage3.txt')

with open('ayush_stage3.txt') as f:
    ayush_stage3 = f.readlines()

We then compute the pairwise F1 scores for the stage 3 annotations.

In [20]:
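# Stage 3 annotation sets, keyed by annotator name
stage3_annotations = {"Suchi": suchi_stage3, "Ayush": ayush_stage3, "Sachin": sachin_stage3}
pairs = [("Suchi", "Ayush"), ("Suchi", "Sachin"), ("Ayush", "Sachin")]

df = pd.DataFrame(
    [(name1, name2, FScore(stage3_annotations[name1], stage3_annotations[name2]))
     for name1, name2 in pairs],
    columns=["Name 1", "Name 2", "F-score"],
)
df.index += 1  # number rows from 1
df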

|   | Name 1 | Name 2 | F-score |
|---|--------|--------|---------|
| 1 | Suchi | Ayush | 0.769231 |
| 2 | Suchi | Sachin | 0.8418 |
| 3 | Ayush | Sachin | 0.797654 |
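To summarise agreement at each stage with a single number, one option is the mean pairwise F1 among the three annotators. The sketch below averages the scores reported in the stage 1 and stage 3 tables, leaving out the comparisons with the provided annotations, which have no stage 3 counterpart; the variable names are illustrative.

```python
from statistics import mean

# Pairwise annotator F1 scores copied from the stage 1 and stage 3 tables
stage1_scores = {("Suchi", "Ayush"): 0.532495, ("Suchi", "Sachin"): 0.785714, ("Ayush", "Sachin"): 0.561122}
stage3_scores = {("Suchi", "Ayush"): 0.769231, ("Suchi", "Sachin"): 0.8418, ("Ayush", "Sachin"): 0.797654}

print(f"Stage 1 mean F1: {mean(stage1_scores.values()):.4f}")  # Stage 1 mean F1: 0.6264
print(f"Stage 3 mean F1: {mean(stage3_scores.values()):.4f}")  # Stage 3 mean F1: 0.8029
```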
