# Tutorial

This tutorial covers the basic, front-end usage for calculating inter-annotator agreement (IAA) between two annotators.

In [2]:
import spacy
import medspacy
import pandas
import sys
sys.path.insert(1, './Integrated_code/')
import IAA_ as IAA

In [3]:
# If the models are not installed yet, download them first:
#!python -m spacy download en_core_web_sm
#!python -m spacy download en_core_web_md
nlp1 = spacy.load("en_core_web_sm")
nlp2 = spacy.load("en_core_web_md")

#Note for John: Get better examples or make my own entities
doc1 = nlp1("this is a test document made in utah or mississippi, or salt lake city.")
doc2 = nlp2("this is a test document made in utah or mississippi, or salt lake city.")

print('doc1.ents: ',doc1.ents)
print('doc2.ents: ',doc2.ents)

doc1.ents:  (utah, mississippi)
doc2.ents:  (utah, mississippi, lake city)


Above, we created two documents from the same text using two of spaCy's pretrained pipelines, each with its own NER component. doc2 contains one entity that doc1 does not (lake city). Let's calculate the IAA between these documents!

In [5]:
IAA.corpus_agreement([doc1],[doc2])

| | IAA | Recall | Precision | True Positives | False Positives | False Negative |
|---|---|---|---|---|---|---|
| 0 | 0.8 | 0.666667 | 1.0 | 2 | 1 | 0 |
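
The IAA value of 0.8 in the output above is consistent with a pairwise F1 score computed from the reported counts. The sketch below reproduces that arithmetic. It assumes that IAA here is the pairwise F1 (suggested by the 'pairwise_f1' function name); `f1_from_counts` is a hypothetical helper written for illustration and is not part of the IAA module.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Pairwise F1 from match counts: 2*TP / (2*TP + FP + FN).

    Hypothetical helper for illustration only; returns 0.0 when
    there are no annotations at all, to avoid dividing by zero.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Counts taken from the output table: 2 true positives,
# 1 false positive, 0 false negatives.
iaa = f1_from_counts(tp=2, fp=1, fn=0)
print(iaa)  # 0.8
```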


'corpus_agreement' calculates the agreement between two lists of documents, lists/tuples of entities/spans, or dataframes. Note the brackets around 'doc1' and 'doc2': each document is wrapped in a single-element list, because corpus_agreement expects lists of documents.

'corpus_agreement' also accepts options that change how matches are counted, to support other IAA methods. Its arguments are:

### corpus_agreement(docs1, docs2, loose=1, labels=1, ent_or_span='ent')

docs1: A list of spaCy documents, a list containing inner tuples/lists of entities/spans, a list of spangroups, or a dataframe.
    Treated as the gold (correct) annotation when counting false positives and false negatives.

docs2: A list of spaCy documents, a list of tuples/lists of entities/spans, a list of spangroups, or a dataframe.

loose: Boolean. 1 counts any overlap as a match. 0 counts only exact matches.

labels: Boolean. 1 includes labels in the matching criteria.

ent_or_span: String, either 'ent' or 'span'. 'ent' compares doc.ents between documents. 'span' compares
    doc1's only spangroup (note that doc1 must have exactly 1 spangroup) with doc2's equivalently named spangroup. This
    argument is only relevant when passing in a list of spaCy documents (i.e. it can be ignored when passing in a list of
    tuples/lists of ents/spans/spangroups, or a dataframe).
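
To make the 'loose' and 'labels' options concrete, here is a minimal sketch of how two annotations might be compared under each setting. This is not the IAA module's implementation. `Ann` and `is_match` are hypothetical names, the token offsets and the "GPE"/"LOC" labels are illustrative, and treating labels=0 as "ignore labels" is an assumption.

```python
from typing import NamedTuple


class Ann(NamedTuple):
    """A hypothetical annotation: token offsets (end exclusive) plus a label."""
    start: int
    end: int
    label: str


def is_match(a: Ann, b: Ann, loose: bool = True, labels: bool = True) -> bool:
    """Return True if annotations a and b count as a match.

    loose=True  -> any overlap between the two ranges is enough
    loose=False -> boundaries must be identical
    labels=True -> labels must also be equal (assumed: False ignores labels)
    """
    if labels and a.label != b.label:
        return False
    if loose:
        return a.start < b.end and b.start < a.end
    return a.start == b.start and a.end == b.end


# Illustrative offsets: "salt lake city" vs. the partial span "lake city".
full = Ann(12, 15, "GPE")
partial = Ann(13, 15, "GPE")
relabeled = Ann(13, 15, "LOC")

print(is_match(full, partial, loose=True))                     # True
print(is_match(full, partial, loose=False))                    # False
print(is_match(full, relabeled, loose=True, labels=True))      # False
print(is_match(full, relabeled, loose=True, labels=False))     # True
```

With loose matching, a partial span such as "lake city" still counts as agreeing with "salt lake city"; with exact matching it does not.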

Internally, corpus_agreement calls an overlap function and the 'pairwise_f1' function on each pair of documents in the lists. We can instead choose to call these functions separate