# Inter Annotator Agreement with GitMA

In this notebook, you will learn how to import and process your CATMA annotations with the Python package GitMA. The focus lies on calculating Inter Annotator Agreement (IAA) scores.



## Table of Contents

1. [Get a CATMA access token](#1-get-a-catma-access-token)
2. [Import the `Catma` class and load your CATMA profile](#2-import-the-catma-class-and-load-your-catma-profile)
3. [Clone and load a CATMA project](#3-clone-and-load-a-catma-project)
4. [General project stats](#4-general-project-stats)
5. [A few examples on how to look at your annotations](#5-a-few-examples-on-how-to-look-at-your-annotations)
   1. [Plot all annotations](#51-plot-all-annotations)
   2. [Plot annotations for a specified annotation collection](#52-plot-annotations-for-a-specified-annotation-collection)
   3. [Cooccurrence network](#53-cooccurrence-network)
   4. [Annotation collection as Pandas DataFrame](#54-annotation-collection-as-pandas-dataframe)
6. [Inter Annotator Agreement (IAA) with gitma](#6-inter-annotator-agreement-iaa-with-gitma)
   1. [Set annotation collections](#61-set-annotation-collcections)
   2. [`get_iaa`](#62-get_iaa)
   3. [Filter by tags](#63-filter-by-tags)
   4. [Gamma Agreement](#64-gamma-agreement)
   

## 1. Get a CATMA access token

To access your annotations in CATMA, you need a personal access token. You can create this token on the CATMA website after logging in to your account.

![Get access token](img/access_token_ui.png)

## 2. Import the `Catma` class and load your CATMA profile

In [None]:

from gitma import Catma

my_access_token = 'insert your access token here'
my_catma = Catma(gitlab_access_token=my_access_token)
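
Pasting the token directly into a notebook cell makes it easy to leak when the notebook is shared or committed. As a minimal sketch, the token could instead be read from an environment variable. The variable name `CATMA_ACCESS_TOKEN` and the helper `read_access_token` are assumptions made for this example and are not part of GitMA.

```python
import os


def read_access_token(env_var='CATMA_ACCESS_TOKEN'):
    """Return the token stored in the given environment variable.

    Both this helper and the variable name are hypothetical; any name works.
    """
    token = os.environ.get(env_var, '').strip()
    if not token:
        raise RuntimeError(
            f'Set the environment variable {env_var} to your CATMA access token.'
        )
    return token
```

The result can then be passed on as before, for example `Catma(gitlab_access_token=read_access_token())`.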

First, let's look at your CATMA projects. 

In [None]:
my_catma.project_name_list


## 3. Clone and load a CATMA project

The `Catma` class instance can be used to clone and load a CATMA project. The only necessary argument is the project's name. Optionally, a different destination directory can be specified.


In [None]:

my_project_name = 'GitMA_Demo_Project' # Replace with the name of your project

my_catma.load_project_from_gitlab(
    project_name=my_project_name, 
    backup_directory='projects/'
)

If a project was previously loaded from CATMA's GitLab backend and you try to load it again, the operation will fail because the project already exists in the destination directory. If you want to fetch a fresh copy (that is, clone the project again), you need to delete or rename the existing project directory. Once you have cloned your project from GitLab, you can load it as a `CatmaProject` as shown below.

In [None]:
from gitma import CatmaProject

my_project = CatmaProject(
    projects_directory='projects/',
    project_name=my_project_name
)
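
If cloning fails because an older copy of the project is still present, the old copy has to be moved out of the way first, as described above. A minimal sketch of that step follows. The helper `archive_existing_clone`, the `archive_dir` location, and the assumption that the cloned directory's name contains the project name are all illustrative, so check the contents of your `projects/` directory before relying on it.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_existing_clone(projects_dir, project_name, archive_dir):
    """Move existing clones of a project into archive_dir and return their new paths.

    Hypothetical helper: a directory counts as a clone if its name
    contains project_name (an assumption about the directory layout).
    """
    projects_dir = Path(projects_dir)
    archive_dir = Path(archive_dir)
    moved = []
    if not projects_dir.is_dir():
        return moved  # nothing has been cloned yet

    # A timestamp suffix keeps several archived copies apart.
    stamp = datetime.now().strftime('%Y%m%d-%H%M%S')
    for entry in sorted(projects_dir.iterdir()):
        if entry.is_dir() and project_name in entry.name:
            archive_dir.mkdir(parents=True, exist_ok=True)
            target = archive_dir / f'{entry.name}_{stamp}'
            shutil.move(str(entry), str(target))
            moved.append(target)
    return moved
```

Calling `archive_existing_clone('projects/', my_project_name, 'projects_archive/')` before `load_project_from_gitlab()` would leave the destination directory free for a fresh clone. Moving the old copy to a separate directory, rather than renaming it in place, avoids having two directories with similar names inside `projects/`.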

## 4. General project stats

The `stats()` method shows you some metadata about your annotation collections.

In [None]:
my_project.stats()


## 5. A few examples on how to look at your annotations



### 5.1 Plot all annotations

With the `plot_annotations()` method, the annotations of each annotation collection and each document are plotted as a single subplot.
Clicking a legend entry deactivates the corresponding annotation collection in the plot. Hovering over a scatter point lets you explore the individual annotation.

In [None]:
my_project.plot_annotations()


### 5.2 Plot annotations for a specified annotation collection

For this, we need to specify one annotation collection. To get an overview of all annotation collections in our project, we can use the `annotation_collections` attribute of the `CatmaProject` class, which contains a list of all annotation collections. We inspect the list as shown below.

In [None]:
for ac in my_project.annotation_collections:
    print(ac.name)

We can now specify the annotation collection that we want to inspect... 

In [None]:
my_annotation_collection = 'ac_2'

... and plot it (e.g. as a scatter plot).

In [None]:
my_project.ac_dict[my_annotation_collection].plot_annotations()

### 5.3 Cooccurrence network

An alternative way to visualize annotation collections is using networks. They can be used to get an insight into the cooccurrence of annotations.

In [None]:
my_project.ac_dict[my_annotation_collection].cooccurrence_network()

In [None]:
my_project.ac_dict[my_annotation_collection].cooccurrence_network(
    character_distance=50,
)

### 5.4 Annotation collection as Pandas DataFrame

In [None]:
my_project.ac_dict[my_annotation_collection].df

## 6. Inter Annotator Agreement (IAA) with gitma

### 6.1 Set annotation collcections

First, we need to specify the two annotation collections for which the inter annotator agreement should be calculated. To do so, we take a look at all annotation collections in our project.


In [None]:
for ac in my_project.annotation_collections:
    print(ac.name)

Now we can choose the two annotation collections and place their names in the following two variables. 

In [None]:
annotation_collection_1 = 'ac_1'  # name of the first collection
annotation_collection_2 = 'ac_2'  # name of the second collection


### 6.2 `get_iaa`

For every annotation in annotation collection 1, the `get_iaa` method searches for the best matching annotation in annotation collection 2, based on the annotated text span.

In [None]:
my_project.get_iaa(
    ac1_name_or_inst=annotation_collection_1,
    ac2_name_or_inst=annotation_collection_2
)
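
The idea of a best match by text span can be pictured with a small standalone sketch. It pairs each span from one list with the span from a second list that shares the most characters with it. This is an illustration only and not GitMA's implementation: the helpers `span_overlap` and `best_matches`, the `(start, end)` tuple format, and the rule that a span without any overlap stays unmatched are assumptions.

```python
def span_overlap(span_1, span_2):
    """Return the number of characters shared by two (start, end) spans."""
    return max(0, min(span_1[1], span_2[1]) - max(span_1[0], span_2[0]))


def best_matches(spans_1, spans_2):
    """Pair every span in spans_1 with its most overlapping span in spans_2.

    Spans without any overlapping partner are paired with None.
    """
    matches = []
    for span in spans_1:
        best = max(
            spans_2,
            key=lambda other: span_overlap(span, other),
            default=None,
        )
        if best is not None and span_overlap(span, best) == 0:
            best = None
        matches.append((span, best))
    return matches


# Hypothetical character offsets for two annotation collections
spans_ac_1 = [(0, 10), (20, 35), (50, 60)]
spans_ac_2 = [(2, 12), (18, 30), (31, 40)]

for span, match in best_matches(spans_ac_1, spans_ac_2):
    print(span, '->', match)
```

In this example the span `(20, 35)` overlaps with two candidates and is paired with `(18, 30)`, which shares more characters with it, while `(50, 60)` has no partner at all.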

The `get_iaa` method returns three different agreement scores. It also reports the number of annotation pairs considered when computing the IAA scores and the average overlap of the annotation pairs. Additionally, the method returns a confusion matrix that gives an insight into the relation between the tags.
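
To make the terms "agreement score" and "confusion matrix" more concrete, the following sketch computes both from a list of paired tags. Cohen's kappa is used only as a familiar example of a chance-corrected score; this notebook does not state which three scores `get_iaa` reports. The helper names and the tag `process_event` are hypothetical.

```python
from collections import Counter


def confusion_counts(tag_pairs):
    """Count how often each (tag of annotator 1, tag of annotator 2) pair occurs."""
    return Counter(tag_pairs)


def cohens_kappa(tag_pairs):
    """Return Cohen's kappa for a non-empty list of (tag_1, tag_2) pairs."""
    n = len(tag_pairs)
    observed = sum(1 for tag_1, tag_2 in tag_pairs if tag_1 == tag_2) / n
    counts_1 = Counter(tag_1 for tag_1, _ in tag_pairs)
    counts_2 = Counter(tag_2 for _, tag_2 in tag_pairs)
    # Agreement expected by chance, given how often each annotator uses each tag
    expected = sum(counts_1[tag] * counts_2[tag] for tag in counts_1) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one and the same tag throughout
    return (observed - expected) / (1 - expected)


# Hypothetical tags for four matched annotation pairs
pairs = [
    ('stative_event', 'stative_event'),
    ('stative_event', 'process_event'),
    ('process_event', 'process_event'),
    ('process_event', 'process_event'),
]

print(confusion_counts(pairs))
print(cohens_kappa(pairs))
```

Here the annotators agree on three of four pairs (0.75), but half of that agreement would be expected by chance, which is why the chance-corrected score is lower than the raw agreement.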

### 6.3 Filter by tags

There may be cases in which you don't want to include all annotations when computing the IAA scores. In those cases, use the `tag_filter` parameter, which expects a list of tag names.

In [None]:
my_project.get_iaa(
    ac1_name_or_inst=annotation_collection_1,
    ac2_name_or_inst=annotation_collection_2,
    tag_filter=['stative_event'] # Put the tag names to include in the IAA calculation in this list
)

### 6.4 Gamma Agreement

To compute the gamma agreement, five further parameters have to be defined in addition to the annotation collections. The `alpha`, `beta` and `delta_empty` parameters are necessary to compute the `CombinedCategoricalDissimilarity`. The `n_samples` and `precision_level` values are used in the `compute_gamma()` method.

In [None]:
# gamma agreement with default settings
my_project.gamma_agreement(
    annotation_collections=[annotation_collection_1, annotation_collection_2],
    alpha=3,
    beta=1,
    delta_empty=0.01,
    n_samples=30,
    precision_level=0.01
)
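
To give a rough idea of what `alpha`, `beta` and `delta_empty` control, the sketch below combines a positional and a categorical dissimilarity between two annotations. It is a simplified reading of the published definition of the gamma measure (Mathet et al., 2015) with a binary tag distance, and it is not the code that runs inside `gamma_agreement()`. The function names, the `(start, end, tag)` tuple format, and the tag `process_event` are assumptions.

```python
def positional_dissimilarity(unit_1, unit_2, delta_empty):
    """Squared, length-normalised shift of both boundaries, scaled by delta_empty."""
    (start_1, end_1, _), (start_2, end_2, _) = unit_1, unit_2
    boundary_shift = abs(start_1 - start_2) + abs(end_1 - end_2)
    total_length = (end_1 - start_1) + (end_2 - start_2)
    return (boundary_shift / total_length) ** 2 * delta_empty


def categorical_dissimilarity(unit_1, unit_2, delta_empty):
    """Binary tag distance: 0 for equal tags, delta_empty otherwise."""
    return 0.0 if unit_1[2] == unit_2[2] else delta_empty


def combined_dissimilarity(unit_1, unit_2, alpha=3, beta=1, delta_empty=0.01):
    """Weighted sum: alpha weights the position term, beta the tag term."""
    return (
        alpha * positional_dissimilarity(unit_1, unit_2, delta_empty)
        + beta * categorical_dissimilarity(unit_1, unit_2, delta_empty)
    )


# Hypothetical annotations given as (start, end, tag)
unit_a = (0, 10, 'stative_event')
unit_b = (2, 12, 'stative_event')
unit_c = (2, 12, 'process_event')

print(combined_dissimilarity(unit_a, unit_b))  # same tag, shifted span
print(combined_dissimilarity(unit_a, unit_c))  # shifted span and different tag
```

Under this reading, a larger `alpha` penalises diverging span boundaries more strongly, while a larger `beta` penalises diverging tags more strongly.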