# Inter Annotator Agreement with GitMA

In this notebook, you will learn how to import and process your CATMA annotations with the Python package GitMA. The focus lies on calculating Inter Annotator Agreement (IAA) scores.



## Table of Contents

1. [Get a CATMA access token](#1-get-a-catma-access-token)
2. [Import the `Catma` class and load your CATMA profile](#2-import-the-catma-class-and-load-your-catma-profile)
3. [Clone and load a CATMA project](#3-clone-and-load-a-catma-project)
4. [General project stats](#4-general-project-stats)
5. [A few examples on how to look at your annotations](#5-a-few-examples-on-how-to-look-at-your-annotations)
   1. [Plot all annotations](#51-plot-all-annotations)
   2. [Plot annotations for a specified annotation collection](#52-plot-annotations-for-a-specified-annotation-collection)
   3. [Cooccurrence network](#53-cooccurrence-network)
   4. [Annotation collection as Pandas DataFrame](#54-annotation-collection-as-pandas-dataframe)
8. [Inter Annotator Agreement (IAA) with gitma](#8-inter-annotator-agreement-iaa-with-gitma)
   1. [`get_iaa`](#81-get_iaa)
   2. [Filter by tags](#82-filter-by-tags)
   3. [Compare annotation properties](#83-compare-annotation-properties)
   4. [Gamma Agreement](#84-gamma-agreement)

## 1. Get a CATMA access token

To access your annotations in CATMA, you need a personal access token. You can generate this token on the CATMA website after logging into your account.

![Get access token](img/access_token_ui.png)

## 2. Import the `Catma` class and load your CATMA profile

In [None]:

from gitma import Catma

my_access_token = 'insert your access token here'
my_catma = Catma(gitlab_access_token=my_access_token)
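If you share or version this notebook, you may prefer not to paste the token into a cell. The following is a minimal sketch that reads it from an environment variable instead. The variable name `CATMA_ACCESS_TOKEN` and the helper `read_access_token` are hypothetical and not part of gitma.

```python
import os


def read_access_token(env_var="CATMA_ACCESS_TOKEN"):
    """Return the access token stored in an environment variable.

    The variable name is an assumption made for this example;
    use whatever name you exported the token under.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {env_var} is not set or is empty."
        )
    return token
```

With the variable set in your shell, `my_access_token = read_access_token()` can replace the hard-coded string in the cell above.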

First, let's look at your CATMA projects. 

In [None]:
my_catma.project_name_list


## 3. Clone and load a CATMA project

The `Catma` class instance can be used to clone and load a CATMA project. The only necessary argument is the project's name. Optionally, a different destination directory can be specified.


In [None]:

my_project_name = 'GitMA_Demo_Project' # Replace with the name of your project

my_catma.load_project_from_gitlab(
    project_name=my_project_name, 
    backup_directory='projects/'
)

If a project was previously loaded from CATMA's GitLab backend and you try to load it again, the operation fails because the project already exists in the destination directory. To fetch a fresh copy (that is, to clone the project again), delete or rename the existing project directory first. Once you have cloned your project from GitLab, you can load it as a `CatmaProject` as shown below.

In [4]:
from gitma import CatmaProject

my_project = CatmaProject(
    projects_directory='projects/',
    project_name=my_project_name
)

Loading tagsets ...
	Found 1 tagset(s).
Loading documents ...
	Found 1 document(s).
Loading annotation collections ...
	Found 3 annotation collection(s).
	Annotation collection "gold_annotation" for document "The Metamorphosis"
		Annotations: 0
	Annotation collection "ac_2" for document "The Metamorphosis"
		Annotations: 19
	Annotation collection "ac_1" for document "The Metamorphosis"
		Annotations: 20
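Because cloning fails when a copy of the project already exists, it can help to check the destination directory before re-running the clone step. The sketch below lists matching subdirectories. It assumes that the cloned directory's name contains the project name, which may not hold for every setup, and `find_existing_clones` is a hypothetical helper, not a gitma function.

```python
from pathlib import Path


def find_existing_clones(backup_directory, project_name):
    """List subdirectories whose name contains the project name.

    Assumption: the clone's directory name includes the project name.
    An empty list means no earlier clone was found under that rule.
    """
    root = Path(backup_directory)
    if not root.is_dir():
        return []
    return sorted(
        path for path in root.iterdir()
        if path.is_dir() and project_name in path.name
    )
```

If `find_existing_clones('projects/', my_project_name)` returns an empty list, the clone step should not collide with an earlier copy. Otherwise, rename or delete the listed directories first.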


## 4. General project stats

The `stats()` method shows you some metadata about your annotation collections.

In [5]:
my_project.stats()

| | annotation collection | document | annotations | annotator | tag | first_annotation | last_annotation | uuid |
|---|---|---|---|---|---|---|---|---|
| 0 | ac_1 | The Metamorphosis | 20 | {MVauth} | {change_of_state, non_event, process_event, st... | 2023-08-02 15:43:57.086000+02:00 | 2023-08-02 15:55:27.298000+02:00 | C_F8552E35-F9BD-4544-A8C5-BD9114C80E43 |
| 1 | ac_2 | The Metamorphosis | 19 | {mvgoogle} | {change_of_state, non_event, process_event, st... | 2023-08-03 13:29:00.085000+02:00 | 2023-08-03 13:37:13.986000+02:00 | C_D32D3AFF-98A6-4596-9B32-ABDC66445290 |



## 5. A few examples on how to look at your annotations



### 5.1 Plot all annotations

The `plot_annotations()` method plots the annotations of each annotation collection and each document in its own subplot.
Click a legend entry to hide the corresponding annotation collection in the plot, and hover over a scatter point to inspect the individual annotation.

In [6]:
my_project.plot_annotations()


### 5.2 Plot annotations for a specified annotation collection

For this we need to specify one annotation collection. To get an overview of all annotation collections in our project, we can use the `annotation_collections` attribute of the `CatmaProject` class, which contains a list of all annotation collections. We inspect the list as shown below.

In [None]:
for ac in my_project.annotation_collections:
    print(ac.name)

We can now specify the annotation collection that we want to inspect... 

In [9]:
my_annotation_collection = 'ac_2'

... and plot it (e.g. as a scatter plot).

In [None]:
my_project.ac_dict[my_annotation_collection].plot_annotations()

### 5.3 Cooccurrence network

An alternative way to visualize annotation collections is using networks. They can be used to get an insight into the cooccurrence of annotations.

In [None]:
my_project.ac_dict[my_annotation_collection].cooccurrence_network()

In [None]:
my_project.ac_dict[my_annotation_collection].cooccurrence_network(
    character_distance=50,
)

### 5.4 Annotation collection as Pandas DataFrame

In [None]:
my_project.ac_dict[my_annotation_collection].df

## 8. Inter Annotator Agreement (IAA) with gitma

### 8.1 Set annotation collections

First, we need to specify the two annotation collections for which the inter annotator agreement should be calculated. To do so, we take a look at all annotation collections in our project.


In [11]:
for ac in my_project.annotation_collections:
    print(ac.name)

gold_annotation
ac_2
ac_1


Now we can choose the two annotation collections and place their names in the following two variables. 

In [12]:
annotation_collection_1 = 'ac_1'  # Name of the first collection
annotation_collection_2 = 'ac_2'  # Name of the second collection


### 8.1 `get_iaa`

For every annotation in annotation collection 1, the `get_iaa()` method searches for the best-matching annotation in annotation collection 2 with respect to its annotated text span.
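To make the idea of span-based matching concrete, here is a minimal sketch that pairs each span from one list with the span from a second list that covers most of it. This only illustrates the general technique and is not gitma's actual implementation. The helpers `overlap_ratio` and `best_matches` and the `(start, end)` character-offset tuples are assumptions made for this example.

```python
def overlap_ratio(span, other):
    """Share of `span` that is covered by `other`.

    Both arguments are (start, end) character offsets with end > start.
    """
    start, end = span
    other_start, other_end = other
    shared = max(0, min(end, other_end) - max(start, other_start))
    return shared / (end - start)


def best_matches(spans_1, spans_2):
    """Map each span in spans_1 to its best-overlapping span in spans_2.

    Spans without any overlap stay unmatched and are left out of the result.
    """
    matches = {}
    for span in spans_1:
        best = max(
            spans_2,
            key=lambda other: overlap_ratio(span, other),
            default=None,
        )
        if best is not None and overlap_ratio(span, best) > 0:
            matches[span] = best
    return matches


spans_1 = [(0, 10), (20, 40), (50, 60)]
spans_2 = [(2, 10), (18, 41), (100, 120)]
matches = best_matches(spans_1, spans_2)
# (0, 10) pairs with (2, 10), (20, 40) pairs with (18, 41),
# and (50, 60) stays unmatched.
```

Only the matched pairs would then enter the agreement calculation.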

In [13]:
my_project.get_iaa(
    ac1_name_or_inst=annotation_collection_1,
    ac2_name_or_inst=annotation_collection_2
)


Finished search for matching annotations in:
- ac_1
- ac_2
20 annotation(s) could be matched.
Average overlap is 97.54%.
Couldn't match 0 annotation(s) in the first annotation collection.


Results for "tag"
-----------------
Scott's Pi:          0.7525773195876289
Cohen's Kappa:       0.7530864197530864
Krippendorf's Alpha: 0.7587628865979381

Confusion Matrix
                    -------



| | change_of_state | non_event | process_event | stative_event |
|---|---|---|---|---|
| change_of_state | 0 | 0 | 1 | 0 |
| non_event | 0 | 2 | 0 | 0 |
| process_event | 1 | 0 | 5 | 0 |
| stative_event | 0 | 0 | 1 | 10 |


The `get_iaa()` method reports three different agreement scores as well as the number of annotation pairs considered when computing them and the average overlap of those pairs. It also displays a confusion matrix that shows how the tags assigned in the two annotation collections relate to each other.
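The three scores can be reproduced from the confusion matrix alone, which helps when interpreting them. The sketch below uses the standard two-annotator formulas for nominal categories. It is not gitma's code, and `agreement_scores` is a hypothetical helper. All three scores share the form (p_o - p_e) / (1 - p_e) and differ only in how the expected agreement p_e is estimated.

```python
# Confusion matrix for "tag", copied from the output above.
matrix = [
    [0, 0, 1, 0],   # change_of_state
    [0, 2, 0, 0],   # non_event
    [1, 0, 5, 0],   # process_event
    [0, 0, 1, 10],  # stative_event
]


def agreement_scores(m):
    """Compute Scott's Pi, Cohen's Kappa and Krippendorff's Alpha."""
    k = len(m)
    n = sum(sum(row) for row in m)             # number of annotation pairs
    p_o = sum(m[i][i] for i in range(k)) / n   # observed agreement
    rows = [sum(m[i]) for i in range(k)]
    cols = [sum(m[i][j] for i in range(k)) for j in range(k)]
    pooled = [rows[i] + cols[i] for i in range(k)]  # both annotators together

    # Expected agreement under each score's assumption.
    pe_kappa = sum(rows[i] * cols[i] for i in range(k)) / n ** 2
    pe_pi = sum(c ** 2 for c in pooled) / (2 * n) ** 2
    pe_alpha = sum(c * (c - 1) for c in pooled) / (2 * n * (2 * n - 1))

    def chance_corrected(p_e):
        return (p_o - p_e) / (1 - p_e)

    return {
        "scotts_pi": chance_corrected(pe_pi),
        "cohens_kappa": chance_corrected(pe_kappa),
        "krippendorffs_alpha": chance_corrected(pe_alpha),
    }


scores = agreement_scores(matrix)
print({name: round(value, 4) for name, value in scores.items()})
# {'scotts_pi': 0.7526, 'cohens_kappa': 0.7531, 'krippendorffs_alpha': 0.7588}
```

The results do not depend on which annotation collection is shown in the rows and which in the columns, because each formula treats the two sets of marginal totals symmetrically.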

### 8.2 Filter by tags

In some cases you may not want to include all annotations when computing the IAA scores. In those cases, use the `tag_filter` parameter, which expects a list of tag names.

In [17]:
my_project.get_iaa(
    ac1_name_or_inst=annotation_collection_1,
    ac2_name_or_inst=annotation_collection_2,
    tag_filter=['stative_event'] # Put the tag names to include in the IAA calculation in this list
)


Finished search for matching annotations in:
- ac_1
- ac_2
10 annotation(s) could be matched.
Average overlap is 95.07%.
Couldn't match 0 annotation(s) in the first annotation collection.

Couldn't compute IAA for level 'tag' due to missing matching annotations with the given settings.

Results for "tag"
-----------------
Scott's Pi:          0
Cohen's Kappa:       0
Krippendorf's Alpha: 0

Confusion Matrix
                    -------



| | stative_event |
|---|---|
| stative_event | 10 |
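The zeros in this output are not a sign of poor agreement. When only one tag remains after filtering, every matched pair agrees trivially, so the agreement expected by chance is already 1 and the chance-corrected scores are undefined (0 divided by 0). The sketch below demonstrates this for Cohen's Kappa. That gitma falls back to 0 in this situation is inferred from the output above, not from its source code, and `cohens_kappa` is a hypothetical helper.

```python
def cohens_kappa(m):
    """Return Cohen's Kappa, or None when it is undefined."""
    k = len(m)
    n = sum(sum(row) for row in m)
    p_o = sum(m[i][i] for i in range(k)) / n
    rows = [sum(m[i]) for i in range(k)]
    cols = [sum(m[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(rows[i] * cols[i] for i in range(k)) / n ** 2
    if p_e == 1:
        # Only one category in use: chance agreement is already perfect.
        return None
    return (p_o - p_e) / (1 - p_e)


single_tag_kappa = cohens_kappa([[10]])  # the 1x1 confusion matrix above
```

To get a meaningful score, include at least two tags in `tag_filter`.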


### 8.3 Compare annotation properties

The tag is only one level of a CATMA annotation. You can also compare annotations by their properties. To compute the agreement on an annotation property, use the `level` parameter and pass the property name prefixed with `prop:`:

In [19]:
my_project.get_iaa(
    ac1_name_or_inst='ac_1',
    ac2_name_or_inst='ac_2',
    level='prop:representation_type'
)


Finished search for matching annotations in:
- ac_1
- ac_2
20 annotation(s) could be matched.
Average overlap is 97.54%.
Couldn't match 0 annotation(s) in the first annotation collection.


Results for "prop:representation_type"
--------------------------------------
Scott's Pi:          0.4805194805194798
Cohen's Kappa:       0.4871794871794869
Krippendorf's Alpha: 0.49350649350649345

Confusion Matrix
                    -------



| | character_speech | thought_representation | narrator_speech |
|---|---|---|---|
| character_speech | 0 | 0 | 0 |
| thought_representation | 1 | 0 | 0 |
| narrator_speech | 0 | 0 | 19 |


### 8.4 Gamma Agreement

To compute the gamma agreement, five further parameters have to be defined in addition to the annotation collections. The `alpha`, `beta` and `delta_empty` parameters are needed to compute the `CombinedCategoricalDissimilarity`. The `n_samples` and `precision_level` values are used in the `compute_gamma()` method.

In [20]:
# gamma agreement with default settings
my_project.gamma_agreement(
    annotation_collections=[annotation_collection_1, annotation_collection_2],
    alpha=3,
    beta=1,
    delta_empty=0.01,
    n_samples=30,
    precision_level=0.01
)



The gamma agreement is 0.7639100376330488
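To get a feeling for what `alpha`, `beta` and `delta_empty` control, the sketch below computes a combined dissimilarity between two annotated units. It follows the definitions commonly given for the gamma measure: a weighted sum of a positional and a categorical dissimilarity, each scaled by `delta_empty`. Whether gitma's backend computes it in exactly this way is an assumption, and the function names and the `(start, end, tag)` tuples are hypothetical.

```python
def positional_dissimilarity(u, v, delta_empty):
    """Squared relative boundary shift between two units, scaled by delta_empty."""
    start_u, end_u, _ = u
    start_v, end_v, _ = v
    shift = abs(start_u - start_v) + abs(end_u - end_v)
    total_length = (end_u - start_u) + (end_v - start_v)
    return (shift / total_length) ** 2 * delta_empty


def categorical_dissimilarity(u, v, delta_empty):
    """Nominal distance: 0 for identical tags, 1 otherwise, scaled by delta_empty."""
    return (0.0 if u[2] == v[2] else 1.0) * delta_empty


def combined_dissimilarity(u, v, alpha=3, beta=1, delta_empty=0.01):
    """Weighted sum: alpha weights position, beta weights category."""
    return (
        alpha * positional_dissimilarity(u, v, delta_empty)
        + beta * categorical_dissimilarity(u, v, delta_empty)
    )


# Two units given as (start, end, tag) with slightly different
# boundaries and different tags.
unit_1 = (0, 10, "process_event")
unit_2 = (2, 10, "stative_event")
dissimilarity = combined_dissimilarity(unit_1, unit_2)
```

Under these definitions, a larger `alpha` penalizes boundary disagreement more strongly, while a larger `beta` penalizes tag disagreement more strongly.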
