# Inter-Annotator Agreement (IAA) with gitma
This demo uses the demo CATMA project.
If you want to use it for your own annotations you first have to clone your CATMA project locally.
For further information about cloning your CATMA project see [this notebook](https://github.com/forTEXT/gitma/blob/main/demo_notebooks/load_project_from_gitlab.ipynb).

This package provides two methods to compute the agreement of two or more annotators.
Both methods compare annotation collections.
For that reason, it is recommended to use one annotation collection per annotator and document.
Additionally, it is recommended to name every annotation collection by a combination of the <span style="color:pink">document's title</span>, the <span style="color:red">annotation task</span> and the <span style="color:green">annotator</span>.

**Example:**  <span style="color:pink">robinson_crusoe</span>-<span style="color:red">narrative_space</span>-<span style="color:green">mareike</span>
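
A small helper can keep collection names consistent with this convention. The following is a minimal sketch and not part of gitma: the function names `build_collection_name` and `parse_collection_name` are hypothetical, and the sketch assumes that none of the three name parts contains a hyphen.

```python
def build_collection_name(document: str, task: str, annotator: str) -> str:
    """Join the three name parts with hyphens (hypothetical helper, not a gitma function)."""
    parts = [document, task, annotator]
    for part in parts:
        # Assumption: hyphens are reserved as separators between the parts
        if not part or "-" in part:
            raise ValueError(f"invalid name part: {part!r}")
    return "-".join(parts)


def parse_collection_name(name: str) -> dict:
    """Split a collection name back into document, task and annotator."""
    parts = name.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected 3 hyphen-separated parts, got {len(parts)}")
    return dict(zip(("document", "task", "annotator"), parts))


name = build_collection_name("robinson_crusoe", "narrative_space", "mareike")
print(name)
print(parse_collection_name(name))
```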

## Table of contents
* [Dependencies](#1-bullet)
* [`get_iaa()`](#2-bullet)
  * [Basic example](#2.1)
  * [Filter by tags](#2.2)
  * [Compare annotation properties](#2.3)
* [`gamma_agreement()`](#3-bullet)

## Dependencies <a class="anchor" id="1-bullet"></a>

### nltk

If you are only interested in IAA metrics such as *Scott's pi*, *Cohen's kappa* and *Krippendorff's alpha*,
the [Natural Language Toolkit](https://www.nltk.org/) is sufficient (already installed).

### pygamma-agreement

Gamma agreement treats unitizing, i.e. deciding where an annotation starts and ends, as part of the annotation task
(see [Mathet et al. 2015](https://aclanthology.org/J15-3003.pdf)).
For many annotation projects done in CATMA this can be crucial.
If you want to compute the gamma agreement using this package, the installation of [pygamma-agreement](https://github.com/bootphon/pygamma-agreement) is required:

    pip install pygamma-agreement

Please take note of the **further installation instructions** on the [pygamma-agreement GitHub page](https://github.com/bootphon/pygamma-agreement#installation) and the [*'how to cite'*](https://github.com/bootphon/pygamma-agreement#citing-pygamma)!




In [None]:
# import the CatmaProject class
from gitma import CatmaProject

# load your project
my_project = CatmaProject(
    project_name='test_corpus',
    project_directory='../test/demo_project/'
)

## `get_iaa()` <a class="anchor" id="2-bullet"></a>

The test project contains three annotation collections.
In this demo we will compute the agreement of the collections 'ac_1' and 'ac_2'.

For every annotation in annotation collection 1 (`ac1_name`) the `get_iaa` method searches for the best matching annotation
in annotation collection 2 (`ac2_name`) with respect to its annotation text span.
The following examples show how matching annotations in two annotation collections are identified:

<img src="demo_img/best_match_example_iaa.png">

In contrast to the `gamma_agreement` method (see below), the `get_iaa` method only considers the best matching annotations
from both annotation collections when computing the IAA value.
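
The idea of a best match can be illustrated with a few lines of plain Python. This is only a sketch under the assumption that "best match" means the largest character overlap. The `Span` type, the `overlap` and `best_match` functions and the sample offsets are hypothetical, and gitma's actual matching criterion may differ.

```python
from typing import NamedTuple, Optional


class Span(NamedTuple):
    start: int  # character offset where the annotation starts
    end: int    # character offset where the annotation ends (exclusive)
    tag: str


def overlap(a: Span, b: Span) -> int:
    """Number of characters shared by two spans (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def best_match(target: Span, candidates: list) -> Optional[Span]:
    """Return the candidate with the largest overlap, or None if nothing overlaps."""
    best = max(candidates, key=lambda c: overlap(target, c), default=None)
    if best is None or overlap(target, best) == 0:
        return None
    return best


# Hypothetical annotations, not taken from the demo project
ac_1 = [Span(0, 20, "stative_event"), Span(25, 60, "process")]
ac_2 = [Span(2, 18, "stative_event"), Span(30, 40, "process"), Span(28, 58, "stative_event")]

for annotation in ac_1:
    print(annotation, "->", best_match(annotation, ac_2))
```

In this sketch the second annotation of `ac_1` is paired with the longer overlapping span in `ac_2`, and the shorter `process` span is left out of the comparison.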

### Basic example <a class="anchor" id="2.1"></a>

First, we will take a look at both annotation collections by comparing the annotation spans.

In [None]:
# compare the annotation collections by start point
my_project.compare_annotation_collections(
    annotation_collections=['ac_1', 'ac_2']
)

As the line plot shows, every annotation in annotation collection 'ac_1' has a matching annotation in annotation collection 'ac_2'.

Now, let's compute the IAA for all matching annotations:

In [None]:
my_project.get_iaa(
    ac1_name='ac_1',
    ac2_name='ac_2'
)

The `get_iaa` method not only returns 3 different agreement scores,
but also reports the number of annotation pairs considered when computing the IAA scores
and the average overlap of the annotation pairs.
Additionally, the method returns a confusion matrix to give an insight into the relation between the tags.
As you can see in the matrix, in 3 cases an annotation with the tag 'stative_event' in annotation collection 1
has a best match in annotation collection 2 with the same tag.
These are the first 3 annotations in annotation collection 1, as the line plot above shows.
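
To make the relation between the confusion matrix and an agreement score more concrete, the following sketch computes Cohen's kappa by hand from a list of matched tag pairs. The pairs are hypothetical and do not reproduce the demo project's output, and the function `cohens_kappa` is written for illustration only. It is not how gitma computes its scores.

```python
from collections import Counter

# Hypothetical best-matching pairs: (tag in collection 1, tag in collection 2)
pairs = [
    ("stative_event", "stative_event"),
    ("stative_event", "stative_event"),
    ("stative_event", "stative_event"),
    ("process", "process"),
    ("process", "stative_event"),
]

# Confusion matrix as counts per tag combination
confusion = Counter(pairs)


def cohens_kappa(pairs):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    left = Counter(a for a, _ in pairs)    # tag distribution of annotator 1
    right = Counter(b for _, b in pairs)   # tag distribution of annotator 2
    expected = sum(left[tag] * right[tag] for tag in left) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical tag throughout
    return (observed - expected) / (1 - expected)


print(confusion)
print(round(cohens_kappa(pairs), 3))
```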

### Filter by tags <a class="anchor" id="2.2"></a>

In some cases you may not want to include all annotations when computing
the IAA scores.
In those cases, use the `tag_filter` parameter, which expects a list of tag names.

In [None]:
my_project.get_iaa(
    ac1_name='ac_1',
    ac2_name='ac_2',
    tag_filter=['process']
)

As the confusion matrix shows, only the annotations from annotation collection 1
with the tag 'process' have been taken into account.
From annotation collection 2 there is still one annotation considered with the tag 'stative_event'.
But we can filter both annotation collections, too: 

In [None]:
my_project.get_iaa(
    ac1_name='ac_1',
    ac2_name='ac_2',
    tag_filter=['process'],
    filter_both_ac=True
)

Because we only use two tags in the demo project this leads to the same IAA results.

### Compare annotation properties <a class="anchor" id="2.3"></a>

The tag is only one level of CATMA annotations.
If you want to compare annotations by their properties this is possible too.
In the demo project the annotations have the property 'mental' to evaluate if a mental
event is referenced in the text:

In [None]:
my_project.compare_annotation_collections(
    annotation_collections=['ac_1', 'ac_2'],
    color_col='prop:mental'
)

To compute the agreement of annotation properties you just have to use the `level` parameter:

In [None]:
my_project.get_iaa(
    ac1_name='ac_1',
    ac2_name='ac_2',
    level='prop:mental'
)

This example shows that in some cases the `get_iaa` method ignores disagreeing annotations,
because they are not the best matching annotations.
Within the span of the last annotation in annotation collection 1, annotation collection 2 contains one discontinuous and one embedded annotation.
Only the discontinuous annotation is considered when computing the IAA because it is the better match for
the last annotation in annotation collection 1.

Again, if unitizing plays an important role in your annotation task we recommend the `gamma_agreement` method.

## `gamma_agreement()` <a class="anchor" id="3-bullet"></a>

To compute the gamma agreement, in addition to the annotation collections, 5 further parameters
have to be defined.
The `alpha`, `beta` and `delta_empty` parameters are necessary to compute the
[`CombinedCategoricalDissimilarity`](https://github.com/bootphon/pygamma-agreement/blob/master/pygamma_agreement/dissimilarity.py#L467).
The `n_samples` and the `precision_level` values are used in the 
[`compute_gamma()` method](https://github.com/bootphon/pygamma-agreement/blob/master/pygamma_agreement/continuum.py#L805).
See the documentation for pygamma-agreement and
[Mathet et al. 2015](https://aclanthology.org/J15-3003.pdf)
for further information about these parameters.

In [None]:
# gamma agreement with default settings
my_project.gamma_agreement(
    annotation_collections=['ac_1', 'ac_2'],
    alpha=3,
    beta=1,
    delta_empty=0.01,
    n_samples=30,
    precision_level=0.01
)
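
The role of `alpha`, `beta` and `delta_empty` can be sketched in plain Python. The functions below follow the positional, categorical and combined dissimilarities as described by Mathet et al. 2015 and in the pygamma-agreement documentation, as far as we read them. They are an illustrative assumption, not the library's implementation, and the spans are hypothetical.

```python
def positional_dissimilarity(u, v, delta_empty=1.0):
    """Squared, length-normalised boundary distance of two (start, end) spans."""
    (start_u, end_u), (start_v, end_v) = u, v
    boundary_distance = abs(start_u - start_v) + abs(end_u - end_v)
    total_length = (end_u - start_u) + (end_v - start_v)
    return (boundary_distance / total_length) ** 2 * delta_empty


def categorical_dissimilarity(tag_u, tag_v, delta_empty=1.0):
    """0 for identical tags, delta_empty for different tags (simple 0/1 tag distance)."""
    return (0.0 if tag_u == tag_v else 1.0) * delta_empty


def combined_dissimilarity(u, tag_u, v, tag_v, alpha=3, beta=1, delta_empty=0.01):
    """Weighted sum: alpha weights the positions, beta weights the tags."""
    return (
        alpha * positional_dissimilarity(u, v, delta_empty)
        + beta * categorical_dissimilarity(tag_u, tag_v, delta_empty)
    )


# Two hypothetical annotations whose boundaries are shifted by two characters
print(combined_dissimilarity((0, 10), "process", (2, 12), "process"))
print(combined_dissimilarity((0, 10), "process", (2, 12), "stative_event"))
```

A larger `alpha` penalises boundary disagreement more strongly, while a larger `beta` penalises tag disagreement more strongly.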

If you want to work with a different dissimilarity algorithm,
consider using pygamma-agreement directly.
For this purpose you can save all annotations in a project as a CSV file
in the format pygamma-agreement takes as input:

In [None]:
pygamma_df = my_project.pygamma_table(
    annotation_collections=['ac_1', 'ac_2']
)

# save
pygamma_df.to_csv('../test/pygamma_table.csv', index=False, header=False)

# show example
pygamma_df.head(5)
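
Before passing the exported file to pygamma-agreement, a quick sanity check of its rows can help. The sketch below uses only the standard library and assumes the headerless column order annotator, tag, start, end. The sample rows and the function `check_pygamma_rows` are hypothetical and only illustrate the idea.

```python
import csv
import io

# Hypothetical rows in the assumed column order: annotator, tag, start, end
sample_csv = """ac_1,stative_event,0,20
ac_1,process,25,60
ac_2,stative_event,2,18
ac_2,process,30,40
"""


def check_pygamma_rows(handle):
    """Return a list of (line_number, problem) tuples; empty if all rows look valid."""
    problems = []
    for line_number, row in enumerate(csv.reader(handle), start=1):
        if len(row) != 4:
            problems.append((line_number, "expected 4 columns"))
            continue
        try:
            start, end = float(row[2]), float(row[3])
        except ValueError:
            problems.append((line_number, "start/end are not numeric"))
            continue
        if start >= end:
            problems.append((line_number, "start is not before end"))
    return problems


print(check_pygamma_rows(io.StringIO(sample_csv)))
```

To check a real export, open the saved file and pass the file handle instead of the `io.StringIO` object.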