# Inter-Annotator Agreement (IAA) with gitma

The gitma package provides two methods to compute the agreement of two annotators.
Both methods compare annotation collections.
For that reason, it is recommended to use one annotation collection per annotator and document.
Additionally, it is recommended to name every annotation collection by a combination of the <span style="color:pink">document's title</span>, the <span style="color:red">annotation task</span> and the <span style="color:green">annotator</span>.

**Example:**  <span style="color:pink">robinson_crusoe</span>-<span style="color:red">narrative_space</span>-<span style="color:green">mareike</span>
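
The naming convention can be sketched with two small helper functions. They are hypothetical and not part of gitma, and the document title `das_urteil` is only an illustration. The sketch assumes that the three parts are joined by hyphens and therefore must not contain hyphens themselves.

```python
from typing import NamedTuple


class CollectionName(NamedTuple):
    """Parts of an annotation collection name (illustrative helper, not part of gitma)."""
    document: str
    task: str
    annotator: str


def build_collection_name(document: str, task: str, annotator: str) -> str:
    # Hyphens separate the three parts, so the parts themselves must not contain any.
    for part in (document, task, annotator):
        if not part or '-' in part:
            raise ValueError(f'invalid name part: {part!r}')
    return f'{document}-{task}-{annotator}'


def parse_collection_name(name: str) -> CollectionName:
    # Split a collection name back into document, task and annotator.
    parts = name.split('-')
    if len(parts) != 3:
        raise ValueError(f'expected <document>-<task>-<annotator>, got {name!r}')
    return CollectionName(*parts)


name = build_collection_name('das_urteil', 'narrative_space', 'mareike')
print(name)
print(parse_collection_name(name).annotator)
```

Names built this way can later be split again, for example to select all collections of one annotator or one annotation task.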

## Dependencies

### nltk

Both methods to compute the IAA use packages that are not installed with gitma by default.
If you are only interested in IAA metrics such as *Scott's pi*, *Cohen's kappa* and *Krippendorff's alpha*,
the installation of the [Natural Language Toolkit](https://www.nltk.org/) is sufficient:

    pip install nltk

### pygamma_agreement

The gamma agreement takes unitizing as part of annotation tasks into account
(see [Mathet et al. 2015](https://watermark.silverchair.com/coli_a_00227.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAAsIwggK-BgkqhkiG9w0BBwagggKvMIICqwIBADCCAqQGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMyMMaVYZBoX9znulIAgEQgIICdcXaIROcjaVrG686Y8MbKCDLINhO9N1vw5uOJOJShc3XANoIUnzebJcOwEmQLXo-sEfzscvk3C0fxSz6FN366vSE6P1rxte_YRurfJzgCMaiUz92Xh4Texplkgm0ihcRCXL3mw6vBbVSzV3WmfKGV3pyOJbOZrFoY5SMS8Oak1z6Ox8cg7dy2nvNJnn7m_ZV29R1s7z-CShjXX6re2jX6Nm4iSeQTqfDU5z8_TEH-G7Q61jT7AF-VAcsLC9r91AgDYssNNnEGodmgQOcSNSROAbWyyRAbURaHKaJPdfwuFqKQ873U7LhMV8Qu8gyP1tSMKBBT59eccs129r6q9aeJVA7LwvjRoY5XLlINwRetXkX6haJmfrza5jCJ0o6fNXk5w_p-_J_pcpzw0usY9J1nErEPG_ugW2aGmOh4pLgP1r9Bi77BtRMzN-q20TblioMiffKDBjkn9tDs83XeFxRNq9GsCZRLs8BXeFa9aefnzeTRgSRDop6kVXDmZQBpcBaxMaZuGtLP1Y4HMfbB2za6cBk2HzPtTvoRjaxqXUZ9WkkXXy6_MUWDOWjLe5CWG2wXWQrEQfCfvd5xmT29f6b6GxWh-80skkLCzcMDWz_rTQceex7L1l4gBvU5A0ChUK54kv_Xw9XjOkVTbYYXQDdiqaxFttNty_mWzJwcRpEyLAdCTMRBjpqRzBdYywtsrsQPHINFGK7NkkKkx_weGI7um5BiOT78C29wqshiADF-wKVG0mFQbPzfVpddgfvgjxQZ9bfKrI-HwkOttHV4I9U7YACHOrbC4iSSyO98oEc7dYpKBqowB7ypHarLP298TwEzfqOdww0)).
For many annotation projects done within CATMA, that might be crucial.
If you want to compute the gamma agreement with gitma, the installation of [pygamma-agreement](https://github.com/bootphon/pygamma-agreement) is required:

    pip install pygamma-agreement

Please take note of the **further installation instructions** on the [pygamma site](https://github.com/bootphon/pygamma-agreement#installation) and the [*'how to cite'*](https://github.com/bootphon/pygamma-agreement#citing-pygamma)!
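
If you are unsure whether the optional packages are available in your environment, a check along the following lines may help. It is a minimal sketch that only uses the standard library. The helper `missing_iaa_dependencies` is hypothetical and not part of gitma.

```python
from importlib.util import find_spec

# Import names of the optional packages mapped to their install commands.
# Note that the pip package 'pygamma-agreement' is imported as 'pygamma_agreement'.
OPTIONAL_PACKAGES = {
    'nltk': 'pip install nltk',
    'pygamma_agreement': 'pip install pygamma-agreement',
}


def missing_iaa_dependencies() -> dict:
    """Return the install hint for every optional package that cannot be found."""
    return {
        module: hint
        for module, hint in OPTIONAL_PACKAGES.items()
        if find_spec(module) is None
    }


for module, hint in missing_iaa_dependencies().items():
    print(f'{module} is not installed, run: {hint}')
```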




In [1]:
# import the CatmaProject class
from gitma import CatmaProject

# load your project
my_project = CatmaProject(
    project_name='test_corpus',
    project_directory='../test/demo_project/'
)

Loading Tagsets ...
	 Found 1 Tagset(s).
Loading Documents ...
	 Found 1 Document(s).
Loading Annotation Collections ...
	 Loading Annotation Collection 'ac_2' for Kafka Franz Das Urteil
	-> with 6 Annotations.
	 Loading Annotation Collection 'ac_1' for Kafka Franz Das Urteil
	-> with 6 Annotations.
	 Loading Annotation Collection 'gold_annotation' for Kafka Franz Das Urteil
	-> with 0 Annotations.


## `get_iaa()`

The test project contains three annotation collections.
In this demo we will compute the agreement of the collections 'ac_1' and 'ac_2'.

For every annotation in annotation collection 1 (`ac1_name`), the `get_iaa` method searches for the best matching annotation
in annotation collection 2 (`ac2_name`) with respect to its text span.
The following examples show how matching annotations in two annotation collections are identified:

<img src="demo_img/best_match_example_iaa.PNG">

In contrast to the `gamma_agreement` method (see below), the `get_iaa` method only considers the best matching annotations
from both annotation collections when computing the IAA value.
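
The idea of a best match can be illustrated with a small sketch. This is not gitma's implementation: the `Annotation` type, the overlap measure (shared characters divided by the extent of both spans) and the character offsets are assumptions made for this example only.

```python
from typing import List, NamedTuple, Optional


class Annotation(NamedTuple):
    start: int
    end: int
    tag: str


def overlap_ratio(a: Annotation, b: Annotation) -> float:
    """Shared characters divided by the extent from the earliest start to the latest end."""
    shared = max(0, min(a.end, b.end) - max(a.start, b.start))
    extent = max(a.end, b.end) - min(a.start, b.start)
    return shared / extent if extent else 0.0


def best_match(annotation: Annotation, candidates: List[Annotation]) -> Optional[Annotation]:
    """Return the candidate with the largest overlap, or None if nothing overlaps."""
    best = max(candidates, key=lambda c: overlap_ratio(annotation, c), default=None)
    if best is None or overlap_ratio(annotation, best) == 0:
        return None
    return best


# One annotation in collection 1 and three candidates in collection 2 (made-up offsets).
ac1 = [Annotation(0, 50, 'process')]
ac2 = [
    Annotation(0, 45, 'stative_event'),  # covers almost the same span
    Annotation(10, 20, 'process'),       # embedded in the span of the ac1 annotation
    Annotation(60, 80, 'process'),       # no overlap at all
]

match = best_match(ac1[0], ac2)
print(match)
```

In this sketch the embedded annotation is ignored although it carries the same tag, because the first candidate covers the span better. Only the pair with the best match would enter an agreement score computed this way.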

### Basic Example

First, we will take a look at both annotation collections by comparing the annotation spans.

In [2]:
# compare the annotation collections by start point
my_project.compare_annotation_collections(
    annotation_collections=['ac_1', 'ac_2']
)

As the line plot shows, every annotation in annotation collection 'ac_1' has a matching annotation in annotation collection 'ac_2'.

Now, let's compute the IAA for all matching annotations:

In [3]:
my_project.get_iaa(
    ac1_name='ac_1',
    ac2_name='ac_2'
)


Finished search for overlapping annotations.
Could match 6 annotations.
Average overlap is 83.19 %.
Couldn't match 0 annotation(s) in first annotation collection.



Scott's pi: 0.6571428571428573
Cohen's Kappa: 0.6666666666666667
Krippendorf Alpha: 0.6857142857142857
-------------------
-------------------
Confusion Martrix:



|               | process | stative_event |
|---------------|---------|---------------|
| process       | 2       | 0             |
| stative_event | 1       | 3             |


The `get_iaa` method not only returns three different agreement scores
but also reports the number of annotation pairs considered when computing the IAA scores
and the average overlap of the annotation pairs.
Additionally, the method returns a confusion matrix to give an insight into the relation between the tags.
As you can see in the matrix, in 3 cases an annotation with the tag 'stative_event' in annotation collection 1
has a best match in annotation collection 2 with the same tag.
These are the first 3 annotations in annotation collection 1, as the line plot above shows.
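
To see how the scores relate to the confusion matrix, two of them can be recomputed by hand. The following sketch assumes the standard definitions of Cohen's kappa and Scott's pi. The function `agreement_from_confusion` is a hypothetical helper and not part of gitma or nltk.

```python
def agreement_from_confusion(matrix):
    """Compute observed agreement, Cohen's kappa and Scott's pi from a square confusion matrix."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    # Share of annotation pairs on the diagonal, i.e. pairs with identical tags.
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_share = [sum(matrix[i]) / n for i in range(k)]
    col_share = [sum(matrix[i][j] for i in range(k)) / n for j in range(k)]
    # Cohen's kappa: chance agreement from each annotator's own label distribution.
    expected_kappa = sum(r * c for r, c in zip(row_share, col_share))
    # Scott's pi: chance agreement from the pooled label distribution.
    expected_pi = sum(((r + c) / 2) ** 2 for r, c in zip(row_share, col_share))
    kappa = (observed - expected_kappa) / (1 - expected_kappa)
    pi = (observed - expected_pi) / (1 - expected_pi)
    return observed, kappa, pi


# Confusion matrix from the output above (order: process, stative_event).
matrix = [
    [2, 0],
    [1, 3],
]
observed, kappa, pi = agreement_from_confusion(matrix)
print(round(observed, 4), round(kappa, 4), round(pi, 4))  # prints: 0.8333 0.6667 0.6571
```

Five of the six annotation pairs agree, which gives an observed agreement of about 0.83. Both coefficients are lower because they subtract the agreement that would be expected by chance. Both are also symmetric, so it does not matter which annotation collection the rows stand for.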

### Filter by Tags

In some cases you may not want to include all annotations when computing
the IAA scores.
In those cases, just use the `tag_filter` parameter, which takes a list of tag names as its argument.

In [4]:
my_project.get_iaa(
    ac1_name='ac_1',
    ac2_name='ac_2',
    tag_filter=['process']
)


Finished search for overlapping annotations.
Could match 3 annotations.
Average overlap is 100.0 %.
Couldn't match 0 annotation(s) in first annotation collection.



Scott's pi: -0.20000000000000007
Cohen's Kappa: 0.0
Krippendorf Alpha: 0.0
-------------------
-------------------
Confusion Martrix:



|               | process | stative_event |
|---------------|---------|---------------|
| process       | 2       | 0             |
| stative_event | 1       | 0             |


As the confusion matrix shows, only the annotations from annotation collection 1
with the tag 'process' have been taken into account.
From annotation collection 2, one annotation with the tag 'stative_event' is still considered.
But we can filter both annotation collections, too:

In [5]:
my_project.get_iaa(
    ac1_name='ac_1',
    ac2_name='ac_2',
    tag_filter=['process'],
    filter_both_ac=True
)


Finished search for overlapping annotations.
Could match 3 annotations.
Average overlap is 100.0 %.
Couldn't match 1 annotation(s) in first annotation collection.



Scott's pi: -0.20000000000000007
Cohen's Kappa: 0.0
Krippendorf Alpha: 0.0
-------------------
-------------------
Confusion Martrix:



|         | process | #None# |
|---------|---------|--------|
| process | 2       | 0      |
| #None#  | 1       | 0      |


Because we only use two tags in the demo project, this leads to the same IAA results.
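
It may seem surprising that the scores drop to zero or below although two of the three annotation pairs agree. The following worked example recomputes Cohen's kappa and Scott's pi from the filtered confusion matrix with exact fractions, assuming the standard definitions of both coefficients.

```python
from fractions import Fraction

# Confusion matrix of the filtered runs above (order: process, second label).
matrix = [
    [2, 0],
    [1, 0],
]
n = sum(map(sum, matrix))

# Observed agreement: pairs on the diagonal divided by all pairs.
observed = Fraction(matrix[0][0] + matrix[1][1], n)                 # 2/3
rows = [Fraction(sum(row), n) for row in matrix]                    # 2/3 and 1/3
cols = [Fraction(sum(col), n) for col in zip(*matrix)]              # 1 and 0

# Chance agreement according to Cohen's kappa and Scott's pi.
expected_kappa = sum(r * c for r, c in zip(rows, cols))             # 2/3
expected_pi = sum(((r + c) / 2) ** 2 for r, c in zip(rows, cols))   # 13/18

kappa = (observed - expected_kappa) / (1 - expected_kappa)
pi = (observed - expected_pi) / (1 - expected_pi)
print(kappa, pi)  # prints: 0 -1/5
```

After filtering, one side of the comparison uses a single label only. The agreement expected by chance is then as high as the observed agreement (kappa) or even higher (pi), so the coefficients become 0 and -0.2.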

### Compare Property Annotations

The tag is only one level of a CATMA annotation.
If you want to compare annotations by their properties, this is possible too.
In the demo project, the annotations have the property 'mental' to evaluate whether a mental
event is referenced in the text:

In [6]:
my_project.compare_annotation_collections(
    annotation_collections=['ac_1', 'ac_2'],
    color_col='prop:mental'
)

To compute the agreement of property annotations, you just have to use the `level` parameter:

In [7]:
my_project.get_iaa(
    ac1_name='ac_1',
    ac2_name='ac_2',
    level='prop:mental'
)


Finished search for overlapping annotations.
Could match 6 annotations.
Average overlap is 83.19 %.
Couldn't match 0 annotation(s) in first annotation collection.



Scott's pi: 1.0
Cohen's Kappa: 1.0
Krippendorf Alpha: 1.0
-------------------
-------------------
Confusion Martrix:



|     | yes | no |
|-----|-----|----|
| yes | 1   | 0  |
| no  | 0   | 5  |


This example shows that in some cases the `get_iaa` method ignores disagreeing annotations
because they are not the best matching annotation.
Within the span of the last annotation in annotation collection 1 we can find one discontinuous and one embedded
annotation in annotation collection 2.
But only the discontinuous annotation gets considered when computing the IAA because it is the better match to
the last annotation in annotation collection 1.

Again, if unitizing plays an important role in your annotation task, we recommend the gamma agreement method.

## `gamma_agreement()`

To compute the gamma agreement, 5 further parameters have to be defined
in addition to the annotation collections.
The `alpha`, `beta` and `delta_empty` values are necessary to compute the
[`CombinedCategoricalDissimilarity`](https://github.com/bootphon/pygamma-agreement/blob/master/pygamma_agreement/dissimilarity.py#L467).
The `n_samples` and `precision_level` values are used in the
[`compute_gamma()` method](https://github.com/bootphon/pygamma-agreement/blob/master/pygamma_agreement/continuum.py#L805).
See the pygamma documentation and
[Mathet et al. 2015](https://watermark.silverchair.com/coli_a_00227.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAAsIwggK-BgkqhkiG9w0BBwagggKvMIICqwIBADCCAqQGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMyMMaVYZBoX9znulIAgEQgIICdcXaIROcjaVrG686Y8MbKCDLINhO9N1vw5uOJOJShc3XANoIUnzebJcOwEmQLXo-sEfzscvk3C0fxSz6FN366vSE6P1rxte_YRurfJzgCMaiUz92Xh4Texplkgm0ihcRCXL3mw6vBbVSzV3WmfKGV3pyOJbOZrFoY5SMS8Oak1z6Ox8cg7dy2nvNJnn7m_ZV29R1s7z-CShjXX6re2jX6Nm4iSeQTqfDU5z8_TEH-G7Q61jT7AF-VAcsLC9r91AgDYssNNnEGodmgQOcSNSROAbWyyRAbURaHKaJPdfwuFqKQ873U7LhMV8Qu8gyP1tSMKBBT59eccs129r6q9aeJVA7LwvjRoY5XLlINwRetXkX6haJmfrza5jCJ0o6fNXk5w_p-_J_pcpzw0usY9J1nErEPG_ugW2aGmOh4pLgP1r9Bi77BtRMzN-q20TblioMiffKDBjkn9tDs83XeFxRNq9GsCZRLs8BXeFa9aefnzeTRgSRDop6kVXDmZQBpcBaxMaZuGtLP1Y4HMfbB2za6cBk2HzPtTvoRjaxqXUZ9WkkXXy6_MUWDOWjLe5CWG2wXWQrEQfCfvd5xmT29f6b6GxWh-80skkLCzcMDWz_rTQceex7L1l4gBvU5A0ChUK54kv_Xw9XjOkVTbYYXQDdiqaxFttNty_mWzJwcRpEyLAdCTMRBjpqRzBdYywtsrsQPHINFGK7NkkKkx_weGI7um5BiOT78C29wqshiADF-wKVG0mFQbPzfVpddgfvgjxQZ9bfKrI-HwkOttHV4I9U7YACHOrbC4iSSyO98oEc7dYpKBqowB7ypHarLP298TwEzfqOdww0)
for further information about these parameters.
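
To give a rough intuition for `alpha`, `beta` and `delta_empty`, the following simplified sketch combines a positional and a categorical dissimilarity of two annotated units, following our reading of Mathet et al. 2015. All function names are hypothetical and the category distance is assumed to be 0 for equal and 1 for different tags. This is not pygamma's code, which remains the authoritative reference.

```python
def positional_dissimilarity(u, v, delta_empty):
    """Squared, length-normalised distance between the boundaries of two units."""
    (start_u, end_u), (start_v, end_v) = u, v
    boundary_distance = abs(start_u - start_v) + abs(end_u - end_v)
    total_length = (end_u - start_u) + (end_v - start_v)
    return (boundary_distance / total_length) ** 2 * delta_empty


def categorical_dissimilarity(tag_u, tag_v, delta_empty):
    """0 for identical tags, delta_empty for different tags (assumed category distance)."""
    return (0.0 if tag_u == tag_v else 1.0) * delta_empty


def combined_dissimilarity(u, tag_u, v, tag_v, alpha=3, beta=1, delta_empty=0.01):
    """Weighted sum: alpha weights the positions, beta weights the categories."""
    return (alpha * positional_dissimilarity(u, v, delta_empty)
            + beta * categorical_dissimilarity(tag_u, tag_v, delta_empty))


# Two units with slightly shifted boundaries and different tags (made-up offsets).
same = combined_dissimilarity((0, 100), 'process', (0, 100), 'process')
shifted = combined_dissimilarity((0, 100), 'process', (10, 110), 'stative_event')
print(same, shifted)
```

With `alpha=3` and `beta=1`, differences in position are weighted three times as strongly as differences in category, while `delta_empty` scales both parts.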

In [8]:
# gamma agreement with default settings
my_project.gamma_agreement(
    annotation_collections=['ac_1', 'ac_2'],
    alpha=3,
    beta=1,
    delta_empty=0.01,
    n_samples=30,
    precision_level=0.01
)

The gamma agreement is 0.26121075218578416


If you want to work with a different dissimilarity algorithm,
consider using pygamma directly.
For this purpose you can save all annotations in a project as a CSV file
in the format pygamma needs as input:

In [11]:
pygamma_df = my_project.pygamma_table(
    annotation_collections=['ac_1', 'ac_2']
)

# save
pygamma_df.to_csv('../test/pygamma_table.csv', index=False, header=False)

# show example
pygamma_df.head(5)

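|   | annotator | tag           | start_point | end_point |
|---|-----------|---------------|-------------|-----------|
| 0 | MMeister  | stative_event | 67          | 122       |
| 1 | MVauth    | stative_event | 123         | 356       |
| 2 | MVauth    | stative_event | 358         | 443       |
| 3 | MMeister  | process       | 445         | 487       |
| 4 | MVauth    | process       | 488         | 643       |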
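
Because the file is written with `index=False` and `header=False`, it contains neither the index column nor the column names. The following sketch reads such a file back with the standard library and groups the units by annotator. It uses the five rows shown above as inline text instead of the saved file, and the variable names are only illustrative.

```python
import csv
import io

# The first rows of the table as they would be written without index and header.
csv_text = """MMeister,stative_event,67,122
MVauth,stative_event,123,356
MVauth,stative_event,358,443
MMeister,process,445,487
MVauth,process,488,643
"""

# The column names are not stored in the file, so they are supplied here.
COLUMNS = ['annotator', 'tag', 'start_point', 'end_point']

rows = []
for record in csv.reader(io.StringIO(csv_text)):
    row = dict(zip(COLUMNS, record))
    row['start_point'] = int(row['start_point'])
    row['end_point'] = int(row['end_point'])
    rows.append(row)

# Collect (start, end, tag) units per annotator.
units_per_annotator = {}
for row in rows:
    units_per_annotator.setdefault(row['annotator'], []).append(
        (row['start_point'], row['end_point'], row['tag'])
    )

for annotator, units in units_per_annotator.items():
    print(annotator, len(units))
```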
