# Gitma Introduction Notebook 

In this notebook, you will learn how to import and process your CATMA annotations with the Python package Gitma.



## Table of Contents

1. [Get a CATMA access token](#1-get-a-catma-access-token)
2. [Import the `Catma` class and load your CATMA profile](#2-import-the-catma-class-and-load-your-catma-profile)
3. [Clone and load a CATMA project](#3-clone-and-load-a-catma-project)
4. [General project stats](#4-general-project-stats)
5. [Annotation overview for the entire project](#5-annotation-overview-for-the-entire-project)
6. [Plot annotations for a specified annotation collection](#6-plot-annotations-for-a-specified-annotation-collection)
   1. [Scatter plot](#61-scatter-plot)
   2. [Cooccurrence network](#62-cooccurrence-network)
   3. [Annotation collection as Pandas DataFrame](#63-annotation-collection-as-pandas-dataframe)
7. [Annotation stats by tags](#7-annotation-stats-by-tags)
8. [Inter Annotator Agreement (IAA) with gitma](#8-inter-annotator-agreement-iaa-with-gitma)
   1. [`get_iaa`](#81-get_iaa)
   2. [Filter by tags](#82-filter-by-tags)
   3. [Compare annotation properties](#83-compare-annotation-properties)

## 1. Get a CATMA access token

To access your annotations in CATMA, you need a personal access token. You can create this token on the CATMA website after logging into your account.

![Get access token](img/access_token_ui.png)

## 2. Import the `Catma` class and load your CATMA profile

In [None]:

from gitma import Catma

# my_access_token = 'insert your access token here'
# my_catma = Catma(gitlab_access_token=my_access_token)
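
Instead of pasting the token into the notebook, you may prefer to read it from an environment variable so that it does not end up in a shared or versioned file. The following is a minimal sketch of that idea. The helper `read_access_token` and the variable name `CATMA_ACCESS_TOKEN` are hypothetical choices for illustration and are not part of gitma.

```python
import os


def read_access_token(env_var: str = "CATMA_ACCESS_TOKEN") -> str:
    """Return the token stored in the given environment variable.

    The default variable name is a hypothetical choice, not a gitma convention.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token


# Usage (commented out, like the cell above, because it needs a real token):
# from gitma import Catma
# my_catma = Catma(gitlab_access_token=read_access_token())
```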



First, let's look at your CATMA projects:

In [None]:
# my_catma.project_name_list


## 3. Clone and load a CATMA project

A `Catma` instance can be used to clone and load a CATMA project. The only necessary argument is the project's name. Optionally, a different destination directory can be specified.


In [None]:

my_project_name = 'GitMA_Demo_Project'

# my_catma.load_project_from_gitlab(
#     project_name=my_project_name, 
#     backup_directory='projects/'
# )

If a project was previously loaded from CATMA's GitLab backend and you try to load it again, the operation will fail because the project already exists in the destination directory. If you want to fetch a fresh copy (that is, clone the project again), you need to delete or rename the existing project directory first. Once you have your project from GitLab, you can load it as a `CatmaProject` as shown below.

In [None]:
from gitma import CatmaProject

my_project = CatmaProject(
    projects_directory='projects/',
    project_name=my_project_name
)
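
Because cloning fails when a local copy already exists, it can be handy to check the destination directory first. The sketch below uses only the standard library. The helper `find_project_dirs` is hypothetical, and it assumes that the cloned directory's name contains the project name, which you should verify against your own `projects/` folder.

```python
from pathlib import Path


def find_project_dirs(projects_directory: str, project_name: str) -> list[Path]:
    """List sub-directories whose name contains the project name.

    Assumption: the cloned directory name includes the project name.
    """
    root = Path(projects_directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir() if p.is_dir() and project_name in p.name
    )


existing = find_project_dirs("projects/", "GitMA_Demo_Project")
if existing:
    print("Local copy found, load it with CatmaProject:", [p.name for p in existing])
else:
    print("No local copy found, cloning should be possible.")
```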

## 4. General project stats

The `stats()` method shows you some metadata about your annotation collections.

In [None]:
my_project.stats()


## 5. Annotation overview for the entire project

The `plot_annotations()` method draws one subplot per document and shows the annotations of every annotation collection in it.
Click a legend entry to hide a specific annotation collection in the plot. Hover over a scatter point to explore the corresponding annotation.


In [None]:
my_project.plot_annotations()

The plot can be customized by the `color_col` parameter, for example to visualize annotation properties...

In [None]:
my_project.plot_annotations(color_col='prop:representation_type')

... or the annotators:

In [None]:
my_project.plot_annotations(color_col='annotator')


## 6. Plot annotations for a specified annotation collection

For this we need to specify one annotation collection. To get an overview of all annotation collections in our project, we can use the `annotation_collections` attribute of the `CatmaProject` class, which contains a list of all annotation collections. We inspect the list as shown below.

In [None]:
for ac in my_project.annotation_collections:
    print(ac.name)

We can now specify the annotation collection that we want to inspect. 

In [None]:
my_annotation_collection = 'ac_2'


### 6.1 Scatter plot

The annotations of single annotation collections can be plotted as an interactive Plotly scatter plot, too. The annotations can be explored with respect to:

- their tag: y-axis
- their text position: x-axis
- the annotated text passages: mouse over
- their properties: mouse over

In [None]:
my_project.ac_dict[my_annotation_collection].plot_annotations()

You can customize the plot by choosing annotation properties for the y-axis (`y_axis`) and the scatter color (`color_prop`).

In [None]:
my_project.ac_dict[my_annotation_collection].plot_annotations(y_axis='prop:representation_type')

In [None]:
my_project.ac_dict[my_annotation_collection].plot_annotations(
    y_axis='annotator',
    color_prop='prop:representation_type'
)

### 6.2 Cooccurrence network

An alternative way to visualize annotation collections is using networks. They can be used to get an insight into the cooccurrence of annotations.

In [None]:
my_project.ac_dict[my_annotation_collection].cooccurrence_network()


The networks can be customized by the following optional parameters:

- `character_distance`: the text span within which two annotations are considered to be cooccurrent. The default is 100 characters.
- `included_tags`: a list of tags that are included when drawing the graph
- `excluded_tags`: a list of tags that are excluded when drawing the graph

In [None]:
# TODO: possibly adapt to the demo project
my_project.ac_dict[my_annotation_collection].cooccurrence_network(
    character_distance=50,
    included_tags=['process_event', 'stative_event'],
    excluded_tags=None
)
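
To make the effect of these parameters more tangible, here is a minimal sketch of how cooccurrence within a character distance could be counted. It is not gitma's implementation. The function `cooccurrence_counts`, the tuple layout, and the rule "two annotations cooccur if the gap between their spans is at most `character_distance`" are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotations as (tag, start_point, end_point) character offsets.
annotations = [
    ("process_event", 0, 20),
    ("stative_event", 30, 45),
    ("process_event", 60, 80),
    ("non_event", 400, 420),
]


def cooccurrence_counts(annotations, character_distance=100,
                        included_tags=None, excluded_tags=None):
    """Count tag pairs whose annotations lie within character_distance."""
    kept = [
        a for a in annotations
        if (included_tags is None or a[0] in included_tags)
        and (excluded_tags is None or a[0] not in excluded_tags)
    ]
    counts = Counter()
    for (tag_a, start_a, end_a), (tag_b, start_b, end_b) in combinations(kept, 2):
        # Gap between the two spans; negative values mean they overlap.
        gap = max(start_a, start_b) - min(end_a, end_b)
        if max(gap, 0) <= character_distance:
            # Sort the tags so that (a, b) and (b, a) count as the same edge.
            counts[tuple(sorted((tag_a, tag_b)))] += 1
    return counts


print(cooccurrence_counts(annotations, character_distance=50))
```

Each key of the resulting counter corresponds to an edge in a network like the one plotted above, and the count can serve as the edge weight.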

### 6.3 Annotation collection as Pandas DataFrame

In [None]:
my_project.ac_dict[my_annotation_collection].df

## 7. Annotation stats by tags

The `tag_stats()` method counts, for each tag:

- the number of annotations
- the full text span annotated by the tag
- the average text span of the annotations
- the most frequent tokens (here, it is possible to define a stopword list)

In [None]:
my_project.ac_dict[my_annotation_collection].tag_stats(ranking=5)
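
The statistics listed above can be approximated with a few lines of plain Python, which may help when interpreting the table. This is only a sketch under stated assumptions: the function `simple_tag_stats` is hypothetical, text spans are measured in characters, and tokens are obtained by lowercasing and splitting on whitespace. gitma may compute these values differently.

```python
from collections import Counter, defaultdict

# Hypothetical annotations as (tag, annotated_text) pairs.
annotations = [
    ("process_event", "She walked to the station"),
    ("process_event", "the train left"),
    ("stative_event", "The station was empty"),
]


def simple_tag_stats(annotations, ranking=3, stopwords=()):
    """Return per-tag counts, span lengths, and most frequent tokens."""
    texts_by_tag = defaultdict(list)
    for tag, text in annotations:
        texts_by_tag[tag].append(text)

    stop = {word.lower() for word in stopwords}
    stats = {}
    for tag, texts in texts_by_tag.items():
        spans = [len(text) for text in texts]  # span length in characters
        tokens = [
            token
            for text in texts
            for token in text.lower().split()
            if token not in stop
        ]
        stats[tag] = {
            "annotations": len(texts),
            "full_text_span": sum(spans),
            "average_text_span": sum(spans) / len(spans),
            "most_frequent_tokens": Counter(tokens).most_common(ranking),
        }
    return stats


for tag, row in simple_tag_stats(annotations, ranking=2, stopwords=["the", "to"]).items():
    print(tag, row)
```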

Additionally, you can use the method for properties (if you used any in the annotation process) and different annotators:

In [None]:
my_project.ac_dict[my_annotation_collection].tag_stats(tag_col='prop:representation_type', ranking=3, stopwords=['a', 'to', 'the'])

In the table above, each row shows the data for one property value.

In [None]:
my_project.ac_dict[my_annotation_collection].tag_stats(tag_col='annotator', ranking=3)

In the table above, each row shows the data for one annotator.

## 8. Inter Annotator Agreement (IAA) with gitma

### 8.1 `get_iaa`

For every annotation in annotation collection 1 (`ac1_name_or_inst`), the `get_iaa` method searches for the best matching annotation in annotation collection 2 (`ac2_name_or_inst`) with respect to its annotation text span.

First, we will take a look at both annotation collections by comparing the annotation spans.

In [None]:
# compare the annotation collections by start point
my_project.compare_annotation_collections(
    annotation_collections=['ac_1', 'ac_2']
)

As the line plot shows, every annotation in annotation collection 'ac_1' has a matching annotation in annotation collection 'ac_2'.

Now, let's compute the IAA for all matching annotations:

In [None]:
my_project.get_iaa(
    ac1_name_or_inst='ac_1',
    ac2_name_or_inst='ac_2'
)

The `get_iaa` method returns three different agreement scores. It also reports the number of annotation pairs considered when computing the scores and the average overlap of those pairs. In addition, it returns a confusion matrix that gives insight into how the tags of the two collections relate to each other. As you can see in the matrix, in 2 cases an annotation with the tag 'non_event' in annotation collection 1 has a best match in annotation collection 2 with the same tag. Compare this with the line plot above.
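
The matching and scoring idea can be sketched in plain Python. The code below is not gitma's implementation. The helpers `overlap`, `best_matches`, and `cohen_kappa` and the toy data are hypothetical, and Cohen's kappa is used only as one common chance-corrected agreement measure; this notebook does not state which three scores `get_iaa` reports.

```python
from collections import Counter

# Hypothetical annotations as (tag, start_point, end_point) character offsets.
ac_1 = [("non_event", 0, 10), ("process_event", 20, 40), ("non_event", 50, 70)]
ac_2 = [("non_event", 0, 12), ("stative_event", 22, 40), ("non_event", 55, 70)]


def overlap(a, b):
    """Number of characters shared by two annotation spans."""
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def best_matches(ac1, ac2):
    """Pair every ac1 annotation with the ac2 annotation it overlaps most."""
    pairs = []
    for a in ac1:
        best = max(ac2, key=lambda b: overlap(a, b))
        if overlap(a, best) > 0:  # skip annotations without any match
            pairs.append((a, best))
    return pairs


def cohen_kappa(tag_pairs):
    """Cohen's kappa for a non-empty list of (tag_in_ac1, tag_in_ac2) pairs."""
    n = len(tag_pairs)
    if n == 0:
        raise ValueError("No annotation pairs to compare.")
    observed = sum(t1 == t2 for t1, t2 in tag_pairs) / n
    freq_1 = Counter(t1 for t1, _ in tag_pairs)
    freq_2 = Counter(t2 for _, t2 in tag_pairs)
    # Agreement expected by chance, from each annotator's tag frequencies.
    expected = sum(freq_1[t] * freq_2[t] for t in freq_1) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)


pairs = best_matches(ac_1, ac_2)
tag_pairs = [(a[0], b[0]) for a, b in pairs]
confusion = Counter(tag_pairs)  # (tag in ac_1, tag in ac_2) -> count

print(confusion)
print(round(cohen_kappa(tag_pairs), 3))
```

In this toy data, the `confusion` counter plays the role of the confusion matrix: the key `('non_event', 'non_event')` counts the pairs in which both collections use the same tag.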

### 8.2 Filter by tags

In some cases you may not want to include all annotations when computing the IAA scores. For those cases, use the `tag_filter` parameter, which expects a list of tag names.

In [None]:
my_project.get_iaa(
    ac1_name_or_inst='ac_1',
    ac2_name_or_inst='ac_2',
    tag_filter=['process_event']
)
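
Conceptually, a tag filter restricts the annotations that enter the comparison. The sketch below shows that idea on a single list of annotations. The helper `filter_by_tags` is hypothetical, and whether gitma applies the filter to one or both collections is not stated here, so treat this as an illustration only.

```python
# Hypothetical annotations as (tag, start_point, end_point) character offsets.
ac_1 = [
    ("non_event", 0, 10),
    ("process_event", 20, 40),
    ("stative_event", 50, 70),
]


def filter_by_tags(annotations, tag_filter=None):
    """Keep only annotations whose tag is listed in tag_filter."""
    if tag_filter is None:
        return list(annotations)  # no filter: keep everything
    allowed = set(tag_filter)
    return [a for a in annotations if a[0] in allowed]


print(filter_by_tags(ac_1, tag_filter=["process_event"]))
```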

### 8.3 Compare annotation properties

The tag is only one level of a CATMA annotation. You can also compare annotations by their properties. In the demo project, the annotations have the property 'representation_type', which records whether a speech or mental event is referenced in the text:

In [None]:
my_project.compare_annotation_collections(
    annotation_collections=['ac_1', 'ac_2'],
    color_col='prop:representation_type'
)

To compute the agreement on annotation properties, use the `level` parameter:

In [None]:
my_project.get_iaa(
    ac1_name_or_inst='ac_1',
    ac2_name_or_inst='ac_2',
    level='prop:representation_type'
)