# Gitma Introduction Notebook 

In this notebook, you will learn how to import and process your CATMA annotations with the Python package `gitma`.




## Get a CATMA access token

To access your annotations in CATMA, you need a personal access token. You can generate this token on the CATMA website after logging in to your account.

![Get access token](img/access_token_ui.png)

## Import the `Catma` class and load your CATMA profile

In [1]:

from gitma import Catma

my_access_token = 'insert your access token here'
my_catma = Catma(gitlab_access_token=my_access_token)
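Instead of pasting the token into the notebook, you may prefer to read it from an environment variable so that it is not saved with the file. The following is a minimal sketch of that idea. The variable name `CATMA_ACCESS_TOKEN` and the helper `read_access_token` are hypothetical choices for this example, not part of CATMA or `gitma`.

```python
import os


def read_access_token(env=os.environ, var_name="CATMA_ACCESS_TOKEN"):
    """Return the access token stored in an environment variable.

    The variable name is a hypothetical choice for this sketch;
    neither CATMA nor gitma prescribes it.
    """
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your CATMA access token."
        )
    return token


# Usage (assumes the variable was set before Jupyter was started):
# my_access_token = read_access_token()
```

The resulting string can be passed to `Catma(gitlab_access_token=...)` exactly like the hard-coded value above.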




First, let's look at the names of your CATMA projects:

In [None]:
my_catma.project_name_list


## Clone and load a CATMA project

The `Catma` instance can clone and load a CATMA project. The only required argument is the project's name; optionally, you can specify a different destination directory.


In [3]:

my_project_name = 'GitMA_Demo_Project'

my_catma.load_project_from_gitlab(
    project_name=my_project_name, 
    backup_directory='projects/'
)


If the project was already cloned from CATMA's GitLab backend, cloning it again fails because the project already exists in the destination directory. To fetch a fresh copy, first delete or rename the existing project directory. Once the project has been cloned, you can load it as a `CatmaProject` as shown below.

In [None]:
from gitma import CatmaProject

my_project = CatmaProject(
    projects_directory='projects/',  # the directory the project was cloned into above
    project_name=my_project_name
)
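If you later need a fresh clone, the existing project directory has to be moved out of the way first, as noted above. The helper below is a small sketch of one way to do that with the standard library: it renames the directory with a timestamp suffix instead of deleting it. The function name `set_aside_existing_copy` is made up for this example, and the directory name on disk may differ from the project name, so check your destination directory before passing a path.

```python
from datetime import datetime
from pathlib import Path


def set_aside_existing_copy(project_dir):
    """Rename an existing project directory so a fresh clone can take its place.

    Returns the new path, or None if there was nothing to rename.
    """
    project_dir = Path(project_dir)
    if not project_dir.exists():
        return None
    # Timestamp suffix keeps earlier copies distinguishable.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = project_dir.with_name(f"{project_dir.name}.old-{stamp}")
    project_dir.rename(target)
    return target


# Usage (the path is a placeholder; use the directory that actually exists):
# set_aside_existing_copy("projects/name_of_the_cloned_directory")
```

Renaming rather than deleting keeps the old copy available in case it contains work that was never synchronized.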

## General project stats

The `stats()` method shows you some metadata about your annotation collections.

In [None]:
my_project.stats()


## Annotation overview for the entire project

The `plot_annotations()` method plots the annotations for every annotation collection and document, each in its own subplot.
Click a legend entry to hide the corresponding annotation collection in the plot, and hover over a scatter point to inspect the individual annotation.


In [None]:
my_project.plot_annotations()

The plot can be customized with the `color_col` parameter, for example to visualize annotation properties:

In [None]:
my_project.plot_annotations(color_col='prop:representation_type')

Or to distinguish the annotators:

In [None]:
my_project.plot_annotations(color_col='annotator')


## Plot annotations for a specified annotation collection

To do this, we first need to pick one annotation collection. The `annotation_collections` attribute of the `CatmaProject` class holds a list of all annotation collections in the project, which we can inspect as shown below.

In [None]:
for ac in my_project.annotation_collections:
    print(ac.name)

We can now specify the annotation collection that we want to inspect. 

In [None]:
my_annotation_collection = 'name of your annotation collection'


### Scatter plot

The annotations of single annotation collections can be plotted as an interactive Plotly scatter plot, too. The annotations can be explored with respect to:

- their tag: y-axis
- their text position: x-axis
- the annotated text passages: mouse over
- their properties: mouse over

In [None]:
my_project.ac_dict[my_annotation_collection].plot_annotations()

You can customize the plot by choosing annotation properties for the y-axis (`y_axis`) and the color of the scatter points (`color_prop`).

In [None]:
my_project.ac_dict[my_annotation_collection].plot_annotations(y_axis='prop:representation_type')

In [None]:
my_project.ac_dict[my_annotation_collection].plot_annotations(
    y_axis='annotator',
    color_prop='prop:representation_type'
)

### Cooccurrence network

An alternative way to visualize annotation collections is using networks. They can be used to get an insight into the cooccurrence of annotations.

In [1]:
my_project.ac_dict[my_annotation_collection].cooccurrence_network()



The networks can be customized by the following optional parameters:

- `character_distance`: the text span within which two annotations are considered to be cooccurrent. The default is 100 characters.
- `included_tags`: a list of tags that are included when drawing the graph
- `excluded_tags`: a list of tags that are excluded when drawing the graph

In [2]:
# TODO: possibly adapt the tag names to the demo project
my_project.ac_dict[my_annotation_collection].cooccurrence_network(
    character_distance=50,
    included_tags=['process_event', 'stative_event'],
    excluded_tags=None
)
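To illustrate what the `character_distance` parameter means, here is a toy sketch that decides which annotations count as cooccurrent. It is not gitma's implementation. The `(tag, start, end)` tuple layout, the helper `cooccurring_pairs`, the tag `non_event`, and the choice to measure distance as the gap between two spans are all assumptions made for this example.

```python
from itertools import combinations


def cooccurring_pairs(annotations, character_distance=100):
    """Return tag pairs whose annotations lie within `character_distance`.

    `annotations` is a list of (tag, start, end) tuples with character offsets.
    Distance is taken as the gap between the two spans (0 if they overlap),
    which is an assumption of this sketch.
    """
    pairs = []
    for (tag_a, start_a, end_a), (tag_b, start_b, end_b) in combinations(annotations, 2):
        gap = max(start_a, start_b) - min(end_a, end_b)
        if max(gap, 0) <= character_distance:
            pairs.append((tag_a, tag_b))
    return pairs


# Toy data: character offsets are invented for illustration.
toy_annotations = [
    ("process_event", 0, 20),
    ("stative_event", 50, 70),
    ("non_event", 400, 420),
]
print(cooccurring_pairs(toy_annotations, character_distance=50))
# prints [('process_event', 'stative_event')]
```

In a network view, each returned pair would correspond to an edge between two tags, so a smaller `character_distance` generally yields a sparser graph.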


### Annotation collection as Pandas DataFrame

In [None]:
my_project.ac_dict[my_annotation_collection].df

## Annotation stats by tags

The `tag_stats()` method reports, for each tag:

- the number of annotations
- the full text span annotated by the tag
- the average text span of the annotations
- the most frequent tokens (here, it is possible to define a stopword list)

In [None]:
my_project.ac_dict[my_annotation_collection].tag_stats(ranking=5)
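To make the four statistics listed above concrete, the following plain-Python sketch computes them for a handful of invented annotations. It only illustrates the idea and is not how `tag_stats()` is implemented. The helper `toy_tag_stats`, the `(tag, text)` input format, whitespace tokenization, and measuring text span in characters are all assumptions.

```python
from collections import Counter, defaultdict


def toy_tag_stats(annotations, ranking=3, stopwords=()):
    """Summarize (tag, annotated_text) pairs per tag.

    Text span is measured in characters and tokens are split on whitespace;
    both are assumptions of this sketch.
    """
    texts_by_tag = defaultdict(list)
    for tag, text in annotations:
        texts_by_tag[tag].append(text)

    stop = {word.lower() for word in stopwords}
    stats = {}
    for tag, texts in texts_by_tag.items():
        tokens = [
            token
            for text in texts
            for token in text.lower().split()
            if token not in stop
        ]
        full_span = sum(len(text) for text in texts)
        stats[tag] = {
            "annotations": len(texts),
            "full_text_span": full_span,
            "average_text_span": full_span / len(texts),
            "most_frequent_tokens": [
                token for token, _ in Counter(tokens).most_common(ranking)
            ],
        }
    return stats


# Invented example annotations.
toy_annotations = [
    ("process_event", "she walked to the door"),
    ("process_event", "she opened the door"),
    ("stative_event", "the room was dark"),
]
toy_result = toy_tag_stats(toy_annotations, ranking=2, stopwords=["a", "to", "the"])
print(toy_result["process_event"])
```

Here `ranking` limits how many of the most frequent tokens are kept per tag, and the stopword list removes tokens before counting.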

The same method can also compute these statistics per property value (if you used properties in the annotation process) or per annotator, via the `tag_col` parameter:

In [None]:
my_project.ac_dict[my_annotation_collection].tag_stats(
    tag_col='prop:representation_type',
    ranking=3,
    stopwords=['a', 'to', 'the']
)

In the output above, each row shows the statistics for one property value.

In [None]:
my_project.ac_dict[my_annotation_collection].tag_stats(tag_col='annotator', ranking=3)

In the output above, each row shows the statistics for one annotator.

## Inter Annotator Agreement (IAA) with gitma