# Gitma Introduction Notebook 

In this notebook, you will learn how to import and process your CATMA annotations with the Python package Gitma.




## Get a CATMA access token

To access your annotations on CATMA, you need a personal access token. You can get this token on the CATMA website after logging into your account.

![Get access token](img/access_token_ui.png)

## Import the `Catma` class and load your CATMA profile

```python
from gitma import Catma

# Uncomment the following lines and insert your personal access token
# my_access_token = 'insert your access token here'
# my_catma = Catma(gitlab_access_token=my_access_token)
```
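Rather than pasting the token into the notebook, you may prefer to keep it out of your code, for example in an environment variable. The following is a minimal sketch of that idea; the variable name `CATMA_ACCESS_TOKEN` and the helper `read_access_token` are hypothetical and not part of Gitma.

```python
import os
from typing import Mapping, Optional


def read_access_token(
    environ: Optional[Mapping[str, str]] = None,
    env_var: str = "CATMA_ACCESS_TOKEN",  # hypothetical variable name
) -> str:
    """Return the CATMA access token stored in an environment variable."""
    env = os.environ if environ is None else environ
    token = env.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Please set {env_var} to your CATMA access token.")
    return token


# Usage (requires Gitma and a valid token):
# my_catma = Catma(gitlab_access_token=read_access_token())
```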



First, let's look at your CATMA projects:

```python
# my_catma.project_name_list
```


## Clone and load a CATMA project

The `Catma` class instance can be used to clone and load a CATMA project. The only necessary argument is the project's name. Optionally, you can specify a different destination directory with `backup_directory`.


```python
my_project_name = 'GitMA_Demo_Project'

# my_catma.load_project_from_gitlab(
#     project_name=my_project_name,
#     backup_directory='projects/'
# )
```

If you have already loaded a project from CATMA's GitLab backend and try to do so again, the operation fails because the project already exists in the destination directory. To fetch a fresh copy (that is, to clone the project again), delete or rename the existing project directory first. Once you have downloaded your project from GitLab, you can load it as a `CatmaProject`:

```python
from gitma import CatmaProject

my_project = CatmaProject(
    projects_directory='projects/',
    project_name=my_project_name
)
```

```
Loading tagsets ...
	Found 1 tagset(s).
Loading documents ...
	Found 1 document(s).
Loading annotation collections ...
	Found 3 annotation collection(s).
	Annotation collection "gold_annotation" for document "The Metamorphosis"
		Annotations: 0
	Annotation collection "ac_2" for document "The Metamorphosis"
		Annotations: 19
	Annotation collection "ac_1" for document "The Metamorphosis"
		Annotations: 20
```
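As noted above, cloning a project fails if its directory already exists. To avoid this, you could check for the directory before calling `load_project_from_gitlab`. A minimal sketch, assuming the clone ends up in `<backup_directory>/<project_name>` (mirroring the `projects_directory` and `project_name` arguments of `CatmaProject`); `project_already_cloned` is a hypothetical helper, not part of Gitma:

```python
from pathlib import Path


def project_already_cloned(backup_directory: str, project_name: str) -> bool:
    """Return True if a directory for the project already exists.

    Assumption: the project is cloned into <backup_directory>/<project_name>.
    """
    return (Path(backup_directory) / project_name).is_dir()


# Usage (requires Gitma):
# if not project_already_cloned('projects/', my_project_name):
#     my_catma.load_project_from_gitlab(
#         project_name=my_project_name,
#         backup_directory='projects/'
#     )
```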


## General project stats

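The `stats()` method shows metadata about your annotation collections: the document, the number of annotations, the annotators, the tags used, the timestamps of the first and last annotation, and the collection's UUID.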

```python
my_project.stats()
```

| | annotation collection | document | annotations | annotator | tag | first_annotation | last_annotation | uuid |
|---|---|---|---|---|---|---|---|---|
| 0 | ac_1 | The Metamorphosis | 20 | {MVauth} | {change_of_state, process_event, stative_event... | 2023-08-02 15:43:57.086000+02:00 | 2023-08-02 15:55:27.298000+02:00 | C_F8552E35-F9BD-4544-A8C5-BD9114C80E43 |
| 1 | ac_2 | The Metamorphosis | 19 | {mvgoogle} | {change_of_state, process_event, stative_event... | 2023-08-03 13:29:00.085000+02:00 | 2023-08-03 13:37:13.986000+02:00 | C_D32D3AFF-98A6-4596-9B32-ABDC66445290 |
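The `first_annotation` and `last_annotation` columns are timezone-aware timestamps, which Python's standard library can parse directly. The sketch below computes the time between the first and last annotation of a collection; the helper name is hypothetical, and the result is only a rough indicator since it says nothing about breaks between annotations.

```python
from datetime import datetime


def annotation_period_seconds(first_annotation: str, last_annotation: str) -> float:
    """Seconds between two timestamps such as '2023-08-02 15:43:57.086000+02:00'."""
    first = datetime.fromisoformat(first_annotation)
    last = datetime.fromisoformat(last_annotation)
    return (last - first).total_seconds()


# Timestamps of 'ac_1' from the table above (roughly 11.5 minutes)
print(annotation_period_seconds(
    "2023-08-02 15:43:57.086000+02:00",
    "2023-08-02 15:55:27.298000+02:00",
))  # prints 690.212
```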



## Annotation overview for the entire project

The `plot_annotations()` method plots the annotations of each annotation collection and document as a separate subplot.
Click on a legend entry to hide or show the corresponding annotation collection, and hover over a scatter point to explore the individual annotation.


```python
my_project.plot_annotations()
```

You can customize the plot with the `color_col` parameter, for example to color the annotations by a property:

```python
my_project.plot_annotations(color_col='prop:representation_type')
```

You can also color them by annotator:

```python
my_project.plot_annotations(color_col='annotator')
```


## Plot annotations for a specified annotation collection

To do this, we need to pick one annotation collection. To get an overview of all annotation collections in our project, we can use the `annotation_collections` attribute of the `CatmaProject` class, which holds a list of all annotation collections. We inspect the list as shown below.

```python
for ac in my_project.annotation_collections:
    print(ac.name)
```

```
gold_annotation
ac_2
ac_1
```


We can now specify the annotation collection that we want to inspect. 

```python
my_annotation_collection = 'ac_2'
```


### Scatter plot

The annotations of single annotation collections can be plotted as an interactive Plotly scatter plot, too. The annotations can be explored with respect to:

- their tag: y-axis
- their text position: x-axis
- the annotated text passages: mouse over
- their properties: mouse over

```python
my_project.ac_dict[my_annotation_collection].plot_annotations()
```

You can customize the plot by choosing annotation properties (or other columns such as `annotator`) for the y-axis (`y_axis`) and the scatter color (`color_prop`).

```python
my_project.ac_dict[my_annotation_collection].plot_annotations(y_axis='prop:representation_type')
```

```python
my_project.ac_dict[my_annotation_collection].plot_annotations(
    y_axis='annotator',
    color_prop='prop:representation_type'
)
```

### Cooccurrence network

Networks offer an alternative way to visualize annotation collections. They give insight into which annotations co-occur in the text.

```python
my_project.ac_dict[my_annotation_collection].cooccurrence_network()
```


The networks can be customized with the following optional parameters:

- `character_distance`: the text span within which two annotations are considered co-occurring. The default is 100 characters.
- `included_tags`: a list of tags to include when drawing the graph
- `excluded_tags`: a list of tags to exclude when drawing the graph

```python
# TODO: possibly adapt to the demo project
my_project.ac_dict[my_annotation_collection].cooccurrence_network(
    character_distance=50,
    included_tags=['process_event', 'stative_event'],
    excluded_tags=None
)
```
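How exactly Gitma decides that two annotations co-occur is not shown in this notebook. The sketch below makes one plausible assumption: two annotations with different tags co-occur if the gap between their text spans is at most `character_distance` characters, and each such pair adds 1 to the weight of the edge between their tags. The function name and the `(tag, start_point, end_point)` input format are hypothetical; the example spans are rows 2 to 5 of the `ac_2` DataFrame in the next section.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Optional, Sequence, Tuple

Annotation = Tuple[str, int, int]  # (tag, start_point, end_point)


def cooccurrence_edges(
    annotations: Iterable[Annotation],
    character_distance: int = 100,
    included_tags: Optional[Sequence[str]] = None,
    excluded_tags: Optional[Sequence[str]] = None,
) -> Counter:
    """Count co-occurring tag pairs (a sketch, not Gitma's implementation)."""
    selected = [
        (tag, start, end)
        for tag, start, end in annotations
        if (included_tags is None or tag in included_tags)
        and (excluded_tags is None or tag not in excluded_tags)
    ]
    edges: Counter = Counter()
    for (tag_a, start_a, end_a), (tag_b, start_b, end_b) in combinations(selected, 2):
        if tag_a == tag_b:
            continue  # this sketch ignores self-loops
        # Characters between the two spans; negative if the spans overlap.
        gap = max(start_a, start_b) - min(end_a, end_b)
        if gap <= character_distance:
            edges[tuple(sorted((tag_a, tag_b)))] += 1
    return edges


spans = [
    ("stative_event", 142, 172),
    ("process_event", 173, 282),
    ("process_event", 182, 215),
    ("stative_event", 284, 382),
]
print(cooccurrence_edges(spans, character_distance=50,
                         included_tags=["process_event", "stative_event"]))
# Counter({('process_event', 'stative_event'): 3})
```

With `character_distance=100`, the pair 182 to 215 and 284 to 382 (a gap of 69 characters) is also counted, so the edge weight rises to 4.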

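### Annotation collection as Pandas DataFrame

The `df` attribute holds the annotation collection as a pandas DataFrame, with one row per annotation:

```python
my_project.ac_dict[my_annotation_collection].df
```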

| | document | annotation collection | annotator | tag | tag_path | left_context | annotation | right_context | start_point | end_point | date | prop:representation_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Metamorphosis | ac_2 | mvgoogle | process_event | /process_event | | One morning, as Gregor Samsa was waking up fro... | , he discovered that in bed he had been change... | 0 | 62 | 2023-08-03 13:29:00.085000+02:00 | narrator_speech |
| 1 | The Metamorphosis | ac_2 | mvgoogle | change_of_state | /change_of_state | s Gregor Samsa was waking up from anxious drea... | he discovered that in bed he had been changed ... | . He lay on his armour-hard back and saw, as h... | 64 | 140 | 2023-08-03 13:29:53.994000+02:00 | narrator_speech |
| 2 | The Metamorphosis | ac_2 | mvgoogle | stative_event | /stative_event | had been changed into a monstrous verminous b... | He lay on his armour-hard back | and saw, as he lifted his head up a little, h... | 142 | 172 | 2023-08-03 13:30:39.524000+02:00 | narrator_speech |
| 3 | The Metamorphosis | ac_2 | mvgoogle | process_event | /process_event | ous verminous bug. He lay on his armour-hard b... | and saw his brown, arched abdomen divided up i... | . From this height the blanket, just about rea... | 173 | 282 | 2023-08-03 13:31:05.450000+02:00 | narrator_speech |
| 4 | The Metamorphosis | ac_2 | mvgoogle | process_event | /process_event | nous bug. He lay on his armour-hard back and s... | as he lifted his head up a little | , his brown, arched abdomen divided up into ri... | 182 | 215 | 2023-08-03 13:31:24.227000+02:00 | narrator_speech |
| 5 | The Metamorphosis | ac_2 | mvgoogle | stative_event | /stative_event | abdomen divided up into rigid bow-like sectio... | From this height the blanket, just about ready... | . His numerous legs, pitifully thin in compari... | 284 | 382 | 2023-08-03 13:31:54.828000+02:00 | narrator_speech |
| 6 | The Metamorphosis | ac_2 | mvgoogle | stative_event | /stative_event | slide off completely, could hardly stay in pla... | His numerous legs, pitifully thin in compariso... | . "What's happened to me," he thought. It was no | 384 | 502 | 2023-08-03 13:32:32.568000+02:00 | narrator_speech |
| 7 | The Metamorphosis | ac_2 | mvgoogle | non_event | /non_event | mference, flickered helplessly before his eyes. " | What's happened to me | ," he thought. It was no dream. His room, a pr... | 506 | 527 | 2023-08-03 13:32:45.089000+02:00 | thought_representation |
| 8 | The Metamorphosis | ac_2 | mvgoogle | process_event | /process_event | lessly before his eyes. "What's happened to me," | he thought | . It was no dream. His room, a proper room for... | 530 | 540 | 2023-08-03 13:32:57.159000+02:00 | narrator_speech |
| 9 | The Metamorphosis | ac_2 | mvgoogle | non_event | /non_event | e his eyes. "What's happened to me," he thought. | It was no dream | . His room, a proper room for a human being, o... | 542 | 557 | 2023-08-03 13:33:33.958000+02:00 | narrator_speech |
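The `left_context`, `annotation`, and `right_context` columns form a keyword-in-context (KWIC) view of each annotation, derived from the `start_point` and `end_point` character offsets. The sketch below illustrates the idea; the helper `kwic` and its fixed `width` of 50 characters are assumptions, and the sample text is the opening sentence that rows 0 and 1 above annotate (the document's actual whitespace may differ).

```python
from typing import Tuple


def kwic(text: str, start: int, end: int, width: int = 50) -> Tuple[str, str, str]:
    """Return (left context, annotated passage, right context) for a span."""
    left = text[max(0, start - width):start]
    return left, text[start:end], text[end:end + width]


text = ("One morning, as Gregor Samsa was waking up from anxious dreams, "
        "he discovered that in bed he had been changed into a monstrous verminous bug.")

left, passage, right = kwic(text, 64, 140)
print(repr(left))
print(passage)
print(repr(right))
```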


## Annotation stats by tags

The `tag_stats()` method computes, for each tag:

- the number of annotations
- the total text span annotated with the tag (in characters)
- the mean text span per annotation
- the `ranking` most frequent tokens (you can exclude tokens with a stopword list via the `stopwords` parameter)

```python
my_project.ac_dict[my_annotation_collection].tag_stats(ranking=5)
```

| | annotations | text_span | text_span_mean | token1 | token2 | token3 | token4 | token5 |
|---|---|---|---|---|---|---|---|---|
| process_event | 6 | 310 | 51.666667 | up: 3 | he: 3 | a: 3 | as: 2 | and: 2 |
| change_of_state | 1 | 76 | 76.0 | he: 2 | discovered: 1 | that: 1 | in: 1 | bed: 1 |
| stative_event | 10 | 702 | 70.2 | a: 8 | the: 7 | of: 4 | his: 3 | in: 3 |
| non_event | 2 | 36 | 18.0 | Whats: 1 | happened: 1 | to: 1 | me: 1 | It: 1 |
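To make these columns concrete, here is a rough re-implementation sketch in plain Python. It is not Gitma's code: the function name and input format are hypothetical, and the tokenizer (strip punctuation, split on whitespace, keep case) is an assumption chosen because it reproduces the `non_event` and `change_of_state` rows above from their annotated passages.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[str, int, int, str]  # (tag, start_point, end_point, annotated text)


def tag_stats_sketch(
    rows: Iterable[Row],
    ranking: int = 5,
    stopwords: Optional[List[str]] = None,
) -> Dict[str, dict]:
    """Approximate the tag_stats() columns (a sketch, not Gitma's code)."""
    stop = set(stopwords or [])
    grouped: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
    for tag, start, end, text in rows:
        grouped[tag].append((start, end, text))

    stats = {}
    for tag, items in grouped.items():
        spans = [end - start for start, end, _ in items]
        # Assumed tokenizer: drop punctuation, split on whitespace, keep case.
        tokens = [
            token
            for _, _, text in items
            for token in re.sub(r"[^\w\s]", "", text).split()
            if token not in stop
        ]
        stats[tag] = {
            "annotations": len(items),
            "text_span": sum(spans),
            "text_span_mean": sum(spans) / len(spans),
            "top_tokens": [f"{t}: {n}" for t, n in Counter(tokens).most_common(ranking)],
        }
    return stats


rows = [
    ("non_event", 506, 527, "What's happened to me"),
    ("non_event", 542, 557, "It was no dream"),
    ("change_of_state", 64, 140,
     "he discovered that in bed he had been changed into a monstrous verminous bug"),
]
print(tag_stats_sketch(rows)["non_event"])
```

Ties in token frequency keep the order of first occurrence, because `Counter.most_common` preserves insertion order among equal counts.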


You can also compute these statistics by property value (if your annotations have properties) or by annotator, using the `tag_col` parameter:

```python
my_project.ac_dict[my_annotation_collection].tag_stats(tag_col='prop:representation_type', ranking=3, stopwords=['a', 'to', 'the'])
```

| | annotations | text_span | text_span_mean | token1 | token2 | token3 |
|---|---|---|---|---|---|---|
| narrator_speech | 18 | 1103 | 61.277778 | was: 5 | he: 5 | in: 5 |
| thought_representation | 1 | 21 | 21.0 | Whats: 1 | happened: 1 | me: 1 |


Here, each row shows the statistics for one property value.

```python
my_project.ac_dict[my_annotation_collection].tag_stats(tag_col='annotator', ranking=3)
```

| | annotations | text_span | text_span_mean | token1 | token2 | token3 |
|---|---|---|---|---|---|---|
| mvgoogle | 19 | 1124 | 59.157895 | a: 12 | the: 7 | was: 5 |


Here, each row shows the statistics for one annotator.

## Inter-annotator agreement (IAA) with gitma

### `get_iaa`

For every annotation in annotation collection 1 (`ac1_name_or_inst`), the `get_iaa` method searches for the best-matching annotation in annotation collection 2 (`ac2_name_or_inst`) based on the annotated text span.
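Gitma's exact matching criterion and overlap formula are not shown in this notebook. As a rough sketch under two assumptions, the best match is the annotation in collection 2 that shares the most characters with the annotation from collection 1, and the overlap ratio is the number of shared characters divided by the length of the longer span. Spans are `(start_point, end_point)` tuples, and all names and example spans are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start_point, end_point)


def shared_chars(a: Span, b: Span) -> int:
    """Number of characters covered by both spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def best_matches(ac1: List[Span], ac2: List[Span]) -> List[Tuple[Span, Optional[Span], float]]:
    """For each span in ac1, find the ac2 span with the largest overlap."""
    results = []
    for a in ac1:
        best = max(ac2, key=lambda b: shared_chars(a, b), default=None)
        shared = shared_chars(a, best) if best is not None else 0
        if shared == 0:
            results.append((a, None, 0.0))  # no matching annotation found
        else:
            longer = max(a[1] - a[0], best[1] - best[0])
            results.append((a, best, shared / longer))
    return results


print(best_matches([(0, 62), (64, 140), (200, 210)],
                   [(0, 62), (60, 140), (300, 310)]))
```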

First, let's take a look at both annotation collections by comparing their annotation spans.

```python
# compare the annotation collections by start point
my_project.compare_annotation_collections(
    annotation_collections=['ac_1', 'ac_2']
)
```

As the line plot shows, every annotation in annotation collection 'ac_1' has a matching annotation in annotation collection 'ac_2'.

Now, let's compute the IAA for all matching annotations:

```python
my_project.get_iaa(
    ac1_name_or_inst='ac_1',
    ac2_name_or_inst='ac_2'
)
```

```
Finished search for matching annotations in:
- ac_1
- ac_2
20 annotation(s) could be matched.
Average overlap is 97.54%.
Couldn't match 0 annotation(s) in the first annotation collection.

[(Annotation(Author: MVauth, Tag: Tag(Name: change_of_state, Properties: [Property(Name: representation_type), Default Values: ['narrator_speech', 'character_speech', 'thought_representation'])]), Properties: {'representation_type': ['narrator_speech']}, Start Point: 0, End Point: 62, Text: One morning, as Gregor Samsa was waking up from anxious dreams, ), Annotation(Author: mvgoogle, Tag: Tag(Name: process_event, Properties: [Property(Name: representation_type), Default Values: ['narrator_speech', 'character_speech', 'thought_representation'])]), Properties: {'representation_type': ['narrator_speech']}, Start Point: 0, End Point: 62, Text: One morning, as Gregor Samsa was waking up from anxious dreams, )), (Annotation(Author: MVauth, Tag: Tag(Name: process_event, Properties: [Property(Name: represe
```

| | change_of_state | process_event | stative_event | non_event |
|---|---|---|---|---|
| change_of_state | 0 | 1 | 0 | 0 |
| process_event | 1 | 5 | 0 | 0 |
| stative_event | 0 | 1 | 10 | 0 |
| non_event | 0 | 0 | 0 | 2 |


The `get_iaa` method reports three different agreement scores as well as the number of annotation pairs used to compute them and the average overlap of these pairs. It also returns a confusion matrix that shows how the tags of matched annotations relate to each other. For example, the matrix shows that in 2 cases an annotation tagged 'non_event' in annotation collection 1 has a best match with the same tag in annotation collection 2. Compare this with the line plot above.
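The confusion matrix alone is enough to compute two common agreement measures: the observed agreement (the share of pairs on the diagonal) and Cohen's kappa, which corrects the observed agreement for the agreement expected by chance from the row and column totals. The sketch below uses the textbook formula; it is not necessarily one of the three scores Gitma reports, and the function name is hypothetical.

```python
from typing import List, Tuple


def observed_agreement_and_kappa(matrix: List[List[int]]) -> Tuple[float, float]:
    """Observed agreement and Cohen's kappa from a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: product of the marginal proportions, summed over tags.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    # Undefined if expected == 1 (both sides always use the same single tag).
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Tag-level confusion matrix from above
# (order: change_of_state, process_event, stative_event, non_event)
matrix = [
    [0, 1, 0, 0],
    [1, 5, 0, 0],
    [0, 1, 10, 0],
    [0, 0, 0, 2],
]
observed, kappa = observed_agreement_and_kappa(matrix)
print(f"observed={observed:.2f}, kappa={kappa:.3f}")  # observed=0.85, kappa=0.753
```

Here 17 of the 20 pairs lie on the diagonal, and the chance agreement is 157/400, so kappa is exactly 61/81. Kappa does not change when the matrix is transposed, so it does not matter which collection is on the rows.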

### Filter by tags

In some cases, you may not want to include all annotations when computing the IAA scores. In that case, use the `tag_filter` parameter, which expects a list of tag names.

```python
my_project.get_iaa(
    ac1_name_or_inst='ac_1',
    ac2_name_or_inst='ac_2',
    tag_filter=['process_event']
)
```

```
Finished search for matching annotations in:
- ac_1
- ac_2
7 annotation(s) could be matched.
Average overlap is 100.0%.
Couldn't match 0 annotation(s) in the first annotation collection.

[(Annotation(Author: MVauth, Tag: Tag(Name: process_event, Properties: [Property(Name: representation_type), Default Values: ['narrator_speech', 'character_speech', 'thought_representation'])]), Properties: {'representation_type': ['narrator_speech']}, Start Point: 64, End Point: 140, Text: he discovered that in bed he  had been changed into a monstrous verminous bug, ), Annotation(Author: mvgoogle, Tag: Tag(Name: change_of_state, Properties: [Property(Name: representation_type), Default Values: ['narrator_speech', 'character_speech', 'thought_representation'])]), Properties: {'representation_type': ['narrator_speech']}, Start Point: 64, End Point: 140, Text: he discovered that in  bed he had been changed into a monstrous verminous bug, )), (Annotation(Author: MVauth, Tag: Tag(Name: process_event, Pr
```

| | change_of_state | process_event | stative_event |
|---|---|---|---|
| change_of_state | 0 | 1 | 0 |
| process_event | 0 | 5 | 0 |
| stative_event | 0 | 1 | 0 |
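A confusion matrix like the ones above can be built by counting (tag in collection 1, tag in collection 2) pairs of matched annotations. In this sketch, `tag_filter` keeps only pairs whose collection-1 tag is in the list, and rows correspond to collection 1; both choices are assumptions, and Gitma's orientation and filtering rules may differ. The function name and the example pairs are invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Tuple


def confusion_matrix(
    pairs: Iterable[Tuple[str, str]],
    labels: Sequence[str],
    tag_filter: Optional[Sequence[str]] = None,
) -> List[List[int]]:
    """Rows: tag in collection 1, columns: tag in collection 2."""
    counts = Counter(
        (tag1, tag2)
        for tag1, tag2 in pairs
        if tag_filter is None or tag1 in tag_filter
    )
    return [[counts[(row, col)] for col in labels] for row in labels]


pairs = [
    ("process_event", "process_event"),
    ("process_event", "change_of_state"),
    ("stative_event", "stative_event"),
]
labels = ["change_of_state", "process_event", "stative_event"]
print(confusion_matrix(pairs, labels, tag_filter=["process_event"]))
# [[0, 0, 0], [1, 1, 0], [0, 0, 0]]
```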


### Compare annotation properties

The tag is only one level of CATMA annotations; you can also compare annotations by their properties. In the demo project, the annotations have the property 'representation_type', which records whether a passage is narrator speech, character speech, or thought representation:

```python
my_project.compare_annotation_collections(
    annotation_collections=['ac_1', 'ac_2'],
    color_col='prop:representation_type'
)
```

To compute the agreement on annotation properties, set the `level` parameter:

```python
my_project.get_iaa(
    ac1_name_or_inst='ac_1',
    ac2_name_or_inst='ac_2',
    level='prop:representation_type'
)
```

```
Finished search for matching annotations in:
- ac_1
- ac_2
20 annotation(s) could be matched.
Average overlap is 97.54%.
Couldn't match 0 annotation(s) in the first annotation collection.

[(Annotation(Author: MVauth, Tag: Tag(Name: change_of_state, Properties: [Property(Name: representation_type), Default Values: ['narrator_speech', 'character_speech', 'thought_representation'])]), Properties: {'representation_type': ['narrator_speech']}, Start Point: 0, End Point: 62, Text: One morning, as Gregor Samsa was waking up from anxious dreams, ), Annotation(Author: mvgoogle, Tag: Tag(Name: process_event, Properties: [Property(Name: representation_type), Default Values: ['narrator_speech', 'character_speech', 'thought_representation'])]), Properties: {'representation_type': ['narrator_speech']}, Start Point: 0, End Point: 62, Text: One morning, as Gregor Samsa was waking up from anxious dreams, )), (Annotation(Author: MVauth, Tag: Tag(Name: process_event, Properties: [Property(Name: represe
```

| | thought_representation | character_speech | narrator_speech |
|---|---|---|---|
| thought_representation | 0 | 1 | 0 |
| character_speech | 0 | 0 | 0 |
| narrator_speech | 0 | 0 | 19 |
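Conceptually, `level` decides which label of each matched annotation is compared: the tag, or the value of the property named after the `prop:` prefix. A minimal sketch, assuming annotations are dictionaries shaped like the `Properties: {'representation_type': ['narrator_speech']}` part of the output above; `annotation_label` is hypothetical, and joining multiple property values into one label is an arbitrary choice.

```python
def annotation_label(annotation: dict, level: str = "tag") -> str:
    """Return the label compared at the given level ('tag' or 'prop:<name>')."""
    if level == "tag":
        return annotation["tag"]
    if level.startswith("prop:"):
        values = annotation["properties"].get(level[len("prop:"):], [])
        # Multi-valued properties are joined so each annotation gets one label.
        return "|".join(sorted(values))
    raise ValueError(f"Unsupported level: {level!r}")


annotation = {
    "tag": "process_event",
    "properties": {"representation_type": ["narrator_speech"]},
}
print(annotation_label(annotation))                              # process_event
print(annotation_label(annotation, "prop:representation_type"))  # narrator_speech
```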
