# Explore annotations
This notebook uses the demo CATMA project.
To use it with your own annotations, first clone your CATMA project locally.
For further information about cloning your CATMA project see [this notebook](https://github.com/forTEXT/gitma/blob/main/demo_notebooks/load_project_from_gitlab.ipynb).

## Table of contents
* [Load a CATMA project](#1-bullet)
* [General project stats](#2-bullet)
* [Annotation overview for the entire project](#3-bullet)
* [Plot annotations for a specified annotation collection](#4-bullet)
  * [Scatter plot](#4.1)
  * [Cooccurrence network](#4.2)
* [Annotation collection as Pandas DataFrame](#5-bullet)
* [Annotation stats by tags](#6-bullet)
* [Annotation stats by properties / property values](#7-bullet)

## Load a CATMA project <a class="anchor" id="1-bullet"></a>

In [None]:
from gitma import CatmaProject

In [None]:
my_project = CatmaProject(
    project_directory='../test/demo_project/',
    project_name='test_corpus'
)

## General project stats <a class="anchor" id="2-bullet"></a>
The `stats()` method shows you some metadata about your annotation collections.

In [None]:
my_project.stats()

## Annotation overview for the entire project <a class="anchor" id="3-bullet"></a>
The `plot_annotations()` method plots the annotations of each annotation collection and each document as a separate subplot.

The demo project contains only one annotated document but two annotation collections.
By clicking on the legend entries you can deactivate specific annotation collections within the plot.

Hover over a scatter point to explore the corresponding annotation.

In [None]:
my_project.plot_annotations()

The plot can be customized with the `color_col` parameter,
for example to visualize annotation properties...

In [None]:
my_project.plot_annotations(color_col='prop:intentional')

... or the annotators...

In [None]:
my_project.plot_annotations(color_col='annotator')

## Plot annotations for a specified annotation collection <a class="anchor" id="4-bullet"></a>

### Scatter plot <a class="anchor" id="4.1"></a>
The annotations of single annotation collections can be plotted as an interactive [Plotly scatter plot](https://plotly.com/python/), too.
The annotations can be explored with respect to:
- their tag: y-axis
- their text position: x-axis
- the annotated text passages: mouse over
- their properties: mouse over

In [None]:
my_project.ac_dict['ac_2'].plot_annotations()

You can customize the plot by choosing annotation properties for the y-axis (`y_axis`) and the scatter color (`color_prop`).

In [None]:
my_project.ac_dict['ac_2'].plot_annotations(y_axis='prop:mental')

In [None]:
my_project.ac_dict['ac_2'].plot_annotations(
    y_axis='annotator',
    color_prop='prop:unpredictable'
)

### Cooccurrence network <a class="anchor" id="4.2"></a>
An alternative way to visualize annotation collections is using networks.
They can be used to get an insight into the cooccurrence of annotations.

In [None]:
my_project.ac_dict['ac_1'].cooccurrence_network()

The networks can be customized with the following optional parameters:

- `character_distance`: the text span within which two annotations are considered to be cooccurrent. The default is 100 characters.
- `included_tags`: a list of tags that are included when drawing the graph
- `excluded_tags`: a list of tags that are excluded when drawing the graph

Since the demo project contains only 6 annotations with 2 tags in both annotation collections, these parameters don't make a difference:

In [None]:
my_project.ac_dict['ac_1'].cooccurrence_network(
    character_distance=50,
    included_tags=['process', 'stative_event'],
    excluded_tags=None
)

This example from a larger annotation collection shows that the edges' weights visualize the count of cooccurrences:

<img src="demo_img/network_example.png">
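
To make the `character_distance` criterion concrete, the following sketch counts tag cooccurrences for a handful of made-up annotations. It is not gitma's implementation: the `(tag, start_point)` tuples, the function name `count_cooccurrences`, the decision to measure distance between start points, and the decision to skip pairs with identical tags are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotations as (tag, start_point) pairs; not taken from the demo project.
annotations = [
    ('process', 0),
    ('stative_event', 40),
    ('process', 120),
    ('stative_event', 130),
    ('process', 400),
]


def count_cooccurrences(annotations, character_distance=100):
    """Count pairs of different tags whose start points lie within `character_distance`.

    Assumption: distance is measured between start points; gitma may
    define the span differently.
    """
    edge_weights = Counter()
    for (tag_a, start_a), (tag_b, start_b) in combinations(annotations, 2):
        if tag_a != tag_b and abs(start_a - start_b) <= character_distance:
            # Sort the pair so that (a, b) and (b, a) are counted as one edge.
            edge_weights[tuple(sorted((tag_a, tag_b)))] += 1
    return edge_weights


edge_weights = count_cooccurrences(annotations, character_distance=100)
print(edge_weights)
```

In this sketch each key plays the role of an edge and each count the role of its weight. Lowering `character_distance` to 50 drops the pair whose start points are 80 characters apart, so the weight decreases.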

## Annotation collection as Pandas DataFrame <a class="anchor" id="5-bullet"></a>

In [None]:
my_project.ac_dict['ac_2'].df

## Annotation stats by tags <a class="anchor" id="6-bullet"></a>
The `tag_stats()` method reports, for each tag:
- the number of annotations
- the full text span annotated by the tag
- the average text span of the annotations
- the most frequent tokens (a stopword list can be passed via `stopwords`)

In [None]:
my_project.ac_dict['ac_2'].tag_stats(ranking=5)

Additionally, you can use the method for properties (if you used any in the annotation process) and different annotators:

In [None]:
my_project.ac_dict['ac_2'].tag_stats(tag_col='prop:mental', ranking=3, stopwords=['in', 'im'])

Here, each row shows the data for one property value.

In [None]:
my_project.ac_dict['ac_2'].tag_stats(tag_col='annotator', ranking=3)

Here, each row shows the data for one annotator.
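
The figures reported per row can be approximated with plain pandas, which may help when interpreting them. The sketch below is not the gitma implementation: it uses a small hand-made DataFrame, the column names `tag`, `annotation`, `start_point` and `end_point` are assumptions about the layout of `df`, and the helper `top_tokens` is hypothetical. It also assumes that the full text span is the sum of the individual annotation lengths.

```python
from collections import Counter

import pandas as pd

# Hand-made stand-in for an annotation DataFrame (assumed column names).
df = pd.DataFrame({
    'tag': ['process', 'process', 'stative_event'],
    'annotation': ['the sun rose', 'the wind rose', 'it was cold'],
    'start_point': [0, 20, 40],
    'end_point': [12, 33, 51],
})

stopwords = ['the', 'it']

# Length of each annotated passage in characters.
df['span'] = df['end_point'] - df['start_point']

# Number of annotations, summed span and mean span per tag.
stats = df.groupby('tag')['span'].agg(['count', 'sum', 'mean'])


def top_tokens(texts, ranking=1):
    """Return the `ranking` most frequent tokens that are not stopwords."""
    tokens = [
        token
        for text in texts
        for token in text.lower().split()
        if token not in stopwords
    ]
    return Counter(tokens).most_common(ranking)


# Most frequent tokens per tag, computed separately from the numeric columns.
top = {tag: top_tokens(group['annotation']) for tag, group in df.groupby('tag')}

print(stats)
print(top)
```

Grouping by another column, such as an annotator or property column, would yield one row per value of that column in the same way.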

## Annotation stats by properties / property values <a class="anchor" id="7-bullet"></a>

In [None]:
my_project.ac_dict['ac_2'].property_stats()