
quadrama/jcls2022


Who Knows What in German Dramas? A Composite Annotation Scheme for Knowledge Transfer

This repository contains the code and data needed to reproduce the results reported in a publication submitted to the Journal of Computational Literary Studies.

Contents of this repository:

data

This folder contains the annotated plays that are reported in the article. The plays are provided both in the native format of the annotation tool (CorefAnnotator) and as CSV and TEI/XML files exported from it. The CSV files are used for the analysis. The TEI files are used to count how many annotations occur per 1000 tokens in the texts, as presented in Section 5.1.

section-4: Calculating Inter-Annotator Agreement

This folder contains the code needed to calculate inter-annotator agreement with Gamma.

With bash on a Unix system, you can run it with python3 iaa.py ../data/round-2/V1/csv/guenderode-udohla_0?.csv to compare the two annotations of Günderrode's Udohla. The shell expands the ? wildcard to both annotation files. The output is a single line formatted for use in a LaTeX table.

To generate an entire table, you can use the following command:

# Pair each first-annotator file (*01.csv) with its second-annotator counterpart (*02.csv)
for i in ../data/round-2/V1/csv/*01.csv
do
    python3 iaa.py "$i" "${i%01.csv}02.csv"
done

This iterates over all files ending in 01.csv in data/round-2/V1/csv and calls the Python script once for each of them. The script takes the versions produced by the two annotators as its arguments.
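For systems without bash, the same pairing can be sketched in Python. This is a minimal sketch, not part of the repository: the helper names pair_annotation_files and run_iaa are hypothetical, and it assumes that every file pair differs only in the trailing 01.csv / 02.csv, as in the guenderode-udohla example above.

```python
import subprocess
import sys
from pathlib import Path


def pair_annotation_files(csv_dir):
    """Return (first, second) path pairs for files ending in 01.csv / 02.csv.

    Hypothetical helper: assumes the two annotators' files share a name
    and differ only in the trailing 01.csv versus 02.csv.
    """
    pairs = []
    for first in sorted(Path(csv_dir).glob("*01.csv")):
        # Swap only the trailing "01.csv", not an earlier "01" in the name.
        second = first.with_name(first.name[: -len("01.csv")] + "02.csv")
        if second.exists():
            pairs.append((first, second))
    return pairs


def run_iaa(csv_dir, script="iaa.py"):
    """Call the agreement script once per pair of annotation files."""
    for first, second in pair_annotation_files(csv_dir):
        subprocess.run(
            [sys.executable, script, str(first), str(second)],
            check=True,  # stop if the script fails for any pair
        )
```

Calling run_iaa("../data/round-2/V1/csv") from the section-4 folder would then invoke iaa.py once per play. Files without a matching 02.csv counterpart are skipped silently in this sketch.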

Performance

The script makes use of the pygamma-agreement library, which in turn relies on a highly optimized library for integer linear programming. Please follow their installation instructions to use the CBC solver.

section-5: Analysing Annotated Knowledge Transfers

Python script (Python version 3.10.1)

The Python script can be run from the section-5 folder using the command

$ python3 annotations_per_x_tokens.py ../data --xtokens 1000

No further packages need to be installed.
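The --xtokens 1000 setting corresponds to a rate of annotations per 1000 tokens. The normalization itself can be sketched as follows. This is an illustration of the arithmetic only, not the code of annotations_per_x_tokens.py; the function name and its parameters are assumptions.

```python
def annotations_per_x_tokens(n_annotations, n_tokens, x=1000):
    """Scale an annotation count to a rate per x tokens.

    Hypothetical helper illustrating the normalization only.
    """
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    # Multiply before dividing to keep round numbers exact.
    return n_annotations * x / n_tokens


# Made-up numbers: 30 annotations in a play of 12000 tokens
rate = annotations_per_x_tokens(30, 12000, x=1000)
print(rate)
```

Normalizing by text length in this way makes plays of different sizes comparable.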

R scripts (R version 4.1.2)

To install the packages needed for the R scripts, issue the following command in an R console:

> install.packages(c("DramaAnalysis", "ggplot2", "igraph", "kableExtra", "knitr", "reshape2", "tidyverse"))

All R scripts can be run either in RStudio or in the console using the command Rscript $PATH_TO_R_SCRIPT. After running the scripts, the plots they generate can be found in the folder plots.
