DramaNLP

This repository contains a number of UIMA components for processing dramatic texts, as well as an executable pipeline. We follow the general design ideas implemented in DKPro Core. The full pipeline reads files in several TEI/XML dialects (see below) and applies the most important NLP tools to them, while keeping the structural annotation of the plays intact (and, if necessary, processing different text layers separately).

Compiling from source

Clone the repository: git clone https://github.com/quadrama/DramaNLP.git
Enter the directory: cd DramaNLP
- If necessary, switch to a branch: git checkout develop/1.0
Download dependencies, compile everything and install it locally: mvn compile install (this produces a lot of output, but at the end you should see something like BUILD SUCCESS)
To compile a runnable binary, enter the directory: cd de.unistuttgart.ims.drama.main and run mvn package. This creates a file called drama.Main.jar in the directory target/assembly/. This file contains the code and all its dependencies.
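
The build steps above can be collected in a small shell script. This is a minimal sketch, not part of DramaNLP: the function names build_dramanlp and locate_drama_jar are made up for illustration, and the script assumes that git and mvn are on the PATH.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not provided by DramaNLP.

# Clone, build and package DramaNLP in the current directory.
# Optional first argument: a branch to check out (e.g. develop/1.0).
# The body runs in a subshell, so the caller's working directory is unchanged.
build_dramanlp() (
  branch="${1:-}"
  git clone https://github.com/quadrama/DramaNLP.git || exit 1
  cd DramaNLP || exit 1
  if [ -n "$branch" ]; then
    git checkout "$branch" || exit 1
  fi
  # Download dependencies, compile everything and install it locally.
  mvn compile install || exit 1
  # Build the runnable jar in the main module.
  cd de.unistuttgart.ims.drama.main || exit 1
  mvn package || exit 1
)

# Print the path of the packaged jar below a repository root
# (default: current directory), or fail if it has not been built.
locate_drama_jar() (
  root="${1:-.}"
  jar="$root/de.unistuttgart.ims.drama.main/target/assembly/drama.Main.jar"
  if [ -f "$jar" ]; then
    printf '%s\n' "$jar"
  else
    echo "jar not found: $jar" >&2
    exit 1
  fi
)
```

After sourcing the script, build_dramanlp develop/1.0 followed by locate_drama_jar DramaNLP would run the build and print the location of drama.Main.jar if packaging succeeded.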

Running entire pipeline

As an example, we'll work on the data from the GerDraCor collection (which is based on TextGrid). Download the files from GitHub and store the XML files in a directory. We will call the directory $TEIDIR in the following examples. The directory $OUTDIR is used to store the output of the pipeline. You'll need the file drama.Main.jar.

Enter the following command in the command line interface (the jar path assumes you are in the directory de.unistuttgart.ims.drama.main):

java -cp target/assembly/drama.Main.jar de.unistuttgart.ims.drama.main.TEI2XMI --input $TEIDIR --output $OUTDIR/xmi --csvOutput $OUTDIR/csv --conllOutput $OUTDIR/conll --skipSpeakerIdentifier --corpus GERDRACOR --collectionId "gdc" --doCleanup
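
To avoid retyping the long command, the invocation can be wrapped in a shell function. The sketch below is one possible wrapper under stated assumptions: the function name run_tei2xmi and the environment variables DRAMA_JAR and JAVA are hypothetical conveniences, not something DramaNLP defines. The class name and all flags are the ones from the command above. Creating the output directory up front is a precaution; the pipeline may well do this itself.

```shell
#!/bin/sh
# Sketch only: run_tei2xmi, DRAMA_JAR and JAVA are illustrative names.

# Usage: run_tei2xmi TEIDIR OUTDIR
# Runs TEI2XMI on the GerDraCor files in TEIDIR and writes to OUTDIR.
run_tei2xmi() (
  teidir="$1"
  outdir="$2"
  # Assumed default: the jar path relative to de.unistuttgart.ims.drama.main.
  jar="${DRAMA_JAR:-target/assembly/drama.Main.jar}"

  if [ ! -d "$teidir" ]; then
    echo "input directory not found: $teidir" >&2
    exit 1
  fi
  mkdir -p "$outdir" || exit 1

  "${JAVA:-java}" -cp "$jar" de.unistuttgart.ims.drama.main.TEI2XMI \
    --input "$teidir" \
    --output "$outdir/xmi" \
    --csvOutput "$outdir/csv" \
    --conllOutput "$outdir/conll" \
    --skipSpeakerIdentifier \
    --corpus GERDRACOR \
    --collectionId "gdc" \
    --doCleanup
)
```

With TEIDIR and OUTDIR set as described above, the call would be run_tei2xmi "$TEIDIR" "$OUTDIR". Quoting the variables keeps the command working when the paths contain spaces.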

After the run, the directory $OUTDIR contains three subdirectories, xmi, csv and conll, each holding the plays in a different file format.
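
A quick way to check that a run produced all three output formats is to count the files in each subdirectory. The function below is a hypothetical helper, not part of DramaNLP, and it makes no assumption about file names or extensions inside the subdirectories.

```shell
#!/bin/sh
# Sketch only: summarize_output is an illustrative helper name.

# Usage: summarize_output OUTDIR
# Prints the number of files below xmi, csv and conll;
# returns non-zero if one of the subdirectories is missing.
summarize_output() (
  outdir="$1"
  status=0
  for sub in xmi csv conll; do
    if [ -d "$outdir/$sub" ]; then
      # Strip the padding that some wc implementations add.
      count=$(find "$outdir/$sub" -type f | wc -l | tr -d '[:space:]')
      echo "$sub: $count file(s)"
    else
      echo "$sub: missing" >&2
      status=1
    fi
  done
  exit "$status"
)
```

Calling summarize_output "$OUTDIR" after the pipeline has finished prints one line per format, which makes it easy to spot a format that was not written.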

TEI/XML dialects

This package supports the following drama corpora:

TextGrid (German)
GerDraCor (German)
theatre classique (French)

