A Neural Approach to Quote Attribution in Dutch Literature

Master's thesis for the MSc Information Science at the University of Groningen.

To replicate our results, first clone the dutchcoref repository and install its required packages:
https://github.com/andreasvc/dutchcoref.

The OpenBoek files, including our annotated quote data, are available here:
https://andreasvc.github.io/openboek/.

Dialogue annotation

In order to train or evaluate the classifier, xml files containing the gold quotation information are needed. The following instructions are tailored to replicate the annotation of direct speech in Dutch novels.

Creating the xml files with silver information

The converter.py script converts dutchcoref's booknlp format output to xml format, adding silver information about characters, quotes and mentions.

To create the xml files, use converter.py with the "create" action. Gold information about mentions, quotes and clusters is required and is passed with -m, -q and -c respectively, alongside the .conll file passed with -b.

$ python3 converter.py -a create -b goldfiles/Grunberg_HuidEnHaar/Grunberg_HuidEnHaar.conll \
    -m goldfiles/Grunberg_HuidEnHaar/Grunberg_HuidEnHaar.mentions.tsv \
    -q goldfiles/Grunberg_HuidEnHaar/Grunberg_HuidEnHaar.quotes.tsv \
    -c goldfiles/Grunberg_HuidEnHaar/Grunberg_HuidEnHaar.clusters.tsv
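
To convert several novels in one go, the same command can be generated per novel. The following is a minimal sketch, assuming every novel follows the goldfiles/<Name>/<Name>.* layout of the example above. The helper names build_create_command and convert_all are hypothetical and not part of converter.py.

```python
import shlex
import subprocess
from pathlib import Path


def build_create_command(novel_dir):
    """Return the converter.py "create" command for one novel directory.

    Assumes the directory holds <name>.conll, <name>.mentions.tsv,
    <name>.quotes.tsv and <name>.clusters.tsv (layout taken from the
    example command; other layouts need a different prefix).
    """
    novel_dir = Path(novel_dir)
    prefix = (novel_dir / novel_dir.name).as_posix()
    return [
        "python3", "converter.py", "-a", "create",
        "-b", f"{prefix}.conll",
        "-m", f"{prefix}.mentions.tsv",
        "-q", f"{prefix}.quotes.tsv",
        "-c", f"{prefix}.clusters.tsv",
    ]


def convert_all(goldfiles="goldfiles"):
    """Run the "create" action for every novel directory under goldfiles."""
    for novel_dir in sorted(Path(goldfiles).iterdir()):
        if novel_dir.is_dir():
            subprocess.run(build_create_command(novel_dir), check=True)


# Show the command for a single novel without running it.
print(shlex.join(build_create_command("goldfiles/Grunberg_HuidEnHaar")))
```

Calling convert_all() from the directory that contains converter.py would then process each novel in turn and stop at the first failing conversion.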

The xml file containing silver information can then be corrected with the Quote Annotator tool: https://github.com/muzny/quoteannotator.
Our guidelines for annotating quotes in Dutch novels can be found in annotation_guidelines.pdf.

Updating the xml files after annotating

After using the annotation tool, use converter.py with the "update" action to complete the gold xml files with paragraph, sentence and token numbers. The gold .conll file in dutchcoref's booknlp format must be provided:

$ python3 converter.py -a update -x Abdolah_Koning_annotated.xml -b goldfiles/Abdolah_Koning/Abdolah_Koning.conll
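
A hand-corrected file may no longer parse as xml, for instance when a tag was left unclosed during annotation. The sketch below is an optional check that could be run before the "update" action. It uses only Python's standard library, is independent of converter.py, and tests well-formedness only, not whether the file follows the annotation scheme. The function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_path):
    """Return True if the file parses as xml; otherwise report the error."""
    try:
        ET.parse(xml_path)
    except ET.ParseError as err:
        # The error message includes the line and column of the problem.
        print(f"{xml_path}: not well-formed ({err})")
        return False
    return True
```

For the example above, is_well_formed("Abdolah_Koning_annotated.xml") would point at the first position where parsing fails, if any.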

Classifier usage

The code and trained model files for each classifier can be found in the models directory.

Training and evaluating the standalone classifier

In order to get the speaker mention and cluster performance, we can run the classifiers as follows*:

$ python qaclassifier.py -t '../riddlecoref/split/riddle/train/*.conll' -v '../riddlecoref/split/riddle/dev/*.conll' -p ../riddlecoref/parses/ -a ../riddlecoref/annotations/riddlecoref/

* If the model is already trained, providing the '-e' argument will just run the evaluation, preventing the classifier from training again.
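
The evaluation-only variant can be sketched as a small wrapper that appends '-e' on request. This is an illustration under assumptions: build_qa_command is a hypothetical helper, and it assumes that '-e' takes no value and that its position on the command line does not matter. The paths are the ones from the command above.

```python
import shlex


def build_qa_command(train, dev, parses, annotations, evaluate_only=False):
    """Return the qaclassifier.py command as an argument list.

    Glob patterns stay unexpanded in the list, which corresponds to
    quoting them in the shell command above.
    """
    cmd = ["python", "qaclassifier.py",
           "-t", train, "-v", dev, "-p", parses, "-a", annotations]
    if evaluate_only:
        cmd.append("-e")  # assumption: the flag takes no value
    return cmd


cmd = build_qa_command(
    "../riddlecoref/split/riddle/train/*.conll",
    "../riddlecoref/split/riddle/dev/*.conll",
    "../riddlecoref/parses/",
    "../riddlecoref/annotations/riddlecoref/",
    evaluate_only=True,
)
# Print a shell-safe version of the command; to execute it instead,
# pass the list to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```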

Quote attribution within dutchcoref

To run the quote attribution classifier implemented within dutchcoref, make sure to first place all the model files in the dutchcoref folder. Now run the classifier as follows:

$ python3 coref.py --outputprefix /tmp/Gilbert_EtenBiddenBeminnen /riddlecoref/parses/Gilbert_EtenBiddenBeminnen/ --neural=quote

In order to evaluate the results, we use evalquotes.py, for which gold output from the dutchcoref system is required:

$ python3 evalquotes.py /riddlecoref/annotations/riddlecoref/Gilbert_EtenBiddenBeminnen.xml /dutchcoref/tmp/Gilbert_EtenBiddenBeminnen goldfiles/Gilbert_EtenBiddenBeminnen/Gilbert_EtenBiddenBeminnen

References

Dutchcoref original paper:
van Cranenburgh, Andreas. "A Dutch coreference resolution system with an evaluation on literary fiction." Computational Linguistics in the Netherlands Journal 9 (2019): 27-54. https://clinjournal.org/clinj/article/view/91

The Quote Annotator tool is from:
Muzny, Grace, Michael Fang, Angel Fang, and Dan Jurafsky. "A Two-stage Sieve Approach to Quote Attribution." Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), Valencia, Spain (2017).
