This code is the combination of:
- NeuroNLP2 (paper: Deep Biaffine Attention for Neural Dependency Parsing)
- Neural RST Parser (paper: Transition-based Neural RST Parsing with Implicit Syntax Features)
This code is used to extract:
- Latent features of discourse units.
- Shallow features of discourse units.
For more technical details, please refer to our paper:
Fajri Koto, Jey Han Lau, Timothy Baldwin. Improved Document Modelling with a Neural Discourse Parser. In Proceedings of the 2019 Australasian Language Technology Workshop, Sydney.
- Python 2.7
- Run
pip install -r requirements.txt
There are three main steps:
- Using Stanford CoreNLP. After downloading the appropriate Stanford CoreNLP release, please run it. Please make sure you put all the necessary Stanford CoreNLP files in this repo, in a folder named stanford-corenlp.
- For the next two steps, please follow the corresponding instructions for:
- Converting XML file to CoNLL format.
- Segmenting CoNLL file to get EDUs. The output is *.merge file.
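
As a rough illustration of the XML-to-CoNLL step, the sketch below reads the token annotations from Stanford CoreNLP's XML output and emits one tab-separated line per token. The column layout (index, word, lemma, POS) and the function name `corenlp_xml_to_conll` are assumptions made for this example; the converter referred to above may use a different layout.

```python
import xml.etree.ElementTree as ET


def corenlp_xml_to_conll(xml_text):
    """Convert CoreNLP XML (given as a string) to a simple CoNLL-style string.

    Assumed columns: token index, word, lemma, POS (tab-separated),
    with a blank line between sentences. This layout is hypothetical.
    """
    root = ET.fromstring(xml_text)
    blocks = []
    for sentence in root.iter("sentence"):
        tokens = sentence.find("tokens")
        if tokens is None:
            # Skip <sentence> elements that carry no tokens
            # (e.g. sentence indices inside coreference mentions).
            continue
        rows = []
        for token in tokens.findall("token"):
            rows.append("\t".join([
                token.get("id", "_"),
                token.findtext("word", "_"),
                token.findtext("lemma", "_"),
                token.findtext("POS", "_"),
            ]))
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks) + "\n"
```

To try it, read a CoreNLP XML file into a string, pass it to the function, and write the result to a file that the segmenter can consume.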
Now you are ready to extract latent/shallow features as well as the RST tree.
- For latent features, please run
- For shallow features, please first run
and after that run python
Note1: Please manually adjust all PATHs in the code, as I haven't implemented argument parsing (argparse) yet.
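
If you would rather not edit the hard-coded paths by hand, a small argparse wrapper is one option. The following is a minimal sketch and not part of this repository: the option names (`--merge-dir`, `--output-dir`, `--corenlp-dir`) and the helper `build_parser` are hypothetical, and they would still need to be wired to the variables that are currently hard-coded in the scripts.

```python
import argparse


def build_parser():
    """Build a command-line parser for the paths (all option names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Extract discourse features from *.merge files.")
    parser.add_argument("--merge-dir", required=True,
                        help="Folder containing the *.merge files")
    parser.add_argument("--output-dir", default="output",
                        help="Folder where extracted features are written")
    parser.add_argument("--corenlp-dir", default="stanford-corenlp",
                        help="Folder holding the Stanford CoreNLP files")
    return parser
```

A script could then call `args = build_parser().parse_args()` and read `args.merge_dir`, `args.output_dir`, and `args.corenlp_dir` instead of the fixed paths.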
Note2: Our RST parser's performance is similar to that of Transition-based Neural RST Parsing with Implicit Syntax Features.