negtool

A tool for detecting negation cues and scopes in natural language text, as described in the paper An open-source tool for negation detection: a maximum-margin approach by M. Enger, E. Velldal & L. Øvrelid, presented at the 2017 SemBEaR workshop.

Dependencies

To use negtool, the following libraries must be installed:

  • numpy
  • scikit-learn
  • pystruct
  • networkx

Running the tool with raw text

To run negtool with raw text, you need to have CoreNLP installed. Run the tool with the command

python negtool.py -m raw -f <inputfile> -d <absolute path to corenlp>

Running the tool with parsed text

To run the tool on a parsed input file, the file must be in CoNLL-X format, with the following information encoded:

  • Column 1: token index
  • Column 2: token
  • Column 3: lemma
  • Column 5: PoS-tag
  • Column 7: head index
  • Column 8: dependency relation
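The column layout above can be sketched in Python. This is a minimal illustration, not code from negtool itself, and the sample token line is invented rather than taken from a real corpus:

```python
# Minimal sketch: pull out the CoNLL-X columns negtool relies on.
# CoNLL-X files are tab-separated, one token per line.

def parse_conllx_line(line):
    cols = line.rstrip("\n").split("\t")
    return {
        "index": int(cols[0]),  # column 1: token index
        "token": cols[1],       # column 2: token
        "lemma": cols[2],       # column 3: lemma
        "pos": cols[4],         # column 5: PoS-tag
        "head": int(cols[6]),   # column 7: head index
        "deprel": cols[7],      # column 8: dependency relation
    }

sample = "1\tNobody\tnobody\tNN\tNN\t_\t2\tnsubj"
token = parse_conllx_line(sample)
```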

Run the tool with the command

python negtool.py -m parsed -f <inputfile>

Output

The output is a file whose first 8 columns are identical to those of the input file; the following columns encode the cues and scopes in the CD format. The event column is included but filled with the placeholder "_". The output file has the same name as the input file, with the extension .neg replacing the original file extension.
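A hedged sketch of consuming this output from Python. It assumes a single negation instance encoded as three trailing columns (cue, scope, event) after the eight input columns; the sample line and the column grouping are illustrative, so check them against your actual .neg files:

```python
import os

def output_name(input_path):
    # The output file keeps the input name, swapping the extension for .neg.
    base, _ = os.path.splitext(input_path)
    return base + ".neg"

def read_negation_columns(line):
    cols = line.rstrip("\n").split("\t")
    base, annot = cols[:8], cols[8:]
    # Assumed grouping: one negation instance = (cue, scope, event),
    # with "_" marking an empty slot (always "_" for events here).
    cue, scope, event = (annot + ["_"] * 3)[:3]
    return {"token": base[1], "cue": cue, "scope": scope, "event": event}

line = "2\tnever\tnever\tRB\tRB\t_\t3\tadvmod\tnever\tnever\t_"
info = read_negation_columns(line)
name = output_name("story.conll")
```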

Training your own models

You can train your own cue and/or scope model on a new dataset. The dataset must be in CoNLL-X format, with cues and scopes encoded in the CD format starting at column 9. Training is done by running

python train.py -m <model to train> -tf <training file> -cp <cue regularisation> -sp <scope regularisation>

For the -m option, the program accepts cue, scope or all. Both the cue and the scope regularisation parameters default to 0.20.
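The interface described above can be mirrored with argparse. Option names, accepted values and defaults follow the README; everything else is a hypothetical re-creation, and train.py's real parser may differ in its details:

```python
import argparse

# Hypothetical re-creation of the documented command-line interface.
parser = argparse.ArgumentParser(description="Train negtool models")
parser.add_argument("-m", choices=["cue", "scope", "all"], required=True,
                    help="which model(s) to train")
parser.add_argument("-tf", required=True,
                    help="training file (CoNLL-X with CD columns from column 9)")
parser.add_argument("-cp", type=float, default=0.20,
                    help="cue regularisation parameter")
parser.add_argument("-sp", type=float, default=0.20,
                    help="scope regularisation parameter")

# Parsing an example invocation shows the defaults kicking in.
args = parser.parse_args(["-m", "all", "-tf", "train.conll"])
```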

Evaluating the models

Our provided models and your own trained models can be evaluated with the *SEM 2012 evaluation script. To produce a file with predictions that can be evaluated with that script, run the program evaluation.py:

python evaluation.py -cm <cue model> -sm <scope model> -cl <cue lexicon> -acl <affixal cue lexicon> -cv <cue vectorizer> -sv <scope vectorizer> -tf <testfile> -cdf <parsed CD testfile> -ff <file format> -e2e <end to end/gold cues>

To evaluate Conan Doyle (CD) files, you also need to provide a parsed version of the file with dependency relations. This is not necessary for CoNLL-X files.

The file format argument accepts either "CD" for Conan Doyle files or "parsed" for CoNLL-X files.

Set the -e2e argument to true to evaluate end-to-end predictions, or to false to evaluate scopes with gold cues.

The output will be a file in the CD format (for every column, not just the cue columns) named "_evaluation_scopes.txt".

Citing

Please cite the following paper if you use the tool:

@InProceedings{Enger17,
  author    = {Enger, Martine  and  Velldal, Erik  and  {\O}vrelid, Lilja},
  title     = {An open-source tool for negation detection: a maximum-margin approach},
  booktitle = {Proceedings of the Workshop Computational Semantics Beyond Events and Roles},
  month     = {April},
  year      = {2017},
  address   = {Valencia, Spain},
  publisher = {Association for Computational Linguistics},
  pages     = {64--69}
}