Code, data and models for the paper "Integrating Deep Linguistic Features in Factuality Prediction over Unified Datasets" (Stanovsky, Eckle-Kohler, Puzikov, Dagan and Gurevych ACL 2017)

gabrielStanovsky/unified-factuality


Unified Factuality Representation - Corpus and Code

Introduction

Previous models for assessing the commitment towards a predicate in a sentence (also known as factuality prediction) were each trained and tested on a single annotated dataset, consequently limiting the generality of their results. In this work we propose an intuitive method for mapping three previously annotated corpora onto a single factuality scale, thereby enabling models to be tested across these corpora. In addition, we design a novel model for factuality prediction by first extending a previous rule-based factuality prediction system and applying it over an abstraction of dependency trees, and then using the output of this system in a supervised classifier.

This repository contains both the converted corpus and our factuality prediction model.

If you use this resource, please cite the following paper:

@InProceedings{stanovsky2017fact,
author    = {Stanovsky, Gabriel and Eckle-Kohler, Judith and Puzikov, Yevgeniy and Dagan, Ido and Gurevych, Iryna},
title     = {Integrating Deep Linguistic Features in Factuality Prediction over Unified Datasets},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017)},
month     = {August},
year      = {2017},
address   = {Vancouver, Canada}
}

Online Demo

Try a live demonstration on our Online Demo Page.

Local Installation

Prerequisites

  1. Python 2.7
  2. Java (OpenJDK 8)
     Make sure that the JAVA_HOME environment variable points to the JDK installation,
     e.g., JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
  3. python-setuptools
  4. easy_install
  5. pip 9.x
  6. libxml
  7. libxslt
  8. NLTK with the WordNet corpus:
     pip install nltk
     python -c "import nltk;nltk.download('wordnet')"
  9. spaCy with English models:
     pip install spacy
     python -m spacy download en
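A minimal sketch for checking the JAVA_HOME requirement before going further. It assumes that a usable JDK has an executable at $JAVA_HOME/bin/java; the function name check_java_home is hypothetical and not part of this repository.

```python
import os


def check_java_home(env=None):
    """Return a list of problems with JAVA_HOME (empty if it looks usable).

    Assumption (hypothetical helper): a usable JDK has an executable
    file at $JAVA_HOME/bin/java.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]
    if not os.path.isdir(java_home):
        return ["JAVA_HOME does not point to a directory: %s" % java_home]
    java_bin = os.path.join(java_home, "bin", "java")
    # The java launcher must exist and be executable by the current user.
    if not (os.path.isfile(java_bin) and os.access(java_bin, os.X_OK)):
        return ["no executable java found at %s" % java_bin]
    return []


if __name__ == "__main__":
    problems = check_java_home()
    for problem in problems:
        print("Problem: " + problem)
    if not problems:
        print("JAVA_HOME looks fine")
```

Passing an explicit dictionary instead of os.environ keeps the check easy to test without touching the real environment.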

Unified Dataset

For obtaining a snapshot of the unified dataset, please contact us.

Download

From the src directory:

  1. Download the external corpora:
     ./scripts/download_external_corpora.sh

     NOTE: FactBank must be downloaded separately. Log in to the LDC, download the corpus, and place it in the directory factbank_v1 under /data/external_annotations/.

  2. Install the converter:
     ./scripts/install_converter.sh
  3. Convert the corpora to a unified representation:
     ./scripts/convert_corpora.sh

The converted unified corpus should be created in the unified corpus directory.
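The three download and conversion steps can be chained from Python with a small driver like the following. This is a hypothetical helper (run_pipeline is not part of the repository); it assumes the scripts are executable and are run from the src directory, and it stops at the first failing step. It does not fetch FactBank, which still has to be placed manually as described in the note.

```python
import subprocess

# The three steps listed above, in order (paths relative to src).
DEFAULT_STEPS = [
    ["./scripts/download_external_corpora.sh"],
    ["./scripts/install_converter.sh"],
    ["./scripts/convert_corpora.sh"],
]


def run_pipeline(src_dir, steps=DEFAULT_STEPS):
    """Run each step from src_dir in order (hypothetical helper).

    Returns the list of steps that completed. A step that exits with a
    non-zero status raises subprocess.CalledProcessError, so later steps
    never run on top of a failed one.
    """
    completed = []
    for step in steps:
        print("Running: " + " ".join(step))
        subprocess.check_call(step, cwd=src_dir)
        completed.append(step)
    return completed
```

For example, run_pipeline("path/to/unified-factuality/src"), where the path is a placeholder for your own checkout.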

Format

Each line corresponds to a single word in a sentence and contains the following tab-separated values:

  1. Word index
  2. Surface form
  3. Factuality value (in [-3, +3])

Sentences are separated by an empty line.
Additional tab-separated columns may follow, depending on the input format. For example, the unified corpus also contains dependency parses, and the automatic annotator appends TruthTeller features (see the Interactive usage example).
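A minimal reader for this format, as a sketch. Two assumptions go beyond the description above: the factuality column may hold "_" for tokens that carry no annotation (as in the annotator output under Usage Examples), and any additional columns are kept as raw strings. read_factuality_conll is a hypothetical name, not part of the repository.

```python
def read_factuality_conll(lines):
    """Group tab-separated token lines into sentences (hypothetical helper).

    Each token becomes a dict with its word index, surface form,
    factuality value (a float in [-3, +3], or None for "_"), and the
    remaining columns as raw strings.
    """
    sentences = []
    current = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # An empty line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        raw = cols[2]
        value = None if raw == "_" else float(raw)
        if value is not None and not -3.0 <= value <= 3.0:
            raise ValueError("factuality value out of range: %s" % raw)
        current.append({
            "index": int(cols[0]),
            "word": cols[1],
            "factuality": value,
            "extra": cols[3:],
        })
    if current:
        # The last sentence may not be followed by an empty line.
        sentences.append(current)
    return sentences
```

Since it accepts any iterable of lines, an open file object can be passed directly, e.g. read_factuality_conll(open(path)).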

Automatic Annotator

Installation

From the src directory, run:

./scripts/install_annotator.sh

Running the Automatic Annotator

  1. Start the servers:

    1. Start the spaCy server:
      Run ./scripts/run_spacy_server.sh
      This opens a server listening on port 8081 by default.
      Wait for the "ENGINE Bus STARTED" message to appear, indicating that the server is up.

    2. In a new terminal, start the PropS server:
      Run ./scripts/run_props_server.sh
      This opens a server listening on port 10345.
      Wait for the "Listening on http://:8081/" message to appear, indicating that the server is up.

  2. Run the client application:
    ./scripts/annotate_factuality.sh
    This reads sentences from STDIN and writes them, with CoNLL-style factuality annotations, to STDOUT.

NOTE: These scripts can also be run with different hosts and ports; see the scripts themselves for instructions.
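Instead of watching the terminals for the start-up messages, a script can poll the servers' ports before running the client. A minimal sketch, assuming the servers run on localhost with the ports given above; wait_for_port is a hypothetical helper, not part of the repository.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds (hypothetical helper).

    Returns True as soon as a connection is accepted, or False once
    `timeout` seconds have passed without success.
    """
    deadline = time.time() + timeout
    while True:
        try:
            conn = socket.create_connection((host, port), timeout=interval)
            conn.close()
            return True
        except (socket.error, socket.timeout):
            # Nothing is listening yet (or the attempt timed out).
            if time.time() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 8081) and wait_for_port("localhost", 10345) before starting ./scripts/annotate_factuality.sh. An open port only shows that something is listening; the start-up messages above may remain the more reliable signal that a server is fully ready.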

Usage Examples

Interactive
echo "John refused to go" | ./scripts/annotate_factuality.sh 
0       John    _       _       P       _       _
1       refused 3.0     -/?NoF  P       P       P
2       to      _       _       _       _       _
3       go      -3.0    +/-NoF  P       N       N
From files
cat ../examples/example_sentences.txt | ./scripts/annotate_factuality.sh > ../examples/example_sentences.fact.conll

The annotated output is written to ../examples/example_sentences.fact.conll.
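To call the annotator from Python rather than from a shell pipeline, one option is to drive ./scripts/annotate_factuality.sh through subprocess and pick out the annotated tokens. A minimal sketch, assuming it runs from the src directory with both servers up, that the output columns are tab-separated, and that tokens without a factuality value carry "_" in the third column, as in the interactive example. annotate and predicate_values are hypothetical helpers.

```python
import subprocess


def annotate(sentences, cmd=("./scripts/annotate_factuality.sh",), cwd=None):
    """Pipe sentences (one per line) through the annotator; return its STDOUT."""
    proc = subprocess.Popen(list(cmd), cwd=cwd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate("\n".join(sentences) + "\n")
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, list(cmd))
    return out


def predicate_values(conll_text):
    """Return (word, factuality) pairs for tokens whose third column is not "_"."""
    pairs = []
    for line in conll_text.splitlines():
        cols = line.split("\t")
        # Empty separator lines yield a single column and are skipped.
        if len(cols) >= 3 and cols[2] != "_":
            pairs.append((cols[1], float(cols[2])))
    return pairs
```

Given output like the interactive example for "John refused to go", predicate_values returns [("refused", 3.0), ("go", -3.0)].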

Contact

gabriel (dot) satanovsky (at) gmail (dot) com
