Coreference Resolution with Entity Equalization

Introduction

This repository contains the code for replicating results from the paper Coreference Resolution with Entity Equalization (Kantor and Globerson, ACL 2019).

Getting Started

  • Install Python (either 2 or 3) requirements: pip install -r requirements.txt
  • Download GloVe embeddings and build custom kernels by running setup_all.sh.
    • There are 3 platform-dependent ways to build custom TensorFlow kernels. Please comment/uncomment the appropriate lines in the script.
  • To train your own models, run setup_training.sh and extract_bert_features.sh (an example session follows this list).
    • This assumes access to OntoNotes 5.0. Please edit the ontonotes_path variable.
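
A minimal setup session might look like the following sketch; the script names come from this repository, but the exact invocation and the ontonotes_path handling are assumptions you should adapt to your environment:

pip install -r requirements.txt
bash setup_all.sh                 # GloVe embeddings and custom TensorFlow kernels
bash setup_training.sh            # assumes ontonotes_path points at your OntoNotes 5.0 copy
bash extract_bert_features.sh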

Training Instructions

  • Experiment configurations are found in experiments.conf
  • Choose an experiment that you would like to run, e.g. best
  • Training: python train.py <experiment>
  • Results are stored in the logs directory and can be viewed via TensorBoard.
  • Evaluation: python evaluate.py <experiment>
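
For example, to train and evaluate the best experiment and watch its progress (the TensorBoard invocation is an assumption; logs is the directory mentioned above):

python train.py best
tensorboard --logdir logs      # view training curves while training runs
python evaluate.py best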

Demo Instructions

  • Command-line demo: python demo.py final
  • To run the demo with other experiments, replace final with your configuration name.
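
For example, using the best configuration from experiments.conf instead:

python demo.py best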

Batched Prediction Instructions

  • Create a file where each line is in the following JSON format (make sure to strip the newlines so each line is well-formed JSON; a small Python sketch for producing such a file follows this list):
{
  "clusters": [],
  "doc_key": "nw",
  "sentences": [["This", "is", "the", "first", "sentence", "."], ["This", "is", "the", "second", "."]],
  "speakers": [["spk1", "spk1", "spk1", "spk1", "spk1", "spk1"], ["spk2", "spk2", "spk2", "spk2", "spk2"]]
}
  • clusters should be left empty; it is only used for evaluation purposes.
  • doc_key indicates the genre, which can be one of the following: "bc", "bn", "mz", "nw", "pt", "tc", "wb"
  • speakers indicates the speaker of each word. These can be all empty strings if there is only one known speaker.
  • Run python predict.py <experiment> <input_file> <output_file>, which outputs the input jsonlines with predicted clusters.
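
As a sketch, one way to produce a well-formed input file from Python (write_jsonlines is an illustrative helper, not part of this repository; the field values follow the example above):

import json

def write_jsonlines(docs, path):
    # Write one JSON object per line with no embedded newlines, as described above.
    with open(path, "w") as f:
        for doc in docs:
            f.write(json.dumps(doc) + "\n")

example_doc = {
    "clusters": [],   # left empty; only used for evaluation
    "doc_key": "nw",  # genre: one of bc, bn, mz, nw, pt, tc, wb
    "sentences": [["This", "is", "the", "first", "sentence", "."],
                  ["This", "is", "the", "second", "."]],
    "speakers": [["spk1"] * 6, ["spk2"] * 5],
}

write_jsonlines([example_doc], "input.jsonlines")
# Then run: python predict.py <experiment> input.jsonlines output.jsonlines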

Other Quirks

  • The code does not use GPUs by default. Instead, it looks for the GPU environment variable, which it treats as shorthand for CUDA_VISIBLE_DEVICES.
  • The training runs indefinitely and needs to be terminated manually. The model generally converges at about 400k steps.
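
For example, to train the best experiment on GPU 0 (assuming the GPU variable takes the same device indices as CUDA_VISIBLE_DEVICES, as described above):

GPU=0 python train.py best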