Baselines and corpus accompanying the paper Neural Network Acceptability Judgments

CoLA Baselines

Baselines accompanying the paper Neural Network Acceptability Judgments. See the CoLA website to download the Corpus of Linguistic Acceptability (CoLA), to find more information, and to try a demo model.

Dataset

Training and validation sets for CoLA are available under acceptability_corpus/raw, with a tokenized version under acceptability_corpus/tokenized. Unlabeled test data is available here: in domain, out of domain. All models require tokenized data (we use the default NLTK tokenizer).
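As a rough illustration of the tokenization step, the sketch below turns a raw sentence into a string of space-separated tokens. It assumes that "the default NLTK tokenizer" means `nltk.tokenize.word_tokenize`, which needs NLTK's Punkt data installed. The helper name `tokenize_sentence` and its `tokenizer` parameter are hypothetical and not part of this repository.

```python
def tokenize_sentence(sentence, tokenizer=None):
    """Return `sentence` as one string of space-separated tokens."""
    if tokenizer is None:
        # Assumption: the "default NLTK tokenizer" is word_tokenize.
        # Imported lazily so NLTK is only required when no tokenizer is passed.
        from nltk.tokenize import word_tokenize
        tokenizer = word_tokenize
    return " ".join(tokenizer(sentence))
```

Calling `tokenize_sentence("The book was written by John.")` with NLTK installed should separate the final period from the last word. Passing a different callable (for example `str.split`) swaps in another tokenizer without touching the rest of the code.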

Requirements

  • Python >= 3
  • PyTorch v0.3.0
  • TorchNet and TorchText
  • NLTK (Optional: For NLTK preprocessing)

Running

Install PyTorch v0.3.0.

Then run:

git clone https://github.com/nyu-mll/acceptability-judgments.git
cd acceptability-judgments
pip3 install -r requirements.txt
python setup.py develop

Run a simple training session with:

python acceptability/run.py -d acceptability_corpus/tokenized

This uses the default classifier model and all of the default settings.

Model

Our general model structure is shown in the figure below. See the paper for more in-depth details.

[Figure: Model]

Complex Run

The directory containing the data (the value of -d) must contain three files: train.tsv, dev.tsv and test.tsv. Download the vocabulary file used in our experiments from this link.

Example of a command for running ELMo + Real/Fake on top of a transferred encoder:

python acceptability/run.py -m linear_classifier -d data --save_loc save --vocab_file ./vocab_100k.tsv --logs_dir ./logs -g -r -p 40 -se 2 -n 1000 --encoder_path ./elmo_best_real_fake/experiment_lstm_pooling_elmo_h_528_l_3_lr_0.0001_e_360_do_0.2.pth --encoding_size 528 --embedding_size 217 --embedding_path ./elmo_best_real_fake/experiment_lstm_pooling_elmo_h_528_l_3_lr_0.0001_e_360_do_0.2.emb -lr 0.00005 -nl 3 -hs 1134 -do 0.2
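The data-directory requirement described above can be checked before launching a long run. The following is a minimal sketch using only the standard library. The function name `missing_data_files` is hypothetical and is not provided by this repository.

```python
from pathlib import Path

# File names the README says must be present in the directory passed to -d.
REQUIRED_FILES = ("train.tsv", "dev.tsv", "test.tsv")


def missing_data_files(data_dir):
    """Return the required file names that are absent from `data_dir`."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the directory has all three files. A non-empty list names the files that still need to be created.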

Pretrained Models and Testing

Pretrained models are available at this link.

To do a test run over a dataset:

  • Create your data folder in the same format as acceptability_corpus, with train.tsv, dev.tsv and test.tsv.
  • Download one of the pretrained models.
  • Download the vocabulary file from this link.
  • Run the following command (add the -g flag to use GPU hardware). For example, for the pretrained ELMo model it would look something like:
    python acceptability/test.py -mf elmo.pth -vf vocab_100k.tsv -ef elmo.emb -d data/test.tsv -g

To save the model predictions for each sentence in test.tsv, append the flag -o predictions.txt to the last command; the predictions are then written to the file predictions.txt.
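When the test command is launched from a script, it may help to assemble the argument list programmatically. The sketch below uses only the flags shown above (-mf, -vf, -ef, -d, -g, -o). The helper `build_test_command` is hypothetical, and it assumes the script is run from the repository root.

```python
import sys


def build_test_command(model_file, vocab_file, embedding_file, data_file,
                       use_gpu=False, output_file=None):
    """Build the argument list for acceptability/test.py."""
    cmd = [
        sys.executable, "acceptability/test.py",
        "-mf", model_file,       # pretrained model weights
        "-vf", vocab_file,       # vocabulary file
        "-ef", embedding_file,   # embedding file
        "-d", data_file,         # data file to evaluate on
    ]
    if use_gpu:
        cmd.append("-g")         # run on GPU hardware
    if output_file is not None:
        cmd += ["-o", output_file]  # write per-sentence predictions
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting problems with file paths.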

Cite

If you use CoLA or the baselines in your research, please cite the accompanying paper using the following entry:

@article{warstadt2018neural,
  title={Neural Network Acceptability Judgments},
  author={Warstadt, Alex and Singh, Amanpreet and Bowman, Samuel R},
  journal={arXiv preprint arXiv:1805.12471},
  year={2018}
}

License

Baseline code is available under the MIT license.

The text in this corpus is excerpted from the published works listed on the website, and copyright (where applicable) remains with the original authors or publishers. We expect that research use within the US is legal under fair use, but make no guarantee of this.
