CoLA Baselines

Baselines accompanying the paper Neural Network Acceptability Judgments. Check the CoLA website to download The Corpus of Linguistic Acceptability (CoLA), for more information, and for a demo model.


Data

Training and validation sets for CoLA are available under acceptability_corpus/raw, with a tokenized version available under tokenized. Unlabeled test data is available in two sets: in domain and out of domain. All models require tokenized data (we use the default NLTK tokenizer).
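The raw corpus files are tab-separated. A minimal sketch for loading them with the standard library, assuming the four-column layout of the raw CoLA files (source code, 0/1 acceptability label, the author's original annotation, and the sentence); the sample rows below are illustrative, not from the corpus:

```python
import csv
import io

# Parse CoLA-style TSV rows: source, label (0/1), original annotation, sentence.
# The column layout here is an assumption based on the raw corpus files.
def load_cola_tsv(fileobj):
    rows = []
    for source, label, orig, sentence in csv.reader(fileobj, delimiter="\t"):
        rows.append({"source": source, "label": int(label), "sentence": sentence})
    return rows

# Illustrative sample in the same shape as the raw files.
sample = (
    "gj04\t1\t\tThe cat sat on the mat.\n"
    "gj04\t0\t*\tCat the mat on sat.\n"
)
data = load_cola_tsv(io.StringIO(sample))
```

Each returned row keeps only the fields a classifier typically needs; the original annotation column is dropped in this sketch.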


Requirements

  • Python >= 3
  • PyTorch v0.3.0
  • TorchNet and TorchText
  • NLTK (optional, for preprocessing)


Installation

Install PyTorch v0.3.0.

Then run:

git clone
cd acceptability-judgments
pip3 install -r requirements.txt
python setup.py develop

Training

Run a simple training session with:

python acceptability/ -d acceptability_corpus/tokenized

This will use the default classifier model and all of the default settings.


Model Structure

Our general model structure looks like the figure below; see the paper for more in-depth details.


Complex Run

An example command for running ELMo + real/fake on top of a transferred encoder follows. The directory containing the data (the value of -d) must contain three files: train.tsv, dev.tsv, and test.tsv. Download the vocabulary file used in our experiments from this link.

python acceptability/ -m linear_classifier -d data --save_loc save --vocab_file ./vocab_100k.tsv --logs_dir ./logs -g -r -p 40 -se 2 -n 1000 --encoder_path ./elmo_best_real_fake/experiment_lstm_pooling_elmo_h_528_l_3_lr_0.0001_e_360_do_0.2.pth --encoding_size 528 --embedding_size 217 --embedding_path ./elmo_best_real_fake/experiment_lstm_pooling_elmo_h_528_l_3_lr_0.0001_e_360_do_0.2.emb -lr 0.00005 -nl 3 -hs 1134 -do 0.2
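Before launching a long run, it can help to confirm the -d directory actually holds the three required splits. A small illustrative helper (the function name and the temporary directory are ours, not part of the codebase):

```python
import os
import tempfile

# Report which of the three splits expected by the -d flag are absent.
def missing_splits(path):
    required = ["train.tsv", "dev.tsv", "test.tsv"]
    return [f for f in required if not os.path.isfile(os.path.join(path, f))]

# Illustration: a directory with only train.tsv and dev.tsv present.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "train.tsv"), "w").close()
    open(os.path.join(d, "dev.tsv"), "w").close()
    still_missing = missing_splits(d)  # only test.tsv is absent
```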

Pretrained Models and Testing

Pretrained models are available at this link.

To do a test run over a dataset:

  • Create your data folder in the same format as acceptability_corpus, with train.tsv, dev.tsv, and test.tsv.
  • Download one of the pretrained encoders.
  • Download the vocabulary file from this link.
  • Optional: train a classifier on CoLA using the command above.
  • Run the following command (with the -g flag to use GPU hardware); e.g., for the downloadable ELMo real/fake encoder (without CoLA training) it would be something like:
    python acceptability/ -mf elmo.pth -vf vocab_100k.tsv -ef elmo.emb -d data/test.tsv -g

To save the model's predictions for each sentence in test.tsv, append the flag -o predictions.txt to the last command; the predictions will be written to the predictions.txt file.
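The paper reports CoLA performance as Matthews correlation (MCC), so saved predictions are typically scored against gold labels with it. A self-contained sketch of binary MCC (the gold/pred lists are illustrative; this helper is ours, not part of the codebase):

```python
import math

# Matthews correlation coefficient for binary 0/1 labels,
# the evaluation metric used for CoLA in the paper.
def mcc(gold, pred):
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # undefined case, conventionally reported as 0
    return (tp * tn - fp * fn) / denom

# Illustrative labels and predictions.
gold = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 0, 1, 1]
score = mcc(gold, pred)
```

Unlike accuracy, MCC stays near 0 for degenerate classifiers on CoLA's unbalanced label distribution, which is why the paper uses it.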


Citation

If you use CoLA or the baselines in your research, please cite the accompanying paper using the following entry:

@article{warstadt2019neural,
  title={Neural network acceptability judgments},
  author={Warstadt, Alex and Singh, Amanpreet and Bowman, Samuel R},
  journal={Transactions of the Association for Computational Linguistics},
  volume={7},
  pages={625--641},
  year={2019},
  publisher={MIT Press}
}


License

Baseline code is available under the MIT license.

The text in this corpus is excerpted from the published works available on the website, and copyright (where applicable) remains with the original authors or publishers. We expect that research use within the US is legal under fair use, but make no guarantee of this.

