Unsupervised Question Answering via Cloze Translation


Code, data and models supporting the experiments in the ACL 2019 paper: Unsupervised Question Answering by Cloze Translation.

Obtaining training data for Question Answering (QA) is time-consuming and resource-intensive, and existing QA datasets are only available for limited domains and languages. In this work, we take some of the first steps towards unsupervised QA, and develop an approach that, without using the SQuAD training data at all, achieves 56.4 F1 on SQuAD v1.1, and 64.5 F1 when the answer is a named entity mention.


This repository provides code to run pre-trained models that generate synthetic question answering data. We also make a very large synthetic training dataset for extractive question answering available.

NOTE: The data is available for download now; the code and pre-trained models are coming soon.

Dataset Downloads

We make available a dataset of roughly 4 million SQuAD-like question answering datapoints, automatically generated by the unsupervised system described in the paper.

The data can be downloaded here. The data is in the SQuAD v1 format, and contains:

Fold                          # Paragraphs    # QA pairs
unsupervised_qa_train.json    782,556         3,915,498
unsupervised_qa_dev.json      1,000           4,795
unsupervised_qa_test.json     1,000           4,804
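
Since the files follow the SQuAD v1 layout (a top-level "data" list of articles, each with "paragraphs" that hold a "context" and a list of "qas"), the counts in the table can be checked with a few lines of Python. This is a minimal sketch, not code from this repository: the function names `load_squad_v1`, `count_squad_v1` and `iter_qa_pairs` are hypothetical, and the tiny in-memory example stands in for a downloaded file.

```python
import json


def load_squad_v1(path):
    """Load a SQuAD v1-style JSON file (e.g. unsupervised_qa_dev.json) into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_squad_v1(dataset):
    """Return (number of paragraphs, number of QA pairs) for a SQuAD v1-style dict."""
    n_paragraphs = 0
    n_qa_pairs = 0
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            n_paragraphs += 1
            n_qa_pairs += len(paragraph["qas"])
    return n_paragraphs, n_qa_pairs


def iter_qa_pairs(dataset):
    """Yield (context, question, answer_text, answer_start) for every answer."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield context, qa["question"], answer["text"], answer["answer_start"]


# Hand-written stand-in for a downloaded file (illustration only, not real data).
example = {
    "version": "1.1",
    "data": [
        {
            "title": "example",
            "paragraphs": [
                {
                    "context": "Paris is the capital of France.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is the capital of France?",
                            "answers": [{"text": "Paris", "answer_start": 0}],
                        },
                        {
                            "id": "q2",
                            "question": "Paris is the capital of which country?",
                            "answers": [{"text": "France", "answer_start": 24}],
                        },
                    ],
                }
            ],
        }
    ],
}

paragraph_count, qa_count = count_squad_v1(example)
```

For a downloaded file, `count_squad_v1(load_squad_v1("unsupervised_qa_dev.json"))` should give the paragraph and QA-pair counts for that fold, assuming the file sits in the working directory.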

By fine-tuning BERT-Large for reading comprehension on this training data, you should be able to achieve over 50.0 F1 on the SQuAD v1.1 development set.
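
The F1 figures quoted here are token-overlap scores between a predicted answer span and a reference answer. The sketch below follows the normalisation commonly used for SQuAD v1.1 evaluation (lowercasing, stripping punctuation and the articles "a", "an", "the"); it is an illustration written for this README, not the official evaluation script, and the function names are assumptions.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction, ground_truth):
    """Token-level F1 between one predicted answer and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score is then the average of the per-question F1, taking the maximum over the reference answers when a question has more than one.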

Models and Code

Pre-trained models and the code to run them are coming soon.


Please cite [1] and [2] if you find the resources in this repository useful.

Unsupervised Question Answering by Cloze Translation

[1] P. Lewis, L. Denoyer, S. Riedel. Unsupervised Question Answering by Cloze Translation.

  title={Unsupervised Question Answering by Cloze Translation},
  author={Lewis, Patrick and Denoyer, Ludovic and Riedel, Sebastian},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},

Phrase-Based & Neural Unsupervised Machine Translation

[2] G. Lample, M. Ott, A. Conneau, L. Denoyer, M.A. Ranzato. Phrase-Based & Neural Unsupervised Machine Translation.

  title={Phrase-Based \& Neural Unsupervised Machine Translation},
  author={Lample, Guillaume and Ott, Myle and Conneau, Alexis and Denoyer, Ludovic and Ranzato, Marc'Aurelio},
  booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},


See the LICENSE file for more details.
