AutoQ: Improving reading comprehension through automatic question generation

Repository for Question Generation project, W210 Capstone for UCB MIDS. Julia Buffinton, Saurav Datta, Joanna Huang, Kathryn Plath

About Our Project

AutoQ is the first free web app with automatically generated reading comprehension questions targeted to English-language learners. Our product utilizes state-of-the-art machine learning and natural language processing techniques to create a vast and topical collection of multiple-choice questions that mimic English-language exam formats. We currently offer over 100,000 practice questions on over 10,000 Wikipedia articles.

Our site is designed to help improve reading comprehension for English-language learners or for anyone looking to study up on a new topic. Starting with articles from Wikipedia, we have developed algorithms to produce reading comprehension questions that mimic the style of questions seen on the TOEFL exam. Reading comprehension improves with practice on new material.

Do it Yourself

We employed several techniques to achieve this result; the instructions below walk through each step of our pipeline. Input and output folders can be updated from within the scripts.

1. Preprocess Wikipedia articles

Raw text of Wikipedia articles was obtained from Wikimedia dumps. For the rest of the pipeline, it should be formatted the same way as the SQuAD datasets.

From this directory, run the preprocessing script to break up the Wikipedia articles into paragraphs and save them into the wikipedia_squad folder.

sh preprocess.sh
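
For reference, each article ends up in the nested SQuAD v1.1 layout (paragraph "context" fields with an initially empty "qas" list). A minimal sketch of this kind of splitting, with illustrative file paths that are not necessarily the script's exact inputs or outputs:

# sketch: split raw article text into SQuAD-style paragraph records
# (field names follow SQuAD v1.1; file paths here are assumptions)
import json

def article_to_squad(title, raw_text):
    paragraphs = [p.strip() for p in raw_text.split("\n\n") if p.strip()]
    return {"title": title,
            "paragraphs": [{"context": p, "qas": []} for p in paragraphs]}

with open("article.txt") as f:
    articles = [article_to_squad("Example_Article", f.read())]
with open("wikipedia_squad/articles.json", "w") as f:
    json.dump({"version": "1.1", "data": articles}, f)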

2. Select relevant sentences to query

Using the paragraphs from the Wikipedia articles, we identify the most “important” sentences to ask questions about. The first and last sentences of each paragraph often introduce key information or summarize the paragraph, so we examine those. We assume that the words appearing most frequently are the most important (and likely good indicators of the main topic), and therefore the most “important” sentences are those containing these frequent words.

From this directory, run the sentence selection script to select important sentences and save them, labeled with their location (in labeled_sentences) and unlabeled for question generation (in unlabeled_sentences).

sh sentence_selection.sh
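
A rough sketch of this frequency-based heuristic (simplified; the script's actual tokenization, stopword handling, and tie-breaking may differ):

# sketch: score first/last sentences of each paragraph by word frequency
# (a simplified illustration of the selection heuristic described above)
import re
from collections import Counter

def select_important(paragraphs):
    freq = Counter(re.findall(r"\w+", " ".join(paragraphs).lower()))
    candidates = []
    for i, para in enumerate(paragraphs):
        sents = re.split(r"(?<=[.!?])\s+", para)
        for sent in {sents[0], sents[-1]}:  # first and last sentence only
            score = sum(freq[w] for w in re.findall(r"\w+", sent.lower()))
            candidates.append((score, i, sent))
    return sorted(candidates, reverse=True)  # highest-scoring sentences first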

3. Question Generation

We then feed these important sentences into our Question Generation model, an attention-based bidirectional LSTM inspired by Du et al.'s 2017 paper, Learning to Ask: Neural Question Generation for Reading Comprehension. We implement it in Torch on top of the OpenNMT (open-source neural machine translation) framework. Our approach is adapted from GenerationQ.

From this directory, run the question generation script to generate questions for the important sentences, save them in the questions folder, and add them back to the SQuAD-formatted Wikipedia articles (done with a call to add_questions.py, overwriting the files in wikipedia_squad).

sh question_generation.sh
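
For intuition, here is a minimal sketch of an attention-based encoder-decoder of the kind described above, written in PyTorch rather than the Lua Torch/OpenNMT stack we actually use; all class names and dimensions are illustrative:

# sketch: BiLSTM encoder + attentional LSTM decoder for question generation
# (conceptual restatement only; the real model is the OpenNMT implementation)
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hid_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)

    def forward(self, src):                        # src: (batch, src_len) token ids
        enc_out, _ = self.lstm(self.embed(src))
        return enc_out                             # (batch, src_len, 2*hid_dim)

class AttnDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hid_dim=600):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(2 * hid_dim, vocab_size)

    def forward(self, tgt, enc_out, state=None):   # tgt: (batch, tgt_len) token ids
        dec_out, state = self.lstm(self.embed(tgt), state)
        # global (Luong-style) attention over the encoder states
        attn = torch.softmax(torch.bmm(dec_out, enc_out.transpose(1, 2)), dim=-1)
        context = torch.bmm(attn, enc_out)         # weighted sum of encoder states
        logits = self.out(torch.cat([dec_out, context], dim=-1))
        return logits, state                       # logits: (batch, tgt_len, vocab)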

4. Question Answering

To make these questions useful as a tool for reading comprehension, we must also generate their answers so users can check their responses. We generate these with a bidirectional LSTM as well, chosen for its relative simplicity and competitive performance on our validation set. Our implementation is modeled after part of Facebook's 2017 paper, Reading Wikipedia to Answer Open-Domain Questions, and is built with PyTorch.

From this directory, run the question answering script to generate answers to our questions, save them in the answers folder, and add them back to the SQuAD-formatted Wikipedia articles (done with a call to add_answers.py, saving the files in wikipedia_squad_w_answers).

sh question_answering.sh
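
Conceptually, the reader encodes the paragraph and the question with BiLSTMs, then scores each paragraph token as a potential answer-span start or end. A minimal sketch of that idea, heavily simplified relative to the paper's document reader (names, pooling, and sizes are illustrative):

# sketch: span-prediction reader for question answering
# (simplified; the paper's reader adds extra features and attentive pooling)
import torch
import torch.nn as nn

class SpanReader(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.p_lstm = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.q_lstm = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.w_start = nn.Linear(2 * hid_dim, 2 * hid_dim, bias=False)
        self.w_end = nn.Linear(2 * hid_dim, 2 * hid_dim, bias=False)

    def forward(self, para, question):             # token-id tensors, batch first
        p, _ = self.p_lstm(self.embed(para))       # (batch, p_len, 2*hid_dim)
        q, _ = self.q_lstm(self.embed(question))   # (batch, q_len, 2*hid_dim)
        q_vec = q.mean(dim=1)                      # mean pooling of question states
        # bilinear match between each paragraph token and the question vector
        start = torch.bmm(self.w_start(p), q_vec.unsqueeze(2)).squeeze(2)
        end = torch.bmm(self.w_end(p), q_vec.unsqueeze(2)).squeeze(2)
        return start, end                          # argmax each for the answer span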
