This service generates multiple-choice questions (MCQs) from generic text. To do this, I implemented three modules:
- Keyword Extractor: to extract keywords from a passage
- Question Generator: to generate a question from a keyword and passage
- Distractor Generator: to generate distractors (wrong alternatives) for a keyword
The GitHub repository has three branches, one for each keyword extractor.
Three alternative methods for extracting key phrases have been implemented:
- KeyBERT [1] (based on the BERT transformer)
- multipartiteGraph [2]
- RAKE [3] (Rapid Automatic Keyword Extraction)
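To make the idea behind one of these extractors concrete, here is a minimal pure-Python sketch of RAKE-style scoring. It is not the code used in this repository: the function names, the tiny stopword list, and the `top_n` parameter are assumptions made for illustration. Candidate phrases are split at stopwords and punctuation, each word is scored as degree divided by frequency, and a phrase scores the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption); real RAKE setups use a much larger one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "to", "with",
}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Any punctuation (except hyphens) ends a phrase.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_keywords(text, top_n=5):
    """Return the top_n (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurring words in the phrase, including the word itself.
            degree[word] += len(phrase)
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Sort by descending score, then alphabetically so ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Longer phrases made of words that rarely occur alone tend to rank highest, which is why RAKE favours multi-word key phrases.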
The distractor generator produces wrong alternatives for a given keyword using sense2vec [4].
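Nearest-neighbour lookups in sense2vec return keys of the form `phrase|TAG` (for example `New_York|GPE`) paired with a similarity score, so the raw neighbours usually need some cleaning before they can serve as answer options. The sketch below shows one possible clean-up step. The function name `pick_distractors` and its filtering rules are hypothetical and are not taken from this repository.

```python
def pick_distractors(keyword, neighbours, n=3):
    """Turn sense2vec-style neighbours into clean, distinct distractors.

    neighbours: iterable of (key, score) pairs, where key looks like
    "New_York|GPE", ordered from most to least similar.
    """
    answer = keyword.strip().lower()
    seen = {answer}
    distractors = []
    for key, _score in neighbours:
        # Drop the sense tag and turn underscores back into spaces.
        phrase = key.rsplit("|", 1)[0].replace("_", " ").strip()
        normalised = phrase.lower()
        # Skip duplicates and candidates that overlap with the correct answer.
        if normalised in seen or answer in normalised or normalised in answer:
            continue
        seen.add(normalised)
        distractors.append(phrase)
        if len(distractors) == n:
            break
    return distractors
```

In practice the `neighbours` argument could be the output of `Sense2Vec.most_similar` for the keyword. The overlap check is a deliberately simple heuristic and may reject some valid distractors.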
A T5 transformer is trained on the SQuAD dataset to generate a question from a keyword and a passage.
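The way the three modules fit together can be sketched as follows. This is an illustrative outline and not the service's actual code: the `MCQ` dataclass, the `build_mcqs` function, and the three callables passed to it are hypothetical stand-ins for the keyword extractor, question generator, and distractor generator.

```python
import random
from dataclasses import dataclass


@dataclass
class MCQ:
    question: str
    options: list
    answer: str


def build_mcqs(passage, extract_keywords, generate_question,
               generate_distractors, seed=0):
    """Combine the three modules into a list of multiple-choice questions.

    The three callables are placeholders (assumptions) for the real modules:
      extract_keywords(passage) -> list of keywords
      generate_question(keyword, passage) -> question string
      generate_distractors(keyword) -> list of wrong alternatives
    """
    rng = random.Random(seed)  # fixed seed keeps the option order reproducible
    mcqs = []
    for keyword in extract_keywords(passage):
        distractors = generate_distractors(keyword)
        if not distractors:
            # Without wrong alternatives there is no multiple-choice question.
            continue
        options = [keyword, *distractors]
        rng.shuffle(options)
        mcqs.append(MCQ(generate_question(keyword, passage), options, keyword))
    return mcqs
```

Passing the modules in as callables keeps the sketch independent of how each one is deployed, for example as a separate Docker service reached over the network.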
Each module is implemented as a Docker image. To build the images, execute build.sh, and to start the services, run run.sh. To test that everything runs correctly, execute test.py.
[1] BERT: Pre-training of deep bidirectional transformers for language understanding. Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina (2018).
[2] Unsupervised keyphrase extraction with multipartite graphs. Boudin, Florian (2018).
[3] Automatic keyword extraction from individual documents. Rose, Stuart and Engel, Dave and Cramer, Nick and Cowley, Wendy (2010).
[4] sense2vec - A fast and accurate method for word sense disambiguation in neural word embeddings. Trask, Andrew and Michalak, Phil and Liu, John (2015).