indrajithi/question_generation

Automatic Question Generation

Usage

  usage: genq.py [-h] [-file FILE] [-output OUTPUT] [-start_page START_PAGE]
                 [-number_of_pages NUMBER_OF_PAGES] [--all] [--verbose]
                 [--dir DIR]

  optional arguments:
    -h, --help            show this help message and exit
    -file FILE, -f FILE   input file location
    -output OUTPUT, -o OUTPUT
                          output file location
    -start_page START_PAGE, -s START_PAGE
                          page to start reading from
    -number_of_pages NUMBER_OF_PAGES, -n NUMBER_OF_PAGES
                          number of pages to read
    --all                 process all the pages from the start_page
    --verbose, -v         verbose
    --dir DIR, -d DIR     input directory
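
A parser that produces help text of this shape could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the help output, not the actual contents of genq.py: the function name `build_parser` is hypothetical, and the `type=int` conversions are assumptions, since the help text does not state the argument types.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the usage text."""
    parser = argparse.ArgumentParser(prog="genq.py")
    parser.add_argument("-file", "-f", help="input file location")
    parser.add_argument("-output", "-o", help="output file location")
    # type=int is an assumption; the help text does not give the types.
    parser.add_argument("-start_page", "-s", type=int,
                        help="page to start reading from")
    parser.add_argument("-number_of_pages", "-n", type=int,
                        help="number of pages to read")
    parser.add_argument("--all", action="store_true",
                        help="process all the pages from the start_page")
    parser.add_argument("--verbose", "-v", action="store_true", help="verbose")
    parser.add_argument("--dir", "-d", help="input directory")
    return parser


# Example: read every page of a (hypothetical) file, starting at page 3.
args = build_parser().parse_args(["-f", "notes.pdf", "-s", "3", "--all"])
print(args.file, args.start_page, args.all)
```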

Algorithm

START

  1. Accept a PDF or text file.

  2. Clean the text and remove special characters.

  3. Load the knowledge base (.pkl file), which contains a set of unique, previously generated questions.

  4. Break the text into sentences and do POS tagging.

  5. Loop through all sentences and check whether each sentence contains a NOUN/PRP, i.e., one of the tags ['NN', 'NNS', 'PRP', 'NNP', 'NNPS', 'PRP$'].

    If true, go to step 6; otherwise continue the loop in step 5.

  6. Starting from the index of the noun found, check whether a VERB/PRP follows.

  7. Determine the tense of the VERB/PRP and check whether the noun is he/she/it/they.

  8. Form a question based on the above rules.

  9. Lemmatize the verb and change the tense of the question to future.

  10. Generalize the question by removing personal references and possessive pronouns: her, his, mine.

  11. Verify that the question has not been generated previously, then add it to the knowledge base.

    If sentences remain to be processed, go to step 5; otherwise go to step 12.

  12. Save the metadata, update the knowledge base, and export the questions to CSV from the metadata.

END
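
Steps 3, 5, 6, and 11 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in genq.py: all function names are hypothetical, `VERB_TAGS` is assumed to be the Penn Treebank verb tags, and "following" in step 6 is read as "immediately following". The input is a sentence that is already POS-tagged as a list of (word, tag) pairs, the shape a tagger such as TextBlob produces.

```python
import pickle
from pathlib import Path

# Tags from step 5 of the algorithm.
NOUN_TAGS = {"NN", "NNS", "PRP", "NNP", "NNPS", "PRP$"}
# Assumption: "VERB" means the Penn Treebank verb tags.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def find_noun_verb(tagged):
    """Return (noun_index, verb_index) for the first noun or pronoun that is
    immediately followed by a verb, or None if there is no such pair."""
    for i in range(len(tagged) - 1):
        if tagged[i][1] in NOUN_TAGS and tagged[i + 1][1] in VERB_TAGS:
            return i, i + 1
    return None


def load_knowledge_base(path):
    """Step 3: load the pickled set of questions, or start with an empty set."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    return set()


def save_knowledge_base(questions, path):
    """Step 12 (in part): write the set of questions back to disk."""
    with Path(path).open("wb") as fh:
        pickle.dump(questions, fh)


def add_if_new(question, questions):
    """Step 11: add the question and return True only if it was not known."""
    if question in questions:
        return False
    questions.add(question)
    return True
```

Keeping the knowledge base as a set makes the duplicate check in step 11 a constant-time membership test, and the same set can be pickled unchanged between runs.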

