# Question Generation

Purpose: given an article, output a list of questions.

1. Parse the article into sentences.
2. From each sentence, generate a Stanford parse tree (the cells below use the constituency parse from `CoreNLPParser`).
3. From each parse tree, generate a question from the sentence with a rule-based method.
4. Refine the generated questions using language models.
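
Step 4 is not implemented in this notebook. As a rough illustration of what "refining with a language model" could mean, the sketch below ranks candidate questions by fluency. It assumes a toy add-one-smoothed bigram model as a stand-in for a real language model; the names `train_bigram`, `avg_log_prob`, and `rank_questions` and the two-sentence corpus are hypothetical and do not come from the notebook.

```python
import math
from collections import Counter

def train_bigram(corpus_sentences):
    """Count unigrams and bigrams over lowercased, whitespace-split sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def avg_log_prob(sentence, unigrams, bigrams):
    """Average add-one-smoothed bigram log-probability per transition."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    vocab_size = len(unigrams) + 1  # +1 reserves mass for unseen words
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total / (len(tokens) - 1)

def rank_questions(candidates, unigrams, bigrams):
    """Sort candidate questions from most to least fluent under the toy model."""
    return sorted(candidates, key=lambda q: avg_log_prob(q, unigrams, bigrams), reverse=True)

# Toy corpus, for illustration only.
unigrams, bigrams = train_bigram(["Is Gyarados a Pokémon ?", "Is Magikarp a fish ?"])
ranked = rank_questions(["Gyarados is a ?", "Is Gyarados a Pokémon ?"], unigrams, bigrams)
print(ranked[0])  # Is Gyarados a Pokémon ?
```

Averaging per transition keeps the score from simply favouring shorter candidates. A real refinement step would swap the bigram counts for a stronger model while keeping the same ranking interface.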

### Article -> Sentences

In [2]:
import nltk

In [3]:
with open('./noun_counting_data/a1.txt', 'r') as f:
    content = f.read()

In [4]:
sentences = nltk.sent_tokenize(content)
print(sentences)

["Gyarados\n\n\nGyarados (ギャラドス, Gyaradosu,  or ) is a Pokémon species in Nintendo and Game Freak's Pokémon franchise.", 'Created by Ken Sugimori, Gyarados first appeared in the video games Pokémon Red and Pokemon Green and subsequent sequels, later appearing in various merchandise, spinoff titles and animated and printed adaptations of the franchise.', 'Gyarados is voiced by Unshō Ishizuka in both Japanese and English media.', 'Known as the Atrocious Pokémon, Gyarados is the evolved form of Magikarp and it is well known in the Pokémon world for its fierce temper as well as its reputation for causing nothing but destruction so much so that once it has worked itself into a frenzy, it will not calm down until everything around it has been destroyed.', 'Gyarados appears multiple times in the anime under various trainers such as Misty, Lance, Crasher Wake, and Nurse Joy.', 'Two different Gyarados appear in the Pokémon Adventures manga.', 'One is originally owned by Misty, but is traded bet
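
In the output above, the first "sentence" still carries the article title and blank lines (`"Gyarados\n\n\nGyarados (..."`), because the tokenizer sees the file as one block. One possible cleanup, sketched under the assumption that headings are short lines separated by blank lines, is to split on blank lines first and drop very short fragments. The helper names and the `min_words` threshold are hypothetical; `tokenize` can be any callable such as `nltk.sent_tokenize`.

```python
import re

def split_blocks(content):
    """Split raw article text on blank lines and collapse inner whitespace."""
    blocks = re.split(r"\n\s*\n", content)
    return [" ".join(block.split()) for block in blocks if block.strip()]

def article_sentences(content, tokenize, min_words=4):
    """Tokenize each block separately and drop short, heading-like fragments.

    tokenize: callable mapping a string to a list of sentences.
    min_words: arbitrary illustrative threshold, not taken from the notebook.
    """
    sentences = []
    for block in split_blocks(content):
        for sentence in tokenize(block):
            if len(sentence.split()) >= min_words:
                sentences.append(sentence)
    return sentences
```

With the notebook's variables this would be called as `article_sentences(content, nltk.sent_tokenize)`. The word-count filter is crude and would also drop genuinely short sentences, so the threshold needs checking against the data.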

### Sentences -> Parse Trees

In [5]:
from nltk.parse.corenlp import CoreNLPServer
from nltk.parse.corenlp import CoreNLPParser
from nltk.parse.corenlp import CoreNLPDependencyParser
import os

In [6]:
STANFORD = os.path.join("models", "stanford-corenlp-full-2018-10-05")

# Create the server from the downloaded CoreNLP jars
server = CoreNLPServer(
    os.path.join(STANFORD, "stanford-corenlp-3.9.2.jar"),
    os.path.join(STANFORD, "stanford-corenlp-3.9.2-models.jar"),
)
server.start()

The server above expects the version 3.9.2 jars to be unpacked under `models/`. Download the Stanford Parser (version 3.9.2): https://nlp.stanford.edu/software/lex-parser.shtml#Download

In [7]:
from nltk.tree import Tree
parser = CoreNLPParser()

In [8]:
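def list_to_string(word_list):
    """Join a list of tokens into one space-separated string."""
    return ' '.join(word_list)

def tree_to_string(parsed_tree):
    """Return the words under a parse (sub)tree as a single string."""
    return list_to_string(parsed_tree.leaves())

def binary_question_from_tree(parsed_tree):
    """Turn a simple 'NP VP .' parse into a yes/no question.

    Only a VP that starts with a VBZ verb is handled: the verb is moved in
    front of the subject, which is correct for forms such as "is".
    """
    sentence = parsed_tree[0]
    assert sentence.label() == 'S'
    np = sentence[0]
    vp = sentence[1]
    assert np.label() == 'NP'
    assert vp.label() == 'VP'
    if vp[0].label() == 'VBZ':
        # vp[0] is the verb, e.g. (VBZ is); keep every later VP child in order.
        rest = [tree_to_string(child) for child in vp[1:]]
        return list_to_string([vp[0][0].capitalize(), tree_to_string(np)] + rest) + '?'
    # No rule for this verb form yet: return the first VP child for inspection.
    return vp[0]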
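
The rule above only fronts a VBZ verb, so a sentence such as "Two different Gyarados appear in the Pokémon Adventures manga." falls through to the fallback. A minimal sketch of one way to extend it with do-support for VBP verbs follows. It works on plain word lists rather than trees so it can be tried without the server; `yes_no_question` and `BE_FORMS` are hypothetical names, and VBZ or VBD main verbs are deliberately left unhandled because they would need lemmatization.

```python
# Forms of "be" can be fronted directly; other verbs need do-support.
BE_FORMS = {"is", "are", "was", "were", "am"}

def yes_no_question(subject, subject_first_tag, verb, verb_tag, rest):
    """Build a yes/no question from the pieces of a simple NP + VP sentence.

    subject: list of subject words; subject_first_tag: POS tag of its first word.
    verb, verb_tag: head verb of the VP and its POS tag.
    rest: list of the remaining VP words. Returns None when no rule applies.
    """
    subject = list(subject)
    # Lowercase the old sentence-initial word unless it is a proper noun or "I".
    if subject and subject_first_tag not in ("NNP", "NNPS") and subject[0] != "I":
        subject[0] = subject[0].lower()
    if verb.lower() in BE_FORMS:
        words = [verb.capitalize()] + subject + list(rest)
    elif verb_tag == "VBP":
        # Non-third-person present: the surface form is already the base form.
        words = ["Do"] + subject + [verb] + list(rest)
    else:
        return None  # e.g. VBZ/VBD main verbs, which would need a lemmatizer
    return " ".join(words) + "?"

print(yes_no_question(["Two", "different", "Gyarados"], "CD", "appear", "VBP",
                      ["in", "the", "Pokémon", "Adventures", "manga"]))
# Do two different Gyarados appear in the Pokémon Adventures manga?
```

With an NLTK parse, the arguments could be taken from `np.leaves()`, `np.pos()[0][1]`, `vp[0][0]`, `vp[0].label()`, and the leaves of `vp[1:]`.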

In [None]:
# Sentence Structure Tree: a label plus an ordered list of child patterns
class SST:
    def __init__(self, label, children):
        self.label = label
        self.children = children

# Sentence Structure Leaf: matches any subtree with this label
class SSL:
    def __init__(self, label):
        self.label = label

# Pattern for a simple declarative sentence: ROOT -> S -> NP VP .
simple_predicate = SST('ROOT', [SST('S', [SSL('NP'), SSL('VP'), SSL('.')])])

def satisfies_structure(parsed_tree, structure):
    """Check whether the top of parsed_tree has exactly the shape of structure."""
    if isinstance(structure, SSL):
        return parsed_tree.label() == structure.label
    if parsed_tree.label() != structure.label or len(parsed_tree) != len(structure.children):
        return False
    return all(
        satisfies_structure(child, pattern)
        for child, pattern in zip(parsed_tree, structure.children)
    )
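
To see what the matcher accepts and rejects without starting the server, the sketch below repeats the definitions and runs them on two hand-built trees. `Node` is a hypothetical stand-in for `nltk.tree.Tree` (a list of children with a `label()` method), and the second tree is an invented shape for a sentence with a fronted clause, not real parser output.

```python
class SST:
    """Pattern node: a label plus an exact, ordered list of child patterns."""
    def __init__(self, label, children):
        self.label = label
        self.children = children

class SSL:
    """Pattern leaf: matches any subtree with this label, whatever is below it."""
    def __init__(self, label):
        self.label = label

def satisfies_structure(parsed_tree, structure):
    if isinstance(structure, SSL):
        return parsed_tree.label() == structure.label
    if parsed_tree.label() != structure.label or len(parsed_tree) != len(structure.children):
        return False
    return all(satisfies_structure(c, s) for c, s in zip(parsed_tree, structure.children))

class Node(list):
    """Hypothetical stand-in for nltk.tree.Tree, used only for this demo."""
    def __init__(self, label, children=()):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label

simple_predicate = SST('ROOT', [SST('S', [SSL('NP'), SSL('VP'), SSL('.')])])

# S has exactly NP, VP and '.', so the pattern matches.
plain = Node('ROOT', [Node('S', [Node('NP', ['Gyarados']),
                                 Node('VP', ['is', 'voiced']),
                                 Node('.', ['.'])])])
# S starts with an extra clause and comma, so the child count differs.
fronted = Node('ROOT', [Node('S', [Node('S'), Node(','), Node('NP'),
                                   Node('VP'), Node('.')])])

print(satisfies_structure(plain, simple_predicate))    # True
print(satisfies_structure(fronted, simple_predicate))  # False
```

Because child counts must match exactly, any sentence with material before the subject or between NP and VP is skipped, which is why only a few sentences from the article produce questions.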

In [None]:
MAX_SENTENCE_CHARS = 180  # longer sentences are skipped

parse = None
parse_list = []
for sentence in sentences:
    if len(sentence) < MAX_SENTENCE_CHARS:
        parse = next(parser.raw_parse(sentence))
        if satisfies_structure(parse, simple_predicate):
            print("=========================== Sentence ======================")
            print(parse)
            print(sentence)
            print(binary_question_from_tree(parse))
            parse_list.append(parse)

# Draw the most recently parsed sentence (opens a Tk window).
if parse is not None:
    parse.draw()

(ROOT
  (S
    (NP (NNP Gyarados))
    (VP
      (VBZ is)
      (VP
        (VBN voiced)
        (PP
          (IN by)
          (NP
            (NP (NNP Unshō) (NNP Ishizuka))
            (PP
              (IN in)
              (NP
                (DT both)
                (JJ Japanese)
                (CC and)
                (JJ English)
                (NNS media)))))))
    (. .)))
Gyarados is voiced by Unshō Ishizuka in both Japanese and English media.
Is Gyarados voiced by Unshō Ishizuka in both Japanese and English media?
(ROOT
  (S
    (NP (CD Two) (JJ different) (NNS Gyarados))
    (VP
      (VBP appear)
      (PP
        (IN in)
        (NP (DT the) (NNP Pokémon) (NNS Adventures) (NN manga))))
    (. .)))
Two different Gyarados appear in the Pokémon Adventures manga.
(VBP appear)
(ROOT
  (S
    (NP (CD One))
    (VP
      (VP
        (VBZ is)
        (ADVP (RB originally))
        (VP (VBN owned) (PP (IN by) (NP (NNP Misty)))))
      (, ,)
      (CC but)
      (VP
        (VB

(ROOT
  (S
    (NP (NNP Gyarados))
    (VP
      (VBZ is)
      (VP
        (VBN used)
        (PP
          (IN by)
          (NP
            (NP (JJ many) (JJ notable) (NNS trainers))
            (PP
              (JJ such)
              (IN as)
              (NP
                (NP
                  (NP
                    (NNP Blue)
                    (, ,)
                    (NNP Clair)
                    (, ,)
                    (NNP Lance)
                    (, ,)
                    (NNP Wallace)
                    (, ,)
                    (NNP Pike)
                    (NNP Queen)
                    (NNP Lucy))
                  (PRN
                    (, ,)
                    (S (NP (NNP Crasher)) (VP (VBP Wake)))
                    (, ,)))
                (CC and)
                (NP (NNP Cyrus))))))))
    (. .)))
Gyarados is used by many notable trainers such as Blue, Clair, Lance, Wallace, Pike Queen Lucy, Crasher Wake, and Cyrus.
Is Gyarados used by many notabl

In [None]:
server.stop()