# Question Generation

Purpose: Given an article, output a list of questions generated from its sentences.
1. Split the article into sentences.
2. For each sentence, generate a Stanford CoreNLP parse tree.
3. For each parse tree, use a rule-based method to generate a question from the sentence.
4. Refine the questions using language models.
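
The four steps can be read as a simple pipeline. The sketch below is only an illustration under stated assumptions: `generate_questions` and its `split`, `parse`, `to_question`, and `refine` parameters are hypothetical names, and the toy stand-ins passed in at the bottom replace NLTK and CoreNLP so the block runs on its own.

```python
from typing import Callable, Iterable, List, Optional


def generate_questions(
    article: str,
    split: Callable[[str], Iterable[str]],
    parse: Callable[[str], object],
    to_question: Callable[[object], Optional[str]],
    refine: Callable[[str], str] = lambda q: q,
) -> List[str]:
    """Run the four stages in order; keep only sentences that yield a question."""
    questions = []
    for sentence in split(article):          # step 1: article -> sentences
        tree = parse(sentence)               # step 2: sentence -> parse
        question = to_question(tree)         # step 3: parse -> question (or None)
        if question is not None:
            questions.append(refine(question))  # step 4: optional refinement
    return questions


# Toy stand-ins (assumptions, not the notebook's real components).
demo = generate_questions(
    "Gyarados is a sea serpent. Run!",
    split=lambda text: [s.strip() for s in text.replace("!", ".").split(".") if s.strip()],
    parse=lambda s: s.split(),
    to_question=lambda words: (
        f"{words[1].capitalize()} {words[0]} {' '.join(words[2:])}?"
        if len(words) > 2 and words[1] == "is" else None
    ),
)
print(demo)
```

In the notebook itself, `nltk.sent_tokenize`, `parser.raw_parse`, and `binary_question_from_tree` play the first three roles; the refinement step is not implemented here.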

### Article -> Sentences

In [1]:
import nltk

In [2]:
content = []
for i in range(1, 10):
    # The articles contain non-ASCII names (e.g. "Pokémon"), so read them as UTF-8.
    with open(f'./noun_counting_data/a{i}.txt', 'r', encoding='utf-8') as f:
        content.append(f.read())

In [3]:
sentences = []
for file in content:
    sentences.extend(nltk.sent_tokenize(file))


### Sentences -> Parse Trees

In [4]:
from nltk.parse.corenlp import CoreNLPServer
from nltk.parse.corenlp import CoreNLPParser
from nltk.parse.corenlp import CoreNLPDependencyParser
import os
import requests

In [10]:
STANFORD = os.path.join("models", "stanford-corenlp-full-2018-10-05")

# Create the server
server = CoreNLPServer(
    os.path.join(STANFORD, "stanford-corenlp-3.9.2.jar"),
    os.path.join(STANFORD, "stanford-corenlp-3.9.2-models.jar"),
)
server.start()

KeyboardInterrupt: 

In [5]:
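# Smoke test that the server responds. Passing a dict form-encodes the body as
# "data=tmp", which is why the tokens in the output are "data", "=", and "tmp".
requests.post('http://[::]:9000/?properties={"annotators":"tokenize,ssplit,pos","outputFormat":"json"}', data = {'data': "tmp"}).text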

'{\n  "sentences": [\n    {\n      "index": 0,\n      "tokens": [\n        {\n          "index": 1,\n          "word": "data",\n          "originalText": "data",\n          "characterOffsetBegin": 0,\n          "characterOffsetEnd": 4,\n          "pos": "NN",\n          "before": "",\n          "after": ""\n        },\n        {\n          "index": 2,\n          "word": "=",\n          "originalText": "=",\n          "characterOffsetBegin": 4,\n          "characterOffsetEnd": 5,\n          "pos": "JJ",\n          "before": "",\n          "after": ""\n        },\n        {\n          "index": 3,\n          "word": "tmp",\n          "originalText": "tmp",\n          "characterOffsetBegin": 5,\n          "characterOffsetEnd": 8,\n          "pos": "NN",\n          "before": "",\n          "after": ""\n        }\n      ]\n    }\n  ]\n}\n'
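
The response above tokenizes `data`, `=`, and `tmp` because a dict body is form-encoded as `data=tmp`. A minimal sketch of sending the raw text instead, using only the standard library, might look like the following. The helper names `corenlp_url` and `annotate` are hypothetical, and `http://localhost:9000` is an assumed address for a server that is already running.

```python
import json
import urllib.parse
import urllib.request


def corenlp_url(base="http://localhost:9000", annotators="tokenize,ssplit,pos"):
    """Build the request URL with the properties JSON safely percent-encoded."""
    properties = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return f"{base}/?{query}"


def annotate(text, base="http://localhost:9000"):
    """POST the raw text as the request body and decode the JSON reply.

    Requires a running CoreNLP server at `base` (an assumption of this sketch).
    """
    request = urllib.request.Request(
        corenlp_url(base), data=text.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


url = corenlp_url()
print(url)
```

Calling `annotate("Gyarados is a sea serpent.")` against a live server should then return tokens for the sentence itself rather than for the form field name.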

Download the Stanford Parser (version 3.9.2) from https://nlp.stanford.edu/software/lex-parser.shtml#Download

In [33]:
from nltk.tree import Tree
parser = CoreNLPParser()

In [54]:
invertible_aux_verb = {'am', 'are', 'is', 'was', 'were', 'can', 'could', 'does', 'did', 'has', 'had', 'have', 'may', 'might',
                       'must', 'shall', 'should', 'will', 'would'}
def is_invertible(s):
    if isinstance(s, str):
        return s.lower() in invertible_aux_verb
    return False

def list_to_string(word_list):
    return ' '.join(word_list)

def tree_to_string(parsed_tree):
    """Join the words (leaves) of a parse tree into a space-separated string."""
    return list_to_string(parsed_tree.leaves())

def first(parsed_tree):
    if isinstance(parsed_tree[0], str):
        return parsed_tree
    return parsed_tree[0]

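def binary_question_from_tree(parsed_tree):
    """Turn a simple declarative parse (ROOT -> S -> NP VP .) into a yes/no question."""
    sentence = parsed_tree[0]
    assert sentence.label() == 'S'
    np, vp = sentence[0], sentence[1]
    # Check the phrase labels before inspecting the phrases themselves.
    assert np.label() == 'NP'
    assert vp.label() == 'VP'
    noun_label = first(np).label()
    aux = vp[0][0]  # first word of the verb phrase, e.g. 'is' or 'has'
    # Only invert when the verb phrase starts with an invertible auxiliary
    # and the subject starts with a proper noun (NNP/NNPS).
    if is_invertible(aux) and noun_label in ('NNP', 'NNPS'):
        # Move the auxiliary in front of the subject; the final period is dropped.
        words = [aux.capitalize(), tree_to_string(np)] + [tree_to_string(child) for child in vp[1:]]
        return list_to_string(words) + '?'
    return None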
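
To make the inversion rule concrete, here is a small self-contained sketch of the same idea. It is not the notebook's implementation: plain nested tuples of the form `(label, child, ...)` stand in for `nltk.Tree` so that it runs without CoreNLP, the names `AUXILIARIES`, `leaves`, and `invert` are hypothetical, and both example trees are hand-built rather than parser output.

```python
AUXILIARIES = {"is", "are", "was", "were", "has", "have", "had", "can", "will"}


def leaves(node):
    """Collect the words under a (label, child, ...) tuple; strings are words."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]


def invert(sentence):
    """Turn ('S', NP, VP, '.') into a yes/no question, or return None."""
    if len(sentence) < 3:
        return None
    label, np, vp = sentence[0], sentence[1], sentence[2]
    if label != "S" or np[0] != "NP" or vp[0] != "VP":
        return None
    if np[1][0] not in ("NNP", "NNPS"):      # subject must start with a proper noun
        return None
    aux = vp[1][1]                           # word of the first VP child, e.g. 'is'
    if not isinstance(aux, str) or aux.lower() not in AUXILIARIES:
        return None
    rest = [word for child in vp[2:] for word in leaves(child)]
    return " ".join([aux.capitalize()] + leaves(np) + rest) + "?"


# Hand-built trees (assumed shapes, for illustration only).
known = ("S",
         ("NP", ("NNP", "Gyarados")),
         ("VP", ("VBZ", "is"),
                ("VP", ("VBN", "known"),
                       ("PP", ("IN", "for"),
                              ("NP", ("PRP$", "its"), ("JJ", "fierce"), ("NN", "temper"))))),
         (".", "."))
evolves = ("S",
           ("NP", ("NNP", "Gyarados")),
           ("VP", ("VBZ", "evolves"), ("PP", ("IN", "from"), ("NP", ("NNP", "Magikarp")))),
           (".", "."))

print(invert(known))
print(invert(evolves))
```

The second tree yields no question because `evolves` is a main verb, not an invertible auxiliary; handling such sentences would need do-support ("Does Gyarados evolve ..."), which the rule here does not attempt.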

In [55]:
class SST:
    """Sentence Structure Tree: a node whose label and children must all match."""
    def __init__(self, label, children):
        self.label = label
        self.children = children

class SSL:
    """Sentence Structure Leaf: matches any subtree with the given label."""
    def __init__(self, label):
        self.label = label
        
simple_predicate = SST('ROOT', [SST('S', [SSL('NP'), SSL('VP'), SSL('.')])])

def satisfies_structure(parsed_tree, structure):
    """Return True if parsed_tree matches structure label-for-label from the root down."""
    if isinstance(parsed_tree, str):
        # A bare word has no label, so it cannot match any structure node.
        return False
    if isinstance(structure, SSL):
        return parsed_tree.label() == structure.label
    if parsed_tree.label() != structure.label or len(parsed_tree) != len(structure.children):
        return False
    return all(satisfies_structure(child, expected)
               for child, expected in zip(parsed_tree, structure.children))
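
The structure check accepts only sentences whose top level is exactly `NP VP .`, so anything with extra material at the top level is rejected. The sketch below illustrates that behavior under simplifying assumptions: nested tuples stand in for `nltk.Tree`, a pattern is either a label string (like `SSL`) or a `(label, [children])` pair (like `SST`), and the names `matches` and `SIMPLE` as well as both trees are hypothetical.

```python
def matches(tree, pattern):
    """Match a (label, child, ...) tuple tree against a pattern.

    A string pattern matches any subtree with that label; a
    (label, [child patterns]) pair must match label and children exactly.
    """
    if isinstance(tree, str):
        return False                      # bare words carry no label
    if isinstance(pattern, str):
        return tree[0] == pattern
    label, children = pattern
    if tree[0] != label or len(tree) - 1 != len(children):
        return False
    return all(matches(t, p) for t, p in zip(tree[1:], children))


SIMPLE = ("ROOT", [("S", ["NP", "VP", "."])])

# Hand-built example trees (assumed shapes, not parser output).
plain = ("ROOT",
         ("S",
          ("NP", ("NNP", "Gengar")),
          ("VP", ("VBZ", "is"), ("NP", ("DT", "a"), ("NN", "ghost"))),
          (".", ".")))
fronted = ("ROOT",
           ("S",
            ("PP", ("IN", "In"), ("NP", ("NN", "battle"))),
            (",", ","),
            ("NP", ("NNP", "Gengar")),
            ("VP", ("VBZ", "is"), ("ADJP", ("JJ", "fast"))),
            (".", ".")))

print(matches(plain, SIMPLE), matches(fronted, SIMPLE))
```

Requiring an exact child count is what keeps sentences with fronted phrases or conjunctions out of the inversion rule, at the cost of skipping many sentences that could still yield a question.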

In [59]:
parse_list = []
max_questions = 50  # stop after this many generated questions
for sentence in sentences:
    if len(sentence) >= 180:
        continue  # skip very long sentences
    parse = next(parser.raw_parse(sentence))
    if not satisfies_structure(parse, simple_predicate):
        continue
    question = binary_question_from_tree(parse)  # build the question only once
    if question is None:
        continue
    print("=========================== Sentence ======================")
    print("Sentence:", sentence)
    print("Question:", question)
    parse_list.append(parse)
    if len(parse_list) == max_questions:
        break

# parse.draw()  # uncomment to visualize the most recent parse

Sentence: Gyarados is voiced by Unshō Ishizuka in both Japanese and English media.
Question: Is Gyarados voiced by Unshō Ishizuka in both Japanese and English media?
Sentence: Gyarados has been described as both one of the most well known and most powerful Pokémon.
Question: Has Gyarados been described as both one of the most well known and most powerful Pokémon?
Sentence: Gyarados is a large sea serpent Pokémon most similar in appearance to dragons seen in Chinese mythology.
Question: Is Gyarados a large sea serpent Pokémon most similar in appearance to dragons seen in Chinese mythology?
Sentence: Gyarados is known for its fierce temper and wanton destructive tendencies.
Question: Is Gyarados known for its fierce temper and wanton destructive tendencies?
Sentence: Gyarados is used by many notable trainers such as Blue, Clair, Lance, Wallace, Pike Queen Lucy, Crasher Wake, and Cyrus.
Question: Is Gyarados used by many notable trainers such as Blue , Clair , Lance , Wallace , Pike Queen

Sentence: Meowth is one of the playable characters in the Pokémon Mystery Dungeon games.
Question: Is Meowth one of the playable characters in the Pokémon Mystery Dungeon games?
Sentence: Alolan Meowth were owned by Alolan royalty in the past, resulting in them having selfish and prideful attitudes, which caused their form to change.
Question: Were Alolan Meowth owned by Alolan royalty in the past , resulting in them having selfish and prideful attitudes , which caused their form to change?
Sentence: Meowth has appeared in the Pokémon Trading Card Game first in the Jungle series.
Question: Has Meowth appeared in the Pokémon Trading Card Game first in the Jungle series?
Sentence: Gengar is the most evolved of the three Ghost Pokémon in the First Generation.
Question: Is Gengar the most evolved of the three Ghost Pokémon in the First Generation?
Sentence: Gengar is the first of its evolutions to have hands and legs connected to its body.
Question: Is Gengar the first of its evolutions to
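
The generated questions keep the parser's token spacing, which is why commas appear as `Blue , Clair` in the output above. One possible cleanup step, sketched here with a hypothetical helper named `detokenize`, removes the whitespace before common punctuation marks. It is a rough heuristic and does not handle quotes, brackets, or contractions.

```python
import re


def detokenize(text):
    """Remove the whitespace that tokenization leaves before punctuation."""
    return re.sub(r"\s+([,.;:?!])", r"\1", text)


cleaned = detokenize("Is Gyarados used by Blue , Clair , and Cyrus?")
print(cleaned)
```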

In [16]:
server.stop()