In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import numpy as np
import tensorflow as tf

# TensorFlow 1.x API: allocate GPU memory on demand instead of reserving it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

In [3]:
import qa_consistency
import qa_consistency.dataset_utils
import qa_consistency.implication
import os
import json
import pickle



# Example: generating implications

In [4]:
gen = qa_consistency.implication.ImplicationsSquad()

Did not use initialization regex that was passed: .*bias_hh.*
Did not use initialization regex that was passed: .*weight_ih.*
Did not use initialization regex that was passed: .*bias_ih.*
Did not use initialization regex that was passed: .*weight_hh.*


In [5]:
passage = 'Kublai originally named his eldest son, Zhenjin, as the Crown Prince, \
but he died before Kublai in 1285.'
gen.implications('When did Zhenjin die?', '1285', passage)

Your label namespace was 'pos'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary.  See documentation for `non_padded_namespaces` parameter in Vocabulary.


[('Who died in 1285?', 'Zhenjin', 'subj')]

# Generating implications for all possible SQuAD answers on the dev set. You can skip this section and load my precomputed implications below.

In [10]:
data, answers = qa_consistency.dataset_utils.load_squad('/home/marcotcr/datasets/squad/')
answer_texts = [[x[0] for x in y] for y in answers]
questions = [x['question'].strip() for x in data]
passages = [x['passage'].strip() for x in data]

In [11]:
all_qs, all_as, all_contexts = qa_consistency.dataset_utils.question_answers_context_product(questions, answer_texts, passages)
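
For orientation, the flattening step can be pictured roughly as follows. This is a minimal sketch, not the library's implementation: `flatten_qa` is a hypothetical stand-in, the extra answer `'in 1285'` is toy data, and the assumption is that each question is paired with every distinct candidate answer for its passage.

```python
def flatten_qa(questions, answer_texts, passages):
    """Hypothetical stand-in: pair each question with each distinct answer."""
    all_qs, all_as, all_contexts = [], [], []
    for question, answers, passage in zip(questions, answer_texts, passages):
        # dict.fromkeys drops duplicate answers while preserving their order
        # (deduplication is an assumption about the real function).
        for answer in dict.fromkeys(answers):
            all_qs.append(question)
            all_as.append(answer)
            all_contexts.append(passage)
    return all_qs, all_as, all_contexts


toy_passage = ('Kublai originally named his eldest son, Zhenjin, as the Crown Prince, '
               'but he died before Kublai in 1285.')
qs, ans, ctxs = flatten_qa(
    ['When did Zhenjin die?'],
    [['1285', '1285', 'in 1285']],  # toy answer list with a duplicate
    [toy_passage],
)
for q, a in zip(qs, ans):
    print(q, '->', a)
```

The three returned lists stay aligned by index, which is the shape `gen.parse_dataset` receives in the next cell.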

In [14]:
parsed_dataset = gen.parse_dataset(all_qs, all_as, all_contexts)

Encountered the arc_loss key in the model's return dictionary which couldn't be split by the batch size. Key will be ignored.
Encountered the tag_loss key in the model's return dictionary which couldn't be split by the batch size. Key will be ignored.
Encountered the loss key in the model's return dictionary which couldn't be split by the batch size. Key will be ignored.


Error in  Q: What light radiation does ozone absorb?
A: ultraviolet


In [16]:
implications = [gen.implications_from_parsed(x) for x in parsed_dataset]

In [18]:
idxs = np.random.choice(len(implications), 5, replace=False)
for i in idxs:
    if not implications[i]:
        continue
    print(i, parsed_dataset[i])
    for x in implications[i]:
        print(x)
    print()

8754 Q: J. A. Hobson wanted which races to develop the world?
A: of highest 'social efficiency
D: J. A. Hobson wanted of highest 'social efficiency to develop the world.
('Who wanted of highest social efficiency to develop the world?', 'J. A. Hobson', 'subj')

12665 Q: What was the name of the contest sponsored by QuickBooks?
A: Small Business Big Game
D: The name of the contest sponsored by QuickBooks was Small Business Big Game .
('The name of the contest sponsored by what was Small Business Big Game ?', 'QuickBooks', 'what')

14166 Q: What was the Disneyland anthology series retitled in 1958?
A: Walt Disney Presents
D: The Disneyland anthology series was retitled in Walt Disney Presents 1958.
('What was retitled in Walt Disney Presents 1958?', 'Disneyland anthology series', 'subj')



In [54]:
output_folder = '/home/marcotcr/tmp/'
squad_path = '/home/marcotcr/datasets/squad/dev-v1.1.json'

In [56]:
all_imps = {}
for qa, imp in zip(parsed_dataset, implications):
    all_imps[qa.as_tuple()] = imp
# Use a context manager so the file is flushed and closed once the dump finishes.
with open(os.path.join(output_folder, "squad_imps.pkl"), 'wb') as f:
    pickle.dump(all_imps, f)

# Start from here if you want to use precomputed implications (link to pkl file in the repository's README)

In [54]:
output_folder = '/home/marcotcr/tmp/'
squad_path = '/home/marcotcr/datasets/squad/dev-v1.1.json'

In [98]:
with open(os.path.join(output_folder, "squad_imps.pkl"), 'rb') as f:
    all_imps = pickle.load(f)

In [73]:
squad_jsonl_path = os.path.join(output_folder, 'dev_allennlp.jsonl')
squad_preds_jsonl = os.path.join(output_folder, 'squad_dev_preds.jsonl')
squad_preds_json = os.path.join(output_folder, 'squad_dev_preds.json')
squad_consistency_path  = os.path.join(output_folder, 'squad_consistency.json')
squad_consistency_jsonl  = os.path.join(output_folder, 'squad_consistency.jsonl')
squad_consistency_preds_jsonl = os.path.join(output_folder, 'squad_consistency_preds.jsonl')
squad_consistency_preds_json = os.path.join(output_folder, 'squad_consistency_preds.json')

We are using allennlp, so the original dataset must be converted to a jsonl file (one JSON object per line) before getting predictions:

In [58]:
qa_consistency.dataset_utils.squad_to_allennlp(squad_path, squad_jsonl_path)
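
To make the conversion concrete, here is a minimal sketch of turning a SQuAD v1.1-style dict into jsonl lines with `question` and `passage` keys. It is not the body of `squad_to_allennlp`: the helper name `squad_to_jsonl_lines`, the toy article, and the decision to carry the question `id` along are all assumptions made for illustration.

```python
import json


def squad_to_jsonl_lines(squad):
    """Yield one JSON line per question from a SQuAD v1.1-style dict."""
    for article in squad['data']:
        for paragraph in article['paragraphs']:
            for qa in paragraph['qas']:
                yield json.dumps({
                    # Keeping the id is an assumption; it lets predictions
                    # be matched back to questions later.
                    'id': qa['id'],
                    'question': qa['question'].strip(),
                    'passage': paragraph['context'].strip(),
                })


context = ('Kublai originally named his eldest son, Zhenjin, as the Crown Prince, '
           'but he died before Kublai in 1285.')
toy_squad = {
    'version': '1.1',
    'data': [{
        'title': 'toy_article',  # hypothetical title
        'paragraphs': [{
            'context': context,
            'qas': [{
                'id': 'toy-0',  # hypothetical id
                'question': 'When did Zhenjin die?',
                'answers': [{'text': '1285', 'answer_start': context.index('1285')}],
            }],
        }],
    }],
}

lines = list(squad_to_jsonl_lines(toy_squad))
print(lines[0])
```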

In [76]:
print('Run:')
# squad_preds_jsonl is already a full path, so it is not prefixed with the output folder again.
print('allennlp predict https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz\
 %s --output-file %s --cuda-device 0 --silent' % (squad_jsonl_path, squad_preds_jsonl))


Run:
allennlp predict https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz /home/marcotcr/tmp/dev_allennlp.jsonl --output-file /home/marcotcr/tmp/squad_dev_preds.jsonl --cuda-device 0 --silent
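
An alternative to formatting the command as one string is to build it as an argument list, which keeps each path as a single argument. The sketch below only assembles and prints the command; `predict_command` is a hypothetical helper, and the flags are the ones used in this notebook. Both paths are already absolute because they were built with `os.path.join(output_folder, ...)`, so they are passed through unchanged.

```python
import shlex

MODEL_URL = ('https://s3-us-west-2.amazonaws.com/allennlp/models/'
             'bidaf-model-2017.09.15-charpad.tar.gz')


def predict_command(input_jsonl, output_jsonl, cuda_device=0):
    """Build the argument list for the `allennlp predict` call used here."""
    return [
        'allennlp', 'predict', MODEL_URL, input_jsonl,
        '--output-file', output_jsonl,
        '--cuda-device', str(cuda_device),
        '--silent',
    ]


cmd = predict_command('/home/marcotcr/tmp/dev_allennlp.jsonl',
                      '/home/marcotcr/tmp/squad_dev_preds.jsonl')
# shlex.quote makes the printed command safe to paste into a shell.
print(' '.join(shlex.quote(arg) for arg in cmd))
```

The same list could be handed to `subprocess.run(cmd, check=True)` to launch the prediction from Python instead of copying it into a terminal.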


Transform allennlp output to the official squad output format:

In [51]:
qa_consistency.dataset_utils.allennlp_preds_to_squad_format(squad_jsonl_path, squad_preds_jsonl, squad_preds_json)
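
The official SQuAD prediction file is a single JSON object mapping each question id to its predicted answer string. A minimal sketch of that transformation follows. It rests on assumptions that should be checked against the actual files: input and prediction lines are in the same order, each input line carries an `id` field, and each prediction line stores the answer text under `best_span_str`. The helper name `preds_to_squad_format` is hypothetical.

```python
import json


def preds_to_squad_format(input_lines, prediction_lines):
    """Map question ids to predicted answer strings (toy stand-in)."""
    predictions = {}
    # Assumes line i of the predictions answers line i of the inputs.
    for input_line, pred_line in zip(input_lines, prediction_lines):
        question_id = json.loads(input_line)['id']  # assumed field
        predictions[question_id] = json.loads(pred_line)['best_span_str']  # assumed field
    return predictions


toy_inputs = [json.dumps({'id': 'toy-0', 'question': 'When did Zhenjin die?',
                          'passage': 'He died before Kublai in 1285.'})]
toy_preds = [json.dumps({'best_span_str': '1285'})]

squad_style = preds_to_squad_format(toy_inputs, toy_preds)
print(json.dumps(squad_style))
```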

Generate a consistency dataset to check exact match predictions:

In [79]:
qa_consistency.dataset_utils.generate_implication_squad(squad_path,
                                                        squad_preds_json,
                                                        all_imps,
                                                        squad_consistency_path)
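
One plausible reading of this step, sketched under loud assumptions: for each question, the model's predicted answer is looked up in the precomputed implications, and only predictions that have a stored entry contribute implied questions to the consistency dataset. The key layout `(question, answer, passage)`, the output field names, and the helper `implications_for_predictions` are hypothetical; the real `generate_implication_squad` may differ.

```python
def implications_for_predictions(examples, predictions, all_imps):
    """Collect stored implications for each predicted answer (toy stand-in).

    examples:    list of (question_id, question, passage)
    predictions: dict mapping question_id -> predicted answer text
    all_imps:    dict keyed by (question, answer, passage); assumed layout
    """
    consistency_examples = []
    for question_id, question, passage in examples:
        predicted = predictions.get(question_id)
        # Predictions without a stored entry contribute nothing.
        for imp_q, imp_a, imp_type in all_imps.get((question, predicted, passage), []):
            consistency_examples.append({
                'source_id': question_id,
                'question': imp_q,
                'answer': imp_a,
                'type': imp_type,
                'passage': passage,
            })
    return consistency_examples


passage = ('Kublai originally named his eldest son, Zhenjin, as the Crown Prince, '
           'but he died before Kublai in 1285.')
toy_imps = {
    ('When did Zhenjin die?', '1285', passage): [('Who died in 1285?', 'Zhenjin', 'subj')],
}
toy_examples = [('toy-0', 'When did Zhenjin die?', passage),
                ('toy-1', 'Who was the Crown Prince?', passage)]
toy_predictions = {'toy-0': '1285', 'toy-1': 'Kublai'}

rows = implications_for_predictions(toy_examples, toy_predictions, toy_imps)
print(rows)
```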

Convert dataset to jsonl (for allennlp):

In [72]:
qa_consistency.dataset_utils.squad_to_allennlp(squad_consistency_path, squad_consistency_jsonl)

In [75]:
print('Run:')
# squad_consistency_preds_jsonl is already a full path, so it is not prefixed with the output folder again.
print('allennlp predict https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz\
 %s --output-file %s --cuda-device 0 --silent' % (squad_consistency_jsonl, squad_consistency_preds_jsonl))

Run:
allennlp predict https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz /home/marcotcr/tmp/squad_consistency.jsonl --output-file /home/marcotcr/tmp/squad_consistency_preds.jsonl --cuda-device 0 --silent


Convert allennlp output to squad format:

In [82]:
qa_consistency.dataset_utils.allennlp_preds_to_squad_format(squad_consistency_jsonl, squad_consistency_preds_jsonl, squad_consistency_preds_json)

Evaluate consistency:

In [97]:
stats = qa_consistency.dataset_utils.evaluate_consistency_squad(squad_consistency_path, squad_consistency_preds_json)
print('Consistency by implication type:')
print()
for x, v in stats.items():
    if x == 'all':
        continue
    print('%s : %.1f' % (x, 100 * v))
print()
print('Avg  : %.1f' % (100 * stats['all']))

Consistency by implication type:

prep : 74.0
dobj : 68.0
subj : 70.5
amod : 75.1

Avg  : 72.9
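
To show how numbers of this shape can arise, here is a minimal sketch that scores implied questions per implication type and pooled under `'all'`. It assumes exact match after SQuAD v1.1-style answer normalization; whether `evaluate_consistency_squad` scores this way is an assumption, and `consistency_by_type` and the toy triples are hypothetical.

```python
import re
import string
from collections import defaultdict

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = ''.join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return ' '.join(text.split())


def consistency_by_type(examples):
    """examples: iterable of (implication_type, expected_answer, predicted_answer)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for imp_type, expected, predicted in examples:
        match = normalize_answer(expected) == normalize_answer(predicted)
        # Count every example once under its type and once under 'all'.
        for key in (imp_type, 'all'):
            totals[key] += 1
            hits[key] += int(match)
    return {key: hits[key] / totals[key] for key in totals}


toy = [
    ('subj', 'Zhenjin', 'Zhenjin'),
    ('subj', 'J. A. Hobson', 'Hobson'),
    ('dobj', 'Walt Disney Presents', 'walt disney presents'),
]
toy_stats = consistency_by_type(toy)
for key, value in toy_stats.items():
    print('%s : %.1f' % (key, 100 * value))
```

Because `'all'` is pooled over every implied question in this sketch, it weights types by how often they occur. That would be consistent with the output above, where the average (72.9) differs from the unweighted mean of the four per-type scores (71.9).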
