<a href="https://colab.research.google.com/github/sramakrishnan247/Sentence-Similarity/blob/main/ReadingComprehensionGame.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Reading Comprehension
This is an interactive game in which the user reads a paragraph and then types an answer to a question about it. It is a good way to practice reading comprehension.
The game uses a classifier built on a frozen BERT encoder and trained on the SNLI corpus. It judges whether the user's answer agrees with the reference answer by predicting a natural language inference label (entailment, contradiction or neutral). The paragraphs, questions and answers are randomly sampled from the SQuAD dataset. The first cells contain boilerplate that downloads the weights and builds the model; the last cell implements the game.


In [17]:
!pip install transformers

import json
import random
from pprint import pprint

import numpy as np
import tensorflow as tf
import transformers



# Setup and installation

### Download the pretrained weights stored in Google Drive

In [18]:
# Taken from https://github.com/nsadawi/Download-Large-File-From-Google-Drive-Using-Python
# Taken from this StackOverflow answer: https://stackoverflow.com/a/39225039
import requests

def download_file_from_google_drive(file_id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params={'id': file_id}, stream=True)
    # For large files, Google Drive answers with a virus-scan warning instead of
    # the file; the confirm token from the download_warning cookie bypasses it.
    token = get_confirm_token(response)

    if token:
        params = {'id': file_id, 'confirm': token}
        response = session.get(URL, params=params, stream=True)

    save_response_content(response, destination)

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

file_id = '1--jrge8I9VvfeOUuYgJ2AxpkUbKG8jiK'
destination = 'weights.h5'
download_file_from_google_drive(file_id, destination)
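`save_response_content` streams the download in 32 KiB pieces instead of loading the whole weights file into memory, and skips empty keep-alive chunks. The sketch below reproduces that loop offline so it can be tried without network access; `iter_chunks` and `write_chunks` are hypothetical helpers, with `iter_chunks` standing in for `requests`' `Response.iter_content`.

```python
import io
from typing import BinaryIO, Iterable, Iterator

CHUNK_SIZE = 32768  # 32 KiB, the value used by save_response_content


def iter_chunks(data: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Hypothetical stand-in for Response.iter_content: yield data in pieces of at most `size` bytes."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def write_chunks(chunks: Iterable[bytes], out: BinaryIO) -> int:
    """Write every non-empty chunk to `out`; return the number of bytes written."""
    written = 0
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks, as the notebook does
            out.write(chunk)
            written += len(chunk)
    return written


buf = io.BytesIO()
n = write_chunks([b"abc", b"", b"def"], buf)
print(n, buf.getvalue())  # 6 b'abcdef'
```

Writing chunk by chunk keeps peak memory at roughly one chunk, regardless of how large the downloaded file is.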

### Create Model 

In [19]:
# Build the model: frozen BERT encoder, BiLSTM, hybrid pooling and a 3-way softmax.
def create_pretrained_model():
    max_length = 128
    # Encoded token ids from BERT tokenizer.
    input_ids = tf.keras.layers.Input(
        shape=(max_length,), dtype=tf.int32, name="input_ids"
    )
    # Attention masks indicate to the model which tokens should be attended to.
    attention_masks = tf.keras.layers.Input(
        shape=(max_length,), dtype=tf.int32, name="attention_masks"
    )
    # Token type ids are binary masks identifying which of the two input sentences each token belongs to.
    token_type_ids = tf.keras.layers.Input(
        shape=(max_length,), dtype=tf.int32, name="token_type_ids"
    )

    # Loading pretrained BERT model.
    bert_model = transformers.TFBertModel.from_pretrained("bert-base-uncased")
    # Freeze the BERT model to reuse the pretrained features without modifying them.
    bert_model.trainable = False

    # Index 0 is the last hidden state, shape (batch, max_length, 768). Indexing
    # works whether the model returns a tuple (older transformers) or a
    # ModelOutput (newer versions); tuple unpacking only works with the former.
    sequence_output = bert_model(
        input_ids, attention_mask=attention_masks, token_type_ids=token_type_ids
    )[0]

    # Add trainable layers on top of frozen layers to adapt the pretrained features on the new data.
    bi_lstm = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True)
    )(sequence_output)

    # Apply a hybrid pooling approach (average + max) to the bi_lstm sequence output.
    avg_pool = tf.keras.layers.GlobalAveragePooling1D()(bi_lstm)
    max_pool = tf.keras.layers.GlobalMaxPooling1D()(bi_lstm)
    concat = tf.keras.layers.concatenate([avg_pool, max_pool])
    dropout = tf.keras.layers.Dropout(0.3)(concat)

    # Three-way softmax over the SNLI labels.
    output = tf.keras.layers.Dense(3, activation="softmax")(dropout)
    model = tf.keras.models.Model(
        inputs=[input_ids, attention_masks, token_type_ids], outputs=output
    )

    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="categorical_crossentropy",
        metrics=["acc"],
    )
    return model
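The "hybrid pooling" step collapses the BiLSTM output over the time axis twice, once with a mean and once with a max, and concatenates the two vectors. A NumPy sketch of the shapes involved, assuming no masking (the sketch simply pools over all 128 positions); `hybrid_pool` is a hypothetical helper, not part of the notebook:

```python
import numpy as np


def hybrid_pool(seq: np.ndarray) -> np.ndarray:
    """Concatenate mean- and max-pooling over the time axis.

    seq has shape (batch, timesteps, features); the result has shape
    (batch, 2 * features), like GlobalAveragePooling1D and GlobalMaxPooling1D
    followed by concatenate on the last axis (masking ignored).
    """
    avg = seq.mean(axis=1)
    mx = seq.max(axis=1)
    return np.concatenate([avg, mx], axis=-1)


# Bidirectional(LSTM(64)) emits 2 * 64 = 128 features per timestep.
batch, timesteps, features = 2, 128, 128
bi_lstm_out = np.random.default_rng(0).standard_normal((batch, timesteps, features))
pooled = hybrid_pool(bi_lstm_out)
print(pooled.shape)  # (2, 256)
```

The mean summarizes the whole sequence while the max keeps the strongest activation of each feature, so the Dense classifier sees a 256-dimensional vector.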

### Load model weights

In [20]:
model = create_pretrained_model()
model.load_weights('weights.h5')

Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


In [21]:
model.summary()

Model: "functional_3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_ids (InputLayer)          [(None, 128)]        0                                            
__________________________________________________________________________________________________
attention_masks (InputLayer)    [(None, 128)]        0                                            
__________________________________________________________________________________________________
token_type_ids (InputLayer)     [(None, 128)]        0                                            
__________________________________________________________________________________________________
tf_bert_model_1 (TFBertModel)   ((None, 128, 768), ( 109482240   input_ids[0][0]                  
                                                                 attention_masks[0][0] 
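The summary above is cut off after the BERT layer, but the size of the trainable head can be worked out from the standard Keras formulas: an LSTM has a kernel (input_dim x 4·units), a recurrent kernel (units x 4·units) and a bias (4·units), and a Dense layer has input_dim x units weights plus units biases. The figures below are computed, not copied from the truncated output; pooling and dropout layers have no parameters.

```python
def lstm_params(input_dim: int, units: int) -> int:
    # kernel + recurrent kernel + bias, for the 4 LSTM gates
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim: int, units: int) -> int:
    # weight matrix + bias vector
    return input_dim * units + units


bert_params = 109_482_240          # frozen, from the summary above
bilstm = 2 * lstm_params(768, 64)  # forward + backward LSTM over 768-dim BERT outputs
dense = dense_params(2 * 128, 3)   # avg- and max-pooled BiLSTM outputs, concatenated
trainable = bilstm + dense

print(bilstm, dense, trainable)  # 426496 771 427267
print(bert_params + trainable)   # 109909507
```

Only about 0.4% of the parameters are trained; the rest are the frozen BERT weights.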

## Similarity Checking 

In [22]:
tokenizer = transformers.BertTokenizer.from_pretrained(
    "bert-base-uncased", do_lower_case=True
)

def is_similar(sentence1, sentence2):
    '''
    Classify the relationship between sentence1 (premise) and sentence2
    (hypothesis) with the NLI model.

    Returns a tuple (label, probability), where label is one of
    "contradiction", "entailment" or "neutral".
    '''
    max_length = 128

    # Encode the two sentences as a single BERT sentence pair.
    encoded = tokenizer(
        sentence1,
        sentence2,
        add_special_tokens=True,
        max_length=max_length,
        truncation=True,  # keep long inputs within the model's fixed input length
        padding='max_length',
        return_attention_mask=True,
        return_token_type_ids=True,
        return_tensors='np',  # NumPy arrays for the TensorFlow model (no PyTorch needed)
    )

    input_ids = np.array(encoded["input_ids"], dtype="int32")
    attention_masks = np.array(encoded["attention_mask"], dtype="int32")
    token_type_ids = np.array(encoded["token_type_ids"], dtype="int32")

    x = [input_ids, attention_masks, token_type_ids]

    # predict returns shape (1, 3); take the single row of class probabilities.
    y_pred = model.predict(x)[0]
    idx = np.argmax(y_pred)
    labels = ["contradiction", "entailment", "neutral"]
    return (labels[idx], y_pred[idx])
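The tokenizer call packs both sentences into one BERT input of the form `[CLS] A [SEP] B [SEP]`, padded to a fixed length: `token_type_ids` are 0 for the first segment (including `[CLS]` and the first `[SEP]`) and 1 for the second, and the attention mask is 1 for real tokens and 0 for padding. A plain-Python sketch of that layout, assuming the inputs are already split into tokens (the real tokenizer uses WordPiece) and short enough that no truncation is needed; `encode_pair` is a hypothetical helper:

```python
from typing import List, Tuple


def encode_pair(tokens_a: List[str], tokens_b: List[str],
                max_length: int = 16) -> Tuple[List[str], List[int], List[int]]:
    """Return (tokens, token_type_ids, attention_mask) for a BERT sentence pair."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    attention_mask = [1] * len(tokens)
    pad = max_length - len(tokens)
    if pad < 0:
        raise ValueError("pair too long; the real tokenizer would truncate it")
    tokens += ["[PAD]"] * pad
    token_type_ids += [0] * pad  # padding positions use segment id 0
    attention_mask += [0] * pad  # padding is never attended to
    return tokens, token_type_ids, attention_mask


# Reference answer "paris" vs. user answer "in paris"
toks, types, mask = encode_pair(["paris"], ["in", "paris"], max_length=8)
# toks  -> ['[CLS]', 'paris', '[SEP]', 'in', 'paris', '[SEP]', '[PAD]', '[PAD]']
# types -> [0, 0, 0, 1, 1, 1, 0, 0]
# mask  -> [1, 1, 1, 1, 1, 1, 0, 0]
```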


## Load SQuAD for generating paragraphs, questions and answers

In [25]:
!curl -LO https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
with open('train-v2.0.json') as f:
  dataset = json.load(f)
# dataset

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 40.1M  100 40.1M    0     0  52.4M      0 --:--:-- --:--:-- --:--:-- 52.4M
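The SQuAD v2.0 JSON is nested as `data` (articles) → `paragraphs` → `qas` (questions) → `answers`; unanswerable questions have `is_impossible` set, an empty `answers` list, and a `plausible_answers` list instead. The sketch below uses a tiny made-up dataset with that nesting and draws a question uniformly from all answerable ones. This differs from the game cell, which picks an article, then a paragraph, then a question, and falls back to `plausible_answers`; `toy` and `sample_answerable` are hypothetical.

```python
import random

# Made-up data with the same nesting as SQuAD v2.0.
toy = {
    "data": [
        {
            "title": "Example",
            "paragraphs": [
                {
                    "context": "The Eiffel Tower is in Paris.",
                    "qas": [
                        {"question": "Where is the Eiffel Tower?", "id": "q1",
                         "is_impossible": False,
                         "answers": [{"text": "Paris", "answer_start": 23}]},
                        {"question": "Who painted the Eiffel Tower blue?", "id": "q2",
                         "is_impossible": True, "answers": [],
                         "plausible_answers": [{"text": "The Eiffel Tower", "answer_start": 0}]},
                    ],
                }
            ],
        }
    ]
}


def sample_answerable(dataset, rng=random):
    """Pick a random (context, question, answer) triple, skipping unanswerable questions."""
    triples = [
        (p["context"], qa["question"], qa["answers"][0]["text"])
        for article in dataset["data"]
        for p in article["paragraphs"]
        for qa in p["qas"]
        if not qa["is_impossible"]
    ]
    return rng.choice(triples)


print(sample_answerable(toy, random.Random(0)))
```

Flattening first means every answerable question is equally likely, whereas nested sampling favors questions from short articles and paragraphs.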


# GAME 
Answer the question after reading the paragraph. The game keeps running in a loop; to stop playing, answer N (or anything other than Y/yes) when asked whether you want to continue.
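Scoring is a three-way branch on the label returned by `is_similar`: entailment counts as correct, contradiction as incorrect, and neutral as partially correct. A sketch of that rule as a pure function, separated from `input()`/`print()` so it can be tested; `feedback` is a hypothetical helper and its messages are illustrative:

```python
def feedback(label: str, confidence: float, gold: str) -> str:
    """Map an NLI label (and its probability) to the message shown to the player."""
    if label == "entailment":
        return "Correct answer!!"
    if label == "contradiction":
        return f"Incorrect answer!! Correct answer is: {gold}"
    if label == "neutral":
        return (f"Partially correct (neutral, confidence {confidence:.2f}). "
                f"Correct answer: {gold}")
    raise ValueError(f"unexpected label: {label!r}")


print(feedback("neutral", 0.567, "Paris"))
```

Note that the confidence is the probability of the predicted class, not a graded measure of how accurate the answer is.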

In [24]:
while True:
    # Pick a random article, then a random paragraph, then a random question.
    group = random.choice(dataset['data'])
    paragraph = random.choice(group['paragraphs'])
    context = paragraph['context']
    item = random.choice(paragraph['qas'])
    question = item['question']
    # SQuAD v2.0 marks unanswerable questions with is_impossible; for those,
    # only 'plausible_answers' is populated.
    if item['is_impossible']:
        answer = item['plausible_answers'][0]['text']
    else:
        answer = item['answers'][0]['text']

    print('Read the following paragraph')
    print()
    pprint(context, width=95)

    print()
    print('Answer the following')
    print()
    print(question)

    user_answer = input()
    print()
    # The reference answer is the premise; the user's answer is the hypothesis.
    label, confidence = is_similar(answer, user_answer)
    if label == 'contradiction':
        print('Incorrect answer!!')
        print('Correct answer is: ')
        print(answer)
    elif label == 'entailment':
        print('Correct answer!!')
    else:  # neutral
        print(f'Your answer is partially correct (neutral, confidence {confidence:.2f}).')
        print('Correct answer: ')
        print(answer)
    print('Do you want to continue: Y/N?')
    continue_game = input()
    if continue_game.strip().lower() not in ('y', 'yes'):
        break
    print()


Read the following paragraph

('In 1977, Gaddafi dissolved the Republic and created a new socialist state, the Jamahiriya '
 '("state of the masses"). Officially adopting a symbolic role in governance, he retained '
 'power as military commander-in-chief and head of the Revolutionary Committees responsible '
 'for policing and suppressing opponents. Overseeing unsuccessful border conflicts with Egypt '
 "and Chad, Gaddafi's support for foreign militants and alleged responsibility for the "
 'Lockerbie bombing led to Libya\'s label of "international pariah". A particularly hostile '
 'relationship developed with the United States and United Kingdom, resulting in the 1986 '
 'U.S. bombing of Libya and United Nations-imposed economic sanctions. Rejecting his earlier '
 'ideological commitments, from 1999 Gaddafi encouraged economic privatization and sought '
 'rapprochement with Western nations, also embracing Pan-Africanism and helping to establish '
 'the African Union. Amid the Arab Sp