# Test Everything, Carefully: Generating Questions from Text
## GA DSI 26 Capstone Project
## Chapter 2: T5 Model

In this notebook, we use PyTorch and the Hugging Face libraries to fine-tune a T5 model for question generation.

In [1]:
import numpy as np
import pandas as pd

# modeling
import torch
from torch.utils.data import Dataset, DataLoader
import pytorch_lightning as pl
from pytorch_lightning import Trainer, seed_everything
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.callbacks.early_stopping import EarlyStopping
from transformers import (
    T5ForConditionalGeneration,
    T5TokenizerFast as T5Tokenizer,
    )
from transformers.optimization import Adafactor

# aesthetics
from IPython.display import Markdown, display, clear_output
import re
import warnings
warnings.filterwarnings(
    "ignore", ".*Trying to infer the `batch_size` from an ambiguous collection.*"
)
seed_everything(25429)

# scoring
import spacy
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

Global seed set to 25429


In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

# Unpickle Dataset

In [3]:
train = pd.read_pickle('pickles/train.pkl')
val = pd.read_pickle('pickles/test.pkl')
train.head()

|   | title | context | question | answer |
|---|---|---|---|---|
| 0 | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | To whom did the Virgin Mary allegedly appear i... | Saint Bernadette Soubirous |
| 1 | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | What is in front of the Notre Dame Main Building? | a copper statue of Christ |
| 2 | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | The Basilica of the Sacred heart at Notre Dame... | the Main Building |
| 3 | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | What is the Grotto at Notre Dame? | a Marian place of prayer and reflection |
| 4 | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | What sits on top of the Main Building at Notre... | a golden statue of the Virgin Mary |


# Data Preprocessing

`T5TokenizerFast` splits text into subword tokens using a SentencePiece unigram model, and adds special tokens for end-of-sequence, out-of-vocabulary (unknown), and padding.

In [4]:
hug = 't5-small'
t5tokenizer = T5Tokenizer.from_pretrained(hug)
t5model = T5ForConditionalGeneration.from_pretrained(hug, return_dict=True)

We define a separator token to split the answer from the context. For answer-agnostic question generation, we also replace the answer with a mask token at random. The masking chance is a hyperparameter that trades off accuracy on masked inputs against accuracy on unmasked ones.

In [5]:
SEP_TOKEN = '<sep>'
MASK_TOKEN = '[MASK]'
MASKING_CHANCE = 0.1
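
To make the effect of the masking chance concrete, here is a minimal sketch of how the source string is assembled. The helper `build_source` and the example context are hypothetical (the notebook does this inline inside the dataset class below), and the standard-library `random` module stands in for `np.random`.

```python
import random

SEP_TOKEN = '<sep>'
MASK_TOKEN = '[MASK]'

def build_source(answer, context, masking_chance, rng):
    """Return the model input, hiding the answer with probability `masking_chance`."""
    # hypothetical helper: the notebook performs this step inside __getitem__
    shown = MASK_TOKEN if rng.random() < masking_chance else answer
    return f"{shown} {SEP_TOKEN} {context}"

rng = random.Random(0)
sources = [build_source('fire', 'Philo observed a burning candle.', 0.1, rng) for _ in range(1000)]
masked_share = sum(s.startswith(MASK_TOKEN) for s in sources) / len(sources)
print(f"share of masked inputs: {masked_share:.2f}")
```

With a masking chance of 0.1, roughly one input in ten should start with `[MASK]`; raising the value gives the answer-agnostic case more training examples at the expense of the answer-aware one.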

In [6]:
class DataEncodings(Dataset):
    '''
    tokenizes, pads, and adds special tokens
    '''
    def __init__(
        self,
        data: pd.DataFrame,
        tokenizer,
        source_max_token_len: int,
        target_max_token_len: int
        ):
        self.tokenizer = tokenizer  # use the tokenizer passed in, not the global one
        self.data = data
        self.source_max_token_len = source_max_token_len
        self.target_max_token_len = target_max_token_len

    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, index:int):
        data_row = self.data.iloc[index]
        # adds a random mask for answer-agnostic qg
        if np.random.rand() > MASKING_CHANCE:
            answer = data_row['answer']
        else:
            answer = MASK_TOKEN
            
        source_encoding = self.tokenizer(
            f"{answer} {SEP_TOKEN} {data_row['context']}",
            max_length=self.source_max_token_len,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt'
            )

        target_encoding = self.tokenizer(
            f"{data_row['answer']} {SEP_TOKEN} {data_row['question']}",
            max_length=self.target_max_token_len,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt'
            )

        labels = target_encoding['input_ids']
        # replace padding ids (0 for T5) with -100 so the loss ignores them
        labels[labels == 0] = -100

        encodings = dict(
            answer = data_row['answer'],
            context = data_row['context'],
            question = data_row['question'],
            input_ids = source_encoding['input_ids'].flatten(),
            attention_mask = source_encoding['attention_mask'].flatten(),
            labels=labels.flatten()
        )
        
        return encodings
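
The label step in `__getitem__` swaps padding ids for `-100`, the value that PyTorch's cross-entropy loss skips by default. A minimal sketch of that step on a plain list follows; the token ids are made up for illustration and `mask_padding` is a hypothetical helper.

```python
PAD_ID = 0           # T5's padding token id
IGNORE_INDEX = -100  # value that PyTorch's cross-entropy loss ignores by default

def mask_padding(label_ids):
    """Replace padding positions so they do not contribute to the loss."""
    return [IGNORE_INDEX if token == PAD_ID else token for token in label_ids]

labels = [363, 19, 8, 1, 0, 0, 0]  # made-up token ids followed by padding
print(mask_padding(labels))  # [363, 19, 8, 1, -100, -100, -100]
```

Without this step the model would also be trained to predict padding tokens, which would dominate the loss for short questions.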

We also define a data module that serves the data in batches, so that training does not run out of memory.

In [7]:
class DataModule(pl.LightningDataModule):

    def __init__(
        self,
        train: pd.DataFrame,
        val: pd.DataFrame,
        tokenizer,
        batch_size,
        source_max_token_len: int,
        target_max_token_len: int
        ): 
        super().__init__()
        self.batch_size = batch_size
        self.train = train
        self.val = val
        self.tokenizer = tokenizer
        self.source_max_token_len = source_max_token_len
        self.target_max_token_len = target_max_token_len

    def setup(self, stage=None):  # Lightning passes the stage when it calls this hook
        self.train_dataset = DataEncodings(self.train, self.tokenizer, self.source_max_token_len, self.target_max_token_len)
        self.val_dataset = DataEncodings(self.val, self.tokenizer, self.source_max_token_len, self.target_max_token_len)

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size, shuffle=True, num_workers=0)

    def val_dataloader(self):
        return DataLoader(self.val_dataset, batch_size=self.batch_size, num_workers=0)

In [8]:
data_module = DataModule(train, val, t5tokenizer, batch_size=32, source_max_token_len=128, target_max_token_len=64)
data_module.setup()

# Modeling

The [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html?highlight=training_step#training-step) library simplifies the model training process. We use a T5-small model with the Adafactor optimizer, as [used in the original T5 paper](https://discuss.huggingface.co/t/t5-finetuning-tips/684/3). We optimize the [cross-entropy loss](https://discuss.huggingface.co/t/what-is-loss-function-for-t5/11669) of the model.
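
As a rough illustration of what the cross-entropy loss measures, the sketch below computes it by hand for a few token positions and skips positions labelled `-100`. This is a plain-Python approximation for intuition only, not the library's implementation; the logits and targets are made up.

```python
import math

IGNORE_INDEX = -100

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of the target tokens, skipping ignored positions."""
    total, counted = 0.0, 0
    for row, target in zip(logits, targets):
        if target == IGNORE_INDEX:
            continue  # padded position: no contribution to the loss
        highest = max(row)
        log_norm = highest + math.log(sum(math.exp(x - highest) for x in row))
        total += log_norm - row[target]
        counted += 1
    return total / counted

# three positions over a toy vocabulary of four tokens; the last one is padding
logits = [[2.0, 0.5, 0.1, 0.1], [0.0, 0.0, 0.0, 0.0], [5.0, 1.0, 1.0, 1.0]]
targets = [0, 2, IGNORE_INDEX]
print(cross_entropy(logits, targets))
```

A position with uniform logits over four tokens contributes `log(4)`, which is the loss of a model that is guessing at random.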

In [9]:
# hyperparameters
num_epochs = 16
batch_size = 32
learning_rate = 0.001

In [10]:
# model 
class T5Model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = t5model
        self.model.resize_token_embeddings(len(t5tokenizer)) # match the embedding matrix to the tokenizer vocabulary size

    # feed forward pass
    def forward(self, input_ids, attention_mask, labels=None):
        output = self.model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        return output.loss, output.logits

    # train model and compute loss
    def training_step(self, batch, batch_idx):
        input_ids = batch['input_ids']
        attention_mask = batch['attention_mask']
        labels = batch['labels']
        loss, output = self(input_ids, attention_mask, labels)
        self.log('train_loss', loss, prog_bar=True, logger=True, batch_size=batch_size)
        return loss

    # gets model predictions, returns loss
    def validation_step(self, batch, batch_idx):
        input_ids = batch['input_ids']
        attention_mask = batch['attention_mask']
        labels = batch['labels']
        loss, output = self(input_ids, attention_mask, labels)
        self.log('val_loss', loss, prog_bar=True, logger=True, batch_size=batch_size)
        return {'val_loss': loss}
    
    # def validation_epoch_end(self, outputs):
    #     # outputs = list of dictionaries to print loss
    #     avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
    #     tensorboard_logs = {'avg_val_loss': avg_loss}
    #     return {'val_loss': avg_loss, 'log': tensorboard_logs}

    def configure_optimizers(self):
        return Adafactor(self.parameters(), scale_parameter=False, relative_step=False, lr=learning_rate)

In [11]:
model = T5Model().to(device)

## Model Architecture

Examining the model architecture shows which layers we could freeze or unfreeze while optimizing the model, though doing so is beyond the scope of this study.

In [12]:
model.named_parameters

<bound method Module.named_parameters of T5Model(
  (model): T5ForConditionalGeneration(
    (shared): Embedding(32100, 512)
    (encoder): T5Stack(
      (embed_tokens): Embedding(32100, 512)
      (block): ModuleList(
        (0): T5Block(
          (layer): ModuleList(
            (0): T5LayerSelfAttention(
              (SelfAttention): T5Attention(
                (q): Linear(in_features=512, out_features=512, bias=False)
                (k): Linear(in_features=512, out_features=512, bias=False)
                (v): Linear(in_features=512, out_features=512, bias=False)
                (o): Linear(in_features=512, out_features=512, bias=False)
                (relative_attention_bias): Embedding(32, 8)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (1): T5LayerFF(
              (DenseReluDense): T5DenseReluDense(
                (wi): Linear(in_features=512, out_features=2048, bias=False)
  

In [13]:
# for name, param in model.named_parameters():
#     if name in ['Linear']:
#         print(param.size())

In [14]:
# # to freeze layers in the vanilla model
# for name, param in model.named_parameters():
#      if name == "...":
#         param.requires_grad = False

# Model Training

PyTorch Lightning lets us save a checkpoint after every epoch and track the one with the lowest validation loss. We also add early stopping, which halts training when the validation loss has not improved for 3 consecutive epochs (the default patience of `EarlyStopping`).

In [12]:
# saving model checkpoints, minimizes val loss
callback = ModelCheckpoint(
    dirpath="checkpoints",
    filename="t5-chkpt",
    save_top_k=-1,
    verbose=True,
    monitor="val_loss",
    mode="min",
)

trainer = Trainer(
    # fast_dev_run=True,
    callbacks=[callback, EarlyStopping(monitor="val_loss")],
    max_epochs=num_epochs,
    gpus=1,
    auto_lr_find=True,
    deterministic=True,
    log_every_n_steps=5
)

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs


In [16]:
%%time

if __name__ == "__main__":
    model = T5Model()
    trainer.fit(model, data_module)

  rank_zero_deprecation(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type                       | Params
-----------------------------------------------------
0 | model | T5ForConditionalGeneration | 60.5 M
-----------------------------------------------------
60.5 M    Trainable params
0         Non-trainable params
60.5 M    Total params
241.969   Total estimated model params size (MB)
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")


Validation sanity check: 0it [00:00, ?it/s]

  rank_zero_warn(
Global seed set to 25429
  rank_zero_warn(


Training: 0it [00:00, ?it/s]

Validating: 0it [00:00, ?it/s]

Epoch 0, global step 2737: val_loss reached 1.25404 (best 1.25404), saving model to "C:\Users\goeis\Documents\GA Stuff\DSI-working-folder\QG-System\checkpoints\t5-chkpt-v6.ckpt" as top 1


Epoch 1, global step 5475: val_loss reached 1.23136 (best 1.23136), saving model to "C:\Users\goeis\Documents\GA Stuff\DSI-working-folder\QG-System\checkpoints\t5-chkpt-v7.ckpt" as top 2


Epoch 2, global step 8213: val_loss reached 1.23399 (best 1.23136), saving model to "C:\Users\goeis\Documents\GA Stuff\DSI-working-folder\QG-System\checkpoints\t5-chkpt-v8.ckpt" as top 3


Epoch 3, global step 10951: val_loss reached 1.24462 (best 1.23136), saving model to "C:\Users\goeis\Documents\GA Stuff\DSI-working-folder\QG-System\checkpoints\t5-chkpt-v9.ckpt" as top 4


Epoch 4, global step 13689: val_loss reached 1.24307 (best 1.23136), saving model to "C:\Users\goeis\Documents\GA Stuff\DSI-working-folder\QG-System\checkpoints\t5-chkpt-v10.ckpt" as top 5


Wall time: 1h 2min 6s


In [17]:
print(callback.best_model_score)
print(callback.best_model_path)

tensor(1.2314, device='cuda:0')
C:\Users\goeis\Documents\GA Stuff\DSI-working-folder\QG-System\checkpoints\t5-chkpt-v7.ckpt
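
The training log above can be replayed in a few lines to see why training stopped after epoch 4 and why the epoch 1 checkpoint was picked. This is a simplified sketch of patience-based early stopping, not Lightning's actual code; `run_early_stopping` is a hypothetical helper and the losses are copied from the log.

```python
def run_early_stopping(val_losses, patience=3):
    """Return (best_epoch, best_loss, last_epoch_run) for a list of validation losses."""
    best_epoch, best_loss, wait = None, float('inf'), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, wait = epoch, loss, 0
        else:
            wait += 1  # another epoch without improvement
            if wait >= patience:
                return best_epoch, best_loss, epoch
    return best_epoch, best_loss, len(val_losses) - 1

logged = [1.25404, 1.23136, 1.23399, 1.24462, 1.24307]  # val_loss for epochs 0 to 4
best_epoch, best_loss, last_epoch = run_early_stopping(logged)
print(best_epoch, best_loss, last_epoch)  # 1 1.23136 4
```

Epochs 2, 3 and 4 all fail to beat the epoch 1 loss, so the patience of 3 is used up at epoch 4.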


# Model Scoring

PyTorch Lightning writes its training logs in a format that TensorBoard can read, so we use TensorBoard to see how the loss changed over the course of training.

In [18]:
%load_ext tensorboard

In [19]:
%tensorboard --logdir lightning_logs/ --host localhost

Reusing TensorBoard on port 6006 (pid 3456), started 4 days, 3:38:24 ago. (Use '!kill 3456' to kill it.)

# Viewing Results

After training our model, it is time to view the results. When decoding the model output, there are a few important hyperparameters we can tune:
- beam search: keeps the `num_beams` most probable partial sequences at each step instead of only the single best next token. It is computationally more expensive but usually gives better results
- repetition/length penalty: penalizes repeated tokens and adjusts how beam scores scale with sequence length
- [temperature](https://github.com/huggingface/transformers/issues/2029): "controls the randomness of predictions by scaling the logits before applying softmax", which controls the "conservativeness" of the model in prediction. It only affects decoding when sampling is enabled
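
Two of these ideas can be illustrated with a toy example. The sketch below is not how `generate` is implemented; the next-token probabilities in `TOY_MODEL` are invented so that greedy decoding (one beam) and beam search (two beams) disagree, and `softmax_with_temperature` shows how a temperature below 1 sharpens a distribution.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical next-token probabilities, keyed by the tokens generated so far
TOY_MODEL = {
    (): {'A': 0.6, 'B': 0.4},
    ('A',): {'x': 0.55, 'y': 0.45},
    ('B',): {'x': 0.9, 'y': 0.1},
}

def beam_search(model, num_beams, steps):
    """Keep the num_beams highest-scoring prefixes at every step."""
    beams = [((), 0.0)]  # (prefix, summed log-probability)
    for _ in range(steps):
        candidates = [
            (prefix + (token,), score + math.log(prob))
            for prefix, score in beams
            for token, prob in model[prefix].items()
        ]
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0][0]

print(beam_search(TOY_MODEL, num_beams=1, steps=2))  # ('A', 'x')
print(beam_search(TOY_MODEL, num_beams=2, steps=2))  # ('B', 'x')
print(softmax_with_temperature([2.0, 1.0, 0.0], 1.0))
print(softmax_with_temperature([2.0, 1.0, 0.0], 0.6))
```

With one beam the search commits to `A` and ends with probability 0.33, while two beams keep `B` alive long enough to find the sequence with probability 0.36.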

As a general principle, we tuned these hyperparameters for generalizability, so that more correct question-answer pairs can be generated for unseen data. We then score the generated questions using the BLEU score from `nltk` and the cosine similarity score from `spaCy`.
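
For reference, here is a small sketch of the inputs these two metrics work on. To our understanding, `sentence_bleu` takes a list of tokenized reference sentences followed by a tokenized hypothesis, and spaCy's `similarity` is a cosine similarity between averaged word vectors. The helpers `bleu_inputs` and `cosine_similarity` are hypothetical, the whitespace tokenization is a simplification, and the two-dimensional vectors are made up.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def bleu_inputs(reference, hypothesis):
    """Shape two sentences as (list of tokenized references, tokenized hypothesis)."""
    return [reference.split()], hypothesis.split()

references, hypothesis = bleu_inputs(
    'Who decided not to come visit the country in 2013?',
    'Who chose not to visit Kenya during his 2013 African trip?',
)
print(references)
print(hypothesis)
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # vectors pointing the same way
```

BLEU rewards exact n-gram overlap between the token lists, whereas cosine similarity can stay high for paraphrases that share few exact words.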

In [13]:
# best_model_dir = 'checkpoints/t5-chkpt-v2.ckpt'
# best_model = model.load_from_checkpoint(best_model_dir)

best_model = T5Model.load_from_checkpoint(callback.best_model_path)
best_model.freeze()
# best_model.eval()

In [99]:
def generate(model: T5Model, answer:str, context:str) -> str:
    source_encoding = t5tokenizer(
        f"{answer} {SEP_TOKEN} {context}",
        max_length=512,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        add_special_tokens=True,
        return_tensors='pt'
    )

    generated_ids=model.model.generate(
        input_ids=source_encoding['input_ids'],
        attention_mask=source_encoding['attention_mask'],
        num_beams=20,
        max_length=126,
        repetition_penalty=2.5,
        length_penalty=0.8,
        temperature=0.6,
        early_stopping=True,
        use_cache=True
    )

    # keep special tokens so show_result can split on the <...> markers
    preds = [
        t5tokenizer.decode(generated_id, skip_special_tokens=False, clean_up_tokenization_spaces=True)
        for generated_id in generated_ids
    ]

    return ''.join(preds)

In [15]:
is_using_gpu = spacy.require_gpu()
if is_using_gpu:
    torch.set_default_tensor_type("torch.cuda.FloatTensor")
    print('spaCy is using GPU!')
else:
    print('spaCy is not using GPU...')
    
nlp = spacy.load("en_core_web_md")

spaCy is using GPU!


In [45]:
def printBold(string):
    display(Markdown('**' + string + '**'))


def show_result(generated:str, answer:str, context:str, original_question:str=''):

    regex = r"(?<=>)(.*?)(?=<)"
    matches = re.findall(regex, generated)
    matches[1] = matches[1][5:]
    final = {cat: match.strip() for cat, match in zip(['Answer', 'Question'], matches)}
    
    printBold('Context')
    print(context)
    printBold('Answer')
    print(answer)
    printBold('Generated Answer/Question')
    print(final)
    if original_question:
        printBold('Original Question')
        print(original_question)
        gen = nlp(matches[1])
        ori = nlp(original_question)
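        # NOTE: sentence_bleu expects (list of tokenized references, tokenized hypothesis);
        # raw strings are iterated character by character, so this is not a word-level BLEU
        bleu_score = sentence_bleu(matches[1], original_question, smoothing_function=SmoothingFunction().method5)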
        cs_score = ori.similarity(gen)
        printBold('Scores')
        print(f"BLEU: {bleu_score}")
        print(f'Cosine Similarity: {cs_score}')
        return bleu_score, cs_score
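
The parsing step in `show_result` can be tried in isolation. The sketch below reuses the same regex and slice; the `decoded` string is a hypothetical example of what the raw model output might look like (the notebook never prints it, so its exact format is an assumption), chosen so that the five-character slice removes a leftover ` sep>` fragment.

```python
import re

def split_generated(generated):
    """Split a decoded string into answer and question, as show_result does."""
    # capture the text between each '>' and the next '<'
    matches = re.findall(r"(?<=>)(.*?)(?=<)", generated)
    matches[1] = matches[1][5:]  # drop the first five characters of the second match
    return {cat: match.strip() for cat, match in zip(['Answer', 'Question'], matches)}

# hypothetical raw output: the real format is not shown in this notebook
decoded = "<pad> stale<unk> sep> The issue has been automatically marked as what?</s>"
print(split_generated(decoded))
```

If the decoded format differs from this assumption, the hard-coded slice would need to change with it.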

## Validation data

We will start with 5 sample questions from the validation set to score the model predictions. We will then provide 5 passages from different contexts to see how the model performs.

In [17]:
model.to(device);

### Unmasked

In [79]:
avg_bleu, avg_cs = [], []
np.random.seed(12)
for i in range(5):
    printBold(f"Question {i+1}")
    sample_question = val.iloc[np.random.randint(len(val)-1)]
    generated = generate(best_model, sample_question['answer'], sample_question['context'])
    blue, cs = show_result(generated, sample_question['answer'], sample_question['context'], sample_question['question'])
    avg_bleu.append(blue) # to avoid clashes
    avg_cs.append(cs)
    print('*'*89)

**Question 1**

**Context**

With International Criminal Court trial dates in 2013 for both President Kenyatta and Deputy President William Ruto related to the 2007 election aftermath, US President Barack Obama chose not to visit the country during his mid-2013 African trip. Later in the summer, Kenyatta visited China at the invitation of President Xi Jinping after a stop in Russia and not having visited the United States as president. In July 2015 Obama visited Kenya, as the first American president to visit the country while in office.


**Answer**

US President Barack Obama


**Generated Answer/Question**

{'Answer': 'US President Barack Obama', 'Question': 'Who chose not to visit Kenya during his 2013 African trip?'}


**Original Question**

Who decided not to come visit the country in 2013?


**Scores**

BLEU: 0.11547005383792518
Cosine Similarity: 0.9212935566902161
**************************************************************************************************************************************


**Question 2**

**Context**

One of the first known experiments on the relationship between combustion and air was conducted by the 2nd century BCE Greek writer on mechanics, Philo of Byzantium. In his work Pneumatica, Philo observed that inverting a vessel over a burning candle and surrounding the vessel's neck with water resulted in some water rising into the neck. Philo incorrectly surmised that parts of the air in the vessel were converted into the classical element fire and thus were able to escape through pores in the glass. Many centuries later Leonardo da Vinci built on Philo's work by observing that a portion of air is consumed during combustion and respiration.


**Answer**

fire


**Generated Answer/Question**

{'Answer': 'fire', 'Question': 'What did Philo believe parts of the vessel were converted into?'}


**Original Question**

What did Philo incorrectly assume that the air became?


**Scores**

BLEU: 0.10454078948289519
Cosine Similarity: 0.9039292335510254
**************************************************************************************************************************************


**Question 3**

**Context**

During the mid-Eocene, it is believed that the drainage basin of the Amazon was split along the middle of the continent by the Purus Arch. Water on the eastern side flowed toward the Atlantic, while to the west water flowed toward the Pacific across the Amazonas Basin. As the Andes Mountains rose, however, a large basin was created that enclosed a lake; now known as the Solimões Basin. Within the last 5–10 million years, this accumulating water broke through the Purus Arch, joining the easterly flow toward the Atlantic.


**Answer**

During the mid-Eocene, it is believed that the drainage basin of the Amazon was split along the middle of the continent by the Purus Arch.


**Generated Answer/Question**

{'Answer': 'During the mid-Eocene, it is believed that the drainage basin of the Amazon was split along the middle of the continent by the Purus Arch.', 'Question': 'What happened to the drainage basin?'}


**Original Question**

In which point did the drainage basin of the Amazon split?


**Scores**

BLEU: 0.09733107986338517
Cosine Similarity: 0.9272109866142273
**************************************************************************************************************************************


**Question 4**

**Context**

Pope Leo X was used to reformers and heretics, and he responded slowly, "with great care as is proper." Over the next three years he deployed a series of papal theologians and envoys against Luther, which served only to harden the reformer's anti-papal theology. First, the Dominican theologian Sylvester Mazzolini drafted a heresy case against Luther, whom Leo then summoned to Rome. The Elector Frederick persuaded the pope to have Luther examined at Augsburg, where the Imperial Diet was held. There, in October 1518, under questioning by papal legate Cardinal Cajetan Luther stated that he did not consider the papacy part of the biblical Church because historistical interpretation of Bible prophecy concluded that the papacy was the Antichrist. The prophecies concerning the Antichrist soon became the center of controversy. The hearings degenerated into a shouting match. More than his writing the 95 Theses, Luther's confrontation with the church cast him as an enemy of the pope. Cajetan's o

**Answer**

arrest


**Generated Answer/Question**

{'Answer': 'arrest', 'Question': "What did Cajetan's original instructions consist of?"}


**Original Question**

What were the papal legate's orders from the Pope?


**Scores**

BLEU: 0.10264004785593345
Cosine Similarity: 0.7661088109016418
**************************************************************************************************************************************


**Question 5**

**Context**

The smaller galleries cover Korea, the Himalayan kingdoms and South East Asia. Korean displays include green-glazed ceramics, silk embroideries from officials' robes and gleaming boxes inlaid with mother-of-pearl made between 500 AD and 2000. Himalayan items include important early Nepalese bronze sculptures, repoussé work and embroidery. Tibetan art from the 14th to the 19th century is represented by notable 14th- and 15th-century religious images in wood and bronze, scroll paintings and ritual objects. Art from Thailand, Burma, Cambodia, Indonesia and Sri Lanka in gold, silver, bronze, stone, terracotta and ivory represents these rich and complex cultures, the displays span the 6th to 19th centuries. Refined Hindu and Buddhist sculptures reflect the influence of India; items on show include betel-nut cutters, ivory combs and bronze palanquin hooks.


**Answer**

Sri Lanka


**Generated Answer/Question**

{'Answer': 'Sri Lanka', 'Question': 'In gold, silver, bronze, stone, terracotta and ivory, what other country has art from?'}


**Original Question**

Which South Asian island nation is represented in the V&A collection?


**Scores**

BLEU: 0.09204134726211426
Cosine Similarity: 0.7634811401367188
**************************************************************************************************************************************


In [52]:
printBold('Average BLEU Score')
print(f"{sum(avg_bleu)/len(avg_bleu)}")
printBold('Average Cosine Similarity Score')
print(f"{sum(avg_cs)/len(avg_cs)}")

**Average BLEU Score**

0.10240466366045065


**Average Cosine Similarity Score**

0.8564047455787659


### Masked

In [86]:
avg_bleu, avg_cs = [], []
np.random.seed(12)
for i in range(5):
    printBold(f"Question {i+1}")
    sample_question = val.iloc[np.random.randint(len(val)-1)]
    generated = generate(best_model, '[MASK]', sample_question['context'])
    blue, cs = show_result(generated, '[MASK]', sample_question['context'], sample_question['question'])
    avg_bleu.append(blue) # to avoid clashes
    avg_cs.append(cs)
    print('**************************************************************************************************************************************')

**Question 1**

**Context**

With International Criminal Court trial dates in 2013 for both President Kenyatta and Deputy President William Ruto related to the 2007 election aftermath, US President Barack Obama chose not to visit the country during his mid-2013 African trip. Later in the summer, Kenyatta visited China at the invitation of President Xi Jinping after a stop in Russia and not having visited the United States as president. In July 2015 Obama visited Kenya, as the first American president to visit the country while in office.


**Answer**

[MASK]


**Generated Answer/Question**

{'Answer': 'President Xi Jinping', 'Question': 'Who was the first American president to visit Kenya?'}


**Original Question**

Who decided not to come visit the country in 2013?


**Scores**

BLEU: 0.1052060490523318
Cosine Similarity: 0.8921566009521484
**************************************************************************************************************************************


**Question 2**

**Context**

One of the first known experiments on the relationship between combustion and air was conducted by the 2nd century BCE Greek writer on mechanics, Philo of Byzantium. In his work Pneumatica, Philo observed that inverting a vessel over a burning candle and surrounding the vessel's neck with water resulted in some water rising into the neck. Philo incorrectly surmised that parts of the air in the vessel were converted into the classical element fire and thus were able to escape through pores in the glass. Many centuries later Leonardo da Vinci built on Philo's work by observing that a portion of air is consumed during combustion and respiration.


**Answer**

[MASK]


**Generated Answer/Question**

{'Answer': 'Philo of Byzantium', 'Question': 'Who conducted one of the first known experiments on the relationship between combustion and air?'}


**Original Question**

What did Philo incorrectly assume that the air became?


**Scores**

BLEU: 0.10691671651659736
Cosine Similarity: 0.824653148651123
**************************************************************************************************************************************


**Question 3**

**Context**

During the mid-Eocene, it is believed that the drainage basin of the Amazon was split along the middle of the continent by the Purus Arch. Water on the eastern side flowed toward the Atlantic, while to the west water flowed toward the Pacific across the Amazonas Basin. As the Andes Mountains rose, however, a large basin was created that enclosed a lake; now known as the Solimões Basin. Within the last 5–10 million years, this accumulating water broke through the Purus Arch, joining the easterly flow toward the Atlantic.


**Answer**

[MASK]


**Generated Answer/Question**

{'Answer': 'mid-Eocene', 'Question': 'When was the drainage basin of the Amazon split?'}


**Original Question**

In which point did the drainage basin of the Amazon split?


**Scores**

BLEU: 0.11060349984475588
Cosine Similarity: 0.9798660278320312
**************************************************************************************************************************************


**Question 4**

**Context**

Pope Leo X was used to reformers and heretics, and he responded slowly, "with great care as is proper." Over the next three years he deployed a series of papal theologians and envoys against Luther, which served only to harden the reformer's anti-papal theology. First, the Dominican theologian Sylvester Mazzolini drafted a heresy case against Luther, whom Leo then summoned to Rome. The Elector Frederick persuaded the pope to have Luther examined at Augsburg, where the Imperial Diet was held. There, in October 1518, under questioning by papal legate Cardinal Cajetan Luther stated that he did not consider the papacy part of the biblical Church because historistical interpretation of Bible prophecy concluded that the papacy was the Antichrist. The prophecies concerning the Antichrist soon became the center of controversy. The hearings degenerated into a shouting match. More than his writing the 95 Theses, Luther's confrontation with the church cast him as an enemy of the pope. Cajetan's o

**Answer**

[MASK]


**Generated Answer/Question**

{'Answer': 'Pope Leo X', 'Question': 'Who was used to reformers and heretics?'}


**Original Question**

What were the papal legate's orders from the Pope?


**Scores**

BLEU: 0.10007404665953515
Cosine Similarity: 0.744892418384552
**************************************************************************************************************************************


**Question 5**

**Context**

The smaller galleries cover Korea, the Himalayan kingdoms and South East Asia. Korean displays include green-glazed ceramics, silk embroideries from officials' robes and gleaming boxes inlaid with mother-of-pearl made between 500 AD and 2000. Himalayan items include important early Nepalese bronze sculptures, repoussé work and embroidery. Tibetan art from the 14th to the 19th century is represented by notable 14th- and 15th-century religious images in wood and bronze, scroll paintings and ritual objects. Art from Thailand, Burma, Cambodia, Indonesia and Sri Lanka in gold, silver, bronze, stone, terracotta and ivory represents these rich and complex cultures, the displays span the 6th to 19th centuries. Refined Hindu and Buddhist sculptures reflect the influence of India; items on show include betel-nut cutters, ivory combs and bronze palanquin hooks.


**Answer**

[MASK]


**Generated Answer/Question**

{'Answer': 'gleaming boxes', 'Question': 'What was made between 500 AD and 2000?'}


**Original Question**

Which South Asian island nation is represented in the V&A collection?


**Scores**

BLEU: 0.08460366263487269
Cosine Similarity: 0.7626774907112122
**************************************************************************************************************************************


In [54]:
printBold('Average BLEU Score')
print(f"{sum(avg_bleu)/len(avg_bleu)}")
printBold('Average Cosine Similarity Score')
print(f"{sum(avg_cs)/len(avg_cs)}")

**Average BLEU Score**

0.10148079494161859


**Average Cosine Similarity Score**

0.8408491373062134


## Testing on new data

### Unmasked

5 passages:
1. a simple passage
2. [Bible (ESV)](https://www.biblegateway.com/passage/?search=Galatians+5&version=ESV)
3. [Jimi Hendrix Wikipedia article](https://en.wikipedia.org/wiki/jimi_hendrix)
4. [de Broglie's Nobel lecture](https://www.nobelprize.org/uploads/2016/04/broglie-lecture.pdf) (converted to txt, with equations removed)
5. [Zhuangzi](https://scholarworks.iu.edu/dspace/bitstream/handle/2022/23427/Zhuangzi.pdf?sequence=2&isAllowed=y)

In [87]:
context = 'This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.'
answer = 'stale'
answer_2 = '[MASK]'

generated = generate(best_model, answer, context)
show_result(generated, answer, context)

**Context**

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.


**Answer**

stale


**Generated Answer/Question**

{'Answer': 'stale', 'Question': 'The issue has been automatically marked as what?'}


In [88]:
gal = 'For you were called to freedom, brothers. Only do not use your freedom as an opportunity for the flesh, but through love serve one another. For the whole law is fulfilled in one word: "You shall love your neighbor as yourself." But if you bite and devour one another, watch out that you are not consumed by one another.'
gal_ans = 'you shall love your neighbor as yourself'
gal_ans_2 = '[MASK]'

generated = generate(best_model, gal_ans, gal)
show_result(generated, gal_ans, gal)

**Context**

For you were called to freedom, brothers. Only do not use your freedom as an opportunity for the flesh, but through love serve one another. For the whole law is fulfilled in one word: "You shall love your neighbor as yourself." But if you bite and devour one another, watch out that you are not consumed by one another.


**Answer**

you shall love your neighbor as yourself


**Generated Answer/Question**

{'Answer': 'you shall love your neighbor as yourself', 'Question': 'What is the whole law fulfilled in one word?'}


In [89]:
jimi = "Born in Seattle, Washington, Hendrix began playing guitar at the age of 15. In 1961, he enlisted in the US Army, but was discharged the following year. Soon afterward, he moved to Clarksville then Nashville, Tennessee, and began playing gigs on the chitlin' circuit, earning a place in the Isley Brothers' backing band and later with Little Richard, with whom he continued to work through mid-1965. He then played with Curtis Knight and the Squires before moving to England in late 1966 after bassist Chas Chandler of the Animals became his manager. Within months, Hendrix had earned three UK top ten hits with the Jimi Hendrix Experience: 'Hey Joe', 'Purple Haze', and 'The Wind Cries Mary'. He achieved fame in the US after his performance at the Monterey Pop Festival in 1967, and in 1968 his third and final studio album, Electric Ladyland, reached number one in the US. The double LP was Hendrix's most commercially successful release and his first and only number one album. The world's highest-paid performer, he headlined the Woodstock Festival in 1969 and the Isle of Wight Festival in 1970 before his accidental death in London from barbiturate-related asphyxia on September 18, 1970."
jimi_ans = "'Hey Joe', 'Purple Haze', and 'The Wind Cries Mary'"
jimi_ans_2 = '[MASK]'


generated = generate(best_model, jimi_ans, jimi)
show_result(generated, jimi_ans, jimi)

**Context**

Born in Seattle, Washington, Hendrix began playing guitar at the age of 15. In 1961, he enlisted in the US Army, but was discharged the following year. Soon afterward, he moved to Clarksville then Nashville, Tennessee, and began playing gigs on the chitlin' circuit, earning a place in the Isley Brothers' backing band and later with Little Richard, with whom he continued to work through mid-1965. He then played with Curtis Knight and the Squires before moving to England in late 1966 after bassist Chas Chandler of the Animals became his manager. Within months, Hendrix had earned three UK top ten hits with the Jimi Hendrix Experience: 'Hey Joe', 'Purple Haze', and 'The Wind Cries Mary'. He achieved fame in the US after his performance at the Monterey Pop Festival in 1967, and in 1968 his third and final studio album, Electric Ladyland, reached number one in the US. The double LP was Hendrix's most commercially successful release and his first and only number one album. The world's highest

**Answer**

'Hey Joe', 'Purple Haze', and 'The Wind Cries Mary'


**Generated Answer/Question**

{'Answer': "'Hey Joe', 'Purple Haze', and 'The Wind Cries Mary'", 'Question': 'What three hits did Hendrix have in the UK?'}


In [90]:
debroglie = 'The existence of a granular structure of light and of other radiations was confirmed by the discovery of the photoelectric effect. If a beam of light or of X-rays falls on a piece of matter, the latter will emit rapidly moving electrons. The kinetic energy of these electrons increases linearly with the frequency of the incident radiation and is independent of its intensity. This phenomenon can be explained simply by assuming that the radiation is composed of quanta hv capable of yielding all their energy to an electron of the irradiated body: one is thus led to the theory of light quanta proposed by Einstein in 1905 and which is, after all, a reversion to Newton’s corpuscular theory, completed by the relation for the proportionality between the energy of the corpuscles and the frequency.'
db_ans = 'granular structure of light'
db_ans_2 = '[MASK]'

generated = generate(best_model, db_ans, debroglie)
show_result(generated, db_ans, debroglie)

**Context**

The existence of a granular structure of light and of other radiations was confirmed by the discovery of the photoelectric effect. If a beam of light or of X-rays falls on a piece of matter, the latter will emit rapidly moving electrons. The kinetic energy of these electrons increases linearly with the frequency of the incident radiation and is independent of its intensity. This phenomenon can be explained simply by assuming that the radiation is composed of quanta hv capable of yielding all their energy to an electron of the irradiated body: one is thus led to the theory of light quanta proposed by Einstein in 1905 and which is, after all, a reversion to Newton’s corpuscular theory, completed by the relation for the proportionality between the energy of the corpuscles and the frequency.


**Answer**

granular structure of light


**Generated Answer/Question**

{'Answer': 'granular structure of light', 'Question': 'What was confirmed by the photoelectric effect?'}


In [91]:
zhuangzi = 'The god of the Southern Sea was Swift; the god of the Northern Sea was Sudden. The god of the center was Hundun. Swift and Sudden would often meet in the land of Hundun, and Hundun would host them with great courtesy. Swift and Sudden made a plan to return Hundun’s generosity. “All men have seven orifices,” they said, “so that they can see and hear, eat and breathe. Hundun alone has none. Why don’t we bore these for him?” Each day, they bored one orifice, and on the seventh day, Hundun died.'
zhuangzi_ans = 'none'
zhuangzi_ans_2 = '[MASK]'

generated = generate(best_model, zhuangzi_ans, zhuangzi)
show_result(generated, zhuangzi_ans, zhuangzi)

**Context**

The god of the Southern Sea was Swift; the god of the Northern Sea was Sudden. The god of the center was Hundun. Swift and Sudden would often meet in the land of Hundun, and Hundun would host them with great courtesy. Swift and Sudden made a plan to return Hundun’s generosity. “All men have seven orifices,” they said, “so that they can see and hear, eat and breathe. Hundun alone has none. Why don’t we bore these for him?” Each day, they bored one orifice, and on the seventh day, Hundun died.


**Answer**

none


**Generated Answer/Question**

{'Answer': 'none', 'Question': 'How many orifices did Hundun have?'}
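
Each cell above repeats the same two calls with different variables. A minimal sketch of running every passage from one loop follows, either with the answer given or with the `[MASK]` placeholder used in the next section. `run_passages` and `echo_generate` are hypothetical names. `echo_generate` is only a stand-in that mimics the `generate(model, answer, context)` call shape and the printed output shape seen in this notebook, so the block runs without the trained model.

```python
MASK = "[MASK]"


def run_passages(generate_fn, model, passages, masked=False):
    """Call generate_fn once per (context, answer) pair and collect the outputs.

    With masked=True the answer is replaced by the '[MASK]' placeholder,
    mirroring the *_2 variables defined in the cells above.
    """
    results = []
    for context, answer in passages:
        prompt_answer = MASK if masked else answer
        results.append(generate_fn(model, prompt_answer, context))
    return results


def echo_generate(model, answer, context):
    """Stand-in for the notebook's generate(): returns a dict of the same shape."""
    return {"Answer": answer, "Question": f"<question about {len(context.split())} words>"}


passages = [
    ("This issue has been automatically marked as stale because it has not had "
     "recent activity. It will be closed if no further activity occurs.", "stale"),
    ("Hundun alone has none.", "none"),
]

for mode in (False, True):
    for result in run_passages(echo_generate, None, passages, masked=mode):
        print(mode, result)
```

With the notebook's own objects, the call would look like `run_passages(generate, best_model, [(context, answer), (gal, gal_ans), (jimi, jimi_ans), (debroglie, db_ans), (zhuangzi, zhuangzi_ans)])`, assuming `generate` keeps the signature used above.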


### Masked

In [102]:
generated = generate(best_model, answer_2, context)
show_result(generated, answer_2, context)

**Context**

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.


**Answer**

[MASK]


**Generated Answer/Question**

{'Answer': 'it has not had recent activity', 'Question': 'Why has this issue been marked as stale?'}


In [93]:
generated = generate(best_model, gal_ans_2, gal)
show_result(generated, gal_ans_2, gal)

**Context**

For you were called to freedom, brothers. Only do not use your freedom as an opportunity for the flesh, but through love serve one another. For the whole law is fulfilled in one word: "You shall love your neighbor as yourself." But if you bite and devour one another, watch out that you are not consumed by one another.


**Answer**

[MASK]


**Generated Answer/Question**

{'Answer': 'if you bite and devour one another, watch out that you are not consumed by one another', 'Question': 'What happens when you bite and devour one another?'}


In [100]:
generated = generate(best_model, jimi_ans_2, jimi)
show_result(generated, jimi_ans_2, jimi)

**Context**

Born in Seattle, Washington, Hendrix began playing guitar at the age of 15. In 1961, he enlisted in the US Army, but was discharged the following year. Soon afterward, he moved to Clarksville then Nashville, Tennessee, and began playing gigs on the chitlin' circuit, earning a place in the Isley Brothers' backing band and later with Little Richard, with whom he continued to work through mid-1965. He then played with Curtis Knight and the Squires before moving to England in late 1966 after bassist Chas Chandler of the Animals became his manager. Within months, Hendrix had earned three UK top ten hits with the Jimi Hendrix Experience: 'Hey Joe', 'Purple Haze', and 'The Wind Cries Mary'. He achieved fame in the US after his performance at the Monterey Pop Festival in 1967, and in 1968 his third and final studio album, Electric Ladyland, reached number one in the US. The double LP was Hendrix's most commercially successful release and his first and only number one album. The world's highest

**Answer**

[MASK]


**Generated Answer/Question**

{'Answer': '1969', 'Question': 'In what year did Hendrix headline the Isle of Wight Festival?'}


In [101]:
generated = generate(best_model, db_ans_2, debroglie)
show_result(generated, db_ans_2, debroglie)

**Context**

The existence of a granular structure of light and of other radiations was confirmed by the discovery of the photoelectric effect. If a beam of light or of X-rays falls on a piece of matter, the latter will emit rapidly moving electrons. The kinetic energy of these electrons increases linearly with the frequency of the incident radiation and is independent of its intensity. This phenomenon can be explained simply by assuming that the radiation is composed of quanta hv capable of yielding all their energy to an electron of the irradiated body: one is thus led to the theory of light quanta proposed by Einstein in 1905 and which is, after all, a reversion to Newton’s corpuscular theory, completed by the relation for the proportionality between the energy of the corpuscles and the frequency.


**Answer**

[MASK]


**Generated Answer/Question**

{'Answer': 'quantum hv capable of yielding all their energy to an electron of the irradiated body', 'Question': 'What is the theory of light quanta?'}


In [96]:
generated = generate(best_model, zhuangzi_ans_2, zhuangzi)
show_result(generated, zhuangzi_ans_2, zhuangzi)

**Context**

The god of the Southern Sea was Swift; the god of the Northern Sea was Sudden. The god of the center was Hundun. Swift and Sudden would often meet in the land of Hundun, and Hundun would host them with great courtesy. Swift and Sudden made a plan to return Hundun’s generosity. “All men have seven orifices,” they said, “so that they can see and hear, eat and breathe. Hundun alone has none. Why don’t we bore these for him?” Each day, they bored one orifice, and on the seventh day, Hundun died.


**Answer**

[MASK]


**Generated Answer/Question**

{'Answer': 'Hundun', 'Question': 'Who was the god of the Northern Sea?'}
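
In masked mode the model produces the answer itself, so one cheap sanity check is whether that answer appears verbatim in the passage. The sketch below is an assumed helper (`answer_in_context` is not part of the notebook) applied to two of the generated answers shown above. It only tests string containment after case and whitespace normalization, and says nothing about whether the question matches the answer.

```python
import re


def normalize(text):
    """Lowercase and collapse runs of whitespace so matching ignores layout."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def answer_in_context(answer, context):
    """Return True if the normalized answer is a substring of the normalized context."""
    return normalize(answer) in normalize(context)


stale_context = (
    "This issue has been automatically marked as stale because it has not had "
    "recent activity. It will be closed if no further activity occurs."
)
debroglie_sentence = (
    "This phenomenon can be explained simply by assuming that the radiation is "
    "composed of quanta hv capable of yielding all their energy to an electron "
    "of the irradiated body"
)

# Generated answers copied from the masked outputs above.
print(answer_in_context("it has not had recent activity", stale_context))  # True
print(answer_in_context(
    "quantum hv capable of yielding all their energy to an electron of the irradiated body",
    debroglie_sentence,
))  # False: the passage says "quanta", the model wrote "quantum"
```

A passing check is necessary but not sufficient. The `1969` and `Hundun` answers above are both substrings of their passages, yet the generated questions do not pair with them: the passage gives 1970 for the Isle of Wight Festival and names Sudden as the god of the Northern Sea.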
