# GPT-2 BLEU Calculator

The following Jupyter Notebook uses the GPT-2 trained models to calculate BLEU scores for the report.  
In order to run this Notebook, the two models need to be in the same folder as this notebook. These models are generated using 
two GPT-2 scipts. These scripts will take under two hours to run. Furthermore, the correct formatted dataset must also be in the same folder.

## Imports

In [46]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import re, os, csv, unicodedata, codecs, itertools, requests, random, time, datetime
import pandas as pd
import numpy as np
from io import open
from itertools import compress
import torch
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler
from nltk.translate.bleu_score import sentence_bleu
from tqdm import tqdm

## Load the data

In [None]:
train_lines = [line for line in open("formatted_movie_QR_lines_train.txt", "r")]
train_pair_lines = train_lines
train_pair_lines = [re.sub(r"([\n])", r"", train_pair_lines[i]) for i, line in enumerate(train_pair_lines)]
train_lines = [re.sub(r"([\t\n])", r"", train_lines[i]) for i, line in enumerate(train_lines)]

test_lines = [line for line in open("formatted_movie_QR_lines_test.txt", "r")]
test_pair_lines = test_lines
test_pair_lines = [re.sub(r"([\n])", r"", test_pair_lines[i]) for i, line in enumerate(test_pair_lines)]
test_lines = [re.sub(r"([\t\n])", r"", test_pair_lines[i]) for i, line in enumerate(test_pair_lines)]

In [None]:
train_prompts = []
train_refs = []
test_prompts = []
test_refs = []

for i in tqdm(range(100)):
    train_split = train_pair_lines[i].split("\t")
    test_split = test_pair_lines[i].split("\t")

    train_prompts.append(train_split[0])
    train_refs.append(train_split[1])

    test_prompts.append(test_split[0])
    test_refs.append(test_split[1])

## Entire Data Model

In [74]:
# Load the model
model_path = "GPT2_entire_dataset/"

model = GPT2LMHeadModel.from_pretrained(output_dir)
tokenizer = GPT2Tokenizer.from_pretrained(output_dir)

In [75]:
# Train BLEUS
train_bleus = []
for i in tqdm(range(100)):
    prompt = "<|startoftext|>" + train_prompts[i]
    prompt = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)

    generated = model.generate(
        prompt,
        do_sample = True,
        top_k = 50,
        max_length = 300,
        top_p = 0.95,
        num_return_sequences = 1
    )

    candidate = tokenizer.decode(generated[0], skip_special_tokens = True)
    bleu = sentence_bleu(train_refs, candidate)
    train_bleus.append(bleu)

# Test BLEUS
test_bleus = []
for i in tqdm(range(100)):
    prompt = "<|startoftext|>" + test_prompts[i]
    prompt = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)

    generated = model.generate(
        prompt,
        do_sample = True,
        top_k = 50,
        max_length = 300,
        top_p = 0.95,
        num_return_sequences = 1
    )

    candidate = tokenizer.decode(generated[0], skip_special_tokens = True)
    bleu = sentence_bleu(test_refs, candidate)
    test_bleus.append(bleu)

# Average the scores
entire_train_bleus = np.mean(train_bleus)
entire_test_bleus = np.mean(test_bleus)

2%|███▏      | 32/100 [00:10<00:19,  3.56it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 33%|███▎      | 33/100 [00:10<00:20,  3.28it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 34%|███▍      | 34/100 [00:10<00:21,  3.04it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 35%|███▌      | 35/100 [00:11<00:18,  3.49it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 36%|███▌      | 36/100 [00:11<00:21,  2.98it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 37%|███▋      | 37/100 [00:11<00:20,  3.13it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 38%|███▊      | 38/100 [00:12<00:19,  3.12it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 39%|███▉      | 39/100 [00:12<00:18,  3.28it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 40%|████      | 40/100 [00:12<00:18,  3.2

## Formatted Data Model

In [71]:
# Load the model
model_path = "GPT2_dialogue_model/"

model = GPT2LMHeadModel.from_pretrained(output_dir)
tokenizer = GPT2Tokenizer.from_pretrained(output_dir)

In [72]:
# Train BLEUS
train_bleus = []
for i in tqdm(range(100)):
    prompt = "<|startoftext|>" + train_prompts[i]
    prompt = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)

    generated = model.generate(
        prompt,
        do_sample = True,
        top_k = 50,
        max_length = 300,
        top_p = 0.95,
        num_return_sequences = 1
    )

    candidate = tokenizer.decode(generated[0], skip_special_tokens = True)
    bleu = sentence_bleu(train_refs, candidate)
    train_bleus.append(bleu)

# Test BLEUS
test_bleus = []
for i in tqdm(range(100)):
    prompt = "<|startoftext|>" + test_prompts[i]
    prompt = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)

    generated = model.generate(
        prompt,
        do_sample = True,
        top_k = 50,
        max_length = 300,
        top_p = 0.95,
        num_return_sequences = 1
    )

    candidate = tokenizer.decode(generated[0], skip_special_tokens = True)
    bleu = sentence_bleu(test_refs, candidate)
    test_bleus.append(bleu)

# Average the scores
formatted_train_bleus = np.mean(train_bleus)
formatted_test_bleus = np.mean(test_bleus)

:11<00:19,  3.60it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 32%|███▏      | 32/100 [00:11<00:21,  3.12it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 33%|███▎      | 33/100 [00:11<00:20,  3.24it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 34%|███▍      | 34/100 [00:11<00:18,  3.66it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 35%|███▌      | 35/100 [00:12<00:17,  3.79it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 36%|███▌      | 36/100 [00:12<00:14,  4.30it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 37%|███▋      | 37/100 [00:12<00:14,  4.23it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 38%|███▊      | 38/100 [00:12<00:13,  4.71it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 39%|███▉      | 39/100 [00:13<00:17,  3.56it/s]Setting `pad_token_

In [77]:
print(f"Entire: train - {entire_train_bleus}; test - {entire_test_bleus}")
print(f"Formatted: train - {formatted_train_bleus}; test - {formatted_test_bleus}")

Entire: train - 0.6493910897921685; test - 0.6458795430496364
Formatted: train - 0.6560220578207043; test - 0.6513421845508242
