# GPT2 Zero-Shot CommonsenseQA Experiment
**Last Edited On: 5/30/2023**<br>
**Last Edited By: Kyle Williams**

**Motivation:** This file contains a script that measures the zero-shot performance of GPT2 on CommonsenseQA. Our performance metric is accuracy: a generated answer counts as correct if it contains the text of the correct multiple-choice answer. Because we don't have the test set labels, we report accuracy on the development set instead.
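As a rough illustration of the metric described above, the sketch below scores generated answers against gold answers by substring containment. The names `is_correct` and `accuracy` and the toy strings are hypothetical, and the sketch assumes the generated text has already been separated from the prompt.

```python
def is_correct(generated: str, answer: str) -> bool:
    """Return True when the gold answer text appears in the generated text."""
    return answer in generated


def accuracy(generated_answers, gold_answers) -> float:
    """Fraction of examples whose generated text contains the gold answer."""
    if len(generated_answers) != len(gold_answers):
        raise ValueError("need one gold answer per generated answer")
    if not gold_answers:
        return 0.0
    hits = sum(is_correct(g, a) for g, a in zip(generated_answers, gold_answers))
    return hits / len(gold_answers)


# Toy data (made up for illustration, not taken from CommonsenseQA)
generated = ["probably the bank", "a library", "in the park"]
gold = ["bank", "school", "park"]
print(accuracy(generated, gold))
```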

**Resources:**
- [A GPT2 Fine-tuning tutorial](https://colab.research.google.com/drive/1QIMbIbkDo7TAiNB2xoI5L53dnmNbYV6h#scrollTo=NKGBoVwuhM4H)
- [Fine-tuning GPT2 tricks](https://github.com/falloutdurham/beginners-pytorch-deep-learning/blob/master/chapter9/Chapter9.5.ipynb)

In [12]:
'''
Necessary Imports

TODO: I added all imports from the resource above, but we aren't fine-tuning in this notebook, so some can be deleted
'''
import pickle
import torch 
import os

from torch.utils.data import Dataset, DataLoader
from transformers import GPT2Tokenizer, GPT2LMHeadModel

In [13]:
'''
Load the GPT2 model and its corresponding tokenizer.
Then move the model to the GPU if one is available (otherwise it stays on the CPU).

TODO: These should be the same tokenizer used to create the files in /data/tensor_splits. For now, I'm just 
      copy-pasting the code, but there's gotta be a less error-prone way to make changes if needed. 
'''
tokenizer = GPT2Tokenizer.from_pretrained('gpt2-medium')
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.padding_side='left'
model = GPT2LMHeadModel.from_pretrained('gpt2-medium',
                                        pad_token_id=tokenizer.eos_token_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
print(device)

cpu
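One way to address the TODO about keeping the tokenizer settings in sync is to put them in a single helper that both this notebook and the script that builds the data splits import. This is only a sketch: `configure_tokenizer` and the shared module it would live in are hypothetical, and a `SimpleNamespace` stands in for the real `GPT2Tokenizer` so the snippet runs without downloading anything.

```python
from types import SimpleNamespace

TOKENIZER_NAME = "gpt2-medium"  # checkpoint name used in the cell above


def configure_tokenizer(tokenizer):
    """Apply the padding settings used in this notebook to a tokenizer."""
    tokenizer.pad_token = tokenizer.eos_token        # reuse EOS as the padding token
    tokenizer.pad_token_id = tokenizer.eos_token_id
    tokenizer.padding_side = "left"                  # keep the prompt next to the generated tokens
    return tokenizer


# Stand-in object for demonstration only; in the notebook this would be
# configure_tokenizer(GPT2Tokenizer.from_pretrained(TOKENIZER_NAME)).
fake = SimpleNamespace(eos_token="<|endoftext|>", eos_token_id=50256)
fake = configure_tokenizer(fake)
print(fake.pad_token, fake.pad_token_id, fake.padding_side)
```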


In [14]:
'''
Define Dataset Object, then create a Dataloader Object to Feed Prompts into the Model. 
'''
class CommonsenseDataset(Dataset):
    def __init__(self, train_split=True):
        # Load prompt and answer dumps for the inputted split
        cwd = os.getcwd()
        parent_path = os.path.dirname(cwd) # parent of the current folder (currently /experiments)
        prompt_file = 'TRAINsplit_prompts.pkl' if train_split else 'DEVsplit_prompts.pkl'
        answer_file = 'TRAINsplit_answers.pkl' if train_split else 'DEVsplit_answers.pkl'

        prompts = None
        with open(parent_path + f'/data/prompt_splits/{prompt_file}', 'rb') as file:
            prompts = pickle.load(file)
        with open(parent_path + f'/data/prompt_splits/{answer_file}', 'rb') as file:
            self.answers = pickle.load(file)
        if not prompts or not self.answers:
            raise IOError("Could not read one of the necessary pickle files!")

        # Tokenize input prompts and retrieve attention_masks list
        outputs = tokenizer(prompts, padding='longest', truncation=True, return_tensors='pt')
        self.input_ids = outputs['input_ids']
        self.attn_masks = outputs['attention_mask']
        
    def __len__(self):
        return len(self.answers)
    
    def __getitem__(self, idx):
        return self.input_ids[idx], self.attn_masks[idx], self.answers[idx]

dev_loader = DataLoader(CommonsenseDataset(train_split=False), batch_size=1, shuffle=False)
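The path handling in the dataset class can also be written with `pathlib`. The sketch below assumes the same layout as the cell above (a `data/prompt_splits` folder next to the folder the notebook runs from); the helper name `split_paths` and the example directory are hypothetical.

```python
from pathlib import Path


def split_paths(train_split: bool, working_dir: Path):
    """Return (prompt_path, answer_path) for the requested split."""
    prefix = "TRAIN" if train_split else "DEV"
    split_dir = working_dir.parent / "data" / "prompt_splits"
    return (split_dir / f"{prefix}split_prompts.pkl",
            split_dir / f"{prefix}split_answers.pkl")


# Made-up working directory for illustration; the notebook would pass Path.cwd()
prompt_path, answer_path = split_paths(False, Path("/repo/experiments"))
print(prompt_path)
print(answer_path)
```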

In [15]:
'''
GPT2 Generation Hyperparameters
'''
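constrained = False           # constrained decoding is not implemented yet (see the TODO in the generation cell)
num_beams = 5
num_return_sequences = 1
no_repeat_ngram_size = 1      # NOTE: a size of 1 forbids repeating any token; the check appears to cover the prompt tokens too,
                              #       which may prevent the model from restating an answer option listed in the prompt
remove_invalid_values = True
do_sample = True              # NOTE: with num_beams > 1 this samples within beam search rather than running deterministic beam search
max_new_tokens = 10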
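The settings above could also be gathered into one dictionary and unpacked into `model.generate`, which keeps the call site short and lets the values be sanity-checked up front. This is an assumption about how one might organize the cell, not part of the original notebook; `check_generation_kwargs` is a hypothetical helper that only verifies the integer settings are positive.

```python
generation_kwargs = {
    "num_beams": 5,
    "num_return_sequences": 1,
    "no_repeat_ngram_size": 1,
    "remove_invalid_values": True,
    "do_sample": True,
    "max_new_tokens": 10,
}


def check_generation_kwargs(kwargs):
    """Raise ValueError if an integer setting is missing or not positive."""
    for name in ("num_beams", "num_return_sequences", "max_new_tokens"):
        value = kwargs.get(name)
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return kwargs


check_generation_kwargs(generation_kwargs)
# In the notebook the call would then look like:
# output = model.generate(input_tokens, attention_mask=attention_mask, **generation_kwargs)
```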

In [16]:
'''
Feed the prompts into GPT2. Generate answers using beam search to give the most straightforward
comparison between the base model and the neurologic-enhanced one. 
'''
model_answers = [""] * len(dev_loader)
correct = 0

model.eval()
with torch.no_grad():
    for i, (input_tokens, attention_mask, answer) in enumerate(dev_loader):
        if constrained:
            # TODO: implement constrained generation here
            '''
            tokenized_constraints = self.tokenizer(concepts, add_special_tokens=False).input_ids
            constraints = DisjunctiveConstraint(list(tokenized_constraints))
            output = self.model.generate(
                    inputs["input_ids"],
                    constraints=[constraints],
                    max_new_tokens=self.max_gen_len,
                    num_beams=self.beams,
                    num_return_sequences=self.num_returns,
                    no_repeat_ngram_size=1,
                    remove_invalid_values=True)
            '''
            pass
        else:
            input_tokens = input_tokens.to(device)
            attention_mask = attention_mask.to(device)
            answer = answer[0] # the DataLoader's default collation batches string fields, so with batch_size=1 the answer arrives as a one-element sequence
            
            output = model.generate(input_tokens,
                                    attention_mask=attention_mask,
                                    num_beams = num_beams,
                                    num_return_sequences = num_return_sequences,
                                    no_repeat_ngram_size = no_repeat_ngram_size,
                                    remove_invalid_values = remove_invalid_values,
                                    max_new_tokens = max_new_tokens,
                                    do_sample = do_sample)
            model_answers[i] = tokenizer.decode(output[0], skip_special_tokens=True)
            
            # The decoded text includes the prompt, which lists every option, so the answer
            # always appears at least once; a second occurrence means the model generated it.
            if model_answers[i].count(answer) > 1:
                correct += 1

print(correct / len(dev_loader))

0.009009009009009009
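Because `generate` returns the prompt tokens followed by the new tokens, another way to score is to drop the prompt before decoding and look for the answer only in the continuation. The sketch below uses plain Python lists of made-up token ids in place of tensors, and `split_continuation` is a hypothetical helper.

```python
def split_continuation(output_ids, prompt_length):
    """Return only the ids generated after the (padded) prompt."""
    return output_ids[prompt_length:]


# Made-up token ids for illustration: the first four stand for the padded prompt.
prompt_ids = [50256, 50256, 464, 3280]
output_ids = prompt_ids + [318, 3331]
new_ids = split_continuation(output_ids, len(prompt_ids))
print(new_ids)  # only the generated ids remain
```

In the loop above, with the left-padded inputs used here, this would roughly correspond to decoding `output[0][input_tokens.shape[1]:]`, after which a single occurrence of the answer in the decoded text would count as correct.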
