# Assignment A (Group A)
* In this assignment, you will first learn about a bug in a language model and how to reproduce it.
* Given a pool of models, your goal is then to find **as many models as possible** that exhibit this bug within a 15-minute time limit.
* This notebook walks you through the process step by step. Run each code cell and read the text instructions until you reach Section 5, where you will write your own code to find the buggy models.
* If you have any questions during the assignment, please ask the instructor directly. Consulting any generative language model (e.g., ChatGPT) about this assignment is prohibited. Please do not search for these bugs on the internet either.

#### You are given 15 minutes to finish this assignment. Ask the instructor to start the timer when you read this sentence.

# 1. Library Import (run the code; no need to read through it)

In [1]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
os.environ['HF_HOME'] = '/workspace/HF_cache/'
os.environ['HF_DATASETS_CACHE'] = '/workspace/HF_cache/datasets'
os.environ['TRANSFORMERS_CACHE'] = '/workspace/HF_cache/transformers_cache/'
os.environ['TF_ENABLE_ONEDNN_OPTS']='0'
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1' 
import torch
import time
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import transformers
from transformers import AutoConfig, AutoModel, AutoTokenizer
from tqdm import tqdm

# 2. Models

* The ```models``` directory has 91 language models. You can inspect them using ```!ls models``` later.
```
TehranNLP-org           bert-large-uncased-7       roberta-large-10
albert-base-v2          bert-large-uncased-8       roberta-large-2
aloxatel                bert-large-uncased-9       roberta-large-3
bert-base-cased         deepset                    roberta-large-4
bert-base-uncased       distilbert-base-cased-0    roberta-large-5
bert-large-cased-0      distilbert-base-cased-1    roberta-large-6
bert-large-cased-1      distilbert-base-cased-10   roberta-large-7
bert-large-cased-10     distilbert-base-cased-2    roberta-large-8
bert-large-cased-2      distilbert-base-cased-3    roberta-large-9
bert-large-cased-3      distilbert-base-cased-4    roberta-large-mnli-0
bert-large-cased-4      distilbert-base-cased-5    roberta-large-mnli-1
bert-large-cased-5      distilbert-base-cased-6    roberta-large-mnli-10
bert-large-cased-6      distilbert-base-cased-7    roberta-large-mnli-2
bert-large-cased-7      distilbert-base-cased-8    roberta-large-mnli-3
bert-large-cased-8      distilbert-base-cased-9    roberta-large-mnli-4
bert-large-cased-9      distilbert-base-uncased    roberta-large-mnli-5
bert-large-uncased-0    doc2query                  roberta-large-mnli-6
bert-large-uncased-1    ericRosello                roberta-large-mnli-7
bert-large-uncased-10   google                     roberta-large-mnli-8
bert-large-uncased-2    howey                      roberta-large-mnli-9
bert-large-uncased-3    prajjwal1                  t5-base
bert-large-uncased-4    roberta-base               textattack
bert-large-uncased-5    roberta-large-0            twmkn9
bert-large-uncased-6    roberta-large-1            vennify
```
* Some folders contain sub-directories with more models. For example, ```models/deepset``` has multiple models within it (e.g., ```bert-base-uncased-squad2```, ```roberta-base-squad2```, and ```roberta-large-squad2```).

* Here is an example of loading the model ```models/t5-base``` onto the CPU.

In [2]:
model_path = 'models/t5-base'
config = AutoConfig.from_pretrained(model_path)
architecture = config.architectures[0]
model = getattr(transformers, architecture).from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
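
Because some folders nest models one level down, a flat `!ls models` does not show every loadable path. The following is a minimal sketch for enumerating them. It assumes that each model folder contains a `config.json` file (the usual Hugging Face layout, but not confirmed for this pool), and `find_model_dirs` is a hypothetical helper that is not part of the notebook.

```python
import os

def find_model_dirs(root="models"):
    """Return the sorted paths of all directories under `root` that hold a config.json.

    Assumption: a folder is a loadable model if and only if it contains config.json.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "config.json" in filenames:
            found.append(dirpath)
    return sorted(found)

# Print every candidate model path (prints nothing if `models` does not exist).
for path in find_model_dirs("models"):
    print(path)
```

Each returned path can be passed as `model_path` in the loading example above.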

# 3. Dataset

Run the code to load a dataset. You do not need to understand the details here.

In [3]:
# load the dataset
from functools import partial

from torch.utils.data import DataLoader
from torchtext.datasets import CNNDM

cnndm_batch_size = 64
cnndm_datapipe = CNNDM(split="test")
task = "summarize"

def apply_prefix(task, x):
    return f"{task}: " + x[0], x[1]

cnndm_datapipe = cnndm_datapipe.map(partial(apply_prefix, task))
cnndm_datapipe = cnndm_datapipe.batch(cnndm_batch_size)
cnndm_datapipe = cnndm_datapipe.rows2columnar(["article", "abstract"])
dataset = DataLoader(cnndm_datapipe, shuffle=False, batch_size=None)

# 4. Bug Behavior: The Model Outputs NaN

* If a model outputs NaN (Not a Number), it means the model contains a bug.
* Your colleague found that the ```models/t5-base``` model, when converted to half precision (```torch.float16```) using ```model.half()```, returns NaN in its output.
* Your colleague wrote a test function that computes the NaN rate, i.e., the fraction of outputs that contain NaN when the model is run on the entire given dataset. The NaN rate for ```models/t5-base``` is greater than 0.
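
To make the NaN rate concrete, here is a toy sketch in plain Python. It uses nested lists in place of model output tensors, and `nan_rate` and `toy_outputs` are illustrative names only. The colleague's actual test function works on tensors with `torch.isnan`.

```python
import math

def nan_rate(outputs):
    """Fraction of outputs that contain at least one NaN value."""
    if not outputs:
        return 0.0
    flagged = sum(1 for values in outputs if any(math.isnan(v) for v in values))
    return flagged / len(outputs)

# Four toy "outputs"; two of them contain a NaN.
toy_outputs = [
    [0.1, 0.2],
    [float("nan"), 0.3],
    [0.4, 0.5],
    [0.6, float("nan")],
]
print(nan_rate(toy_outputs))  # 0.5
```

An output counts once no matter how many NaN values it contains, so the rate always lies between 0 and 1.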

In [4]:
# Run the code; you do not need to understand the exact details here

# Here we define the test function
def custom_test_function(model, dataset, tokenizer):
    fixed_input_length = 128
    model.half()
    model.to("cuda:0")
    model.eval()
    
    nan = 0
    j = 0
    decoder_input_ids = torch.tensor([[tokenizer.pad_token_id for n in range(fixed_input_length)] for m in range(cnndm_batch_size)]).to("cuda:0")
    total = sum(1 for e in dataset) - 1 #drop last

    for batch in tqdm(dataset, total=total):
        input_text = batch["article"]

        if j == total:#drop last
            break

        inputs = tokenizer(input_text, max_length=fixed_input_length, padding=True, truncation=True, return_tensors="pt").to("cuda:0")

        with torch.no_grad():
            try:
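                # T5 models are encoder-decoder, so they also need decoder_input_ids
                out = model(**inputs, decoder_input_ids=decoder_input_ids)
            except Exception:
                # Fall back for models that do not accept decoder_input_ids
                out = model(**inputs)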
                
        try:
            if hasattr(out, 'last_hidden_state'):
                nan += sum([torch.isnan(out_).any() for out_ in out.last_hidden_state])
            elif hasattr(out, 'logits'):
                nan += sum([torch.isnan(out_).any() for out_ in out.logits])
            else:
                nan += sum([torch.isnan(out.start_logits[i]).any() and torch.isnan(out.end_logits[i]).any()
                            for i in range(len(out.start_logits))])
        except Exception as e:
            print(e, "model output interpretation is unsuccessful!")
            model.to('cpu')
            return {'nan_rate': nan/(cnndm_batch_size*total)}
        
        j += 1

    model.to('cpu')
    return {'nan_rate': nan/(cnndm_batch_size*total)}

In [5]:
# Here we run the test
print(model_path, custom_test_function(model, dataset, tokenizer))  # print the model path and its nan_rate

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 179/179 [00:20<00:00,  8.59it/s]


models/t5-base {'nan_rate': tensor(0.9285, device='cuda:0')}


# 5. It's Your Turn

* Now it's your turn to find models in the ```models``` directory that exhibit the bug, i.e., a NaN rate > 0, and **report the NaN rate for each buggy model using the test function.**
* Do not use multiprocessing if you write any loops.
* Interrupt the notebook when the instructor tells you to do so. 
* You may refer back to the tutorial for API usage.
* #### Let the instructor know when you read this sentence.
* #### Important Tip: It is impossible to test all models in the time remaining, so think carefully about which models to test in order to find as many buggy models as possible within the time limit.