In [1]:
# Initialize Otter
import otter
grader = otter.Notebook("cs109b_hw6.ipynb")

# <img style="float: left; padding-right: 10px; width: 45px" src="https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/iacs.png"> Data Science 2: Advanced Topics in Data Science 
## Homework 6: Transformers


**Harvard University**<br/>
**Spring 2024**<br/>
**Instructors**: Pavlos Protopapas & Alex Young


<hr style="height:2pt">

In [2]:
# RUN THIS CELL 
import requests
from IPython.display import HTML
styles = requests.get(
    "https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/"
    "content/styles/cs109.css"
).text
HTML(styles)

In [3]:
# import the necessary libraries
import os

# Set before importing TensorFlow so the log-level setting can take effect
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # Trying to reduce tensorflow warnings

import time
import random
# import gensim # for loading word2vec
import numpy as np
import pandas as pd
import nltk
import tensorflow as tf
import transformers
import matplotlib.pyplot as plt

# specific machine learning functionality
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model, Sequential, load_model
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from transformers import BertTokenizer, TFBertForSequenceClassification
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, TFDebertaV2ForSequenceClassification



In [4]:
# measure notebook runtime
time_start = time.time()

<div class="alert alert-success" style="color: #333; background-color: #e8fffb; border-color: #bcfff2; border-width: 1px; border-radius: 3px; padding: 10px;">
    
### **Pre-Trained Transformers for Text Classification**

Throughout CS109A and CS109B, we've modeled many classification tasks using various machine learning algorithms. NLP has several sub-fields/popular problems that are largely treated as classification tasks, such as sentiment analysis, natural language entailment, and generic '*text classification*' like spam detection. Moreover, *nearly all* NLP problems have at least some classification component.

In part 2 of this assignment, we will focus on using *transformers* for text classification, a popular and powerful technique in NLP. Transformers are a type of neural network architecture that has gained widespread popularity in recent years due to their ability to effectively model long-range dependencies in text.

In the real world, one common text classification task is the **Systematic Review**, a process of classifying research papers for a particular research topic. In this part of this assignment, you will implement a text classifier for the Systematic Review process.

Medical research is produced at an astronomical rate, with [a few thousand articles published daily](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3191655/). Conducting a proper literature search can be unwieldy and overwhelming, requiring very carefully crafted search terms, and sifting through several thousand results. A doctor reads the **abstracts** of thousands of candidate papers, looking for potentially useful research papers. Often, the Systematic Review only yields a handful of useful research papers, and the others are considered **irrelevant**.

If the Systematic Review yields many useful papers, then one might be able to conduct a *Meta Analysis*, allowing one to draw new insights and research conclusions from the myriad of independent, regionalized research throughout the world. So, one needs to be incredibly meticulous when reading through thousands of abstracts. NLP can assist in this task by helping to classify papers as relevant or irrelevant.

In this real-life situation, an infectious disease doctor is researching sexually transmitted infections (STIs) in women who have HIV and are living in sub-Saharan Africa. STIs like gonorrhea and chlamydia are under-treated in low-resource communities. Because affordable and accessible STI testing isn't available in the area, there is no population-wide screening. So, doctors don't have a good understanding of the epidemiology and prevalence of STIs -- especially amongst women who have HIV, which carries extra, serious health risks.
    
Let's build a text classifier to see if we can help find "**not irrelevant**" abstracts. We will train the model by providing many already-annotated abstracts, where each abstract is labelled as being either "*irrelevant*" or "*not irrelevant*". At test time, we will see if your model can help suggest which papers to strongly consider.

Note that the distinction between "*irrelevant*" and "*not irrelevant*" is not the same as the distinction between "*important*" and "*unimportant*". Some papers may be highly relevant to a particular research topic but not necessarily "*important*" in the sense of having groundbreaking findings or significant implications.

</div>

<div class="alert alert-success" style="color: #333; background-color: #e8fffb; border-color: #bcfff2; border-width: 1px; border-radius: 3px; padding: 10px;">

**1.1 - Loading the Abstract Data**

Load the data from the CSV files `review_78678_irrelevant.csv`, `review_78678_not_irrelevant_included.csv`, and `review_78678_not_irrelevant_excluded.csv` into 3 dataframes. For each dataframe, add a new column called `target` with a value of `0` for `review_78678_irrelevant.csv` and a value of `1` for the other two files. The CSV files can be found in the `./data` directory.
    
</div>

In [5]:
# your code here
df_irrelevant = pd.read_csv('./data/review_78678_irrelevant.csv')
df_not_irrelevant_included = pd.read_csv('./data/review_78678_not_irrelevant_included.csv')
df_not_irrelevant_excluded = pd.read_csv('./data/review_78678_not_irrelevant_excluded.csv')

df_irrelevant['target'] = 0
df_not_irrelevant_included['target'] = 1
df_not_irrelevant_excluded['target'] = 1

<div class="alert alert-success" style="color: #333; background-color: #e8fffb; border-color: #bcfff2; border-width: 1px; border-radius: 3px; padding: 10px;">
   
**1.2 - Combine the Dataframes**
    
Concatenate all the dataframes into a single dataframe. Keep only the columns `Abstract` and `target`. Apply `dropna()` on the dataframe. Name the final dataframe `all_data_df`.
</div>

In [6]:
# your code here
all_data_df = pd.concat([df_irrelevant, df_not_irrelevant_included, df_not_irrelevant_excluded])[['Abstract', 'target']]
all_data_df = all_data_df.dropna()

In [7]:
# Display summary information
print("Shape:",all_data_df.shape)
print(all_data_df.target.value_counts(normalize=True))
all_data_df.head()

Shape: (4318, 2)
target
0    0.874016
1    0.125984
Name: proportion, dtype: float64


|   | Abstract | target |
|---|----------|--------|
| 0 | This study was carried out to know the prevale... | 0 |
| 1 | We attempted to determine the seropositivity o... | 0 |
| 2 | Human herpesvirus 8 (HHV-8) infection is commo... | 0 |
| 3 | 338 women with age ranging from 15 to 69 years... | 0 |
| 4 | Antenatal screening and treatment for sexually... | 0 |
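
Because roughly 87% of the abstracts are labelled irrelevant, accuracy on its own can flatter a model. The sketch below uses only the shape and proportions printed above to recover the approximate class counts and the accuracy of a trivial classifier that always predicts `0`. The variable names are illustrative and not part of the assignment.

```python
# Values copied from the summary output above.
n_rows = 4318
share_not_irrelevant = 0.125984  # proportion with target == 1

# Recover approximate class counts from the printed proportion.
n_not_irrelevant = round(n_rows * share_not_irrelevant)
n_irrelevant = n_rows - n_not_irrelevant

# A classifier that always predicts 0 is right on every irrelevant abstract
# and wrong on every "not irrelevant" one.
baseline_accuracy = n_irrelevant / n_rows

print("irrelevant:", n_irrelevant, "| not irrelevant:", n_not_irrelevant)
print(f"majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

Any validation accuracy reported later is best read against this baseline of about 0.874, which is one reason the confusion matrices requested in 1.9 are informative.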


<div class="alert alert-success" style="color: #333; background-color: #e8fffb; border-color: #bcfff2; border-width: 1px; border-radius: 3px; padding: 10px;">
    
**1.3 - Train / Validation Split**

Use `train_test_split` to split the dataset into 90% train and 10% validation. You should stratify on the target variable and use a random state of `109`. Name the resulting variables `train_x`, `validate_x`, `train_y`, and `validate_y`.

</div>



In [8]:
# your code here
train_x, validate_x, train_y, validate_y = train_test_split(all_data_df['Abstract'], all_data_df['target'], test_size=0.10, random_state=109, stratify=all_data_df['target'])

In [9]:
# Display split sizes
print("train_x count:", len(train_x))
print("validate_x count:", len(validate_x))

train_x count: 3886
validate_x count: 432
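
As a sanity check on the counts above, the split sizes can be reproduced by hand. This is a small sketch that assumes scikit-learn rounds a fractional `test_size` up to the next whole sample and gives the remainder to the training set; the variable names are illustrative.

```python
import math

n_samples = 4318   # rows in all_data_df after dropna()
test_size = 0.10   # fraction held out for validation

# Assumed rounding rule: validation size is rounded up, training gets the rest.
n_validate = math.ceil(test_size * n_samples)
n_train = n_samples - n_validate

# With stratification, each split should keep roughly the overall class mix,
# so about 12.6% of the validation abstracts should be "not irrelevant".
approx_positive_validate = n_validate * 0.125984

print("train:", n_train, "| validate:", n_validate)
print(f"expected 'not irrelevant' abstracts in validation: ~{approx_positive_validate:.0f}")
```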


<div class="alert alert-success" style="color: #333; background-color: #e8fffb; border-color: #bcfff2; border-width: 1px; border-radius: 3px; padding: 10px;">
    
**1.4 - BERT Tokenization**
    
Pre-trained models expect their inputs to have been processed in a particular way. We need to make sure we process our data with the same tokenizer that was used to process the data our BERT model was trained on.
    
- Use `AutoTokenizer` to load the tokenizer for `'bert-base-uncased'`. Be sure to set `do_lower_case=True`.
- Use the tokenizer object to process both train and validation data, setting `max_length` to a value suitable for the dataset. It would need to be `<=512`.
- Save the processed input data as `train_x_processed` and `validate_x_processed`.

**Note:** The output from the tokenizer is a dictionary. We'll be interested in the keys `'input_ids'` and `'attention_mask'`.
</div>

In [10]:
# your code here
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

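# Untruncated token count per abstract, useful for choosing a suitable max_length
token_counts = [len(tokenizer.encode(text)) for text in all_data_df['Abstract']]
# Truncate or pad every abstract to exactly 512 tokens (BERT's maximum input length)
train_x_processed = tokenizer(list(train_x), max_length=512, truncation=True, padding='max_length', return_tensors='pt')
validate_x_processed = tokenizer(list(validate_x), max_length=512, truncation=True, padding='max_length', return_tensors='pt')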

Token indices sequence length is longer than the specified maximum sequence length for this model (632 > 512). Running this sequence through the model will result in indexing errors
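
The warning above comes from counting tokens without truncation: at least one abstract is 632 tokens long, which exceeds BERT's 512-token limit. Those counts are useful for choosing `max_length`. The sketch below shows one way to reason about that choice by measuring how many texts fit untruncated at each candidate length. The helper names and the example counts are hypothetical and are not taken from the dataset.

```python
def coverage(token_counts, max_length):
    """Fraction of texts whose token count fits within max_length."""
    fits = sum(1 for n in token_counts if n <= max_length)
    return fits / len(token_counts)


def smallest_length_covering(token_counts, target, candidates=(128, 256, 384, 512)):
    """Smallest candidate max_length leaving at least `target` of texts untruncated.

    Falls back to the largest candidate if none reaches the target.
    """
    for length in candidates:
        if coverage(token_counts, length) >= target:
            return length
    return candidates[-1]


# Hypothetical token counts for ten abstracts (illustration only).
example_counts = [120, 180, 210, 250, 260, 300, 310, 340, 400, 632]

for length in (128, 256, 384, 512):
    print(length, coverage(example_counts, length))
print("chosen max_length:", smallest_length_covering(example_counts, 0.9))
```

In the notebook, the same helpers could be applied to `token_counts`. A shorter `max_length` truncates more abstracts but reduces memory use and training time.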


In [11]:
# your code here
...

In [12]:
# Display keys in processed input dictionary
print(train_x_processed.keys())

dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])


In [13]:
# Display shapes and examples of processed input
print("train_x_processed shape:", train_x_processed["input_ids"].shape)
print("validate_x_processed shape:", validate_x_processed["input_ids"].shape)
# First sample
print("First sample:")
print("input_ids:",train_x_processed["input_ids"][0][:10])
print("attention_mask:",train_x_processed["attention_mask"][0][:10])
# Second sample
print("Second sample:")
print("input_ids:",train_x_processed["input_ids"][1][:10])
print("attention_mask:",train_x_processed["attention_mask"][1][:10])

train_x_processed shape: torch.Size([3886, 512])
validate_x_processed shape: torch.Size([432, 512])
First sample:
input_ids: tensor([  101,  1999,  2255,  2687,  1010,  2522, 27794,  2015,  1997,  8206])
attention_mask: tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
Second sample:
input_ids: tensor([  101,  2023,  2817,  8920,  1996, 11658,  7060,  1997,  6544,  9560])
attention_mask: tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])


<div class="alert alert-success" style="color: #333; background-color: #e8fffb; border-color: #bcfff2; border-width: 1px; border-radius: 3px; padding: 10px;">

**1.5 - Dataset Pipeline** 
    
Build two tf.data pipelines: one for training and another for validation. Follow this order when building pipelines:
  * Shuffle (if necessary) 
  * Batch
  * Prefetch

**Hint:** You can use the now-familiar `from_tensor_slices` method to create your TensorFlow Dataset objects. But whereas previously you only needed to pass `x` and `y` as a tuple, here you will need to pass the input ids, attention mask, and the target variable as a 3-tuple.
</div>

In [14]:
# Construct your dataset pipeline
# your code here
train_data = tf.data.Dataset.from_tensor_slices((
    {
        'input_ids': train_x_processed['input_ids'],
        'attention_mask': train_x_processed['attention_mask']
    }, 
    train_y
))

validation_data = tf.data.Dataset.from_tensor_slices((
    {
        'input_ids': validate_x_processed['input_ids'],
        'attention_mask': validate_x_processed['attention_mask']
    }, 
    validate_y
))

batch_size = 16

train_data = train_data.shuffle(len(train_x)).batch(batch_size).prefetch(tf.data.experimental.AUTOTUNE)
validation_data = validation_data.batch(batch_size).prefetch(tf.data.experimental.AUTOTUNE)

In [15]:
# Display some pipeline info
print("train_data:\n", train_data)
print("validation_data:\n", validation_data)

train_data:
 <_PrefetchDataset element_spec=({'input_ids': TensorSpec(shape=(None, 512), dtype=tf.int64, name=None), 'attention_mask': TensorSpec(shape=(None, 512), dtype=tf.int64, name=None)}, TensorSpec(shape=(None,), dtype=tf.int64, name=None))>
validation_data:
 <_PrefetchDataset element_spec=({'input_ids': TensorSpec(shape=(None, 512), dtype=tf.int64, name=None), 'attention_mask': TensorSpec(shape=(None, 512), dtype=tf.int64, name=None)}, TensorSpec(shape=(None,), dtype=tf.int64, name=None))>
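
The order "shuffle, then batch" matters. The pure-Python sketch below (no TensorFlow involved, all names hypothetical) contrasts the two orders on a toy list: batching first and shuffling afterwards only reorders fixed groups, whereas shuffling first lets the composition of each batch change. It also computes how many batches per epoch the pipelines above should produce with a batch size of 16.

```python
import math
import random


def make_batches(items, batch_size):
    """Split a list into consecutive batches (the last one may be smaller)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def shuffle_then_batch(items, batch_size, rng):
    """Shuffle individual examples, then group them into batches."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return make_batches(shuffled, batch_size)


def batch_then_shuffle(items, batch_size, rng):
    """Group examples into batches first, then shuffle whole batches."""
    batches = make_batches(list(items), batch_size)
    rng.shuffle(batches)
    return batches


examples = list(range(12))
rng = random.Random(109)

fixed_groups = {frozenset(b) for b in make_batches(examples, 4)}
reordered_groups = {frozenset(b) for b in batch_then_shuffle(examples, 4, rng)}
mixed_batches = shuffle_then_batch(examples, 4, rng)

# Batching first never changes which examples share a batch.
print("same groups after batch-then-shuffle:", reordered_groups == fixed_groups)
print("batches after shuffle-then-batch:", mixed_batches)

# Batches per epoch for the split sizes printed earlier.
steps_per_epoch = math.ceil(3886 / 16)
validation_steps = math.ceil(432 / 16)
print("train batches:", steps_per_epoch, "| validation batches:", validation_steps)
```

The validation pipeline skips the shuffle because evaluation metrics do not depend on the order of the examples.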


<div class="alert alert-success" style="color: #333; background-color: #e8fffb; border-color: #bcfff2; border-width: 1px; border-radius: 3px; padding: 10px;"> 

**1.6 - Build Pre-Trained BERT**

Build and compile the pretrained `'bert-base-uncased'` model using `TFAutoModelForSequenceClassification`. Make sure to display the model summary.

</div>
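
One detail worth keeping in mind when compiling: Hugging Face TensorFlow sequence-classification models return raw logits rather than probabilities, while a Keras loss given by its string name assumes probabilities by default. In Keras the usual way to handle this is `SparseCategoricalCrossentropy(from_logits=True)`. The pure-Python sketch below illustrates what such a loss computes for a single example; the function names are hypothetical and this is not the Keras implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def sparse_ce_from_logits(logits, label):
    """Cross-entropy for one example, given raw logits and an integer label."""
    return -log_softmax(logits)[label]


# Hypothetical logits for one abstract: [irrelevant, not irrelevant].
logits = [2.0, -1.0]

probabilities = [math.exp(v) for v in log_softmax(logits)]
loss_if_label_0 = sparse_ce_from_logits(logits, 0)
loss_if_label_1 = sparse_ce_from_logits(logits, 1)

print("probabilities:", probabilities)
print("loss when the true label is 0:", loss_if_label_0)
print("loss when the true label is 1:", loss_if_label_1)
```

The loss is small when the larger logit belongs to the true class and grows as the logits favour the wrong class, which is the behaviour the optimizer relies on.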

In [None]:
# your code here
model = TFAutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
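model.compile(
    optimizer=Adam(learning_rate=2e-5),
    # The Hugging Face model outputs raw logits, so the loss must be told to expect them
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)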

model.summary()


<div class="alert alert-success" style="color: #333; background-color: #e8fffb; border-color: #bcfff2; border-width: 1px; border-radius: 3px; padding: 10px;">

**1.7 - Fit BERT to the Classification Task**

Fit the `'bert-base-uncased'` model using your train pipeline while also monitoring performance on the validation set. After fitting, create a well-labeled plot of the training history.
    
Some suggestions to ensure validation accuracy > 0.9: 
- Try smaller learning rates (~2e-5)
- Try limiting training to 5 or fewer epochs (each epoch takes ~4 mins on JupyterHub)

</div>

In [30]:
# Train BERT
# your code here

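# These variables are read when TensorFlow starts up, so they likely have no effect
# unless set before the first `import tensorflow` (e.g., at the top of the notebook)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'
model = TFAutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
# The model outputs raw logits, so the loss is configured with from_logits=True
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=Adam(learning_rate=2e-5), loss=loss_fn, metrics=['accuracy'])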

history = model.fit(train_data, validation_data=validation_data, epochs=5)


All PyTorch model weights were used when initializing TFBertForSequenceClassification.

Some weights or buffers of the TF 2.0 model TFBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/5


2024-04-30 05:02:57.178156: W external/local_tsl/tsl/framework/bfc_allocator.cc:485] Allocator (GPU_0_bfc) ran out of memory trying to allocate 192.00MiB (rounded to 201326592)requested by op tf_bert_for_sequence_classification_10/bert/encoder/layer_._10/attention/self/MatMul
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.
Current allocation summary follows.
2024-04-30 05:02:57.178253: I external/local_tsl/tsl/framework/bfc_allocator.cc:1039] BFCAllocator dump for GPU_0_bfc
2024-04-30 05:02:57.178266: I external/local_tsl/tsl/framework/bfc_allocator.cc:1046] Bin (256): 	Total Chunks: 159, Chunks in use: 159. 39.8KiB allocated for chunks. 39.8KiB in use in bin. 897B client-requested in use in bin.
2024-04-30 05:02:57.178274: I external/local_tsl/tsl/framework/bfc_allocator.cc:1046] Bin (512): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in b

ResourceExhaustedError: Graph execution error:

Detected at node tf_bert_for_sequence_classification_10/bert/encoder/layer_._10/attention/self/MatMul defined at (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main

  File "<frozen runpy>", line 88, in _run_code

  File "/opt/conda/lib/python3.11/site-packages/ipykernel_launcher.py", line 18, in <module>

  File "/opt/conda/lib/python3.11/site-packages/traitlets/config/application.py", line 1075, in launch_instance

  File "/opt/conda/lib/python3.11/site-packages/ipykernel/kernelapp.py", line 739, in start

  File "/opt/conda/lib/python3.11/site-packages/tornado/platform/asyncio.py", line 205, in start

  File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 607, in run_forever

  File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 1922, in _run_once

  File "/opt/conda/lib/python3.11/asyncio/events.py", line 80, in _run

  File "/opt/conda/lib/python3.11/site-packages/ipykernel/kernelbase.py", line 542, in dispatch_queue

  File "/opt/conda/lib/python3.11/site-packages/ipykernel/kernelbase.py", line 531, in process_one

  File "/opt/conda/lib/python3.11/site-packages/ipykernel/kernelbase.py", line 437, in dispatch_shell

  File "/opt/conda/lib/python3.11/site-packages/ipykernel/ipkernel.py", line 359, in execute_request

  File "/opt/conda/lib/python3.11/site-packages/ipykernel/kernelbase.py", line 775, in execute_request

  File "/opt/conda/lib/python3.11/site-packages/ipykernel/ipkernel.py", line 446, in do_execute

  File "/opt/conda/lib/python3.11/site-packages/ipykernel/zmqshell.py", line 549, in run_cell

  File "/opt/conda/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3051, in run_cell

  File "/opt/conda/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3106, in _run_cell

  File "/opt/conda/lib/python3.11/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner

  File "/opt/conda/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3311, in run_cell_async

  File "/opt/conda/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3493, in run_ast_nodes

  File "/opt/conda/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code

  File "/tmp/ipykernel_868/2118857028.py", line 9, in <module>

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/keras/src/engine/training.py", line 1807, in fit

  File "/opt/conda/lib/python3.11/site-packages/keras/src/engine/training.py", line 1401, in train_function

  File "/opt/conda/lib/python3.11/site-packages/keras/src/engine/training.py", line 1384, in step_function

  File "/opt/conda/lib/python3.11/site-packages/keras/src/engine/training.py", line 1373, in run_step

  File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_tf_utils.py", line 1637, in train_step

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/keras/src/engine/training.py", line 590, in __call__

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/keras/src/engine/base_layer.py", line 1149, in __call__

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_tf_utils.py", line 1760, in run_call_with_unpacked_inputs

  File "/opt/conda/lib/python3.11/site-packages/transformers/models/bert/modeling_tf_bert.py", line 1772, in call

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/keras/src/engine/base_layer.py", line 1149, in __call__

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_tf_utils.py", line 1760, in run_call_with_unpacked_inputs

  File "/opt/conda/lib/python3.11/site-packages/transformers/models/bert/modeling_tf_bert.py", line 995, in call

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/keras/src/engine/base_layer.py", line 1149, in __call__

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/transformers/models/bert/modeling_tf_bert.py", line 629, in call

  File "/opt/conda/lib/python3.11/site-packages/transformers/models/bert/modeling_tf_bert.py", line 635, in call

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/keras/src/engine/base_layer.py", line 1149, in __call__

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/transformers/models/bert/modeling_tf_bert.py", line 528, in call

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/keras/src/engine/base_layer.py", line 1149, in __call__

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/transformers/models/bert/modeling_tf_bert.py", line 412, in call

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/keras/src/engine/base_layer.py", line 1149, in __call__

  File "/opt/conda/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler

  File "/opt/conda/lib/python3.11/site-packages/transformers/models/bert/modeling_tf_bert.py", line 316, in call

OOM when allocating tensor with shape[16,12,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node tf_bert_for_sequence_classification_10/bert/encoder/layer_._10/attention/self/MatMul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
 [Op:__inference_train_function_233310]
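
The failed allocation in the traceback can be reproduced with simple arithmetic. The tensor of shape `[16, 12, 512, 512]` holds the attention scores for one layer: batch size 16, 12 attention heads, and a 512 x 512 score matrix per head. The sketch below assumes 4 bytes per value (the log reports type `float`) and uses a hypothetical helper name; it also shows how the size changes with a smaller batch or a shorter `max_length`.

```python
MIB = 1024 ** 2


def attention_scores_bytes(batch_size, num_heads, seq_len, bytes_per_value=4):
    """Bytes needed for one [batch, heads, seq_len, seq_len] attention-score tensor."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value


# The allocation that failed in the log above.
failed = attention_scores_bytes(batch_size=16, num_heads=12, seq_len=512)
print(failed, "bytes =", failed / MIB, "MiB")

# Effect of a smaller batch size or a shorter sequence length.
for batch_size, seq_len in [(16, 512), (8, 512), (16, 256), (8, 256)]:
    size = attention_scores_bytes(batch_size, 12, seq_len)
    print(f"batch={batch_size:>2} seq_len={seq_len}: {size / MIB:.0f} MiB")
```

This is the size of a single tensor only. During training, tensors like it are kept for every encoder layer so that gradients can be computed, which is why the total can exceed GPU memory. Halving the batch size halves the figure, while halving the sequence length cuts it to a quarter.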

In [None]:
# Plot Training History
# your code here
plt.figure(figsize=(10, 5))
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(loc='upper left')
plt.show()

plt.figure(figsize=(10, 5))
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(loc='upper left')
plt.show()

<div class="alert alert-success" style="color: #333; background-color: #e8fffb; border-color: #bcfff2; border-width: 1px; border-radius: 3px; padding: 10px;">  

**1.8 - DeBERTa**

Repeat the tokenization, pipeline, model building, and fitting steps above (1.4-1.7), but now for the [DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta) base model. Specifically, we'll use the [V3](https://huggingface.co/microsoft/deberta-v3-base) model (`'microsoft/deberta-v3-base'`). You may be able to use code from the previous BERT model questions if you wrote utility functions.

**Don't forget to display important output like the model's summary and a plot of the training history!**

</div>

In [None]:
# Tokenization & Pipelines
# your code here
...

In [None]:
# Build DeBERTa
# your code here
...

In [None]:
...

In [None]:
# Train DeBERTa
# your code here
...

In [None]:
# Plot Training History
# your code here
...

<div class="alert alert-success" style="color: #333; background-color: #e8fffb; border-color: #bcfff2; border-width: 1px; border-radius: 3px; padding: 10px;">  

**1.9 - Model Results**


- Display confusion matrices for both the BERT and DeBERTa models.
- Decode and display 4 abstracts considered highly *not irrelevant* by the two models (two from each).
- Do the same for 4 abstracts considered highly *irrelevant* by the two models.
  
</div>
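
One way to read "highly *not irrelevant*" is "highest predicted probability for class 1". The pure-Python sketch below, with hypothetical logits and labels for five abstracts, converts logit pairs to that probability, ranks the abstracts by it, and tallies a 2x2 confusion matrix with rows as true labels and columns as predictions (the same layout `sklearn.metrics.confusion_matrix` uses). The helper names are illustrative.

```python
import math


def prob_not_irrelevant(logit_pair):
    """Softmax probability of class 1 from a pair of logits [class 0, class 1]."""
    z0, z1 = logit_pair
    return 1.0 / (1.0 + math.exp(z0 - z1))


def confusion_counts(y_true, y_pred):
    """2x2 confusion matrix [[TN, FP], [FN, TP]]: rows = true, columns = predicted."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Hypothetical model outputs and labels (illustration only).
logits = [[3.0, -2.0], [-1.0, 2.5], [0.5, 0.2], [-2.0, 3.0], [1.0, 1.5]]
y_true = [0, 1, 1, 1, 0]

probs = [prob_not_irrelevant(pair) for pair in logits]
y_pred = [int(p >= 0.5) for p in probs]

# Indices sorted from most to least confidently "not irrelevant".
ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

print("confusion matrix:", confusion_counts(y_true, y_pred))
print("most 'not irrelevant' first:", ranked)
print("most 'irrelevant' first:", ranked[::-1])
```

In the notebook, the logits would come from each model's predictions on the validation pipeline, and the selected `input_ids` could be turned back into text with the tokenizer's `decode` method.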

In [None]:
# Predictions
# your code here
...

In [None]:
# Confusion Matrices
# your code here
...

In [None]:
# 4 NOT irrelevant abstracts according to each model
# your code here
...

In [None]:
# 4 NOT relevant abstracts (i.e., 4 irrelevant abstracts) according to each model
# your code here
...

<div class="alert alert-success" style="color: #333; background-color: #e8fffb; border-color: #bcfff2; border-width: 1px; border-radius: 3px; padding: 10px;"> 

**1.10 - Model Comparison**
    
Finally, address the following questions in the markdown cell provided:
- Based on the earlier plotted training histories, what are your thoughts on the performance of the two models, both in absolute terms and with respect to one another?
- Based on the confusion matrices, do you see a significant difference in the types of errors each model makes?
- Are you convinced by the abstracts displayed above that the models are performing well in their classification task? Are the results qualitatively distinct between the two models?
- Did you end up using identical hyperparameters and training procedures for both models? Why or why not?
- What are 2 ways in which the DeBERTa model's use of positional encoding differs from the approach described in the lecture on BERT? (You may want to peruse [the original paper](https://arxiv.org/abs/2006.03654) for insights.)
</div>

**Your Answer Here**



<div class="alert alert-info" style="color: #4a4a4a; background-color: #fbe8ff; border-color: #eed4db; border-width: 1px; border-radius: 3px; padding: 10px;">

**Wrap-up**

* In a few sentences, please describe the aspect(s) of the assignment you found most challenging. This could be conceptual and/or related to coding and implementation.

* How many hours did you spend working on this assignment? Store this as an int or float in `hours_spent_on_hw`. If you worked on the project in a group, report the *average* time spent per person.

</div>

**Your Answer Here**



In [None]:
hours_spent_on_hw = ...

In [None]:
grader.check("wrapup")

In [None]:
time_end = time.time()
print(f"It took {(time_end - time_start)/60:.2f} minutes for this notebook to run")

**This concludes HW6. Thank you!**