## Transfer Learning with Transformers and PyTorch on the potsawee T5 model
In this notebook I demonstrate how to perform transfer learning on an existing T5 model.


### Packages and Imports

Make sure to first set up your environment as per [envSetup.ipynb](../envSetup.ipynb).

First I import generally useful libraries.

In [1]:
import pandas as pd
import numpy as np
import nltk
nltk.download('punkt')
from datasets import load_dataset, Dataset, DatasetDict
import evaluate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer

[nltk_data] Downloading package punkt to /home/ac/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
  from .autonotebook import tqdm as notebook_tqdm
2023-04-26 03:26:33.379374: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Next I check GPU availability. This uses TensorFlow's device listing, although the model itself is trained with PyTorch.

In [2]:
# check GPUs
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]


Global settings (these should eventually move into a `.json` file):

In [3]:
sep = ' <sep> '
PBEDataSource = 'singlePointQuestions.csv'
bibleDataSource = 'nkjv.csv'

### Data setup
For training, I need a dataset that pairs each model input with its expected output. These come from two sources: the inputs are the verses in [`nkjv.csv`](nkjv.csv), and the outputs are the questions and answers in [`singlePointQuestions.csv`](singlePointQuestions.csv). Below I define functions that combine these two sources into a pandas DataFrame.

First, I load the NKJV verses from the CSV file.

In [4]:
nkjv = pd.read_csv(bibleDataSource)
nkjv.head()

|   | Book | ChapterNumber | VerseNumber | Verse |
|---|---|---|---|---|
| 0 | Genesis | 1 | 1 | In the beginning God created the heavens and t... |
| 1 | Genesis | 1 | 2 | The earth was without form, and void; and dark... |
| 2 | Genesis | 1 | 3 | Then God said, "Let there be light"; and there... |
| 3 | Genesis | 1 | 4 | And God saw the light, that it was good; and G... |
| 4 | Genesis | 1 | 5 | God called the light Day, and the darkness He ... |


Next I define a function that returns the text of any span of verses. It takes a book, a starting chapter and verse, and an optional ending chapter and verse.

In [5]:
# note that this function does not support spans that cross book boundaries,
# and it assumes the rows of nkjv are in canonical (chapter, verse) order
def getText(Book, StartChapter, StartVerse, EndChapter = None, EndVerse = None):
  # default the end of the span to its start (a single verse)
  EndChapter = EndChapter if EndChapter else StartChapter
  EndVerse = EndVerse if EndVerse else StartVerse

  text = ""
  BookDf = nkjv[nkjv["Book"] == Book]
  for index, row in BookDf.iterrows():
    chapter = row['ChapterNumber']
    verse = row['VerseNumber']
    # stop once we pass the end verse, including when the span ends on the last
    # verse of a chapter and the next row already belongs to a later chapter
    if chapter > EndChapter or (chapter == EndChapter and verse > EndVerse):
      break
    if (chapter == StartChapter and verse >= StartVerse) or (chapter > StartChapter):
      text += row['Verse'] + " "
  return text
  

print(getText("John", 1, 51, 2, 2))   # John 1:51, 2:1-2
print(getText("Genesis", 1, 1, 1, 3)) # Genesis 1:1-3
print(getText("Ephesians", 5, 3))     # Ephesians 5:3

And He said to him, "Most assuredly, I say to you, hereafter you shall see heaven open, and the angels of God ascending and descending upon the Son of Man." On the third day there was a wedding in Cana of Galilee, and the mother of Jesus was there. Now both Jesus and His disciples were invited to the wedding. 
In the beginning God created the heavens and the earth. The earth was without form, and void; and darkness was on the face of the deep. And the Spirit of God was hovering over the face of the waters. Then God said, "Let there be light"; and there was light. 
But fornication and all uncleanness or covetousness, let it not even be named among you, as is fitting for saints; 


As we can see, the function handles spans across multiple verses and chapters. It assumes the references are valid: no negative numbers, and no chapters or verses that don't exist in the book.
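Lexicographic tuple comparison offers a compact way to express the same span test. The sketch below uses the standard library only and a hypothetical list of `(chapter, verse, text)` tuples in place of the `nkjv` DataFrame. Comparing `(chapter, verse)` pairs respects chapter boundaries automatically: a span that ends at 1:51 excludes 2:1, even though verse 1 is smaller than 51.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for one book's rows of nkjv.csv: (chapter, verse, text),
# assumed to be sorted in canonical order.
Row = Tuple[int, int, str]

def get_span(rows: List[Row], start_ch: int, start_v: int,
             end_ch: Optional[int] = None, end_v: Optional[int] = None) -> str:
    """Join the text of every verse from (start_ch, start_v) to (end_ch, end_v), inclusive."""
    end_ch = end_ch or start_ch
    end_v = end_v or start_v
    start, end = (start_ch, start_v), (end_ch, end_v)
    # Tuples compare chapter first, then verse, so (2, 1) > (1, 51).
    return " ".join(text for ch, v, text in rows if start <= (ch, v) <= end)

rows = [(1, 50, "a"), (1, 51, "b"), (2, 1, "c"), (2, 2, "d"), (2, 3, "e")]
print(get_span(rows, 1, 51))        # b
print(get_span(rows, 1, 51, 2, 2))  # b c d
```

This version scans every row instead of stopping early. That tradeoff is fine for a single book.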

I also need a small helper so that `getText` can be used with `df.apply` later.

In [6]:
def getTextFromRawPBEData(r):
  return getText(r['StartBook'], r['StartChapter'], r['StartVerse'], r['EndChapter'], r['EndVerse'])

I will also need a function that creates a DataFrame ready for training. I define it, and explain how I determined what the data needs to look like, in the [How To Preprocess Data](#how-to-preprocess-data) section.

<hr>

## Transfer Learning Preparation

### Data Format Research
To determine what format my data should take, I first looked at the data formats used by other projects. I looked into the Hugging Face model from the "[Simplifying Paragraph-level Question Generation via Transformer Language Models](https://paperswithcode.com/paper/transformer-based-end-to-end-question)" paper, as well as several T5 Hugging Face models for [Question Generation](https://huggingface.co/mrm8488/t5-base-finetuned-question-generation-ap). As shown in [`transferLearningResearch.ipynb`](transferLearningResearch.ipynb), the [Q&A Generation](https://huggingface.co/potsawee/t5-large-generation-squad-QuestionAnswer) model by potsawee performed best of the models I compared.

Here I load the model and its tokenizer and define a function for generating questions and answers to be used for comparison later.


In [7]:
# potsawee_T5 is a model taken from https://huggingface.co/potsawee/t5-large-generation-squad-QuestionAnswer
tokenizer = AutoTokenizer.from_pretrained("potsawee/t5-large-generation-squad-QuestionAnswer")
model = AutoModelForSeq2SeqLM.from_pretrained("potsawee/t5-large-generation-squad-QuestionAnswer")

def potsaweeAQG(text):
  inputs = tokenizer(text, return_tensors="pt")
  outputs = model.generate(**inputs, max_length=100)
  question_answer = tokenizer.decode(outputs[0], skip_special_tokens=False)
  question_answer = question_answer.replace(tokenizer.pad_token, "").replace(tokenizer.eos_token, "")
  return question_answer.split(tokenizer.sep_token)
 
def getPotsaweeModel():
  return AutoModelForSeq2SeqLM.from_pretrained("potsawee/t5-large-generation-squad-QuestionAnswer")

Next I take a quick look at what the tokenizer does to the input text.

In [8]:
joshua = nkjv[nkjv['Book'] == 'Joshua']
joshua2 = joshua[joshua['ChapterNumber'] == 2]
joshua2_2 = joshua2[joshua2["VerseNumber"] == 2].iloc[0]["Verse"]
print(joshua2_2)
encoded_joshua2_2 = tokenizer(joshua2_2)
print(encoded_joshua2_2)
# print(json.dumps(encoded_joshua2_2, indent=1))

And it was told the king of Jericho, saying, "Behold, men have come here tonight from the children of Israel to search out the country."
{'input_ids': [275, 34, 47, 1219, 8, 3, 1765, 13, 1022, 3723, 32, 6, 2145, 6, 96, 2703, 6134, 6, 1076, 43, 369, 270, 8988, 45, 8, 502, 13, 3352, 12, 960, 91, 8, 684, 535, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}


In [9]:
tokenizer.decode(encoded_joshua2_2["input_ids"])

'And it was told the king of Jericho, saying, "Behold, men have come here tonight from the children of Israel to search out the country."</s>'

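As we can see above, tokenizing a verse encodes it in the format the model expects. The tokenizer automatically appends the end-of-sequence token `</s>`, which appears as ID `1`, the last entry of `input_ids`.

The tokenizer converts the text into integer token IDs from the model's SentencePiece vocabulary. Inside the model, each ID is mapped to a learned embedding vector. This is conceptually similar to Word2Vec embeddings, but the vectors are learned together with the rest of T5.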
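Here is a toy illustration of that lookup. It uses a made-up 8-token vocabulary and 4-dimensional random vectors. T5-large instead uses 1024-dimensional vectors with learned weights. The only point is that an ID selects a row of a table.

```python
import random

# Toy embedding table (NOT T5's vocabulary or weights): one row per token ID.
vocab_size, d_model = 8, 4
rng = random.Random(0)
embedding_table = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(vocab_size)]

def embed(input_ids):
    """Map each integer token ID to its embedding vector via a row lookup."""
    return [embedding_table[i] for i in input_ids]

input_ids = [5, 3, 1]  # e.g. two word pieces followed by the end-of-sequence ID 1
vectors = embed(input_ids)
print(len(vectors), len(vectors[0]))  # 3 4
```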

Before preprocessing the data, I need to know which Python types and structures the fine-tuning APIs expect. I format my data the way the `datasets` library expects it in [this tutorial](https://medium.com/nlplanet/a-full-guide-to-finetuning-t5-for-text2text-and-building-a-demo-with-streamlit-c72009631887), which is where my sample data comes from. If I can get my data into that same format, I can be confident that the API will accept it.

To run the following code, you need the `medium-articles.zip` file. First set up a Kaggle account and place your `kaggle.json` API token in `~/.kaggle/`. Then run `kaggle datasets download -d fabiochiusano/medium-articles` in a terminal in the notebook's directory.

In [10]:

# look at a sample dataset to see its structure
sample_data = load_dataset("csv", data_files="medium-articles.zip")
print(sample_data)

FileNotFoundError: Unable to find '/home/ac/code/PBE_AQG/medium-articles.zip' at /home/ac/code/PBE_AQG

##### Data Loading Function
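When `medium-articles.zip` is present, `load_dataset` returns a `DatasetDict`. The run above failed only because the file had not been downloaded. Let's see if we can get our data to look the same: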

In [11]:
# some of the answers are literally "None". To keep pandas
# from converting them to NaN, we need to set keep_default_na to False
def getRawPBEData():
  return pd.read_csv(PBEDataSource, keep_default_na=False)
pbe_data = getRawPBEData()
print(pbe_data[pbe_data['Question'] == "How many were exempted?"])
# convert the rest of the way and look at structure
pbe_data = Dataset.from_pandas(pbe_data)
pbe_data = DatasetDict({"train": pbe_data})
print(pbe_data)


           Type                 Question Answer  NumberPoints StartBook   
5750  bible-qna  How many were exempted?   None             1   1 Kings  \

      StartChapter  StartVerse  EndBook  EndChapter  EndVerse  
5750            15          22  1 Kings          15        22  
DatasetDict({
    train: Dataset({
        features: ['Type', 'Question', 'Answer', 'NumberPoints', 'StartBook', 'StartChapter', 'StartVerse', 'EndBook', 'EndChapter', 'EndVerse'],
        num_rows: 9983
    })
})


Wow! Look at that. The data is now a `DatasetDict` with a `train` split, the same structure the tutorial uses. Hopefully this will allow me to use the Hugging Face training functions.

##### Data Splitting Function
At this point I define a function that streamlines loading the data from a DataFrame and splits it into training, validation, and testing sets.

In [12]:
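def dfToSplitDataDict(df):
  # convert the DataFrame into a DatasetDict with a single "train" split
  dataset = Dataset.from_pandas(df)
  dataDict = DatasetDict({"train": dataset})
  # split off 2000 rows for testing, then 1000 of the remaining rows for validation
  # (train_test_split shuffles the rows before splitting by default)
  train_test = dataDict["train"].train_test_split(test_size=2000)
  train_validation = train_test["train"].train_test_split(test_size=1000)
  dataDict["train"] = train_validation["train"]
  dataDict["validation"] = train_validation["test"]
  dataDict["test"] = train_test["test"]
  
  # shuffle again and keep only the first 1000 rows of each split;
  # this shortens training and evaluation, but drops most training rows
  dataDict["train"] = dataDict["train"].shuffle().select(range(1000))
  dataDict["validation"] = dataDict["validation"].shuffle().select(range(1000))
  dataDict["test"] = dataDict["test"].shuffle().select(range(1000))

  return dataDict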

# test function
print(dfToSplitDataDict(getRawPBEData()))

DatasetDict({
    train: Dataset({
        features: ['Type', 'Question', 'Answer', 'NumberPoints', 'StartBook', 'StartChapter', 'StartVerse', 'EndBook', 'EndChapter', 'EndVerse'],
        num_rows: 1000
    })
    validation: Dataset({
        features: ['Type', 'Question', 'Answer', 'NumberPoints', 'StartBook', 'StartChapter', 'StartVerse', 'EndBook', 'EndChapter', 'EndVerse'],
        num_rows: 1000
    })
    test: Dataset({
        features: ['Type', 'Question', 'Answer', 'NumberPoints', 'StartBook', 'StartChapter', 'StartVerse', 'EndBook', 'EndChapter', 'EndVerse'],
        num_rows: 1000
    })
})


Notice that the function above shuffles the data. I do this for several reasons:
* in case Tableau Prep Builder sorted our data, shuffling removes that ordering
* to mix together questions from different sources (some questions are from Babienco, Myaing, deFlutier, etc.)
* to ensure the question quality is about the same across the training, validation, and testing data
* to mix questions from different books thoroughly; we don't want the model to rely on context from the questions it just saw, and feeding them in out of order might help with that
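To make the split sizes concrete, here is a standard-library sketch that mirrors `dfToSplitDataDict` on row indices. It is a hypothetical helper, not the `datasets` implementation. With the 9983 rows shown above, the test split takes 2000 rows and validation takes 1000. That leaves 6983 rows for training, of which only 1000 are kept. The fixed `seed` is an addition that the original function lacks. If reproducible splits are wanted, `Dataset.train_test_split` and `Dataset.shuffle` both accept a `seed` argument.

```python
import random

def split_indices(n_rows, test_size=2000, val_size=1000, subsample=1000, seed=42):
    """Shuffle row indices, carve off test then validation rows, then keep a
    random subsample of each split (hypothetical mirror of dfToSplitDataDict)."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    test, rest = idx[:test_size], idx[test_size:]
    val, train = rest[:val_size], rest[val_size:]
    # shuffle().select(range(1000)) amounts to a random sample of 1000 rows per split
    return {name: rng.sample(split, min(subsample, len(split)))
            for name, split in (("train", train), ("validation", val), ("test", test))}

splits = split_indices(9983)
print({k: len(v) for k, v in splits.items()})  # {'train': 1000, 'validation': 1000, 'test': 1000}
```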

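At this point we are about ready to preprocess. First I define a function that generates `input_ids`, `attention_mask`, and `labels` for each row:
* `input_ids` are the integer IDs of the subword tokens in the tokenizer's vocabulary.
* `attention_mask` marks which positions hold real tokens (`1`) and which hold padding (`0`). This lets the model ignore the padding added to make every sequence in a batch the same length. It is not a form of stop-word filtering.
* `labels` are the token IDs of the target text, which here is the question and answer.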
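Here is a minimal sketch of how `attention_mask` relates to padding, using plain lists. The IDs are sample values borrowed from the Joshua 2:2 encoding above, and `0` is T5's padding ID. In the notebook, the tokenizer and `DataCollatorForSeq2Seq` take care of this.

```python
from typing import Dict, List

def pad_batch(batch: List[List[int]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Right-pad token-ID sequences to equal length and build the matching
    attention_mask: 1 marks a real token, 0 marks padding to be ignored."""
    width = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in batch]
    attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

print(pad_batch([[275, 34, 47, 1], [96, 1]]))
# {'input_ids': [[275, 34, 47, 1], [96, 1, 0, 0]], 'attention_mask': [[1, 1, 1, 1], [1, 1, 0, 0]]}
```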

### How To Preprocess Data

#### Configuration
Model-specific settings are very important for fine-tuning. Since I am fine-tuning a pre-existing model, I need to feed it inputs of the proper size, the correct command (task prefix), and correctly formatted labels. Many of these settings come from the values in the [`original_config`](./original_config/) folder, which contains the configuration files for the potsawee T5 model.

Notice the `command` variable. The command, which T5 calls a task prefix, tells T5 what to do, and its form is minimal. Examples usually show prefixes such as `"summarize:"` or `"translate English to German:"`. The model's documentation does not specify a prefix. However, the [`config.json`](./original_config/config.json) file contains a `"summarize:"` prefix under the `task_specific_params` key, so that may be worth trying out. Otherwise, something like `"generate question and answer:"` will be tested.

The `max_verse_length` and `max_qa_length` variables set the maximum input and output lengths. Both are measured in tokens, and longer sequences are truncated. I use `512` as the maximum input length for verses because that value appears several times in the configuration, which also lists max lengths of `200` and `300`. If training takes too long, decreasing this size may help. The output length, `max_qa_length`, is set to `128` because the config provides no value for it.
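Because the limits are counted in tokens, one way to sanity-check `512` and `128` is to measure how many examples each limit would truncate. The sketch below uses made-up token counts. In the notebook, the counts would come from something like `len(tokenizer(text)["input_ids"])` over the contexts or the `qa` strings.

```python
import statistics

# Hypothetical token counts, for illustration only.
token_counts = [35, 48, 60, 72, 90, 110, 150, 210, 260, 540]

def truncation_report(counts, max_length):
    """Fraction of examples that truncation=True would cut at max_length tokens."""
    return sum(c > max_length for c in counts) / len(counts)

for max_len in (128, 200, 300, 512):
    print(max_len, truncation_report(token_counts, max_len))
print("95th percentile:", statistics.quantiles(token_counts, n=20)[-1])
```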

##### Tokenization Function

In [13]:
# command = "generate question and answer: "
command = "summarize: "
max_verse_length = 512
max_qa_length = 128

def tokenize(data):
  # prepend the T5 task prefix to every context in the batch
  inputs = [command + text for text in data["context"]]
  model_inputs = tokenizer(inputs, max_length=max_verse_length, truncation=True)

  # tokenize the "question <sep> answer" targets; text_target replaces the
  # deprecated tokenizer.as_target_tokenizer() context manager
  qa = tokenizer(text_target=data["qa"], max_length=max_qa_length, truncation=True)

  model_inputs["labels"] = qa["input_ids"]
  return model_inputs

#### Special Tokens
One very interesting question is how the model is able to generate both a question and an answer. A sequence-to-sequence model like T5 generates a single sequence of tokens, not two separate outputs.

According to the [documentation online](https://huggingface.co/potsawee/t5-large-generation-squad-QuestionAnswer) the question and answer are output in the same string with the special added `<sep>` token separating the two (see [added_tokens.json](./original_config/added_tokens.json)). Thus, to continue training, this token must be inserted between the question and answer in "labels."

Documentation:
* **Input**: context (e.g. news article)
* **Output**: question `<sep>` answer
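Here is a small standard-library sketch of this label format. It builds `question <sep> answer` strings the way `qaCombine` does and splits generated text back into its two parts. The `split_qa` helper is hypothetical and not part of the notebook.

```python
SEP = " <sep> "  # same value as the notebook's global `sep`

def combine_qa(question: str, answer: str) -> str:
    """Build a training label in the model's 'question <sep> answer' format."""
    return question + SEP + answer

def split_qa(generated: str) -> tuple:
    """Recover (question, answer) from generated text; answer is '' if no <sep> appears."""
    question, _, answer = generated.partition("<sep>")
    return question.strip(), answer.strip()

label = combine_qa("How many were exempted?", "None")
print(label)            # How many were exempted? <sep> None
print(split_qa(label))  # ('How many were exempted?', 'None')
```

Because `<sep>` is registered as the tokenizer's `sep_token`, decoding with `skip_special_tokens=True` would strip it. That is presumably why `potsaweeAQG` decodes with `skip_special_tokens=False` before splitting.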

##### Training Data Formatting Function

In [14]:
# this function takes a dataframe passed to it in the PBE question format standardized
# by https://pbeprep.com and turns it into the format that the model expects

def qaCombine(r):
  return r['Question'] + sep + r['Answer']

def rawPBEToTrainingData(pbeData):
  result = pd.DataFrame()
  result["context"] = pbeData.apply(getTextFromRawPBEData, axis=1)
  result["qa"] = pbeData.apply(qaCombine, axis=1)
  return result




Now that all the pieces are in place, let's test what our training data looks like.

In [15]:
pbe_data = getRawPBEData()
pbe_data = rawPBEToTrainingData(pbe_data)
pbe_data.head()

|   | context | qa |
|---|---|---|
| 0 | Now it came to pass in the days of Ahasuerus (... | Which Ahasuerus is being spoken of in the book... |
| 1 | in those days when King Ahasuerus sat on the t... | Where was King Ahasuerus' throne? (be specific... |
| 2 | in those days when King Ahasuerus sat on the t... | What was in Shushan the citadel? `<sep>` the thr... |
| 3 | in those days when King Ahasuerus sat on the t... | In what year of King Ahasuerus' reign did he m... |
| 4 | in those days when King Ahasuerus sat on the t... | Where was the throne of his kingdom? `<sep>` in ... |


This looks like the format we want, so we can finally push our data through all the preparation steps. Let's wrap them up into one succinct function.
##### Process Data Function
1. Get data in PBE db format
2. Make into the input and output format that we need.
3. Split dataset
4. Tokenize dataset

In [16]:
def processData():
  raw = getRawPBEData()
  pbe_df = rawPBEToTrainingData(raw)
  pbe_dict = dfToSplitDataDict(pbe_df)
  pbe_tokens = pbe_dict.map(tokenize, batched=True)
  return pbe_tokens

<hr>

## Training w/ Hugging Face

#### Preprocess

In [17]:
pbe_tokens = processData()
print(pbe_tokens)

                                                                 

DatasetDict({
    train: Dataset({
        features: ['context', 'qa', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 1000
    })
    validation: Dataset({
        features: ['context', 'qa', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 1000
    })
    test: Dataset({
        features: ['context', 'qa', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 1000
    })
})




#### Configure Training

At this point we are ready to fine-tune the model. First we specify some hyperparameters.

In [25]:
batch_size = 4
model_location = './pbe_aqg_potsawee_t5'

hyperparameters = Seq2SeqTrainingArguments(
  model_location,
  evaluation_strategy="steps",
  eval_steps=100,
  logging_strategy="steps",
  logging_steps=100,
  save_strategy="steps",
  save_steps=200,
  learning_rate=4e-5,
  per_device_train_batch_size=batch_size,
  per_device_eval_batch_size=batch_size,
  weight_decay=0.01,
  save_total_limit=3,
  num_train_epochs=1,
  predict_with_generate=True,
  load_best_model_at_end=True,
  metric_for_best_model="rouge1",
  report_to="tensorboard"
)

Here's an explanation of the hyperparameters.
* `model_location` is the output directory where checkpoints and the trained model are stored
* `evaluation_strategy` can be `"no"`, `"steps"`, or `"epoch"`. This determines whether the model is evaluated during training and how often
* `eval_steps` tells the trainer to evaluate the model every `100` optimizer steps. It is used when `evaluation_strategy` is `"steps"`; if it is not set, it defaults to `logging_steps`
* `logging_strategy`, `logging_steps`, `save_strategy`, and `save_steps` work the same way as `evaluation_strategy` and `eval_steps`, but for logging and checkpointing
* `learning_rate` is the initial learning rate
* `per_device_train_batch_size` and `per_device_eval_batch_size` set the batch size per GPU/CPU
* `weight_decay` is the standard weight-decay regularization applied by the optimizer (see [explanation](https://vitalflux.com/weight-decay-in-machine-learning-concepts/))
* `save_total_limit` keeps at most `3` checkpoints on disk; older ones are deleted
* `num_train_epochs` specifies the number of epochs the trainer performs
* `predict_with_generate` tells the trainer to produce evaluation predictions with `generate()`, so that generation metrics such as ROUGE can be computed
* `load_best_model_at_end` reloads the best checkpoint when training finishes; it requires `save_steps` to be a multiple of `eval_steps`
* `metric_for_best_model` specifies the metric used to compare checkpoints and pick the best model
* `report_to` sends the training logs to TensorBoard
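Here is a back-of-the-envelope check of how the step-based settings interact with the 1000-row training split. The sketch assumes no gradient accumulation. It also assumes that, in a plain (non-distributed) notebook run with two visible GPUs, the Trainer splits each step's batch across both GPUs, so one step consumes `per_device_train_batch_size` times 2 examples.

```python
import math

def training_schedule(num_rows, per_device_batch, num_devices=1, epochs=1,
                      eval_steps=100, save_steps=200):
    """Return (total optimizer steps, steps that trigger evaluation, steps that save a checkpoint)."""
    steps_per_epoch = math.ceil(num_rows / (per_device_batch * num_devices))
    total_steps = steps_per_epoch * epochs
    evals = list(range(eval_steps, total_steps + 1, eval_steps))
    saves = list(range(save_steps, total_steps + 1, save_steps))
    return total_steps, evals, saves

print(training_schedule(1000, 4, num_devices=1))  # (250, [100, 200], [200])
print(training_schedule(1000, 4, num_devices=2))  # (125, [100], [])
```

Under the two-GPU assumption, one epoch is only 125 steps, so the `save_steps=200` interval would never be reached during the run.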

Next we make a data collator with our tokenizer. It groups examples into batches and pads each batch dynamically: inputs are padded with the pad token, and labels with `-100` (its default `label_pad_token_id`).

In [26]:
data_collator = DataCollatorForSeq2Seq(tokenizer)
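Labels are padded with `-100` because PyTorch's cross-entropy loss ignores that index by default, so padded positions add nothing to the loss. This is also why `compute_metrics` below replaces `-100` before decoding. The toy sketch below makes this concrete. It uses made-up per-token probabilities in place of real model outputs.

```python
import math
from typing import List

IGNORE = -100  # label padding value skipped by the loss

def pad_labels(labels: List[List[int]]) -> List[List[int]]:
    """Right-pad label sequences with IGNORE, as the collator does by default."""
    width = max(len(seq) for seq in labels)
    return [seq + [IGNORE] * (width - len(seq)) for seq in labels]

def mean_nll(token_probs: List[List[float]], labels: List[List[int]]) -> float:
    """Average negative log-likelihood over non-IGNORE positions.
    token_probs[i][j] is a made-up probability assigned to the correct token."""
    losses = [-math.log(p) for probs, row in zip(token_probs, labels)
              for p, tok in zip(probs, row) if tok != IGNORE]
    return sum(losses) / len(losses)

labels = pad_labels([[363, 410, 1], [125, 1]])
print(labels)  # [[363, 410, 1], [125, 1, -100]]
probs = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.01]]  # the 0.01 sits on padding and is ignored
print(round(mean_nll(probs, labels), 4))  # 0.6931
```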

We also need to set up our metric using the Hugging Face `evaluate` library.

I have created a custom metric computation function based on the [ROUGE score](https://huggingface.co/spaces/evaluate-metric/rouge). It:
1. decodes prediction tokens into words
2. decodes labels after padding
3. computes ROUGE scores using the decoded predictions and labels

This custom metrics function is a good place to tweak what we want the output to look like. Possibly, we will want to incentivize different outputs as we get better models trained.

Getting this metric right is crucial to the training process. Unfortunately, Python makes this painful: in this setup, problems with the metric only surfaced once training reached an evaluation step. This became evident in this project when 7 rounds of training (5 hours each) failed during one of their last steps because the metric was not being computed correctly.

All the issues involved object types and improperly formatted metric results. A statically-typed language would catch at least the type errors before the code runs. With Python, it can take days to debug a simple function like this. The print statements below took **all night** to get. At this point, this function has not yet run successfully end to end, and future work is necessary.

In [27]:
metric = evaluate.load("rouge")

# the Trainer passes an EvalPrediction, which unpacks into (predictions, labels)
def compute_metrics(eval_pred):
  print("eval_pred: ", eval_pred)
  # eval_pred:  <transformers.trainer_utils.EvalPrediction object at 0x0000018ABBC6ED10>
  
  predictions, labels = eval_pred
  print("predictions: ", predictions)
  # predictions:  [[    0   363   410 ...     0     0     0]
  #  [    0   363    19 ...     0     0     0]
  #  [    0   363    19 ...     0     0     0]
  #  ...
  #  [    0  2645   410 ...     0     0     0]
  #  [    0   363   410 ...     9    23     9]
  #  [    0   363   410 ... 26783     1     0]]
  print("labels: ", labels)
  # labels:  [[ 125   47   16 ... -100 -100 -100]
  #  [ 363  405 4173 ... -100 -100 -100]
  #  [ 363   19   45 ... -100 -100 -100]
  #  ...
  #  [2645 1622   12 ... -100 -100 -100]
  #  [   3 2544  520 ... -100 -100 -100]
  #  [ 363  410  717 ... -100 -100 -100]]
  
  # generated sequences from different eval batches can be padded with -100,
  # which the tokenizer cannot decode, so replace -100 in the predictions too
  predictions = np.where(predictions != -100, predictions, tokenizer.pad_token_id)
  decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
  
  # the collator pads labels with -100; replace it before decoding
  labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
  decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
  
  # rougeLsum expects a newline after each sentence
  decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds]
  decoded_labels = ["\n".join(nltk.sent_tokenize(label.strip())) for label in decoded_labels]
  
  # compute ROUGE scores; the rouge metric's optional `tokenizer` argument is meant to be a
  # function returning a list of word tokens, not a Hugging Face tokenizer, so it is omitted
  result = metric.compute(predictions=decoded_preds, references=decoded_labels,
                          use_stemmer=True, use_aggregator=True)
  print("result: ", result)
  
  return result
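A metric function can be checked in seconds, without a five-hour run, by calling it directly on a fake `(predictions, labels)` pair. The sketch below uses only the standard library and is a stand-in, not the notebook's function. It uses a hypothetical toy vocabulary in place of `tokenizer.batch_decode` and a simplified unigram-overlap F1 in place of the real ROUGE-1 (no stemming). The same pattern works for the real `compute_metrics` by passing a tuple of NumPy arrays, or by calling `trainer.evaluate()` on a few validation rows before `trainer.train()`.

```python
from collections import Counter
from typing import Dict, List

PAD_ID = 0  # T5's pad token ID
# Hypothetical toy vocabulary standing in for the real tokenizer.
VOCAB = {0: "<pad>", 1: "</s>", 363: "What", 410: "did", 125: "Who", 47: "was"}

def decode(ids: List[int]) -> str:
    """Replace -100 with the pad ID, then drop pad/eos tokens and join words."""
    ids = [PAD_ID if i == -100 else i for i in ids]
    return " ".join(VOCAB[i] for i in ids if i not in (0, 1))

def rouge1_f1(pred: str, ref: str) -> float:
    """Unigram-overlap F1: a simplified stand-in for ROUGE-1."""
    p, r = Counter(pred.lower().split()), Counter(ref.lower().split())
    overlap = sum((p & r).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / sum(p.values()), overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def toy_compute_metrics(eval_pred) -> Dict[str, float]:
    predictions, labels = eval_pred
    scores = [rouge1_f1(decode(p), decode(lab)) for p, lab in zip(predictions, labels)]
    return {"rouge1": sum(scores) / len(scores)}

# Fake batch shaped like the printed eval_pred: generated IDs and padded labels.
fake_preds = [[0, 363, 410, 1, -100], [0, 125, 47, 1, 0]]
fake_labels = [[363, 410, 1, -100, -100], [363, 47, 1, -100, -100]]
print(toy_compute_metrics((fake_preds, fake_labels)))  # {'rouge1': 0.75}
```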

Before training, we need to configure the trainer and point it to our data and pretrained model.

We pass in a model initialization function, `getPotsaweeModel`, so that the trainer always starts from the base model rather than from anything we may have trained before. We also pass in all the `hyperparameters` that we set up earlier.

Also notice that we pass in our tokenized PBE data for training and validation.

Finally, we give it the data collator, the potsawee `tokenizer`, and our own `compute_metrics` function.

In [28]:
trainer = Seq2SeqTrainer(
  model_init=getPotsaweeModel,
  args=hyperparameters,
  train_dataset=pbe_tokens["train"],
  eval_dataset=pbe_tokens["validation"],
  data_collator=data_collator,
  tokenizer=tokenizer,
  compute_metrics=compute_metrics
)

#### Run Training

I'm still working on getting TensorBoard to display here. Once it works, we will be able to watch training in progress; the board should appear in the notebook right below the cell that launches it. I'm keeping this cell for now because, in my experience, the progress bar often doesn't show up below the training cell without it.

In [29]:
%load_ext tensorboard

# IPython expands {model_location} in magic arguments itself, so no f-string prefix is needed
%tensorboard --logdir {model_location}/runs

Finally, we start the training.

In [30]:
trainer.train()



OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 15.73 GiB total capacity; 14.22 GiB already allocated; 7.12 MiB free; 14.52 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

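The training takes approximately 5 hours. As mentioned before, earlier runs failed seven times due to various errors in the evaluation metric. The run shown above hit a different problem: a CUDA out-of-memory error on GPU 0, with 14.22 GiB of its 15.73 GiB already allocated. Unfortunately, we can't try out the model until a training run completes successfully.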

## Evaluation

## Future Work

At this point the `compute_metrics` function still has to be debugged. One way I plan to do this is to dig through Hugging Face's documentation for its `evaluate` library. Metric computation methods have changed recently, and none of the tutorials I have found online have resolved my problem yet.

Otherwise, the model hyperparameters should be adjusted and tested to produce a better model. The current setup is only a proof-of-concept for transfer learning.