# Project References

The code in this project was adapted mainly from the tutorial *Combining Categorical and Numerical Features with Text in BERT* by Chris McCormick, available at https://mccormickml.com/2021/06/29/combining-categorical-numerical-features-with-bert/#24-bert-on-review-text-only. The method for saving the best model during training follows another tutorial by the same author, *BERT Fine-Tuning Tutorial with PyTorch*, available at https://mccormickml.com/2019/07/22/BERT-fine-tuning/.

# Importing Libraries and Preparing the environment

## Mount Google Drive to this Notebook instance

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Installing the Transformers and spaCy libraries

In [None]:
!pip install -q transformers
!pip install -q -U spacy

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
en-core-web-sm 3.6.0 requires spacy<3.7.0,>=3.6.0, but you have spacy 3.7.2 which is incompatible.

## Downloading the spaCy Portuguese Pipeline

In [None]:
!python -q -m spacy download pt_core_news_lg

## Importing Libraries

In [None]:
import numpy as np
import pandas as pd
import statistics
import transformers
import spacy
import torch
import time
import datetime
import random
import gc

from torch import cuda
from build_dataset import Corpus
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset, RandomSampler, SequentialSampler
from transformers import BertModel, BertTokenizer, get_linear_schedule_with_warmup
from sklearn import metrics
from sklearn.metrics import accuracy_score, cohen_kappa_score, mean_squared_error

## Configuring use of GPU


In [None]:
# If there's a GPU available...
if cuda.is_available():

    # Tell PyTorch to use the GPU.
    device = torch.device("cuda")

    print('There are %d GPU(s) available.' % torch.cuda.device_count())

    print('We will use the GPU:', torch.cuda.get_device_name(0))

# If not...
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")

There are 1 GPU(s) available.
We will use the GPU: Tesla V100-SXM2-16GB


# Loading and Preprocessing Data

## Reading Corpus (Essay-Br)

### Essays

In [None]:
c = Corpus()

train_df, validation_df, test_df = c.read_splits()

train_df.head()

Unnamed: 0,prompt,title,essay,c1,c2,c3,c4,c5,score
0,86,Bem está mental,[É notório que as redes sociais estão cada vez...,120,120,120,160,160,680
1,86,A questão das redes sociais na saúde mental da...,[Com o avanço das tecnologias ao redor do mund...,200,160,160,200,160,880
2,47,Democratização do ensino remoto no Brasil,[A declaração universal dos direitos humanos a...,160,160,160,160,160,800
3,89,"ZZZ (POR FAVOR, IGNOREM O TAMANHO, PRECISO DA ...","[Barão de Itararé, um dos criadores do jornali...",160,160,160,200,200,880
4,40,Um samba enredo que está por vir,"[No carnaval deste ano, a escola de samba Uniã...",160,160,120,200,200,840


### Prompts

In [None]:
prompts_df = pd.read_csv("./extended-corpus/extended_prompts.csv")

prompts_df.head()

Unnamed: 0,id,title,description,category
0,0,Carnaval e apropriação cultural,"['No Carnaval de 2020, veio novamente à tona u...",sociedade e cultura
1,1,Qualificação e o futuro do emprego,['O número de pessoas desempregadas no mundo d...,economia
2,2,Supremo Tribunal Federal e opinião pública,"['Ao longo dos últimos dez anos, o papel do ST...",política
3,3,"Ciência, tecnologia e superação dos limites hu...","['Com o avanço da biotecnologia, da engenharia...",ciência e tecnologia
4,4,Um réu deve ou não ser preso após a condenação...,"['No início deste mês de novembro, o Supremo T...",sociedade e cultura


## Adjusting Dataframes

In [None]:
competence = "c3"

def adjust_columns(dataframe, competence):
    dataframe["label"] = [label//40 for label in dataframe[competence]]
    new_df = dataframe[['prompt', 'essay', 'label']].copy()
    return new_df

preprocessed_train_df = adjust_columns(train_df, competence)
preprocessed_validation_df = adjust_columns(validation_df, competence)
preprocessed_test_df = adjust_columns(test_df, competence)

preprocessed_train_df.head()

Unnamed: 0,prompt,essay,label
0,86,[É notório que as redes sociais estão cada vez...,3
1,86,[Com o avanço das tecnologias ao redor do mund...,4
2,47,[A declaração universal dos direitos humanos a...,4
3,89,"[Barão de Itararé, um dos criadores do jornali...",4
4,40,"[No carnaval deste ano, a escola de samba Uniã...",3
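The label conversion in `adjust_columns` works because each competence score is a multiple of 40, so integer division by 40 yields a class index. The sketch below illustrates that arithmetic, assuming scores range from 0 to 200 as in the ENEM grading scale; the helper names `score_to_label` and `label_to_score` are hypothetical and not part of the notebook.

```python
def score_to_label(score: int, step: int = 40) -> int:
    """Map a competence score (assumed to be a multiple of `step`) to a class index."""
    return score // step


def label_to_score(label: int, step: int = 40) -> int:
    """Inverse mapping: turn a predicted class index back into a score."""
    return label * step


# Assumed score range: 0 to 200 in steps of 40, giving six classes.
scores = [0, 40, 80, 120, 160, 200]
labels = [score_to_label(s) for s in scores]
print(labels)  # [0, 1, 2, 3, 4, 5]
```

The six resulting classes match the six outputs of the classification layer defined later in the notebook.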


# Preparing the Dataset and Dataloader

## Configuring variables

In [None]:
max_len = 512
batch_size = 16
epochs = 6
learning_rate = 2e-5

checkpoint = "neuralmind/bert-base-portuguese-cased"
tokenizer = BertTokenizer.from_pretrained(checkpoint)

## Creating the datasets for the neural network

### Auxiliary function to format prompts

In [None]:
def prompt_to_string(prompt_texts):

    prompt_str = str(prompt_texts)
    prompt_str = " ".join(prompt_str.split("\', '"))
    prompt_str = "".join(prompt_str.split("[\'"))
    prompt_str = "".join(prompt_str.split("']"))

    return prompt_str
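The `description` column stores each prompt as the string form of a Python list, and `prompt_to_string` removes the list punctuation by splitting and re-joining. The sketch below runs the same steps on a made-up sample and compares them with a hypothetical alternative, `prompt_to_string_literal`, that parses the list with `ast.literal_eval`. The sample text is invented for illustration and is not taken from the corpus.

```python
import ast


def prompt_to_string(prompt_texts):
    # Same string operations as the notebook's helper, repeated so the sketch runs alone.
    prompt_str = str(prompt_texts)
    prompt_str = " ".join(prompt_str.split("', '"))
    prompt_str = "".join(prompt_str.split("['"))
    prompt_str = "".join(prompt_str.split("']"))
    return prompt_str


def prompt_to_string_literal(prompt_texts):
    # Hypothetical alternative: parse the list literal, then join its items.
    return " ".join(ast.literal_eval(prompt_texts))


# Invented sample in the same "stringified list" shape as the description column.
sample = "['Primeiro texto motivador.', 'Segundo texto motivador.']"
print(prompt_to_string(sample))
print(prompt_to_string_literal(sample))
```

The two approaches can diverge when an item contains an apostrophe, because Python then writes that item with double quotes and the `"', '"` separator no longer matches.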

### Load spaCy pipeline

In [None]:
nlp = spacy.load("pt_core_news_lg")

### Tokenize function

In [None]:
def create_tokenized_dataset(dataframe, prompts_dataframe):
    # Tokenize all of the essays and map the tokens to their vocabulary IDs.
    input_ids = []
    attention_masks = []
    features = []

    print('Encoding all essays in the dataset...')

    # For every essay...
    idx = 0
    for essay in dataframe['essay']:
        # Get and concatenate the prompt title and the essay text
        essay_str = " ".join(essay)
        prompt_title = prompts_dataframe['title'][dataframe['prompt'][idx]]
        essay_w_prompt = prompt_title + ' [SEP] ' + essay_str

        # Encoding
        encoded_dict = tokenizer.encode_plus(
                            essay_w_prompt,                 # Sentence to encode.
                            add_special_tokens = True,      # Add '[CLS]' and '[SEP]'
                            max_length = max_len,           # Pad & truncate all sentences.
                            truncation = True,
                            padding = 'max_length',
                            return_attention_mask = True,   # Construct attn. masks.
                            return_tensors = 'pt',          # Return pytorch tensors.
                    )


        # Process Essay with Spacy
        text = essay_str
        doc = nlp(text)
        # Process Prompt with Spacy
        prompt_texts = prompt_to_string(prompts_dataframe['description'][dataframe['prompt'][idx]])
        prompt_doc = nlp(prompt_texts)

        #---- FEATURES ----#

        # Number of paragraphs (feat1)
        paragraph_count = np.size(essay)

        # Lexical Diversity (feat2)
        unique_words = np.unique([token.lemma_ for token in doc])
        lexical_diversity = np.size(unique_words)

        # Number of sentences (feat3)
        sentence_count = 0
        for sent in doc.sents:
            sentence_count += 1

        # Prompt-Essay Similarity (feat4)
        p_similarity = doc.similarity(prompt_doc)

        # Number of named entities (feat5)
        entity_count = len(doc.ents)

        # Add features to the list.
        features.append(
            [paragraph_count, lexical_diversity, sentence_count, p_similarity, entity_count]
        )

        # Add the encoded sentence to the list.
        input_ids.append(encoded_dict['input_ids'])

        # And its attention mask (simply differentiates padding from non-padding).
        attention_masks.append(encoded_dict['attention_mask'])

        # Increment Essay idx.
        idx += 1

    # Convert the lists into tensors.
    input_ids = torch.cat(input_ids, dim=0)
    attention_masks = torch.cat(attention_masks, dim=0)
    features = torch.tensor(features)
    labels = torch.tensor(dataframe['label'].tolist())

    return TensorDataset(input_ids, attention_masks, features, labels)

### Create Tokenized Datasets

In [None]:
train_dataset = create_tokenized_dataset(preprocessed_train_df, prompts_df)
val_dataset = create_tokenized_dataset(preprocessed_validation_df, prompts_df)
test_dataset = create_tokenized_dataset(preprocessed_test_df, prompts_df)

Encoding all essays in the dataset...
Encoding all essays in the dataset...
Encoding all essays in the dataset...


### Creating the dataloader for the neural network

In [None]:
train_dataloader = DataLoader(
            train_dataset,                            # The training samples.
            sampler = RandomSampler(train_dataset),   # Select batches randomly
            batch_size = batch_size                   # Trains with this batch size.
        )

validation_dataloader = DataLoader(
            val_dataset,                              # The validation samples.
            sampler = SequentialSampler(val_dataset), # Pull out batches sequentially.
            batch_size = batch_size                   # Evaluate with this batch size.
        )

# Fine Tuning

## Loading pretrained model

In [None]:
class CustomModel(torch.nn.Module):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.l1 = transformers.BertModel.from_pretrained(checkpoint)
        self.l2 = torch.nn.Dropout(0.3)
        self.l3 = torch.nn.Linear(773, 6)

    def forward(self, ids, attention_mask, features):
        # Get pooled output from BERT layer
        _, output_1 = self.l1(ids, attention_mask, return_dict=False)
        # Concat features with pooled output and apply dropout
        output_2 = self.l2(torch.cat([output_1, features], dim=1))
        # Apply linear transformation
        output = self.l3(output_2)
        return output

CustomModel(
  (l1): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(29794, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=
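
The input width of 773 in `torch.nn.Linear(773, 6)` follows from concatenating the 768-dimensional pooled BERT output with the five handcrafted features. The sketch below checks that arithmetic with plain Python lists standing in for tensors; the feature values are invented placeholders.

```python
HIDDEN_SIZE = 768   # width of the pooled BERT output, per the model printout
NUM_FEATURES = 5    # paragraphs, unique lemmas, sentences, similarity, entities
NUM_CLASSES = 6     # labels 0 to 5

# Plain lists stand in for one row of the tensors (placeholder values).
pooled_output = [0.0] * HIDDEN_SIZE
features = [4, 180, 17, 0.83, 6]

# Equivalent of torch.cat([...], dim=1) for a single example.
combined = pooled_output + features
print(len(combined))  # 773
```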

## Defining auxiliary Variables, Functions and Classes

### Loss Function

In [None]:
loss_fn = torch.nn.CrossEntropyLoss()

### Accuracy function

In [None]:
# Function to calculate the accuracy of our predictions vs labels
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)
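
`flat_accuracy` returns the fraction of correct predictions within one batch, and the validation loop later averages these per-batch values. That average equals the overall accuracy only when every batch has the same size. The sketch below shows the difference in plain Python on invented predictions; `batch_accuracy` is a hypothetical stand-in for `flat_accuracy` that takes class indices instead of logits.

```python
def batch_accuracy(preds, labels):
    """Fraction of positions where the predicted class equals the label."""
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


# Invented (predictions, labels) pairs; the last batch is smaller than the others.
batches = [
    ([3, 4, 4, 3], [3, 4, 3, 3]),  # 3 of 4 correct
    ([2, 3, 4, 4], [3, 3, 4, 5]),  # 2 of 4 correct
    ([4, 4], [4, 4]),              # 2 of 2 correct
]

# Mean of per-batch accuracies, as the validation loop computes it.
mean_of_batches = sum(batch_accuracy(p, y) for p, y in batches) / len(batches)

# Accuracy over all examples pooled together.
all_preds = [p for preds, _ in batches for p in preds]
all_labels = [y for _, labels in batches for y in labels]
overall = batch_accuracy(all_preds, all_labels)

print(mean_of_batches, overall)
```

The gap is usually small, since only the final batch can be shorter, but it is one reason the per-epoch validation accuracy may differ slightly from an accuracy computed over all examples at once.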

### Time Format function

In [None]:
def format_time(elapsed):
    '''
    Takes a time in seconds and returns a string hh:mm:ss
    '''
    # Round to the nearest second.
    elapsed_rounded = int(round((elapsed)))

    # Format as hh:mm:ss
    return str(datetime.timedelta(seconds=elapsed_rounded))
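
As a quick check of `format_time`, the sketch below repeats the function and applies it to two arbitrary durations. Note that `datetime.timedelta` prints hours without zero padding, so the result has the form `h:mm:ss`.

```python
import datetime


def format_time(elapsed):
    # Round to the nearest second, then let timedelta do the formatting.
    elapsed_rounded = int(round(elapsed))
    return str(datetime.timedelta(seconds=elapsed_rounded))


print(format_time(133.6))   # 0:02:14
print(format_time(3725.2))  # 1:02:05
```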

### Class to save best model

In [None]:
class SaveBestModel:
    """
    Class to save the best model while training. If the current epoch's
    validation loss is lower than the lowest loss seen so far, then save the
    model state.
    """
    def __init__(
        self, model_i, best_valid_loss=float('inf')
    ):
        self.best_valid_loss = best_valid_loss
        self.model_i = model_i + 1

    def __call__(
        self, current_valid_loss,
        epoch, model, optimizer
    ):
        if current_valid_loss < self.best_valid_loss:
            self.best_valid_loss = current_valid_loss
            print(f"\nBest validation loss: {self.best_valid_loss}")
            print(f"\nSaving best model for epoch: {epoch+1}\n")
            path = f'./drive/MyDrive/TCC/code/model_third_strategy/best_model_{self.model_i}.pth'
            torch.save({
                'epoch': epoch+1,
                'valid_loss': current_valid_loss,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                }, path)
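
The decision rule inside `SaveBestModel` can be exercised without PyTorch. The sketch below uses a hypothetical `BestLossTracker` class that applies the same comparison but records the best epoch instead of writing a checkpoint. The validation losses are illustrative values, not results from this notebook.

```python
class BestLossTracker:
    """Hypothetical stand-in for SaveBestModel that records instead of saving."""

    def __init__(self, best_valid_loss=float("inf")):
        self.best_valid_loss = best_valid_loss
        self.best_epoch = None

    def __call__(self, current_valid_loss, epoch):
        # Same rule as SaveBestModel: act only on a strict improvement.
        if current_valid_loss < self.best_valid_loss:
            self.best_valid_loss = current_valid_loss
            self.best_epoch = epoch + 1  # epochs are reported 1-based
            return True
        return False


tracker = BestLossTracker()
valid_losses = [1.16, 1.15, 1.12, 1.20, 1.31, 1.40]  # illustrative values
saved = [tracker(loss, epoch) for epoch, loss in enumerate(valid_losses)]

print(saved)               # [True, True, True, False, False, False]
print(tracker.best_epoch)  # 3
```

Because the initial best loss is infinity, the first epoch always triggers a save, so a checkpoint exists for every model before it is reloaded for testing.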

# Train, Eval and Test Loop

In [None]:
models = 10
metrics = []
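
The training loop that follows builds its scheduler with `get_linear_schedule_with_warmup` and zero warmup steps, so the learning rate decays linearly from its initial value to zero over all training steps. The sketch below reproduces that multiplier in plain Python as an approximation of the documented behavior, not the library source; `linear_schedule_factor` is a hypothetical name. The 288 batches per epoch come from the training log in this notebook.

```python
def linear_schedule_factor(step, num_training_steps, num_warmup_steps=0):
    """Multiplier applied to the initial learning rate at a given step."""
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 during warmup (unused when warmup is 0).
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


learning_rate = 2e-5
batches_per_epoch = 288  # as reported in the training log
epochs = 6
total_steps = batches_per_epoch * epochs  # 1728

for step in (0, total_steps // 2, total_steps):
    print(step, learning_rate * linear_schedule_factor(step, total_steps))
```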

In [None]:
for model_i in range(0, models):

    print("")
    print('=====================================')
    print('========== Model {:} / {:} =========='.format(model_i + 1, models))
    print('=====================================')

    # ========================================
    #               Setup
    # ========================================

    # Measure the total training time for the whole run.
    total_t0 = time.time()

    # Initialize SaveBestModel class
    save_best_model = SaveBestModel(model_i)

    # Instantiate model
    model = CustomModel()
    model.to(device)

    # Define optimizer
    optimizer = AdamW(model.parameters(),
              lr = learning_rate,
              eps = 1e-8
            )

    # Total number of training steps is [number of batches] x [number of epochs].
    total_steps = len(train_dataloader) * epochs

    # Create the learning rate scheduler.
    scheduler = get_linear_schedule_with_warmup(optimizer,
                                                num_warmup_steps = 0, # Default value in run_glue.py
                                                num_training_steps = total_steps)


    # For each epoch...
    for epoch_i in range(0, epochs):

        # ========================================
        #               Training
        # ========================================

        # Perform one full pass over the training set.

        print("")
        print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
        print('Training...')

        # Measure how long the training epoch takes.
        t0 = time.time()

        # Reset the total loss for this epoch.
        total_train_loss = 0

        # Put the model into training mode. Don't be misled--the call to
        # `train` just changes the *mode*, it doesn't *perform* the training.
        # `dropout` and `batchnorm` layers behave differently during training
        # vs. test (source: https://stackoverflow.com/questions/51433378/what-does-model-train-do-in-pytorch)
        model.train()

        # For each batch of training data...
        for step, batch in enumerate(train_dataloader):

            # Progress update every 100 batches.
            if step % 100 == 0 and not step == 0:
                # Format the elapsed time as h:mm:ss.
                elapsed = format_time(time.time() - t0)

                # Report progress.
                print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(train_dataloader), elapsed))

            # Unpack this training batch from our dataloader.
            #
            # As we unpack the batch, we'll also copy each tensor to the GPU using the
            # `to` method.
            #
            # `batch` contains four pytorch tensors:
            #   [0]: input ids
            #   [1]: attention masks
            #   [2]: features
            #   [3]: labels
            b_input_ids = batch[0].to(device)
            b_input_mask = batch[1].to(device)
            b_features = batch[2].to(device)
            b_labels = batch[3].to(device)


            # Always clear any previously calculated gradients before performing a
            # backward pass. PyTorch doesn't do this automatically because
            # accumulating the gradients is "convenient while training RNNs".
            # (source: https://stackoverflow.com/questions/48001598/why-do-we-need-to-call-zero-grad-in-pytorch)
            model.zero_grad()

            # Perform a forward pass (evaluate the model on this training batch).
            # In PyTorch, calling `model` will in turn call the model's `forward`
            # function and pass down the arguments.

            logits = model(b_input_ids,
                          attention_mask=b_input_mask,
                          features=b_features)

            # Calculate Cross-Entropy Loss for this batch.
            loss = loss_fn(logits, b_labels)

            # Accumulate the training loss over all of the batches so that we can
            # calculate the average loss at the end. `loss` is a Tensor containing a
            # single value; the `.item()` function just returns the Python value
            # from the tensor.
            total_train_loss += loss.item()

            # Perform a backward pass to calculate the gradients.
            loss.backward()

            # Clip the norm of the gradients to 1.0.
            # This is to help prevent the "exploding gradients" problem.
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

            # Update parameters and take a step using the computed gradient.
            # The optimizer dictates the "update rule"--how the parameters are
            # modified based on their gradients, the learning rate, etc.
            optimizer.step()

            # Update the learning rate.
            scheduler.step()

        # Calculate the average loss over all of the batches.
        avg_train_loss = total_train_loss / len(train_dataloader)

        # Measure how long this epoch took.
        training_time = format_time(time.time() - t0)

        print("")
        print("  Average training loss: {0:.2f}".format(avg_train_loss))
        print("  Training epoch took: {:}".format(training_time))

        # ========================================
        #               Validation
        # ========================================
        # After the completion of each training epoch, measure our performance on
        # our validation set.

        print("")
        print("Running Validation...")

        t0 = time.time()

        # Put the model in evaluation mode--the dropout layers behave differently
        # during evaluation.
        model.eval()

        # Tracking variables
        total_eval_accuracy = 0
        total_eval_loss = 0
        nb_eval_steps = 0

        # Evaluate data for one epoch
        for batch in validation_dataloader:

            # Unpack this validation batch from our dataloader.
            #
            # As we unpack the batch, we'll also copy each tensor to the GPU using
            # the `to` method.
            #
            # `batch` contains four pytorch tensors:
            #   [0]: input ids
            #   [1]: attention masks
            #   [2]: features
            #   [3]: labels
            b_input_ids = batch[0].to(device)
            b_input_mask = batch[1].to(device)
            b_features = batch[2].to(device)
            b_labels = batch[3].to(device)

            # Tell pytorch not to bother with constructing the compute graph during
            # the forward pass, since this is only needed for backprop (training).
            with torch.no_grad():

                # Forward pass, calculate logit predictions. The "logits" are the
                # output values prior to applying an activation function like the
                # softmax.
                logits = model(b_input_ids,
                              attention_mask=b_input_mask,
                              features=b_features)


            # Calculate the loss.
            loss = loss_fn(logits, b_labels)

            # Accumulate the validation loss.
            total_eval_loss += loss.item()

            # Move logits and labels to CPU
            logits = logits.detach().cpu().numpy()
            label_ids = b_labels.to('cpu').numpy()

            # Calculate the accuracy for this batch of validation essays, and
            # accumulate it over all batches.
            total_eval_accuracy += flat_accuracy(logits, label_ids)


        # Report the final accuracy for this validation run.
        avg_val_accuracy = total_eval_accuracy / len(validation_dataloader)
        print("  Accuracy: {0:.2f}".format(avg_val_accuracy))

        # Calculate the average loss over all of the batches.
        avg_val_loss = total_eval_loss / len(validation_dataloader)

        # Measure how long the validation run took.
        validation_time = format_time(time.time() - t0)

        print("  Validation Loss: {0:.2f}".format(avg_val_loss))
        print("  Validation took: {:}".format(validation_time))

        # Save the best model till now if we have the least loss in the current epoch.
        save_best_model(
            avg_val_loss, epoch_i, model, optimizer
        )

    print("")
    print("Training complete!")

    print("Total training took {:} (h:mm:ss)".format(format_time(time.time()-total_t0)))


    # ========================================
    #              Loading Model
    # ========================================

    # Load best model
    model_idx = model_i + 1
    print("")
    print(f"Loading best model for model {model_idx}")

    path = f'./drive/MyDrive/TCC/code/model_third_strategy/best_model_{model_idx}.pth'
    best_model_cp = torch.load(path)
    best_model_epoch = best_model_cp['epoch']
    best_model_valid_loss = best_model_cp['valid_loss']
    print(f"  Best model was saved at {best_model_epoch} epochs\n")

    model = CustomModel()
    model.load_state_dict(best_model_cp['model_state_dict'])

    model.to(device)


    # ========================================
    #                 Testing
    # ========================================

    print("")
    print("Running Test...")

    # Create a DataLoader to batch our test samples for us. We'll use a sequential
    # sampler this time--don't need this to be random!
    prediction_sampler = SequentialSampler(test_dataset)
    prediction_dataloader = DataLoader(test_dataset, sampler=prediction_sampler, batch_size=batch_size)

    # Put model in evaluation mode
    model.eval()

    # Tracking variables
    predictions , true_labels = [], []

    # Predict
    for batch in prediction_dataloader:
        # Move the batch to the selected device
        batch = tuple(t.to(device) for t in batch)

        # Unpack the inputs from our dataloader
        b_input_ids, b_input_mask, b_features, b_labels = batch

        # Telling the model not to compute or store gradients, saving memory and
        # speeding up prediction
        with torch.no_grad():
            # Forward pass, calculate logit predictions.
            logits = model(b_input_ids,
                           attention_mask=b_input_mask,
                           features=b_features)

        # Move logits and labels to CPU
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()

        # Store predictions and true labels
        predictions.append(logits)
        true_labels.append(label_ids)

    # ========================================
    #                 Metrics
    # ========================================

    # Combine the results across all batches.
    flat_predictions = np.concatenate(predictions, axis=0)

    # For each sample, pick the label (0 to 5) with the highest score.
    flat_predictions = np.argmax(flat_predictions, axis=1).flatten()

    # Combine the correct labels for each batch into a single list.
    flat_true_labels = np.concatenate(true_labels, axis=0)

    # Calculate accuracy
    accuracy = accuracy_score(flat_true_labels, flat_predictions)

    # Calculate the QWK
    qwk = cohen_kappa_score(flat_true_labels, flat_predictions, weights='quadratic')

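    # Calculate the RMSE. Note: the `squared` argument is deprecated as of
    # scikit-learn 1.4, which adds `root_mean_squared_error` as the replacement.
    rmse = mean_squared_error(flat_true_labels, flat_predictions, squared=False)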

    # Calculate the Horizontal Discrepancy
    hd = np.sum(abs(flat_predictions - flat_true_labels) > 2) / len(flat_true_labels)

    print('  Accuracy Score: %.4f' % accuracy)
    print('  Quadratic Weighted Kappa Score: %.4f' % qwk)
    print('  Root Mean Squared Error Score: %.4f' % rmse)
    print('  Horizontal Discrepancy: %.4f' % hd)

    # Record metrics for this model.
    metrics.append(
        {
            'Model': model_i + 1,
            'Best Epoch': best_model_epoch,
            'Valid. Loss': best_model_valid_loss,
            'Accuracy': accuracy,
            'QWK': qwk,
            'RMSE': rmse,
            'Horizontal Discrepancy': hd
        }
    )

    # Delete variables to free memory space
    optimizer.zero_grad(set_to_none=True)
    model.to(torch.device("cpu"))
    model = None
    del model
    del b_input_ids
    del b_input_mask
    del b_features
    del b_labels
    del logits
    del predictions
    del true_labels
    del batch

    gc.collect()
    torch.cuda.empty_cache()




Training...
  Batch   100  of    288.    Elapsed: 0:00:48.
  Batch   200  of    288.    Elapsed: 0:01:34.

  Average training loss: 1.42
  Training epoch took: 0:02:14

Running Validation...
  Accuracy: 0.51
  Validation Loss: 1.16
  Validation took: 0:00:09

Best validation loss: 1.1638759086208958

Saving best model for epoch: 1


Training...
  Batch   100  of    288.    Elapsed: 0:00:46.
  Batch   200  of    288.    Elapsed: 0:01:31.

  Average training loss: 1.04
  Training epoch took: 0:02:11

Running Validation...
  Accuracy: 0.54
  Validation Loss: 1.15
  Validation took: 0:00:09

Best validation loss: 1.1506968025238282

Saving best model for epoch: 2


Training...
  Batch   100  of    288.    Elapsed: 0:00:46.
  Batch   200  of    288.    Elapsed: 0:01:31.

  Average training loss: 0.85
  Training epoch took: 0:02:11

Running Validation...
  Accuracy: 0.56
  Validation Loss: 1.12
  Validation took: 0:00:09

Best validation loss: 1.115078051244059

Saving best model for epoch

# Metrics Table

In [None]:
# Create a DataFrame from our models metrics.
df_metrics = pd.DataFrame(data=metrics)

# Use the 'Model' as the row index.
df_metrics = df_metrics.set_index('Model')

# Display the table.
df_metrics

Model,Best Epoch,Valid. Loss,Accuracy,QWK,RMSE,Horizontal Discrepancy
1,3,1.115078,0.5846,0.55678,0.795122,0.012158
2,2,1.05392,0.566363,0.467413,0.798935,0.008105
3,4,1.286429,0.587639,0.567957,0.765254,0.006079
4,3,1.038534,0.601824,0.564431,0.762601,0.009119
5,2,1.11511,0.607903,0.565647,0.757937,0.007092
6,2,1.280622,0.545086,0.409879,0.831254,0.007092
7,3,1.34002,0.624113,0.550536,0.769215,0.007092
8,2,1.196723,0.597771,0.568188,0.758605,0.007092
9,3,1.048505,0.598784,0.565514,0.760606,0.009119
10,3,1.21735,0.598784,0.594388,0.757268,0.007092
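
For readers who want to see what the QWK, RMSE and Horizontal Discrepancy columns measure, the sketch below computes the three metrics in plain Python from their definitions. The function names and the toy labels are hypothetical. The QWK function assumes class indices 0 to 5, and scikit-learn's `cohen_kappa_score` may return a different value when some classes are absent from the data.

```python
import math
from collections import Counter


def quadratic_weighted_kappa(y_true, y_pred, num_classes=6):
    """QWK from its definition; assumes the weighted expected term is non-zero."""
    n = len(y_true)
    observed = Counter(zip(y_true, y_pred))
    true_hist = Counter(y_true)
    pred_hist = Counter(y_pred)
    num = 0.0
    den = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            expected = true_hist[i] * pred_hist[j] / n  # chance agreement
            num += weight * observed[(i, j)]
            den += weight * expected
    return 1.0 - num / den


def rmse(y_true, y_pred):
    """Root mean squared error between label indices."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def horizontal_discrepancy(y_true, y_pred, threshold=2):
    """Share of predictions more than `threshold` levels away from the label."""
    return sum(abs(p - t) > threshold for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels invented for illustration.
y_true = [3, 4, 4, 3, 5, 2]
y_pred = [3, 4, 3, 3, 2, 2]

print(quadratic_weighted_kappa(y_true, y_pred))
print(rmse(y_true, y_pred))
print(horizontal_discrepancy(y_true, y_pred))
```

With labels obtained by dividing scores by 40, a gap of more than two levels corresponds to a difference of more than 80 points on the original scale.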


# Final Metrics

In [None]:
# Accuracy
mean_accuracy = df_metrics['Accuracy'].mean()
std_accuracy = df_metrics['Accuracy'].std()

# Quadratic Weighted Kappa
mean_qwk = df_metrics['QWK'].mean()
std_qwk = df_metrics['QWK'].std()

# Root Mean Squared Error
mean_rmse = df_metrics['RMSE'].mean()
std_rmse = df_metrics['RMSE'].std()

# Horizontal Discrepancy
mean_hd = df_metrics['Horizontal Discrepancy'].mean()
std_hd = df_metrics['Horizontal Discrepancy'].std()

print('Mean Accuracy: ', mean_accuracy)
print('Standard Deviation Accuracy: ', std_accuracy)
print('-----------------------------------------------------------------')
print('Mean QWK: ', mean_qwk)
print('Standard Deviation QWK: ', std_qwk)
print('-----------------------------------------------------------------')
print('Mean RMSE: ', mean_rmse)
print('Standard Deviation RMSE: ', std_rmse)
print('-----------------------------------------------------------------')
print('Mean Horizontal Discrepancy: ', mean_hd)
print('Standard Deviation Horizontal Discrepancy: ', std_hd)

Mean Accuracy:  0.5912867274569402
Standard Deviation Accuracy:  0.022193358379270882
-----------------------------------------------------------------
Mean QWK:  0.5410733610609674
Standard Deviation QWK:  0.056785248455887616
-----------------------------------------------------------------
Mean RMSE:  0.7756795944226156
Standard Deviation RMSE:  0.02472388618833528
-----------------------------------------------------------------
Mean Horizontal Discrepancy:  0.008004052684903748
Standard Deviation Horizontal Discrepancy:  0.001751611277256324
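
The standard deviations above come from pandas' `.std()`, which by default computes the sample standard deviation (dividing by n - 1). The sketch below recomputes the accuracy statistics with the standard library's `statistics` module from the rounded values in the metrics table, so the last digits can differ slightly from the printed output.

```python
import statistics

# Accuracy column copied from the metrics table (rounded values).
accuracies = [0.5846, 0.566363, 0.587639, 0.601824, 0.607903,
              0.545086, 0.624113, 0.597771, 0.598784, 0.598784]

mean_accuracy = statistics.mean(accuracies)
sample_std = statistics.stdev(accuracies)        # divides by n - 1, like pandas
population_std = statistics.pstdev(accuracies)   # divides by n

print(mean_accuracy)
print(sample_std)
print(population_std)
```

With only ten runs the two estimators differ noticeably, so it helps to state which one is reported when comparing against other work.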
