This notebook is based on [this reference implementation](https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/6%20-%20Transformers%20for%20Sentiment%20Analysis.ipynb).

Other refs for model:
* https://stackoverflow.com/questions/65205582/how-can-i-add-a-bi-lstm-layer-on-top-of-bert-model
* https://discuss.pytorch.org/t/how-to-connect-hook-two-or-even-more-models-together/21033
* https://pytorch.org/tutorials/beginner/transformer_tutorial.html
* https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html

Other refs for torchtext:
* https://towardsdatascience.com/use-torchtext-to-load-nlp-datasets-part-i-5da6f1c89d84
* https://towardsdatascience.com/use-torchtext-to-load-nlp-datasets-part-ii-f146c8b9a496
* http://anie.me/On-Torchtext/

# Imports and setup

In [1]:
import pandas as pd
import numpy as np
import os
import random
# random.seed(1)
import re

# Data processing.
import constants # constants.py
import dataset # dataset.py
import torch

# Model.
import models # models.py
import torch.nn as nn
from transformers import DistilBertModel

# Training.
import training # training.py
import utils # utils.py

# If you make a code change that isn't picked up by the
# Jupyter notebook, try reloading the module like this
# (importlib replaces the deprecated imp module):
# import importlib
# importlib.reload(training)

# Read the data
Skip this section if you've already run the notebook once and have the CSVs locally.

In [3]:
# data_df = dataset.read_multiple_datasets([1,2,3], 'Creativity_Combined', shuffle=True)

In [4]:
'''This cell is commented out because the CSVs should already exist in the directory.
If you are running the notebook for the first time, uncomment and run the lines below to generate them.'''
# split into train, test sets. (Train set will be further split into 
# train+validation sets, via k-fold CV.)
# train_df = data_df[:1000]
# test_df = data_df[1000:] # roughly 203 test examples set aside

# write them to CSV files
# train_df.to_csv('ktrain.csv', index=False, header=False)
# test_df.to_csv('ktest.csv', index=False, header=False)

## Preprocess and transform into torchtext Dataset format

From what I understand, some preprocessing is done when data.Field() is applied.

In [2]:
train_dataset, test_dataset = dataset.get_train_test_datasets()

In [3]:
# Transform train_dataset into an np array representation.
# This will be used for generating the K folds.
train_exs_arr = np.array(train_dataset.examples)
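
The fold generation itself happens inside `training.py`, which is not shown here. As a rough illustration only, the sketch below shows one way to derive train/validation index splits for 5 folds (the log further down reports folds 0 to 4) over 1000 training examples. The function name `k_fold_indices` and the `seed` parameter are hypothetical and not part of this project.

```python
import random

def k_fold_indices(n_examples, k=5, seed=0):
    """Yield (train_idx, valid_idx) index lists for k roughly equal folds.

    Hypothetical helper for illustration; the real split lives in training.py.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, valid_idx
        start += size

folds = list(k_fold_indices(1000, k=5))
```

Because `train_exs_arr` is a NumPy array, each index list can be used directly for selection, e.g. `train_exs_arr[train_idx]`.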

# Training pipeline begins here
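
The cell in this section passes a `param_grid` to `training.perform_hyperparameter_search`, whose source is not shown. The sketch below is an assumption about how such a grid could be expanded into one parameter dict per experiment, with keys in sorted order as they appear in the `eid ..., params {...}` log lines. The helper names `expand_grid` and `experiment_name` are hypothetical.

```python
from itertools import product

def expand_grid(param_grid):
    """Yield one dict per combination of values in the grid (keys sorted)."""
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))

def experiment_name(params):
    """Build a readable label such as 'batch_size_8; dropout_0.2; ...'."""
    return '; '.join('{}_{}'.format(k, v) for k, v in sorted(params.items()))

param_grid = {
    'dropout': [0.2],
    'batch_size': [8],
    'max_epochs': [10],
    'lr': [3e-05, 5e-05, 1e-04],
}

experiments = list(expand_grid(param_grid))
names = [experiment_name(p) for p in experiments]
```

With the grid used in this notebook, only `lr` has more than one value, so the search amounts to three experiments.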


In [4]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

param_grid = {
    'dropout': [0.2],
    'batch_size': [8],
    'max_epochs': [10],
    'lr': [3e-05, 5e-05, 1e-04],
}

results, best_model = training.perform_hyperparameter_search(param_grid, train_exs_arr, save_weights=True)
print(best_model)

'''commented out portion below which ran a single experiment'''
# params = {
#     'dropout': 0.2,
#     'batch_size': 8,
#     'max_epochs': 10,
#     'lr': 5e-05
# }

#valid_corrs = training.launch_experiment(1, train_exs_arr, params, save_weights=True)
#print('validation correlations: {}'.format(valid_corrs))

eid 0, params {'batch_size': 8, 'dropout': 0.2, 'lr': 3e-05, 'max_epochs': 10}




training on fold 0


Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_transform.bias', 'vocab_projector.bias', 'vocab_transform.weight']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 47s
	 Train Loss: 11.078 | Train Corr: 0.17
	 Val. Loss: 3.977 |  Val. Corr: 0.52
updating saved weights of best model
Epoch: 01 | Epoch Time: 0m 47s
	 Train Loss: 3.008 | Train Corr: 0.60
	 Val. Loss: 3.307 |  Val. Corr: 0.58
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 47s
	 Train Loss: 2.283 | Train Corr: 0.71
	 Val. Loss: 3.102 |  Val. Corr: 0.60
updating saved weights of best model
Epoch: 03 | Epoch Time: 0m 47s
	 Train Loss: 1.622 | Train Corr: 0.81
	 Val. Loss: 2.930 |  Val. Corr: 0.59
updating saved weights of best model
Epoch: 04 | Epoch Time: 0m 47s
	 Train Loss: 1.249 | Train Corr: 0.86
	 Val. Loss: 2.710 |  Val. Corr: 0.62
Epoch: 05 | Epoch Time: 0m 47s
	 Train Loss: 0.925 | Train Corr: 0.90
	 Val. Loss: 3.054 |  Val. Corr: 0.63
Epoch: 06 | Epoch Time: 0m 47s
	 Train Loss: 0.723 | Train Corr: 0.92
	 Val. Loss: 2.802 |  Val. Corr: 0.64
updating saved weights of best model
training on fold 1




Epoch: 00 | Epoch Time: 0m 47s
	 Train Loss: 10.803 | Train Corr: 0.14
	 Val. Loss: 5.215 |  Val. Corr: 0.62
Epoch: 01 | Epoch Time: 0m 48s
	 Train Loss: 2.846 | Train Corr: 0.61
	 Val. Loss: 3.015 |  Val. Corr: 0.67
Epoch: 02 | Epoch Time: 0m 47s
	 Train Loss: 2.303 | Train Corr: 0.70
	 Val. Loss: 3.825 |  Val. Corr: 0.68
Epoch: 03 | Epoch Time: 0m 47s
	 Train Loss: 1.685 | Train Corr: 0.79
	 Val. Loss: 2.956 |  Val. Corr: 0.69
updating saved weights of best model
Epoch: 04 | Epoch Time: 0m 48s
	 Train Loss: 1.067 | Train Corr: 0.87
	 Val. Loss: 2.618 |  Val. Corr: 0.70
updating saved weights of best model
Epoch: 05 | Epoch Time: 0m 48s
	 Train Loss: 0.806 | Train Corr: 0.91
	 Val. Loss: 2.565 |  Val. Corr: 0.69
Epoch: 06 | Epoch Time: 0m 48s
	 Train Loss: 0.695 | Train Corr: 0.92
	 Val. Loss: 2.805 |  Val. Corr: 0.68
Epoch: 07 | Epoch Time: 0m 48s
	 Train Loss: 0.622 | Train Corr: 0.93
	 Val. Loss: 2.881 |  Val. Corr: 0.70
Epoch: 08 | Epoch Time: 0m 47s
	 Train Loss: 0.486 | Train Co



Epoch: 00 | Epoch Time: 0m 47s
	 Train Loss: 13.101 | Train Corr: 0.08
	 Val. Loss: 5.485 |  Val. Corr: 0.50
Epoch: 01 | Epoch Time: 0m 47s
	 Train Loss: 3.579 | Train Corr: 0.51
	 Val. Loss: 4.100 |  Val. Corr: 0.63
Epoch: 02 | Epoch Time: 0m 47s
	 Train Loss: 2.954 | Train Corr: 0.62
	 Val. Loss: 2.599 |  Val. Corr: 0.78
updating saved weights of best model
Epoch: 03 | Epoch Time: 0m 46s
	 Train Loss: 2.259 | Train Corr: 0.72
	 Val. Loss: 1.702 |  Val. Corr: 0.80
updating saved weights of best model
Epoch: 04 | Epoch Time: 0m 47s
	 Train Loss: 1.550 | Train Corr: 0.82
	 Val. Loss: 1.541 |  Val. Corr: 0.79
Epoch: 05 | Epoch Time: 0m 48s
	 Train Loss: 1.127 | Train Corr: 0.87
	 Val. Loss: 1.549 |  Val. Corr: 0.79
Epoch: 06 | Epoch Time: 0m 46s
	 Train Loss: 0.892 | Train Corr: 0.90
	 Val. Loss: 1.614 |  Val. Corr: 0.81
Epoch: 07 | Epoch Time: 0m 47s
	 Train Loss: 0.643 | Train Corr: 0.93
	 Val. Loss: 1.568 |  Val. Corr: 0.80
updating saved weights of best model
Epoch: 08 | Epoch Time: 



Epoch: 00 | Epoch Time: 0m 45s
	 Train Loss: 10.451 | Train Corr: 0.17
	 Val. Loss: 3.196 |  Val. Corr: 0.66
Epoch: 01 | Epoch Time: 0m 45s
	 Train Loss: 2.782 | Train Corr: 0.63
	 Val. Loss: 4.156 |  Val. Corr: 0.75
Epoch: 02 | Epoch Time: 0m 45s
	 Train Loss: 2.020 | Train Corr: 0.75
	 Val. Loss: 2.286 |  Val. Corr: 0.76
Epoch: 03 | Epoch Time: 0m 45s
	 Train Loss: 1.557 | Train Corr: 0.81
	 Val. Loss: 2.425 |  Val. Corr: 0.77
Epoch: 04 | Epoch Time: 0m 45s
	 Train Loss: 1.087 | Train Corr: 0.87
	 Val. Loss: 1.909 |  Val. Corr: 0.78
Epoch: 05 | Epoch Time: 0m 46s
	 Train Loss: 0.777 | Train Corr: 0.91
	 Val. Loss: 1.882 |  Val. Corr: 0.78
Epoch: 06 | Epoch Time: 0m 45s
	 Train Loss: 0.621 | Train Corr: 0.93
	 Val. Loss: 2.020 |  Val. Corr: 0.76
Epoch: 07 | Epoch Time: 0m 47s
	 Train Loss: 0.556 | Train Corr: 0.94
	 Val. Loss: 2.735 |  Val. Corr: 0.78
training on fold 4




Epoch: 00 | Epoch Time: 0m 44s
	 Train Loss: 9.418 | Train Corr: 0.18
	 Val. Loss: 3.601 |  Val. Corr: 0.55
Epoch: 01 | Epoch Time: 0m 44s
	 Train Loss: 3.305 | Train Corr: 0.53
	 Val. Loss: 3.779 |  Val. Corr: 0.62
Epoch: 02 | Epoch Time: 0m 44s
	 Train Loss: 2.768 | Train Corr: 0.63
	 Val. Loss: 3.040 |  Val. Corr: 0.62
Epoch: 03 | Epoch Time: 0m 43s
	 Train Loss: 1.986 | Train Corr: 0.75
	 Val. Loss: 2.570 |  Val. Corr: 0.68
Epoch: 04 | Epoch Time: 0m 43s
	 Train Loss: 1.432 | Train Corr: 0.83
	 Val. Loss: 2.662 |  Val. Corr: 0.66
Epoch: 05 | Epoch Time: 0m 44s
	 Train Loss: 1.007 | Train Corr: 0.88
	 Val. Loss: 2.806 |  Val. Corr: 0.69
Epoch: 06 | Epoch Time: 0m 42s
	 Train Loss: 0.750 | Train Corr: 0.91
	 Val. Loss: 3.471 |  Val. Corr: 0.67
Epoch: 07 | Epoch Time: 0m 44s
	 Train Loss: 0.578 | Train Corr: 0.94
	 Val. Loss: 2.556 |  Val. Corr: 0.69
eid 1, params {'batch_size': 8, 'dropout': 0.2, 'lr': 5e-05, 'max_epochs': 10}
training on fold 0




updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 47s
	 Train Loss: 8.656 | Train Corr: 0.22
	 Val. Loss: 4.411 |  Val. Corr: 0.36
updating saved weights of best model
Epoch: 01 | Epoch Time: 0m 47s
	 Train Loss: 3.104 | Train Corr: 0.58
	 Val. Loss: 4.239 |  Val. Corr: 0.49
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 47s
	 Train Loss: 2.160 | Train Corr: 0.74
	 Val. Loss: 3.174 |  Val. Corr: 0.56
Epoch: 03 | Epoch Time: 0m 47s
	 Train Loss: 1.493 | Train Corr: 0.83
	 Val. Loss: 3.256 |  Val. Corr: 0.56
training on fold 1




Epoch: 00 | Epoch Time: 0m 47s
	 Train Loss: 9.797 | Train Corr: 0.20
	 Val. Loss: 6.622 |  Val. Corr: 0.41
Epoch: 01 | Epoch Time: 0m 48s
	 Train Loss: 3.222 | Train Corr: 0.54
	 Val. Loss: 6.233 |  Val. Corr: 0.60
Epoch: 02 | Epoch Time: 0m 47s
	 Train Loss: 2.521 | Train Corr: 0.67
	 Val. Loss: 4.677 |  Val. Corr: 0.59
Epoch: 03 | Epoch Time: 0m 47s
	 Train Loss: 1.750 | Train Corr: 0.78
	 Val. Loss: 4.512 |  Val. Corr: 0.66
Epoch: 04 | Epoch Time: 0m 48s
	 Train Loss: 1.025 | Train Corr: 0.88
	 Val. Loss: 4.316 |  Val. Corr: 0.69
updating saved weights of best model
training on fold 2




Epoch: 00 | Epoch Time: 0m 47s
	 Train Loss: 10.129 | Train Corr: 0.18
	 Val. Loss: 4.922 |  Val. Corr: 0.42
Epoch: 01 | Epoch Time: 0m 47s
	 Train Loss: 2.986 | Train Corr: 0.61
	 Val. Loss: 4.141 |  Val. Corr: 0.77
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 47s
	 Train Loss: 2.267 | Train Corr: 0.73
	 Val. Loss: 2.824 |  Val. Corr: 0.75
updating saved weights of best model
Epoch: 03 | Epoch Time: 0m 46s
	 Train Loss: 1.437 | Train Corr: 0.84
	 Val. Loss: 1.725 |  Val. Corr: 0.77
Epoch: 04 | Epoch Time: 0m 47s
	 Train Loss: 1.002 | Train Corr: 0.89
	 Val. Loss: 1.789 |  Val. Corr: 0.78
updating saved weights of best model
Epoch: 05 | Epoch Time: 0m 48s
	 Train Loss: 0.689 | Train Corr: 0.92
	 Val. Loss: 1.613 |  Val. Corr: 0.80
updating saved weights of best model
Epoch: 06 | Epoch Time: 0m 46s
	 Train Loss: 0.604 | Train Corr: 0.93
	 Val. Loss: 1.500 |  Val. Corr: 0.80
Epoch: 07 | Epoch Time: 0m 47s
	 Train Loss: 0.488 | Train Corr: 0.95
	 Val. Loss: 2.143 |  Val



Epoch: 00 | Epoch Time: 0m 45s
	 Train Loss: 9.599 | Train Corr: 0.13
	 Val. Loss: 3.957 |  Val. Corr: 0.58
Epoch: 01 | Epoch Time: 0m 45s
	 Train Loss: 3.423 | Train Corr: 0.51
	 Val. Loss: 6.089 |  Val. Corr: 0.62
Epoch: 02 | Epoch Time: 0m 45s
	 Train Loss: 2.693 | Train Corr: 0.65
	 Val. Loss: 2.714 |  Val. Corr: 0.77
Epoch: 03 | Epoch Time: 0m 45s
	 Train Loss: 1.865 | Train Corr: 0.77
	 Val. Loss: 2.262 |  Val. Corr: 0.77
Epoch: 04 | Epoch Time: 0m 46s
	 Train Loss: 1.258 | Train Corr: 0.85
	 Val. Loss: 2.087 |  Val. Corr: 0.76
Epoch: 05 | Epoch Time: 0m 46s
	 Train Loss: 0.895 | Train Corr: 0.90
	 Val. Loss: 2.096 |  Val. Corr: 0.77
Epoch: 06 | Epoch Time: 0m 45s
	 Train Loss: 0.659 | Train Corr: 0.93
	 Val. Loss: 2.196 |  Val. Corr: 0.76
Epoch: 07 | Epoch Time: 0m 47s
	 Train Loss: 0.572 | Train Corr: 0.94
	 Val. Loss: 2.330 |  Val. Corr: 0.75
Epoch: 08 | Epoch Time: 0m 45s
	 Train Loss: 0.630 | Train Corr: 0.93
	 Val. Loss: 1.960 |  Val. Corr: 0.78
Epoch: 09 | Epoch Time: 0m 4



Epoch: 00 | Epoch Time: 0m 44s
	 Train Loss: 7.944 | Train Corr: 0.22
	 Val. Loss: 4.363 |  Val. Corr: 0.35
Epoch: 01 | Epoch Time: 0m 44s
	 Train Loss: 3.065 | Train Corr: 0.58
	 Val. Loss: 4.238 |  Val. Corr: 0.60
Epoch: 02 | Epoch Time: 0m 44s
	 Train Loss: 2.228 | Train Corr: 0.72
	 Val. Loss: 3.405 |  Val. Corr: 0.53
Epoch: 03 | Epoch Time: 0m 43s
	 Train Loss: 1.565 | Train Corr: 0.81
	 Val. Loss: 2.626 |  Val. Corr: 0.67
Epoch: 04 | Epoch Time: 0m 43s
	 Train Loss: 1.102 | Train Corr: 0.87
	 Val. Loss: 2.588 |  Val. Corr: 0.67
Epoch: 05 | Epoch Time: 0m 44s
	 Train Loss: 0.842 | Train Corr: 0.90
	 Val. Loss: 2.968 |  Val. Corr: 0.61
Epoch: 06 | Epoch Time: 0m 42s
	 Train Loss: 0.726 | Train Corr: 0.92
	 Val. Loss: 3.142 |  Val. Corr: 0.65
Epoch: 07 | Epoch Time: 0m 44s
	 Train Loss: 0.525 | Train Corr: 0.94
	 Val. Loss: 2.612 |  Val. Corr: 0.68
eid 2, params {'batch_size': 8, 'dropout': 0.2, 'lr': 0.0001, 'max_epochs': 10}
training on fold 0




updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 47s
	 Train Loss: 6.749 | Train Corr: 0.28
	 Val. Loss: 4.333 |  Val. Corr: 0.39
Epoch: 01 | Epoch Time: 0m 47s
	 Train Loss: 2.838 | Train Corr: 0.64
	 Val. Loss: 4.805 |  Val. Corr: 0.57
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 47s
	 Train Loss: 1.792 | Train Corr: 0.79
	 Val. Loss: 3.385 |  Val. Corr: 0.57
updating saved weights of best model
training on fold 1




Epoch: 00 | Epoch Time: 0m 47s
	 Train Loss: 7.472 | Train Corr: 0.23
	 Val. Loss: 7.835 |  Val. Corr: 0.32
Epoch: 01 | Epoch Time: 0m 48s
	 Train Loss: 3.403 | Train Corr: 0.52
	 Val. Loss: 6.256 |  Val. Corr: 0.61
Epoch: 02 | Epoch Time: 0m 47s
	 Train Loss: 2.305 | Train Corr: 0.70
	 Val. Loss: 6.737 |  Val. Corr: 0.54
Epoch: 03 | Epoch Time: 0m 47s
	 Train Loss: 1.885 | Train Corr: 0.77
	 Val. Loss: 5.129 |  Val. Corr: 0.63
Epoch: 04 | Epoch Time: 0m 48s
	 Train Loss: 1.162 | Train Corr: 0.86
	 Val. Loss: 4.746 |  Val. Corr: 0.67
updating saved weights of best model
Epoch: 05 | Epoch Time: 0m 48s
	 Train Loss: 0.927 | Train Corr: 0.89
	 Val. Loss: 2.608 |  Val. Corr: 0.68
Epoch: 06 | Epoch Time: 0m 48s
	 Train Loss: 0.696 | Train Corr: 0.92
	 Val. Loss: 3.952 |  Val. Corr: 0.66
Epoch: 07 | Epoch Time: 0m 48s
	 Train Loss: 0.585 | Train Corr: 0.93
	 Val. Loss: 5.021 |  Val. Corr: 0.65
training on fold 2




Epoch: 00 | Epoch Time: 0m 47s
	 Train Loss: 7.371 | Train Corr: 0.24
	 Val. Loss: 4.631 |  Val. Corr: 0.36
Epoch: 01 | Epoch Time: 0m 47s
	 Train Loss: 3.411 | Train Corr: 0.54
	 Val. Loss: 4.258 |  Val. Corr: 0.73
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 47s
	 Train Loss: 2.694 | Train Corr: 0.66
	 Val. Loss: 2.460 |  Val. Corr: 0.70
updating saved weights of best model
Epoch: 03 | Epoch Time: 0m 46s
	 Train Loss: 1.984 | Train Corr: 0.77
	 Val. Loss: 2.407 |  Val. Corr: 0.71
updating saved weights of best model
Epoch: 04 | Epoch Time: 0m 47s
	 Train Loss: 1.186 | Train Corr: 0.87
	 Val. Loss: 2.397 |  Val. Corr: 0.77
Epoch: 05 | Epoch Time: 0m 48s
	 Train Loss: 0.944 | Train Corr: 0.90
	 Val. Loss: 2.443 |  Val. Corr: 0.72
Epoch: 06 | Epoch Time: 0m 46s
	 Train Loss: 0.639 | Train Corr: 0.93
	 Val. Loss: 2.436 |  Val. Corr: 0.77
updating saved weights of best model
Epoch: 07 | Epoch Time: 0m 47s
	 Train Loss: 0.512 | Train Corr: 0.95
	 Val. Loss: 1.966 |  Val.



Epoch: 00 | Epoch Time: 0m 45s
	 Train Loss: 8.091 | Train Corr: 0.20
	 Val. Loss: 4.655 |  Val. Corr: 0.46
Epoch: 01 | Epoch Time: 0m 45s
	 Train Loss: 3.149 | Train Corr: 0.56
	 Val. Loss: 4.086 |  Val. Corr: 0.70
Epoch: 02 | Epoch Time: 0m 45s
	 Train Loss: 2.437 | Train Corr: 0.69
	 Val. Loss: 2.761 |  Val. Corr: 0.73
Epoch: 03 | Epoch Time: 0m 45s
	 Train Loss: 1.685 | Train Corr: 0.80
	 Val. Loss: 2.364 |  Val. Corr: 0.75
Epoch: 04 | Epoch Time: 0m 46s
	 Train Loss: 1.246 | Train Corr: 0.85
	 Val. Loss: 2.115 |  Val. Corr: 0.76
Epoch: 05 | Epoch Time: 0m 46s
	 Train Loss: 0.864 | Train Corr: 0.90
	 Val. Loss: 2.323 |  Val. Corr: 0.76
Epoch: 06 | Epoch Time: 0m 45s
	 Train Loss: 0.695 | Train Corr: 0.92
	 Val. Loss: 2.288 |  Val. Corr: 0.74
training on fold 4




Epoch: 00 | Epoch Time: 0m 43s
	 Train Loss: 7.029 | Train Corr: 0.24
	 Val. Loss: 4.627 |  Val. Corr: 0.40
Epoch: 01 | Epoch Time: 0m 44s
	 Train Loss: 3.538 | Train Corr: 0.50
	 Val. Loss: 5.197 |  Val. Corr: 0.45
Epoch: 02 | Epoch Time: 0m 44s
	 Train Loss: 2.793 | Train Corr: 0.63
	 Val. Loss: 3.898 |  Val. Corr: 0.55
Epoch: 03 | Epoch Time: 0m 43s
	 Train Loss: 1.780 | Train Corr: 0.78
	 Val. Loss: 3.009 |  Val. Corr: 0.62
Epoch: 04 | Epoch Time: 0m 43s
	 Train Loss: 1.293 | Train Corr: 0.85
	 Val. Loss: 3.686 |  Val. Corr: 0.65
('batch_size_8; dropout_0.2; lr_3e-05; max_epochs_10', 0.7249703158036596)


'commented out portion below which ran a single experiment'
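
The log above prints `updating saved weights of best model` before some epochs. Those messages are consistent with a checkpoint that is overwritten whenever the validation loss drops below the best value seen so far in the current experiment, with the best value carried over from one fold to the next. That reading is an inference from the log, not something confirmed by `training.py`. A minimal sketch of such tracking, using a hypothetical helper name:

```python
def epochs_that_update_checkpoint(val_losses, best_so_far=float('inf')):
    """Return (epochs at which a new best was reached, final best loss)."""
    saved_epochs = []
    for epoch, loss in enumerate(val_losses):
        if loss < best_so_far:
            best_so_far = loss
            # In a real training loop, this is where the weights would be
            # written, e.g. torch.save(model.state_dict(), path).
            saved_epochs.append(epoch)
    return saved_epochs, best_so_far

# Validation losses copied from the log for eid 0, folds 0 and 1.
fold0_losses = [3.977, 3.307, 3.102, 2.930, 2.710, 3.054, 2.802]
fold1_losses = [5.215, 3.015, 3.825, 2.956, 2.618, 2.565, 2.805, 2.881]

fold0_saved, best = epochs_that_update_checkpoint(fold0_losses)
fold1_saved, best = epochs_that_update_checkpoint(fold1_losses, best)
```

Under this assumption, fold 0 updates the checkpoint at epochs 0 through 4, and fold 1 only at epochs 4 and 5, which lines up with where the message appears before the corresponding epoch lines in the log.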

# Test the trained model on the held-out dataset

In [6]:
# Get a test iterator
test_iterator = training.get_iterator(test_dataset, 8, device)

In [8]:
# load the best model saved
bert = DistilBertModel.from_pretrained(constants.WEIGHTS_NAME)
model = models.BERTLinear(bert, constants.OUTPUT_DIM, 0.2)
# map_location lets weights saved on a GPU load on a CPU-only machine.
model.load_state_dict(torch.load("1_best_valid_loss.pt", map_location=device))
model.to(device)
model.eval()
# If you change the criterion, make sure it matches the training criterion in training.py.
# reduction='sum' is the current equivalent of the deprecated size_average=False.
criterion = nn.MSELoss(reduction='sum')
criterion = criterion.to(device)
test_loss, test_corr = training.evaluate(model, test_iterator, criterion)
print(test_loss)
print(test_corr)



2.5071848928928375
0.6903010024542606
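
The two numbers printed above are the test loss and the test correlation returned by `training.evaluate`. How that function computes them is not shown in this notebook. As a sketch, assuming the correlation is a Pearson correlation between predictions and targets and the loss is built from summed squared errors, the two quantities could be computed as follows. Both function names are hypothetical.

```python
import math

def pearson_corr(preds, targets):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(preds)
    mean_p = sum(preds) / n
    mean_t = sum(targets) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(preds, targets))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    return cov / math.sqrt(var_p * var_t)

def summed_squared_error(preds, targets):
    """Sum of squared errors, the quantity a sum-reduced MSE loss returns."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets))

# Toy values for illustration only, not taken from the dataset.
toy_preds = [1.0, 2.0, 3.0, 4.0]
toy_targets = [2.0, 4.0, 6.0, 8.0]
toy_corr = pearson_corr(toy_preds, toy_targets)
toy_sse = summed_squared_error(toy_preds, toy_targets)
```

Note that a perfectly correlated prediction can still have a large loss, as in the toy values, which is why the notebook reports both numbers.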


# Misc other stuff

Link to the trainer class: https://huggingface.co/transformers/main_classes/trainer.html



Default training arguments: https://huggingface.co/transformers/main_classes/trainer.html#transformers.TrainingArguments

Batch size per device: 8

Epochs: 3



This should be the model I used to generate my initial results: https://huggingface.co/transformers/model_doc/distilbert.html#distilbertforsequenceclassification
"DistilBert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks."