This notebook is based on [this reference implementation](https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/6%20-%20Transformers%20for%20Sentiment%20Analysis.ipynb).

Other references for the model:
* https://stackoverflow.com/questions/65205582/how-can-i-add-a-bi-lstm-layer-on-top-of-bert-model
* https://discuss.pytorch.org/t/how-to-connect-hook-two-or-even-more-models-together/21033
* https://pytorch.org/tutorials/beginner/transformer_tutorial.html
* https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html

Other references for torchtext:
* https://towardsdatascience.com/use-torchtext-to-load-nlp-datasets-part-i-5da6f1c89d84
* https://towardsdatascience.com/use-torchtext-to-load-nlp-datasets-part-ii-f146c8b9a496
* http://anie.me/On-Torchtext/

# Imports and setup

In [1]:
import pandas as pd
import numpy as np
import os
import random
# random.seed(1)
import re

# Data processing.
import constants # constants.py
import dataset # dataset.py
import torch

# Model.
import models # models.py
import torch.nn as nn
from transformers import DistilBertModel

# Training.
import training # training.py
import utils # utils.py

# If you make a code change that doesn't get picked up by
# the Jupyter notebook, try reloading the module like below.
# (`imp` is deprecated and was removed in Python 3.12; use importlib.)
# import importlib
# importlib.reload(training)

# Read the data

In [3]:
# data_df = dataset.get_multiple_datasets([1,2,3], 'Creativity_Combined', shuffle=True)

In [4]:
# This cell is commented out because the CSV files should already exist in the
# working directory. If you are running the notebook for the first time,
# uncomment this cell and the one above it to generate them.

# Split into train and test sets. (The train set is further split into
# train + validation sets via k-fold CV.)
# train_df = data_df[:1000]
# test_df = data_df[1000:]  # roughly 203 test examples set aside

# Write them to CSV files.
# train_df.to_csv('ktrain.csv', index=False, header=False)
# test_df.to_csv('ktest.csv', index=False, header=False)

## Preprocessing and transformation into torchtext `Dataset` format

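As I understand it, the preprocessing is defined by the `data.Field()` objects of legacy torchtext: tokenization (plus any `preprocessing` pipeline passed to the `Field`) is applied when the `Dataset` examples are created, while padding and numericalization happen later, when batches are built.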
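To make the two stages concrete, here is a minimal pure-Python sketch. `ToyField` is a hypothetical stand-in, not torchtext's `Field`: it only mimics the split between `preprocess` (tokenize and lowercase once per example, when the dataset is built) and `process` (pad and numericalize once per batch). The whitespace tokenizer, the toy vocabulary and the `<pad>`/`<unk>` tokens are assumptions for illustration; a DistilBERT pipeline would use the model's own tokenizer and vocabulary instead.

```python
from typing import Dict, List


class ToyField:
    """Hypothetical, simplified stand-in for legacy torchtext's Field."""

    def __init__(self, vocab: Dict[str, int], lower: bool = True,
                 pad_token: str = "<pad>", unk_token: str = "<unk>"):
        self.vocab = vocab
        self.lower = lower
        self.pad_token = pad_token
        self.unk_token = unk_token

    def preprocess(self, text: str) -> List[str]:
        # Stage 1: runs once per example, when the dataset is created.
        tokens = text.split()  # assumption: naive whitespace tokenizer
        return [t.lower() for t in tokens] if self.lower else tokens

    def process(self, batch: List[List[str]]) -> List[List[int]]:
        # Stage 2: runs once per batch: pad to the longest example, then map tokens to ids.
        max_len = max(len(ex) for ex in batch)
        padded = [ex + [self.pad_token] * (max_len - len(ex)) for ex in batch]
        unk_id = self.vocab[self.unk_token]
        return [[self.vocab.get(tok, unk_id) for tok in ex] for ex in padded]


vocab = {"<pad>": 0, "<unk>": 1, "a": 2, "creative": 3, "story": 4}
field = ToyField(vocab)
examples = [field.preprocess("A creative story"), field.preprocess("A poem")]
print(examples)                 # [['a', 'creative', 'story'], ['a', 'poem']]
print(field.process(examples))  # [[2, 3, 4], [2, 1, 0]]
```

The out-of-vocabulary word "poem" maps to the `<unk>` id and the shorter example is padded with the `<pad>` id only at batch time, which is why padding depends on which examples end up in the same batch.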

In [2]:
train_dataset, test_dataset = dataset.get_train_test_datasets()

In [3]:
# Transform train_dataset into an np array representation.
# This will be used for generating the K folds.
train_exs_arr = np.array(train_dataset.examples)
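The folds themselves are generated inside `training.py`, which is not shown here. As a minimal sketch of the idea, assuming 5 shuffled folds (the log below shows folds 0 to 4), the NumPy-only helper `kfold_splits` (a hypothetical name) yields train/validation splits of a 1-D object array such as `train_exs_arr`. An object array is handy here because indexing it with an integer index array works just as it does for a numeric array. The 1000 toy examples mirror the `data_df[:1000]` training split above.

```python
import numpy as np


def kfold_splits(examples: np.ndarray, k: int = 5, seed: int = 0):
    """Hypothetical helper: yield (fold, train, val) splits for k-fold CV."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(examples))  # shuffle once, then partition
    # array_split tolerates len(examples) not being divisible by k.
    folds = np.array_split(indices, k)
    for i, val_idx in enumerate(folds):
        # Every fold except the i-th one forms the training part.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield i, examples[train_idx], examples[val_idx]


# Stand-in for train_exs_arr: a 1-D object array of 1000 arbitrary Python objects.
toy_examples = np.array([{"id": n} for n in range(1000)], dtype=object)

for fold, train_part, val_part in kfold_splits(toy_examples, k=5):
    # prints "fold i: 800 train / 200 val" for i = 0..4
    print(f"fold {fold}: {len(train_part)} train / {len(val_part)} val")
```

Each example lands in exactly one validation fold, so every training example is used for validation once across the 5 folds.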

# Training pipeline begins here
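The `eid` values in the log below appear to follow the ordering used by scikit-learn's `ParameterGrid`: keys sorted alphabetically, with the last key (`num_layers`) varying fastest. Whether `training.py` actually uses `ParameterGrid` is an assumption. The sketch below reproduces that ordering with `itertools.product` (`expand_grid` is a hypothetical name) and shows that the grid defined below contains 4 × 3 × 2 = 24 configurations.

```python
from itertools import product

# Same grid as in the training cell below.
param_grid = {
    'dropout': [0.2],
    'batch_size': [8],
    'max_epochs': [10],
    'lr': [5e-05],
    'hidden_dim': [128, 256, 512, 768],
    'num_layers': [1, 2, 3],
    'bidirectional': [False, True],
}


def expand_grid(grid):
    """Hypothetical helper: all combinations, keys sorted, last key varies fastest."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


configs = expand_grid(param_grid)
print(len(configs))  # 24
for eid in range(4):
    print(eid, configs[eid])
```

With 5 folds per configuration that is 120 training runs. If every run went to `max_epochs`, 120 runs × 10 epochs at the roughly 50 to 57 s per epoch seen in the log would take about 17 to 19 hours; the log shows many folds ending after only a few epochs, so the actual runtime is lower.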


In [4]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

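# Single-element lists are fixed settings; only hidden_dim, num_layers and
# bidirectional vary, giving 4 * 3 * 2 = 24 configurations.
param_grid = {
    'dropout': [0.2],
    'batch_size': [8],
    'max_epochs': [10],
    'lr': [5e-05],
    'hidden_dim': [128, 256, 512, 768],
    'num_layers': [1, 2, 3],
    'bidirectional': [False, True],
}

# Each configuration is evaluated with k-fold CV on the training examples.
results, best_model = training.perform_hyperparameter_search(
    param_grid, train_exs_arr, rnn=True, save_weights=True)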
print(best_model)

eid 0, params {'batch_size': 8, 'bidirectional': False, 'dropout': 0.2, 'hidden_dim': 128, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 1}




training on fold 0


Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 52s
	 Train Loss: 25.232 | Train Corr: 0.03
	 Val. Loss: 5.094 |  Val. Corr: 0.41
updating saved weights of best model
Epoch: 01 | Epoch Time: 0m 52s
	 Train Loss: 5.509 | Train Corr: 0.04
	 Val. Loss: 4.337 |  Val. Corr: 0.44
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 52s
	 Train Loss: 4.452 | Train Corr: 0.36
	 Val. Loss: 4.108 |  Val. Corr: 0.42
Epoch: 03 | Epoch Time: 0m 52s
	 Train Loss: 3.905 | Train Corr: 0.47
	 Val. Loss: 4.680 |  Val. Corr: 0.45
updating saved weights of best model
Epoch: 04 | Epoch Time: 0m 51s
	 Train Loss: 3.407 | Train Corr: 0.57
	 Val. Loss: 3.048 |  Val. Corr: 0.60
updating saved weights of best model
Epoch: 05 | Epoch Time: 0m 52s
	 Train Loss: 2.683 | Train Corr: 0.68
	 Val. Loss: 2.912 |  Val. Corr: 0.63
updating saved weights of best model
Epoch: 06 | Epoch Time: 0m 52s
	 Train Loss: 2.020 | Train Corr: 0.77
	 Val. Loss: 2.753 |  Val. Corr: 0.62
Epoch: 07 | Epoch Tim

Epoch: 00 | Epoch Time: 0m 52s
	 Train Loss: 41.768 | Train Corr: -0.01
	 Val. Loss: 9.163 |  Val. Corr: 0.42
Epoch: 01 | Epoch Time: 0m 52s
	 Train Loss: 7.213 | Train Corr: -0.02
	 Val. Loss: 4.748 |  Val. Corr: 0.42
Epoch: 02 | Epoch Time: 0m 52s
	 Train Loss: 5.501 | Train Corr: -0.01
	 Val. Loss: 4.917 |  Val. Corr: 0.20
training on fold 2


Epoch: 00 | Epoch Time: 0m 52s
	 Train Loss: 32.702 | Train Corr: 0.00
	 Val. Loss: 6.692 |  Val. Corr: 0.47
Epoch: 01 | Epoch Time: 0m 52s
	 Train Loss: 6.006 | Train Corr: 0.04
	 Val. Loss: 4.002 |  Val. Corr: 0.39
Epoch: 02 | Epoch Time: 0m 52s
	 Train Loss: 4.967 | Train Corr: 0.23
	 Val. Loss: 4.119 |  Val. Corr: 0.41
training on fold 3


Epoch: 00 | Epoch Time: 0m 51s
	 Train Loss: 27.860 | Train Corr: 0.01
	 Val. Loss: 6.452 |  Val. Corr: 0.41
Epoch: 01 | Epoch Time: 0m 50s
	 Train Loss: 5.426 | Train Corr: 0.07
	 Val. Loss: 4.465 |  Val. Corr: 0.36
Epoch: 02 | Epoch Time: 0m 51s
	 Train Loss: 4.295 | Train Corr: 0.38
	 Val. Loss: 4.169 |  Val. Corr: 0.55
Epoch: 03 | Epoch Time: 0m 51s
	 Train Loss: 3.716 | Train Corr: 0.47
	 Val. Loss: 4.325 |  Val. Corr: 0.56
training on fold 4


Epoch: 00 | Epoch Time: 0m 49s
	 Train Loss: 29.726 | Train Corr: -0.01
	 Val. Loss: 7.273 |  Val. Corr: 0.43
Epoch: 01 | Epoch Time: 0m 49s
	 Train Loss: 5.450 | Train Corr: 0.05
	 Val. Loss: 4.650 |  Val. Corr: 0.29
Epoch: 02 | Epoch Time: 0m 48s
	 Train Loss: 4.143 | Train Corr: 0.41
	 Val. Loss: 4.189 |  Val. Corr: 0.47
eid 1, params {'batch_size': 8, 'bidirectional': False, 'dropout': 0.2, 'hidden_dim': 128, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 2}
training on fold 0


updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 53s
	 Train Loss: 68.571 | Train Corr: 0.01
	 Val. Loss: 20.921 |  Val. Corr: 0.45
updating saved weights of best model
Epoch: 01 | Epoch Time: 0m 53s
	 Train Loss: 8.987 | Train Corr: -0.05
	 Val. Loss: 4.588 |  Val. Corr: -0.25
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 53s
	 Train Loss: 4.930 | Train Corr: 0.23
	 Val. Loss: 3.597 |  Val. Corr: 0.49
Epoch: 03 | Epoch Time: 0m 53s
	 Train Loss: 4.356 | Train Corr: 0.36
	 Val. Loss: 4.699 |  Val. Corr: 0.08
Epoch: 04 | Epoch Time: 0m 52s
	 Train Loss: 3.884 | Train Corr: 0.46
	 Val. Loss: 3.932 |  Val. Corr: 0.36
training on fold 1


Epoch: 00 | Epoch Time: 0m 53s
	 Train Loss: 65.979 | Train Corr: -0.02
	 Val. Loss: 11.897 |  Val. Corr: 0.46
Epoch: 01 | Epoch Time: 0m 53s
	 Train Loss: 6.934 | Train Corr: -0.06
	 Val. Loss: 4.814 |  Val. Corr: 0.43
Epoch: 02 | Epoch Time: 0m 53s
	 Train Loss: 5.214 | Train Corr: 0.08
	 Val. Loss: 4.962 |  Val. Corr: 0.42
Epoch: 03 | Epoch Time: 0m 54s
	 Train Loss: 4.393 | Train Corr: 0.33
	 Val. Loss: 4.023 |  Val. Corr: 0.60
Epoch: 04 | Epoch Time: 0m 54s
	 Train Loss: 3.600 | Train Corr: 0.50
	 Val. Loss: 4.340 |  Val. Corr: 0.57
training on fold 2


Epoch: 00 | Epoch Time: 0m 53s
	 Train Loss: 66.426 | Train Corr: 0.01
	 Val. Loss: 16.859 |  Val. Corr: 0.49
Epoch: 01 | Epoch Time: 0m 53s
	 Train Loss: 8.219 | Train Corr: -0.03
	 Val. Loss: 4.056 |  Val. Corr: 0.48
Epoch: 02 | Epoch Time: 0m 53s
	 Train Loss: 5.546 | Train Corr: 0.04
	 Val. Loss: 4.036 |  Val. Corr: 0.33
Epoch: 03 | Epoch Time: 0m 53s
	 Train Loss: 4.755 | Train Corr: 0.30
	 Val. Loss: 4.883 |  Val. Corr: 0.29
Epoch: 04 | Epoch Time: 0m 53s
	 Train Loss: 4.274 | Train Corr: 0.42
	 Val. Loss: 4.326 |  Val. Corr: 0.47
training on fold 3


Epoch: 00 | Epoch Time: 0m 52s
	 Train Loss: 66.275 | Train Corr: -0.00
	 Val. Loss: 20.598 |  Val. Corr: 0.47
Epoch: 01 | Epoch Time: 0m 51s
	 Train Loss: 8.516 | Train Corr: -0.05
	 Val. Loss: 4.808 |  Val. Corr: 0.45
Epoch: 02 | Epoch Time: 0m 52s
	 Train Loss: 5.377 | Train Corr: 0.10
	 Val. Loss: 4.632 |  Val. Corr: 0.24
Epoch: 03 | Epoch Time: 0m 52s
	 Train Loss: 4.474 | Train Corr: 0.32
	 Val. Loss: 4.635 |  Val. Corr: 0.34
Epoch: 04 | Epoch Time: 0m 52s
	 Train Loss: 4.006 | Train Corr: 0.43
	 Val. Loss: 5.263 |  Val. Corr: 0.32
training on fold 4


Epoch: 00 | Epoch Time: 0m 50s
	 Train Loss: 60.463 | Train Corr: -0.02
	 Val. Loss: 18.207 |  Val. Corr: 0.45
Epoch: 01 | Epoch Time: 0m 50s
	 Train Loss: 7.059 | Train Corr: 0.01
	 Val. Loss: 4.839 |  Val. Corr: 0.00
Epoch: 02 | Epoch Time: 0m 49s
	 Train Loss: 5.260 | Train Corr: 0.11
	 Val. Loss: 3.978 |  Val. Corr: 0.46
Epoch: 03 | Epoch Time: 0m 49s
	 Train Loss: 4.037 | Train Corr: 0.42
	 Val. Loss: 4.380 |  Val. Corr: 0.37
eid 2, params {'batch_size': 8, 'bidirectional': False, 'dropout': 0.2, 'hidden_dim': 128, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 3}
training on fold 0


updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 54s
	 Train Loss: 82.967 | Train Corr: 0.00
	 Val. Loss: 32.776 |  Val. Corr: 0.46
updating saved weights of best model
Epoch: 01 | Epoch Time: 0m 54s
	 Train Loss: 11.593 | Train Corr: -0.03
	 Val. Loss: 4.588 |  Val. Corr: 0.47
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 54s
	 Train Loss: 5.661 | Train Corr: -0.00
	 Val. Loss: 4.537 |  Val. Corr: 0.48
updating saved weights of best model
training on fold 1


Epoch: 00 | Epoch Time: 0m 54s
	 Train Loss: 83.181 | Train Corr: -0.02
	 Val. Loss: 25.126 |  Val. Corr: 0.36
Epoch: 01 | Epoch Time: 0m 54s
	 Train Loss: 11.742 | Train Corr: -0.01
	 Val. Loss: 4.759 |  Val. Corr: 0.36
Epoch: 02 | Epoch Time: 0m 54s
	 Train Loss: 5.816 | Train Corr: -0.03
	 Val. Loss: 4.864 |  Val. Corr: 0.36
training on fold 2


Epoch: 00 | Epoch Time: 0m 54s
	 Train Loss: 83.692 | Train Corr: -0.02
	 Val. Loss: 21.633 |  Val. Corr: 0.53
updating saved weights of best model
Epoch: 01 | Epoch Time: 0m 54s
	 Train Loss: 8.923 | Train Corr: -0.04
	 Val. Loss: 4.066 |  Val. Corr: 0.52
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 54s
	 Train Loss: 5.604 | Train Corr: 0.01
	 Val. Loss: 4.054 |  Val. Corr: 0.22
Epoch: 03 | Epoch Time: 0m 54s
	 Train Loss: 4.916 | Train Corr: 0.29
	 Val. Loss: 4.271 |  Val. Corr: 0.42
Epoch: 04 | Epoch Time: 0m 53s
	 Train Loss: 4.240 | Train Corr: 0.41
	 Val. Loss: 4.367 |  Val. Corr: 0.47
updating saved weights of best model
training on fold 3


Epoch: 00 | Epoch Time: 0m 53s
	 Train Loss: 70.592 | Train Corr: -0.01
	 Val. Loss: 12.720 |  Val. Corr: 0.59
Epoch: 01 | Epoch Time: 0m 52s
	 Train Loss: 5.992 | Train Corr: -0.00
	 Val. Loss: 4.916 |  Val. Corr: 0.14
training on fold 4


Epoch: 00 | Epoch Time: 0m 51s
	 Train Loss: 87.142 | Train Corr: -0.03
	 Val. Loss: 35.763 |  Val. Corr: 0.42
Epoch: 01 | Epoch Time: 0m 51s
	 Train Loss: 11.809 | Train Corr: -0.03
	 Val. Loss: 5.079 |  Val. Corr: 0.41
Epoch: 02 | Epoch Time: 0m 50s
	 Train Loss: 5.897 | Train Corr: -0.04
	 Val. Loss: 4.885 |  Val. Corr: 0.14
Epoch: 03 | Epoch Time: 0m 50s
	 Train Loss: 4.883 | Train Corr: 0.27
	 Val. Loss: 4.638 |  Val. Corr: 0.33
eid 3, params {'batch_size': 8, 'bidirectional': False, 'dropout': 0.2, 'hidden_dim': 256, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 1}
training on fold 0


updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 55s
	 Train Loss: 19.386 | Train Corr: 0.04
	 Val. Loss: 4.344 |  Val. Corr: 0.18
Epoch: 01 | Epoch Time: 0m 55s
	 Train Loss: 3.884 | Train Corr: 0.44
	 Val. Loss: 4.646 |  Val. Corr: 0.38
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 55s
	 Train Loss: 3.176 | Train Corr: 0.58
	 Val. Loss: 3.936 |  Val. Corr: 0.54
updating saved weights of best model
Epoch: 03 | Epoch Time: 0m 55s
	 Train Loss: 2.270 | Train Corr: 0.72
	 Val. Loss: 3.252 |  Val. Corr: 0.61
Epoch: 04 | Epoch Time: 0m 54s
	 Train Loss: 1.557 | Train Corr: 0.82
	 Val. Loss: 3.683 |  Val. Corr: 0.60
updating saved weights of best model
Epoch: 05 | Epoch Time: 0m 55s
	 Train Loss: 1.289 | Train Corr: 0.85
	 Val. Loss: 2.794 |  Val. Corr: 0.61
Epoch: 06 | Epoch Time: 0m 55s
	 Train Loss: 1.132 | Train Corr: 0.87
	 Val. Loss: 2.889 |  Val. Corr: 0.59
Epoch: 07 | Epoch Time: 0m 55s
	 Train Loss: 0.958 | Train Corr: 0.89
	 Val. Loss: 3.003 |  Val

Epoch: 00 | Epoch Time: 0m 55s
	 Train Loss: 25.039 | Train Corr: 0.01
	 Val. Loss: 4.973 |  Val. Corr: 0.33
Epoch: 01 | Epoch Time: 0m 55s
	 Train Loss: 4.717 | Train Corr: 0.22
	 Val. Loss: 6.033 |  Val. Corr: 0.35
Epoch: 02 | Epoch Time: 0m 55s
	 Train Loss: 3.720 | Train Corr: 0.46
	 Val. Loss: 5.949 |  Val. Corr: 0.55
training on fold 2


Epoch: 00 | Epoch Time: 0m 54s
	 Train Loss: 25.272 | Train Corr: 0.01
	 Val. Loss: 3.931 |  Val. Corr: 0.42
Epoch: 01 | Epoch Time: 0m 55s
	 Train Loss: 4.221 | Train Corr: 0.37
	 Val. Loss: 4.665 |  Val. Corr: 0.44
Epoch: 02 | Epoch Time: 0m 55s
	 Train Loss: 3.703 | Train Corr: 0.49
	 Val. Loss: 4.424 |  Val. Corr: 0.59
training on fold 3


Epoch: 00 | Epoch Time: 0m 54s
	 Train Loss: 21.092 | Train Corr: 0.04
	 Val. Loss: 4.753 |  Val. Corr: 0.45
Epoch: 01 | Epoch Time: 0m 53s
	 Train Loss: 3.638 | Train Corr: 0.47
	 Val. Loss: 4.204 |  Val. Corr: 0.54
training on fold 4


Epoch: 00 | Epoch Time: 0m 51s
	 Train Loss: 22.774 | Train Corr: 0.01
	 Val. Loss: 4.581 |  Val. Corr: 0.50
Epoch: 01 | Epoch Time: 0m 52s
	 Train Loss: 3.872 | Train Corr: 0.43
	 Val. Loss: 4.377 |  Val. Corr: 0.53
Epoch: 02 | Epoch Time: 0m 51s
	 Train Loss: 2.965 | Train Corr: 0.61
	 Val. Loss: 3.020 |  Val. Corr: 0.66
Epoch: 03 | Epoch Time: 0m 51s
	 Train Loss: 2.184 | Train Corr: 0.73
	 Val. Loss: 3.562 |  Val. Corr: 0.64
updating saved weights of best model
Epoch: 04 | Epoch Time: 0m 51s
	 Train Loss: 1.591 | Train Corr: 0.81
	 Val. Loss: 2.748 |  Val. Corr: 0.67
updating saved weights of best model
Epoch: 05 | Epoch Time: 0m 52s
	 Train Loss: 1.298 | Train Corr: 0.85
	 Val. Loss: 2.606 |  Val. Corr: 0.68
Epoch: 06 | Epoch Time: 0m 51s
	 Train Loss: 1.030 | Train Corr: 0.88
	 Val. Loss: 2.626 |  Val. Corr: 0.67
Epoch: 07 | Epoch Time: 0m 52s
	 Train Loss: 0.877 | Train Corr: 0.90
	 Val. Loss: 2.945 |  Val. Corr: 0.63
eid 4, params {'batch_size': 8, 'bidirectional': False, 'drop

updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 56s
	 Train Loss: 37.570 | Train Corr: 0.01
	 Val. Loss: 4.434 |  Val. Corr: 0.34
updating saved weights of best model
Epoch: 01 | Epoch Time: 0m 56s
	 Train Loss: 4.726 | Train Corr: 0.16
	 Val. Loss: 4.316 |  Val. Corr: 0.38
Epoch: 02 | Epoch Time: 0m 56s
	 Train Loss: 3.712 | Train Corr: 0.48
	 Val. Loss: 4.534 |  Val. Corr: 0.39
updating saved weights of best model
Epoch: 03 | Epoch Time: 0m 56s
	 Train Loss: 3.132 | Train Corr: 0.58
	 Val. Loss: 2.963 |  Val. Corr: 0.59
updating saved weights of best model
Epoch: 04 | Epoch Time: 0m 55s
	 Train Loss: 2.150 | Train Corr: 0.74
	 Val. Loss: 2.820 |  Val. Corr: 0.63
updating saved weights of best model
Epoch: 05 | Epoch Time: 0m 57s
	 Train Loss: 1.651 | Train Corr: 0.81
	 Val. Loss: 2.759 |  Val. Corr: 0.64
updating saved weights of best model
Epoch: 06 | Epoch Time: 0m 57s
	 Train Loss: 1.342 | Train Corr: 0.85
	 Val. Loss: 2.728 |  Val. Corr: 0.62
updating saved weight

Epoch: 00 | Epoch Time: 0m 57s
	 Train Loss: 34.985 | Train Corr: 0.00
	 Val. Loss: 4.874 |  Val. Corr: 0.03
Epoch: 01 | Epoch Time: 0m 57s
	 Train Loss: 4.305 | Train Corr: 0.30
	 Val. Loss: 7.151 |  Val. Corr: 0.55
Epoch: 02 | Epoch Time: 0m 57s
	 Train Loss: 3.535 | Train Corr: 0.49
	 Val. Loss: 3.869 |  Val. Corr: 0.62
training on fold 2


Epoch: 00 | Epoch Time: 0m 56s
	 Train Loss: 35.457 | Train Corr: 0.01
	 Val. Loss: 4.095 |  Val. Corr: -0.03
Epoch: 01 | Epoch Time: 0m 57s
	 Train Loss: 4.394 | Train Corr: 0.33
	 Val. Loss: 4.395 |  Val. Corr: 0.61
Epoch: 02 | Epoch Time: 0m 57s
	 Train Loss: 3.761 | Train Corr: 0.48
	 Val. Loss: 2.937 |  Val. Corr: 0.67
updating saved weights of best model
Epoch: 03 | Epoch Time: 0m 57s
	 Train Loss: 3.059 | Train Corr: 0.61
	 Val. Loss: 2.447 |  Val. Corr: 0.72
Epoch: 04 | Epoch Time: 0m 56s
	 Train Loss: 2.194 | Train Corr: 0.74
	 Val. Loss: 2.600 |  Val. Corr: 0.69
updating saved weights of best model
Epoch: 05 | Epoch Time: 0m 57s
	 Train Loss: 1.663 | Train Corr: 0.81
	 Val. Loss: 2.408 |  Val. Corr: 0.77
Epoch: 06 | Epoch Time: 0m 57s
	 Train Loss: 1.303 | Train Corr: 0.86
	 Val. Loss: 2.871 |  Val. Corr: 0.72
updating saved weights of best model
Epoch: 07 | Epoch Time: 0m 56s
	 Train Loss: 1.117 | Train Corr: 0.88
	 Val. Loss: 2.245 |  Val. Corr: 0.76
updating saved weights 

training on fold 4
Epoch: 00 | Epoch Time: 0m 53s
	 Train Loss: 39.923 | Train Corr: -0.03
	 Val. Loss: 4.760 |  Val. Corr: 0.30
Epoch: 01 | Epoch Time: 0m 53s
	 Train Loss: 4.360 | Train Corr: 0.29
	 Val. Loss: 4.376 |  Val. Corr: 0.46
Epoch: 02 | Epoch Time: 0m 52s
	 Train Loss: 3.697 | Train Corr: 0.47
	 Val. Loss: 3.969 |  Val. Corr: 0.51
eid 5, params {'batch_size': 8, 'bidirectional': False, 'dropout': 0.2, 'hidden_dim': 256, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 3}
training on fold 0
updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 57s
	 Train Loss: 48.097 | Train Corr: 0.01
	 Val. Loss: 4.481 |  Val. Corr: -0.22
updating saved weights of best model
Epoch: 01 | Epoch Time: 0m 58s
	 Train Loss: 5.433 | Train Corr: -0.05
	 Val. Loss: 4.429 |  Val. Corr: 0.50
updating saved weights of best model
training on fold 1
Epoch: 00 | Epoch Time: 0m 58s
	 Train Loss: 51.030 | Train Corr: -0.01
	 Val. Loss: 5.147 |  Val. Corr: 0.32
Epoch: 01 | Epoch Time: 0m 58s
	 Train Loss: 5.109 | Train Corr: 0.04
	 Val. Loss: 4.877 |  Val. Corr: 0.38
training on fold 2
updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 57s
	 Train Loss: 47.193 | Train Corr: -0.00
	 Val. Loss: 4.170 |  Val. Corr: 0.51
Epoch: 01 | Epoch Time: 0m 58s
	 Train Loss: 4.902 | Train Corr: 0.15
	 Val. Loss: 4.359 |  Val. Corr: 0.46
Epoch: 02 | Epoch Time: 0m 58s
	 Train Loss: 4.009 | Train Corr: 0.43
	 Val. Loss: 4.711 |  Val. Corr: 0.30
updating saved weights of best model
training on fold 3
Epoch: 00 | Epoch Time: 0m 56s
	 Train Loss: 52.079 | Train Corr: -0.00
	 Val. Loss: 4.817 |  Val. Corr: 0.57
Epoch: 01 | Epoch Time: 0m 56s
	 Train Loss: 5.689 | Train Corr: -0.02
	 Val. Loss: 4.796 |  Val. Corr: 0.55
Epoch: 02 | Epoch Time: 0m 57s
	 Train Loss: 4.516 | Train Corr: 0.25
	 Val. Loss: 4.603 |  Val. Corr: 0.30
Epoch: 03 | Epoch Time: 0m 56s
	 Train Loss: 3.977 | Train Corr: 0.39
	 Val. Loss: 4.498 |  Val. Corr: 0.48
Epoch: 04 | Epoch Time: 0m 56s
	 Train Loss: 3.452 | Train Corr: 0.51
	 Val. Loss: 4.353 |  Val. Corr: 0.52
training on fold 4
Epoch: 00 | Epoch Time: 0m 54s
	 Train Loss: 42.469 | Train Corr: -0.03
	 Val. Loss: 5.099 |  Val. Corr: 0.41
Epoch: 01 | Epoch Time: 0m 54s
	 Train Loss: 4.361 | Train Corr: 0.29
	 Val. Loss: 4.482 |  Val. Corr: 0.40
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 53s
	 Train Loss: 3.375 | Train Corr: 0.53
	 Val. Loss: 3.945 |  Val. Corr: 0.52
updating saved weights of best model
eid 6, params {'batch_size': 8, 'bidirectional': False, 'dropout': 0.2, 'hidden_dim': 512, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 1}
training on fold 0
updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 57s
	 Train Loss: 17.907 | Train Corr: 0.09
	 Val. Loss: 4.474 |  Val. Corr: 0.35
Epoch: 01 | Epoch Time: 0m 57s
	 Train Loss: 3.357 | Train Corr: 0.53
	 Val. Loss: 4.783 |  Val. Corr: 0.50
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 57s
	 Train Loss: 2.954 | Train Corr: 0.61
	 Val. Loss: 3.349 |  Val. Corr: 0.56
updating saved weights of best model
Epoch: 03 | Epoch Time: 0m 56s
	 Train Loss: 2.032 | Train Corr: 0.75
	 Val. Loss: 3.167 |  Val. Corr: 0.55
updating saved weights of best model
Epoch: 04 | Epoch Time: 0m 56s
	 Train Loss: 1.402 | Train Corr: 0.84
	 Val. Loss: 3.091 |  Val. Corr: 0.60
updating saved weights of best model
Epoch: 05 | Epoch Time: 0m 57s
	 Train Loss: 1.119 | Train Corr: 0.87
	 Val. Loss: 2.756 |  Val. Corr: 0.61
training on fold 1
Epoch: 00 | Epoch Time: 0m 57s
	 Train Loss: 17.528 | Train Corr: 0.12
	 Val. Loss: 4.758 |  Val. Corr: 0.48
Epoch: 01 | Epoch Time: 0m 57s
	 Train Loss: 3.298 | Train Corr: 0.53
	 Val. Loss: 5.925 |  Val. Corr: 0.65
Epoch: 02 | Epoch Time: 0m 57s
	 Train Loss: 2.568 | Train Corr: 0.66
	 Val. Loss: 5.369 |  Val. Corr: 0.67
Epoch: 03 | Epoch Time: 0m 57s
	 Train Loss: 1.882 | Train Corr: 0.77
	 Val. Loss: 4.099 |  Val. Corr: 0.69
Epoch: 04 | Epoch Time: 0m 57s
	 Train Loss: 1.262 | Train Corr: 0.85
	 Val. Loss: 3.001 |  Val. Corr: 0.70
updating saved weights of best model
Epoch: 05 | Epoch Time: 0m 58s
	 Train Loss: 0.995 | Train Corr: 0.88
	 Val. Loss: 2.533 |  Val. Corr: 0.71
Epoch: 06 | Epoch Time: 0m 58s
	 Train Loss: 0.846 | Train Corr: 0.90
	 Val. Loss: 3.708 |  Val. Corr: 0.68
Epoch: 07 | Epoch Time: 0m 56s
	 Train Loss: 0.594 | Train Corr: 0.93
	 Val. Loss: 2.780 |  Val. Corr: 0.70
Epoch: 08 | Epoch Time: 0m 57s
	 Train Loss: 0.666 | Train Corr: 0.93
	 Val. Loss: 2.568 |  Val. C

Epoch: 00 | Epoch Time: 0m 56s
	 Train Loss: 17.492 | Train Corr: 0.10
	 Val. Loss: 3.925 |  Val. Corr: 0.52
Epoch: 01 | Epoch Time: 0m 57s
	 Train Loss: 3.762 | Train Corr: 0.47
	 Val. Loss: 3.891 |  Val. Corr: 0.61
Epoch: 02 | Epoch Time: 0m 57s
	 Train Loss: 2.862 | Train Corr: 0.63
	 Val. Loss: 3.655 |  Val. Corr: 0.72
updating saved weights of best model
Epoch: 03 | Epoch Time: 0m 57s
	 Train Loss: 2.339 | Train Corr: 0.72
	 Val. Loss: 1.623 |  Val. Corr: 0.79
Epoch: 04 | Epoch Time: 0m 56s
	 Train Loss: 1.655 | Train Corr: 0.81
	 Val. Loss: 1.910 |  Val. Corr: 0.76
Epoch: 05 | Epoch Time: 0m 57s
	 Train Loss: 1.133 | Train Corr: 0.87
	 Val. Loss: 1.657 |  Val. Corr: 0.77
Epoch: 06 | Epoch Time: 0m 57s
	 Train Loss: 0.914 | Train Corr: 0.90
	 Val. Loss: 2.822 |  Val. Corr: 0.77
Epoch: 07 | Epoch Time: 0m 56s
	 Train Loss: 0.748 | Train Corr: 0.92
	 Val. Loss: 1.922 |  Val. Corr: 0.78
training on fold 3
Epoch: 00 | Epoch Time: 0m 55s
	 Train Loss: 17.634 | Train Corr: 0.10
	 Val. Loss: 4.480 |  Val. Corr: 0.45
Epoch: 01 | Epoch Time: 0m 55s
	 Train Loss: 3.423 | Train Corr: 0.51
	 Val. Loss: 3.920 |  Val. Corr: 0.56
Epoch: 02 | Epoch Time: 0m 56s
	 Train Loss: 3.020 | Train Corr: 0.59
	 Val. Loss: 3.395 |  Val. Corr: 0.69
Epoch: 03 | Epoch Time: 0m 55s
	 Train Loss: 2.157 | Train Corr: 0.73
	 Val. Loss: 2.579 |  Val. Corr: 0.72
Epoch: 04 | Epoch Time: 0m 55s
	 Train Loss: 1.531 | Train Corr: 0.82
	 Val. Loss: 3.275 |  Val. Corr: 0.73
Epoch: 05 | Epoch Time: 0m 56s
	 Train Loss: 1.165 | Train Corr: 0.86
	 Val. Loss: 2.524 |  Val. Corr: 0.74
Epoch: 06 | Epoch Time: 0m 56s
	 Train Loss: 0.917 | Train Corr: 0.89
	 Val. Loss: 2.260 |  Val. Corr: 0.74
Epoch: 07 | Epoch Time: 0m 56s
	 Train Loss: 0.735 | Train Corr: 0.92
	 Val. Loss: 2.850 |  Val. Corr: 0.76
training on fold 4
Epoch: 00 | Epoch Time: 0m 53s
	 Train Loss: 17.982 | Train Corr: 0.09
	 Val. Loss: 4.048 |  Val. Corr: 0.54
Epoch: 01 | Epoch Time: 0m 53s
	 Train Loss: 3.224 | Train Corr: 0.55
	 Val. Loss: 3.525 |  Val. Corr: 0.59
Epoch: 02 | Epoch Time: 0m 53s
	 Train Loss: 2.540 | Train Corr: 0.67
	 Val. Loss: 2.761 |  Val. Corr: 0.72
Epoch: 03 | Epoch Time: 0m 53s
	 Train Loss: 1.785 | Train Corr: 0.78
	 Val. Loss: 3.347 |  Val. Corr: 0.68
Epoch: 04 | Epoch Time: 0m 52s
	 Train Loss: 1.176 | Train Corr: 0.86
	 Val. Loss: 2.385 |  Val. Corr: 0.72
Epoch: 05 | Epoch Time: 0m 53s
	 Train Loss: 1.023 | Train Corr: 0.88
	 Val. Loss: 2.344 |  Val. Corr: 0.73
Epoch: 06 | Epoch Time: 0m 52s
	 Train Loss: 0.764 | Train Corr: 0.91
	 Val. Loss: 2.451 |  Val. Corr: 0.70
Epoch: 07 | Epoch Time: 0m 54s
	 Train Loss: 0.646 | Train Corr: 0.93
	 Val. Loss: 2.266 |  Val. Corr: 0.73
eid 7, params {'batch_size': 8, 'bidirectional': False, 'dropout': 0.2, 'hidden_dim': 512, 'lr': 5e-05, 'max_epochs': 10, 'num_layers':

updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 1s
	 Train Loss: 25.200 | Train Corr: 0.05
	 Val. Loss: 4.413 |  Val. Corr: 0.49
updating saved weights of best model
Epoch: 01 | Epoch Time: 1m 1s
	 Train Loss: 4.234 | Train Corr: 0.32
	 Val. Loss: 4.324 |  Val. Corr: 0.38
Epoch: 02 | Epoch Time: 1m 1s
	 Train Loss: 3.416 | Train Corr: 0.53
	 Val. Loss: 4.996 |  Val. Corr: 0.41
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 1s
	 Train Loss: 2.680 | Train Corr: 0.65
	 Val. Loss: 3.159 |  Val. Corr: 0.54
Epoch: 04 | Epoch Time: 1m 1s
	 Train Loss: 1.910 | Train Corr: 0.77
	 Val. Loss: 3.468 |  Val. Corr: 0.58
updating saved weights of best model
Epoch: 05 | Epoch Time: 1m 2s
	 Train Loss: 1.424 | Train Corr: 0.83
	 Val. Loss: 2.980 |  Val. Corr: 0.64
training on fold 1
Epoch: 00 | Epoch Time: 1m 2s
	 Train Loss: 24.681 | Train Corr: 0.06
	 Val. Loss: 4.696 |  Val. Corr: 0.52
Epoch: 01 | Epoch Time: 1m 2s
	 Train Loss: 3.699 | Train Corr: 0.44
	 Val. Loss: 5.774 |  Val. Corr: 0.63
Epoch: 02 | Epoch Time: 1m 2s
	 Train Loss: 3.016 | Train Corr: 0.58
	 Val. Loss: 4.314 |  Val. Corr: 0.70
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 2s
	 Train Loss: 2.397 | Train Corr: 0.69
	 Val. Loss: 2.918 |  Val. Corr: 0.71
updating saved weights of best model
Epoch: 04 | Epoch Time: 1m 2s
	 Train Loss: 1.722 | Train Corr: 0.79
	 Val. Loss: 2.748 |  Val. Corr: 0.71
updating saved weights of best model
Epoch: 05 | Epoch Time: 1m 3s
	 Train Loss: 1.286 | Train Corr: 0.85
	 Val. Loss: 2.524 |  Val. Corr: 0.69
Epoch: 06 | Epoch Time: 1m 2s
	 Train Loss: 1.128 | Train Corr: 0.87
	 Val. Loss: 2.766 |  Val. Corr: 0.69
Epoch: 07 | Epoch Time: 1m 1s
	 Train Loss: 0.900 | Train Corr: 0.90
	 Val. Loss: 3.214 |  Val. Corr: 0.70
Epoch: 08 | Epoch Time: 1m 1s
	 

Epoch: 00 | Epoch Time: 1m 1s
	 Train Loss: 25.054 | Train Corr: 0.04
	 Val. Loss: 4.084 |  Val. Corr: 0.44
Epoch: 01 | Epoch Time: 1m 1s
	 Train Loss: 4.471 | Train Corr: 0.28
	 Val. Loss: 4.309 |  Val. Corr: 0.56
Epoch: 02 | Epoch Time: 1m 2s
	 Train Loss: 3.619 | Train Corr: 0.49
	 Val. Loss: 5.085 |  Val. Corr: 0.58
training on fold 3
Epoch: 00 | Epoch Time: 1m 0s
	 Train Loss: 22.906 | Train Corr: 0.05
	 Val. Loss: 4.990 |  Val. Corr: 0.40
Epoch: 01 | Epoch Time: 0m 59s
	 Train Loss: 3.704 | Train Corr: 0.45
	 Val. Loss: 4.159 |  Val. Corr: 0.53
training on fold 4
Epoch: 00 | Epoch Time: 0m 58s
	 Train Loss: 23.911 | Train Corr: 0.02
	 Val. Loss: 5.160 |  Val. Corr: 0.48
Epoch: 01 | Epoch Time: 0m 58s
	 Train Loss: 3.811 | Train Corr: 0.41
	 Val. Loss: 4.509 |  Val. Corr: 0.41
eid 8, params {'batch_size': 8, 'bidirectional': False, 'dropout': 0.2, 'hidden_dim': 512, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 3}
training on fold 0
updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 6s
	 Train Loss: 30.738 | Train Corr: 0.04
	 Val. Loss: 4.458 |  Val. Corr: 0.29
Epoch: 01 | Epoch Time: 1m 6s
	 Train Loss: 4.962 | Train Corr: 0.09
	 Val. Loss: 4.620 |  Val. Corr: 0.40
training on fold 1
Epoch: 00 | Epoch Time: 1m 6s
	 Train Loss: 29.050 | Train Corr: 0.04
	 Val. Loss: 4.879 |  Val. Corr: 0.30
Epoch: 01 | Epoch Time: 1m 7s
	 Train Loss: 4.373 | Train Corr: 0.22
	 Val. Loss: 6.013 |  Val. Corr: 0.49
Epoch: 02 | Epoch Time: 1m 6s
	 Train Loss: 3.742 | Train Corr: 0.42
	 Val. Loss: 5.537 |  Val. Corr: 0.64
training on fold 2
updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 5s
	 Train Loss: 29.080 | Train Corr: 0.03
	 Val. Loss: 4.065 |  Val. Corr: 0.17
Epoch: 01 | Epoch Time: 1m 6s
	 Train Loss: 5.300 | Train Corr: -0.00
	 Val. Loss: 4.089 |  Val. Corr: 0.47
updating saved weights of best model
training on fold 3
Epoch: 00 | Epoch Time: 1m 5s
	 Train Loss: 28.669 | Train Corr: 0.03
	 Val. Loss: 4.771 |  Val. Corr: 0.37
updating saved weights of best model
Epoch: 01 | Epoch Time: 1m 4s
	 Train Loss: 4.225 | Train Corr: 0.29
	 Val. Loss: 3.819 |  Val. Corr: 0.63
Epoch: 02 | Epoch Time: 1m 5s
	 Train Loss: 3.591 | Train Corr: 0.47
	 Val. Loss: 3.827 |  Val. Corr: 0.57
training on fold 4
Epoch: 00 | Epoch Time: 1m 2s
	 Train Loss: 29.157 | Train Corr: 0.00
	 Val. Loss: 5.341 |  Val. Corr: 0.20
Epoch: 01 | Epoch Time: 1m 2s
	 Train Loss: 5.428 | Train Corr: -0.02
	 Val. Loss: 5.092 |  Val. Corr: 0.37
Epoch: 02 | Epoch Time: 1m 1s
	 Train Loss: 4.853 | Train Corr: 0.01
	 Val. Loss: 4.937 |  Val. Corr: 0.25
eid 9, params {'batch_size': 8, 'bidirectional': False, 'dropout': 0.2, 'hidden_dim': 768, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 1}
training on fold 0
updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 2s
	 Train Loss: 14.143 | Train Corr: 0.12
	 Val. Loss: 4.136 |  Val. Corr: 0.30
updating saved weights of best model
Epoch: 01 | Epoch Time: 1m 2s
	 Train Loss: 3.270 | Train Corr: 0.55
	 Val. Loss: 3.760 |  Val. Corr: 0.59
Epoch: 02 | Epoch Time: 1m 2s
	 Train Loss: 2.549 | Train Corr: 0.68
	 Val. Loss: 3.918 |  Val. Corr: 0.62
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 2s
	 Train Loss: 1.449 | Train Corr: 0.83
	 Val. Loss: 3.150 |  Val. Corr: 0.59
updating saved weights of best model
Epoch: 04 | Epoch Time: 1m 1s
	 Train Loss: 1.054 | Train Corr: 0.88
	 Val. Loss: 2.669 |  Val. Corr: 0.64
training on fold 1
Epoch: 00 | Epoch Time: 1m 2s
	 Train Loss: 15.465 | Train Corr: 0.13
	 Val. Loss: 6.054 |  Val. Corr: 0.46
Epoch: 01 | Epoch Time: 1m 2s
	 Train Loss: 3.358 | Train Corr: 0.52
	 Val. Loss: 5.788 |  Val. Corr: 0.64
Epoch: 02 | Epoch Time: 1m 2s
	 Train Loss: 2.607 | Train Corr: 0.65
	 Val. Loss: 4.344 |  Val. Corr: 0.69
Epoch: 03 | Epoch Time: 1m 3s
	 Train Loss: 1.745 | Train Corr: 0.78
	 Val. Loss: 3.577 |  Val. Corr: 0.63
Epoch: 04 | Epoch Time: 1m 3s
	 Train Loss: 1.106 | Train Corr: 0.87
	 Val. Loss: 3.199 |  Val. Corr: 0.65
Epoch: 05 | Epoch Time: 1m 3s
	 Train Loss: 0.849 | Train Corr: 0.90
	 Val. Loss: 2.842 |  Val. Corr: 0.70
training on fold 2
Epoch: 00 | Epoch Time: 1m 1s
	 Train Loss: 16.312 | Train Corr: 0.13
	 Val. Loss: 3.774 |  Val. Corr: 0.49
Epoch: 01 | Epoch Time: 1m 2s
	 Train Loss: 3.207 | Train Corr: 0.57
	 Val. Loss: 2.773 |  Val. Corr: 0.73
updating saved weights of best model
Epoch: 02 | Epoch Time: 1m 2s
	 Train Loss: 2.245 | Train Corr: 0.73
	 Val. Loss: 2.111 |  Val. Corr: 0.73
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 2s
	 Train Loss: 1.783 | Train Corr: 0.79
	 Val. Loss: 2.057 |  Val. Corr: 0.74
Epoch: 04 | Epoch Time: 1m 1s
	 Train Loss: 1.286 | Train Corr: 0.86
	 Val. Loss: 2.184 |  Val. Corr: 0.76
updating saved weights of best model
Epoch: 05 | Epoch Time: 1m 3s
	 Train Loss: 0.839 | Train Corr: 0.91
	 Val. Loss: 2.040 |  Val. Corr: 0.76
Epoch: 06 | Epoch Time: 1m 2s
	 Train Loss: 0.624 | Train Corr: 0.93
	 Val. Loss: 2.264 |  Val. Corr: 0.76
Epoch: 07 | Epoch Time: 1m 2s
	 Train Loss: 0.527 | Train Corr: 0.94
	 Val. Loss: 2.828 |  Val. Corr: 0.77
updating saved weights of best model

Epoch: 00 | Epoch Time: 1m 1s
	 Train Loss: 14.518 | Train Corr: 0.13
	 Val. Loss: 4.301 |  Val. Corr: 0.51
Epoch: 01 | Epoch Time: 1m 0s
	 Train Loss: 3.233 | Train Corr: 0.54
	 Val. Loss: 3.553 |  Val. Corr: 0.63
Epoch: 02 | Epoch Time: 1m 1s
	 Train Loss: 2.595 | Train Corr: 0.66
	 Val. Loss: 3.655 |  Val. Corr: 0.72
Epoch: 03 | Epoch Time: 1m 0s
	 Train Loss: 1.821 | Train Corr: 0.78
	 Val. Loss: 2.355 |  Val. Corr: 0.75
Epoch: 04 | Epoch Time: 1m 0s
	 Train Loss: 1.355 | Train Corr: 0.84
	 Val. Loss: 2.291 |  Val. Corr: 0.77
Epoch: 05 | Epoch Time: 1m 1s
	 Train Loss: 0.937 | Train Corr: 0.89
	 Val. Loss: 2.160 |  Val. Corr: 0.77
Epoch: 06 | Epoch Time: 1m 1s
	 Train Loss: 0.654 | Train Corr: 0.93
	 Val. Loss: 1.975 |  Val. Corr: 0.77
Epoch: 07 | Epoch Time: 1m 1s
	 Train Loss: 0.537 | Train Corr: 0.94
	 Val. Loss: 1.973 |  Val. Corr: 0.78
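The `updating saved weights of best model` lines come from a keep-the-best checkpointing step in `training.py`. The log alone does not show which metric it compares, or whether "best" is tracked per fold or per hyperparameter configuration.

The sketch below is therefore a generic, hypothetical version. `BestCheckpoint` calls a caller-supplied `save_fn` whenever the monitored metric improves, for example `lambda: torch.save(model.state_dict(), path)`.

```python
from typing import Callable, Optional


class BestCheckpoint:
    """Hypothetical keep-the-best helper, not the notebook's training.py code."""

    def __init__(self, save_fn: Callable[[], None], higher_is_better: bool = True):
        self.save_fn = save_fn                  # e.g. writes model.state_dict() to disk
        self.higher_is_better = higher_is_better
        self.best: Optional[float] = None

    def update(self, metric: float) -> bool:
        """Call save_fn and return True if `metric` beats the best value so far."""
        if self.best is None:
            improved = True
        elif self.higher_is_better:
            improved = metric > self.best
        else:
            improved = metric < self.best
        if improved:
            self.best = metric
            self.save_fn()
            print("updating saved weights of best model")
        return improved


# Made-up validation losses; lower is better here.
saves = []
ckpt = BestCheckpoint(save_fn=lambda: saves.append(len(saves)), higher_is_better=False)
for val_loss in [3.0, 2.5, 2.6, 2.2]:
    ckpt.update(val_loss)
```

Whether one `BestCheckpoint` is created per fold or shared across all folds of a configuration decides which of the two meanings of "best" you get.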
training on fold 4
Epoch: 00 | Epoch Time: 0m 58s
	 Train Loss: 14.932 | Train Corr: 0.12
	 Val. Loss: 5.007 |  Val. Corr: 0.44
Epoch: 01 | Epoch Time: 0m 59s
	 Train Loss: 3.108 | Train Corr: 0.57
	 Val. Loss: 3.595 |  Val. Corr: 0.66
Epoch: 02 | Epoch Time: 0m 57s
	 Train Loss: 2.550 | Train Corr: 0.67
	 Val. Loss: 2.885 |  Val. Corr: 0.68
Epoch: 03 | Epoch Time: 0m 58s
	 Train Loss: 1.834 | Train Corr: 0.77
	 Val. Loss: 3.021 |  Val. Corr: 0.67
Epoch: 04 | Epoch Time: 0m 57s
	 Train Loss: 1.175 | Train Corr: 0.86
	 Val. Loss: 2.697 |  Val. Corr: 0.66
Epoch: 05 | Epoch Time: 0m 59s
	 Train Loss: 0.869 | Train Corr: 0.90
	 Val. Loss: 2.572 |  Val. Corr: 0.71
Epoch: 06 | Epoch Time: 0m 57s
	 Train Loss: 0.748 | Train Corr: 0.92
	 Val. Loss: 2.485 |  Val. Corr: 0.68
Epoch: 07 | Epoch Time: 0m 59s
	 Train Loss: 0.670 | Train Corr: 0.92
	 Val. Loss: 2.749 |  Val. Corr: 0.66
Epoch: 08 | Epoch Time: 0m 57s
	 Train Loss: 0.512 | Train Corr: 0.94
	 Val. Loss: 2.909 |  Val. Corr: 0.68
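The params dicts printed in this search set `'max_epochs': 10`, yet individual folds end after anywhere from 2 to 9 epochs. `training.py` presumably applies an early-stopping rule that the log does not spell out. A common choice is patience-based stopping on validation loss. The sketch below shows that technique as an assumption, not as the notebook's actual rule.

```python
from typing import Optional


class EarlyStopping:
    """Patience-based early stopping on a lower-is-better metric (illustrative only)."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0):
        self.patience = patience      # epochs without improvement tolerated
        self.min_delta = min_delta    # smallest drop that counts as improvement
        self.best: Optional[float] = None
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if self.best is None or val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, for illustration only.
stopper = EarlyStopping(patience=2)
for epoch, val_loss in enumerate([3.0, 2.5, 2.6, 2.7, 2.4]):
    if stopper.step(val_loss):
        print(f"stopping after epoch {epoch:02d}")  # prints "stopping after epoch 03"
        break
```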
eid 10, params {'batch_size

updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 12s
	 Train Loss: 19.910 | Train Corr: 0.07
	 Val. Loss: 4.351 |  Val. Corr: 0.35
Epoch: 01 | Epoch Time: 1m 13s
	 Train Loss: 3.629 | Train Corr: 0.47
	 Val. Loss: 4.701 |  Val. Corr: 0.45
training on fold 1
Epoch: 00 | Epoch Time: 1m 13s
	 Train Loss: 19.248 | Train Corr: 0.06
	 Val. Loss: 5.013 |  Val. Corr: 0.25
Epoch: 01 | Epoch Time: 1m 13s
	 Train Loss: 3.917 | Train Corr: 0.39
	 Val. Loss: 5.860 |  Val. Corr: 0.54
Epoch: 02 | Epoch Time: 1m 13s
	 Train Loss: 3.010 | Train Corr: 0.58
	 Val. Loss: 5.510 |  Val. Corr: 0.62
training on fold 2
Epoch: 00 | Epoch Time: 1m 12s
	 Train Loss: 18.521 | Train Corr: 0.10
	 Val. Loss: 4.420 |  Val. Corr: 0.46
Epoch: 01 | Epoch Time: 1m 13s
	 Train Loss: 3.596 | Train Corr: 0.49
	 Val. Loss: 4.615 |  Val. Corr: 0.47
training on fold 3
Epoch: 00 | Epoch Time: 1m 11s
	 Train Loss: 18.789 | Train Corr: 0.07
	 Val. Loss: 4.704 |  Val. Corr: 0.50
updating saved weights of best model
Epoch: 01 | Epoch Time: 1m 10s
	 Train Loss: 3.219 | Train Corr: 0.54
	 Val. Loss: 3.171 |  Val. Corr: 0.69
updating saved weights of best model
Epoch: 02 | Epoch Time: 1m 12s
	 Train Loss: 2.251 | Train Corr: 0.71
	 Val. Loss: 2.634 |  Val. Corr: 0.74
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 11s
	 Train Loss: 1.860 | Train Corr: 0.77
	 Val. Loss: 2.622 |  Val. Corr: 0.76
updating saved weights of best model
Epoch: 04 | Epoch Time: 1m 11s
	 Train Loss: 1.256 | Train Corr: 0.85
	 Val. Loss: 2.382 |  Val. Corr: 0.76
updating saved weights of best model
Epoch: 05 | Epoch Time: 1m 12s
	 Train Loss: 0.892 | Train Corr: 0.90
	 Val. Loss: 1.960 |  Val. Corr: 0.78
Epoch: 06 | Epoch Time: 1m 12s
	 Train Loss: 0.717 | Train Corr: 0.92
	 Val. Loss: 2.144 |  Val. Corr: 0.78
Epoch: 07 | Epoch Time: 1m 12s
	 Train Loss: 0.563 | Train

Epoch: 00 | Epoch Time: 1m 8s
	 Train Loss: 19.535 | Train Corr: 0.07
	 Val. Loss: 4.530 |  Val. Corr: 0.34
Epoch: 01 | Epoch Time: 1m 9s
	 Train Loss: 3.579 | Train Corr: 0.48
	 Val. Loss: 4.590 |  Val. Corr: 0.45
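Each `eid N, params {...}` header marks one hyperparameter configuration. Each configuration is then trained once per fold (`training on fold 0` through `training on fold 4`), i.e. 5-fold cross-validation over the 1000-row training split written to `ktrain.csv`.

The sketch below reproduces only that outer loop and prints headers in the same format as the log. It rests on three assumptions:

* `GRID` holds just the values visible in the printed params dicts; the real grid may contain more.
* `kfold_indices` is a plain contiguous split rather than whatever splitter the notebook uses.
* `train_one_fold` is a hypothetical placeholder for the per-fold training in `training.py`.

```python
import itertools
from typing import Dict, Iterator, List, Sequence, Tuple

# Values seen in the printed params dicts; the notebook's real grid may differ.
GRID = {
    "batch_size": [8],
    "bidirectional": [False, True],
    "dropout": [0.2],
    "hidden_dim": [128, 768],
    "lr": [5e-05],
    "max_epochs": [10],
    "num_layers": [1, 3],
}


def iter_configs(grid: Dict[str, list]) -> Iterator[dict]:
    """Yield every combination of grid values as a dict with sorted keys."""
    keys = sorted(grid)  # alphabetical, matching the key order in the log
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def kfold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Contiguous k-fold split: one (train_idx, val_idx) pair per fold."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, val))
        start += size
    return folds


def train_one_fold(params: dict, train_idx: Sequence[int], val_idx: Sequence[int]) -> None:
    """Hypothetical placeholder for the per-fold training in training.py."""


for eid, params in enumerate(iter_configs(GRID)):
    print(f"eid {eid}, params {params}")
    for fold, (train_idx, val_idx) in enumerate(kfold_indices(1000, 5)):
        print(f"training on fold {fold}")
        train_one_fold(params, train_idx, val_idx)
```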
eid 11, params {'batch_size': 8, 'bidirectional': False, 'dropout': 0.2, 'hidden_dim': 768, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 3}
training on fold 0
updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 22s
	 Train Loss: 22.350 | Train Corr: 0.03
	 Val. Loss: 4.549 |  Val. Corr: -0.25
Epoch: 01 | Epoch Time: 1m 22s
	 Train Loss: 5.053 | Train Corr: 0.02
	 Val. Loss: 4.565 |  Val. Corr: -0.24
updating saved weights of best model
training on fold 1
Epoch: 00 | Epoch Time: 1m 23s
	 Train Loss: 23.865 | Train Corr: 0.04
	 Val. Loss: 5.025 |  Val. Corr: 0.31
Epoch: 01 | Epoch Time: 1m 23s
	 Train Loss: 4.706 | Train Corr: 0.11
	 Val. Loss: 5.033 |  Val. Corr: 0.55
Epoch: 02 | Epoch Time: 1m 23s
	 Train Loss: 3.579 | Train Corr: 0.46
	 Val. Loss: 4.839 |  Val. Corr: 0.61
training on fold 2
updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 22s
	 Train Loss: 23.264 | Train Corr: 0.03
	 Val. Loss: 4.071 |  Val. Corr: 0.17
Epoch: 01 | Epoch Time: 1m 22s
	 Train Loss: 5.113 | Train Corr: -0.04
	 Val. Loss: 4.079 |  Val. Corr: 0.32
updating saved weights of best model
training on fold 3
Epoch: 00 | Epoch Time: 1m 20s
	 Train Loss: 22.430 | Train Corr: 0.04
	 Val. Loss: 4.777 |  Val. Corr: 0.36
Epoch: 01 | Epoch Time: 1m 20s
	 Train Loss: 4.798 | Train Corr: -0.01
	 Val. Loss: 4.844 |  Val. Corr: 0.23
training on fold 4
Epoch: 00 | Epoch Time: 1m 17s
	 Train Loss: 22.314 | Train Corr: 0.02
	 Val. Loss: 4.769 |  Val. Corr: 0.33
Epoch: 01 | Epoch Time: 1m 18s
	 Train Loss: 4.954 | Train Corr: -0.01
	 Val. Loss: 5.070 |  Val. Corr: 0.36
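The two configurations on either side of this point differ sharply:

* eid 11 (`hidden_dim` 768, 3 unidirectional layers) takes about 1m20s per epoch, and its train correlation stays near zero in four of its five folds.
* eid 12 (`hidden_dim` 128, 1 bidirectional layer) runs at under a minute per epoch.

One concrete difference between them is the size of the recurrent head on top of DistilBERT. The calculation below assumes that head is a PyTorch `nn.GRU`, as in the reference tutorial (pass `gates=4` for an `nn.LSTM`), fed DistilBERT's 768-dimensional hidden states.

```python
def rnn_param_count(input_size: int, hidden_size: int, num_layers: int,
                    bidirectional: bool, gates: int = 3) -> int:
    """Parameters of a PyTorch nn.GRU (gates=3) or nn.LSTM (gates=4) with default biases."""
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers after the first read the concatenated outputs of all directions.
        layer_input = input_size if layer == 0 else hidden_size * directions
        # weight_ih (gates*H x in) + weight_hh (gates*H x H) + bias_ih + bias_hh (gates*H each)
        per_direction = gates * hidden_size * (layer_input + hidden_size + 2)
        total += per_direction * directions
    return total


print(rnn_param_count(768, 768, 3, bidirectional=False))  # eid 11 head: 10630656
print(rnn_param_count(768, 128, 1, bidirectional=True))   # eid 12 head: 689664
```

Under these assumptions the eid 11 head is roughly 15 times larger. Both heads are still small next to DistilBERT's own ~66M parameters, so head size alone need not explain the timing gap.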
eid 12, params {'batch_size': 8, 'bidirectional': True, 'dropout': 0.2, 'hidden_dim': 128, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 1}
training on fold 0
updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 53s
	 Train Loss: 23.974 | Train Corr: 0.03
	 Val. Loss: 4.430 |  Val. Corr: 0.27
updating saved weights of best model
Epoch: 01 | Epoch Time: 0m 53s
	 Train Loss: 3.994 | Train Corr: 0.41
	 Val. Loss: 4.171 |  Val. Corr: 0.48
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 53s
	 Train Loss: 3.152 | Train Corr: 0.59
	 Val. Loss: 2.882 |  Val. Corr: 0.60
Epoch: 03 | Epoch Time: 0m 53s
	 Train Loss: 2.373 | Train Corr: 0.71
	 Val. Loss: 3.144 |  Val. Corr: 0.58
Epoch: 04 | Epoch Time: 0m 52s
	 Train Loss: 1.920 | Train Corr: 0.78
	 Val. Loss: 3.470 |  Val. Corr: 0.60
Epoch: 05 | Epoch Time: 0m 53s
	 Train Loss: 1.391 | Train Corr: 0.84
	 Val. Loss: 3.241 |  Val. Corr: 0.62
Epoch: 06 | Epoch Time: 0m 54s
	 Train Loss: 1.189 | Train Corr: 0.87
	 Val. Loss: 3.124 |  Val. Corr: 0.61
updating saved weights of best model
Epoch: 07 | Epoch Time: 0m 53s
	 Train Loss: 1.018 | Train Corr: 0.89
	 Val. Loss: 2.605 |  Val

Epoch: 00 | Epoch Time: 0m 53s
	 Train Loss: 17.903 | Train Corr: 0.07
	 Val. Loss: 6.510 |  Val. Corr: 0.53
Epoch: 01 | Epoch Time: 0m 53s
	 Train Loss: 3.955 | Train Corr: 0.41
	 Val. Loss: 6.501 |  Val. Corr: 0.62
Epoch: 02 | Epoch Time: 0m 53s
	 Train Loss: 3.558 | Train Corr: 0.49
	 Val. Loss: 7.055 |  Val. Corr: 0.59
Epoch: 03 | Epoch Time: 0m 54s
	 Train Loss: 3.170 | Train Corr: 0.57
	 Val. Loss: 4.165 |  Val. Corr: 0.58
Epoch: 04 | Epoch Time: 0m 54s
	 Train Loss: 2.604 | Train Corr: 0.66
	 Val. Loss: 3.506 |  Val. Corr: 0.67
Epoch: 05 | Epoch Time: 0m 54s
	 Train Loss: 2.232 | Train Corr: 0.72
	 Val. Loss: 2.813 |  Val. Corr: 0.70
Epoch: 06 | Epoch Time: 0m 54s
	 Train Loss: 1.687 | Train Corr: 0.80
	 Val. Loss: 4.015 |  Val. Corr: 0.66
updating saved weights of best model
Epoch: 07 | Epoch Time: 0m 53s
	 Train Loss: 1.440 | Train Corr: 0.83
	 Val. Loss: 2.590 |  Val. Corr: 0.70
Epoch: 08 | Epoch Time: 0m 53s
	 Train Loss: 1.159 | Train Corr: 0.87
	 Val. Loss: 2.790 |  Val. C

Epoch: 00 | Epoch Time: 0m 53s
	 Train Loss: 20.331 | Train Corr: 0.04
	 Val. Loss: 3.854 |  Val. Corr: 0.57
Epoch: 01 | Epoch Time: 0m 53s
	 Train Loss: 4.133 | Train Corr: 0.40
	 Val. Loss: 4.139 |  Val. Corr: 0.62
Epoch: 02 | Epoch Time: 0m 53s
	 Train Loss: 3.231 | Train Corr: 0.58
	 Val. Loss: 4.199 |  Val. Corr: 0.66
updating saved weights of best model
Epoch: 03 | Epoch Time: 0m 53s
	 Train Loss: 2.719 | Train Corr: 0.67
	 Val. Loss: 2.494 |  Val. Corr: 0.74
Epoch: 04 | Epoch Time: 0m 52s
	 Train Loss: 2.315 | Train Corr: 0.73
	 Val. Loss: 2.815 |  Val. Corr: 0.75
updating saved weights of best model
Epoch: 05 | Epoch Time: 0m 54s
	 Train Loss: 1.937 | Train Corr: 0.78
	 Val. Loss: 1.524 |  Val. Corr: 0.81
Epoch: 06 | Epoch Time: 0m 53s
	 Train Loss: 1.343 | Train Corr: 0.85
	 Val. Loss: 1.693 |  Val. Corr: 0.77
Epoch: 07 | Epoch Time: 0m 53s
	 Train Loss: 1.097 | Train Corr: 0.88
	 Val. Loss: 2.125 |  Val. Corr: 0.80
training on fold 3
Epoch: 00 | Epoch Time: 0m 52s
	 Train Loss: 20.784 | Train Corr: 0.05
	 Val. Loss: 3.932 |  Val. Corr: 0.55
Epoch: 01 | Epoch Time: 0m 51s
	 Train Loss: 3.626 | Train Corr: 0.48
	 Val. Loss: 2.956 |  Val. Corr: 0.69
Epoch: 02 | Epoch Time: 0m 52s
	 Train Loss: 2.876 | Train Corr: 0.62
	 Val. Loss: 2.494 |  Val. Corr: 0.74
Epoch: 03 | Epoch Time: 0m 52s
	 Train Loss: 2.222 | Train Corr: 0.72
	 Val. Loss: 2.237 |  Val. Corr: 0.75
Epoch: 04 | Epoch Time: 0m 52s
	 Train Loss: 1.622 | Train Corr: 0.81
	 Val. Loss: 2.086 |  Val. Corr: 0.77
Epoch: 05 | Epoch Time: 0m 52s
	 Train Loss: 1.403 | Train Corr: 0.84
	 Val. Loss: 2.035 |  Val. Corr: 0.76
Epoch: 06 | Epoch Time: 0m 53s
	 Train Loss: 1.232 | Train Corr: 0.86
	 Val. Loss: 3.243 |  Val. Corr: 0.75
Epoch: 07 | Epoch Time: 0m 52s
	 Train Loss: 1.041 | Train Corr: 0.88
	 Val. Loss: 2.568 |  Val. Corr: 0.76
training on fold 4
Epoch: 00 | Epoch Time: 0m 50s
	 Train Loss: 20.811 | Train Corr: 0.02
	 Val. Loss: 4.283 |  Val. Corr: 0.54
Epoch: 01 | Epoch Time: 0m 50s
	 Train Loss: 3.687 | Train Corr: 0.47
	 Val. Loss: 4.269 |  Val. Corr: 0.58
Epoch: 02 | Epoch Time: 0m 49s
	 Train Loss: 3.269 | Train Corr: 0.55
	 Val. Loss: 3.501 |  Val. Corr: 0.64
Epoch: 03 | Epoch Time: 0m 49s
	 Train Loss: 2.646 | Train Corr: 0.66
	 Val. Loss: 3.436 |  Val. Corr: 0.69
Epoch: 04 | Epoch Time: 0m 49s
	 Train Loss: 2.162 | Train Corr: 0.74
	 Val. Loss: 2.366 |  Val. Corr: 0.72
Epoch: 05 | Epoch Time: 0m 50s
	 Train Loss: 1.667 | Train Corr: 0.80
	 Val. Loss: 2.144 |  Val. Corr: 0.74
Epoch: 06 | Epoch Time: 0m 49s
	 Train Loss: 1.257 | Train Corr: 0.86
	 Val. Loss: 2.311 |  Val. Corr: 0.72
Epoch: 07 | Epoch Time: 0m 50s
	 Train Loss: 1.123 | Train Corr: 0.87
	 Val. Loss: 2.310 |  Val. Corr: 0.74
Epoch: 08 | Epoch Time: 0m 48s
	 Train Loss: 0.909 | Train Corr: 0.90
	 Val. Loss: 2.280 |  Val. Corr: 0.73
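With five folds per configuration and a different number of epochs in each fold, comparing configurations by eye is error-prone. Below is a minimal sketch for summarizing a saved copy of this output. For each `eid` it keeps the best `Val. Corr` reached in each fold, then averages those values across folds.

It relies only on the line formats shown in this log. Two kinds of lines are skipped:

* truncated lines, such as a cut-off `Val. Corr`, which fail to match;
* epochs that appear before any `training on fold` line of their configuration.

The fold count returned with each mean therefore shows how many folds actually contributed.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Optional, Tuple

EID_RE = re.compile(r"^eid (\d+), params")
FOLD_RE = re.compile(r"^training on fold (\d+)")
VAL_RE = re.compile(r"Val\. Loss: ([\d.]+) \|\s+Val\. Corr: (-?[\d.]+)")


def best_val_corr(log_text: str) -> Dict[int, Dict[int, float]]:
    """Map eid -> {fold: highest Val. Corr logged in that fold}."""
    best: Dict[int, Dict[int, float]] = defaultdict(dict)
    eid: Optional[int] = None
    fold: Optional[int] = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if m := EID_RE.match(line):
            eid, fold = int(m.group(1)), None  # new configuration, fold not known yet
        elif m := FOLD_RE.match(line):
            fold = int(m.group(1))
        elif (m := VAL_RE.search(line)) and eid is not None and fold is not None:
            corr = float(m.group(2))
            best[eid][fold] = max(corr, best[eid].get(fold, corr))
    return dict(best)


def summarize(best: Dict[int, Dict[int, float]]) -> Dict[int, Tuple[float, int]]:
    """eid -> (mean of per-fold best Val. Corr, number of folds that contributed)."""
    return {eid: (mean(folds.values()), len(folds)) for eid, folds in best.items()}
```

Call it as `summarize(best_val_corr(text))`, where `text` holds this cell's output. Taking each fold's maximum is optimistic, because the epoch is chosen on the same validation fold it is scored on. Using each fold's final-epoch value is a less biased alternative.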
eid 13, params {'batch_size

updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 59s
	 Train Loss: 32.316 | Train Corr: 0.02
	 Val. Loss: 4.603 |  Val. Corr: 0.33
Epoch: 01 | Epoch Time: 0m 59s
	 Train Loss: 4.066 | Train Corr: 0.39
	 Val. Loss: 4.741 |  Val. Corr: 0.29
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 59s
	 Train Loss: 3.273 | Train Corr: 0.55
	 Val. Loss: 4.369 |  Val. Corr: 0.36
training on fold 1
updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 0s
	 Train Loss: 37.585 | Train Corr: 0.00
	 Val. Loss: 4.214 |  Val. Corr: 0.47
Epoch: 01 | Epoch Time: 1m 0s
	 Train Loss: 4.579 | Train Corr: 0.25
	 Val. Loss: 6.098 |  Val. Corr: 0.36
Epoch: 02 | Epoch Time: 0m 59s
	 Train Loss: 3.396 | Train Corr: 0.52
	 Val. Loss: 7.014 |  Val. Corr: 0.44
training on fold 2
updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 59s
	 Train Loss: 45.469 | Train Corr: -0.00
	 Val. Loss: 3.678 |  Val. Corr: 0.54
Epoch: 01 | Epoch Time: 0m 59s
	 Train Loss: 4.616 | Train Corr: 0.29
	 Val. Loss: 4.158 |  Val. Corr: 0.51
training on fold 3
Epoch: 00 | Epoch Time: 0m 58s
	 Train Loss: 34.382 | Train Corr: 0.00
	 Val. Loss: 4.433 |  Val. Corr: 0.49
Epoch: 01 | Epoch Time: 0m 57s
	 Train Loss: 3.952 | Train Corr: 0.41
	 Val. Loss: 4.021 |  Val. Corr: 0.58
updating saved weights of best model
training on fold 4
Epoch: 00 | Epoch Time: 0m 56s
	 Train Loss: 39.885 | Train Corr: -0.03
	 Val. Loss: 4.655 |  Val. Corr: 0.43
Epoch: 01 | Epoch Time: 0m 56s
	 Train Loss: 4.430 | Train Corr: 0.29
	 Val. Loss: 4.097 |  Val. Corr: 0.39
Epoch: 02 | Epoch Time: 0m 55s
	 Train Loss: 3.576 | Train Corr: 0.49
	 Val. Loss: 3.506 |  Val. Corr: 0.55
eid 14, params {'batch_size': 8, 'bidirectional': True, 'dropout': 0.2, 'hidden_dim': 128, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 3}
training on fold 0
updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 6s
	 Train Loss: 51.566 | Train Corr: 0.02
	 Val. Loss: 3.969 |  Val. Corr: 0.46
updating saved weights of best model
Epoch: 01 | Epoch Time: 1m 6s
	 Train Loss: 3.922 | Train Corr: 0.42
	 Val. Loss: 3.698 |  Val. Corr: 0.54
updating saved weights of best model
training on fold 1
Epoch: 00 | Epoch Time: 1m 6s
	 Train Loss: 53.082 | Train Corr: -0.01
	 Val. Loss: 4.598 |  Val. Corr: 0.36
Epoch: 01 | Epoch Time: 1m 6s
	 Train Loss: 4.780 | Train Corr: 0.16
	 Val. Loss: 5.088 |  Val. Corr: 0.35
Epoch: 02 | Epoch Time: 1m 6s
	 Train Loss: 3.860 | Train Corr: 0.43
	 Val. Loss: 3.669 |  Val. Corr: 0.53
training on fold 2
updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 5s
	 Train Loss: 48.656 | Train Corr: -0.00
	 Val. Loss: 3.408 |  Val. Corr: 0.59
Epoch: 01 | Epoch Time: 1m 6s
	 Train Loss: 4.611 | Train Corr: 0.28
	 Val. Loss: 3.527 |  Val. Corr: 0.44
Epoch: 02 | Epoch Time: 1m 6s
	 Train Loss: 3.880 | Train Corr: 0.45
	 Val. Loss: 3.962 |  Val. Corr: 0.38
training on fold 3
Epoch: 00 | Epoch Time: 1m 4s
	 Train Loss: 47.765 | Train Corr: -0.00
	 Val. Loss: 4.720 |  Val. Corr: 0.42
Epoch: 01 | Epoch Time: 1m 3s
	 Train Loss: 4.244 | Train Corr: 0.34
	 Val. Loss: 4.449 |  Val. Corr: 0.47
Epoch: 02 | Epoch Time: 1m 5s
	 Train Loss: 3.390 | Train Corr: 0.52
	 Val. Loss: 4.454 |  Val. Corr: 0.51
training on fold 4
Epoch: 00 | Epoch Time: 1m 1s
	 Train Loss: 49.727 | Train Corr: -0.04
	 Val. Loss: 4.535 |  Val. Corr: 0.44
Epoch: 01 | Epoch Time: 1m 2s
	 Train Loss: 4.172 | Train Corr: 0.33
	 Val. Loss: 4.330 |  Val. Corr: 0.54
Epoch: 02 | Epoch Time: 1m 0s
	 Train Loss: 3.242 | Train Corr: 0.55
	 Val. Loss: 3.754 |  Val. Corr: 0.54
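Each `eid N, params {...}` line marks one hyperparameter configuration. The keys print in alphabetical order. Between eid 15 and eid 18, `num_layers` steps through 1, 2, 3 before `hidden_dim` moves from 256 to 512. That is the order produced by a Cartesian product over alphabetically sorted keys, which is also the convention scikit-learn's `ParameterGrid` follows.

The sketch below assumes the search is enumerated that way. The `grid` values are illustrative: only the values visible in this log are known.

```python
from itertools import product


def iter_grid(grid):
    """Yield one params dict per combination, keys sorted alphabetically.

    The last sorted key (here num_layers) varies fastest, which matches the
    eid ordering seen in the log.
    """
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


# Hypothetical grid: only the hidden_dim / num_layers values seen in the log vary.
grid = {
    "batch_size": [8],
    "bidirectional": [True],
    "dropout": [0.2],
    "hidden_dim": [256, 512],
    "lr": [5e-05],
    "max_epochs": [10],
    "num_layers": [1, 2, 3],
}

configs = list(iter_grid(grid))
for eid, params in enumerate(configs):
    print(f"eid {eid}, params {params}")
```

This toy grid numbers its configurations from 0. The notebook's real grid is larger, so its eids are offset.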
eid 15, params {'batch_size': 8, 'bidirectional': True, 'dropout': 0.2, 'hidden_dim': 256, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 1}
training on fold 0
updating saved weights of best model
Epoch: 00 | Epoch Time: 0m 56s
	 Train Loss: 16.221 | Train Corr: 0.14
	 Val. Loss: 4.000 |  Val. Corr: 0.45
updating saved weights of best model
Epoch: 01 | Epoch Time: 0m 57s
	 Train Loss: 3.173 | Train Corr: 0.57
	 Val. Loss: 3.637 |  Val. Corr: 0.59
updating saved weights of best model
Epoch: 02 | Epoch Time: 0m 56s
	 Train Loss: 2.594 | Train Corr: 0.67
	 Val. Loss: 3.274 |  Val. Corr: 0.62
updating saved weights of best model
Epoch: 03 | Epoch Time: 0m 56s
	 Train Loss: 1.680 | Train Corr: 0.80
	 Val. Loss: 3.177 |  Val. Corr: 0.57
Epoch: 04 | Epoch Time: 0m 56s
	 Train Loss: 1.313 | Train Corr: 0.85
	 Val. Loss: 3.270 |  Val. Corr: 0.63
updating saved weights of best model
Epoch: 05 | Epoch Time: 0m 57s
	 Train Loss: 1.017 | Train Corr: 0.89
	 Val. Loss: 2.803 |  Val. Corr: 0.62
training on fold 1
Epoch: 00 | Epoch Time: 0m 57s
	 Train Loss: 15.414 | Train Corr: 0.12
	 Val. Loss: 4.921 |  Val. Corr: 0.52
Epoch: 01 | Epoch Time: 0m 57s
	 Train Loss: 3.563 | Train Corr: 0.48
	 Val. Loss: 7.408 |  Val. Corr: 0.52
Epoch: 02 | Epoch Time: 0m 57s
	 Train Loss: 2.922 | Train Corr: 0.60
	 Val. Loss: 4.647 |  Val. Corr: 0.70
Epoch: 03 | Epoch Time: 0m 57s
	 Train Loss: 2.130 | Train Corr: 0.73
	 Val. Loss: 2.969 |  Val. Corr: 0.64
Epoch: 04 | Epoch Time: 0m 57s
	 Train Loss: 1.401 | Train Corr: 0.83
	 Val. Loss: 3.489 |  Val. Corr: 0.67
Epoch: 05 | Epoch Time: 0m 58s
	 Train Loss: 1.040 | Train Corr: 0.88
	 Val. Loss: 3.099 |  Val. Corr: 0.62
Epoch: 06 | Epoch Time: 0m 57s
	 Train Loss: 0.753 | Train Corr: 0.91
	 Val. Loss: 3.239 |  Val. Corr: 0.64
Epoch: 07 | Epoch Time: 0m 56s
	 Train Loss: 0.635 | Train Corr: 0.93
	 Val. Loss: 2.958 |  Val. Corr: 0.70
updating saved weights of best model
training on fold 2
Epoch: 00 | Epoch Time: 0m 56s
	 Train Loss: 15.680 | Train Corr: 0.13
	 Val. Loss: 4.495 |  Val. Corr: 0.47
Epoch: 01 | Epoch Time: 0m 57s
	 Train Loss: 3.521 | Train Corr: 0.52
	 Val. Loss: 3.765 |  Val. Corr: 0.63
training on fold 3
Epoch: 00 | Epoch Time: 0m 55s
	 Train Loss: 15.294 | Train Corr: 0.13
	 Val. Loss: 5.208 |  Val. Corr: 0.38
Epoch: 01 | Epoch Time: 0m 54s
	 Train Loss: 3.203 | Train Corr: 0.55
	 Val. Loss: 4.159 |  Val. Corr: 0.51
Epoch: 02 | Epoch Time: 0m 56s
	 Train Loss: 2.745 | Train Corr: 0.64
	 Val. Loss: 3.838 |  Val. Corr: 0.69
Epoch: 03 | Epoch Time: 0m 55s
	 Train Loss: 1.873 | Train Corr: 0.77
	 Val. Loss: 2.802 |  Val. Corr: 0.70
updating saved weights of best model
Epoch: 04 | Epoch Time: 0m 55s
	 Train Loss: 1.344 | Train Corr: 0.84
	 Val. Loss: 2.712 |  Val. Corr: 0.72
updating saved weights of best model
Epoch: 05 | Epoch Time: 0m 56s
	 Train Loss: 1.011 | Train Corr: 0.88
	 Val. Loss: 2.590 |  Val. Corr: 0.72
Epoch: 06 | Epoch Time: 0m 56s
	 Train Loss: 0.779 | Train Corr: 0.91
	 Val. Loss: 4.585 |  Val. Corr: 0.74
training on fold 4
Epoch: 00 | Epoch Time: 0m 53s
	 Train Loss: 16.203 | Train Corr: 0.09
	 Val. Loss: 4.903 |  Val. Corr: 0.45
Epoch: 01 | Epoch Time: 0m 53s
	 Train Loss: 3.379 | Train Corr: 0.52
	 Val. Loss: 4.307 |  Val. Corr: 0.56
Epoch: 02 | Epoch Time: 0m 52s
	 Train Loss: 2.812 | Train Corr: 0.62
	 Val. Loss: 3.274 |  Val. Corr: 0.63
Epoch: 03 | Epoch Time: 0m 52s
	 Train Loss: 2.090 | Train Corr: 0.74
	 Val. Loss: 4.776 |  Val. Corr: 0.61
Epoch: 04 | Epoch Time: 0m 52s
	 Train Loss: 1.702 | Train Corr: 0.80
	 Val. Loss: 3.504 |  Val. Corr: 0.55
Epoch: 05 | Epoch Time: 0m 53s
	 Train Loss: 1.376 | Train Corr: 0.84
	 Val. Loss: 3.025 |  Val. Corr: 0.60
Epoch: 06 | Epoch Time: 0m 52s
	 Train Loss: 0.987 | Train Corr: 0.89
	 Val. Loss: 2.875 |  Val. Corr: 0.62
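The `updating saved weights of best model` lines track validation loss, not correlation. In fold 0 of eid 15, the update printed before epoch 03 coincides with Val. Loss improving from 3.274 to 3.177, even though Val. Corr drops from 0.62 to 0.57. Epoch 04 (3.270) triggers no update.

The best loss also appears to carry across folds within one eid. Fold 1 never goes below fold 0's 2.803 in epochs 00 to 07, and no update is printed alongside those epochs.

The sketch below reproduces that rule. It assumes the notebook mirrors the reference implementation's `if valid_loss < best_valid_loss: torch.save(...)` pattern. `save_fn` is a hypothetical stand-in for `torch.save(model.state_dict(), path)`.

```python
import math


def track_best(fold_val_losses, save_fn):
    """Call save_fn(fold, epoch, loss) whenever validation loss improves.

    best_loss is initialised once per experiment (eid) and shared by all folds.
    """
    best_loss = math.inf
    for fold, val_losses in enumerate(fold_val_losses):
        for epoch, val_loss in enumerate(val_losses):
            if val_loss < best_loss:
                best_loss = val_loss
                # In the notebook this would be torch.save(model.state_dict(), path).
                save_fn(fold, epoch, val_loss)
    return best_loss


# Val. Loss values for eid 15, fold 0 (epochs 00-05) and fold 1 (epochs 00-07), from the log.
eid15 = [
    [4.000, 3.637, 3.274, 3.177, 3.270, 2.803],
    [4.921, 7.408, 4.647, 2.969, 3.489, 3.099, 3.239, 2.958],
]
saves = []
best = track_best(eid15, lambda fold, epoch, loss: saves.append((fold, epoch)))
print(saves)  # [(0, 0), (0, 1), (0, 2), (0, 3), (0, 5)]
print(best)   # 2.803
```

The saves it reports match the update lines printed for those epochs above.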
eid 16, params {'batch_size': 8, 'bidirectional': True, 'dropout': 0.2, 'hidden_dim': 256, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 2}
training on fold 0
updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 5s
	 Train Loss: 25.267 | Train Corr: 0.07
	 Val. Loss: 4.903 |  Val. Corr: 0.20
Epoch: 01 | Epoch Time: 1m 6s
	 Train Loss: 3.807 | Train Corr: 0.46
	 Val. Loss: 5.182 |  Val. Corr: 0.20
updating saved weights of best model
Epoch: 02 | Epoch Time: 1m 6s
	 Train Loss: 2.768 | Train Corr: 0.64
	 Val. Loss: 4.707 |  Val. Corr: 0.46
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 6s
	 Train Loss: 2.178 | Train Corr: 0.73
	 Val. Loss: 3.131 |  Val. Corr: 0.56
Epoch: 04 | Epoch Time: 1m 5s
	 Train Loss: 1.532 | Train Corr: 0.82
	 Val. Loss: 3.244 |  Val. Corr: 0.57
Epoch: 05 | Epoch Time: 1m 6s
	 Train Loss: 1.294 | Train Corr: 0.85
	 Val. Loss: 3.227 |  Val. Corr: 0.56
updating saved weights of best model
training on fold 1
Epoch: 00 | Epoch Time: 1m 6s
	 Train Loss: 24.454 | Train Corr: 0.08
	 Val. Loss: 4.753 |  Val. Corr: 0.55
Epoch: 01 | Epoch Time: 1m 6s
	 Train Loss: 3.667 | Train Corr: 0.46
	 Val. Loss: 4.267 |  Val. Corr: 0.60
Epoch: 02 | Epoch Time: 1m 6s
	 Train Loss: 3.083 | Train Corr: 0.57
	 Val. Loss: 5.664 |  Val. Corr: 0.63
Epoch: 03 | Epoch Time: 1m 7s
	 Train Loss: 2.645 | Train Corr: 0.65
	 Val. Loss: 3.155 |  Val. Corr: 0.65
training on fold 2
Epoch: 00 | Epoch Time: 1m 5s
	 Train Loss: 22.700 | Train Corr: 0.09
	 Val. Loss: 4.486 |  Val. Corr: 0.57
Epoch: 01 | Epoch Time: 1m 6s
	 Train Loss: 3.567 | Train Corr: 0.51
	 Val. Loss: 3.745 |  Val. Corr: 0.64
updating saved weights of best model
Epoch: 02 | Epoch Time: 1m 6s
	 Train Loss: 3.114 | Train Corr: 0.59
	 Val. Loss: 2.804 |  Val. Corr: 0.80
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 6s
	 Train Loss: 2.388 | Train Corr: 0.71
	 Val. Loss: 2.510 |  Val. Corr: 0.78
updating saved weights of best model
Epoch: 04 | Epoch Time: 1m 5s
	 Train Loss: 1.603 | Train Corr: 0.82
	 Val. Loss: 2.237 |  Val. Corr: 0.81
updating saved weights of best model
Epoch: 05 | Epoch Time: 1m 7s
	 Train Loss: 1.232 | Train Corr: 0.86
	 Val. Loss: 1.525 |  Val. Corr: 0.81
Epoch: 06 | Epoch Time: 1m 6s
	 Train Loss: 0.819 | Train Corr: 0.91
	 Val. Loss: 1.740 |  Val. Corr: 0.79
Epoch: 07 | Epoch Time: 1m 5s
	 Train Loss: 0.709 | Train Corr: 0.92
	 Val. Loss: 1.646 |  Val. Corr: 

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epoch: 00 | Epoch Time: 1m 4s
	 Train Loss: 26.467 | Train Corr: 0.06
	 Val. Loss: 4.808 |  Val. Corr: 0.48
Epoch: 01 | Epoch Time: 1m 3s
	 Train Loss: 3.284 | Train Corr: 0.54
	 Val. Loss: 3.265 |  Val. Corr: 0.67
training on fold 4
Epoch: 00 | Epoch Time: 1m 1s
	 Train Loss: 21.043 | Train Corr: 0.06
	 Val. Loss: 4.784 |  Val. Corr: 0.54
Epoch: 01 | Epoch Time: 1m 2s
	 Train Loss: 3.270 | Train Corr: 0.54
	 Val. Loss: 4.656 |  Val. Corr: 0.39
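`Train Corr` and `Val. Corr` are reported next to the loss, but the log does not say which correlation they are. The sketch below assumes Pearson's r between predictions and targets; the scoring code lives in training.py, which is not shown.

Pearson's r ignores scale and offset. That is consistent with the pattern in this log: epoch 00 can show a large loss with a near-zero correlation (for example Train Loss 21.043 with Train Corr 0.06 in fold 4 above).

```python
import math


def pearson_r(preds, targets):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(preds)
    if n != len(targets) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(preds) / n
    mean_t = sum(targets) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(preds, targets))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    if var_p == 0 or var_t == 0:
        # Undefined when either input is constant; report NaN instead of dividing by zero.
        return float("nan")
    return cov / math.sqrt(var_p * var_t)


targets = [1.0, 2.0, 3.0, 4.0]
# Shifted and scaled predictions: the squared error is large, but r is still 1.
print(pearson_r([11.0, 13.0, 15.0, 17.0], targets))  # 1.0
```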
eid 17, params {'batch_size': 8, 'bidirectional': True, 'dropout': 0.2, 'hidden_dim': 256, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 3}
training on fold 0
updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 15s
	 Train Loss: 33.241 | Train Corr: 0.04
	 Val. Loss: 3.948 |  Val. Corr: 0.50
Epoch: 01 | Epoch Time: 1m 15s
	 Train Loss: 3.651 | Train Corr: 0.47
	 Val. Loss: 5.294 |  Val. Corr: 0.17
updating saved weights of best model
Epoch: 02 | Epoch Time: 1m 15s
	 Train Loss: 2.842 | Train Corr: 0.63
	 Val. Loss: 3.396 |  Val. Corr: 0.57
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 15s
	 Train Loss: 1.864 | Train Corr: 0.77
	 Val. Loss: 2.893 |  Val. Corr: 0.61
updating saved weights of best model
training on fold 1
Epoch: 00 | Epoch Time: 1m 16s
	 Train Loss: 32.127 | Train Corr: 0.03
	 Val. Loss: 5.800 |  Val. Corr: 0.44
Epoch: 01 | Epoch Time: 1m 16s
	 Train Loss: 3.843 | Train Corr: 0.41
	 Val. Loss: 5.810 |  Val. Corr: 0.51
training on fold 2
Epoch: 00 | Epoch Time: 1m 14s
	 Train Loss: 32.095 | Train Corr: 0.02
	 Val. Loss: 3.977 |  Val. Corr: 0.47
Epoch: 01 | Epoch Time: 1m 15s
	 Train Loss: 3.988 | Train Corr: 0.40
	 Val. Loss: 4.590 |  Val. Corr: 0.37
Epoch: 02 | Epoch Time: 1m 15s
	 Train Loss: 3.481 | Train Corr: 0.53
	 Val. Loss: 4.294 |  Val. Corr: 0.55
Epoch: 03 | Epoch Time: 1m 15s
	 Train Loss: 2.655 | Train Corr: 0.67
	 Val. Loss: 2.764 |  Val. Corr: 0.64
updating saved weights of best model
training on fold 3
Epoch: 00 | Epoch Time: 1m 13s
	 Train Loss: 28.849 | Train Corr: 0.04
	 Val. Loss: 5.243 |  Val. Corr: 0.24
Epoch: 01 | Epoch Time: 1m 12s
	 Train Loss: 3.976 | Train Corr: 0.41
	 Val. Loss: 4.000 |  Val. Corr: 0.61
training on fold 4
Epoch: 00 | Epoch Time: 1m 10s
	 Train Loss: 30.387 | Train Corr: 0.02
	 Val. Loss: 4.878 |  Val. Corr: 0.44
Epoch: 01 | Epoch Time: 1m 11s
	 Train Loss: 3.347 | Train Corr: 0.52
	 Val. Loss: 4.350 |  Val. Corr: 0.53
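The `Epoch Time: 1m 6s` field splits the elapsed wall-clock seconds into whole minutes and leftover whole seconds. The sketch below is modelled on the `epoch_time` helper from the reference notebook linked at the top. Whether this repo's training.py uses exactly this form is an assumption.

```python
def epoch_time(start_time, end_time):
    """Return (minutes, seconds) elapsed, both truncated to whole numbers."""
    elapsed = end_time - start_time
    mins = int(elapsed / 60)
    secs = int(elapsed - mins * 60)
    return mins, secs


# Pretend an epoch started at t=0.0 s and ended at t=66.4 s (e.g. from time.time()).
mins, secs = epoch_time(0.0, 66.4)
print(f"Epoch: 00 | Epoch Time: {mins}m {secs}s")  # Epoch: 00 | Epoch Time: 1m 6s
```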
eid 18, params {'batch_size': 8, 'bidirectional': True, 'dropout': 0.2, 'hidden_dim': 512, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 1}
training on fold 0
updating saved weights of best model
training on fold 1
Epoch: 00 | Epoch Time: 1m 2s
	 Train Loss: 13.173 | Train Corr: 0.17
	 Val. Loss: 5.270 |  Val. Corr: 0.44
Epoch: 01 | Epoch Time: 1m 2s
	 Train Loss: 3.118 | Train Corr: 0.56
	 Val. Loss: 6.754 |  Val. Corr: 0.62
Epoch: 02 | Epoch Time: 1m 2s
	 Train Loss: 2.387 | Train Corr: 0.69
	 Val. Loss: 6.568 |  Val. Corr: 0.67
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 2s
	 Train Loss: 1.582 | Train Corr: 0.81
	 Val. Loss: 3.062 |  Val. Corr: 0.65
Epoch: 04 | Epoch Time: 1m 2s
	 Train Loss: 1.203 | Train Corr: 0.86
	 Val. Loss: 3.900 |  Val. Corr: 0.66
updating saved weights of best model
Epoch: 05 | Epoch Time: 1m 3s
	 Train Loss: 1.036 | Train Corr: 0.88
	 Val. Loss: 2.970 |  Val. Corr: 0.69
Epoch: 06 | Epoch Time: 1m 3s
	 Train Loss: 0.813 | Train Corr: 0.91
	 Val. Loss: 3.864 |  Val. Corr: 0.65
Epoch: 07 | Epoch Time: 1m 1s
	 Train Loss: 0.684 | Train Corr: 0.92
	 Val. Loss: 3.172 |  Val. Corr: 0.66
training on fold 2
Epoch: 00 | Epoch Time: 1m 1s
	 Train Loss: 13.088 | Train Corr: 0.16
	 Val. Loss: 4.961 |  Val. Corr: 0.51
Epoch: 01 | Epoch Time: 1m 2s
	 Train Loss: 3.779 | Train Corr: 0.47
	 Val. Loss: 5.233 |  Val. Corr: 0.59
Epoch: 02 | Epoch Time: 1m 2s
	 Train Loss: 2.906 | Train Corr: 0.62
	 Val. Loss: 5.124 |  Val. Corr: 0.75
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 2s
	 Train Loss: 2.065 | Train Corr: 0.75
	 Val. Loss: 1.827 |  Val. Corr: 0.75
Epoch: 04 | Epoch Time: 1m 1s
	 Train Loss: 1.446 | Train Corr: 0.84
	 Val. Loss: 2.798 |  Val. Corr: 0.75
updating saved weights of best model
Epoch: 05 | Epoch Time: 1m 3s
	 Train Loss: 1.122 | Train Corr: 0.87
	 Val. Loss: 1.795 |  Val. Corr: 0.76
Epoch: 06 | Epoch Time: 1m 2s
	 Train Loss: 0.911 | Train Corr: 0.90
	 Val. Loss: 2.569 |  Val. Corr: 0.73
Epoch: 07 | Epoch Time: 1m 1s
	 Train Loss: 0.557 | Train Corr: 0.94
	 Val. Loss: 2.376 |  Val. Corr: 0.72
Epoch: 08 | Epoch Time: 1m 1s
	 Train Loss: 0.421 | Train Corr: 0.95


Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epoch: 00 | Epoch Time: 1m 0s
	 Train Loss: 12.686 | Train Corr: 0.15
	 Val. Loss: 4.638 |  Val. Corr: 0.50
Epoch: 01 | Epoch Time: 0m 59s
	 Train Loss: 2.920 | Train Corr: 0.60
	 Val. Loss: 2.945 |  Val. Corr: 0.73
Epoch: 02 | Epoch Time: 1m 1s
	 Train Loss: 2.470 | Train Corr: 0.68
	 Val. Loss: 2.475 |  Val. Corr: 0.78
Epoch: 03 | Epoch Time: 1m 0s
	 Train Loss: 1.635 | Train Corr: 0.80
	 Val. Loss: 2.408 |  Val. Corr: 0.77
Epoch: 04 | Epoch Time: 1m 0s
	 Train Loss: 1.087 | Train Corr: 0.87
	 Val. Loss: 1.806 |  Val. Corr: 0.80
Epoch: 05 | Epoch Time: 1m 1s
	 Train Loss: 0.858 | Train Corr: 0.90
	 Val. Loss: 1.868 |  Val. Corr: 0.80
Epoch: 06 | Epoch Time: 1m 1s
	 Train Loss: 0.757 | Train Corr: 0.91
	 Val. Loss: 2.336 |  Val. Corr: 0.79
training on fold 4
Epoch: 00 | Epoch Time: 0m 58s
	 Train Loss: 13.542 | Train Corr: 0.14
	 Val. Loss: 4.525 |  Val. Corr: 0.46
Epoch: 01 | Epoch Time: 0m 58s
	 Train Loss: 3.180 | Train Corr: 0.56
	 Val. Loss: 3.773 |  Val. Corr: 0.63
Epoch: 02 | Epoch Time: 0m 57s
	 Train Loss: 2.673 | Train Corr: 0.65
	 Val. Loss: 4.178 |  Val. Corr: 0.60
Epoch: 03 | Epoch Time: 0m 57s
	 Train Loss: 1.897 | Train Corr: 0.76
	 Val. Loss: 3.998 |  Val. Corr: 0.71
Epoch: 04 | Epoch Time: 0m 57s
	 Train Loss: 1.146 | Train Corr: 0.87
	 Val. Loss: 2.453 |  Val. Corr: 0.71
Epoch: 05 | Epoch Time: 0m 58s
	 Train Loss: 0.937 | Train Corr: 0.89
	 Val. Loss: 2.164 |  Val. Corr: 0.74
Epoch: 06 | Epoch Time: 0m 57s
	 Train Loss: 0.632 | Train Corr: 0.93
	 Val. Loss: 2.353 |  Val. Corr: 0.72
Epoch: 07 | Epoch Time: 0m 58s
	 Train Loss: 0.477 | Train Corr: 0.95
	 Val. Loss: 2.349 |  Val. Corr: 0.72
Epoch: 08 | Epoch Time: 0m 57s
	 Train Loss: 0.421 | Train Corr: 0.95
	 Val. Loss: 2.460 |  Val. Corr: 0.72
Epoch: 09 | Epoch Time: 0m 

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 17s
	 Train Loss: 17.107 | Train Corr: 0.10
	 Val. Loss: 4.747 |  Val. Corr: 0.23
Epoch: 01 | Epoch Time: 1m 17s
	 Train Loss: 3.311 | Train Corr: 0.54
	 Val. Loss: 4.941 |  Val. Corr: 0.46
updating saved weights of best model
Epoch: 02 | Epoch Time: 1m 17s
	 Train Loss: 2.803 | Train Corr: 0.64
	 Val. Loss: 4.212 |  Val. Corr: 0.59
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 17s
	 Train Loss: 2.173 | Train Corr: 0.73
	 Val. Loss: 3.355 |  Val. Corr: 0.57
Epoch: 04 | Epoch Time: 1m 16s
	 Train Loss: 1.469 | Train Corr: 0.83
	 Val. Loss: 4.759 |  Val. Corr: 0.61
updating saved weights of best model
Epoch: 05 | Epoch Time: 1m 17s
	 Train Loss: 1.132 | Train Corr: 0.87
	 Val. Loss: 2.924 |  Val. Corr: 0.60
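The three-line epoch summaries follow the logging style of the referenced bentrevett notebook. That notebook times each epoch with a small `epoch_time` helper and zero-pads the epoch number. The sketch below adapts that pattern to the fields in this log. `format_epoch` is a hypothetical helper, and the `Corr` values are assumed to be plain floats rather than percentages. If you pass it the numbers from the `Epoch: 05` block above, it prints the same three lines.

```python
from typing import Tuple


def epoch_time(start_time: float, end_time: float) -> Tuple[int, int]:
    """Split an elapsed wall-clock interval into whole minutes and seconds."""
    elapsed = end_time - start_time
    mins = int(elapsed // 60)
    secs = int(elapsed - mins * 60)
    return mins, secs


def format_epoch(epoch: int, start: float, end: float,
                 train_loss: float, train_corr: float,
                 val_loss: float, val_corr: float) -> str:
    """Build the three-line epoch summary in the layout used by this log (hypothetical helper)."""
    mins, secs = epoch_time(start, end)
    return (f"Epoch: {epoch:02} | Epoch Time: {mins}m {secs}s\n"
            f"\t Train Loss: {train_loss:.3f} | Train Corr: {train_corr:.2f}\n"
            f"\t Val. Loss: {val_loss:.3f} |  Val. Corr: {val_corr:.2f}")


# 0-based epoch index and a 77.4 s epoch, using the numbers from the block above.
print(format_epoch(5, 0.0, 77.4, 1.132, 0.87, 2.924, 0.60))
```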
training on fold 1
Epoch: 00 | Epoch Time: 1m 17s
	 Train Loss: 16.903 | Train Corr: 0.12
	 Val. Loss: 4.546 |  Val. Corr: 0.33
Epoch: 01 | Epoch Time: 1m 18s
	 Train Loss: 3.360 | Train Corr: 0.52
	 Val. Loss: 6.544 |  Val. Corr: 0.52
Epoch: 02 | Epoch Time: 1m 18s
	 Train Loss: 3.010 | Train Corr: 0.58
	 Val. Loss: 7.383 |  Val. Corr: 0.60
Epoch: 03 | Epoch Time: 1m 18s
	 Train Loss: 2.131 | Train Corr: 0.73
	 Val. Loss: 4.416 |  Val. Corr: 0.58
Epoch: 04 | Epoch Time: 1m 18s
	 Train Loss: 1.684 | Train Corr: 0.80
	 Val. Loss: 4.485 |  Val. Corr: 0.63
Epoch: 05 | Epoch Time: 1m 19s
	 Train Loss: 1.336 | Train Corr: 0.84
	 Val. Loss: 3.352 |  Val. Corr: 0.63
Epoch: 06 | Epoch Time: 1m 18s
	 Train Loss: 1.012 | Train Corr: 0.88
	 Val. Loss: 3.166 |  Val. Corr: 0.62
Epoch: 07 | Epoch Time: 1m 17s
	 Train Loss: 0.789 | Train Corr: 0.91
	 Val. Loss: 4.135 |  Val. Corr: 0.64
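This log reports `Train Corr` and `Val. Corr` instead of accuracy. That points to a regression target scored by the correlation between predictions and labels. The metric is not defined in this excerpt. Assuming it is Pearson's r, a dependency-free version looks like this (`pearson_r` is a hypothetical name):

```python
import math
from typing import Sequence


def pearson_r(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Pearson correlation between predictions and labels (assumed meaning of 'Corr')."""
    if len(preds) != len(labels) or len(preds) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(preds) / len(preds)
    mean_l = sum(labels) / len(labels)
    # Covariance and variances without the 1/n factor, which cancels in the ratio.
    cov = sum((p - mean_p) * (l - mean_l) for p, l in zip(preds, labels))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    if var_p == 0 or var_l == 0:
        return 0.0  # undefined for constant inputs; returning 0.0 is an arbitrary convention
    return cov / math.sqrt(var_p * var_l)
```

The reference notebook averages its metric over batches. If `training.py` does the same, the logged value is a mean of per-batch correlations over small batches (`batch_size` 8 in the sweep below). That is noisier than a single correlation computed over the whole fold.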
training on fold 2
Epoch: 00 | Epoch Time: 1m 16s
	 Train Loss: 17.609 | Train Corr: 0.11
	 Val. Loss: 4.374 |  Val. Corr: 0.49
Epoch: 01 | Epoch Time: 1m 17s
	 Train Loss: 3.815 | Train Corr: 0.46
	 Val. Loss: 3.872 |  Val. Corr: 0.68
Epoch: 02 | Epoch Time: 1m 17s
	 Train Loss: 3.034 | Train Corr: 0.60
	 Val. Loss: 3.503 |  Val. Corr: 0.72
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 17s
	 Train Loss: 2.004 | Train Corr: 0.76
	 Val. Loss: 2.191 |  Val. Corr: 0.74
Epoch: 04 | Epoch Time: 1m 16s
	 Train Loss: 1.311 | Train Corr: 0.85
	 Val. Loss: 2.662 |  Val. Corr: 0.74
updating saved weights of best model
Epoch: 05 | Epoch Time: 1m 18s
	 Train Loss: 0.916 | Train Corr: 0.90
	 Val. Loss: 2.084 |  Val. Corr: 0.77
Epoch: 06 | Epoch Time: 1m 18s
	 Train Loss: 0.673 | Train Corr: 0.93
	 Val. Loss: 2.314 |  Val. Corr: 0.73
updating saved weights of best model
Epoch: 07 | Epoch Time: 1m 17s
	 Train Loss: 0.496 | Train Corr: 0.95
	 Val. Loss: 1.951 |  Val. Corr: 0.74
Epoch: 08 | Epoch Time: 

Epoch: 00 | Epoch Time: 1m 15s
	 Train Loss: 16.407 | Train Corr: 0.14
	 Val. Loss: 4.711 |  Val. Corr: 0.49
Epoch: 01 | Epoch Time: 1m 14s
	 Train Loss: 3.093 | Train Corr: 0.57
	 Val. Loss: 3.687 |  Val. Corr: 0.60
Epoch: 02 | Epoch Time: 1m 16s
	 Train Loss: 2.703 | Train Corr: 0.64
	 Val. Loss: 2.585 |  Val. Corr: 0.72
Epoch: 03 | Epoch Time: 1m 15s
	 Train Loss: 2.382 | Train Corr: 0.70
	 Val. Loss: 3.552 |  Val. Corr: 0.72
Epoch: 04 | Epoch Time: 1m 15s
	 Train Loss: 1.373 | Train Corr: 0.84
	 Val. Loss: 2.430 |  Val. Corr: 0.76
Epoch: 05 | Epoch Time: 1m 16s
	 Train Loss: 0.952 | Train Corr: 0.89
	 Val. Loss: 2.487 |  Val. Corr: 0.75
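The `training on fold N` lines run from fold 0 to fold 4, so this is 5-fold cross-validation over the training split. With the 1000-row `train_df` from the setup cells, each validation fold would hold about 200 rows. Below is a minimal sketch of unshuffled, contiguous folds in the style of scikit-learn's `KFold(n_splits=5)`. `kfold_indices` is a hypothetical helper. How `training.py` actually builds its folds (shuffling, stratification) is not visible in this log.

```python
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, n_splits: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs; fold sizes differ by at most one."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample, as KFold does.
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


for fold, (train_idx, val_idx) in enumerate(kfold_indices(1000, 5)):
    print(f"training on fold {fold}: {len(train_idx)} train / {len(val_idx)} val")
```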
training on fold 4
Epoch: 00 | Epoch Time: 1m 12s
	 Train Loss: 16.094 | Train Corr: 0.13
	 Val. Loss: 4.215 |  Val. Corr: 0.45
Epoch: 01 | Epoch Time: 1m 13s
	 Train Loss: 3.097 | Train Corr: 0.57
	 Val. Loss: 3.702 |  Val. Corr: 0.56
Epoch: 02 | Epoch Time: 1m 12s
	 Train Loss: 2.757 | Train Corr: 0.63
	 Val. Loss: 3.214 |  Val. Corr: 0.66
Epoch: 03 | Epoch Time: 1m 12s
	 Train Loss: 1.818 | Train Corr: 0.78
	 Val. Loss: 3.291 |  Val. Corr: 0.67
Epoch: 04 | Epoch Time: 1m 11s
	 Train Loss: 1.306 | Train Corr: 0.85
	 Val. Loss: 2.864 |  Val. Corr: 0.65
Epoch: 05 | Epoch Time: 1m 13s
	 Train Loss: 1.064 | Train Corr: 0.88
	 Val. Loss: 2.616 |  Val. Corr: 0.70
Epoch: 06 | Epoch Time: 1m 11s
	 Train Loss: 0.705 | Train Corr: 0.92
	 Val. Loss: 2.866 |  Val. Corr: 0.62
eid 20, params {'batch_size': 8, 'bidirectional': True, 'dropout': 0.2, 'hidden_dim': 512, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 3}
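Each `eid N, params {...}` line identifies one configuration in a hyperparameter sweep. The dict keys are in alphabetical order, which is consistent with how scikit-learn's `ParameterGrid` emits them. The sketch below enumerates a grid in that same order using only `itertools.product`. Every list of candidate values is hypothetical: this excerpt only shows the configurations for eid 20 and eid 23.

```python
from itertools import product
from typing import Any, Dict, Iterator, List

# Hypothetical search space; only the individual values seen in this log are grounded.
SEARCH_SPACE: Dict[str, List[Any]] = {
    "batch_size": [8],
    "bidirectional": [True],
    "dropout": [0.2],
    "hidden_dim": [256, 512, 768],
    "lr": [5e-05],
    "max_epochs": [10],
    "num_layers": [1, 2, 3],
}


def iter_grid(space: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every combination with sorted keys, the last key varying fastest."""
    keys = sorted(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


for eid, params in enumerate(iter_grid(SEARCH_SPACE)):
    print(f"eid {eid}, params {params}")
```

With this toy grid, the printed `eid`s run from 0 to 8, so they will not line up with the numbers in the log. If scikit-learn is available, `sklearn.model_selection.ParameterGrid` produces the same sorted-key dicts in the same order.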
training on fold 0
updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 33s
	 Train Loss: 20.477 | Train Corr: 0.03
	 Val. Loss: 4.114 |  Val. Corr: 0.43
Epoch: 01 | Epoch Time: 1m 33s
	 Train Loss: 3.749 | Train Corr: 0.44
	 Val. Loss: 4.286 |  Val. Corr: 0.40
Epoch: 02 | Epoch Time: 1m 33s
	 Train Loss: 3.318 | Train Corr: 0.54
	 Val. Loss: 5.586 |  Val. Corr: 0.39
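`updating saved weights of best model` marks a checkpoint of the best weights seen so far. This fold also ends after three of its ten `max_epochs`, so some kind of early stopping appears to be in play. The actual rules live in `training.py` and can't be read off the log. Below is a generic sketch that assumes "best" means lowest validation loss and stops after `patience` epochs without improvement. `BestCheckpoint`, `patience=2` and the plain-dict state are all assumptions. With PyTorch you would snapshot `model.state_dict()` or pass it to `torch.save`.

```python
import copy
from typing import Any, Dict, Optional


class BestCheckpoint:
    """Track the lowest validation loss, keep a copy of its state, and signal an early stop."""

    def __init__(self, patience: int = 2) -> None:
        self.patience = patience              # epochs allowed without improvement (assumed)
        self.best_loss = float("inf")
        self.best_state: Optional[Dict[str, Any]] = None
        self.bad_epochs = 0

    def update(self, val_loss: float, state: Dict[str, Any]) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            print("updating saved weights of best model")
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(state)  # a snapshot, not a live reference
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Replay the validation losses of eid 20, fold 0 from the log above.
tracker = BestCheckpoint(patience=2)
for epoch, loss in enumerate([4.114, 4.286, 5.586]):
    if tracker.update(loss, {"epoch": epoch}):
        print(f"stopping after epoch {epoch:02}")
        break
```

Replaying fold 0 of eid 20 reproduces its stop point. However, this rule does not match every fold in the log: fold 4 of eid 20 keeps training past two epochs without a new best validation loss. Treat it as an illustration rather than a reconstruction.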
training on fold 1
Epoch: 00 | Epoch Time: 1m 33s
	 Train Loss: 20.651 | Train Corr: 0.05
	 Val. Loss: 5.521 |  Val. Corr: 0.38
Epoch: 01 | Epoch Time: 1m 33s
	 Train Loss: 3.706 | Train Corr: 0.44
	 Val. Loss: 7.047 |  Val. Corr: 0.40
Epoch: 02 | Epoch Time: 1m 33s
	 Train Loss: 3.210 | Train Corr: 0.54
	 Val. Loss: 7.648 |  Val. Corr: 0.31
training on fold 2
updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 32s
	 Train Loss: 20.302 | Train Corr: 0.08
	 Val. Loss: 3.876 |  Val. Corr: 0.53
Epoch: 01 | Epoch Time: 1m 33s
	 Train Loss: 4.008 | Train Corr: 0.42
	 Val. Loss: 4.462 |  Val. Corr: 0.58
Epoch: 02 | Epoch Time: 1m 33s
	 Train Loss: 3.304 | Train Corr: 0.55
	 Val. Loss: 4.211 |  Val. Corr: 0.72
updating saved weights of best model
training on fold 3
Epoch: 00 | Epoch Time: 1m 30s
	 Train Loss: 20.163 | Train Corr: 0.08
	 Val. Loss: 4.305 |  Val. Corr: 0.56
Epoch: 01 | Epoch Time: 1m 29s
	 Train Loss: 3.311 | Train Corr: 0.53
	 Val. Loss: 3.566 |  Val. Corr: 0.64
Epoch: 02 | Epoch Time: 1m 31s
	 Train Loss: 2.938 | Train Corr: 0.60
	 Val. Loss: 3.581 |  Val. Corr: 0.69
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 31s
	 Train Loss: 1.911 | Train Corr: 0.76
	 Val. Loss: 2.488 |  Val. Corr: 0.72
updating saved weights of best model
Epoch: 04 | Epoch Time: 1m 30s
	 Train Loss: 1.403 | Train Corr: 0.83
	 Val. Loss: 2.077 |  Val. Corr: 0.76
updating saved weights of best model
Epoch: 05 | Epoch Time: 1m 32s
	 Train Loss: 1.028 | Train Corr: 0.88
	 Val. Loss: 2.058 |  Val. Corr: 0.76
Epoch: 06 | Epoch Time: 1m 32s
	 Train Loss: 0.719 | Train Corr: 0.92
	 Val. Loss: 2.502 |  Val. Corr: 0.76
Epoch: 07 | Epoch Time: 1m 32s
	 Train Loss: 0.530 | Train Corr: 0.94
	 Val. Loss: 2.304 |  Val. Corr: 0.74
training on fold 4
Epoch: 00 | Epoch Time: 1m 27s
	 Train Loss: 19.962 | Train Corr: 0.10
	 Val. Loss: 4.440 |  Val. Corr: 0.50
Epoch: 01 | Epoch Time: 1m 28s
	 Train Loss: 3.283 | Train Corr: 0.53
	 Val. Loss: 3.710 |  Val. Corr: 0.54
Epoch: 02 | Epoch Time: 1m 26s
	 Train Loss: 2.726 | Train Corr: 0.64
	 Val. Loss: 3.072 |  Val. Corr: 0.64
Epoch: 03 | Epoch Time: 1m 27s
	 Train Loss: 1.925 | Train Corr: 0.76
	 Val. Loss: 3.089 |  Val. Corr: 0.69
Epoch: 04 | Epoch Time: 1m 26s
	 Train Loss: 1.535 | Train Corr: 0.82
	 Val. Loss: 2.676 |  Val. Corr: 0.70
Epoch: 05 | Epoch Time: 1m 28s
	 Train Loss: 1.048 | Train Corr: 0.88
	 Val. Loss: 2.386 |  Val. Corr: 0.71
Epoch: 06 | Epoch Time: 1m 26s
	 Train Loss: 0.704 | Train Corr: 0.92
	 Val. Loss: 2.717 |  Val. Corr: 0.65
Epoch: 07 | Epoch Time: 1m 28s
	 Train Loss: 0.572 | Train Corr: 0.94
	 Val. Loss: 2.563 |  Val. Corr: 0.69
Epoch: 08 | Epoch Time: 1m 26s
	 Train Loss: 0.461 | Train Corr: 0.95
	 Val. Loss: 2.415 |  Val. Corr: 0.72
Epoch: 09 | Epoch Time: 1m 

updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 12s
	 Train Loss: 11.279 | Train Corr: 0.20
	 Val. Loss: 4.554 |  Val. Corr: 0.16
updating saved weights of best model
Epoch: 01 | Epoch Time: 1m 13s
	 Train Loss: 3.085 | Train Corr: 0.58
	 Val. Loss: 4.331 |  Val. Corr: 0.45
Epoch: 02 | Epoch Time: 1m 13s
	 Train Loss: 2.344 | Train Corr: 0.71
	 Val. Loss: 6.668 |  Val. Corr: 0.53
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 12s
	 Train Loss: 1.785 | Train Corr: 0.78
	 Val. Loss: 3.230 |  Val. Corr: 0.54
Epoch: 04 | Epoch Time: 1m 12s
	 Train Loss: 1.064 | Train Corr: 0.88
	 Val. Loss: 3.406 |  Val. Corr: 0.59
training on fold 1
Epoch: 00 | Epoch Time: 1m 13s
	 Train Loss: 11.178 | Train Corr: 0.17
	 Val. Loss: 4.841 |  Val. Corr: 0.41
Epoch: 01 | Epoch Time: 1m 13s
	 Train Loss: 3.359 | Train Corr: 0.51
	 Val. Loss: 6.417 |  Val. Corr: 0.52
Epoch: 02 | Epoch Time: 1m 13s
	 Train Loss: 2.960 | Train Corr: 0.59
	 Val. Loss: 6.613 |  Val. Corr: 0.66
Epoch: 03 | Epoch Time: 1m 13s
	 Train Loss: 1.968 | Train Corr: 0.75
	 Val. Loss: 3.416 |  Val. Corr: 0.66
Epoch: 04 | Epoch Time: 1m 13s
	 Train Loss: 1.140 | Train Corr: 0.87
	 Val. Loss: 4.618 |  Val. Corr: 0.69
Epoch: 05 | Epoch Time: 1m 14s
	 Train Loss: 0.808 | Train Corr: 0.91
	 Val. Loss: 4.033 |  Val. Corr: 0.66
updating saved weights of best model
Epoch: 06 | Epoch Time: 1m 14s
	 Train Loss: 0.609 | Train Corr: 0.93
	 Val. Loss: 3.182 |  Val. Corr: 0.64
updating saved weights of best model
Epoch: 07 | Epoch Time: 1m 12s
	 Train Loss: 0.486 | Train Corr: 0.94
	 Val. Loss: 2.821 |  Val. Corr: 0.68
Epoch: 08 | Epoch Time: 1m 13s
	 Train Loss: 0.400 | Train Co

Epoch: 00 | Epoch Time: 1m 12s
	 Train Loss: 12.364 | Train Corr: 0.18
	 Val. Loss: 3.720 |  Val. Corr: 0.61
Epoch: 01 | Epoch Time: 1m 13s
	 Train Loss: 3.404 | Train Corr: 0.54
	 Val. Loss: 3.546 |  Val. Corr: 0.72
updating saved weights of best model
Epoch: 02 | Epoch Time: 1m 13s
	 Train Loss: 2.466 | Train Corr: 0.70
	 Val. Loss: 1.850 |  Val. Corr: 0.77
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 13s
	 Train Loss: 1.691 | Train Corr: 0.80
	 Val. Loss: 1.690 |  Val. Corr: 0.79
Epoch: 04 | Epoch Time: 1m 12s
	 Train Loss: 1.144 | Train Corr: 0.87
	 Val. Loss: 1.820 |  Val. Corr: 0.81
updating saved weights of best model
Epoch: 05 | Epoch Time: 1m 14s
	 Train Loss: 0.796 | Train Corr: 0.91
	 Val. Loss: 1.643 |  Val. Corr: 0.80
Epoch: 06 | Epoch Time: 1m 13s
	 Train Loss: 0.502 | Train Corr: 0.95
	 Val. Loss: 1.701 |  Val. Corr: 0.79
updating saved weights of best model
Epoch: 07 | Epoch Time: 1m 12s
	 Train Loss: 0.394 | Train Corr: 0.96
	 Val. Loss: 1.594 |  Val

Epoch: 00 | Epoch Time: 1m 11s
	 Train Loss: 11.204 | Train Corr: 0.21
	 Val. Loss: 3.874 |  Val. Corr: 0.53
Epoch: 01 | Epoch Time: 1m 10s
	 Train Loss: 2.750 | Train Corr: 0.63
	 Val. Loss: 2.046 |  Val. Corr: 0.76
Epoch: 02 | Epoch Time: 1m 11s
	 Train Loss: 2.075 | Train Corr: 0.74
	 Val. Loss: 2.664 |  Val. Corr: 0.75
Epoch: 03 | Epoch Time: 1m 11s
	 Train Loss: 1.387 | Train Corr: 0.83
	 Val. Loss: 1.998 |  Val. Corr: 0.78
Epoch: 04 | Epoch Time: 1m 11s
	 Train Loss: 0.808 | Train Corr: 0.91
	 Val. Loss: 2.002 |  Val. Corr: 0.77
training on fold 4
Epoch: 00 | Epoch Time: 1m 8s
	 Train Loss: 11.496 | Train Corr: 0.19
	 Val. Loss: 3.930 |  Val. Corr: 0.60
Epoch: 01 | Epoch Time: 1m 8s
	 Train Loss: 2.902 | Train Corr: 0.60
	 Val. Loss: 3.669 |  Val. Corr: 0.67
Epoch: 02 | Epoch Time: 1m 7s
	 Train Loss: 2.149 | Train Corr: 0.73
	 Val. Loss: 2.696 |  Val. Corr: 0.71
Epoch: 03 | Epoch Time: 1m 8s
	 Train Loss: 1.482 | Train Corr: 0.82
	 Val. Loss: 2.549 |  Val. Corr: 0.70
Epoch: 04 | Epoch Time: 1m 7s
	 Train Loss: 0.878 | Train Corr: 0.90
	 Val. Loss: 3.904 |  Val. Corr: 0.66
Epoch: 05 | Epoch Time: 1m 8s
	 Train Loss: 0.700 | Train Corr: 0.92
	 Val. Loss: 2.281 |  Val. Corr: 0.72
Epoch: 06 | Epoch Time: 1m 7s
	 Train Loss: 0.610 | Train Corr: 0.93
	 Val. Loss: 2.710 |  Val. Corr: 0.70
Epoch: 07 | Epoch Time: 1m 9s
	 Train Loss: 0.480 | Train Corr: 0.95
	 Val. Loss: 2.398 |  Val. Corr: 0.70
Epoch: 08 | Epoch Time: 1m 7s
	 Train Loss: 0.355 | Train Corr: 0.96
	 Val. Loss: 2.669 |  Val. Corr: 0.68
Epoch: 09 | Epoch Time: 1m 9s
	 Trai

updating saved weights of best model
Epoch: 00 | Epoch Time: 1m 41s
	 Train Loss: 13.890 | Train Corr: 0.19
	 Val. Loss: 4.120 |  Val. Corr: 0.33
Epoch: 01 | Epoch Time: 1m 41s
	 Train Loss: 3.407 | Train Corr: 0.53
	 Val. Loss: 4.965 |  Val. Corr: 0.42
updating saved weights of best model
Epoch: 02 | Epoch Time: 1m 41s
	 Train Loss: 2.802 | Train Corr: 0.63
	 Val. Loss: 3.837 |  Val. Corr: 0.59
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 41s
	 Train Loss: 2.029 | Train Corr: 0.75
	 Val. Loss: 3.029 |  Val. Corr: 0.59
Epoch: 04 | Epoch Time: 1m 40s
	 Train Loss: 1.123 | Train Corr: 0.87
	 Val. Loss: 3.283 |  Val. Corr: 0.64
updating saved weights of best model
Epoch: 05 | Epoch Time: 1m 41s
	 Train Loss: 0.844 | Train Corr: 0.91
	 Val. Loss: 2.807 |  Val. Corr: 0.64
Epoch: 06 | Epoch Time: 1m 42s
	 Train Loss: 0.687 | Train Corr: 0.92
	 Val. Loss: 2.975 |  Val. Corr: 0.63
updating saved weights of best model
training on fold 1
Epoch: 00 | Epoch Time: 1m 42s
	 Train Loss: 14.259 | Train Corr: 0.15
	 Val. Loss: 5.593 |  Val. Corr: 0.26
Epoch: 01 | Epoch Time: 1m 42s
	 Train Loss: 3.474 | Train Corr: 0.50
	 Val. Loss: 7.470 |  Val. Corr: 0.60
Epoch: 02 | Epoch Time: 1m 41s
	 Train Loss: 2.650 | Train Corr: 0.65
	 Val. Loss: 6.289 |  Val. Corr: 0.68
Epoch: 03 | Epoch Time: 1m 42s
	 Train Loss: 1.698 | Train Corr: 0.79
	 Val. Loss: 3.254 |  Val. Corr: 0.64
Epoch: 04 | Epoch Time: 1m 42s
	 Train Loss: 1.126 | Train Corr: 0.87
	 Val. Loss: 3.428 |  Val. Corr: 0.69
Epoch: 05 | Epoch Time: 1m 43s
	 Train Loss: 0.845 | Train Corr: 0.90
	 Val. Loss: 4.971 |  Val. Corr: 0.69
Epoch: 06 | Epoch Time: 1m 43s
	 Train Loss: 0.676 | Train Corr: 0.92
	 Val. Loss: 3.332 |  Val. Corr: 0.66
Epoch: 07 | Epoch Time: 1m 41s
	 Train Loss: 0.470 | Train Corr: 0.95
	 Val. Loss: 4.072 |  Val. Corr: 0.68
training on fold 2
Epoch: 00 | Epoch Time: 1m 40s
	 Train Loss: 13.902 | Train Corr: 0.15
	 Val. Loss: 3.697 |  Val. Corr: 0.50
Epoch: 01 | Epoch Time: 1m 41s
	 Train Loss: 3.473 | Train Corr: 0.52
	 Val. Loss: 4.315 |  Val. Corr: 0.47
Epoch: 02 | Epoch Time: 1m 42s
	 Train Loss: 2.982 | Train Corr: 0.61
	 Val. Loss: 5.274 |  Val. Corr: 0.70
updating saved weights of best model
Epoch: 03 | Epoch Time: 1m 41s
	 Train Loss: 2.172 | Train Corr: 0.74
	 Val. Loss: 2.160 |  Val. Corr: 0.76
Epoch: 04 | Epoch Time: 1m 40s
	 Train Loss: 1.483 | Train Corr: 0.83
	 Val. Loss: 2.744 |  Val. Corr: 0.77
updating saved weights of best model
Epoch: 05 | Epoch Time: 1m 43s
	 Train Loss: 1.010 | Train Corr: 0.89
	 Val. Loss: 1.941 |  Val. Corr: 0.77
Epoch: 06 | Epoch Time: 1m 42s
	 Train Loss: 0.761 | Train Corr: 0.92
	 Val. Loss: 2.206 |  Val. Corr: 0.75
Epoch: 07 | Epoch Time: 1m 40s
	 Train Loss: 0.558 | Train Corr: 0.94
	 Val. Loss: 2.523 |  Val. Corr: 0.74
Epoch: 08 | Epoch Time: 1m 40s
	 Train Loss: 0.360 | Train Co

Epoch: 00 | Epoch Time: 1m 39s
	 Train Loss: 13.947 | Train Corr: 0.17
	 Val. Loss: 3.794 |  Val. Corr: 0.54
Epoch: 01 | Epoch Time: 1m 37s
	 Train Loss: 3.087 | Train Corr: 0.57
	 Val. Loss: 3.702 |  Val. Corr: 0.66
Epoch: 02 | Epoch Time: 1m 40s
	 Train Loss: 2.594 | Train Corr: 0.66
	 Val. Loss: 2.653 |  Val. Corr: 0.74
Epoch: 03 | Epoch Time: 1m 39s
	 Train Loss: 1.979 | Train Corr: 0.75
	 Val. Loss: 2.354 |  Val. Corr: 0.76
Epoch: 04 | Epoch Time: 1m 39s
	 Train Loss: 1.207 | Train Corr: 0.86
	 Val. Loss: 2.123 |  Val. Corr: 0.76
Epoch: 05 | Epoch Time: 1m 40s
	 Train Loss: 1.041 | Train Corr: 0.88
	 Val. Loss: 2.171 |  Val. Corr: 0.77
Epoch: 06 | Epoch Time: 1m 40s
	 Train Loss: 0.703 | Train Corr: 0.92
	 Val. Loss: 2.236 |  Val. Corr: 0.75
Epoch: 07 | Epoch Time: 1m 40s
	 Train Loss: 0.521 | Train Corr: 0.94
	 Val. Loss: 2.314 |  Val. Corr: 0.78
Epoch: 08 | Epoch Time: 1m 37s
	 Train Loss: 0.447 | Train Corr: 0.95
	 Val. Loss: 2.179 |  Val. Corr: 0.77
Epoch: 09 | Epoch Time: 1m 

Epoch: 00 | Epoch Time: 1m 35s
	 Train Loss: 14.068 | Train Corr: 0.15
	 Val. Loss: 4.684 |  Val. Corr: 0.43
Epoch: 01 | Epoch Time: 1m 36s
	 Train Loss: 3.170 | Train Corr: 0.56
	 Val. Loss: 3.753 |  Val. Corr: 0.58
Epoch: 02 | Epoch Time: 1m 34s
	 Train Loss: 2.704 | Train Corr: 0.65
	 Val. Loss: 3.139 |  Val. Corr: 0.71
Epoch: 03 | Epoch Time: 1m 34s
	 Train Loss: 1.759 | Train Corr: 0.78
	 Val. Loss: 3.400 |  Val. Corr: 0.69
Epoch: 04 | Epoch Time: 1m 34s
	 Train Loss: 1.076 | Train Corr: 0.87
	 Val. Loss: 2.774 |  Val. Corr: 0.67
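Comparing configurations by scrolling through this output is tedious. The truncated output has also dropped some `eid` headers: no header appears for eid 21 or 22 between eid 20 and eid 23. The sketch below ignores headers, starts a new run at every `Epoch: 00`, and returns each run's best `Val. Corr`. Truncated lines simply fail to match and are skipped. The regexes assume the exact line format shown here. Ranking configurations by the mean of these per-fold maxima is one reasonable choice, not necessarily the notebook's.

```python
import re
from statistics import mean
from typing import List

EPOCH_RE = re.compile(r"Epoch: (\d+) \| Epoch Time:")
VAL_RE = re.compile(r"Val\. Loss: ([\d.]+) \|\s+Val\. Corr: ([\d.]+)")


def best_val_corr_per_run(log_text: str) -> List[float]:
    """Split the log into runs at each 'Epoch: 00' and return each run's best Val. Corr."""
    runs: List[List[float]] = []
    for line in log_text.splitlines():
        epoch = EPOCH_RE.search(line)
        if epoch and int(epoch.group(1)) == 0:
            runs.append([])  # a new fold (fresh model) starts here
            continue
        val = VAL_RE.search(line)
        if val and runs:
            runs[-1].append(float(val.group(2)))
    return [max(r) for r in runs if r]


# Two runs copied from this log (eid 20, folds 0 and 2), shortened.
sample = """Epoch: 00 | Epoch Time: 1m 33s
\t Train Loss: 20.477 | Train Corr: 0.03
\t Val. Loss: 4.114 |  Val. Corr: 0.43
Epoch: 01 | Epoch Time: 1m 33s
\t Train Loss: 3.749 | Train Corr: 0.44
\t Val. Loss: 4.286 |  Val. Corr: 0.40
Epoch: 00 | Epoch Time: 1m 32s
\t Train Loss: 20.302 | Train Corr: 0.08
\t Val. Loss: 3.876 |  Val. Corr: 0.53"""

best = best_val_corr_per_run(sample)
print(best, round(mean(best), 3))
```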
eid 23, params {'batch_size': 8, 'bidirectional': True, 'dropout': 0.2, 'hidden_dim': 768, 'lr': 5e-05, 'max_epochs': 10, 'num_layers': 3}
training on fold 0
updating saved weights of best model
Epoch: 00 | Epoch Time: 2m 9s
	 Train Loss: 15.640 | Train Corr: 0.14
	 Val. Loss: 4.321 |  Val. Corr: 0.37
updating saved weights of best model
Epoch: 01 | Epoch Time: 2m 9s
	 Train Loss: 3.224 | Train Corr: 0.56
	 Val. Loss: 3.404 |  Val. Corr: 0.57
Epoch: 02 | Epoch Time: 2m 9s
	 Train Loss: 2.388 | Train Corr: 0.70
	 Val. Loss: 4.137 |  Val. Corr: 0.56
updating saved weights of best model
Epoch: 03 | Epoch Time: 2m 9s
	 Train Loss: 1.699 | Train Corr: 0.80
	 Val. Loss: 2.903 |  Val. Corr: 0.59
Epoch: 04 | Epoch Time: 2m 8s
	 Train Loss: 1.211 | Train Corr: 0.86
	 Val. Loss: 2.911 |  Val. Corr: 0.59
updating saved weights of best model
training on fold 1
Epoch: 00 | Epoch Time: 2m 10s
	 Train Loss: 16.077 | Train Corr: 0.11
	 Val. Loss: 4.789 |  Val. Corr: 0.42
Epoch: 01 | Epoch Time: 2m 10s
	 Train Loss: 3.624 | Train Corr: 0.47
	 Val. Loss: 7.604 |  Val. Corr: 0.37
Epoch: 02 | Epoch Time: 2m 10s
	 Train Loss: 3.175 | Train Corr: 0.55
	 Val. Loss: 7.242 |  Val. Corr: 0.60
training on fold 2
Epoch: 00 | Epoch Time: 2m 8s
	 Train Loss: 16.953 | Train Corr: 0.05
	 Val. Loss: 4.085 |  Val. Corr: 0.20
Epoch: 01 | Epoch Time: 2m 10s
	 Train Loss: 4.719 | Train Corr: 0.22
	 Val. Loss: 4.356 |  Val. Corr: 0.40
Epoch: 02 | Epoch Time: 2m 10s
	 Train Loss: 3.683 | Train Corr: 0.48
	 Val. Loss: 5.084 |  Val. Corr: 0.44
Epoch: 03 | Epoch Time: 2m 10s
	 Train Loss: 3.278 | Train Corr: 0.56
	 Val. Loss: 4.013 |  Val. Corr: 0.50
training on fold 3


Epoch: 00 | Epoch Time: 2m 6s
	 Train Loss: 16.490 | Train Corr: 0.08
	 Val. Loss: 5.250 |  Val. Corr: 0.25
Epoch: 01 | Epoch Time: 2m 5s
	 Train Loss: 3.378 | Train Corr: 0.51
	 Val. Loss: 4.898 |  Val. Corr: 0.28
training on fold 4


Epoch: 00 | Epoch Time: 2m 2s
	 Train Loss: 16.431 | Train Corr: 0.07
	 Val. Loss: 5.157 |  Val. Corr: 0.44
Epoch: 01 | Epoch Time: 2m 3s
	 Train Loss: 3.308 | Train Corr: 0.53
	 Val. Loss: 4.068 |  Val. Corr: 0.49
Epoch: 02 | Epoch Time: 2m 1s
	 Train Loss: 2.607 | Train Corr: 0.66
	 Val. Loss: 4.563 |  Val. Corr: 0.59
Epoch: 03 | Epoch Time: 2m 1s
	 Train Loss: 1.976 | Train Corr: 0.75
	 Val. Loss: 5.861 |  Val. Corr: 0.55
Epoch: 04 | Epoch Time: 2m 1s
	 Train Loss: 1.308 | Train Corr: 0.85
	 Val. Loss: 3.163 |  Val. Corr: 0.63
updating saved weights of best model
Epoch: 05 | Epoch Time: 2m 3s
	 Train Loss: 0.973 | Train Corr: 0.89
	 Val. Loss: 2.697 |  Val. Corr: 0.70
('batch_size_8; bidirectional_True; dropout_0.2; hidden_dim_768; lr_5e-05; max_epochs_10; num_layers_1', 0.6996348896257089)
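
The best configuration is reported as a single `name_value; ...` string, and the test cells below re-type those values by hand. The following is a small sketch of turning that string back into typed values so they could be passed along instead of hard-coded. `parse_param_key` is a hypothetical helper (it is not part of training.py), and it assumes every entry ends in `_<value>`:

```python
def parse_param_key(key: str) -> dict:
    """Parse a 'name_value; name_value; ...' string into a dict of typed values.

    Hypothetical helper: assumes each entry is '<name>_<value>', with the value
    after the LAST underscore (names such as 'hidden_dim' contain underscores).
    """
    params = {}
    for entry in key.split(";"):
        name, _, raw = entry.strip().rpartition("_")
        if raw in ("True", "False"):
            value = raw == "True"          # booleans such as bidirectional
        else:
            try:
                value = int(raw)           # e.g. '8', '768'
            except ValueError:
                value = float(raw)         # e.g. '0.2', '5e-05'
        params[name] = value
    return params


best_key = ('batch_size_8; bidirectional_True; dropout_0.2; hidden_dim_768; '
            'lr_5e-05; max_epochs_10; num_layers_1')
best_params = parse_param_key(best_key)
print(best_params)
```

For example, `best_params['batch_size']` could replace the literal `8` passed to `training.get_iterator`. Splitting on the last underscore is what keeps multi-word names such as `hidden_dim` and `max_epochs` intact.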


# Test the trained model on the held-out dataset.

In [6]:
# Get an iterator over the held-out test set.
# Use the batch size from the best hyperparameters (batch_size_8 above).
test_iterator = training.get_iterator(test_dataset, 8, device)

In [7]:
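# Load the best model saved during cross-validation.
bert = DistilBertModel.from_pretrained(constants.WEIGHTS_NAME)
# Use the hyperparameters of the best model (see the best-params output above):
# 768 (hidden_dim), 1 (num_layers), True (bidirectional), 0.2 (dropout).
model = models.BERTRNN(bert, constants.OUTPUT_DIM, 768, 1, True, 0.2)
# map_location lets a checkpoint saved on the GPU load on whichever device is in use.
model.load_state_dict(torch.load("21_best_valid_loss.pt", map_location=device))
model.to(device)
model.eval()
# If you change the criterion, make sure it matches the training criterion in training.py.
# reduction='sum' replaces the deprecated size_average=False (same behaviour).
criterion = nn.MSELoss(reduction='sum')
criterion = criterion.to(device)
test_loss, test_corr = training.evaluate(model, test_iterator, criterion, debug=True)
print(test_loss)
print(test_corr)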

predictions: tensor([4.3092, 2.5972, 4.9821, 3.7691, 4.0425, 4.0092, 4.1976, 3.6279],
       device='cuda:0')
true labels: tensor([4.8000, 2.7500, 5.3250, 3.9250, 4.7250, 5.2000, 4.7500, 4.7250],
       device='cuda:0')
predictions: tensor([4.4185, 4.0181, 3.9511, 4.4189, 4.6874, 2.4615, 4.5022, 3.6038],
       device='cuda:0')
true labels: tensor([4.4500, 4.5000, 4.2750, 4.3750, 4.8750, 3.3250, 4.8000, 3.7250],
       device='cuda:0')
predictions: tensor([4.6404, 3.5096, 3.9365, 3.2364, 3.2019, 4.2997, 3.9174, 2.8325],
       device='cuda:0')
true labels: tensor([5.0000, 3.3500, 4.0250, 4.1000, 4.9000, 4.3000, 3.1750, 2.6750],
       device='cuda:0')
predictions: tensor([3.5482, 3.5646, 4.7370, 2.4181, 4.3563, 3.4085, 3.8179, 3.4318],
       device='cuda:0')
true labels: tensor([4.8250, 4.1500, 5.2750, 2.6500, 5.0000, 4.7000, 4.3000, 3.7750],
       device='cuda:0')
predictions: tensor([3.8663, 3.7502, 3.8978, 3.4236, 4.2574, 4.9229, 4.0129, 4.6737],
       device='cuda:0')
true label
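
In the first batch above, every prediction is below its label, yet the predictions clearly rise and fall with the labels. The following is a small plain-Python sketch on that batch. It contrasts the summed squared error that `nn.MSELoss(reduction='sum')` computes with a Pearson correlation. It assumes that "Corr" in the training logs is Pearson's r; training.py is not shown here, so that is an assumption.

```python
import math

# First test batch, copied from the evaluation output above.
preds = [4.3092, 2.5972, 4.9821, 3.7691, 4.0425, 4.0092, 4.1976, 3.6279]
labels = [4.8000, 2.7500, 5.3250, 3.9250, 4.7250, 5.2000, 4.7500, 4.7250]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


n = len(preds)
sse = sum((p - y) ** 2 for p, y in zip(preds, labels))  # like MSELoss(reduction='sum')
mse = sse / n                                            # like MSELoss(reduction='mean')
bias = sum(p - y for p, y in zip(preds, labels)) / n     # mean signed error
r = pearson_r(preds, labels)

# Shifting every prediction by the same constant removes the bias and
# lowers the squared error, but leaves the correlation unchanged.
shifted = [p - bias for p in preds]
sse_shifted = sum((p - y) ** 2 for p, y in zip(shifted, labels))
r_shifted = pearson_r(shifted, labels)

print(f"sum SE={sse:.3f}  mean SE={mse:.3f}  bias={bias:.3f}  r={r:.3f}")
print(f"after shift: sum SE={sse_shifted:.3f}  r={r_shifted:.3f}")
```

Because Pearson's r ignores a constant offset, a model can score well on "Corr" and still be systematically biased. The squared-error loss is what detects that bias.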

# Miscellaneous notes

Link to the Trainer class: https://huggingface.co/transformers/main_classes/trainer.html

Default training arguments: https://huggingface.co/transformers/main_classes/trainer.html#transformers.TrainingArguments

* Batch size per device: 8
* Number of epochs: 3
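
To see what those defaults (8 examples per device, 3 epochs) mean in optimizer steps, here is a sketch of the arithmetic. It assumes a single device, the default `gradient_accumulation_steps=1`, and that the last partial batch is kept (the default `dataloader_drop_last=False`). The 1000-example training set is an assumption taken from the 1000-row train split earlier in this notebook. `total_update_steps` is a hypothetical helper, not a transformers function:

```python
import math


def total_update_steps(n_examples, per_device_batch_size=8, num_train_epochs=3,
                       n_devices=1, grad_accum_steps=1):
    """Rough count of optimizer steps for a Trainer-style training loop.

    Hypothetical helper. Assumes the last partial batch is kept (no drop_last)
    and that each step consumes per_device_batch_size examples on every device.
    """
    batches_per_epoch = math.ceil(n_examples / (per_device_batch_size * n_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return math.ceil(num_train_epochs * steps_per_epoch)


# 1000 training examples is an assumption (the ktrain.csv split).
print(total_update_steps(1000))
```

With 1000 examples, each epoch has 125 full batches, so three epochs give 375 optimizer steps. One extra example would add a partial batch per epoch.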



This should be the model I used to generate my initial results: https://huggingface.co/transformers/model_doc/distilbert.html#distilbertforsequenceclassification
"DistilBert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks."
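
As I understand the Hugging Face implementation, this head works in three steps. It takes the hidden state of the first ([CLS]) token and passes it through a dim-by-dim `pre_classifier` linear layer followed by ReLU and dropout. It then applies a `classifier` linear layer with `num_labels` outputs. With `num_labels=1`, the model treats the task as regression and uses an MSE loss. This differs from `models.BERTRNN`, which runs an RNN over the whole token sequence. The NumPy sketch below shows only the head's data flow and shapes. Its weights and activations are random stand-ins; it is not the library's code:

```python
import numpy as np

rng = np.random.default_rng(0)

batch, seq_len, dim, num_labels = 2, 16, 768, 1  # dim=768 as in distilbert-base-uncased

# Stand-in for DistilBERT's last hidden state; real values would come from the encoder.
hidden_state = rng.standard_normal((batch, seq_len, dim))

# Randomly initialised head weights (illustrative only, not trained).
W_pre, b_pre = rng.standard_normal((dim, dim)) * 0.02, np.zeros(dim)
W_cls, b_cls = rng.standard_normal((dim, num_labels)) * 0.02, np.zeros(num_labels)

pooled = hidden_state[:, 0]                        # first ([CLS]) token: (batch, dim)
pooled = np.maximum(pooled @ W_pre + b_pre, 0.0)   # pre_classifier + ReLU
# (dropout would be applied here during training; it is a no-op in eval mode)
logits = pooled @ W_cls + b_cls                    # (batch, num_labels): one score per text

print(logits.shape)
```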