<a href="https://colab.research.google.com/github/goerlitz/nlp-classification/blob/main/notebooks/10kGNAD/colab/21c_10kGNAD_huggingface_basic_optuna.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hyperparameter Optimization with HuggingFace Transformers

Adapted from https://huggingface.co/docs/transformers/custom_datasets#sequence-classification-with-imdb-reviews

Things we need
* a tokenizer
* tokenized input data
* a pretrained model
* evaluation metrics
* training parameters
* a Trainer instance

Notes
* [class labels can be included in the model config](https://github.com/huggingface/transformers/pull/2945#issuecomment-781986506) (a bit hacky)
* [fp16 is disabled on Tesla P100 GPUs in PyTorch](https://discuss.pytorch.org/t/cnn-fp16-slower-than-fp32-on-tesla-p100/12146)

## Prerequisites

In [1]:
checkpoint = "distilbert-base-german-cased"

# checkpoint = "deepset/gbert-base"

# checkpoint = "deepset/gelectra-base"

project_name = f'10kgnad_hf__{checkpoint.replace("/", "_")}'

### Connect Google Drive

Google Drive is used to save the results.

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [3]:
from pathlib import Path

# define model path
root_path = Path('/content/gdrive/My Drive/')
base_path = root_path / 'Colab Notebooks/nlp-classification/'
model_path = base_path / 'models'

## Check GPU

In [4]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

Sat Dec 18 18:14:59 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

### Install Packages

In [5]:
%%time
!pip install -q -U transformers datasets >/dev/null
!pip install -q -U optuna >/dev/null

# check installed version
!pip freeze | grep optuna        # optuna==2.10.0
!pip freeze | grep transformers  # transformers==4.14.1
!pip freeze | grep torch         # torch==1.10.0+cu111

optuna==2.10.0
transformers==4.14.1
torch @ https://download.pytorch.org/whl/cu111/torch-1.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl
torchaudio @ https://download.pytorch.org/whl/cu111/torchaudio-0.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl
torchsummary==1.5.1
torchtext==0.11.0
torchvision @ https://download.pytorch.org/whl/cu111/torchvision-0.11.1%2Bcu111-cp37-cp37m-linux_x86_64.whl
CPU times: user 126 ms, sys: 43.1 ms, total: 169 ms
Wall time: 19.1 s


In [6]:
from transformers import logging

# hide progress bar when downloading tokenizer and model (a workaround!)
logging.get_verbosity = lambda : logging.NOTSET

## Load Dataset

In [7]:
from datasets import load_dataset

gnad10k = load_dataset("gnad10")
label_names = gnad10k["train"].features["label"].names

Using custom data configuration default


Downloading and preparing dataset gnad10/default (download: 25.90 MiB, generated: 25.92 MiB, post-processed: Unknown size, total: 51.82 MiB) to /root/.cache/huggingface/datasets/gnad10/default/1.1.0/3a8445be65795ad88270af4d797034c3d99f70f8352ca658c586faf1cf960881...


Dataset gnad10 downloaded and prepared to /root/.cache/huggingface/datasets/gnad10/default/1.1.0/3a8445be65795ad88270af4d797034c3d99f70f8352ca658c586faf1cf960881. Subsequent calls will reuse this data.



In [8]:
print(gnad10k)
print("labels:", label_names)
gnad10k["train"][0]

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 9245
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1028
    })
})
labels: ['Web', 'Panorama', 'International', 'Wirtschaft', 'Sport', 'Inland', 'Etat', 'Wissenschaft', 'Kultur']


{'label': 4,
 'text': '21-Jähriger fällt wohl bis Saisonende aus. Wien – Rapid muss wohl bis Saisonende auf Offensivspieler Thomas Murg verzichten. Der im Winter aus Ried gekommene 21-Jährige erlitt beim 0:4-Heimdebakel gegen Admira Wacker Mödling am Samstag einen Teilriss des Innenbandes im linken Knie, wie eine Magnetresonanz-Untersuchung am Donnerstag ergab. Murg erhielt eine Schiene, muss aber nicht operiert werden. Dennoch steht ihm eine mehrwöchige Pause bevor.'}

## Data Preprocessing

* Load the same tokenizer that was used with the pretrained model.
* Define a function to tokenize the text (with truncation to the model's maximum input length).
* Run the tokenization.

In [9]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_gnad10k = gnad10k.map(preprocess_function, batched=True).remove_columns("text")


### Use Dynamic Padding

Pad each batch only up to the longest text in that batch. This is more efficient than padding the whole dataset to a single maximum length.

In [10]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
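
To see why padding per batch saves work, here is a small pure-Python sketch. It does not use the collator itself: the helper names `pad_batch` and `padded_token_count` and the token counts are made up for illustration, and padding id `0` is an assumption.

```python
from typing import List


def pad_batch(batch: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Pad every sequence to the length of the longest one in this batch."""
    longest = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]


def padded_token_count(lengths: List[int], batch_size: int, dynamic: bool) -> int:
    """Count all tokens (real + padding) the model would have to process."""
    if not dynamic:
        # static padding: every sequence is padded to the global maximum
        return max(lengths) * len(lengths)
    total = 0
    for start in range(0, len(lengths), batch_size):
        chunk = lengths[start:start + batch_size]
        # dynamic padding: each batch is padded to its own maximum
        total += max(chunk) * len(chunk)
    return total


# hypothetical token counts of six texts, one of them very long
lengths = [12, 7, 512, 30, 25, 18]
static_total = padded_token_count(lengths, batch_size=2, dynamic=False)
dynamic_total = padded_token_count(lengths, batch_size=2, dynamic=True)
print(static_total, dynamic_total)  # 3072 1098
```

With static padding the single long text forces all six texts to 512 tokens. With per-batch padding only the batch that contains it pays that cost.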

## Model Setup

We want to include the label names and save them together with the model.
One way to do this is to create a config and put the `id2label` and `label2id` mappings into it.

In [11]:
import optuna
from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained(
        checkpoint,
        num_labels=len(label_names),
        id2label={i: label for i, label in enumerate(label_names)},
        label2id={label: i for i, label in enumerate(label_names)},
        )

def model_init(trial: optuna.Trial):
    """A function that instantiates the model to be used."""
    return AutoModelForSequenceClassification.from_pretrained(checkpoint, config=config)
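
The two mappings stored in the config are plain dictionaries. A quick sketch of what they contain, using the label names printed above (the variable `predicted_id` is only for illustration):

```python
# label names as printed by the dataset cell above
label_names = ['Web', 'Panorama', 'International', 'Wirtschaft', 'Sport',
               'Inland', 'Etat', 'Wissenschaft', 'Kultur']

id2label = {i: label for i, label in enumerate(label_names)}
label2id = {label: i for i, label in enumerate(label_names)}

# the first training example shown above has label 4
predicted_id = 4
print(id2label[predicted_id])  # Sport

# both mappings are inverse to each other
roundtrip_ok = all(label2id[id2label[i]] == i for i in id2label)
print(roundtrip_ok)  # True
```

Because the mappings are saved with the model config, a reloaded model can report class names instead of bare indices.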

### Define Evaluation Metrics

The function that computes the metrics needs to be passed to the Trainer.

In [12]:
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score
import numpy as np
from typing import Dict

def compute_metrics(eval_preds):
    """The function that will be used to compute metrics at evaluation.
    Must take a :class:`~transformers.EvalPrediction` and return a dictionary
    string to metric values."""
    logits, labels = eval_preds
    preds = np.argmax(logits, axis=-1)
    return {
        "acc": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average='macro'),
        "precision": precision_score(labels, preds, average='macro'),
        "recall": recall_score(labels, preds, average='macro'),
        }


def objective(metrics: Dict[str, float]):
    """A function computing the main optimization objective from the metrics
    returned by the :obj:`compute_metrics` method.
    To be used in :obj:`Trainer.hyperparameter_search`."""
    return metrics["eval_loss"]
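
During the first evaluation rounds, scikit-learn warns that precision is "ill-defined" for labels with no predicted samples. The following pure-Python sketch shows where that comes from. It is a simplified re-implementation for illustration, not scikit-learn's code; the function name `macro_precision` and the toy labels are made up.

```python
from typing import List


def macro_precision(labels: List[int], preds: List[int],
                    num_classes: int, zero_division: float = 0.0) -> float:
    """Unweighted mean of per-class precision (macro average)."""
    per_class = []
    for c in range(num_classes):
        predicted_c = sum(1 for p in preds if p == c)
        true_pos = sum(1 for y, p in zip(labels, preds) if p == c and y == c)
        if predicted_c == 0:
            # 0/0: the class was never predicted, so precision is undefined
            per_class.append(zero_division)
        else:
            per_class.append(true_pos / predicted_c)
    return sum(per_class) / num_classes


labels = [0, 0, 1, 1, 2, 2]
preds = [0, 0, 1, 1, 1, 0]  # class 2 is never predicted
result = macro_precision(labels, preds, num_classes=3)
print(round(result, 4))  # 0.4444
```

An under-trained model that never predicts some class gets precision 0 for it, which pulls the macro average down. This fits the low precision values in the early evaluation steps of the slow-learning trials.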

## Hyperparameter Tuning
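
The `hp_space` function in the next cell samples `learning_rate` and `weight_decay` with `log=True`, i.e. uniformly on a logarithmic scale. As a rough sketch of what that means (this is not Optuna's sampler, which by default also adapts to earlier trials; the helper `sample_log_uniform` is made up for illustration):

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniformly distributed in [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_log_uniform(1e-5, 1e-4, rng) for _ in range(10_000)]

# on a log scale the "middle" of the range is the geometric mean (about 3.16e-05)
geometric_midpoint = math.sqrt(1e-5 * 1e-4)
share_below = sum(s < geometric_midpoint for s in samples) / len(samples)
print(f"share of samples below {geometric_midpoint:.2e}: {share_below:.2f}")
```

About half of the draws fall below 3.16e-05, whereas plain uniform sampling would put only about a quarter of them there. Small learning rates are therefore explored as thoroughly as large ones.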

In [19]:
from transformers import TrainingArguments, Trainer

def hp_space(trial: optuna.Trial):
    """A function that defines the hyperparameter search space.
    To be used in :obj:`Trainer.hyperparameter_search`."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True),
        "num_train_epochs": trial.suggest_categorical("num_train_epochs", [1]),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [32]),
        "weight_decay": trial.suggest_float("weight_decay", 1e-3, 1e-2, log=True),
        "label_smoothing_factor": trial.suggest_float("label_smoothing_factor", 0.0, 0.1),
    }

def train(trial: optuna.Trial):

    # get hyperparameters choice
    hp = hp_space(trial)
    lr = hp["learning_rate"]
    bs = hp["per_device_train_batch_size"]
    epochs = hp["num_train_epochs"]
    weight_decay = hp["weight_decay"]
    label_smoothing_factor = hp["label_smoothing_factor"]

    eval_rounds_per_epoch = 5
    eval_steps = gnad10k["train"].num_rows // bs // eval_rounds_per_epoch  # integer step count

    training_args = TrainingArguments(
        output_dir=str(project_name),
        report_to=[],
        log_level="error",
        disable_tqdm=False,

        evaluation_strategy="steps",
        eval_steps=eval_steps,
        save_strategy="steps",
        save_steps=eval_steps,
        load_best_model_at_end=False,
        metric_for_best_model="eval_loss",
        greater_is_better=False,

        # hyperparameters
        num_train_epochs=epochs,
        learning_rate=lr,
        per_device_train_batch_size=bs,
        per_device_eval_batch_size=bs,
        weight_decay=weight_decay,
        label_smoothing_factor=label_smoothing_factor,

        # fp16=True,  # fp16 is disabled on Tesla P100 by pytorch
    )

    trainer = Trainer(
        model_init=model_init,
        args=training_args,
        train_dataset=tokenized_gnad10k["train"],
        eval_dataset=tokenized_gnad10k["test"],
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )

    trainer.train()

    result = trainer.evaluate(eval_dataset=tokenized_gnad10k["test"])

    # store eval metrics in trial
    for key in result.keys():
        if key != "epoch":
            trial.set_user_attr(key, result[key])
            
    return result["eval_loss"]


db_path = "/content/gdrive/My Drive/Colab Notebooks/nlp-classification/"
db_name = "10kgnad_optuna"
study_name = checkpoint + "_bs32"

study = optuna.create_study(study_name=study_name,
                            storage=f"sqlite:///{db_path}{db_name}.db",
                            load_if_exists=True,)
study.optimize(train, n_trials=70)

study.best_params

[I 2021-12-18 21:59:52,663] A new study created in RDB with name: distilbert-base-german-cased_bs32


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.706757,0.557393,0.411463,0.463815,0.432339
114,No log,1.236785,0.650778,0.547318,0.62314,0.549865
171,No log,1.012911,0.746109,0.705036,0.773022,0.698558
228,No log,0.910466,0.764591,0.739661,0.780489,0.724076
285,No log,0.869748,0.780156,0.757565,0.788226,0.743655
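
The step numbers in this table follow directly from the dataset size, the batch size, and the five evaluation rounds per epoch. A short check of the arithmetic, assuming a single GPU and no gradient accumulation:

```python
import math

num_rows = 9245            # training examples reported by the dataset
batch_size = 32            # the only batch size in the search space
eval_rounds_per_epoch = 5

steps_per_epoch = math.ceil(num_rows / batch_size)            # last batch is partial
eval_steps = num_rows // batch_size // eval_rounds_per_epoch  # steps between evaluations
eval_points = [eval_steps * k for k in range(1, eval_rounds_per_epoch + 1)]

print(steps_per_epoch)  # 289
print(eval_points)      # [57, 114, 171, 228, 285]
```

Under these assumptions the epoch has 289 steps, so four more training steps follow the last in-training evaluation at step 285. This is probably why the value logged for a finished trial differs slightly from the step-285 row (0.8694877 versus 0.869748 for trial 0).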



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.





[I 2021-12-18 22:05:16,651] Trial 0 finished with value: 0.8694877028465271 and parameters: {'learning_rate': 1.2168529727787439e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0011016250761685322, 'label_smoothing_factor': 0.007080348635779443}. Best is trial 0 with value: 0.8694877028465271.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.624125,0.582685,0.433389,0.45597,0.457305
114,No log,1.142339,0.70428,0.635727,0.758159,0.628629
171,No log,0.936377,0.774319,0.756521,0.788786,0.752246
228,No log,0.838594,0.793774,0.779797,0.804628,0.766013
285,No log,0.801813,0.80642,0.791927,0.811348,0.781109



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-18 22:10:40,352] Trial 1 finished with value: 0.8016016483306885 and parameters: {'learning_rate': 1.4281810234926457e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0027276851772832927, 'label_smoothing_factor': 0.01804830526301555}. Best is trial 1 with value: 0.8016016483306885.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.407254,0.624514,0.464429,0.570854,0.493975
114,No log,0.945898,0.75,0.699986,0.769107,0.688898
171,No log,0.762753,0.816148,0.808797,0.821077,0.809081
228,No log,0.676838,0.831712,0.827574,0.841056,0.817653
285,No log,0.646585,0.830739,0.827523,0.83612,0.821071



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-18 22:16:03,405] Trial 2 finished with value: 0.6464439034461975 and parameters: {'learning_rate': 1.948765621949202e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.001887314639743122, 'label_smoothing_factor': 0.010433138633857076}. Best is trial 2 with value: 0.6464439034461975.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.016341,0.728599,0.671907,0.773226,0.650258
114,No log,0.699961,0.815175,0.808207,0.846586,0.791322
171,No log,0.561274,0.837549,0.83478,0.838086,0.840291
228,No log,0.509563,0.856031,0.8573,0.860322,0.856287
285,No log,0.488981,0.857004,0.857792,0.858496,0.85782


[I 2021-12-18 22:21:26,467] Trial 3 finished with value: 0.4889587163925171 and parameters: {'learning_rate': 3.6058712298915115e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0030736627380008852, 'label_smoothing_factor': 0.007577974813035949}. Best is trial 3 with value: 0.4889587163925171.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.736774,0.550584,0.405265,0.462308,0.425731
114,No log,1.279546,0.642996,0.534565,0.618976,0.539904
171,No log,1.055506,0.738327,0.690623,0.765479,0.684554
228,No log,0.955021,0.7607,0.731925,0.776861,0.717179
285,No log,0.915169,0.772374,0.747679,0.781048,0.732878



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.





[I 2021-12-18 22:26:49,473] Trial 4 finished with value: 0.914908766746521 and parameters: {'learning_rate': 1.1574919533404579e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0016466177516886436, 'label_smoothing_factor': 0.015826536419386406}. Best is trial 3 with value: 0.4889587163925171.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.942898,0.789883,0.78126,0.812923,0.775681
114,No log,0.757733,0.840467,0.84032,0.8577,0.829764
171,No log,0.719956,0.855058,0.852031,0.857109,0.8571
228,No log,0.672807,0.871595,0.873638,0.872842,0.874826
285,No log,0.662433,0.868677,0.869285,0.869769,0.869285


[I 2021-12-18 22:32:12,838] Trial 5 finished with value: 0.6624715328216553 and parameters: {'learning_rate': 6.489079698919251e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0025777694254033065, 'label_smoothing_factor': 0.06869142376070314}. Best is trial 3 with value: 0.4889587163925171.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.780103,0.553502,0.406979,0.458962,0.428863
114,No log,1.37049,0.64786,0.545292,0.626252,0.548209
171,No log,1.175545,0.747082,0.70951,0.775524,0.700788
228,No log,1.089711,0.764591,0.741154,0.775361,0.726623
285,No log,1.055706,0.775292,0.751291,0.781817,0.7378



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.





[I 2021-12-18 22:37:36,729] Trial 6 finished with value: 1.0554828643798828 and parameters: {'learning_rate': 1.1440070687323705e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0038683594078962333, 'label_smoothing_factor': 0.08179320783928892}. Best is trial 3 with value: 0.4889587163925171.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.991051,0.801556,0.798972,0.826133,0.791273
114,No log,0.83975,0.845331,0.844538,0.863981,0.832817
171,No log,0.791446,0.85214,0.849923,0.856002,0.853413
228,No log,0.745909,0.876459,0.875157,0.874787,0.875899
285,No log,0.738997,0.877432,0.876161,0.875758,0.877094


[I 2021-12-18 22:43:00,325] Trial 7 finished with value: 0.739052414894104 and parameters: {'learning_rate': 6.117659613639623e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0021876790609985553, 'label_smoothing_factor': 0.09322342987320054}. Best is trial 3 with value: 0.4889587163925171.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.90065,0.791829,0.783648,0.814564,0.777279
114,No log,0.717382,0.840467,0.840633,0.858131,0.830111
171,No log,0.671838,0.854086,0.850453,0.856193,0.85466
228,No log,0.625034,0.86965,0.869765,0.869989,0.869788
285,No log,0.613449,0.873541,0.873513,0.87427,0.87322


[I 2021-12-18 22:48:23,774] Trial 8 finished with value: 0.6134863495826721 and parameters: {'learning_rate': 6.704202068404093e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.003613717488550159, 'label_smoothing_factor': 0.05537260044610775}. Best is trial 3 with value: 0.4889587163925171.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.794949,0.785992,0.7737,0.808963,0.769833
114,No log,0.585317,0.845331,0.844029,0.858796,0.835196
171,No log,0.535841,0.858949,0.854358,0.858559,0.859816
228,No log,0.476965,0.871595,0.870221,0.8716,0.869196
285,No log,0.464814,0.873541,0.873024,0.873092,0.873384


[I 2021-12-18 22:53:47,529] Trial 9 finished with value: 0.4648672044277191 and parameters: {'learning_rate': 7.179641265858097e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0018766389219752608, 'label_smoothing_factor': 0.0187773529700651}. Best is trial 9 with value: 0.4648672044277191.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.82131,0.780156,0.762364,0.811623,0.753574
114,No log,0.626035,0.847276,0.845869,0.857087,0.837951
171,No log,0.604982,0.848249,0.848981,0.857008,0.85342
228,No log,0.542101,0.875486,0.87209,0.872901,0.871971
285,No log,0.532722,0.876459,0.874956,0.875011,0.87526


[I 2021-12-18 22:59:11,171] Trial 10 finished with value: 0.5326570272445679 and parameters: {'learning_rate': 9.410510319108799e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.009855122451910562, 'label_smoothing_factor': 0.03865862856508288}. Best is trial 9 with value: 0.4648672044277191.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.098779,0.723735,0.668441,0.766305,0.646613
114,No log,0.796612,0.812257,0.805277,0.843852,0.787823
171,No log,0.670973,0.837549,0.834886,0.838617,0.840165
228,No log,0.62225,0.855058,0.855019,0.857332,0.854402
285,No log,0.605398,0.853113,0.852132,0.852699,0.852609


[I 2021-12-18 23:04:34,914] Trial 11 finished with value: 0.6053610444068909 and parameters: {'learning_rate': 3.39252287976858e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.005642004881053344, 'label_smoothing_factor': 0.03817256552649226}. Best is trial 9 with value: 0.4648672044277191.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.086739,0.72179,0.667907,0.772518,0.644397
114,No log,0.774418,0.812257,0.804488,0.841649,0.788076
171,No log,0.645889,0.837549,0.834996,0.83816,0.839737
228,No log,0.594939,0.853113,0.853594,0.855028,0.854225
285,No log,0.576782,0.853113,0.853427,0.854286,0.853578


[I 2021-12-18 23:09:58,441] Trial 12 finished with value: 0.5767446756362915 and parameters: {'learning_rate': 3.3852785786583304e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0013409175251930675, 'label_smoothing_factor': 0.029814672330629552}. Best is trial 9 with value: 0.4648672044277191.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.890293,0.757782,0.725701,0.791479,0.704717
114,No log,0.605725,0.822957,0.821291,0.851139,0.804721
171,No log,0.503347,0.84144,0.839316,0.843437,0.84369
228,No log,0.448316,0.858949,0.860545,0.860213,0.862221
285,No log,0.431716,0.863813,0.864442,0.863783,0.865769


[I 2021-12-18 23:15:22,380] Trial 13 finished with value: 0.43174442648887634 and parameters: {'learning_rate': 4.426721464918857e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0053684971209336995, 'label_smoothing_factor': 0.00033772098835564007}. Best is trial 13 with value: 0.43174442648887634.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.837576,0.762646,0.73315,0.789576,0.715263
114,No log,0.587315,0.827821,0.827313,0.854013,0.812513
171,No log,0.508609,0.847276,0.846427,0.852402,0.850501
228,No log,0.45111,0.855058,0.856549,0.856324,0.857978
285,No log,0.4323,0.861868,0.862464,0.862482,0.862917


[I 2021-12-18 23:20:46,441] Trial 14 finished with value: 0.4322931170463562 and parameters: {'learning_rate': 4.852382389052887e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0056542068507408555, 'label_smoothing_factor': 0.0019549953663740164}. Best is trial 13 with value: 0.43174442648887634.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.846871,0.751946,0.721884,0.786281,0.703121
114,No log,0.580696,0.832685,0.834214,0.861197,0.818834
171,No log,0.507913,0.842412,0.841355,0.847867,0.845724
228,No log,0.447729,0.860895,0.862659,0.861337,0.865233
285,No log,0.429819,0.864786,0.864358,0.864317,0.864953


[I 2021-12-18 23:26:10,492] Trial 15 finished with value: 0.42982402443885803 and parameters: {'learning_rate': 4.918834683120967e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.006037532521549883, 'label_smoothing_factor': 0.002290120991862342}. Best is trial 15 with value: 0.42982402443885803.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.859112,0.761673,0.734388,0.801545,0.714721
114,No log,0.590379,0.827821,0.82633,0.854176,0.81057
171,No log,0.503282,0.847276,0.846894,0.851692,0.851898
228,No log,0.444187,0.857004,0.858183,0.856786,0.860689
285,No log,0.425911,0.86284,0.86379,0.863746,0.864501


[I 2021-12-18 23:31:34,189] Trial 16 finished with value: 0.42592647671699524 and parameters: {'learning_rate': 4.6907158520058296e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.007715250278488532, 'label_smoothing_factor': 9.932882079016056e-05}. Best is trial 16 with value: 0.42592647671699524.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.428464,0.639105,0.479954,0.547835,0.507993
114,No log,1.012016,0.759728,0.721726,0.775831,0.707092
171,No log,0.855977,0.821984,0.813853,0.824981,0.815334
228,No log,0.784871,0.84144,0.840803,0.849349,0.834105
285,No log,0.760058,0.835603,0.834207,0.841067,0.829077



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-18 23:36:57,817] Trial 17 finished with value: 0.7599374055862427 and parameters: {'learning_rate': 2.0423338564481728e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.009467661900034882, 'label_smoothing_factor': 0.05599772639347858}. Best is trial 16 with value: 0.42592647671699524.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.311573,0.648833,0.503384,0.629091,0.522607
114,No log,0.889244,0.781128,0.764945,0.805624,0.746363
171,No log,0.733012,0.830739,0.827386,0.834521,0.831039
228,No log,0.662369,0.847276,0.848817,0.85314,0.845642
285,No log,0.63776,0.840467,0.842692,0.848748,0.838242



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-18 23:42:21,553] Trial 18 finished with value: 0.6376599669456482 and parameters: {'learning_rate': 2.3233488022532473e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.007405584548088116, 'label_smoothing_factor': 0.027255114465888707}. Best is trial 16 with value: 0.42592647671699524.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.953458,0.755837,0.728197,0.788364,0.709377
114,No log,0.726611,0.828794,0.828344,0.854816,0.813442
171,No log,0.657435,0.845331,0.843741,0.849092,0.847961
228,No log,0.60951,0.859922,0.860348,0.85915,0.862609
285,No log,0.595993,0.864786,0.865076,0.864052,0.866925


[I 2021-12-18 23:47:45,027] Trial 19 finished with value: 0.596027672290802 and parameters: {'learning_rate': 4.760544838883083e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.007252193766500143, 'label_smoothing_factor': 0.04534069960519184}. Best is trial 16 with value: 0.42592647671699524.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.24528,0.671206,0.552767,0.641545,0.557904
114,No log,0.849672,0.791829,0.776957,0.817332,0.758966
171,No log,0.700987,0.833658,0.829087,0.834527,0.834627
228,No log,0.635866,0.85214,0.855545,0.859177,0.853233
285,No log,0.613416,0.843385,0.845632,0.851251,0.841599



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-18 23:53:08,713] Trial 20 finished with value: 0.613343358039856 and parameters: {'learning_rate': 2.5779896765704475e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.004483854900043879, 'label_smoothing_factor': 0.02732347543664355}. Best is trial 16 with value: 0.42592647671699524.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.8823,0.756809,0.72607,0.788538,0.706832
114,No log,0.609417,0.82393,0.822113,0.853093,0.805673
171,No log,0.511567,0.84144,0.841406,0.844654,0.846284
228,No log,0.457535,0.861868,0.861753,0.860515,0.864081
285,No log,0.441486,0.866732,0.86723,0.867006,0.867954


[I 2021-12-18 23:58:32,986] Trial 21 finished with value: 0.4415104389190674 and parameters: {'learning_rate': 4.54963682546076e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.005639406937623339, 'label_smoothing_factor': 0.0032834045581627788}. Best is trial 16 with value: 0.42592647671699524.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.922409,0.75,0.706623,0.781129,0.687809
114,No log,0.634546,0.821012,0.819074,0.853537,0.80045
171,No log,0.515007,0.84144,0.839917,0.842459,0.845677
228,No log,0.461131,0.856031,0.857671,0.858886,0.857717
285,No log,0.443059,0.86284,0.863914,0.864317,0.86411


[I 2021-12-19 00:03:56,873] Trial 22 finished with value: 0.44305893778800964 and parameters: {'learning_rate': 4.1626961994329917e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.007160909136464858, 'label_smoothing_factor': 0.0012169637525147677}. Best is trial 16 with value: 0.42592647671699524.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.846204,0.775292,0.749333,0.802327,0.735846
114,No log,0.625664,0.835603,0.834089,0.861068,0.818514
171,No log,0.558734,0.849222,0.848246,0.85395,0.852424
228,No log,0.499794,0.86284,0.863786,0.861884,0.867036
285,No log,0.482105,0.867704,0.866758,0.867,0.866992


[I 2021-12-19 00:09:20,592] Trial 23 finished with value: 0.4820971190929413 and parameters: {'learning_rate': 5.2499775728921325e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0045711255418751295, 'label_smoothing_factor': 0.01608820907968688}. Best is trial 16 with value: 0.42592647671699524.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.152479,0.689689,0.609123,0.76131,0.594428
114,No log,0.774223,0.803502,0.792173,0.829517,0.774187
171,No log,0.626523,0.836576,0.831897,0.836394,0.83823
228,No log,0.563248,0.855058,0.856708,0.859104,0.855686
285,No log,0.541517,0.851167,0.85085,0.854387,0.848327


[I 2021-12-19 00:14:44,107] Trial 24 finished with value: 0.5414552688598633 and parameters: {'learning_rate': 2.8773523135577064e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.008277556150410861, 'label_smoothing_factor': 0.01225846272100717}. Best is trial 16 with value: 0.42592647671699524.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.787809,0.789883,0.780125,0.829011,0.767836
114,No log,0.58305,0.843385,0.844688,0.854532,0.837575
171,No log,0.563908,0.850195,0.851879,0.863781,0.853392
228,No log,0.499175,0.88035,0.87638,0.877429,0.875516
285,No log,0.488611,0.873541,0.871084,0.871555,0.870916


[I 2021-12-19 00:20:08,180] Trial 25 finished with value: 0.4885106086730957 and parameters: {'learning_rate': 9.237731269323765e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.006172710183944598, 'label_smoothing_factor': 0.02673607838901969}. Best is trial 16 with value: 0.42592647671699524.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.940094,0.748054,0.70367,0.772842,0.683644
114,No log,0.64256,0.816148,0.813772,0.84881,0.795991
171,No log,0.517424,0.838521,0.83611,0.840448,0.841024
228,No log,0.464169,0.854086,0.854485,0.855469,0.854991
285,No log,0.446354,0.858949,0.859809,0.86021,0.860215


[I 2021-12-19 00:25:31,653] Trial 26 finished with value: 0.4463348984718323 and parameters: {'learning_rate': 4.017677428215151e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0045095834811986375, 'label_smoothing_factor': 0.0007385953193358998}. Best is trial 16 with value: 0.42592647671699524.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.850292,0.76751,0.737775,0.795601,0.72392
114,No log,0.619893,0.838521,0.839303,0.862795,0.825046
171,No log,0.56476,0.848249,0.848639,0.853991,0.853085
228,No log,0.505724,0.866732,0.868186,0.867012,0.870385
285,No log,0.488203,0.867704,0.868717,0.868685,0.869132


[I 2021-12-19 00:30:55,465] Trial 27 finished with value: 0.4881979525089264 and parameters: {'learning_rate': 5.438179511273629e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.006236995238465642, 'label_smoothing_factor': 0.01923292069443836}. Best is trial 16 with value: 0.42592647671699524.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.755646,0.789883,0.77937,0.816982,0.772431
114,No log,0.537394,0.843385,0.844316,0.857132,0.83652
171,No log,0.490452,0.857977,0.854746,0.858068,0.860587
228,No log,0.436133,0.868677,0.869608,0.870668,0.868838
285,No log,0.424363,0.870623,0.872138,0.871349,0.873428


[I 2021-12-19 00:36:19,924] Trial 28 finished with value: 0.42434027791023254 and parameters: {'learning_rate': 7.490386315314444e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.008477294572143922, 'label_smoothing_factor': 0.008884041983425909}. Best is trial 28 with value: 0.42434027791023254.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.740768,0.790856,0.783747,0.810647,0.781405
114,No log,0.548632,0.842412,0.840085,0.859568,0.828851
171,No log,0.493413,0.855058,0.851947,0.858547,0.855175
228,No log,0.435024,0.86965,0.870031,0.870059,0.870206
285,No log,0.421299,0.868677,0.866195,0.866954,0.865926


[I 2021-12-19 00:41:44,497] Trial 29 finished with value: 0.42126932740211487 and parameters: {'learning_rate': 7.45337603340065e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.008514913193257569, 'label_smoothing_factor': 0.008756157764046854}. Best is trial 29 with value: 0.42126932740211487.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.802954,0.796693,0.78647,0.821295,0.779496
114,No log,0.618384,0.845331,0.844574,0.856068,0.837047
171,No log,0.593909,0.857004,0.85454,0.862175,0.859583
228,No log,0.538528,0.871595,0.871825,0.873872,0.870055
285,No log,0.529323,0.873541,0.872783,0.873087,0.872948


[I 2021-12-19 00:47:08,604] Trial 30 finished with value: 0.529251754283905 and parameters: {'learning_rate': 8.179568921195824e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.008254566669168139, 'label_smoothing_factor': 0.03459365678888454}. Best is trial 29 with value: 0.42126932740211487.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.749115,0.792802,0.780882,0.81641,0.774685
114,No log,0.547882,0.842412,0.840243,0.851478,0.83354
171,No log,0.509983,0.856031,0.85483,0.862283,0.85999
228,No log,0.455366,0.866732,0.86728,0.869232,0.865613
285,No log,0.445862,0.86965,0.871907,0.873034,0.871325


[I 2021-12-19 00:52:32,259] Trial 31 finished with value: 0.4457281231880188 and parameters: {'learning_rate': 8.148546295658206e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.00866533209757696, 'label_smoothing_factor': 0.012685082738751373}. Best is trial 29 with value: 0.42126932740211487.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.802339,0.776265,0.755506,0.807816,0.741098
114,No log,0.588563,0.835603,0.836864,0.860425,0.823193
171,No log,0.521974,0.846304,0.843723,0.849333,0.848895
228,No log,0.461477,0.867704,0.867547,0.866307,0.869829
285,No log,0.444471,0.865759,0.867212,0.867504,0.867329


[I 2021-12-19 00:57:56,081] Trial 32 finished with value: 0.44445914030075073 and parameters: {'learning_rate': 5.6927447382694176e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.006822242853185657, 'label_smoothing_factor': 0.008987514186661086}. Best is trial 29 with value: 0.42126932740211487.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.786069,0.792802,0.783189,0.816389,0.779017
114,No log,0.58868,0.843385,0.840623,0.854456,0.831822
171,No log,0.540152,0.856031,0.854292,0.858372,0.85998
228,No log,0.491514,0.86965,0.868769,0.870253,0.867422
285,No log,0.478629,0.865759,0.865203,0.86467,0.866294


[I 2021-12-19 01:03:19,415] Trial 33 finished with value: 0.4786131680011749 and parameters: {'learning_rate': 7.611088203478889e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.009843052360608008, 'label_smoothing_factor': 0.02132318266791563}. Best is trial 29 with value: 0.42126932740211487.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.691553,0.804475,0.793371,0.832192,0.779697
114,No log,0.501339,0.850195,0.852358,0.855663,0.850911
171,No log,0.492203,0.850195,0.849309,0.860953,0.849414
228,No log,0.425113,0.870623,0.869906,0.872336,0.86875
285,No log,0.408669,0.875486,0.874401,0.877284,0.872101


[I 2021-12-19 01:08:42,875] Trial 34 finished with value: 0.4084729850292206 and parameters: {'learning_rate': 9.993453790608077e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010256154555735243, 'label_smoothing_factor': 0.0063512306853832235}. Best is trial 34 with value: 0.4084729850292206.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.724589,0.789883,0.78343,0.823596,0.770675
114,No log,0.51988,0.848249,0.849517,0.853782,0.846349
171,No log,0.504136,0.850195,0.849735,0.862497,0.848884
228,No log,0.436013,0.870623,0.869407,0.870329,0.869143
285,No log,0.41816,0.875486,0.875584,0.877522,0.874156


[I 2021-12-19 01:14:06,238] Trial 35 finished with value: 0.417968213558197 and parameters: {'learning_rate': 9.769116293390851e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010477428519993742, 'label_smoothing_factor': 0.00826806728251092}. Best is trial 34 with value: 0.4084729850292206.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.725151,0.786965,0.779925,0.824479,0.765679
114,No log,0.517768,0.850195,0.851581,0.857229,0.847437
171,No log,0.494288,0.845331,0.844781,0.855457,0.845115
228,No log,0.437627,0.868677,0.86734,0.869525,0.86631
285,No log,0.418763,0.877432,0.877701,0.880116,0.876048


[I 2021-12-19 01:19:29,521] Trial 36 finished with value: 0.4186168909072876 and parameters: {'learning_rate': 9.900786024652101e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010932035242713395, 'label_smoothing_factor': 0.00809980207036672}. Best is trial 34 with value: 0.4084729850292206.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.764837,0.790856,0.779531,0.825322,0.766185
114,No log,0.568367,0.857004,0.857945,0.864018,0.853152
171,No log,0.546162,0.857004,0.854985,0.869316,0.852389
228,No log,0.494696,0.868677,0.865393,0.867829,0.864199
285,No log,0.477075,0.878405,0.876768,0.878913,0.875664


[I 2021-12-19 01:24:53,010] Trial 37 finished with value: 0.47695791721343994 and parameters: {'learning_rate': 9.963621913380234e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010621726231993312, 'label_smoothing_factor': 0.023345832675897116}. Best is trial 34 with value: 0.4084729850292206.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.693833,0.804475,0.796895,0.82919,0.788364
114,No log,0.537547,0.835603,0.835105,0.851761,0.823737
171,No log,0.489783,0.857004,0.855148,0.861461,0.859993
228,No log,0.432331,0.874514,0.873452,0.875018,0.872129
285,No log,0.419188,0.874514,0.872574,0.872032,0.873442


[I 2021-12-19 01:30:16,993] Trial 38 finished with value: 0.4191163182258606 and parameters: {'learning_rate': 8.775761134215727e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0012597251601471965, 'label_smoothing_factor': 0.008243047886242763}. Best is trial 34 with value: 0.4084729850292206.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.740079,0.785992,0.768262,0.817612,0.758138
114,No log,0.553628,0.846304,0.844433,0.856931,0.835752
171,No log,0.515838,0.854086,0.851667,0.85753,0.857058
228,No log,0.448469,0.879377,0.878391,0.878321,0.878756
285,No log,0.437966,0.879377,0.878826,0.879592,0.878658


[I 2021-12-19 01:35:41,148] Trial 39 finished with value: 0.4378919303417206 and parameters: {'learning_rate': 8.820430991323442e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.001257829494881968, 'label_smoothing_factor': 0.014236481513535229}. Best is trial 34 with value: 0.4084729850292206.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.905782,0.79572,0.782825,0.823643,0.767095
114,No log,0.736887,0.847276,0.850228,0.850181,0.851002
171,No log,0.726547,0.848249,0.846661,0.858839,0.846015
228,No log,0.678761,0.870623,0.869926,0.871131,0.870179
285,No log,0.664043,0.874514,0.872614,0.875646,0.870593


[I 2021-12-19 01:41:04,805] Trial 40 finished with value: 0.6638225317001343 and parameters: {'learning_rate': 9.919461697947092e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0012922469966618374, 'label_smoothing_factor': 0.06920759304752172}. Best is trial 34 with value: 0.4084729850292206.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.719654,0.797665,0.79093,0.826191,0.781563
114,No log,0.526956,0.843385,0.841857,0.855035,0.834038
171,No log,0.488932,0.855058,0.856528,0.863792,0.861029
228,No log,0.428918,0.868677,0.867644,0.868865,0.866571
285,No log,0.419594,0.871595,0.871897,0.872281,0.871826


[I 2021-12-19 01:46:28,336] Trial 41 finished with value: 0.4192184805870056 and parameters: {'learning_rate': 8.410724600460766e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0015110751277992809, 'label_smoothing_factor': 0.007428968789454836}. Best is trial 34 with value: 0.4084729850292206.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.686676,0.808366,0.80499,0.834411,0.796678
114,No log,0.529694,0.84144,0.838681,0.856266,0.828532
171,No log,0.491558,0.857004,0.857813,0.865945,0.861789
228,No log,0.421301,0.868677,0.868764,0.870726,0.866942
285,No log,0.407248,0.872568,0.874357,0.876419,0.872703


[I 2021-12-19 01:51:52,514] Trial 42 finished with value: 0.40703555941581726 and parameters: {'learning_rate': 8.567722979991723e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010003963359782894, 'label_smoothing_factor': 0.006466751607772679}. Best is trial 42 with value: 0.40703555941581726.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.788979,0.789883,0.776126,0.807474,0.772056
114,No log,0.576614,0.845331,0.844311,0.858637,0.835351
171,No log,0.526916,0.859922,0.856988,0.86276,0.861679
228,No log,0.470218,0.870623,0.870901,0.871522,0.870402
285,No log,0.457613,0.871595,0.871028,0.870667,0.871849


[I 2021-12-19 01:57:16,991] Trial 43 finished with value: 0.4576463997364044 and parameters: {'learning_rate': 7.056830565909492e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010029018392979597, 'label_smoothing_factor': 0.016330609047599143}. Best is trial 42 with value: 0.40703555941581726.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.782126,0.792802,0.787621,0.821659,0.778708
114,No log,0.548475,0.840467,0.841661,0.859936,0.829905
171,No log,0.492568,0.853113,0.8507,0.854954,0.856295
228,No log,0.4338,0.866732,0.865504,0.865012,0.866193
285,No log,0.41981,0.871595,0.871078,0.871701,0.870967


[I 2021-12-19 02:02:41,483] Trial 44 finished with value: 0.4198647737503052 and parameters: {'learning_rate': 6.277456193778829e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.001180034686859075, 'label_smoothing_factor': 0.0056800216140093975}. Best is trial 42 with value: 0.40703555941581726.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.776174,0.785019,0.766879,0.813199,0.75839
114,No log,0.587862,0.847276,0.848591,0.860472,0.842784
171,No log,0.559698,0.85214,0.853207,0.862313,0.856511
228,No log,0.493749,0.871595,0.870068,0.872942,0.86797
285,No log,0.480268,0.877432,0.876315,0.8781,0.875013


[I 2021-12-19 02:08:06,025] Trial 45 finished with value: 0.48014014959335327 and parameters: {'learning_rate': 8.819559845260083e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.001475938662721681, 'label_smoothing_factor': 0.022589634366085756}. Best is trial 42 with value: 0.40703555941581726.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.631912,0.60214,0.450761,0.567179,0.475869
114,No log,1.214939,0.72179,0.667468,0.763786,0.654149
171,No log,1.0495,0.788911,0.775657,0.799086,0.77331
228,No log,0.973034,0.816148,0.810666,0.826987,0.799578
285,No log,0.945471,0.815175,0.809503,0.823894,0.799375



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-19 02:13:30,670] Trial 46 finished with value: 0.9453240036964417 and parameters: {'learning_rate': 1.5433406550805553e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0011232523078250575, 'label_smoothing_factor': 0.09351247988819009}. Best is trial 42 with value: 0.40703555941581726.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.791426,0.525292,0.376568,0.456094,0.401652
114,No log,1.358916,0.61965,0.496978,0.62891,0.508591
171,No log,1.125221,0.723735,0.67438,0.762904,0.663307
228,No log,1.024885,0.733463,0.69215,0.757972,0.677728
285,No log,0.983766,0.749027,0.711053,0.760782,0.694442



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-19 02:18:55,831] Trial 47 finished with value: 0.9834896922111511 and parameters: {'learning_rate': 1.0260245085413009e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0017900779428969885, 'label_smoothing_factor': 0.011318282246023355}. Best is trial 42 with value: 0.40703555941581726.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,1.017632,0.790856,0.781782,0.812233,0.776417
114,No log,0.855433,0.842412,0.84228,0.859923,0.831615
171,No log,0.816487,0.855058,0.851741,0.856835,0.855898
228,No log,0.773779,0.868677,0.869525,0.868261,0.871343
285,No log,0.764978,0.874514,0.874752,0.875231,0.874752


[I 2021-12-19 02:24:19,787] Trial 48 finished with value: 0.7650218605995178 and parameters: {'learning_rate': 6.507488337700906e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.001429505351289955, 'label_smoothing_factor': 0.09938369904713035}. Best is trial 42 with value: 0.40703555941581726.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.784269,0.796693,0.790835,0.831744,0.777759
114,No log,0.60438,0.859922,0.863214,0.866859,0.861319
171,No log,0.57574,0.857004,0.855819,0.866952,0.855903
228,No log,0.530327,0.874514,0.871207,0.872876,0.870266
285,No log,0.515358,0.875486,0.872136,0.875252,0.870036


[I 2021-12-19 02:29:43,995] Trial 49 finished with value: 0.5152388215065002 and parameters: {'learning_rate': 9.996807450505676e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0021597256452135644, 'label_smoothing_factor': 0.03198608645134342}. Best is trial 42 with value: 0.40703555941581726.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.713411,0.79572,0.781135,0.812428,0.775693
114,No log,0.524839,0.840467,0.838764,0.851519,0.831122
171,No log,0.485515,0.853113,0.850757,0.858346,0.856256
228,No log,0.4201,0.870623,0.869936,0.871569,0.868639
285,No log,0.411975,0.873541,0.874072,0.875331,0.873298


[I 2021-12-19 02:35:08,716] Trial 50 finished with value: 0.4117865562438965 and parameters: {'learning_rate': 8.097474441110994e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0016390441564169214, 'label_smoothing_factor': 0.005376122816437847}. Best is trial 42 with value: 0.40703555941581726.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.719263,0.790856,0.78464,0.814754,0.780259
114,No log,0.532172,0.842412,0.841574,0.856596,0.833016
171,No log,0.480714,0.856031,0.852288,0.85624,0.859037
228,No log,0.421818,0.86965,0.867759,0.870717,0.865316
285,No log,0.408035,0.875486,0.873572,0.872865,0.874795


[I 2021-12-19 02:40:33,334] Trial 51 finished with value: 0.4080692529678345 and parameters: {'learning_rate': 7.978786605659688e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010022750025750682, 'label_smoothing_factor': 0.005382224605316044}. Best is trial 42 with value: 0.40703555941581726.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.728076,0.798638,0.790531,0.823375,0.783291
114,No log,0.531683,0.84144,0.839236,0.852135,0.831008
171,No log,0.482158,0.851167,0.848397,0.853118,0.853018
228,No log,0.424506,0.868677,0.868903,0.870553,0.86786
285,No log,0.410671,0.873541,0.872811,0.872201,0.873697


[I 2021-12-19 02:45:58,091] Trial 52 finished with value: 0.41060197353363037 and parameters: {'learning_rate': 7.943336512154693e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010100743432846168, 'label_smoothing_factor': 0.004887320646947286}. Best is trial 42 with value: 0.40703555941581726.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.716235,0.797665,0.78901,0.817863,0.783323
114,No log,0.531431,0.838521,0.838024,0.852826,0.829409
171,No log,0.478633,0.854086,0.850125,0.854575,0.856028
228,No log,0.421361,0.873541,0.875131,0.8779,0.872859
285,No log,0.405592,0.876459,0.877343,0.877753,0.877324


[I 2021-12-19 02:51:23,201] Trial 53 finished with value: 0.4056026339530945 and parameters: {'learning_rate': 7.928657974273968e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010004765361006196, 'label_smoothing_factor': 0.005138958994368323}. Best is trial 53 with value: 0.4056026339530945.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.772792,0.797665,0.794831,0.829364,0.782458
114,No log,0.551389,0.842412,0.843316,0.863247,0.831899
171,No log,0.48562,0.846304,0.844475,0.849834,0.848209
228,No log,0.425312,0.870623,0.871289,0.871546,0.87125
285,No log,0.412231,0.873541,0.873385,0.873867,0.873339


[I 2021-12-19 02:56:48,538] Trial 54 finished with value: 0.4122443199157715 and parameters: {'learning_rate': 6.017828424283103e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.001636543716863074, 'label_smoothing_factor': 0.004359947754859264}. Best is trial 53 with value: 0.4056026339530945.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.796755,0.789883,0.780653,0.80987,0.776132
114,No log,0.585795,0.839494,0.838606,0.8547,0.829375
171,No log,0.534027,0.857004,0.852984,0.857062,0.858344
228,No log,0.478222,0.872568,0.872557,0.873099,0.872153
285,No log,0.465237,0.871595,0.872831,0.873001,0.873136


[I 2021-12-19 03:02:13,818] Trial 55 finished with value: 0.4652772545814514 and parameters: {'learning_rate': 6.857680443381228e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0011752924280491151, 'label_smoothing_factor': 0.017586301726865747}. Best is trial 53 with value: 0.4056026339530945.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.713342,0.798638,0.788657,0.819855,0.782041
114,No log,0.518672,0.846304,0.846596,0.858315,0.839181
171,No log,0.479941,0.856031,0.853693,0.858794,0.859588
228,No log,0.424731,0.864786,0.865206,0.866866,0.863899
285,No log,0.41256,0.870623,0.870087,0.871234,0.86936


[I 2021-12-19 03:07:38,655] Trial 56 finished with value: 0.4125162661075592 and parameters: {'learning_rate': 8.039824873288829e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.002934122482891072, 'label_smoothing_factor': 0.0051538229420066825}. Best is trial 53 with value: 0.4056026339530945.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.917576,0.793774,0.785931,0.817192,0.780122
114,No log,0.746812,0.846304,0.846124,0.857755,0.838504
171,No log,0.718593,0.859922,0.856197,0.860512,0.861712
228,No log,0.673527,0.86965,0.871285,0.873229,0.869875
285,No log,0.664629,0.870623,0.870198,0.871469,0.869434


[I 2021-12-19 03:13:03,511] Trial 57 finished with value: 0.6646230220794678 and parameters: {'learning_rate': 7.712051145604682e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0013697320264742339, 'label_smoothing_factor': 0.07075824785068309}. Best is trial 53 with value: 0.4056026339530945.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.786001,0.79572,0.789432,0.818717,0.783169
114,No log,0.582153,0.837549,0.836072,0.856458,0.824502
171,No log,0.521166,0.853113,0.849889,0.855611,0.853697
228,No log,0.466391,0.872568,0.872219,0.872752,0.871799
285,No log,0.452748,0.873541,0.873286,0.873305,0.87354


[I 2021-12-19 03:18:28,943] Trial 58 finished with value: 0.4528005123138428 and parameters: {'learning_rate': 6.739254368033661e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.002124766167303326, 'label_smoothing_factor': 0.014174545078156412}. Best is trial 53 with value: 0.4056026339530945.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.885141,0.785019,0.770361,0.809659,0.756794
114,No log,0.712473,0.840467,0.840525,0.862872,0.827144
171,No log,0.658068,0.845331,0.844101,0.847913,0.850627
228,No log,0.606345,0.871595,0.87156,0.870637,0.873471
285,No log,0.591698,0.868677,0.870599,0.871405,0.870244


[I 2021-12-19 03:23:54,327] Trial 59 finished with value: 0.5916937589645386 and parameters: {'learning_rate': 5.900004299356959e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010036546695475687, 'label_smoothing_factor': 0.04755012581400559}. Best is trial 53 with value: 0.4056026339530945.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.703934,0.808366,0.797618,0.835995,0.78216
114,No log,0.521169,0.837549,0.837333,0.842912,0.834643
171,No log,0.496595,0.839494,0.836779,0.847133,0.840721
228,No log,0.423369,0.86284,0.861057,0.860578,0.862093
285,No log,0.407391,0.871595,0.870151,0.872649,0.868281


[I 2021-12-19 03:29:19,233] Trial 60 finished with value: 0.40713202953338623 and parameters: {'learning_rate': 9.00727390976241e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.002529830679246514, 'label_smoothing_factor': 0.0033855390898417477}. Best is trial 53 with value: 0.4056026339530945.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.677012,0.799611,0.789513,0.825175,0.781017
114,No log,0.515164,0.84144,0.840182,0.854316,0.830807
171,No log,0.475158,0.851167,0.849549,0.8553,0.855548
228,No log,0.412392,0.868677,0.867102,0.869449,0.865063
285,No log,0.399255,0.874514,0.872825,0.873283,0.873006


[I 2021-12-19 03:34:43,854] Trial 61 finished with value: 0.3991892635822296 and parameters: {'learning_rate': 8.750523197903139e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.002617658542008943, 'label_smoothing_factor': 0.003560027682929669}. Best is trial 61 with value: 0.3991892635822296.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.712521,0.779183,0.760646,0.80807,0.751216
114,No log,0.492998,0.842412,0.843063,0.853548,0.835963
171,No log,0.455926,0.851167,0.850924,0.858011,0.855672
228,No log,0.383867,0.876459,0.871863,0.872995,0.871222
285,No log,0.372548,0.877432,0.875441,0.874948,0.876465


[I 2021-12-19 03:40:08,751] Trial 62 finished with value: 0.3723418116569519 and parameters: {'learning_rate': 9.058093909738025e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0023961632517904207, 'label_smoothing_factor': 0.00015659030080603532}. Best is trial 62 with value: 0.3723418116569519.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.694576,0.790856,0.779434,0.823831,0.769149
114,No log,0.500444,0.845331,0.84482,0.860175,0.835086
171,No log,0.459069,0.85214,0.852281,0.861586,0.854968
228,No log,0.3921,0.873541,0.872983,0.873629,0.872687
285,No log,0.376665,0.873541,0.872078,0.872302,0.872174


[I 2021-12-19 03:45:33,068] Trial 63 finished with value: 0.37659749388694763 and parameters: {'learning_rate': 9.105561271666804e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.002436453047855, 'label_smoothing_factor': 0.0005930500192666737}. Best is trial 62 with value: 0.3723418116569519.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.698443,0.789883,0.783777,0.82893,0.771146
114,No log,0.491475,0.846304,0.847831,0.859455,0.840339
171,No log,0.450446,0.854086,0.854367,0.862105,0.85751
228,No log,0.388545,0.873541,0.872299,0.872992,0.872013
285,No log,0.375687,0.874514,0.874372,0.874459,0.874693


[I 2021-12-19 03:50:57,608] Trial 64 finished with value: 0.3755095899105072 and parameters: {'learning_rate': 9.141671856275465e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0025642564356591112, 'label_smoothing_factor': 0.0005157913033401974}. Best is trial 62 with value: 0.3723418116569519.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.710569,0.787938,0.775652,0.825775,0.761029
114,No log,0.510132,0.844358,0.844295,0.850402,0.841602
171,No log,0.48471,0.84144,0.841017,0.850964,0.843321
228,No log,0.401998,0.866732,0.865786,0.866596,0.865795
285,No log,0.384202,0.878405,0.877425,0.879,0.876486


[I 2021-12-19 03:56:21,902] Trial 65 finished with value: 0.3840334415435791 and parameters: {'learning_rate': 8.988378054038654e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0025631421566836583, 'label_smoothing_factor': 0.0002623337936037578}. Best is trial 62 with value: 0.3723418116569519.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.680859,0.790856,0.784656,0.826806,0.774204
114,No log,0.493124,0.842412,0.842738,0.85488,0.835557
171,No log,0.449299,0.857977,0.856529,0.863594,0.860304
228,No log,0.383144,0.873541,0.870351,0.872464,0.86852
285,No log,0.371759,0.876459,0.874977,0.877185,0.873415


[I 2021-12-19 04:01:46,709] Trial 66 finished with value: 0.3716256022453308 and parameters: {'learning_rate': 9.11342199505515e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0034013030976917115, 'label_smoothing_factor': 0.00047461372913629315}. Best is trial 66 with value: 0.3716256022453308.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.71297,0.776265,0.762039,0.813132,0.751896
114,No log,0.497572,0.84144,0.841282,0.852797,0.833425
171,No log,0.449285,0.851167,0.851155,0.856613,0.856922
228,No log,0.380057,0.875486,0.873099,0.874166,0.87248
285,No log,0.371599,0.88035,0.879368,0.880247,0.879079


[I 2021-12-19 04:07:11,635] Trial 67 finished with value: 0.3714544475078583 and parameters: {'learning_rate': 9.150963671552329e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0033286616471942198, 'label_smoothing_factor': 0.0006215711074007643}. Best is trial 67 with value: 0.3714544475078583.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.729736,0.792802,0.784154,0.815694,0.780914
114,No log,0.525915,0.838521,0.83656,0.854283,0.826588
171,No log,0.45661,0.856031,0.853028,0.860399,0.855629
228,No log,0.40631,0.866732,0.866726,0.867264,0.866398
285,No log,0.395436,0.870623,0.871641,0.872043,0.871679


[I 2021-12-19 04:12:37,287] Trial 68 finished with value: 0.3954123854637146 and parameters: {'learning_rate': 7.260504836144708e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.003471205960034998, 'label_smoothing_factor': 0.0006153059202235781}. Best is trial 67 with value: 0.3714544475078583.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall
57,No log,0.729646,0.792802,0.781452,0.810605,0.777881
114,No log,0.519405,0.84144,0.839489,0.856722,0.828888
171,No log,0.462622,0.858949,0.855244,0.859315,0.860843
228,No log,0.401042,0.868677,0.867619,0.867623,0.867881
285,No log,0.387938,0.872568,0.87354,0.873623,0.873861


[I 2021-12-19 04:18:02,500] Trial 69 finished with value: 0.387963205575943 and parameters: {'learning_rate': 7.179665081185553e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0034439659966747365, 'label_smoothing_factor': 0.00017667829270411072}. Best is trial 67 with value: 0.3714544475078583.


{'label_smoothing_factor': 0.0006215711074007643,
 'learning_rate': 9.150963671552329e-05,
 'num_train_epochs': 1,
 'per_device_train_batch_size': 32,
 'weight_decay': 0.0033286616471942198}
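
The parameters printed above belong to trial 67, the best trial in the log. One way to reuse them for a final training run is to copy them onto the training configuration. The following is a minimal sketch, not the notebook's own code: `TrainConfig` is a hypothetical stand-in for `TrainingArguments` so that the example runs without the library, and its default values are placeholders.

```python
from dataclasses import dataclass, replace


# Hypothetical stand-in for transformers.TrainingArguments; only the tuned
# fields are modelled and the defaults are placeholders (assumption).
@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float = 5e-5
    num_train_epochs: int = 3
    per_device_train_batch_size: int = 8
    weight_decay: float = 0.0
    label_smoothing_factor: float = 0.0


# Best parameters exactly as printed in the cell output above (trial 67).
best_params = {
    "label_smoothing_factor": 0.0006215711074007643,
    "learning_rate": 9.150963671552329e-05,
    "num_train_epochs": 1,
    "per_device_train_batch_size": 32,
    "weight_decay": 0.0033286616471942198,
}

# Build a new configuration with the tuned values overriding the defaults.
final_config = replace(TrainConfig(), **best_params)
print(final_config)
```

With a real `Trainer`, the same idea is commonly expressed by setting each value on `trainer.args` with `setattr` before calling `trainer.train()`.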

## Hyperparameter Tuning

See the documentation for `Trainer.hyperparameter_search`: https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Trainer.hyperparameter_search

In [None]:
from transformers import Trainer, TrainingArguments, logging

# Silence transformers warnings such as "Some weights of the model checkpoint ..."
logging.set_verbosity_error()


training_args = TrainingArguments(
    output_dir=str(project_name),
    report_to=[],
    log_level="error",
    disable_tqdm=False,

    evaluation_strategy="steps",
    # eval_steps=eval_steps,
    save_strategy="steps",
    # save_steps=eval_steps,
    # load_best_model_at_end=False,
    # metric_for_best_model="eval_loss",
    # greater_is_better=False,
)

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_gnad10k["train"],
    eval_dataset=tokenized_gnad10k["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)


# When compute_metrics is provided, the default objective is the sum of all
# metrics it returns (without compute_metrics it is the evaluation loss),
# so the default objective has to be maximized.
# best = trainer.hyperparameter_search(
#     hp_space=hp_space,
#     compute_objective=objective,
#     n_trials=2
# )
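
The commented-out call refers to `hp_space` and `objective`, which are not shown in this cell. The following is a minimal sketch of what they might look like with the Optuna backend. The parameter names are taken from the trial log above, but the ranges are assumptions inferred from the logged values, and returning `eval_loss` as the objective is an assumption based on the trial values closely tracking the final validation loss.

```python
def hp_space(trial):
    """Hypothetical Optuna search space (ranges are assumptions)."""
    return {
        # Log-uniform ranges guessed from the smallest and largest logged values.
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 1e-3, 1e-2, log=True),
        "label_smoothing_factor": trial.suggest_float(
            "label_smoothing_factor", 1e-4, 1e-1, log=True
        ),
        # These two take a single value in every logged trial.
        "num_train_epochs": trial.suggest_categorical("num_train_epochs", [1]),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [32]
        ),
    }


def objective(metrics):
    """Use only the validation loss as the objective, so lower is better."""
    return metrics["eval_loss"]
```

With an objective like this, the search would be run with `direction="minimize"` and `backend="optuna"` passed to `trainer.hyperparameter_search`, rather than maximizing the default sum of metrics.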