<a href="https://colab.research.google.com/github/goerlitz/nlp-classification/blob/main/notebooks/10kGNAD/colab/21c_10kGNAD_huggingface_basic_optuna.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hyperparameter Optimization with HuggingFace Transformers

Adapted from https://huggingface.co/docs/transformers/custom_datasets#sequence-classification-with-imdb-reviews

Things we need:
* a tokenizer
* tokenized input data
* a pretrained model
* evaluation metrics
* training parameters
* a Trainer instance

Notes
* [class labels can be included in the model config](https://github.com/huggingface/transformers/pull/2945#issuecomment-781986506) (a bit hacky)
* [fp16 can be slower than fp32 on Tesla P100 GPUs in PyTorch](https://discuss.pytorch.org/t/cnn-fp16-slower-than-fp32-on-tesla-p100/12146)
* [comparison of GPUs (K80, T4, P100, V100)](https://www.kaggle.com/general/198232)
* [GPU benchmark, mixed precision](https://medium.com/the-artificial-impostor/mixed-precision-training-on-tesla-t4-and-p100-d82e5d3b987d)

## Prerequisites

In [1]:
checkpoint = "distilbert-base-german-cased"
# checkpoint = "deepset/gbert-base"
# checkpoint = "deepset/gelectra-base"
# checkpoint = "deepset/gelectra-large"

project_name = f'10kgnad_hf__{checkpoint.replace("/", "_")}'

### Connect Google Drive

Used to save results (zipped best models and the Optuna study database).

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [3]:
from pathlib import Path

# define model path
root_path = Path('/content/gdrive/My Drive/')
base_path = root_path / 'Colab Notebooks/nlp-classification/'
model_path = base_path / 'models'

## Check GPU

In [4]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

Sat Jan 15 13:45:04 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   62C    P8    11W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Install APEX

https://stackoverflow.com/questions/57284345/how-to-install-nvidia-apex-on-google-colab

In [5]:
%%writefile setup.sh

git clone https://github.com/NVIDIA/apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex

Writing setup.sh


In [6]:
%%time
!sh setup.sh

Cloning into 'apex'...
remote: Enumerating objects: 8835, done.
remote: Counting objects: 100% (68/68), done.
remote: Compressing objects: 100% (45/45), done.
remote: Total 8835 (delta 28), reused 47 (delta 23), pack-reused 8767
Receiving objects: 100% (8835/8835), 14.50 MiB | 20.60 MiB/s, done.
Resolving deltas: 100% (6009/6009), done.
  cmdoptions.check_install_build_global(options)
Using pip 21.1.3 from /usr/local/lib/python3.7/dist-packages/pip (python 3.7)
Value for scheme.platlib does not match. Please report this to <https://github.com/pypa/pip/issues/9617>
distutils: /usr/local/lib/python3.7/dist-packages
sysconfig: /usr/lib/python3.7/site-packages
Value for scheme.purelib does not match. Please report this to <https://github.com/pypa/pip/issues/9617>
distutils: /usr/local/lib/python3.7/dist-packages
sysconfig: /usr/lib/python3.7/site-packages
Value for scheme.headers does not match. Please report this to <https://github.com/pypa/pip/issues/9617>
distutils: /usr/loc

In [5]:
import apex

### Install Packages

In [5]:
%%time
!pip install -q -U transformers datasets >/dev/null
!pip install -q -U optuna >/dev/null

# check installed version
!pip freeze | grep optuna        # optuna==2.10.0
!pip freeze | grep transformers  # transformers==4.15.0
!pip freeze | grep "torch "      # torch==1.10.0+cu111

optuna==2.10.0
transformers==4.15.0
torch @ https://download.pytorch.org/whl/cu111/torch-1.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl
CPU times: user 73.7 ms, sys: 43.5 ms, total: 117 ms
Wall time: 9.2 s


In [6]:
from transformers import logging

# hide progress bar when downloading tokenizer and model (a workaround!)
logging.get_verbosity = lambda : logging.NOTSET

## Load Dataset

In [7]:
from datasets import load_dataset

base_url = "https://raw.githubusercontent.com/tblock/10kGNAD/master/{}.csv"
data_files = {x: base_url.format(x) for x in ["train", "test"]}
dataset = (load_dataset('csv',
                        data_files=data_files,
                        sep=";",
                        quotechar="'",
                        names=["label", "text"]).
           class_encode_column("label"))

label_names = dataset["train"].features["label"].names

Using custom data configuration default-0e1a53e9f937c1cf
Reusing dataset csv (/root/.cache/huggingface/datasets/csv/default-0e1a53e9f937c1cf/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e)



Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-0e1a53e9f937c1cf/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e/cache-59d70429eaf283cf.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-0e1a53e9f937c1cf/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e/cache-a2cd8f79c68f7005.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-0e1a53e9f937c1cf/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e/cache-5dec691e070de180.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-0e1a53e9f937c1cf/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e/cache-7ee1cd4bfba8d7d3.arrow


In [8]:
print(dataset)
print("labels:", label_names)
dataset["train"][0]

DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 9245
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 1028
    })
})
labels: ['Etat', 'Inland', 'International', 'Kultur', 'Panorama', 'Sport', 'Web', 'Wirtschaft', 'Wissenschaft']


{'label': 5,
 'text': '21-Jähriger fällt wohl bis Saisonende aus. Wien – Rapid muss wohl bis Saisonende auf Offensivspieler Thomas Murg verzichten. Der im Winter aus Ried gekommene 21-Jährige erlitt beim 0:4-Heimdebakel gegen Admira Wacker Mödling am Samstag einen Teilriss des Innenbandes im linken Knie, wie eine Magnetresonanz-Untersuchung am Donnerstag ergab. Murg erhielt eine Schiene, muss aber nicht operiert werden. Dennoch steht ihm eine mehrwöchige Pause bevor.'}
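The `load_dataset` call above relies on the raw 10kGNAD files holding one `label;'text'` record per line. The sketch below parses a hypothetical two-line excerpt in that assumed layout with Python's standard `csv` module (same `;` delimiter and `'` quote character) and then maps the labels to ids in sorted order. The sorted order is an assumption about how `class_encode_column` assigns ids; it is consistent with the alphabetical label list printed above, where `Sport` has id 5.

```python
import csv
import io

# Hypothetical excerpt in the assumed raw layout: <label>;'<text>'
raw = (
    "Sport;'Rapid muss wohl bis Saisonende auf Thomas Murg verzichten.'\n"
    "Web;'Text mit; Semikolon'\n"
)

# The quote character protects semicolons that occur inside the text.
rows = list(csv.reader(io.StringIO(raw), delimiter=";", quotechar="'"))
labels = [label for label, _ in rows]

# Assumption: ids follow the sorted order of the distinct label strings.
label_names = sorted({"Etat", "Inland", "International", "Kultur", "Panorama",
                      "Sport", "Web", "Wirtschaft", "Wissenschaft"})
label2id = {name: i for i, name in enumerate(label_names)}
encoded = [label2id[label] for label in labels]
print(rows)
print(encoded)
```

The second record shows why the quote character matters: without it, the semicolon inside the text would split the record into three fields.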

## Data Preprocessing

* Load the same tokenizer that was used to pretrain the model.
* Define a function that tokenizes the text, with truncation to the model's maximum input length.
* Run the tokenization on the whole dataset and drop the raw `text` column.

In [9]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def preprocess_function(examples):
    # return tokenizer(examples["text"], truncation=True, max_length=128)
    return tokenizer(examples["text"], truncation=True)

tokenized_dataset = dataset.map(preprocess_function, batched=True).remove_columns("text")

Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-0e1a53e9f937c1cf/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e/cache-9978cdff5e07f8bd.arrow



### Use Dynamic Padding

Pad each batch only up to the length of its longest sequence; this is more efficient than padding every example in the dataset to one common length.

In [10]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
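To see why dynamic padding saves work, the sketch below counts how many token positions (real tokens plus padding) a model would process for a hypothetical set of sequences, once with per-batch padding and once with padding to the longest sequence of the whole set. It is a simplified stand-in for what `DataCollatorWithPadding` does: it ignores attention masks and uses a made-up `pad_id` of 0 instead of `tokenizer.pad_token_id`.

```python
from typing import List, Optional


def pad_batch(batch: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Right-pad every sequence in one batch to the longest sequence of that batch."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]


def count_positions(seqs: List[List[int]], batch_size: int,
                    global_max: Optional[int] = None) -> int:
    """Token positions processed with per-batch padding (default) or global padding."""
    total = 0
    for start in range(0, len(seqs), batch_size):
        batch = seqs[start:start + batch_size]
        width = global_max if global_max is not None else max(len(s) for s in batch)
        total += width * len(batch)
    return total


# Hypothetical token-id sequences of lengths 5, 7, 3, 20, 4 and 6.
seqs = [[1] * n for n in (5, 7, 3, 20, 4, 6)]
dynamic = count_positions(seqs, batch_size=2)
static = count_positions(seqs, batch_size=2, global_max=max(len(s) for s in seqs))
print(dynamic, static)
```

For these six sequences, per-batch padding yields 66 positions versus 120 with global padding; the saving grows with the spread of text lengths in the dataset.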

### Define Evaluation Metrics

The function that computes the metrics needs to be passed to the Trainer.

In [11]:
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score, matthews_corrcoef
import numpy as np
from typing import Dict

def compute_metrics(eval_preds):
    """Compute the metrics used at evaluation time.
    Takes a :class:`~transformers.EvalPrediction` (logits, labels) and returns
    a dictionary mapping metric names to values."""
    logits, labels = eval_preds
    preds = np.argmax(logits, axis=-1)
    return {
        "acc": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average='macro'),
        "precision": precision_score(labels, preds, average='macro'),
        "recall": recall_score(labels, preds, average='macro'),
        "mcc": matthews_corrcoef(labels, preds),
        }
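The macro averages above weight all nine classes equally, regardless of how many examples each class has. The hand-written sketch below (pure Python, not the scikit-learn implementation) reproduces accuracy and macro precision, recall and F1 on a toy example where one class is never predicted. That situation is what triggers the "Precision is ill-defined" warnings in the training logs further down; passing `zero_division=0` to `precision_score` should give the same values without the warning.

```python
from collections import Counter
from typing import Sequence, Tuple


def macro_scores(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float, float, float]:
    """Accuracy plus macro-averaged precision, recall and F1, computed by hand.

    Classes are those occurring in labels or preds. A class that is never
    predicted has an undefined precision (0/0); it is counted as 0.0 here."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for y, p in zip(labels, preds):
        if y == p:
            tp[y] += 1
        else:
            fp[p] += 1  # predicted class p, but wrongly
            fn[y] += 1  # true class y was missed
    classes = sorted(set(labels) | set(preds))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        n_pred = tp[c] + fp[c]
        n_true = tp[c] + fn[c]
        prec = tp[c] / n_pred if n_pred else 0.0
        rec = tp[c] / n_true if n_true else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    acc = sum(tp.values()) / len(labels)
    return acc, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example: class 2 occurs in the labels but is never predicted.
acc, prec, rec, f1 = macro_scores([0, 0, 1, 1, 2], [0, 0, 1, 1, 1])
print(acc, prec, rec, f1)
```

Class 2 contributes 0.0 to every macro average, so macro precision is 5/9 (about 0.556) and macro F1 is 0.6, although accuracy is 0.8.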

## Hyperparameter Tuning

In [12]:
from transformers import TrainerCallback
import optuna  # needed for optuna.TrialPruned in on_evaluate
from optuna.trial import Trial, TrialState
from optuna.study import StudyDirection
import pandas as pd

# https://github.com/huggingface/transformers/blob/v4.14.1/src/transformers/trainer_callback.py#L505
# https://huggingface.co/docs/transformers/main_classes/callback#transformers.TrainerCallback

import logging
logging.getLogger(__name__).setLevel(logging.INFO)
log = logging.getLogger(__name__)

class TrialLogAndPruningCallback(TrainerCallback):
    """Stores eval metrics at each evaluation step in the trial user attrs and
    prunes the trial if it is no better than the median of comparable completed trials."""
    def __init__(self, trial: Trial, objectives=None, warmup_steps=0, min_trials=7):
        self.study = trial.study
        self.trial = trial
        self.param_keys = ["num_train_epochs", "per_device_train_batch_size"]
        self.param_vals = [trial.params[k] for k in self.param_keys]

        log.warning(f"fixed params: {list(zip(self.param_keys, self.param_vals))}")

        if objectives is None:
            self.objectives = ["eval_loss"]
        else:
            self.objectives = objectives
        self._warmup_steps = warmup_steps
        self._min_trials = max(1, int(min_trials))

        log.warning(f"objectives: {self.objectives}, directions: {self.study.directions}, warmup={self._warmup_steps}, min_trials={self._min_trials}")
        log.warning(f"params: {trial.params}")
        

    def _filter_trials(self, complete_trials):
        """Select only trials with same parameter values"""
        # values = [self.trial.params[k] for k in keys]
        return [t for t in complete_trials if self.param_vals == [t.params[k] for k in self.param_keys]]

    def _prune(self, step: int, metrics) -> bool:
        """Median Pruning on multiple objectives."""
        if step < self._warmup_steps:
            # log.warning(f"less than warmup steps {step}<{self._warmup_steps}")
            return False

        # get all completed trials
        complete_trials = self.study.get_trials(deepcopy=False,
                                                states=[TrialState.COMPLETE])
        # only compare trials with same batch size and epochs
        complete_trials = self._filter_trials(complete_trials)
        n_trials = len(complete_trials)

        # check minimal number of trial required
        if n_trials < self._min_trials:
            # log.warning(f"less than min trials {n_trials}<{self._min_trials}")
            return False

        # log.warning(f"checking {step}: {metrics}")

        # sanity check: skip pruning if an objective metric is missing
        has_metrics = [o in metrics.keys() for o in self.objectives]
        if not all(has_metrics):
            log.warning(f"missing objective metrics {list(zip(self.objectives, has_metrics))}")
            return False

        # extract metrics from trials
        # print(f"fetching metrics of {n_trials} complete trials")
        trial_metrics = []
        for t in complete_trials:
            # print(str(step), "in keys?", str(step) in t.user_attrs.keys(), t.user_attrs.keys())
            if str(step) in t.user_attrs.keys():
                trial_metrics.append(t.user_attrs[str(step)])
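        n_metrics = len(trial_metrics)
        if n_metrics == 0:
            # no completed trial was evaluated at this step, so there is no median
            return False

        # compute median for each metric over all trials
        median = pd.DataFrame(trial_metrics).median()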

        # log.warning(f"median of {n_metrics}/{n_trials}: {median.to_dict()}")

        # compare current metric value with median
        prune_state = []
        for i, o in enumerate(self.objectives):
            if self.study.directions[i] == StudyDirection.MAXIMIZE:
                prune_state.append(metrics[o] <= median[o])
            else:
                prune_state.append(metrics[o] > median[o])
        
        met = ",".join([f"{m}={metrics[m]:.4}/{median[m]:.4}" for m in self.objectives])
        print(f"prune? step={step}, warmup={self._warmup_steps}, complete_trials={n_trials}, metrics={n_metrics} -> {met}; {prune_state}")

        # all metrics must be marked for pruning
        return all(prune_state)
    
    def on_evaluate(self, args, state, control, lr_scheduler, metrics, **kwargs):
        step = state.global_step
        values = {**metrics, "lr": lr_scheduler.get_last_lr()[-1]}
        self.trial.set_user_attr(str(step), values)

        # pruning
        if self._prune(step, metrics):
            print(f"pruning trial at step {step}")
            # control.should_training_stop = True  # not needed
            raise optuna.TrialPruned()
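The pruning rule in `_prune` can be isolated from the Trainer and Optuna. The sketch below is a simplified, dependency-free version of the same decision, assuming that `history` already holds the metrics logged at the current step by completed trials with the same batch size and number of epochs: a trial is pruned only if it is no better than the median on every objective. The metric values in the example are hypothetical.

```python
from statistics import median
from typing import Dict, List


def should_prune(current: Dict[str, float],
                 history: List[Dict[str, float]],
                 objectives: List[str],
                 maximize: List[bool],
                 min_trials: int = 1) -> bool:
    """Multi-objective median rule: prune only if the current trial is no better
    than the median of the comparable completed trials on every objective."""
    if len(history) < min_trials:
        return False
    for name, is_max in zip(objectives, maximize):
        med = median(h[name] for h in history)
        # same comparisons as in _prune: <= median when maximizing, > median when minimizing
        worse = current[name] <= med if is_max else current[name] > med
        if not worse:
            return False
    return True


# Hypothetical metrics of three completed trials at the same evaluation step.
history = [
    {"eval_loss": 0.40, "eval_f1": 0.88},
    {"eval_loss": 0.45, "eval_f1": 0.86},
    {"eval_loss": 0.50, "eval_f1": 0.84},
]
objectives = ["eval_loss", "eval_f1"]
maximize = [False, True]

print(should_prune({"eval_loss": 0.48, "eval_f1": 0.85}, history, objectives, maximize))
print(should_prune({"eval_loss": 0.48, "eval_f1": 0.87}, history, objectives, maximize))
```

Requiring all objectives to be worse makes this rule more conservative than pruning on a single metric: a trial with a high loss but a competitive F1 keeps running.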

In [16]:
from transformers import AutoConfig, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
import shutil

def hp_space(trial: Trial):
    """A function that defines the hyperparameter search space.
    To be used in :obj:`Trainer.hyperparameter_search`."""
    return {
        # "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True),  # distilbert/bert
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 4e-4, log=True),  # distilbert 1 epoch
        # "learning_rate": trial.suggest_float("learning_rate", 6e-5, 2e-4, log=True),  # electra
        # "num_train_epochs": trial.suggest_categorical("num_train_epochs", [1]),
        "num_train_epochs": trial.suggest_categorical("num_train_epochs", [2]),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [8,16,32]),
        "weight_decay": trial.suggest_float("weight_decay", 1e-3, 1e-2, log=True),
        # "weight_decay": trial.suggest_categorical("weight_decay", [1e-3, 0.0]),
    }

best_model_dir = "best_model_trainer"

def best_model_callback(study, trial):
    """Archive the saved model to Google Drive if the finished trial is on the Pareto front (study.best_trials)."""
    for t in study.best_trials:
        if t.number == trial.number:
            print("This is a new besttrial", trial.number)
        
            out_filename = model_path / f"{project_name}_t{trial.number}"
            shutil.make_archive(out_filename, 'zip', f"{project_name}/{best_model_dir}")

def model_init(trial: Trial):
    """A function that instantiates the model to be used."""

    # We want to include the label names and save them together with the model.
    # One way to do this is to create a Config and put them in.
    config = AutoConfig.from_pretrained(
            checkpoint,
            num_labels=len(label_names),
            id2label={i: label for i, label in enumerate(label_names)},
            label2id={label: i for i, label in enumerate(label_names)},
            )

    return AutoModelForSequenceClassification.from_pretrained(checkpoint, config=config)

def objective(trial: Trial):

    # get hyperparameters choice
    hp = hp_space(trial)
    lr = hp["learning_rate"]
    bs = hp["per_device_train_batch_size"]
    epochs = hp["num_train_epochs"]
    weight_decay = hp["weight_decay"]
    # label_smoothing_factor = hp["label_smoothing_factor"]

    eval_rounds_per_epoch = 5
    # integer step interval for ~5 evaluations per epoch
    eval_steps = dataset["train"].num_rows // bs // eval_rounds_per_epoch

    training_args = TrainingArguments(
        output_dir=str(project_name),
        report_to=[],
        log_level="error",
        disable_tqdm=False,

        evaluation_strategy="steps",
        eval_steps=eval_steps,
        logging_steps=eval_steps,
        save_strategy="steps",
        save_steps=eval_steps,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,

        # hyperparameters
        num_train_epochs=epochs,
        learning_rate=lr,
        per_device_train_batch_size=bs,
        per_device_eval_batch_size=bs,
        weight_decay=weight_decay,
        # label_smoothing_factor=label_smoothing_factor,

        # mixed precision: with torch>=1.6 the Trainer uses native AMP by default,
        # apex is only used if selected as backend; fp16 may not pay off on a Tesla P100
        fp16=True,
    )

    trainer = Trainer(
        model_init=model_init,
        args=training_args,
        train_dataset=tokenized_dataset["train"],
        eval_dataset=tokenized_dataset["test"],
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
        callbacks=[TrialLogAndPruningCallback(trial, objectives=["eval_loss", "eval_f1"], min_trials=700, warmup_steps=eval_steps*3)]
        # callbacks=[TrialPruningCallback(trial)]
    )

    # train model and save best model from evaluations
    # needs 'load_best_model_at_end=True'
    trainer.train()
    trainer.save_model(f"{project_name}/{best_model_dir}")

    result = trainer.evaluate(eval_dataset=tokenized_dataset["test"])

    # store eval metrics in trial
    trial.set_user_attr("eval_result", result)
    
    # return result["eval_loss"]
    return result["eval_loss"], result["eval_f1"]
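The evaluation schedule in `objective` depends only on the training-set size and the batch size. The arithmetic below, a sketch assuming a single GPU and no gradient accumulation (one optimizer step per batch), reproduces the `eval_steps` and `warmup` values that appear in the trial logs further down (231/693, 115/345 and 57/171 for batch sizes 8, 16 and 32) and shows that each 2-epoch run is evaluated 10 times.

```python
import math
from typing import Dict


def eval_schedule(num_rows: int, batch_size: int, epochs: int = 2,
                  rounds_per_epoch: int = 5) -> Dict[str, int]:
    """Evaluation spacing as computed in objective(), plus the resulting step counts."""
    eval_steps = num_rows // batch_size // rounds_per_epoch
    # the last partial batch of each epoch still counts as one optimizer step
    total_steps = math.ceil(num_rows / batch_size) * epochs
    return {
        "eval_steps": eval_steps,
        "warmup_steps": 3 * eval_steps,  # warmup passed to TrialLogAndPruningCallback
        "total_steps": total_steps,
        "n_evals": total_steps // eval_steps,
    }


for bs in (8, 16, 32):
    print(bs, eval_schedule(9245, bs))  # 9245 = rows in the train split
```

With the pruning warmup at three evaluation intervals, the callback only starts comparing trials from the fourth evaluation onwards.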

## Run the Hyperparameter Search

In [17]:
# run_study(
#     dataset=dataset,
#     max_seq_length=None,
#     checkpoint="distilbert-base-german-cased",
#     batch_size=32,
#     train_epochs=[2,3],
#     learning_rate=MinMax(4e-5, 4e-4, log=True),
#     weight_decay=MinMax(1e-3, 1e-2, log=True),
#     eval_metrics={"loss":"minimize", "f1":"maximize"}
# )
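With `directions=["minimize", "maximize"]`, Optuna has no single best trial; `study.best_trials`, which `best_model_callback` checks, holds the Pareto front, i.e. the trials that no other trial beats on both eval loss and eval F1 at once. The sketch below applies that dominance test by hand to the objective values logged for trials 2 to 11 further down (rounded to five decimals). It is an illustration, not Optuna's implementation, and it ignores trials 0 and 1, which come from an earlier session of the resumed study and whose values are not shown.

```python
from typing import Dict, List, Tuple

# (eval_loss, eval_f1) per trial number, rounded from the log output of this notebook
results: Dict[int, Tuple[float, float]] = {
    2: (0.40325, 0.87880), 3: (0.35266, 0.89286), 4: (0.45866, 0.88914),
    5: (0.53286, 0.83995), 6: (0.37581, 0.89548), 7: (0.52674, 0.84132),
    8: (0.38561, 0.88486), 9: (0.42828, 0.87157), 10: (0.41257, 0.86926),
    11: (0.50114, 0.84847),
}


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """a dominates b: loss and F1 no worse, and strictly better in at least one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(res: Dict[int, Tuple[float, float]]) -> List[int]:
    """Trial numbers that are not dominated by any other trial."""
    return sorted(n for n, v in res.items()
                  if not any(dominates(w, v) for m, w in res.items() if m != n))


print(pareto_front(results))
```

Trials 3 and 6 form the front among these ten trials: trial 3 has the lowest loss, trial 6 the highest F1. This matches the "This is a new besttrial" messages printed for exactly those two trials.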

In [None]:
import optuna
from optuna.storages import RDBStorage

db_path = "/content/gdrive/My Drive/Colab Notebooks/nlp-classification/"
db_name = "10kgnad_optuna"
# study_name = checkpoint + "_multi_epoch234"
# study_name = checkpoint + "_loss-f1_bs32_epoch23"
study_name = checkpoint + "_loss-f1_bs8-16-32_ep2"

# heartbeat: a trial that stops reporting for longer than grace_period seconds
# (e.g. after a Colab disconnect) is automatically changed from TrialState.RUNNING to TrialState.FAIL
storage = RDBStorage(url=f"sqlite:///{db_path}{db_name}.db", heartbeat_interval=60, grace_period=120)

# https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch
import torch
torch.cuda.empty_cache()
import gc
gc.collect()

# multi objective study
# https://optuna.readthedocs.io/en/stable/tutorial/20_recipes/002_multi_objective.html#sphx-glr-tutorial-20-recipes-002-multi-objective-py
study = optuna.create_study(study_name=study_name,
                            directions=["minimize", "maximize"],
                            # storage=f"sqlite:///{db_path}{db_name}.db",
                            storage=storage,
                            load_if_exists=True,)

# give some hyperparameters that are presumably good
# study.enqueue_trial(
#     {
#         "learning_rate": 8e-5,
#         "weight_decay": 1e-3,
#     }
# )


study.optimize(objective, n_trials=200, callbacks=[best_model_callback])

# study.best_params

[I 2022-01-15 15:07:38,250] Using an existing study with name 'distilbert-base-german-cased_loss-f1_bs8-16-32_ep2' instead of creating a new one.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=700
params: {'learning_rate': 1.5841336575732623e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0012759610636928096}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,1.5421,1.002562,0.740272,0.703826,0.796412,0.677431,0.704291
230,0.8555,0.662495,0.825875,0.823235,0.846766,0.808833,0.800715
345,0.6303,0.568661,0.831712,0.830955,0.83712,0.837478,0.810011
460,0.5634,0.484695,0.85214,0.851814,0.850074,0.856939,0.830884
575,0.5202,0.448346,0.860895,0.864824,0.864596,0.867329,0.840889
690,0.4606,0.441752,0.857977,0.861155,0.869312,0.855649,0.837579
805,0.4552,0.42188,0.868677,0.865653,0.866656,0.865927,0.849601
920,0.3883,0.40748,0.871595,0.873175,0.877211,0.870309,0.852883
1035,0.3781,0.416235,0.871595,0.871517,0.872104,0.873035,0.853223
1150,0.3609,0.403248,0.878405,0.8788,0.879782,0.878504,0.860689


[I 2022-01-15 15:15:46,398] Trial 2 finished with values: [0.40324777364730835, 0.8788004284023483] and parameters: {'learning_rate': 1.5841336575732623e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0012759610636928096}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 0.00013921866503418376, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.006937397778793355}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,0.9441,0.592739,0.827821,0.822868,0.850284,0.811279,0.803951
114,0.6589,0.50877,0.84144,0.841007,0.855554,0.833603,0.819027
171,0.4945,0.442673,0.857004,0.851598,0.855903,0.854771,0.837236
228,0.4883,0.437384,0.850195,0.850337,0.848005,0.856762,0.828839
285,0.4277,0.417069,0.859922,0.861117,0.860701,0.865723,0.840363
342,0.294,0.381199,0.889105,0.886385,0.890807,0.882892,0.87296
399,0.2883,0.369089,0.893969,0.890636,0.895015,0.888161,0.878585
456,0.2652,0.371665,0.888132,0.888563,0.891101,0.886935,0.871874
513,0.2031,0.387259,0.88716,0.886977,0.886576,0.890186,0.871255
570,0.2341,0.352662,0.894942,0.89286,0.88961,0.896527,0.879753


[I 2022-01-15 15:23:36,411] Trial 3 finished with values: [0.35266202688217163, 0.8928603247918959] and parameters: {'learning_rate': 0.00013921866503418376, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.006937397778793355}.


This is a new besttrial 3


fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 0.00014063364262324643, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.00296704952613227}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,0.9159,0.681232,0.81323,0.810397,0.822192,0.807367,0.786575
462,0.7667,0.697604,0.82393,0.824307,0.847926,0.815153,0.799442
693,0.6514,0.667346,0.827821,0.829677,0.841243,0.833384,0.805941
924,0.6033,0.562319,0.839494,0.830874,0.839803,0.837245,0.817122
1155,0.5791,0.508945,0.826848,0.829239,0.854919,0.817929,0.803763
1386,0.375,0.494879,0.864786,0.860105,0.867653,0.856015,0.845159
1617,0.3695,0.465094,0.88035,0.87521,0.8816,0.872194,0.863141
1848,0.3063,0.517342,0.881323,0.878606,0.884524,0.87537,0.864453
2079,0.2913,0.485435,0.891051,0.890385,0.891929,0.889493,0.875235
2310,0.2921,0.458661,0.892023,0.889141,0.89044,0.889255,0.876468


[I 2022-01-15 15:32:48,203] Trial 4 finished with values: [0.45866096019744873, 0.8891411903283714] and parameters: {'learning_rate': 0.00014063364262324643, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.00296704952613227}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 1.1215224879217934e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.001013396313750041}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.9143,1.567321,0.632296,0.470045,0.554629,0.500064,0.584082
114,1.3385,1.086347,0.7393,0.691241,0.810986,0.673843,0.701631
171,0.9947,0.865754,0.803502,0.795669,0.830489,0.786594,0.777216
228,0.8163,0.71351,0.828794,0.823609,0.840308,0.815818,0.803926
285,0.7181,0.634594,0.839494,0.833317,0.8386,0.832623,0.816346
342,0.6421,0.592506,0.843385,0.84053,0.848495,0.835097,0.820493
399,0.6267,0.572791,0.838521,0.834418,0.840867,0.831228,0.815019
456,0.5739,0.547073,0.840467,0.838174,0.844231,0.833477,0.817143
513,0.5414,0.547708,0.846304,0.842706,0.850227,0.839308,0.824313
570,0.5323,0.532855,0.84144,0.839954,0.846794,0.834864,0.818237



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2022-01-15 15:40:38,096] Trial 5 finished with values: [0.5328550934791565, 0.8399541957806403] and parameters: {'learning_rate': 1.1215224879217934e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.001013396313750041}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=700
params: {'learning_rate': 0.00012097340144023097, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0013319302163508427}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,0.9131,0.620147,0.81323,0.801366,0.832292,0.789106,0.786102
230,0.6784,0.568376,0.835603,0.8314,0.861427,0.816812,0.811873
345,0.53,0.61967,0.807393,0.807788,0.841678,0.809948,0.786653
460,0.4941,0.418288,0.861868,0.857809,0.85937,0.863944,0.842449
575,0.4598,0.435852,0.857977,0.85739,0.858076,0.862403,0.838349
690,0.3076,0.436424,0.873541,0.872449,0.884094,0.864841,0.855305
805,0.3068,0.405524,0.86965,0.858916,0.866832,0.859629,0.85091
920,0.2552,0.40366,0.88716,0.886731,0.887702,0.887639,0.871008
1035,0.234,0.393457,0.891051,0.890699,0.888367,0.894333,0.875424
1150,0.2338,0.375814,0.894942,0.89548,0.893036,0.898721,0.879726


[I 2022-01-15 15:48:49,493] Trial 6 finished with values: [0.37581440806388855, 0.8954802040108044] and parameters: {'learning_rate': 0.00012097340144023097, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0013319302163508427}.


This is a new besttrial 6


fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 1.1460144814510901e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010703820995073172}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.9083,1.554612,0.635214,0.474039,0.55598,0.502979,0.587275
114,1.3261,1.07274,0.742218,0.695923,0.812814,0.678511,0.70504
171,0.982,0.85381,0.804475,0.795927,0.830195,0.787255,0.778345
228,0.8056,0.703818,0.825875,0.82161,0.838722,0.813495,0.80061
285,0.7088,0.626407,0.839494,0.833792,0.838985,0.833178,0.816354
342,0.6339,0.585168,0.845331,0.843825,0.852371,0.837469,0.822694
399,0.6193,0.566043,0.840467,0.837339,0.842099,0.836124,0.817323
456,0.5666,0.540662,0.84144,0.839505,0.845073,0.835135,0.818266
513,0.5342,0.541816,0.846304,0.842706,0.850227,0.839308,0.824313
570,0.5259,0.526739,0.843385,0.841324,0.848237,0.836239,0.820488



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2022-01-15 15:56:54,446] Trial 7 finished with values: [0.5267385840415955, 0.8413238432014317] and parameters: {'learning_rate': 1.1460144814510901e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010703820995073172}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=700
params: {'learning_rate': 6.924723978057984e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0015853679871704336}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,0.9643,0.594847,0.818093,0.809937,0.822885,0.811075,0.792755
230,0.6355,0.488926,0.849222,0.849926,0.864075,0.840637,0.827344
345,0.493,0.596119,0.816148,0.820091,0.842033,0.82678,0.79664
460,0.4917,0.406989,0.873541,0.873864,0.875372,0.874831,0.855252
575,0.4376,0.40311,0.868677,0.871187,0.872031,0.873669,0.850078
690,0.3057,0.426951,0.873541,0.874107,0.881848,0.868906,0.855277
805,0.3194,0.40173,0.876459,0.873599,0.878475,0.872544,0.858622
920,0.2444,0.405606,0.88035,0.882085,0.880814,0.884227,0.863131
1035,0.2465,0.39628,0.884241,0.884186,0.882205,0.886923,0.867517
1150,0.2455,0.385615,0.885214,0.884864,0.883353,0.886959,0.868566


[I 2022-01-15 16:05:06,297] Trial 8 finished with values: [0.3856145441532135, 0.8848639164807881] and parameters: {'learning_rate': 6.924723978057984e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0015853679871704336}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 7.596812815056411e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.003435155217993448}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,0.8774,0.609471,0.829767,0.830092,0.849029,0.819266,0.805201
462,0.6688,0.54811,0.845331,0.84268,0.858415,0.837019,0.823691
693,0.5303,0.631318,0.820039,0.815993,0.841476,0.813093,0.797745
924,0.5255,0.44682,0.859922,0.855572,0.853528,0.864314,0.840176
1155,0.4735,0.462202,0.842412,0.841647,0.847493,0.845776,0.821476
1386,0.3051,0.465213,0.866732,0.866297,0.873106,0.863822,0.847896
1617,0.3484,0.42828,0.876459,0.871571,0.877428,0.868792,0.858548
1848,0.2501,0.442944,0.88716,0.885023,0.884503,0.886009,0.870747
2079,0.242,0.446499,0.890078,0.888442,0.891342,0.886097,0.874038
2310,0.2644,0.436088,0.889105,0.883595,0.883113,0.885346,0.873057


[I 2022-01-15 16:14:06,335] Trial 9 finished with values: [0.42827969789505005, 0.8715708207781042] and parameters: {'learning_rate': 7.596812815056411e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.003435155217993448}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 2.05020111802864e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0024517221217234446}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.6973,1.159347,0.713035,0.619041,0.808942,0.622218,0.67332
114,1.0041,0.765967,0.815175,0.804902,0.846506,0.787803,0.788956
171,0.7071,0.611603,0.833658,0.834069,0.844404,0.834277,0.811158
228,0.6002,0.528855,0.842412,0.841577,0.852439,0.838073,0.819678
285,0.539,0.478806,0.855058,0.855364,0.855346,0.858031,0.834306
342,0.4725,0.453425,0.86284,0.861997,0.871817,0.854758,0.842756
399,0.4751,0.449538,0.863813,0.861339,0.862738,0.863037,0.844215
456,0.4223,0.418842,0.870623,0.869308,0.872045,0.866941,0.851731
513,0.3851,0.432161,0.864786,0.861149,0.864501,0.862482,0.845786
570,0.3954,0.412572,0.871595,0.869262,0.872049,0.86757,0.852871


[I 2022-01-15 16:21:56,612] Trial 10 finished with values: [0.41257205605506897, 0.8692619684282517] and parameters: {'learning_rate': 2.05020111802864e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0024517221217234446}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 1.2704801981181204e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0014895289033024606}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.876,1.488217,0.64786,0.497069,0.562725,0.519361,0.601215
114,1.2644,1.010103,0.763619,0.738391,0.819924,0.71521,0.729564
171,0.9245,0.800784,0.81323,0.808065,0.832156,0.802891,0.788305
228,0.7587,0.663933,0.830739,0.826299,0.838747,0.821767,0.80636
285,0.6685,0.593174,0.844358,0.841154,0.843346,0.841925,0.82199
342,0.5989,0.554682,0.845331,0.843353,0.851415,0.837242,0.822646
399,0.5878,0.540176,0.843385,0.840691,0.843878,0.84101,0.820751
456,0.5366,0.513807,0.848249,0.846576,0.852034,0.842135,0.826067
513,0.5034,0.517153,0.850195,0.846703,0.851656,0.845626,0.828809
570,0.498,0.501137,0.849222,0.848471,0.85343,0.84456,0.82718



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
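This warning comes from scikit-learn. When a class never occurs among the predictions of an evaluation step, its precision is undefined and is counted as 0. Below is a minimal sketch of a `compute_metrics` function that produces the Acc/F1/Precision/Recall/Mcc columns and passes `zero_division=0` explicitly. That yields the same numbers as the default `"warn"` setting but suppresses the warning. It assumes macro averaging and the metric key names `acc`, `f1`, `precision`, `recall` and `mcc` (the Trainer prefixes them with `eval_`, matching the `eval_f1` objective). The notebook's actual implementation is not shown in this section.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    matthews_corrcoef,
    precision_recall_fscore_support,
)


def compute_metrics(eval_pred):
    """Metrics for a HuggingFace Trainer (sketch; key names are assumptions)."""
    # The Trainer passes an EvalPrediction, which unpacks into (predictions, label_ids)
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # zero_division=0 scores a never-predicted class with precision 0, exactly
    # like the default "warn" setting, but without emitting UndefinedMetricWarning
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro", zero_division=0
    )
    return {
        "acc": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
        "mcc": matthews_corrcoef(labels, preds),
    }
```

The function would be handed to the Trainer as `Trainer(..., compute_metrics=compute_metrics)`.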



[I 2022-01-15 16:29:47,565] Trial 11 finished with values: [0.5011371970176697, 0.8484711124934788] and parameters: {'learning_rate': 1.2704801981181204e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0014895289033024606}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 3.0474125288942175e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0022908354717135117}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,1.0424,0.608554,0.821984,0.813164,0.845643,0.794078,0.796014
462,0.643,0.550639,0.838521,0.838372,0.859425,0.8283,0.815268
693,0.5058,0.535378,0.832685,0.835812,0.85691,0.835243,0.812518
924,0.503,0.408455,0.873541,0.870978,0.870921,0.874049,0.855292
1155,0.4491,0.422927,0.853113,0.851753,0.852164,0.856391,0.832638
1386,0.342,0.424317,0.873541,0.87012,0.876779,0.865935,0.855237
1617,0.35,0.415456,0.88035,0.878023,0.882298,0.875076,0.862978
1848,0.2784,0.420441,0.882296,0.879743,0.885872,0.874774,0.865107
2079,0.2777,0.410699,0.891051,0.888092,0.8882,0.888612,0.875278
2310,0.2813,0.407294,0.886187,0.88391,0.885419,0.882787,0.869571


[I 2022-01-15 16:38:48,861] Trial 12 finished with values: [0.40729448199272156, 0.8839101562323708] and parameters: {'learning_rate': 3.0474125288942175e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0022908354717135117}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 2.713573369744954e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.002280287902704474}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,1.0888,0.60851,0.821012,0.814703,0.851201,0.79294,0.794726
462,0.6508,0.550261,0.839494,0.839617,0.85839,0.831477,0.816575
693,0.513,0.520358,0.838521,0.841548,0.859374,0.84229,0.818872
924,0.5012,0.411378,0.874514,0.874073,0.87356,0.876727,0.856367
1155,0.4513,0.422157,0.861868,0.86297,0.863605,0.866504,0.842456
1386,0.3522,0.427412,0.870623,0.866092,0.87493,0.859993,0.851791
1617,0.3578,0.413228,0.877432,0.875968,0.878524,0.87448,0.859652
1848,0.2826,0.427041,0.88035,0.880105,0.885604,0.876117,0.862885
2079,0.2856,0.416877,0.885214,0.882801,0.882487,0.883807,0.868608
2310,0.2917,0.414107,0.88035,0.878164,0.878714,0.878052,0.862909


[I 2022-01-15 16:47:51,777] Trial 13 finished with values: [0.41137784719467163, 0.8740729911962348] and parameters: {'learning_rate': 2.713573369744954e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.002280287902704474}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=700
params: {'learning_rate': 0.00011881550622928129, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.007702261636565342}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,0.9044,0.841432,0.753891,0.728915,0.810031,0.706569,0.724118
230,0.6626,0.549158,0.847276,0.844229,0.864372,0.832145,0.824984
345,0.5324,0.661467,0.809339,0.81411,0.847593,0.815722,0.789376
460,0.5194,0.453587,0.867704,0.864153,0.865035,0.870149,0.849022
575,0.4704,0.435179,0.86284,0.862548,0.856608,0.870946,0.843475
690,0.3136,0.426543,0.873541,0.872116,0.873379,0.871901,0.855213
805,0.3118,0.402001,0.876459,0.867159,0.8744,0.863092,0.858394
920,0.2381,0.426927,0.883268,0.882034,0.879458,0.885017,0.866374
1035,0.2207,0.40562,0.892996,0.893444,0.891562,0.89618,0.877536
1150,0.2404,0.396063,0.892023,0.892994,0.891415,0.89531,0.876386


[I 2022-01-15 16:56:05,335] Trial 14 finished with values: [0.39606335759162903, 0.8929941853464306] and parameters: {'learning_rate': 0.00011881550622928129, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.007702261636565342}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 4.032817587985909e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0026007797026749648}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,0.9631,0.57615,0.835603,0.832082,0.861086,0.813768,0.811719
462,0.6392,0.545739,0.842412,0.84007,0.860629,0.830946,0.819711
693,0.5033,0.55436,0.83463,0.837696,0.859317,0.83914,0.815146
924,0.5134,0.409071,0.875486,0.872061,0.871705,0.875697,0.857597
1155,0.4534,0.433146,0.854086,0.853915,0.85279,0.860809,0.833917
1386,0.3157,0.447363,0.876459,0.876109,0.88121,0.873494,0.858667
1617,0.3496,0.438245,0.876459,0.872376,0.875738,0.871,0.858531
1848,0.2699,0.441131,0.876459,0.875778,0.879589,0.872472,0.85839
2079,0.2557,0.426522,0.883268,0.882148,0.883485,0.881192,0.866263
2310,0.2769,0.424725,0.884241,0.882619,0.884215,0.881448,0.867357


[I 2022-01-15 17:05:07,985] Trial 15 finished with values: [0.4090711176395416, 0.8720614437738435] and parameters: {'learning_rate': 4.032817587985909e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0026007797026749648}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 1.2550276911957455e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.001150460084740775}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.8801,1.496177,0.646887,0.496574,0.562372,0.518699,0.600098
114,1.2709,1.016783,0.761673,0.734069,0.818308,0.711603,0.727327
171,0.9307,0.806484,0.814202,0.809371,0.834235,0.803553,0.789343
228,0.7638,0.668409,0.829767,0.825356,0.837571,0.820979,0.805249
285,0.673,0.596886,0.844358,0.841154,0.843346,0.841925,0.82199
342,0.6028,0.55801,0.845331,0.843353,0.851415,0.837242,0.822646
399,0.5913,0.542884,0.842412,0.839665,0.842911,0.839921,0.819634
456,0.5398,0.516673,0.847276,0.84588,0.851428,0.841399,0.824962
513,0.5069,0.519939,0.851167,0.847621,0.85223,0.846715,0.829899
570,0.5012,0.503927,0.848249,0.84699,0.852298,0.842902,0.826067



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2022-01-15 17:12:59,774] Trial 16 finished with values: [0.50392746925354, 0.8469901270190693] and parameters: {'learning_rate': 1.2550276911957455e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.001150460084740775}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=700
params: {'learning_rate': 1.8449039088393867e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0049310745514131995}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,1.4724,0.912029,0.768482,0.742368,0.807068,0.718377,0.735418
230,0.7982,0.617502,0.837549,0.835026,0.853401,0.82386,0.814203
345,0.5919,0.545129,0.835603,0.837071,0.847076,0.843284,0.815303
460,0.5374,0.46357,0.860895,0.860427,0.855931,0.8679,0.840977
575,0.4958,0.429114,0.867704,0.870297,0.870609,0.872372,0.848718
690,0.4313,0.422684,0.871595,0.871478,0.878083,0.867253,0.853149
805,0.4297,0.407648,0.872568,0.869739,0.871571,0.869592,0.85411
920,0.3616,0.391178,0.878405,0.879025,0.882399,0.876964,0.86073
1035,0.3494,0.401907,0.874514,0.87427,0.874203,0.876496,0.856601
1150,0.3386,0.388998,0.883268,0.8821,0.881592,0.883356,0.866319


[I 2022-01-15 17:21:14,044] Trial 17 finished with values: [0.3889982998371124, 0.8820996622363566] and parameters: {'learning_rate': 1.8449039088393867e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0049310745514131995}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 5.749682854258281e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0023660924514645612}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,0.8939,0.6004,0.828794,0.824385,0.856313,0.808093,0.803863
462,0.6538,0.580582,0.839494,0.838003,0.847957,0.835444,0.817049
693,0.5421,0.600321,0.821984,0.82557,0.847017,0.827275,0.801187
924,0.5119,0.426378,0.874514,0.870095,0.872058,0.8766,0.856978
1155,0.4641,0.44302,0.857004,0.855328,0.858692,0.859035,0.837277
1386,0.3071,0.477775,0.873541,0.869149,0.875813,0.864523,0.855141
1617,0.3536,0.458968,0.875486,0.872621,0.873034,0.873914,0.85755
1848,0.2578,0.469087,0.883268,0.880753,0.884134,0.878021,0.866202
2079,0.2477,0.449516,0.886187,0.883167,0.885603,0.881177,0.869568
2310,0.278,0.442118,0.88716,0.882788,0.882802,0.883331,0.870736


[I 2022-01-15 17:30:15,820] Trial 18 finished with values: [0.42637771368026733, 0.8700953988700026] and parameters: {'learning_rate': 5.749682854258281e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0023660924514645612}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 0.0003355274672670145, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.002987271061042641}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,0.94,0.730445,0.794747,0.783917,0.841615,0.757233,0.769138
114,0.7497,0.606436,0.811284,0.812678,0.813355,0.818257,0.784986
171,0.6176,0.715555,0.756809,0.74578,0.787661,0.750381,0.731361
228,0.571,0.479522,0.849222,0.850794,0.854997,0.85411,0.828456
285,0.5102,0.466681,0.835603,0.834413,0.832388,0.838896,0.812083
342,0.3361,0.450744,0.861868,0.859386,0.866455,0.855456,0.842024
399,0.3083,0.405104,0.867704,0.869015,0.87447,0.865403,0.848584
456,0.2672,0.414945,0.882296,0.881,0.882005,0.88226,0.865542
513,0.2393,0.395568,0.874514,0.87637,0.875602,0.879081,0.85665
570,0.2437,0.373773,0.890078,0.891184,0.892715,0.890594,0.874195


[I 2022-01-15 17:38:05,112] Trial 19 finished with values: [0.37377315759658813, 0.8911836868441486] and parameters: {'learning_rate': 0.0003355274672670145, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.002987271061042641}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 0.00013256304149500438, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.004967929661974403}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,0.9053,0.616256,0.815175,0.807995,0.827515,0.807081,0.789181
462,0.7617,0.934078,0.752918,0.741419,0.792525,0.730469,0.722784
693,0.6061,0.57947,0.833658,0.835201,0.836996,0.85196,0.813477
924,0.5797,0.505464,0.847276,0.837812,0.84211,0.850603,0.826551
1155,0.5209,0.476141,0.855058,0.849523,0.85578,0.850913,0.83475
1386,0.3317,0.439108,0.875486,0.872029,0.875612,0.87133,0.857842
1617,0.3619,0.49035,0.867704,0.863087,0.882998,0.851281,0.849114
1848,0.2801,0.504467,0.883268,0.881526,0.88402,0.880967,0.866537
2079,0.2773,0.471403,0.888132,0.886829,0.891166,0.883619,0.871812
2310,0.276,0.430787,0.892023,0.891275,0.891063,0.892695,0.876368


[I 2022-01-15 17:47:05,692] Trial 20 finished with values: [0.43078696727752686, 0.8912745499821985] and parameters: {'learning_rate': 0.00013256304149500438, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.004967929661974403}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 0.000172238623504376, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010268870328223733}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,0.9151,0.7012,0.789883,0.780546,0.829865,0.76376,0.762472
114,0.6704,0.568634,0.817121,0.817396,0.839618,0.810888,0.793032
171,0.529,0.439198,0.86284,0.85894,0.859749,0.864707,0.844031
228,0.5012,0.416514,0.864786,0.863048,0.857093,0.87452,0.845862
285,0.4353,0.414534,0.855058,0.858709,0.858823,0.862271,0.834672
342,0.2906,0.425467,0.865759,0.863748,0.871769,0.859499,0.846709
399,0.2867,0.399118,0.875486,0.873859,0.889797,0.863585,0.857815
456,0.2552,0.377181,0.886187,0.887519,0.889909,0.886347,0.869771
513,0.2062,0.380009,0.877432,0.880341,0.881235,0.882025,0.860038
570,0.21,0.355923,0.892023,0.893,0.891272,0.895258,0.876443


[I 2022-01-15 17:54:56,685] Trial 21 finished with values: [0.3559230864048004, 0.8930002200433143] and parameters: {'learning_rate': 0.000172238623504376, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010268870328223733}.


This is a new best trial: 21
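Optuna is optimising two objectives here (minimise `eval_loss`, maximise `eval_f1`), so in general there is no single best trial, only a Pareto front of non-dominated trials. For multi-objective studies Optuna exposes that front as `study.best_trials`. The reported values appear to come from the evaluation step with the lowest validation loss: trial 9, for example, reports 0.428280 / 0.871571 from step 1617 rather than from its final step. A minimal plain-Python sketch of the dominance check over trials 9 to 21 (values rounded from the log) shows that trial 21 beats every earlier trial shown on both objectives. This is independent of whatever criterion the notebook's own "new best" message uses.

```python
# (eval_loss, eval_f1) reported by Optuna for trials 9-21, rounded from the log above
TRIAL_VALUES = {
    9: (0.428280, 0.871571),
    10: (0.412572, 0.869262),
    11: (0.501137, 0.848471),
    12: (0.407294, 0.883910),
    13: (0.411378, 0.874073),
    14: (0.396063, 0.892994),
    15: (0.409071, 0.872061),
    16: (0.503927, 0.846990),
    17: (0.388998, 0.882100),
    18: (0.426378, 0.870095),
    19: (0.373773, 0.891184),
    20: (0.430787, 0.891275),
    21: (0.355923, 0.893000),
}


def dominates(a, b):
    """True if trial a is no worse than b on both objectives and better on at least one.

    Objective 0 (eval_loss) is minimised, objective 1 (eval_f1) is maximised.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(values):
    """Sorted trial numbers that no other trial dominates."""
    return sorted(
        t
        for t, v in values.items()
        if not any(dominates(w, v) for s, w in values.items() if s != t)
    )


print(pareto_front(TRIAL_VALUES))  # prints [21]
```

Without trial 21 the front would contain two trials, 14 (higher F1) and 19 (lower loss). That is the usual trade-off case, in which neither trial is strictly better.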


fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 0.00020044673810961543, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0018181021763725067}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,0.9322,0.683724,0.785992,0.78754,0.824484,0.780037,0.759157
114,0.6861,0.56861,0.822957,0.829179,0.846525,0.827148,0.799942
171,0.5238,0.492227,0.849222,0.848602,0.856158,0.852179,0.829142
228,0.5171,0.427252,0.859922,0.86304,0.860407,0.868289,0.840075
285,0.437,0.431562,0.85214,0.856351,0.862184,0.858154,0.831978
342,0.3012,0.436321,0.874514,0.872154,0.874422,0.87351,0.856815
399,0.3176,0.39442,0.874514,0.870545,0.878218,0.870189,0.856909
456,0.26,0.397601,0.878405,0.878759,0.884019,0.876387,0.861072
513,0.2297,0.377479,0.882296,0.884438,0.882903,0.888388,0.865596
570,0.2365,0.364962,0.893969,0.892,0.891834,0.893483,0.878659


[I 2022-01-15 18:03:02,635] Trial 22 finished with values: [0.3649615943431854, 0.8919997097201267] and parameters: {'learning_rate': 0.00020044673810961543, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0018181021763725067}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 0.00032246164665402163, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.005347133411657606}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,1.544,1.643452,0.354086,0.255534,0.272384,0.312795,0.332284
462,1.6536,1.843878,0.270428,0.166358,0.219502,0.227055,0.247623
693,2.0036,2.177122,0.116732,0.023229,0.01297,0.111111,0.0
924,2.1477,2.13927,0.163424,0.031215,0.018158,0.111111,0.0
1155,2.1306,2.122054,0.163424,0.031215,0.018158,0.111111,0.0
1386,2.1327,2.124609,0.163424,0.031215,0.018158,0.111111,0.0
1617,2.1297,2.120152,0.163424,0.031215,0.018158,0.111111,0.0
1848,2.1106,2.120849,0.163424,0.031215,0.018158,0.111111,0.0
2079,2.1256,2.120331,0.163424,0.031215,0.018158,0.111111,0.0
2310,2.1261,2.119646,0.163424,0.031215,0.018158,0.111111,0.0



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2022-01-15 18:11:56,372] Trial 23 finished with values: [1.6434515714645386, 0.25553365022421176] and parameters: {'learning_rate': 0.00032246164665402163, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.005347133411657606}.
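Trial 23 collapsed to predicting a single class. From step 924 onward, recall sits at 0.111111 = 1/9 for the nine 10kGNAD categories, MCC is 0, and the precision warning fires because eight classes are never predicted. In that state the macro scores follow from the accuracy alone: precision = acc/9 and F1 = 2·acc/(1 + acc)/9. The plain-Python sketch below reproduces the logged values. It assumes 1028 evaluation articles, 168 of which belong to the predicted class (168/1028 = 0.163424, the logged accuracy). The class index 0 and the spread of the remaining labels are hypothetical and do not change the result as long as every class occurs.

```python
def macro_scores(y_true, y_pred, n_classes):
    """Accuracy and macro-averaged precision/recall/F1; undefined ratios count as 0."""
    pairs = list(zip(y_true, y_pred))
    acc = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        n_pred = sum(p == c for _, p in pairs)
        n_true = sum(t == c for t, _ in pairs)
        # A class that is never predicted has undefined precision -> 0 (the logged warning)
        prec = tp / n_pred if n_pred else 0.0
        rec = tp / n_true if n_true else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return (
        acc,
        sum(precisions) / n_classes,
        sum(recalls) / n_classes,
        sum(f1s) / n_classes,
    )


# Hypothetical labels: 168 of 1028 articles in class 0, the other 860 spread over classes 1-8
y_true = [0] * 168 + [1 + i % 8 for i in range(860)]
y_pred = [0] * len(y_true)  # the collapsed model always predicts class 0

acc, precision, recall, f1 = macro_scores(y_true, y_pred, n_classes=9)
print(f"acc={acc:.6f} f1={f1:.6f} precision={precision:.6f} recall={recall:.6f}")
# prints acc=0.163424 f1=0.031215 precision=0.018158 recall=0.111111
```

These match the trial 23 rows from step 924 on. The run that starts after trial 33 (learning rate 3.68e-4, batch size 8) ends in the same state.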
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 4.760541377079855e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.001090572094988277}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,0.9287,0.622772,0.816148,0.803434,0.832459,0.78996,0.789659
462,0.6317,0.529448,0.851167,0.847798,0.863244,0.841721,0.829586
693,0.5051,0.540765,0.84144,0.841627,0.860558,0.839824,0.821463
924,0.5164,0.407167,0.876459,0.873398,0.870491,0.880775,0.858943
1155,0.4588,0.435573,0.846304,0.845538,0.845965,0.852785,0.825315
1386,0.3173,0.433326,0.88035,0.879842,0.882986,0.879301,0.863254
1617,0.3451,0.423613,0.875486,0.871147,0.877377,0.868151,0.85761
1848,0.2667,0.424348,0.878405,0.87697,0.878864,0.875498,0.860678
2079,0.2485,0.414805,0.890078,0.888397,0.887583,0.889376,0.874075
2310,0.2684,0.412844,0.888132,0.887359,0.887615,0.887529,0.871838


[I 2022-01-15 18:20:57,986] Trial 24 finished with values: [0.40716665983200073, 0.8733980821080762] and parameters: {'learning_rate': 4.760541377079855e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.001090572094988277}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 1.083703964439121e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.004508058716933232}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.9236,1.587157,0.628405,0.465702,0.560322,0.496298,0.579845
114,1.3571,1.109181,0.73249,0.676031,0.807331,0.662351,0.693847
171,1.0137,0.882955,0.801556,0.792182,0.830066,0.782859,0.775077
228,0.8332,0.729481,0.824903,0.818719,0.837725,0.8096,0.799448
285,0.733,0.647767,0.835603,0.830773,0.836727,0.829776,0.811903
342,0.6549,0.604157,0.84144,0.839062,0.847485,0.833093,0.818218
399,0.6379,0.583692,0.838521,0.832991,0.839458,0.829832,0.815027
456,0.5853,0.557259,0.840467,0.837234,0.844141,0.831653,0.817078
513,0.5526,0.556954,0.843385,0.839249,0.847559,0.835256,0.820956
570,0.5428,0.542699,0.84144,0.839954,0.846794,0.834864,0.818237



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2022-01-15 18:28:50,104] Trial 25 finished with values: [0.5426994562149048, 0.8399541957806403] and parameters: {'learning_rate': 1.083703964439121e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.004508058716933232}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 2.1419267534062015e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.006235933483089628}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.6778,1.127669,0.724708,0.641461,0.814624,0.639648,0.686553
114,0.9821,0.747526,0.816148,0.807007,0.847752,0.790006,0.790068
171,0.6921,0.59887,0.838521,0.839129,0.849754,0.838999,0.816653
228,0.5895,0.52098,0.845331,0.844256,0.854495,0.841602,0.823027
285,0.5314,0.471814,0.857977,0.858783,0.858646,0.861139,0.837603
342,0.4634,0.447169,0.865759,0.86522,0.875016,0.85794,0.846087
399,0.4668,0.444335,0.865759,0.862464,0.864746,0.863113,0.846384
456,0.4144,0.413282,0.868677,0.868232,0.870683,0.866297,0.849531
513,0.3773,0.427342,0.863813,0.860152,0.863427,0.861694,0.844707
570,0.3876,0.407264,0.872568,0.869697,0.872064,0.868232,0.853984


[I 2022-01-15 18:36:42,052] Trial 26 finished with values: [0.40726438164711, 0.8696965843950681] and parameters: {'learning_rate': 2.1419267534062015e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.006235933483089628}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 2.152701219734218e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0022509707331832057}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.6759,1.125436,0.727626,0.647813,0.808705,0.644044,0.689783
114,0.9793,0.74467,0.817121,0.80868,0.84612,0.793522,0.791154
171,0.6896,0.597855,0.839494,0.839799,0.850503,0.83966,0.817779
228,0.5875,0.519978,0.845331,0.843755,0.853953,0.841527,0.823077
285,0.5298,0.471038,0.857004,0.857641,0.857043,0.860403,0.836505
342,0.462,0.446795,0.865759,0.86522,0.875016,0.85794,0.846087
399,0.4658,0.44402,0.865759,0.862464,0.864746,0.863113,0.846384
456,0.4131,0.412773,0.86965,0.868937,0.871406,0.866958,0.850637
513,0.3762,0.427063,0.86284,0.859483,0.862749,0.861033,0.843592
570,0.3869,0.406827,0.86965,0.866842,0.868722,0.86582,0.850634


[I 2022-01-15 18:44:34,059] Trial 27 finished with values: [0.4068266451358795, 0.8668419143133054] and parameters: {'learning_rate': 2.152701219734218e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0022509707331832057}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 5.4633369842517265e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0011929246103941017}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,0.903,0.646402,0.816148,0.808256,0.843595,0.789894,0.789773
462,0.6588,0.614913,0.825875,0.823903,0.842189,0.819458,0.802158
693,0.5206,0.558496,0.830739,0.832902,0.848873,0.835623,0.809949
924,0.5037,0.421128,0.875486,0.873855,0.873557,0.877806,0.857633
1155,0.4527,0.460451,0.853113,0.850697,0.854393,0.855916,0.833199
1386,0.3133,0.468755,0.874514,0.87128,0.877987,0.868359,0.856721
1617,0.3456,0.435139,0.883268,0.878851,0.881192,0.87903,0.866453
1848,0.2617,0.452224,0.881323,0.87972,0.882396,0.877866,0.864034
2079,0.2442,0.435129,0.891051,0.890913,0.891023,0.890871,0.875155
2310,0.2618,0.425438,0.892023,0.890258,0.889595,0.891361,0.876359


[I 2022-01-15 18:53:36,190] Trial 28 finished with values: [0.42112767696380615, 0.8738545203062507] and parameters: {'learning_rate': 5.4633369842517265e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0011929246103941017}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 1.4685466002872386e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.00199180792849344}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.8275,1.390078,0.670233,0.542478,0.690773,0.55586,0.625933
114,1.1853,0.93089,0.782101,0.766623,0.830727,0.742826,0.750808
171,0.8505,0.732326,0.82393,0.820615,0.837278,0.817742,0.800218
228,0.7006,0.613637,0.836576,0.832137,0.842906,0.828298,0.813059
285,0.621,0.552259,0.84144,0.842554,0.844123,0.843829,0.818679
342,0.5555,0.517539,0.850195,0.846487,0.854737,0.840009,0.82824
399,0.5489,0.507413,0.842412,0.839568,0.841706,0.841317,0.819719
456,0.4971,0.479553,0.854086,0.853613,0.858192,0.84985,0.832769
513,0.463,0.486403,0.853113,0.849945,0.854257,0.849822,0.832251
570,0.4635,0.469061,0.850195,0.84885,0.854871,0.844135,0.828273



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2022-01-15 19:01:27,769] Trial 29 finished with values: [0.4690606892108917, 0.8488503305804552] and parameters: {'learning_rate': 1.4685466002872386e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.00199180792849344}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 3.1847163973100386e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0011189452385178128}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.4966,0.885745,0.786965,0.768499,0.832201,0.744819,0.756285
114,0.8147,0.61755,0.826848,0.821434,0.844926,0.809867,0.801975
171,0.581,0.516845,0.84144,0.838685,0.8473,0.839862,0.819941
228,0.5185,0.462976,0.858949,0.858285,0.864527,0.858423,0.838811
285,0.4686,0.432187,0.864786,0.866684,0.863958,0.872129,0.845572
342,0.3909,0.416823,0.870623,0.871399,0.881139,0.864517,0.851923
399,0.402,0.407278,0.864786,0.862588,0.866798,0.86132,0.845329
456,0.347,0.380765,0.876459,0.874951,0.878555,0.872869,0.858439
513,0.3119,0.400832,0.870623,0.868573,0.871438,0.870556,0.852573
570,0.3275,0.374927,0.88035,0.8774,0.879092,0.876488,0.862898


[I 2022-01-15 19:09:19,402] Trial 30 finished with values: [0.3749266266822815, 0.8773995537145849] and parameters: {'learning_rate': 3.1847163973100386e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0011189452385178128}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 2.5967960105621963e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0011191875225470549}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.594,1.011364,0.756809,0.710316,0.82267,0.692102,0.722195
114,0.8958,0.681089,0.819066,0.811484,0.843386,0.797167,0.793202
171,0.6319,0.552899,0.84144,0.83925,0.848088,0.841287,0.820076
228,0.5509,0.488491,0.857977,0.855459,0.861905,0.855614,0.837603
285,0.4973,0.447965,0.858949,0.860826,0.861236,0.86285,0.838775
342,0.4252,0.428985,0.870623,0.870285,0.880672,0.862251,0.851675
399,0.4331,0.425012,0.86284,0.860717,0.86407,0.859569,0.843013
456,0.3813,0.394524,0.875486,0.874849,0.879192,0.871429,0.857285
513,0.3439,0.411852,0.866732,0.86292,0.867343,0.863905,0.848133
570,0.3563,0.389227,0.873541,0.871033,0.874307,0.868747,0.855062


[I 2022-01-15 19:17:10,834] Trial 31 finished with values: [0.3892270028591156, 0.8710325129072138] and parameters: {'learning_rate': 2.5967960105621963e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0011191875225470549}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 1.2048980331680778e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0018551872089604595}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.893,1.522434,0.642996,0.488095,0.557354,0.512483,0.595686
114,1.2959,1.042003,0.756809,0.723501,0.81873,0.703414,0.72176
171,0.9534,0.827569,0.811284,0.805427,0.834824,0.797869,0.786005
228,0.7818,0.684922,0.830739,0.827068,0.840641,0.821114,0.806309
285,0.6887,0.610344,0.838521,0.833292,0.837899,0.83326,0.815353
342,0.6161,0.57015,0.842412,0.840557,0.849291,0.834112,0.81931
399,0.6034,0.552866,0.84144,0.836989,0.840684,0.836837,0.818498
456,0.5518,0.527424,0.847276,0.84588,0.851428,0.841399,0.824962
513,0.5191,0.529723,0.849222,0.845889,0.852658,0.842896,0.827624
570,0.5122,0.514074,0.846304,0.844334,0.85073,0.839585,0.823834



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2022-01-15 19:25:02,705] Trial 32 finished with values: [0.514074444770813, 0.8443344877096152] and parameters: {'learning_rate': 1.2048980331680778e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0018551872089604595}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 6.351197562683082e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.007977327790294097}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,0.8845,0.619292,0.828794,0.827044,0.849025,0.813808,0.804208
462,0.6608,0.587462,0.837549,0.830756,0.85139,0.821148,0.814286
693,0.529,0.536484,0.84144,0.840628,0.856681,0.839483,0.821111
924,0.5185,0.439193,0.861868,0.853326,0.849079,0.86753,0.842962
1155,0.4682,0.457888,0.849222,0.846033,0.85166,0.850377,0.828668
1386,0.309,0.500009,0.864786,0.862047,0.868205,0.860671,0.845668
1617,0.3443,0.42466,0.883268,0.879783,0.880671,0.879873,0.866351
1848,0.2571,0.454368,0.884241,0.881963,0.884789,0.880178,0.867402
2079,0.2327,0.446791,0.886187,0.883656,0.882976,0.884661,0.869652
2310,0.2737,0.436206,0.886187,0.883035,0.881196,0.885384,0.869675


[I 2022-01-15 19:34:05,989] Trial 33 finished with values: [0.4246603548526764, 0.8797830172290745] and parameters: {'learning_rate': 6.351197562683082e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.007977327790294097}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 0.0003677108673523658, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0037933405191830382}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,1.7922,1.950184,0.407588,0.260351,0.247627,0.322383,0.359141
462,2.1184,2.126381,0.163424,0.031215,0.018158,0.111111,0.0
693,2.1385,2.149251,0.116732,0.023229,0.01297,0.111111,0.0
924,2.1469,2.13982,0.163424,0.031215,0.018158,0.111111,0.0
1155,2.1317,2.121396,0.163424,0.031215,0.018158,0.111111,0.0
1386,2.1303,2.122501,0.146887,0.028461,0.016321,0.111111,0.0
1617,2.1308,2.121039,0.163424,0.031215,0.018158,0.111111,0.0
1848,2.1089,2.120683,0.163424,0.031215,0.018158,0.111111,0.0
2079,2.1266,2.120622,0.163424,0.031215,0.018158,0.111111,0.0
2310,2.1283,2.119699,0.163424,0.031215,0.018158,0.111111,0.0



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2022-01-15 19:42:59,810] Trial 34 finished with values: [1.950183629989624, 0.260351105624894] and parameters: {'learning_rate': 0.0003677108673523658, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0037933405191830382}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=700
params: {'learning_rate': 0.00031593983477889575, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.001219998614440266}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,1.0289,0.882733,0.7607,0.720182,0.774615,0.730294,0.728672
230,0.8538,0.722972,0.812257,0.814176,0.823513,0.809805,0.785383
345,0.7982,0.727318,0.801556,0.789356,0.791362,0.808126,0.776591
460,0.7411,0.60669,0.816148,0.808047,0.804118,0.826672,0.791245
575,0.7251,0.670443,0.801556,0.784376,0.811423,0.791296,0.776312
690,0.4639,0.687381,0.828794,0.813418,0.835409,0.806764,0.804634
805,0.464,0.53046,0.853113,0.849024,0.861955,0.845241,0.832578
920,0.3604,0.580701,0.840467,0.834678,0.850517,0.828296,0.818269
1035,0.3594,0.460575,0.866732,0.865113,0.867799,0.863738,0.847448
1150,0.349,0.44499,0.86965,0.8693,0.868403,0.871055,0.850761


[I 2022-01-15 19:51:10,955] Trial 35 finished with values: [0.4449896812438965, 0.8692995446928214] and parameters: {'learning_rate': 0.00031593983477889575, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.001219998614440266}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 0.000256147424345981, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.00761271716050967}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,1.1,1.267265,0.594358,0.491476,0.648555,0.561822,0.552591
462,1.02,0.787246,0.788911,0.763502,0.803556,0.754618,0.759686
693,0.8564,1.096997,0.726654,0.709583,0.778743,0.726428,0.70197
924,0.8683,0.707941,0.828794,0.818175,0.824273,0.81561,0.804099
1155,0.7127,0.629557,0.82393,0.816507,0.852908,0.804695,0.799372
1386,0.5725,0.739032,0.821984,0.805944,0.823808,0.815344,0.799027
1617,0.5607,0.73747,0.829767,0.812012,0.859871,0.791778,0.806255
1848,0.4376,0.602457,0.865759,0.860939,0.870229,0.855164,0.846385
2079,0.4181,0.519117,0.879377,0.87513,0.875077,0.876457,0.861944
2310,0.4342,0.50201,0.881323,0.87902,0.8821,0.877762,0.864228


[I 2022-01-15 20:00:09,383] Trial 36 finished with values: [0.5020099878311157, 0.8790195874714518] and parameters: {'learning_rate': 0.000256147424345981, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.00761271716050967}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=700
params: {'learning_rate': 0.00013769914151498766, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0072034774212240355}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,0.8979,0.660486,0.814202,0.804454,0.830171,0.793669,0.788105
230,0.6834,0.549637,0.839494,0.83648,0.863593,0.8203,0.816098
345,0.5172,0.708106,0.804475,0.803913,0.829679,0.802633,0.78124
460,0.5278,0.430301,0.863813,0.865839,0.869416,0.866788,0.844832
575,0.4538,0.446196,0.856031,0.858991,0.863366,0.861067,0.836344
690,0.3166,0.461163,0.872568,0.869078,0.877946,0.863279,0.854077
805,0.3082,0.410681,0.885214,0.876562,0.881132,0.875102,0.868541
920,0.2408,0.42447,0.886187,0.886525,0.885351,0.88962,0.869996
1035,0.2267,0.40025,0.890078,0.890078,0.891415,0.889947,0.874281
1150,0.237,0.387753,0.890078,0.886138,0.885272,0.888548,0.874199


[I 2022-01-15 20:08:24,110] Trial 37 finished with values: [0.3877527713775635, 0.8861382268524379] and parameters: {'learning_rate': 0.00013769914151498766, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0072034774212240355}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=700
params: {'learning_rate': 0.00010015213255870536, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.001099418066261719}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,0.9038,0.558834,0.831712,0.824398,0.839711,0.819332,0.807343
230,0.659,0.547987,0.835603,0.834915,0.866644,0.817532,0.811964
345,0.5196,0.57971,0.81323,0.815146,0.841453,0.818552,0.792941
460,0.5041,0.431107,0.865759,0.862954,0.862445,0.870996,0.84679
575,0.4475,0.429648,0.863813,0.862441,0.867479,0.863834,0.844879
690,0.2867,0.467448,0.863813,0.861207,0.869443,0.856281,0.844353
805,0.3192,0.406677,0.885214,0.882346,0.885593,0.881984,0.868655
920,0.2414,0.40951,0.881323,0.880979,0.881122,0.882299,0.864305
1035,0.232,0.39987,0.884241,0.883428,0.882091,0.886878,0.867771
1150,0.2346,0.377444,0.892023,0.891124,0.887718,0.895326,0.876444


[I 2022-01-15 20:16:39,162] Trial 38 finished with values: [0.37744390964508057, 0.8911237125335929] and parameters: {'learning_rate': 0.00010015213255870536, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.001099418066261719}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 2.8308697844372543e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.003617725521752757}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.5531,0.962958,0.764591,0.721998,0.818902,0.699246,0.730571
114,0.8606,0.652388,0.824903,0.818169,0.846866,0.805262,0.79993
171,0.6104,0.535856,0.840467,0.837412,0.845669,0.839629,0.818995
228,0.5373,0.477606,0.85214,0.849908,0.857245,0.849945,0.830982
285,0.4848,0.441251,0.865759,0.8679,0.866596,0.871846,0.846599
342,0.4102,0.424407,0.86965,0.870384,0.880046,0.863261,0.850652
399,0.4196,0.419295,0.863813,0.862131,0.868085,0.859262,0.8442
456,0.3671,0.38872,0.875486,0.874676,0.878983,0.87125,0.857271
513,0.3306,0.406048,0.86965,0.865519,0.867825,0.868056,0.851462
570,0.3436,0.382956,0.873541,0.871465,0.874274,0.869443,0.855066


[I 2022-01-15 20:24:31,912] Trial 39 finished with values: [0.3829563558101654, 0.8714652198717863] and parameters: {'learning_rate': 2.8308697844372543e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.003617725521752757}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 3.156553455846164e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.007850001510971899}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.5007,0.888741,0.785992,0.767644,0.829408,0.744232,0.75505
114,0.8191,0.621745,0.822957,0.818379,0.842398,0.806592,0.797608
171,0.5836,0.518237,0.84144,0.839767,0.849922,0.84029,0.820075
228,0.5195,0.464458,0.857004,0.855115,0.861216,0.85575,0.836584
285,0.4698,0.432379,0.864786,0.866684,0.863958,0.872129,0.845572
342,0.392,0.417535,0.868677,0.869876,0.879884,0.862941,0.849731
399,0.4034,0.408368,0.865759,0.864068,0.867563,0.863377,0.846445
456,0.3485,0.381761,0.875486,0.872644,0.875662,0.870811,0.857316
513,0.313,0.400949,0.872568,0.870275,0.87279,0.872132,0.854725
570,0.3285,0.375761,0.879377,0.876698,0.87845,0.875752,0.861786


[I 2022-01-15 20:32:27,445] Trial 40 finished with values: [0.3757607936859131, 0.8766981943741903] and parameters: {'learning_rate': 3.156553455846164e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.007850001510971899}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=700
params: {'learning_rate': 0.00019711341917185828, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.00434606717303551}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,0.9015,0.665829,0.789883,0.775314,0.795842,0.778472,0.760829
230,0.7307,0.594342,0.836576,0.831359,0.840064,0.827125,0.813189
345,0.5522,0.670097,0.800584,0.802674,0.830056,0.803608,0.778052
460,0.5349,0.508813,0.836576,0.82569,0.836,0.836556,0.814229
575,0.5068,0.456953,0.849222,0.845919,0.858457,0.844604,0.828448
690,0.3228,0.495104,0.863813,0.85961,0.870978,0.852103,0.844212
805,0.3074,0.432666,0.857977,0.843262,0.85689,0.842675,0.837752
920,0.2456,0.445056,0.873541,0.872686,0.872445,0.874818,0.855492
1035,0.2425,0.408766,0.879377,0.875489,0.874526,0.878036,0.862114
1150,0.2164,0.394392,0.892023,0.887051,0.885896,0.890121,0.876523


[I 2022-01-15 20:40:41,694] Trial 41 finished with values: [0.39439237117767334, 0.8870514994307198] and parameters: {'learning_rate': 0.00019711341917185828, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.00434606717303551}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=700
params: {'learning_rate': 2.3479128801130728e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0013700440605169657}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,1.3582,0.787339,0.800584,0.782926,0.830783,0.760313,0.771839
230,0.7278,0.560475,0.842412,0.841,0.858953,0.830222,0.819896
345,0.5407,0.541202,0.82393,0.83147,0.848375,0.837175,0.803882
460,0.5089,0.441246,0.865759,0.864196,0.859125,0.871541,0.846476
575,0.4674,0.417371,0.865759,0.868218,0.869651,0.87122,0.846874
690,0.3905,0.409215,0.876459,0.877215,0.880974,0.875448,0.858655
805,0.3972,0.399932,0.866732,0.864935,0.871194,0.861502,0.847444
920,0.3235,0.379454,0.878405,0.880622,0.884996,0.877776,0.860672
1035,0.3116,0.394314,0.876459,0.876772,0.877681,0.879188,0.859042
1150,0.3104,0.379236,0.876459,0.876965,0.879993,0.874807,0.858433


[I 2022-01-15 20:48:58,264] Trial 42 finished with values: [0.3792363107204437, 0.8769651192212583] and parameters: {'learning_rate': 2.3479128801130728e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0013700440605169657}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 6.783683409239324e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.002820596819365308}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,0.8835,0.628293,0.826848,0.819797,0.848,0.806906,0.801973
462,0.6708,0.518261,0.85214,0.845236,0.855097,0.841501,0.830739
693,0.5181,0.601773,0.829767,0.822934,0.847525,0.815415,0.806885
924,0.5256,0.461832,0.857004,0.850237,0.848747,0.861417,0.837348
1155,0.4667,0.451528,0.851167,0.851859,0.854591,0.855788,0.830715
1386,0.3105,0.504191,0.858949,0.858901,0.87286,0.851167,0.839138
1617,0.3472,0.452528,0.882296,0.87851,0.87964,0.878708,0.865304
1848,0.2618,0.469258,0.883268,0.880453,0.88313,0.878038,0.866184
2079,0.2428,0.459104,0.888132,0.88587,0.886435,0.886022,0.871901
2310,0.2563,0.456497,0.88716,0.883895,0.884641,0.883733,0.870728


[I 2022-01-15 20:58:01,937] Trial 43 finished with values: [0.4515276551246643, 0.8518589789852986] and parameters: {'learning_rate': 6.783683409239324e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.002820596819365308}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 4.977485338291715e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0025767844011602447}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,0.9207,0.607538,0.820039,0.813893,0.83979,0.798992,0.793986
462,0.6518,0.537461,0.848249,0.848092,0.861521,0.84068,0.826605
693,0.5124,0.546819,0.832685,0.833716,0.856212,0.830085,0.811596
924,0.5149,0.426686,0.870623,0.865718,0.860677,0.875906,0.852374
1155,0.4607,0.456704,0.847276,0.845073,0.851138,0.84797,0.826578
1386,0.3153,0.461862,0.874514,0.872534,0.884507,0.865046,0.856719
1617,0.3472,0.443765,0.875486,0.871244,0.879136,0.866823,0.857541
1848,0.2726,0.451167,0.881323,0.877243,0.880679,0.874374,0.864003
2079,0.2527,0.436894,0.888132,0.885457,0.885397,0.886011,0.871855
2310,0.2709,0.431388,0.884241,0.881755,0.881011,0.882844,0.867402


[I 2022-01-15 21:07:05,650] Trial 44 finished with values: [0.42668649554252625, 0.8657177816120842] and parameters: {'learning_rate': 4.977485338291715e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0025767844011602447}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=700
params: {'learning_rate': 0.00011154131574157555, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.003862071128417358}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,0.9158,0.616075,0.817121,0.812547,0.82485,0.820014,0.792792
230,0.6612,0.501558,0.847276,0.846967,0.863158,0.837068,0.82497
345,0.5248,0.644376,0.798638,0.8046,0.833152,0.811855,0.779029
460,0.5068,0.410746,0.866732,0.867299,0.864606,0.872355,0.847601
575,0.4581,0.407682,0.86965,0.868682,0.863664,0.877388,0.851331
690,0.2952,0.459798,0.868677,0.868321,0.876638,0.865433,0.850161
805,0.3102,0.402271,0.879377,0.874842,0.877056,0.875078,0.861863
920,0.2453,0.410862,0.883268,0.885438,0.885933,0.88647,0.866527
1035,0.2304,0.409196,0.889105,0.889256,0.888062,0.892279,0.873306
1150,0.2302,0.382799,0.890078,0.888017,0.886188,0.890852,0.87417


[I 2022-01-15 21:15:20,914] Trial 45 finished with values: [0.38279905915260315, 0.8880169915853844] and parameters: {'learning_rate': 0.00011154131574157555, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.003862071128417358}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 0.00016978342703561063, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.003446054733435517}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,0.9442,0.705407,0.804475,0.786093,0.820149,0.778965,0.776384
462,0.8161,0.667948,0.827821,0.819172,0.834031,0.810867,0.803055
693,0.6426,0.607259,0.839494,0.832888,0.848497,0.831087,0.81823
924,0.6219,0.575584,0.827821,0.818122,0.819532,0.836359,0.804712
1155,0.5347,0.482022,0.850195,0.847625,0.853475,0.846673,0.829321
1386,0.4046,0.566252,0.846304,0.844311,0.869328,0.833512,0.825724
1617,0.3757,0.55544,0.855058,0.844151,0.859084,0.839087,0.834677
1848,0.283,0.558895,0.877432,0.87216,0.874243,0.871302,0.859688
2079,0.3013,0.522092,0.882296,0.875256,0.877163,0.874343,0.865128
2310,0.2992,0.492093,0.883268,0.877539,0.877159,0.879505,0.866345


[I 2022-01-15 21:24:23,062] Trial 46 finished with values: [0.4820215106010437, 0.8476252807930564] and parameters: {'learning_rate': 0.00016978342703561063, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.003446054733435517}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 3.3780419094054394e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0018251677024837111}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.4674,0.856522,0.790856,0.77423,0.835616,0.749999,0.760807
114,0.7929,0.597949,0.829767,0.824545,0.846315,0.813425,0.805194
171,0.5678,0.510513,0.84144,0.837934,0.846436,0.840417,0.820205
228,0.5104,0.456519,0.855058,0.853137,0.860651,0.853444,0.834453
285,0.4627,0.428997,0.86284,0.865798,0.862796,0.871819,0.843414
342,0.3828,0.413817,0.868677,0.86847,0.877193,0.862588,0.849686
399,0.3931,0.401852,0.864786,0.862953,0.867315,0.86132,0.845302
456,0.3388,0.376877,0.877432,0.87433,0.877888,0.872209,0.859545
513,0.3021,0.398117,0.871595,0.869567,0.871408,0.872312,0.853662
570,0.3205,0.371092,0.882296,0.879576,0.880858,0.879099,0.865132


[I 2022-01-15 21:32:14,061] Trial 47 finished with values: [0.3710920810699463, 0.879575913721539] and parameters: {'learning_rate': 3.3780419094054394e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.0018251677024837111}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=700
params: {'learning_rate': 5.18776645418506e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0011790761190837989}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,1.0343,0.614489,0.818093,0.809462,0.829388,0.803107,0.792443
230,0.6343,0.506813,0.848249,0.849186,0.868844,0.836953,0.826784
345,0.4805,0.576566,0.810311,0.814402,0.841962,0.820205,0.790932
460,0.4917,0.40686,0.874514,0.874911,0.873002,0.879006,0.856504
575,0.4305,0.39649,0.866732,0.868793,0.867519,0.872378,0.847753
690,0.3139,0.40211,0.879377,0.877894,0.880506,0.877358,0.862041
805,0.3172,0.397448,0.88035,0.87445,0.876621,0.875333,0.863185
920,0.2517,0.39487,0.889105,0.889979,0.890334,0.891013,0.873192
1035,0.2435,0.398824,0.885214,0.88435,0.882965,0.887974,0.868894
1150,0.2499,0.373212,0.892996,0.892674,0.890367,0.895518,0.877514


[I 2022-01-15 21:40:28,947] Trial 48 finished with values: [0.3732120990753174, 0.8926737367602071] and parameters: {'learning_rate': 5.18776645418506e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0011790761190837989}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 7.353538095255206e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.001085155447679503}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,0.8816,0.553728,0.84144,0.840238,0.847434,0.837015,0.818602
462,0.6727,0.552383,0.843385,0.840817,0.844219,0.841285,0.82118
693,0.5246,0.639638,0.821012,0.817872,0.851779,0.812921,0.799651
924,0.5219,0.443398,0.865759,0.860909,0.858652,0.873336,0.847319
1155,0.476,0.462459,0.854086,0.850956,0.858351,0.85383,0.834487
1386,0.3175,0.464771,0.877432,0.876262,0.882856,0.872334,0.859941
1617,0.3275,0.447832,0.888132,0.886505,0.886361,0.887988,0.87196
1848,0.2542,0.475394,0.892023,0.890855,0.890868,0.891218,0.876343
2079,0.2608,0.449864,0.888132,0.886787,0.887656,0.886994,0.871895
2310,0.2709,0.443282,0.892996,0.890072,0.889448,0.891942,0.87752


[I 2022-01-15 21:49:30,053] Trial 49 finished with values: [0.4432816803455353, 0.890072294493938] and parameters: {'learning_rate': 7.353538095255206e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.001085155447679503}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 1.3421076713704256e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.009881136915710437}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.8583,1.45123,0.654669,0.512234,0.677436,0.530043,0.60842
114,1.2368,0.981038,0.775292,0.757791,0.827371,0.734055,0.742935
171,0.8966,0.775405,0.817121,0.81378,0.835063,0.809475,0.792726
228,0.7369,0.645209,0.83463,0.829726,0.841727,0.825484,0.810854
285,0.6508,0.577375,0.84144,0.839747,0.84227,0.839769,0.818567
342,0.5823,0.540937,0.846304,0.842071,0.84988,0.83599,0.823769
399,0.5728,0.52741,0.845331,0.843346,0.845863,0.844298,0.822985
456,0.5213,0.500924,0.849222,0.848859,0.853433,0.845144,0.8272
513,0.4875,0.505788,0.854086,0.851352,0.855355,0.851331,0.833343
570,0.4849,0.489167,0.847276,0.845615,0.851628,0.840952,0.824935



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2022-01-15 21:57:22,555] Trial 50 finished with values: [0.48916664719581604, 0.8456147188897272] and parameters: {'learning_rate': 1.3421076713704256e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.009881136915710437}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 8)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=693.0, min_trials=700
params: {'learning_rate': 5.749682854258281e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0023660924514645612}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
231,0.8939,0.6004,0.828794,0.824385,0.856313,0.808093,0.803863
462,0.6538,0.580582,0.839494,0.838003,0.847957,0.835444,0.817049
693,0.5421,0.600321,0.821984,0.82557,0.847017,0.827275,0.801187
924,0.5119,0.426378,0.874514,0.870095,0.872058,0.8766,0.856978
1155,0.4641,0.44302,0.857004,0.855328,0.858692,0.859035,0.837277
1386,0.3071,0.477775,0.873541,0.869149,0.875813,0.864523,0.855141
1617,0.3536,0.458968,0.875486,0.872621,0.873034,0.873914,0.85755
1848,0.2578,0.469087,0.883268,0.880753,0.884134,0.878021,0.866202
2079,0.2477,0.449516,0.886187,0.883167,0.885603,0.881177,0.869568
2310,0.278,0.442118,0.88716,0.882788,0.882802,0.883331,0.870736


[I 2022-01-15 22:06:24,436] Trial 51 finished with values: [0.42637771368026733, 0.8700953988700026] and parameters: {'learning_rate': 5.749682854258281e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 8, 'weight_decay': 0.0023660924514645612}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 0.0003355274672670145, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.002987271061042641}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,0.94,0.730445,0.794747,0.783917,0.841615,0.757233,0.769138
114,0.7497,0.606436,0.811284,0.812678,0.813355,0.818257,0.784986
171,0.6176,0.715555,0.756809,0.74578,0.787661,0.750381,0.731361
228,0.571,0.479522,0.849222,0.850794,0.854997,0.85411,0.828456
285,0.5102,0.466681,0.835603,0.834413,0.832388,0.838896,0.812083
342,0.3361,0.450744,0.861868,0.859386,0.866455,0.855456,0.842024
399,0.3083,0.405104,0.867704,0.869015,0.87447,0.865403,0.848584
456,0.2672,0.414945,0.882296,0.881,0.882005,0.88226,0.865542
513,0.2393,0.395568,0.874514,0.87637,0.875602,0.879081,0.85665
570,0.2437,0.373773,0.890078,0.891184,0.892715,0.890594,0.874195


[I 2022-01-15 22:14:14,265] Trial 52 finished with values: [0.37377315759658813, 0.8911836868441486] and parameters: {'learning_rate': 0.0003355274672670145, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.002987271061042641}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 32)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=171.0, min_trials=700
params: {'learning_rate': 3.3780419094054394e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.00199180792849344}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,1.4675,0.856826,0.790856,0.77423,0.835616,0.749999,0.760807
114,0.7931,0.597858,0.828794,0.822912,0.84548,0.811767,0.804078
171,0.5677,0.50992,0.843385,0.839913,0.848734,0.841814,0.822363
228,0.5104,0.456393,0.856031,0.854594,0.861941,0.854534,0.835529
285,0.4625,0.428853,0.86284,0.865798,0.862796,0.871819,0.843414
342,0.3827,0.41339,0.868677,0.86847,0.877193,0.862588,0.849686
399,0.3929,0.401583,0.864786,0.862953,0.867315,0.86132,0.845302
456,0.3385,0.37699,0.877432,0.874833,0.877667,0.873206,0.859561
513,0.3021,0.397914,0.871595,0.869567,0.871408,0.872312,0.853662
570,0.3206,0.371048,0.881323,0.878747,0.879879,0.878437,0.864028


[I 2022-01-15 22:22:05,779] Trial 53 finished with values: [0.3710477948188782, 0.8787473073057632] and parameters: {'learning_rate': 3.3780419094054394e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 32, 'weight_decay': 0.00199180792849344}.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=700
params: {'learning_rate': 0.00012952108947609435, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.004050497059826461}


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,0.8916,0.694186,0.798638,0.791896,0.819441,0.786111,0.771607
230,0.6733,0.589224,0.824903,0.815928,0.834841,0.806692,0.799738
345,0.5325,0.56221,0.824903,0.816604,0.841623,0.814764,0.802841
460,0.5114,0.428867,0.86284,0.862604,0.86409,0.865869,0.843232
575,0.4646,0.442442,0.848249,0.849544,0.851079,0.857312,0.827937
690,0.3245,0.431394,0.881323,0.879675,0.878921,0.882416,0.864421
805,0.307,0.418317,0.870623,0.854641,0.865376,0.853774,0.852008
920,0.2492,0.411139,0.885214,0.883369,0.885369,0.883398,0.868838
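
Each trial above is scored on two objectives (`eval_loss` minimized, `eval_f1` maximized), so there is no single best trial; the useful candidates are the trials on the Pareto front, i.e. those that no other trial beats on both objectives at once. The recorded values appear to come from the evaluation step with the lowest `eval_loss` rather than from the last step (e.g. Trial 43 reports step 1155, not step 2310). For a multi-objective Optuna study, `study.best_trials` returns the Pareto-optimal trials directly; the sketch below computes them by hand from the values copied out of the log, assuming only plain Python.

```python
# (trial number, eval_loss, eval_f1) copied from the "Trial N finished" lines above
TRIALS = [
    (31, 0.3892270028591156, 0.8710325129072138),
    (32, 0.514074444770813, 0.8443344877096152),
    (33, 0.4246603548526764, 0.8797830172290745),
    (34, 1.950183629989624, 0.260351105624894),
    (35, 0.4449896812438965, 0.8692995446928214),
    (36, 0.5020099878311157, 0.8790195874714518),
    (37, 0.3877527713775635, 0.8861382268524379),
    (38, 0.37744390964508057, 0.8911237125335929),
    (39, 0.3829563558101654, 0.8714652198717863),
    (40, 0.3757607936859131, 0.8766981943741903),
    (41, 0.39439237117767334, 0.8870514994307198),
    (42, 0.3792363107204437, 0.8769651192212583),
    (43, 0.4515276551246643, 0.8518589789852986),
    (44, 0.42668649554252625, 0.8657177816120842),
    (45, 0.38279905915260315, 0.8880169915853844),
    (46, 0.4820215106010437, 0.8476252807930564),
    (47, 0.3710920810699463, 0.879575913721539),
    (48, 0.3732120990753174, 0.8926737367602071),
    (49, 0.4432816803455353, 0.890072294493938),
    (50, 0.48916664719581604, 0.8456147188897272),
    (51, 0.42637771368026733, 0.8700953988700026),
    (52, 0.37377315759658813, 0.8911836868441486),
    (53, 0.3710477948188782, 0.8787473073057632),
]


def dominates(a, b):
    """True if trial a is no worse than trial b on both objectives
    (lower eval_loss, higher eval_f1) and strictly better on at least one."""
    _, loss_a, f1_a = a
    _, loss_b, f1_b = b
    no_worse = loss_a <= loss_b and f1_a >= f1_b
    strictly_better = loss_a < loss_b or f1_a > f1_b
    return no_worse and strictly_better


def pareto_front(trials):
    """Return the trials that no other trial dominates, sorted by eval_loss."""
    front = [t for t in trials if not any(dominates(other, t) for other in trials)]
    return sorted(front, key=lambda t: t[1])


for number, loss, f1 in pareto_front(TRIALS):
    print(f"trial {number}: eval_loss={loss:.4f}, eval_f1={f1:.4f}")
```

A trial never dominates itself (it is not strictly better than itself), so the comparison can safely run over the full list.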


In [None]:
!ls -lahtr $project_name

## Hyperparameter Tuning

https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Trainer.hyperparameter_search
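The commented-out `hyperparameter_search` call below expects an `hp_space` function and an `objective` function. With the Optuna backend, `hp_space` receives an Optuna trial and returns a dict of `TrainingArguments` overrides, and `compute_objective` maps the evaluation metrics to the value being optimized. The sketch below is hypothetical: the parameter names match the trial log above, but the ranges are assumptions chosen to cover the sampled values, not the notebook's own settings.

```python
def hp_space(trial):
    """Hypothetical Optuna search space for Trainer.hyperparameter_search.

    `trial` is an optuna.trial.Trial; only suggest_float and
    suggest_categorical are used. All ranges below are assumptions.
    """
    return {
        # log scale, since the sampled rates span more than an order of magnitude
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-4, log=True),
        # every trial in the log above used 2 epochs
        "num_train_epochs": trial.suggest_categorical("num_train_epochs", [2]),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]
        ),
        "weight_decay": trial.suggest_float("weight_decay", 1e-3, 1e-2, log=True),
    }


def objective(metrics):
    """Hypothetical compute_objective: a single score (eval F1) to maximize."""
    return metrics["eval_f1"]
```

With these, the call would be `trainer.hyperparameter_search(hp_space=hp_space, compute_objective=objective, direction="maximize", backend="optuna", n_trials=2)`; `direction` has to be `"maximize"` because `objective` returns an F1 score and the default is `"minimize"`. The log above tracks two objectives (`eval_loss`, `eval_f1`); recent transformers releases also accept a list of directions together with a `compute_objective` that returns one value per objective, but check the installed version before relying on that.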

In [None]:
from transformers import Trainer, TrainingArguments, logging

# disable transformers warnings like "Some weights of the model checkpoint ..."
logging.set_verbosity_error()


training_args = TrainingArguments(
    output_dir=str(project_name),
    report_to=[],
    log_level="error",
    disable_tqdm=False,

    evaluation_strategy="steps",
    # eval_steps=eval_steps,
    save_strategy="steps",
    # save_steps=eval_steps,
    # load_best_model_at_end=False,
    # metric_for_best_model="eval_loss",
    # greater_is_better=False,
)

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)


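# Without a compute_objective, the default objective is the eval loss if no
# metrics are computed, otherwise the sum of all metrics; in that case the
# direction must be set to "maximize" (the default is "minimize").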
# best = trainer.hyperparameter_search(
#     hp_space=hp_space,
#     compute_objective=objective,
#     n_trials=2
# )