<a href="https://colab.research.google.com/github/goerlitz/nlp-classification/blob/main/notebooks/10kGNAD/colab/21c_10kGNAD_huggingface_basic_optuna.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hyperparameter Optimization with HuggingFace Transformers

Adapted from https://huggingface.co/docs/transformers/custom_datasets#sequence-classification-with-imdb-reviews

Things we need
* a tokenizer
* tokenized input data
* a pretrained model
* evaluation metrics
* training parameters
* a Trainer instance

Notes
* [class labels can be included in the model config](https://github.com/huggingface/transformers/pull/2945#issuecomment-781986506) (a bit hacky)
* [fp16 is disabled on Tesla P100 GPUs in PyTorch](https://discuss.pytorch.org/t/cnn-fp16-slower-than-fp32-on-tesla-p100/12146)

## Prerequisites

In [1]:
# checkpoint = "distilbert-base-german-cased"

# checkpoint = "deepset/gbert-base"

checkpoint = "deepset/gelectra-base"

project_name = f'10kgnad_hf__{checkpoint.replace("/", "_")}'

### Connect Google Drive

Will be used to save results

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [3]:
from pathlib import Path

# define model path
root_path = Path('/content/gdrive/My Drive/')
base_path = root_path / 'Colab Notebooks/nlp-classification/'
model_path = base_path / 'models'

## Check GPU

In [4]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

Sat Jan  1 14:56:30 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   54C    P0    41W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

### Install Packages

In [5]:
%%time
!pip install -q -U transformers datasets >/dev/null
!pip install -q -U optuna >/dev/null

# check installed version
!pip freeze | grep optuna        # optuna==2.10.0
!pip freeze | grep transformers  # transformers==4.15.0
!pip freeze | grep torch         # torch==1.10.0+cu111

optuna==2.10.0
transformers==4.15.0
torch @ https://download.pytorch.org/whl/cu111/torch-1.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl
torchaudio @ https://download.pytorch.org/whl/cu111/torchaudio-0.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl
torchsummary==1.5.1
torchtext==0.11.0
torchvision @ https://download.pytorch.org/whl/cu111/torchvision-0.11.1%2Bcu111-cp37-cp37m-linux_x86_64.whl
CPU times: user 114 ms, sys: 37 ms, total: 151 ms
Wall time: 16.1 s


In [6]:
from transformers import logging

# hide progress bar when downloading tokenizer and model (a workaround!)
logging.get_verbosity = lambda : logging.NOTSET

## Load Dataset

In [7]:
from datasets import load_dataset

gnad10k = load_dataset("gnad10")
label_names = gnad10k["train"].features["label"].names

Using custom data configuration default
Reusing dataset gnad10 (/root/.cache/huggingface/datasets/gnad10/default/1.1.0/3a8445be65795ad88270af4d797034c3d99f70f8352ca658c586faf1cf960881)



In [8]:
print(gnad10k)
print("labels:", label_names)
gnad10k["train"][0]

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 9245
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1028
    })
})
labels: ['Web', 'Panorama', 'International', 'Wirtschaft', 'Sport', 'Inland', 'Etat', 'Wissenschaft', 'Kultur']


{'label': 4,
 'text': '21-Jähriger fällt wohl bis Saisonende aus. Wien – Rapid muss wohl bis Saisonende auf Offensivspieler Thomas Murg verzichten. Der im Winter aus Ried gekommene 21-Jährige erlitt beim 0:4-Heimdebakel gegen Admira Wacker Mödling am Samstag einen Teilriss des Innenbandes im linken Knie, wie eine Magnetresonanz-Untersuchung am Donnerstag ergab. Murg erhielt eine Schiene, muss aber nicht operiert werden. Dennoch steht ihm eine mehrwöchige Pause bevor.'}

## Data Preprocessing

* Load the same tokenizer that was used with the pretrained model.
* Define a function to tokenize the text (with truncation to the model's maximum input length).
* Run the tokenization.

In [9]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_gnad10k = gnad10k.map(preprocess_function, batched=True).remove_columns("text")

Loading cached processed dataset at /root/.cache/huggingface/datasets/gnad10/default/1.1.0/3a8445be65795ad88270af4d797034c3d99f70f8352ca658c586faf1cf960881/cache-0fd4592ef5024dbb.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/gnad10/default/1.1.0/3a8445be65795ad88270af4d797034c3d99f70f8352ca658c586faf1cf960881/cache-21154c2b9fecc562.arrow


### Use Dynamic Padding

Apply padding only up to the longest text in each batch. This is more efficient than padding the whole dataset to a single length.

In [10]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
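
To make the efficiency argument concrete, here is a minimal pure-Python sketch of per-batch padding. It does not use `DataCollatorWithPadding`; the helper `pad_batch`, the padding id `0`, and the toy token lists are assumptions made up for illustration.

```python
from typing import List

PAD_ID = 0  # assumed padding token id, for illustration only


def pad_batch(batch: List[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Pad every sequence to the length of the longest one in this batch."""
    longest = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]


# Toy "tokenized" dataset: lists of token ids with very different lengths.
dataset = [[5, 6], [7, 8, 9], [1] * 10, [2] * 12]
batches = [dataset[0:2], dataset[2:4]]

# Dynamic padding: each batch is padded to its own longest sequence.
dynamic = [pad_batch(b) for b in batches]
dynamic_tokens = sum(len(row) for b in dynamic for row in b)

# Static padding: every sequence is padded to the longest one in the dataset.
global_max = max(len(seq) for seq in dataset)
static_tokens = global_max * len(dataset)

print(dynamic_tokens, static_tokens)
```

In this toy case, the batch of short texts is padded to length 3 instead of 12, so the model processes fewer padding tokens overall.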

## Model Setup

We want to include the label names and save them together with the model.
To do this, we create a config that carries the label mappings (`id2label` and `label2id`).

In [11]:
import optuna
from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained(
        checkpoint,
        num_labels=len(label_names),
        id2label={i: label for i, label in enumerate(label_names)},
        label2id={label: i for i, label in enumerate(label_names)},
        )

def model_init(trial: optuna.Trial):
    """A function that instantiates the model to be used."""
    return AutoModelForSequenceClassification.from_pretrained(checkpoint, config=config)
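
The two mappings stored in the config are plain dictionaries. The following sketch rebuilds them from the label names printed earlier, without loading any model, to show how a predicted class id is translated back to a label name. The variable `predicted_id` is a made-up example value.

```python
# Label names as printed by the dataset in this notebook.
label_names = ['Web', 'Panorama', 'International', 'Wirtschaft', 'Sport',
               'Inland', 'Etat', 'Wissenschaft', 'Kultur']

# Same construction as in the config above.
id2label = {i: label for i, label in enumerate(label_names)}
label2id = {label: i for i, label in enumerate(label_names)}

# Hypothetical model output: class id 4 (the label of the first training example).
predicted_id = 4
print(id2label[predicted_id])
```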

### Define Evaluation Metrics

The function that computes the metrics needs to be passed to the Trainer.

In [12]:
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score, matthews_corrcoef
import numpy as np
from typing import Dict

def compute_metrics(eval_preds):
    """The function that will be used to compute metrics at evaluation.
    Must take a :class:`~transformers.EvalPrediction` and return a dictionary
    mapping metric names to metric values."""
    logits, labels = eval_preds
    preds = np.argmax(logits, axis=-1)
    return {
        "acc": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average='macro'),
        "precision": precision_score(labels, preds, average='macro'),
        "recall": recall_score(labels, preds, average='macro'),
        "mcc": matthews_corrcoef(labels, preds),
        }


def objective(metrics: Dict[str, float]):
    """A function computing the main optimization objective from the metrics
    returned by the :obj:`compute_metrics` method.
    To be used in :obj:`Trainer.hyperparameter_search`."""
    return metrics["eval_loss"]
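
As a rough illustration of what `average='macro'` means and why scikit-learn warns that precision is "ill-defined", here is a pure-Python sketch of macro precision. The function `macro_precision` and the toy labels are assumptions for illustration; it is not scikit-learn's implementation.

```python
from typing import List


def macro_precision(labels: List[int], preds: List[int],
                    num_classes: int, zero_division: float = 0.0) -> float:
    """Unweighted mean of the per-class precision values."""
    per_class = []
    for c in range(num_classes):
        predicted_c = sum(1 for p in preds if p == c)
        true_pos = sum(1 for y, p in zip(labels, preds) if y == p == c)
        # A class that is never predicted has precision 0/0; use the fallback.
        per_class.append(true_pos / predicted_c if predicted_c else zero_division)
    return sum(per_class) / num_classes


labels = [0, 0, 1, 1, 2, 2]
preds = [0, 0, 1, 0, 0, 1]  # class 2 is never predicted

result = macro_precision(labels, preds, num_classes=3)
print(round(result, 4))
```

Early in training a model may never predict some of the nine classes, which is the situation the warning describes. The `zero_division` parameter named in the warning controls the fallback value in `precision_score`.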

## Hyperparameter Tuning

In [13]:
from transformers import TrainerCallback

# https://github.com/huggingface/transformers/blob/v4.14.1/src/transformers/trainer_callback.py#L505

class TrialPruningCallback(TrainerCallback):
    """
    A :class:`~transformers.TrainerCallback` that handles pruning.
    Args:
       trial:
            the current trial.
       objective_metric(:obj:`str`, `optional`):
            the metrics used for pruning.
    """

    def __init__(self, trial: optuna.Trial, objective_metric: str = "eval_loss"):
        self.trial = trial
        self.metric = objective_metric

    def on_evaluate(self, args, state, control, metrics, **kwargs):
        # TODO: use set_user_attrs instead of report
        self.trial.report(metrics[self.metric], step=state.global_step)
        if self.trial.should_prune():
            print(f"pruning trial at step {state.global_step}")
            # control.should_training_stop = True  # not needed
            raise optuna.TrialPruned()

class TrialLoggingCallback(TrainerCallback):
    def __init__(self, trial: optuna.Trial):
        self.trial = trial
    
    def on_evaluate(self, args, state, control, metrics, **kwargs):
        # user attribute keys are expected to be strings
        self.trial.set_user_attr(str(state.global_step), metrics)
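
`TrialPruningCallback` only reports `eval_loss` and asks `trial.should_prune()`; the decision itself is made by the study's pruner. The sketch below shows a simplified median rule in pure Python. It is similar in spirit to a median-based pruner but is not Optuna's implementation, and the function name `should_prune` is hypothetical. The numbers are validation losses taken from the training logs of this notebook.

```python
from statistics import median
from typing import List


def should_prune(current: List[float], previous_at_step: List[float]) -> bool:
    """Median rule for a minimized metric: prune when the best value the
    current trial has reached so far is worse than the median of what
    earlier trials had at the same step."""
    if not previous_at_step:
        return False
    return min(current) > median(previous_at_step)


# Validation loss at step 460 for trials 0-5 of this run.
previous_at_460 = [0.510463, 1.439144, 0.668912, 0.527204, 0.719874, 0.600394]

# Validation loss at steps 115, 230, 345 and 460.
trial_6 = [1.288141, 1.670474, 1.572999, 1.36553]
trial_7 = [0.930996, 0.659291, 0.636605, 0.494411]

prune_6 = should_prune(trial_6, previous_at_460)
prune_7 = should_prune(trial_7, previous_at_460)
print(prune_6, prune_7)
```

Under this simplified rule, trial 6 would have been stopped at step 460, while trial 7 would have continued.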

In [None]:
from transformers import TrainingArguments, Trainer
import shutil

def hp_space(trial: optuna.Trial):
    """A function that defines the hyperparameter search space.
    To be used in :obj:`Trainer.hyperparameter_search`."""
    return {
        # "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True),
        "learning_rate": trial.suggest_float("learning_rate", 6e-5, 2e-4, log=True),  # electra
        "num_train_epochs": trial.suggest_categorical("num_train_epochs", [2, 3]),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16]),
        "weight_decay": trial.suggest_float("weight_decay", 1e-3, 1e-2, log=True),
        # "label_smoothing_factor": trial.suggest_float("label_smoothing_factor", 0.0, 0.1),
    }

best_model_dir = "best_model_trainer"

def best_model_callback(study, trial):
    """Save the model from a best trial"""
    for t in study.best_trials:
        if t.number == trial.number:
            print("This is a new besttrial", trial.number)
        
            out_filename = model_path / f"{project_name}_t{trial.number}"
            shutil.make_archive(out_filename, 'zip', f"{project_name}/{best_model_dir}")

def train(trial: optuna.Trial):

    # get hyperparameters choice
    hp = hp_space(trial)
    lr = hp["learning_rate"]
    bs = hp["per_device_train_batch_size"]
    epochs = hp["num_train_epochs"]
    weight_decay = hp["weight_decay"]
    # label_smoothing_factor = hp["label_smoothing_factor"]

    eval_rounds_per_epoch = 5
    # evaluate about 5 times per epoch; eval_steps counts training steps, so cast to int
    eval_steps = int(gnad10k["train"].num_rows / bs // eval_rounds_per_epoch)

    training_args = TrainingArguments(
        output_dir=str(project_name),
        report_to=[],
        log_level="error",
        disable_tqdm=False,

        evaluation_strategy="steps",
        eval_steps=eval_steps,
        save_strategy="steps",
        save_steps=eval_steps,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,

        # hyperparameters
        num_train_epochs=epochs,
        learning_rate=lr,
        per_device_train_batch_size=bs,
        per_device_eval_batch_size=bs,
        weight_decay=weight_decay,
        # label_smoothing_factor=label_smoothing_factor,

        # fp16=True,  # fp16 is disabled on Tesla P100 by pytorch
    )

    trainer = Trainer(
        model_init=model_init,
        args=training_args,
        train_dataset=tokenized_gnad10k["train"],
        eval_dataset=tokenized_gnad10k["test"],
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
        callbacks=[TrialLoggingCallback(trial)]
        # callbacks=[TrialPruningCallback(trial)]
    )

    # train model and save best model from evaluations
    # needs 'load_best_model_at_end=True'
    trainer.train()
    trainer.save_model(f"{project_name}/{best_model_dir}")

    result = trainer.evaluate(eval_dataset=tokenized_gnad10k["test"])

    # store eval metrics in trial
    trial.set_user_attr("eval_result", result)
    
    # return result["eval_loss"]
    return result["eval_loss"], result["eval_f1"]


db_path = "/content/gdrive/My Drive/Colab Notebooks/nlp-classification/"
db_name = "10kgnad_optuna"
# study_name = checkpoint + "_multi_epoch234"
study_name = checkpoint + "_loss-f1_bs16_epoch234"

# multi objective study
# https://optuna.readthedocs.io/en/stable/tutorial/20_recipes/002_multi_objective.html#sphx-glr-tutorial-20-recipes-002-multi-objective-py
study = optuna.create_study(study_name=study_name,
                            directions=["minimize", "maximize"],
                            # directions=["minimize"],
                            storage=f"sqlite:///{db_path}{db_name}.db",
                            load_if_exists=True,)

# give some hyperparameters that are presumably good
# study.enqueue_trial(
#     {
#         "learning_rate": 8e-5,
#         "weight_decay": 1e-3,
#         "label_smoothing_factor": 0.0,
#     }
# )
# study.enqueue_trial(
#     {
#         "learning_rate": 7e-5,
#         "weight_decay": 1e-3,
#         "label_smoothing_factor": 1e-5,
#     }
# )

# https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch
import torch
torch.cuda.empty_cache()
import gc
gc.collect()


study.optimize(train, n_trials=100, callbacks=[best_model_callback])

# study.best_params

[I 2022-01-01 14:56:57,377] A new study created in RDB with name: deepset/gelectra-base_loss-f1_bs16_epoch234


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.002699,0.710117,0.641881,0.674038,0.668931,0.680719
230,No log,0.696372,0.838521,0.838717,0.845712,0.840893,0.816952
345,No log,0.649802,0.830739,0.823059,0.817473,0.840887,0.808179
460,No log,0.510463,0.861868,0.857965,0.856129,0.865674,0.842883
575,0.904900,0.603628,0.845331,0.819685,0.859976,0.807886,0.823213
690,0.904900,0.550281,0.858949,0.855721,0.868699,0.848955,0.839202
805,0.904900,0.504997,0.86965,0.871457,0.869015,0.879971,0.851925
920,0.904900,0.469837,0.872568,0.866754,0.878322,0.865301,0.854807
1035,0.420000,0.428496,0.883268,0.881284,0.882152,0.880949,0.866296
1150,0.420000,0.429852,0.892023,0.888224,0.887794,0.890214,0.876515



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2022-01-01 15:28:39,428] Trial 0 finished with values: [0.427947461605072, 0.8860739548346436] and parameters: {'learning_rate': 7.459035593565026e-05, 'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'weight_decay': 0.001666591528498809}.


This is a new besttrial 0


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.118085,0.542802,0.39406,0.445471,0.500994,0.506453
230,No log,1.168138,0.598249,0.515551,0.516146,0.575862,0.562176
345,No log,1.278793,0.537938,0.378945,0.397876,0.434705,0.481172
460,No log,1.439144,0.439689,0.324066,0.332602,0.412154,0.410612
575,1.355400,1.000235,0.655642,0.574203,0.593598,0.617381,0.619248
690,1.355400,0.929361,0.655642,0.597759,0.607179,0.632162,0.620911
805,1.355400,0.803398,0.746109,0.74423,0.754248,0.747703,0.71167
920,1.355400,0.804116,0.720817,0.711411,0.731882,0.727142,0.686482
1035,0.906800,0.741332,0.745136,0.748684,0.756679,0.761176,0.712191
1150,0.906800,0.68662,0.788911,0.796721,0.796274,0.799007,0.758293



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.





[I 2022-01-01 15:50:13,862] Trial 1 finished with values: [0.6866200566291809, 0.7967214141677368] and parameters: {'learning_rate': 0.00018140418014791658, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0033149682823927057}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.139014,0.56323,0.439993,0.424119,0.507383,0.517749
230,No log,0.790669,0.757782,0.684992,0.668924,0.734512,0.729056
345,No log,0.789784,0.768482,0.728547,0.763727,0.750408,0.739823
460,No log,0.668912,0.820039,0.817304,0.826901,0.821867,0.795579
575,0.995500,0.579128,0.836576,0.826788,0.849925,0.819564,0.81333
690,0.995500,0.502239,0.867704,0.865933,0.870409,0.865489,0.848796
805,0.995500,0.489651,0.871595,0.869231,0.86815,0.876752,0.854295
920,0.995500,0.511332,0.86965,0.871202,0.880964,0.870342,0.852016
1035,0.482300,0.479637,0.878405,0.873412,0.887335,0.863982,0.860742
1150,0.482300,0.439846,0.885214,0.884286,0.882205,0.887794,0.868742



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.





[I 2022-01-01 16:21:47,390] Trial 2 finished with values: [0.41652801632881165, 0.8885875266755703] and parameters: {'learning_rate': 8.351708679818037e-05, 'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'weight_decay': 0.0011179118493458203}.


This is a new besttrial 2


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.014796,0.682879,0.58544,0.698366,0.621233,0.641419
230,No log,0.677539,0.843385,0.835964,0.835269,0.844203,0.821866
345,No log,0.589221,0.848249,0.839439,0.844244,0.848323,0.827315
460,No log,0.527204,0.853113,0.845871,0.852695,0.849484,0.832792
575,0.897400,0.498597,0.871595,0.857926,0.882391,0.846968,0.852996
690,0.897400,0.432723,0.889105,0.883972,0.887045,0.882166,0.872942
805,0.897400,0.402296,0.884241,0.88621,0.889331,0.886293,0.867885
920,0.897400,0.414438,0.895914,0.893508,0.894103,0.894548,0.880824
1035,0.392200,0.424925,0.886187,0.882718,0.886801,0.881873,0.870067
1150,0.392200,0.416173,0.888132,0.883832,0.889394,0.880323,0.872036


[I 2022-01-01 16:53:48,309] Trial 3 finished with values: [0.3969542384147644, 0.8967635823815142] and parameters: {'learning_rate': 6.886186149966864e-05, 'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'weight_decay': 0.006160330961821564}.


This is a new besttrial 3


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.00087,0.653696,0.548495,0.659697,0.600559,0.616619
230,No log,0.813654,0.748054,0.668537,0.683321,0.697624,0.718134
345,No log,0.870531,0.745136,0.662782,0.707238,0.686732,0.712807
460,No log,0.719874,0.782101,0.724325,0.746358,0.749943,0.754158
575,0.967000,0.646742,0.811284,0.729412,0.710463,0.75425,0.785023
690,0.967000,0.562061,0.828794,0.754588,0.836913,0.784451,0.806446
805,0.967000,0.550123,0.856031,0.849935,0.866088,0.84181,0.835487
920,0.967000,0.470215,0.872568,0.869396,0.877831,0.86821,0.854923
1035,0.516100,0.469289,0.874514,0.865868,0.870722,0.86376,0.856479
1150,0.516100,0.43651,0.877432,0.871776,0.867495,0.878741,0.860077



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.





[I 2022-01-01 17:25:48,323] Trial 4 finished with values: [0.39400672912597656, 0.8940212033293528] and parameters: {'learning_rate': 0.00011847613031449236, 'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'weight_decay': 0.002157873782316533}.


This is a new besttrial 4


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.987961,0.703307,0.607642,0.648438,0.627727,0.663824
230,No log,0.654783,0.801556,0.723662,0.706607,0.758886,0.776383
345,No log,0.645476,0.820039,0.774478,0.812593,0.782497,0.796135
460,No log,0.600394,0.847276,0.851193,0.852741,0.862152,0.827233
575,0.892400,0.497043,0.864786,0.855258,0.876372,0.846318,0.845657
690,0.892400,0.531229,0.863813,0.861761,0.875736,0.855211,0.844871
805,0.892400,0.425914,0.889105,0.888397,0.892741,0.885262,0.873095
920,0.892400,0.416407,0.890078,0.889863,0.894852,0.888325,0.87443
1035,0.417500,0.429413,0.885214,0.881046,0.885565,0.88096,0.868725
1150,0.417500,0.436787,0.881323,0.880948,0.884016,0.881357,0.864639



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.





[I 2022-01-01 17:57:51,884] Trial 5 finished with values: [0.41640692949295044, 0.8898631539122221] and parameters: {'learning_rate': 8.65821570904736e-05, 'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'weight_decay': 0.002122625560135982}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.288141,0.566148,0.465943,0.529146,0.500594,0.51509
230,No log,1.670474,0.531128,0.417015,0.496342,0.44499,0.498705
345,No log,1.572999,0.598249,0.51723,0.545465,0.543226,0.555288
460,No log,1.36553,0.521401,0.422263,0.550092,0.430278,0.50759
575,1.204600,1.242531,0.700389,0.599919,0.619171,0.60589,0.664873
690,1.204600,1.242518,0.698444,0.581916,0.702728,0.59806,0.660027
805,1.204600,1.097717,0.731518,0.622589,0.721174,0.64123,0.696137
920,1.204600,1.144391,0.729572,0.620911,0.661436,0.634509,0.697409
1035,0.791800,1.163083,0.735409,0.618417,0.611775,0.638518,0.699299
1150,0.791800,1.121511,0.738327,0.620549,0.606248,0.643902,0.702891



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.





[I 2022-01-01 18:19:07,091] Trial 6 finished with values: [1.0977165699005127, 0.6225887759001697] and parameters: {'learning_rate': 0.0001581610572621774, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.009897942012801347}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.930996,0.705253,0.57921,0.667695,0.617545,0.668441
230,No log,0.659291,0.82393,0.810744,0.826609,0.807435,0.798887
345,No log,0.636605,0.835603,0.828048,0.845889,0.825591,0.812429
460,No log,0.494411,0.858949,0.852741,0.848481,0.862025,0.839127
575,0.872300,0.470881,0.86284,0.856473,0.871222,0.846728,0.843037
690,0.872300,0.447792,0.878405,0.878099,0.876513,0.88173,0.861036
805,0.872300,0.419652,0.892023,0.888467,0.897637,0.882456,0.876542
920,0.872300,0.426101,0.881323,0.882121,0.893513,0.878222,0.864997
1035,0.374400,0.390701,0.898833,0.89404,0.894794,0.894171,0.884155
1150,0.374400,0.386954,0.894942,0.890006,0.88933,0.891432,0.879744



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2022-01-01 18:40:20,910] Trial 7 finished with values: [0.38695359230041504, 0.8900060952382804] and parameters: {'learning_rate': 0.0001024558255203475, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0012265051297269552}.


This is a new besttrial 7


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.949241,0.720817,0.60704,0.754984,0.637048,0.683415
230,No log,0.643125,0.840467,0.830799,0.832579,0.834336,0.817743
345,No log,0.60222,0.840467,0.840424,0.842376,0.849407,0.819289
460,No log,0.544818,0.850195,0.842946,0.841685,0.852659,0.829589
575,0.877400,0.47194,0.876459,0.875435,0.887097,0.870763,0.859389
690,0.877400,0.448174,0.889105,0.889292,0.891902,0.888715,0.873143
805,0.877400,0.396568,0.892023,0.892059,0.891308,0.894298,0.876494
920,0.877400,0.415893,0.892023,0.889696,0.893579,0.887887,0.876525
1035,0.380900,0.425896,0.890078,0.887611,0.89011,0.887392,0.874383
1150,0.380900,0.411826,0.888132,0.883775,0.887927,0.880436,0.871793


[I 2022-01-01 19:02:05,126] Trial 8 finished with values: [0.3965684473514557, 0.8920590002181018] and parameters: {'learning_rate': 8.972952217857139e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.00160874055808389}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.068204,0.621595,0.505124,0.504582,0.564992,0.579776
230,No log,0.820977,0.746109,0.674068,0.657775,0.708038,0.71311
345,No log,0.726202,0.769455,0.692089,0.675927,0.729544,0.740483
460,No log,0.618381,0.80642,0.75111,0.771658,0.75456,0.779349
575,0.944400,0.673958,0.811284,0.716708,0.699532,0.748954,0.785827
690,0.944400,0.574227,0.82393,0.743627,0.73262,0.766485,0.80052
805,0.944400,0.581881,0.828794,0.784221,0.830163,0.790252,0.805944
920,0.944400,0.51931,0.86965,0.857296,0.867007,0.852708,0.850873
1035,0.513900,0.480607,0.865759,0.85947,0.860115,0.860259,0.846466
1150,0.513900,0.458585,0.876459,0.870016,0.873431,0.869526,0.859096



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.





[I 2022-01-01 19:33:44,196] Trial 9 finished with values: [0.43812066316604614, 0.8841550276120551] and parameters: {'learning_rate': 0.00013343832357526326, 'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'weight_decay': 0.0016214571925614353}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.03233,0.597276,0.443855,0.435494,0.508764,0.547385
230,No log,0.771321,0.758755,0.682994,0.677994,0.720397,0.730804
345,No log,0.731924,0.777237,0.715533,0.793352,0.739648,0.749117
460,No log,0.626118,0.815175,0.814609,0.831129,0.812902,0.790066
575,0.926700,0.557271,0.848249,0.841389,0.855822,0.837426,0.826925
690,0.926700,0.504036,0.874514,0.868141,0.874307,0.863971,0.856341
805,0.926700,0.424023,0.884241,0.877125,0.878038,0.877395,0.86753
920,0.926700,0.451584,0.876459,0.876555,0.886734,0.874061,0.859806
1035,0.415900,0.419294,0.883268,0.881729,0.882663,0.883344,0.866705
1150,0.415900,0.411962,0.886187,0.881983,0.882014,0.882991,0.86969



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.





[I 2022-01-01 19:54:57,206] Trial 10 finished with values: [0.41196224093437195, 0.881983395172695] and parameters: {'learning_rate': 0.00012047481293862241, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.003205160334410035}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.075914,0.679961,0.560956,0.619642,0.609513,0.64094
230,No log,0.755625,0.802529,0.783826,0.819881,0.783447,0.777417
345,No log,0.728549,0.807393,0.803977,0.835607,0.797322,0.782286
460,No log,0.546798,0.856031,0.852193,0.853005,0.855943,0.835675
575,0.934000,0.461524,0.879377,0.874897,0.885421,0.868054,0.861773
690,0.934000,0.459003,0.881323,0.879176,0.885668,0.874487,0.864148
805,0.934000,0.402295,0.88716,0.884273,0.882815,0.886743,0.870942
920,0.934000,0.385312,0.898833,0.897088,0.895333,0.899887,0.884294
1035,0.379800,0.41715,0.88716,0.883397,0.88612,0.883586,0.871108
1150,0.379800,0.397057,0.891051,0.889314,0.891505,0.888274,0.875222



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2022-01-01 20:16:10,600] Trial 11 finished with values: [0.3853123188018799, 0.897087556457095] and parameters: {'learning_rate': 6.361408844678171e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0032297885417162976}.


This is a new besttrial 11


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.879409,0.691634,0.552904,0.526155,0.589898,0.648279
230,No log,0.814705,0.750973,0.67739,0.713521,0.694742,0.717924
345,No log,0.727164,0.802529,0.801579,0.805229,0.813437,0.776717
460,No log,0.598092,0.821984,0.814915,0.82933,0.818363,0.797919
575,0.896000,0.555357,0.843385,0.833197,0.859732,0.823131,0.822203
690,0.896000,0.560636,0.860895,0.849681,0.865804,0.843938,0.840784
805,0.896000,0.463661,0.872568,0.871744,0.875573,0.872433,0.85474
920,0.896000,0.497171,0.872568,0.864636,0.874839,0.86463,0.854982
1035,0.428800,0.467443,0.884241,0.875518,0.889231,0.866686,0.86738
1150,0.428800,0.478969,0.872568,0.869632,0.873145,0.872088,0.855301



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2022-01-01 20:48:10,375] Trial 12 finished with values: [0.42575743794441223, 0.8916282429046417] and parameters: {'learning_rate': 0.00010931706740473336, 'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'weight_decay': 0.002658827432269947}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.10859,0.698444,0.619723,0.73524,0.653059,0.662385
230,No log,0.710348,0.811284,0.775484,0.799729,0.782945,0.785584
345,No log,0.604229,0.847276,0.833836,0.850932,0.829451,0.825852
460,No log,0.56189,0.845331,0.837598,0.844826,0.840729,0.824269
575,0.938300,0.515902,0.857977,0.844676,0.869003,0.832558,0.838137
690,0.938300,0.467162,0.879377,0.878385,0.89209,0.868626,0.8623
805,0.938300,0.402812,0.892996,0.889122,0.888508,0.890664,0.877514
920,0.938300,0.428463,0.889105,0.886716,0.888785,0.887464,0.87342
1035,0.388300,0.417175,0.895914,0.892742,0.89422,0.893773,0.880937
1150,0.388300,0.404407,0.893969,0.890828,0.892527,0.890233,0.87859


[I 2022-01-01 21:09:25,206] Trial 13 finished with values: [0.4028119146823883, 0.8891221995825563] and parameters: {'learning_rate': 6.176345599160745e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.005442357653671529}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.522191,0.405642,0.356621,0.3905,0.382434,0.335771
230,No log,0.950073,0.672179,0.574745,0.604355,0.608544,0.63628
345,No log,0.896427,0.703307,0.625223,0.636685,0.661321,0.668811
460,No log,0.681159,0.810311,0.80703,0.812046,0.809215,0.783896
575,1.117800,0.656267,0.831712,0.814184,0.846252,0.801789,0.807845
690,1.117800,0.539133,0.860895,0.862653,0.872597,0.855489,0.840695
805,1.117800,0.553456,0.863813,0.860702,0.869853,0.856133,0.84435
920,1.117800,0.548706,0.849222,0.850105,0.85772,0.848903,0.828107
1035,0.530300,0.630055,0.849222,0.83725,0.864962,0.820615,0.827787
1150,0.530300,0.536451,0.857004,0.858861,0.86192,0.86082,0.83719



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.





[I 2022-01-01 21:41:05,166] Trial 14 finished with values: [0.46393945813179016, 0.87622298619547] and parameters: {'learning_rate': 0.0001330486871037977, 'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'weight_decay': 0.001893852139993153}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.044448,0.618677,0.492429,0.602492,0.55425,0.582025
230,No log,0.715353,0.784047,0.696573,0.679936,0.720378,0.753987
345,No log,0.698153,0.790856,0.757427,0.77774,0.777758,0.764031
460,No log,0.671509,0.794747,0.763238,0.772215,0.783859,0.767111
575,0.978800,0.562835,0.84144,0.824711,0.822643,0.834089,0.818984
690,0.978800,0.625627,0.833658,0.820305,0.832819,0.81532,0.81034
805,0.978800,0.580726,0.847276,0.835372,0.859374,0.825254,0.826543
920,0.978800,0.50377,0.863813,0.858417,0.865894,0.8566,0.844549
1035,0.523100,0.462222,0.872568,0.865764,0.866919,0.867228,0.854175
1150,0.523100,0.502994,0.865759,0.863961,0.864489,0.869365,0.847813



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.





In [None]:
# list the output directory of the selected checkpoint
!ls -lahtr {project_name}/
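
The study is created with two directions (minimize `eval_loss`, maximize `eval_f1`), so `study.best_trials` holds the Pareto-optimal trials rather than a single best one. The following pure-Python sketch replays the logged trial values and reproduces which trials were announced as new best trials. The helpers `dominates` and `pareto_front` are hypothetical names written for illustration, not Optuna functions, and the values are rounded from the log output.

```python
from typing import List, Tuple

# (eval_loss, eval_f1) for trials 0-14, rounded from the log output.
trial_values: List[Tuple[float, float]] = [
    (0.4279, 0.8861), (0.6866, 0.7967), (0.4165, 0.8886), (0.3970, 0.8968),
    (0.3940, 0.8940), (0.4164, 0.8899), (1.0977, 0.6226), (0.3870, 0.8900),
    (0.3966, 0.8921), (0.4381, 0.8842), (0.4120, 0.8820), (0.3853, 0.8971),
    (0.4258, 0.8916), (0.4028, 0.8891), (0.4639, 0.8762),
]


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """a dominates b: no worse on both objectives, strictly better on one.
    Loss is minimized, F1 is maximized."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(values: List[Tuple[float, float]]) -> List[int]:
    """Indices of all trials that are not dominated by any other trial."""
    return [
        i for i, v in enumerate(values)
        if not any(dominates(w, v) for j, w in enumerate(values) if j != i)
    ]


# Replay the study: after each trial, check whether it is on the current front.
new_best = [n for n in range(len(trial_values))
            if n in pareto_front(trial_values[: n + 1])]

print(new_best)                    # [0, 2, 3, 4, 7, 11]
print(pareto_front(trial_values))  # [11]
```

Trial 11 has both the lowest loss and the highest F1 among these 15 trials, so it alone remains on the front.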

## Hyperparameter Tuning with `Trainer.hyperparameter_search`

https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Trainer.hyperparameter_search

In [None]:
# disable transformer warnings like "Some weights of the model checkpoint ..."
logging.set_verbosity_error()


training_args = TrainingArguments(
    output_dir=str(project_name),
    report_to=[],
    log_level="error",
    disable_tqdm=False,

    evaluation_strategy="steps",
    # eval_steps=eval_steps,
    save_strategy="steps",
    # save_steps=eval_steps,
    # load_best_model_at_end=False,
    # metric_for_best_model="eval_loss",
    # greater_is_better=False,
)

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_gnad10k["train"],
    eval_dataset=tokenized_gnad10k["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)


# The default objective is the sum of all metrics (when metrics are provided),
# which would have to be maximized. Here a custom objective (eval_loss) is
# passed instead, so it has to be minimized.
# best = trainer.hyperparameter_search(
#     hp_space=hp_space,
#     compute_objective=objective,
#     n_trials=2
# )
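
The `hp_space` function requests `log=True` for the learning rate and the weight decay, which means values are drawn on a logarithmic scale. The sketch below illustrates log-uniform sampling with the standard library only. It is an approximation of the idea under that assumption, not Optuna's sampler, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw uniformly between log(low) and log(high), then map back."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
low, high = 6e-5, 2e-4  # learning-rate range used for ELECTRA in hp_space
samples = [sample_log_uniform(low, high, rng) for _ in range(1000)]

# On a log scale, about half of the samples fall below the geometric mean.
geometric_mean = math.sqrt(low * high)
fraction_below = sum(s < geometric_mean for s in samples) / len(samples)
print(f"{geometric_mean:.3e}", fraction_below)
```

Compared with plain uniform sampling, this gives small learning rates the same chance of being tried as large ones, measured per order of magnitude.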