<a href="https://colab.research.google.com/github/goerlitz/nlp-classification/blob/main/notebooks/10kGNAD/colab/21c_10kGNAD_huggingface_basic_optuna.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hyperparameter Optimization with HuggingFace Transformers

Adapted from https://huggingface.co/docs/transformers/custom_datasets#sequence-classification-with-imdb-reviews

Things we need:
* a tokenizer
* tokenized input data
* a pretrained model
* evaluation metrics
* training parameters
* a Trainer instance

Notes
* [class labels can be included in the model config](https://github.com/huggingface/transformers/pull/2945#issuecomment-781986506) (a bit hacky)
* [fp16 can be slower than fp32 on a Tesla P100 GPU in PyTorch](https://discuss.pytorch.org/t/cnn-fp16-slower-than-fp32-on-tesla-p100/12146), so mixed precision is not enabled here

## Prerequisites

In [1]:
# checkpoint = "distilbert-base-german-cased"
checkpoint = "deepset/gbert-base"
# checkpoint = "deepset/gelectra-base"

project_name = f'10kgnad_hf__{checkpoint.replace("/", "_")}'

### Connect Google Drive

Google Drive is used to save the results (zipped best models and the Optuna study database).

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [3]:
from pathlib import Path

# define model path
root_path = Path('/content/gdrive/My Drive/')
base_path = root_path / 'Colab Notebooks/nlp-classification/'
model_path = base_path / 'models'

## Check GPU

In [4]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

Sun Jan  9 14:28:50 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   57C    P0    32W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

### Install Packages

In [5]:
%%time
!pip install -q -U transformers datasets >/dev/null
!pip install -q -U optuna >/dev/null

# check installed version
!pip freeze | grep optuna        # optuna==2.10.0
!pip freeze | grep transformers  # transformers==4.15.0
!pip freeze | grep "torch "      # torch==1.10.0+cu111

optuna==2.10.0
transformers==4.15.0
torch @ https://download.pytorch.org/whl/cu111/torch-1.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl
CPU times: user 105 ms, sys: 56.5 ms, total: 162 ms
Wall time: 10.6 s


In [6]:
from transformers import logging

# hide progress bar when downloading tokenizer and model (a workaround!)
logging.get_verbosity = lambda : logging.NOTSET

## Load Dataset

In [7]:
from datasets import load_dataset

# gnad10k = load_dataset("gnad10")
# label_names = gnad10k["train"].features["label"].names

base_url = "https://raw.githubusercontent.com/tblock/10kGNAD/master/{}.csv"
data_files = {x: base_url.format(x) for x in ["train", "test"]}
gnad10k = (load_dataset('csv',
                        data_files=data_files,
                        sep=";",
                        quotechar="'",
                        names=["label", "text"]).
           class_encode_column("label"))

label_names = gnad10k["train"].features["label"].names

Using custom data configuration default-0e1a53e9f937c1cf
Reusing dataset csv (/root/.cache/huggingface/datasets/csv/default-0e1a53e9f937c1cf/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e)



Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-0e1a53e9f937c1cf/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e/cache-59d70429eaf283cf.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-0e1a53e9f937c1cf/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e/cache-a2cd8f79c68f7005.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-0e1a53e9f937c1cf/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e/cache-5dec691e070de180.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-0e1a53e9f937c1cf/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e/cache-7ee1cd4bfba8d7d3.arrow


In [8]:
print(gnad10k)
print("labels:", label_names)
gnad10k["train"][0]

DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 9245
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 1028
    })
})
labels: ['Etat', 'Inland', 'International', 'Kultur', 'Panorama', 'Sport', 'Web', 'Wirtschaft', 'Wissenschaft']


{'label': 5,
 'text': '21-Jähriger fällt wohl bis Saisonende aus. Wien – Rapid muss wohl bis Saisonende auf Offensivspieler Thomas Murg verzichten. Der im Winter aus Ried gekommene 21-Jährige erlitt beim 0:4-Heimdebakel gegen Admira Wacker Mödling am Samstag einen Teilriss des Innenbandes im linken Knie, wie eine Magnetresonanz-Untersuchung am Donnerstag ergab. Murg erhielt eine Schiene, muss aber nicht operiert werden. Dennoch steht ihm eine mehrwöchige Pause bevor.'}
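The integer labels are assigned by `class_encode_column`. The label list printed above is in alphabetical order, which suggests that ids follow the sorted distinct label strings (label `5` in the example is `Sport`). The standard-library sketch below mimics that encoding on three made-up rows in the same `label;text` CSV layout (`;` as separator, `'` as quote character). It only illustrates the idea and is not the `datasets` implementation; the `sample_*` names are hypothetical and chosen so they do not overwrite `label_names`.

```python
import csv
import io

# Three made-up rows in the 10kGNAD CSV layout: label;'text'
# (hypothetical content, only to illustrate the format)
sample_csv = (
    "Sport;'Rapid verliert das Derby.'\n"
    "Web;'Neues Smartphone vorgestellt; Preis noch offen.'\n"
    "Etat;'ORF plant neues Programm.'\n"
)

# the quote character keeps the ';' inside the second text from splitting the field
rows = list(csv.reader(io.StringIO(sample_csv), delimiter=";", quotechar="'"))
raw_labels = [label for label, _ in rows]

# assign integer ids in the sorted order of the distinct label strings
sample_label_names = sorted(set(raw_labels))
sample_label2id = {name: i for i, name in enumerate(sample_label_names)}
encoded = [sample_label2id[label] for label in raw_labels]

print(sample_label_names)  # ['Etat', 'Sport', 'Web']
print(encoded)             # [1, 2, 0]
```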

## Data Preprocessing

* Load the same tokenizer that was used for the pretrained model.
* Define a function that tokenizes the text (truncating it to the model's maximum input length).
* Run the tokenization on the train and test splits.

In [9]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_gnad10k = gnad10k.map(preprocess_function, batched=True).remove_columns("text")

Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-0e1a53e9f937c1cf/0.0.0/6b9057d9e23d9d8a2f05b985917a0da84d70c5dae3d22ddd8a3f22fb01c69d9e/cache-3a71e492fa9729cb.arrow



### Use Dynamic Padding

Pad each batch only to the length of its longest text; this is more efficient than padding the whole dataset to a single maximum length.

In [10]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
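`DataCollatorWithPadding` pads each batch as it is assembled. A minimal pure-Python sketch of the idea, assuming right-side padding and a padding id of `0` (a real tokenizer supplies its own `pad_token_id`). `pad_to_longest` is a hypothetical helper, not part of `transformers`:

```python
from typing import Dict, List


def pad_to_longest(batch: List[List[int]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Pad every sequence in the batch to the length of the longest one.

    Returns input_ids plus an attention_mask (1 = real token, 0 = padding).
    """
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# hypothetical token ids for three texts of different lengths
batch = [[101, 7, 8, 102], [101, 9, 102], [101, 5, 6, 7, 8, 102]]
padded = pad_to_longest(batch)
# every row now has length 6 (the longest sequence in this batch),
# not the model's maximum input length
```

Because only the longest sequence in each batch sets the width, batches of short news snippets stay small, while truncation in `preprocess_function` still caps the longest ones.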

## Model Setup

We want to include the label names and save them together with the model.
Here this is done by creating a config that contains the `id2label` and `label2id` mappings and passing it to the model.

In [11]:
import optuna
from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained(
        checkpoint,
        num_labels=len(label_names),
        id2label={i: label for i, label in enumerate(label_names)},
        label2id={label: i for i, label in enumerate(label_names)},
        )

def model_init(trial: optuna.Trial):
    """A function that instantiates the model to be used."""
    return AutoModelForSequenceClassification.from_pretrained(checkpoint, config=config)
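Because `id2label` and `label2id` live in the config, they are saved with the model, and a prediction can be mapped back to a label name via `model.config.id2label`. The plain-Python sketch below shows that lookup for one row of logits. The logits are made up; the label list is the one printed in the dataset section, and the `example_*` names are hypothetical.

```python
# the nine 10kGNAD labels, as printed by the dataset cell above
gnad_labels = ['Etat', 'Inland', 'International', 'Kultur', 'Panorama',
               'Sport', 'Web', 'Wirtschaft', 'Wissenschaft']

example_id2label = {i: label for i, label in enumerate(gnad_labels)}
example_label2id = {label: i for i, label in enumerate(gnad_labels)}


def decode(logits_row):
    """Map one row of logits to a label name via argmax + id2label."""
    best = max(range(len(logits_row)), key=lambda i: logits_row[i])
    return example_id2label[best]


# made-up logits for a single text; the largest value is at index 5
fake_logits = [0.1, -1.2, 0.3, 0.0, 0.5, 2.7, -0.4, 0.2, 0.1]
print(decode(fake_logits))  # Sport
```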

### Define Evaluation Metrics

The function that computes the metrics is passed to the Trainer as `compute_metrics`.

In [12]:
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score, matthews_corrcoef
import numpy as np
from typing import Dict

def compute_metrics(eval_preds):
    """Compute the metrics at evaluation time.
    Takes a :class:`~transformers.EvalPrediction` (logits, labels) and returns
    a dictionary mapping metric names to values."""
    logits, labels = eval_preds
    preds = np.argmax(logits, axis=-1)
    return {
        "acc": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average='macro'),
        "precision": precision_score(labels, preds, average='macro'),
        "recall": recall_score(labels, preds, average='macro'),
        "mcc": matthews_corrcoef(labels, preds),
        }


# def objective(metrics: Dict[str, float]):
#     """A function computing the main optimization objective from the metrics
#     returned by the :obj:`compute_metrics` method.
#     To be used in :obj:`Trainer.hyperparameter_search`."""
#     return metrics["eval_loss"]
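Macro averaging gives each of the nine classes the same weight, regardless of how many articles it has. The standard-library sketch below computes accuracy and macro F1 (per-class F1 as 2·TP / (2·TP + FP + FN), then the unweighted mean over classes that occur in the labels or predictions). It is meant to illustrate `average='macro'`, not to replace scikit-learn; `accuracy_and_macro_f1` is a hypothetical helper. The `Trainer` reports the returned keys with an `eval_` prefix, which is why the F1 objective below is called `eval_f1`.

```python
from typing import Dict, List


def accuracy_and_macro_f1(labels: List[int], preds: List[int]) -> Dict[str, float]:
    """Accuracy plus macro-averaged F1 (unweighted mean of per-class F1)."""
    acc = sum(l == p for l, p in zip(labels, preds)) / len(labels)
    classes = sorted(set(labels) | set(preds))
    f1s = []
    for c in classes:
        tp = sum(l == c and p == c for l, p in zip(labels, preds))
        fp = sum(l != c and p == c for l, p in zip(labels, preds))
        fn = sum(l == c and p != c for l, p in zip(labels, preds))
        denom = 2 * tp + fp + fn
        # guard against division by zero
        f1s.append(2 * tp / denom if denom else 0.0)
    return {"acc": acc, "f1": sum(f1s) / len(f1s)}


toy = accuracy_and_macro_f1(labels=[0, 0, 1, 1, 2], preds=[0, 1, 1, 1, 2])
# acc = 0.8, macro F1 = (2/3 + 4/5 + 1) / 3 ≈ 0.822
```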

## Hyperparameter Tuning

In [13]:
from transformers import TrainerCallback
from optuna.trial import TrialState
from optuna.study import StudyDirection  # public import path instead of the private module
import pandas as pd

# https://github.com/huggingface/transformers/blob/v4.14.1/src/transformers/trainer_callback.py#L505
# https://huggingface.co/docs/transformers/main_classes/callback#transformers.TrainerCallback

# alias the stdlib module so it does not shadow `transformers.logging` imported earlier
import logging as std_logging
log = std_logging.getLogger(__name__)
log.setLevel(std_logging.INFO)

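class TrialLogAndPruningCallback(TrainerCallback):
    """Stores the eval metrics of each evaluation step in the trial's user attrs and
    prunes the trial when it is no better than the median of comparable completed
    trials (same epochs and batch size) on every objective."""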
    def __init__(self, trial: optuna.Trial, objectives=None, warmup_steps=0, min_trials=7):
        self.study = trial.study
        self.trial = trial
        self.param_keys = ["num_train_epochs", "per_device_train_batch_size"]
        self.param_vals = [trial.params[k] for k in self.param_keys]

        log.warning(f"fixed params: {list(zip(self.param_keys, self.param_vals))}")

        if objectives is None:
            self.objectives = ["eval_loss"]
        else:
            self.objectives = objectives
        self._warmup_steps = warmup_steps
        self._min_trials = max(1, int(min_trials))

        log.warning(f"objectives: {self.objectives}, directions: {self.study.directions}, warmup={self._warmup_steps}, min_trials={self._min_trials}")
        

    def _filter_trials(self, complete_trials):
        """Select only trials with same parameter values"""
        # values = [self.trial.params[k] for k in keys]
        return [t for t in complete_trials if self.param_vals == [t.params[k] for k in self.param_keys]]

    def _prune(self, step: int, metrics) -> bool:
        """Median Pruning on multiple objectives."""
        if step < self._warmup_steps:
            # log.warning(f"less than warmup steps {step}<{self._warmup_steps}")
            return False

        # get all completed trials
        complete_trials = self.study.get_trials(deepcopy=False,
                                                states=[TrialState.COMPLETE])
        # only compare trials with same batch size and epochs
        complete_trials = self._filter_trials(complete_trials)
        n_trials = len(complete_trials)

        # check minimal number of trial required
        if n_trials < self._min_trials:
            # log.warning(f"less than min trials {n_trials}<{self._min_trials}")
            return False

        # log.warning(f"checking {step}: {metrics}")

        # sanity check: without all objective metrics there is nothing to compare
        has_metrics = [o in metrics.keys() for o in self.objectives]
        if not all(has_metrics):
            log.warning(f"missing objective metrics {list(zip(self.objectives, has_metrics))}")
            return False

        # extract the metrics that the completed trials logged at this step
        trial_metrics = []
        for t in complete_trials:
            if str(step) in t.user_attrs.keys():
                trial_metrics.append(t.user_attrs[str(step)])
        n_metrics = len(trial_metrics)
        if n_metrics == 0:
            return False  # no completed trial was evaluated at this step

        # compute median for each metric over all trials
        median = pd.DataFrame(trial_metrics).median()

        # log.warning(f"median of {n_metrics}/{n_trials}: {median.to_dict()}")

        # compare current metric value with median
        prune_state = []
        for i, o in enumerate(self.objectives):
            if self.study.directions[i] == StudyDirection.MAXIMIZE:
                prune_state.append(metrics[o] <= median[o])
            else:
                prune_state.append(metrics[o] > median[o])
        
        met = ",".join([f"{m}={metrics[m]:.4}/{median[m]:.4}" for m in self.objectives])
        print(f"prune? step={step}, warmup={self._warmup_steps}, complete_trials={n_trials}, metrics={n_metrics} -> {met}; {prune_state}")

        # all metrics must be marked for pruning
        return all(prune_state)
    
    def on_evaluate(self, args, state, control, lr_scheduler, metrics, **kwargs):
        step = state.global_step
        values = {**metrics, "lr": lr_scheduler.get_last_lr()[-1]}
        self.trial.set_user_attr(str(step), values)

        # pruning
        if self._prune(step, metrics):
            print(f"pruning trial at step {step}")
            # control.should_training_stop = True  # not needed
            raise optuna.TrialPruned()
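The decision in `_prune` boils down to one rule: prune only if the current trial is no better than the median of the completed trials on *every* objective. Below is a stripped-down, framework-free sketch of that rule. `should_prune` is a hypothetical helper, and unlike the callback it neither filters trials by epochs and batch size nor looks up metrics by step; the metric values are made up.

```python
from statistics import median
from typing import Dict, List


def should_prune(current: Dict[str, float],
                 completed: List[Dict[str, float]],
                 objectives: List[str],
                 maximize: List[bool],
                 min_trials: int = 7) -> bool:
    """Prune only if the current trial is no better than the median of the
    completed trials on every objective."""
    if len(completed) < min_trials:
        return False  # not enough history to compare against
    votes = []
    for name, is_max in zip(objectives, maximize):
        med = median(m[name] for m in completed)
        # same comparisons as the callback: <= for maximized, > for minimized metrics
        votes.append(current[name] <= med if is_max else current[name] > med)
    return all(votes)


toy_losses = [0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46]
toy_f1s = [0.85, 0.86, 0.87, 0.88, 0.89, 0.90, 0.91]
toy_history = [{"eval_loss": l, "eval_f1": f} for l, f in zip(toy_losses, toy_f1s)]
toy_objectives, toy_maximize = ["eval_loss", "eval_f1"], [False, True]

print(should_prune({"eval_loss": 0.50, "eval_f1": 0.80}, toy_history, toy_objectives, toy_maximize))  # True
print(should_prune({"eval_loss": 0.50, "eval_f1": 0.90}, toy_history, toy_objectives, toy_maximize))  # False
```

Requiring all objectives to be worse keeps trials that trade a higher loss for a better F1, which matters in a multi-objective study.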

In [14]:
from transformers import TrainingArguments, Trainer
import shutil

def hp_space(trial: optuna.Trial):
    """A function that defines the hyperparameter search space.
    To be used in :obj:`Trainer.hyperparameter_search`."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True),  # distilbert/bert
        # "learning_rate": trial.suggest_float("learning_rate", 6e-5, 2e-4, log=True),  # electra
        # "num_train_epochs": trial.suggest_categorical("num_train_epochs", [1]),
        "num_train_epochs": trial.suggest_categorical("num_train_epochs", [2, 3]),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16]),
        "weight_decay": trial.suggest_float("weight_decay", 1e-3, 1e-2, log=True),
        # "weight_decay": trial.suggest_categorical("weight_decay", [1e-3, 0.0]),
    }

best_model_dir = "best_model_trainer"

def best_model_callback(study, trial):
    """Archive the saved model to Google Drive if the finished trial is among the best (Pareto-optimal) trials."""
    for t in study.best_trials:
        if t.number == trial.number:
            print("This is a new besttrial", trial.number)
        
            out_filename = model_path / f"{project_name}_t{trial.number}"
            shutil.make_archive(out_filename, 'zip', f"{project_name}/{best_model_dir}")

def objective(trial: optuna.Trial):

    # get hyperparameters choice
    hp = hp_space(trial)
    lr = hp["learning_rate"]
    bs = hp["per_device_train_batch_size"]
    epochs = hp["num_train_epochs"]
    weight_decay = hp["weight_decay"]
    # label_smoothing_factor = hp["label_smoothing_factor"]

    eval_rounds_per_epoch = 5
    eval_steps = gnad10k["train"].num_rows // bs // eval_rounds_per_epoch  # integer step interval between evaluations

    training_args = TrainingArguments(
        output_dir=str(project_name),
        report_to=[],
        log_level="error",
        disable_tqdm=False,

        evaluation_strategy="steps",
        eval_steps=eval_steps,
        logging_steps=eval_steps,
        save_strategy="steps",
        save_steps=eval_steps,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,

        # hyperparameters
        num_train_epochs=epochs,
        learning_rate=lr,
        per_device_train_batch_size=bs,
        per_device_eval_batch_size=bs,
        weight_decay=weight_decay,
        # label_smoothing_factor=label_smoothing_factor,

        # fp16=True,  # not enabled: fp16 can be slower than fp32 on a Tesla P100 (see Notes)
    )

    trainer = Trainer(
        model_init=model_init,
        args=training_args,
        train_dataset=tokenized_gnad10k["train"],
        eval_dataset=tokenized_gnad10k["test"],
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
        callbacks=[TrialLogAndPruningCallback(trial, objectives=["eval_loss", "eval_f1"], min_trials=7, warmup_steps=eval_steps*3)]
        # callbacks=[TrialPruningCallback(trial)]
    )

    # train model and save best model from evaluations
    # needs 'load_best_model_at_end=True'
    trainer.train()
    trainer.save_model(f"{project_name}/{best_model_dir}")

    result = trainer.evaluate(eval_dataset=tokenized_gnad10k["test"])

    # store eval metrics in trial
    trial.set_user_attr("eval_result", result)
    
    # return result["eval_loss"]
    return result["eval_loss"], result["eval_f1"]
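The evaluation schedule follows from the dataset size. The worked calculation below assumes a single GPU, no gradient accumulation, and that the last partial batch is kept (the `Trainer` defaults). It reproduces the evaluation steps 115, 230, ..., 1150 and the `warmup=345.0` value seen in the training log: with `warmup_steps=eval_steps*3`, the pruning check is skipped at steps 115 and 230 and first runs at step 345.

```python
import math

num_train_rows = 9245        # size of the 10kGNAD train split (printed above)
batch_size = 16
eval_rounds_per_epoch = 5

# one optimizer step per batch; the last partial batch still counts as a step
steps_per_epoch = math.ceil(num_train_rows / batch_size)              # 578
eval_steps = num_train_rows // batch_size // eval_rounds_per_epoch    # 577 // 5 = 115
warmup_steps = eval_steps * 3                                         # 345

total_steps_2_epochs = 2 * steps_per_epoch                            # 1156
evals_2_epochs = total_steps_2_epochs // eval_steps                   # 10 evaluations: 115, ..., 1150
```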

### Run the Optuna Study

In [None]:
db_path = "/content/gdrive/My Drive/Colab Notebooks/nlp-classification/"
db_name = "10kgnad_optuna"
# study_name = checkpoint + "_multi_epoch234"
# study_name = checkpoint + "_loss-f1_bs32_epoch23"
study_name = checkpoint + "_loss-f1_bs16_epoch23"

# multi objective study
# https://optuna.readthedocs.io/en/stable/tutorial/20_recipes/002_multi_objective.html#sphx-glr-tutorial-20-recipes-002-multi-objective-py
study = optuna.create_study(study_name=study_name,
                            directions=["minimize", "maximize"],
                            storage=f"sqlite:///{db_path}{db_name}.db",
                            load_if_exists=True,)

# give some hyperparameters that are presumably good
# study.enqueue_trial(
#     {
#         "learning_rate": 8e-5,
#         "weight_decay": 1e-3,
#     }
# )

# https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch
import torch
torch.cuda.empty_cache()
import gc
gc.collect()


study.optimize(objective, n_trials=100, callbacks=[best_model_callback])

# study.best_params

[I 2022-01-09 14:29:21,365] Using an existing study with name 'deepset/gbert-base_loss-f1_bs16_epoch23' instead of creating a new one.
fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=7


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,0.8883,0.468282,0.851167,0.84269,0.841975,0.85038,0.83028
230,0.4693,0.417981,0.872568,0.870239,0.870143,0.877034,0.854881
345,0.4302,0.430449,0.864786,0.856578,0.856275,0.871952,0.847203
460,0.3693,0.425328,0.868677,0.868235,0.873588,0.876157,0.851837
575,0.3936,0.327018,0.893969,0.893994,0.896787,0.891618,0.878511
690,0.1971,0.348116,0.889105,0.888157,0.88754,0.891538,0.873397
805,0.1853,0.37346,0.898833,0.894175,0.894044,0.898662,0.884433
920,0.1965,0.350003,0.894942,0.892846,0.89419,0.89361,0.879757
1035,0.1786,0.361307,0.896887,0.891236,0.890821,0.894634,0.882168
1150,0.2048,0.341575,0.900778,0.8984,0.898327,0.900101,0.886469


[I 2022-01-09 14:50:55,392] Trial 1 finished with values: [0.3270176947116852, 0.8939935101635533] and parameters: {'learning_rate': 4.772157231652486e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.009565604336674773}.


This is a new besttrial 1


fixed params: [('num_train_epochs', 2), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=7


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,0.8507,0.536545,0.827821,0.82706,0.827723,0.838801,0.805282
230,0.4986,0.420047,0.86965,0.868153,0.86965,0.871935,0.851295
345,0.4474,0.426456,0.868677,0.860825,0.855087,0.877051,0.851047
460,0.3974,0.460185,0.861868,0.855141,0.861475,0.8608,0.843079
575,0.4087,0.350615,0.892996,0.893079,0.89475,0.893473,0.877762
690,0.1984,0.363162,0.892023,0.890912,0.88915,0.894771,0.87667
805,0.1809,0.371381,0.898833,0.894341,0.892211,0.898438,0.884261
920,0.2126,0.348848,0.901751,0.897115,0.900159,0.896737,0.887674
1035,0.1907,0.353676,0.907588,0.902728,0.898042,0.909873,0.894481
1150,0.2099,0.337582,0.907588,0.902066,0.899982,0.904906,0.894229


[I 2022-01-09 15:12:55,083] Trial 2 finished with values: [0.3375816345214844, 0.9020656676982863] and parameters: {'learning_rate': 6.657366287711563e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16, 'weight_decay': 0.0012099132058381746}.


This is a new besttrial 2


fixed params: [('num_train_epochs', 3), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=7


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,1.0444,0.477999,0.857004,0.849846,0.861881,0.848858,0.836838
230,0.4626,0.387921,0.870623,0.869135,0.869525,0.872959,0.852433
345,0.419,0.40026,0.870623,0.864652,0.868318,0.876437,0.8534
460,0.3702,0.41843,0.872568,0.871944,0.88314,0.874731,0.856212
575,0.3825,0.32407,0.903696,0.902675,0.907691,0.898722,0.889719
690,0.2133,0.348456,0.895914,0.895949,0.899584,0.895676,0.881252
805,0.2091,0.329644,0.89786,0.894247,0.894444,0.896377,0.883204
920,0.2167,0.35848,0.896887,0.895374,0.908268,0.889416,0.88265
1035,0.209,0.347998,0.898833,0.895017,0.896153,0.898837,0.884561
1150,0.2445,0.312597,0.911479,0.911428,0.911844,0.911582,0.898648


[I 2022-01-09 15:45:28,835] Trial 3 finished with values: [0.31259655952453613, 0.9114277939920671] and parameters: {'learning_rate': 2.5148927786021194e-05, 'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'weight_decay': 0.0018156530773452493}.


This is a new besttrial 3


fixed params: [('num_train_epochs', 3), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=7


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,0.8269,0.507011,0.840467,0.84316,0.839377,0.85567,0.819038
230,0.511,0.465518,0.859922,0.858179,0.859111,0.868707,0.841259
345,0.4512,0.46844,0.860895,0.856261,0.854231,0.869166,0.84241
460,0.4288,0.45832,0.854086,0.847473,0.852848,0.852775,0.834543
575,0.4466,0.359926,0.883268,0.882475,0.884957,0.885909,0.867458
690,0.2445,0.44177,0.879377,0.875918,0.885298,0.875331,0.863253
805,0.2414,0.399898,0.896887,0.890433,0.884021,0.899641,0.882342
920,0.261,0.379392,0.892023,0.888796,0.897638,0.883854,0.876447
1035,0.2278,0.364324,0.899805,0.898228,0.898876,0.898032,0.885273
1150,0.2617,0.374487,0.899805,0.898439,0.895813,0.902981,0.885512


[I 2022-01-09 16:18:05,200] Trial 4 finished with values: [0.3599255383014679, 0.8824751058105537] and parameters: {'learning_rate': 7.664911172180888e-05, 'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'weight_decay': 0.0011232522906387502}.
fixed params: [('num_train_epochs', 3), ('per_device_train_batch_size', 16)]
objectives: ['eval_loss', 'eval_f1'], directions: [<StudyDirection.MINIMIZE: 1>, <StudyDirection.MAXIMIZE: 2>], warmup=345.0, min_trials=7


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,1.1384,0.510751,0.858949,0.850783,0.865221,0.84553,0.838606
230,0.4934,0.403923,0.871595,0.871239,0.875081,0.871106,0.853473
345,0.4275,0.400283,0.879377,0.873547,0.876968,0.881498,0.862976
460,0.3763,0.422539,0.866732,0.867527,0.877773,0.871572,0.849942
575,0.3826,0.326713,0.901751,0.898197,0.90355,0.893859,0.887436
690,0.2346,0.348992,0.896887,0.895712,0.900723,0.893602,0.882282
805,0.221,0.323622,0.902724,0.900304,0.902369,0.899446,0.888599
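With two objectives, `study.best_trials` contains the Pareto-optimal trials instead of a single best trial, and `best_model_callback` archives a model whenever the finished trial is among them. The sketch below checks dominance (minimize loss, maximize F1) on the values of the four trials logged above; it is an illustration, not Optuna's implementation, and it only considers these four trials (the study was loaded from existing storage and may contain others). It matches the log: trial 2 is reported as a new best despite a higher loss than trial 1 because its F1 is higher, while trial 3 is better on both objectives, so trial 4 is not reported.

```python
from typing import Dict, List, Tuple

# (eval_loss, eval_f1) of the four finished trials, copied from the log above
trial_values: Dict[int, Tuple[float, float]] = {
    1: (0.3270176947116852, 0.8939935101635533),
    2: (0.3375816345214844, 0.9020656676982863),
    3: (0.31259655952453613, 0.9114277939920671),
    4: (0.3599255383014679, 0.8824751058105537),
}


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """a dominates b: loss no higher, F1 no lower, strictly better in at least one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(values: Dict[int, Tuple[float, float]]) -> List[int]:
    """Trial numbers that no other trial dominates."""
    return [n for n, v in values.items()
            if not any(dominates(w, v) for m, w in values.items() if m != n)]


print(pareto_front({n: trial_values[n] for n in (1, 2)}))  # [1, 2]
print(pareto_front(trial_values))                          # [3]
```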


In [None]:
!ls -lahtr {project_name}/

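## Alternative: `Trainer.hyperparameter_search`

Instead of the custom Optuna loop above, the built-in [`Trainer.hyperparameter_search`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Trainer.hyperparameter_search) can run the search. It is set up below but not executed.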

In [None]:
# disable transformers warnings like "Some weights of the model checkpoint ..."
from transformers import logging as hf_logging
hf_logging.set_verbosity_error()


training_args = TrainingArguments(
    output_dir=str(project_name),
    report_to=[],
    log_level="error",
    disable_tqdm=False,

    evaluation_strategy="steps",
    # eval_steps=eval_steps,
    save_strategy="steps",
    # save_steps=eval_steps,
    # load_best_model_at_end=False,
    # metric_for_best_model="eval_loss",
    # greater_is_better=False,
)

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_gnad10k["train"],
    eval_dataset=tokenized_gnad10k["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)


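# By default, the objective is the eval loss if no other metrics are reported;
# otherwise it is the sum of all other metrics (here acc + f1 + precision + recall + mcc),
# so the search direction must be "maximize".
# Do not pass `compute_objective=objective`: `objective` above is the Optuna
# trial objective, not a function of the metrics dict.
# best = trainer.hyperparameter_search(
#     hp_space=hp_space,
#     direction="maximize",
#     backend="optuna",
#     n_trials=2,
# )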
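The default objective described in the comment above can be sketched as follows. This is a simplified illustration, not the library's code: `sketch_default_objective` is a hypothetical name, and which bookkeeping keys are dropped (`epoch`, `*_runtime`, `*_per_second`) is an assumption here.

```python
import copy
from typing import Dict


def sketch_default_objective(metrics: Dict[str, float]) -> float:
    """Return the eval loss if it is the only quality metric,
    otherwise the sum of all remaining metrics."""
    metrics = copy.deepcopy(metrics)
    loss = metrics.pop("eval_loss", None)
    metrics.pop("epoch", None)
    # runtime/throughput entries are bookkeeping, not quality metrics
    for key in [k for k in metrics if k.endswith("_runtime") or k.endswith("_per_second")]:
        metrics.pop(key)
    return loss if len(metrics) == 0 else sum(metrics.values())


print(sketch_default_objective({"eval_loss": 0.33, "epoch": 2.0}))  # 0.33
with_metrics = {"eval_loss": 0.33, "eval_acc": 0.9, "eval_f1": 0.89,
                "eval_runtime": 12.3, "eval_samples_per_second": 83.6, "epoch": 2.0}
summed = sketch_default_objective(with_metrics)  # eval_acc + eval_f1
```

Since a larger sum of accuracy-style metrics is better, the search has to maximize this value, even though the loss alone would be minimized.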