<a href="https://colab.research.google.com/github/goerlitz/nlp-classification/blob/main/notebooks/10kGNAD/colab/21c_10kGNAD_huggingface_basic_optuna.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hyperparameter Optimization with HuggingFace Transformers

Adapted from https://huggingface.co/docs/transformers/custom_datasets#sequence-classification-with-imdb-reviews

Things we need
* a tokenizer
* tokenized input data
* a pretrained model
* evaluation metrics
* training parameters
* a Trainer instance

Notes
* [class labels can be included in the model config](https://github.com/huggingface/transformers/pull/2945#issuecomment-781986506) (a bit hacky)
* [fp16 is disabled on Tesla P100 GPUs in PyTorch](https://discuss.pytorch.org/t/cnn-fp16-slower-than-fp32-on-tesla-p100/12146)

## Prerequisites

In [27]:
checkpoint = "distilbert-base-german-cased"

# checkpoint = "deepset/gbert-base"

# checkpoint = "deepset/gelectra-base"

project_name = f'10kgnad_hf__{checkpoint.replace("/", "_")}'

### Connect Google Drive

Google Drive is used to save the results (the Optuna study database and the archived models).

In [28]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [29]:
from pathlib import Path

# define model path
root_path = Path('/content/gdrive/My Drive/')
base_path = root_path / 'Colab Notebooks/nlp-classification/'
model_path = base_path / 'models'

## Check GPU

In [30]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

Mon Dec 20 13:07:27 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   56C    P0    38W / 250W |   5661MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

### Install Packages

In [31]:
%%time
!pip install -q -U transformers datasets >/dev/null
!pip install -q -U optuna >/dev/null

# check installed version
!pip freeze | grep optuna        # optuna==2.10.0
!pip freeze | grep transformers  # transformers==4.14.1
!pip freeze | grep torch         # torch==1.10.0+cu111

optuna==2.10.0
transformers==4.14.1
torch @ https://download.pytorch.org/whl/cu111/torch-1.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl
torchaudio @ https://download.pytorch.org/whl/cu111/torchaudio-0.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl
torchsummary==1.5.1
torchtext==0.11.0
torchvision @ https://download.pytorch.org/whl/cu111/torchvision-0.11.1%2Bcu111-cp37-cp37m-linux_x86_64.whl
CPU times: user 69.1 ms, sys: 218 ms, total: 287 ms
Wall time: 8.67 s


In [32]:
from transformers import logging

# hide progress bar when downloading tokenizer and model (a workaround!)
logging.get_verbosity = lambda : logging.NOTSET

## Load Dataset

In [33]:
from datasets import load_dataset

gnad10k = load_dataset("gnad10")
label_names = gnad10k["train"].features["label"].names

Using custom data configuration default
Reusing dataset gnad10 (/root/.cache/huggingface/datasets/gnad10/default/1.1.0/3a8445be65795ad88270af4d797034c3d99f70f8352ca658c586faf1cf960881)



In [34]:
print(gnad10k)
print("labels:", label_names)
gnad10k["train"][0]

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 9245
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1028
    })
})
labels: ['Web', 'Panorama', 'International', 'Wirtschaft', 'Sport', 'Inland', 'Etat', 'Wissenschaft', 'Kultur']


{'label': 4,
 'text': '21-Jähriger fällt wohl bis Saisonende aus. Wien – Rapid muss wohl bis Saisonende auf Offensivspieler Thomas Murg verzichten. Der im Winter aus Ried gekommene 21-Jährige erlitt beim 0:4-Heimdebakel gegen Admira Wacker Mödling am Samstag einen Teilriss des Innenbandes im linken Knie, wie eine Magnetresonanz-Untersuchung am Donnerstag ergab. Murg erhielt eine Schiene, muss aber nicht operiert werden. Dennoch steht ihm eine mehrwöchige Pause bevor.'}

## Data Preprocessing

* Load the same tokenizer that was used with the pretrained model.
* Define a function that tokenizes the text (truncating to the model's maximum input length).
* Run the tokenization.

In [35]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_gnad10k = gnad10k.map(preprocess_function, batched=True).remove_columns("text")

Loading cached processed dataset at /root/.cache/huggingface/datasets/gnad10/default/1.1.0/3a8445be65795ad88270af4d797034c3d99f70f8352ca658c586faf1cf960881/cache-5d66d7a004b32c63.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/gnad10/default/1.1.0/3a8445be65795ad88270af4d797034c3d99f70f8352ca658c586faf1cf960881/cache-1e7aaca04dbb52e2.arrow


### Use Dynamic Padding

Apply padding only up to the longest text in each batch. This is more efficient than padding the whole dataset to one fixed length.

In [36]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
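
To illustrate the idea, here is a minimal plain-Python sketch of per-batch padding. It is not the implementation of `DataCollatorWithPadding`; the helper `pad_batch`, the padding id `0`, and the token ids are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def pad_batch(batch: List[List[int]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Pad every sequence to the length of the longest sequence in this batch."""
    longest = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks padding
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token ids, for illustration only.
short_batch = pad_batch([[101, 7, 102], [101, 7, 8, 9, 102]])
long_batch = pad_batch([[101, 7, 102], [101] + [5] * 510])

print(len(short_batch["input_ids"][0]))  # padded to the longest text in the short batch
print(len(long_batch["input_ids"][0]))   # padded to the longest text in the long batch
```

A batch of short texts stays short; only a batch that actually contains a long text pays for the long padding.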

## Model Setup

We want to store the label names together with the model.
A straightforward way to do this is to create a config that carries the `id2label` and `label2id` mappings and pass it to the model.

In [37]:
import optuna
from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained(
        checkpoint,
        num_labels=len(label_names),
        id2label={i: label for i, label in enumerate(label_names)},
        label2id={label: i for i, label in enumerate(label_names)},
        )

def model_init(trial: optuna.Trial):
    """A function that instantiates the model to be used."""
    return AutoModelForSequenceClassification.from_pretrained(checkpoint, config=config)
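
As a small sketch of what the two mappings provide, the following plain-Python snippet rebuilds them from the label list printed earlier and translates a class id back into its name. The variable `predicted_id` is a hypothetical model output, used only for illustration.

```python
# Label names as printed by the dataset cell above.
label_names = ['Web', 'Panorama', 'International', 'Wirtschaft', 'Sport',
               'Inland', 'Etat', 'Wissenschaft', 'Kultur']

# Same dict comprehensions as in the config above.
id2label = {i: label for i, label in enumerate(label_names)}
label2id = {label: i for i, label in enumerate(label_names)}

# Hypothetical predicted class id; 4 is also the label of the first training example.
predicted_id = 4
predicted_label = id2label[predicted_id]
print(predicted_label)
```

The two dictionaries are inverses of each other, so a class id can be turned into a readable name and back without consulting the dataset again.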

### Define Evaluation Metrics

The function that computes the metrics needs to be passed to the Trainer.

In [38]:
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score, matthews_corrcoef
import numpy as np
from typing import Dict

def compute_metrics(eval_preds):
    """The function that will be used to compute metrics at evaluation.
    Must take a :class:`~transformers.EvalPrediction` and return a dictionary
    mapping metric names (strings) to metric values."""
    logits, labels = eval_preds
    preds = np.argmax(logits, axis=-1)
    return {
        "acc": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average='macro'),
        "precision": precision_score(labels, preds, average='macro'),
        "recall": recall_score(labels, preds, average='macro'),
        "mcc": matthews_corrcoef(labels, preds),
        }


def objective(metrics: Dict[str, float]):
    """A function computing the main optimization objective from the metrics
    returned by the :obj:`compute_metrics` method.
    To be used in :obj:`Trainer.hyperparameter_search`."""
    return metrics["eval_loss"]
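
Macro-averaged precision takes the unweighted mean of the per-class precisions. Early in training a model may never predict some class, and the precision of that class is then 0/0. The sketch below is a plain-Python illustration, not scikit-learn's implementation; the helper `macro_precision` and the toy labels are hypothetical. It counts such a class as 0.0.

```python
from typing import List


def macro_precision(labels: List[int], preds: List[int], num_classes: int) -> float:
    """Unweighted mean of per-class precision; undefined classes count as 0.0."""
    per_class = []
    for c in range(num_classes):
        predicted_c = sum(1 for p in preds if p == c)
        true_pos = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        # If class c is never predicted, precision is 0/0; use 0.0 in that case.
        per_class.append(true_pos / predicted_c if predicted_c else 0.0)
    return sum(per_class) / num_classes


# Toy data, for illustration only: class 2 is never predicted.
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 0, 1, 1, 1, 1]
score = macro_precision(labels, preds, num_classes=3)
print(score)
```

A single class without predictions pulls the macro average down noticeably, which is worth keeping in mind when reading the precision column of the first evaluation steps.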

## Hyperparameter Tuning
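
The training function in the next cell evaluates five times per epoch, so the evaluation interval depends on the batch size. The following sketch reproduces that arithmetic in plain Python; the helper name `steps_between_evals` is hypothetical, and 9245 is the number of training rows printed above.

```python
def steps_between_evals(num_rows: int, batch_size: int, evals_per_epoch: int = 5) -> int:
    """Number of training steps between two evaluations."""
    steps_per_epoch = num_rows // batch_size  # counts full batches only
    return steps_per_epoch // evals_per_epoch


num_train_rows = 9245
for bs in (16, 32):
    print(bs, steps_between_evals(num_train_rows, bs))
```

The results correspond to the step intervals in the `Step` column of the training logs for the two batch sizes.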

In [None]:
from transformers import TrainingArguments, Trainer
import shutil

def hp_space(trial: optuna.Trial):
    """A function that defines the hyperparameter search space.
    To be used in :obj:`Trainer.hyperparameter_search`."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True),
        "num_train_epochs": trial.suggest_categorical("num_train_epochs", [1]),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32]),
        "weight_decay": trial.suggest_float("weight_decay", 1e-3, 1e-2, log=True),
        # "label_smoothing_factor": trial.suggest_float("label_smoothing_factor", 0.0, 0.1),
    }

best_model_dir = "best_model_trainer"

def callback(study, trial):
    for t in study.best_trials:
        if t.number == trial.number:
            print("This is a new besttrial", trial.number)
        
            out_filename = model_path / f"{project_name}_t{trial.number}"
            shutil.make_archive(out_filename, 'zip', f"{project_name}/{best_model_dir}")

def train(trial: optuna.Trial):

    # get hyperparameters choice
    hp = hp_space(trial)
    lr = hp["learning_rate"]
    bs = hp["per_device_train_batch_size"]
    epochs = hp["num_train_epochs"]
    weight_decay = hp["weight_decay"]
    # label_smoothing_factor = hp["label_smoothing_factor"]

    eval_rounds_per_epoch = 5
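    # integer number of training steps between evaluations (eval_steps and save_steps are int arguments)
    eval_steps = gnad10k["train"].num_rows // bs // eval_rounds_per_epoch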

    training_args = TrainingArguments(
        output_dir=str(project_name),
        report_to=[],
        log_level="error",
        disable_tqdm=False,

        evaluation_strategy="steps",
        eval_steps=eval_steps,
        save_strategy="steps",
        save_steps=eval_steps,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,

        # hyperparameters
        num_train_epochs=epochs,
        learning_rate=lr,
        per_device_train_batch_size=bs,
        per_device_eval_batch_size=bs,
        weight_decay=weight_decay,
        # label_smoothing_factor=label_smoothing_factor,

        # fp16=True,  # fp16 is disabled on Tesla P100 by pytorch
    )

    trainer = Trainer(
        model_init=model_init,
        args=training_args,
        train_dataset=tokenized_gnad10k["train"],
        eval_dataset=tokenized_gnad10k["test"],
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )

    # train model and save best model from evaluations
    # needs 'load_best_model_at_end=True'
    trainer.train()
    trainer.save_model(f"{project_name}/{best_model_dir}")

    result = trainer.evaluate(eval_dataset=tokenized_gnad10k["test"])

    # store eval metrics in trial
    for key in result.keys():
        if key != "epoch":
            trial.set_user_attr(key, result[key])
    
    return result["eval_loss"], result["eval_mcc"]


db_path = "/content/gdrive/My Drive/Colab Notebooks/nlp-classification/"
db_name = "10kgnad_optuna"
study_name = checkpoint + "_multi_mcc"

# multi objective study
# https://optuna.readthedocs.io/en/stable/tutorial/20_recipes/002_multi_objective.html#sphx-glr-tutorial-20-recipes-002-multi-objective-py
study = optuna.create_study(study_name=study_name,
                            directions=["minimize", "maximize"],
                            storage=f"sqlite:///{db_path}{db_name}.db",
                            load_if_exists=True,)
study.optimize(train, n_trials=70, callbacks=[callback])

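# a multi-objective study has no single best trial (study.best_params raises a RuntimeError),
# so list the parameters of the Pareto-optimal trials instead
[(t.number, t.params) for t in study.best_trials]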
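
With two objectives (minimize `eval_loss`, maximize `eval_mcc`) there is usually no single best trial. Optuna's `study.best_trials` returns the Pareto-optimal trials instead, and the callback above archives a model whenever the finished trial is among them. The sketch below shows the underlying dominance check in plain Python. It is not Optuna's implementation; the helper `pareto_front` is hypothetical, and the values are rounded from four trials in the log of this run.

```python
from typing import Dict, List, Tuple


def pareto_front(trials: Dict[int, Tuple[float, float]]) -> List[int]:
    """Return the numbers of all trials that no other trial dominates.

    Each value is (eval_loss, eval_mcc): loss is minimized, MCC is maximized.
    """
    def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
        no_worse = a[0] <= b[0] and a[1] >= b[1]
        strictly_better = a[0] < b[0] or a[1] > b[1]
        return no_worse and strictly_better

    return sorted(
        number for number, values in trials.items()
        if not any(dominates(other, values)
                   for m, other in trials.items() if m != number)
    )


# (eval_loss, eval_mcc), rounded from the trial log of this run.
trials = {
    1: (0.372356, 0.866298),
    2: (0.448002, 0.832907),
    11: (0.370656, 0.857377),
    27: (0.370018, 0.859585),
}
front = pareto_front(trials)
print(front)
```

Restricted to these four trials, trial 27 dominates trial 11 (lower loss and higher MCC), while trials 1 and 27 trade off loss against MCC and therefore both stay on the front.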

[I 2021-12-20 13:07:38,167] Using an existing study with name 'distilbert-base-german-cased_multi_mcc' instead of creating a new one.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.66851,0.789883,0.797635,0.822782,0.794905,0.761854
230,No log,0.50704,0.846304,0.84768,0.863774,0.838392,0.824808
345,No log,0.515576,0.83463,0.8386,0.857747,0.842261,0.815524
460,No log,0.393281,0.868677,0.870326,0.867528,0.876393,0.849964
575,0.638700,0.372356,0.883268,0.882518,0.881823,0.88363,0.866298


[I 2021-12-20 13:13:18,068] Trial 1 finished with values: [0.3723558187484741, 0.8662979784184075] and parameters: {'learning_rate': 9.027375856819522e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.0013772751568888668}.


This is a new besttrial 1


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.891729,0.748054,0.695007,0.765625,0.684234,0.712569
230,No log,0.609391,0.830739,0.834713,0.848949,0.825443,0.806838
345,No log,0.546623,0.825875,0.829316,0.838409,0.837283,0.80417
460,No log,0.454447,0.864786,0.866409,0.864882,0.86902,0.845222
575,0.809200,0.448002,0.854086,0.85535,0.856838,0.854816,0.832907


[I 2021-12-20 13:19:10,939] Trial 2 finished with values: [0.4480017423629761, 0.832906546619818] and parameters: {'learning_rate': 2.5699211211567478e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.0016545565398837247}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,1.637084,0.577821,0.430455,0.45911,0.452925,0.525532
114,No log,1.142542,0.687743,0.612541,0.745609,0.604423,0.642571
171,No log,0.924732,0.765564,0.739945,0.781514,0.734154,0.733878
228,No log,0.822541,0.779183,0.762208,0.79006,0.748746,0.74665
285,No log,0.782745,0.799611,0.783704,0.808141,0.77006,0.770004



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-20 13:24:34,693] Trial 3 finished with values: [0.7827445268630981, 0.770003799564523] and parameters: {'learning_rate': 1.3665343859439226e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0011525075868379787}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,1.195721,0.674125,0.563357,0.748579,0.563389,0.626593
114,No log,0.77549,0.794747,0.780233,0.819694,0.762473,0.765118
171,No log,0.616354,0.83463,0.830084,0.835256,0.834987,0.812063
228,No log,0.545324,0.853113,0.855515,0.85961,0.852625,0.831706
285,No log,0.520814,0.844358,0.84653,0.852185,0.842614,0.821756


[I 2021-12-20 13:29:58,932] Trial 4 finished with values: [0.5208142995834351, 0.8217559938341262] and parameters: {'learning_rate': 2.618125710808599e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0023533088688421386}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.619424,0.809339,0.812775,0.842872,0.804066,0.783346
230,No log,0.500164,0.845331,0.847428,0.86381,0.837997,0.823899
345,No log,0.494883,0.837549,0.842718,0.856187,0.849808,0.818713
460,No log,0.38941,0.871595,0.871414,0.867007,0.877275,0.853116
575,0.635800,0.377273,0.872568,0.870155,0.870085,0.870939,0.85406


[I 2021-12-20 13:35:37,127] Trial 5 finished with values: [0.3772728443145752, 0.8540595677839364] and parameters: {'learning_rate': 6.927780912409597e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.00944326965590354}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.627458,0.812257,0.818764,0.830433,0.818049,0.786589
230,No log,0.500268,0.847276,0.848767,0.864757,0.83938,0.826115
345,No log,0.536381,0.824903,0.826947,0.850594,0.827116,0.804862
460,No log,0.40252,0.866732,0.866845,0.864685,0.87222,0.8477
575,0.634100,0.382425,0.872568,0.871629,0.871906,0.87242,0.854086


[I 2021-12-20 13:41:14,960] Trial 6 finished with values: [0.38242509961128235, 0.8540857997896235] and parameters: {'learning_rate': 9.460331576251244e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.0016892211264089025}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,1.687499,0.566148,0.419785,0.462408,0.440744,0.514067
114,No log,1.208702,0.657588,0.558422,0.625651,0.559615,0.608573
171,No log,0.983366,0.757782,0.720995,0.784329,0.715511,0.725101
228,No log,0.879302,0.771401,0.748749,0.784473,0.734827,0.737753
285,No log,0.838218,0.781128,0.758917,0.788982,0.745314,0.748763



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.


Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-20 13:46:39,410] Trial 7 finished with values: [0.8382183909416199, 0.7487628267814431] and parameters: {'learning_rate': 1.2492181134557523e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.002805553425030019}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,1.39277,0.625486,0.466318,0.57405,0.495116,0.573577
114,No log,0.920892,0.75,0.700713,0.768955,0.690186,0.713847
171,No log,0.732555,0.817121,0.809144,0.821205,0.809969,0.792178
228,No log,0.644006,0.833658,0.829264,0.843011,0.81924,0.809221
285,No log,0.612903,0.831712,0.830664,0.837871,0.824953,0.807031



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-20 13:52:03,593] Trial 8 finished with values: [0.6129025816917419, 0.8070313677180124] and parameters: {'learning_rate': 1.9618125877536372e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.004838135542064863}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,0.980408,0.736381,0.684301,0.767348,0.663289,0.698165
114,No log,0.666255,0.816148,0.81016,0.850162,0.791343,0.789925
171,No log,0.528666,0.836576,0.83499,0.83817,0.840751,0.814293
228,No log,0.474736,0.857004,0.857188,0.859608,0.856722,0.83635
285,No log,0.454372,0.857004,0.858336,0.858967,0.858616,0.836307


[I 2021-12-20 13:57:28,084] Trial 9 finished with values: [0.45437178015708923, 0.8363073498477495] and parameters: {'learning_rate': 3.724356197357372e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0013401693425654114}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.67875,0.792802,0.789394,0.830577,0.77548,0.764784
230,No log,0.491867,0.838521,0.840155,0.851541,0.834804,0.815906
345,No log,0.520894,0.826848,0.831994,0.847209,0.838841,0.806805
460,No log,0.403235,0.868677,0.867017,0.864238,0.871313,0.849804
575,0.652300,0.39534,0.864786,0.862639,0.864762,0.861657,0.845113


[I 2021-12-20 14:03:06,207] Trial 10 finished with values: [0.3953400254249573, 0.8451128819392325] and parameters: {'learning_rate': 5.9113053210398586e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.0030703036342070575}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.638912,0.807393,0.801152,0.82563,0.795034,0.780924
230,No log,0.492354,0.840467,0.845528,0.857442,0.838501,0.818013
345,No log,0.527708,0.829767,0.833074,0.856363,0.835088,0.810446
460,No log,0.392304,0.867704,0.868379,0.862918,0.876413,0.848828
575,0.638600,0.370656,0.875486,0.875391,0.876063,0.875288,0.857377


[I 2021-12-20 14:08:44,201] Trial 11 finished with values: [0.3706563115119934, 0.8573769061986882] and parameters: {'learning_rate': 7.73660728575556e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.002372610042368086}.


This is a new besttrial 11


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.038676,0.709144,0.643515,0.74874,0.628446,0.668592
230,No log,0.692201,0.818093,0.819117,0.838645,0.806081,0.792249
345,No log,0.589183,0.821984,0.821418,0.831879,0.827189,0.79915
460,No log,0.501406,0.853113,0.857148,0.85858,0.85691,0.831851
575,0.892400,0.491492,0.849222,0.850087,0.854994,0.846666,0.827343


[I 2021-12-20 14:14:36,331] Trial 12 finished with values: [0.49149179458618164, 0.8273430577178728] and parameters: {'learning_rate': 1.9683623029877304e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.001556225944369944}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.905965,0.742218,0.685458,0.760321,0.674351,0.706026
230,No log,0.617227,0.831712,0.835782,0.850139,0.826532,0.808006
345,No log,0.551193,0.826848,0.830477,0.840484,0.836947,0.805117
460,No log,0.458091,0.86284,0.863941,0.862206,0.866596,0.84298
575,0.817100,0.451478,0.853113,0.853577,0.854794,0.853459,0.831821


[I 2021-12-20 14:20:14,729] Trial 13 finished with values: [0.4514780044555664, 0.8318208263135594] and parameters: {'learning_rate': 2.502288755655432e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.007033381652717656}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.650662,0.799611,0.800962,0.836361,0.787488,0.772385
230,No log,0.503266,0.843385,0.845371,0.861315,0.837319,0.821524
345,No log,0.522817,0.822957,0.829966,0.84548,0.837058,0.802752
460,No log,0.390777,0.871595,0.87216,0.869431,0.875857,0.853045
575,0.656000,0.389705,0.875486,0.875303,0.875381,0.87589,0.857424


[I 2021-12-20 14:25:52,482] Trial 14 finished with values: [0.38970524072647095, 0.8574239994510773] and parameters: {'learning_rate': 5.4968865059080574e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.004331956336329123}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,1.292417,0.642023,0.485004,0.506206,0.511856,0.590613
114,No log,0.838198,0.774319,0.750183,0.793962,0.732568,0.741428
171,No log,0.665679,0.825875,0.820902,0.827442,0.825265,0.802173
228,No log,0.585734,0.843385,0.842815,0.848297,0.83838,0.820478
285,No log,0.558154,0.838521,0.839666,0.847081,0.834109,0.814968



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-20 14:31:17,116] Trial 15 finished with values: [0.5581539273262024, 0.8149679942066972] and parameters: {'learning_rate': 2.2914519416893815e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.001841496962995193}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.629901,0.808366,0.806996,0.839266,0.793662,0.781876
230,No log,0.519842,0.840467,0.84349,0.85974,0.835847,0.818483
345,No log,0.536212,0.827821,0.835488,0.854016,0.84274,0.809104
460,No log,0.39694,0.867704,0.867583,0.864844,0.87109,0.848575
575,0.661200,0.394062,0.873541,0.872399,0.872634,0.872853,0.855189


[I 2021-12-20 14:36:55,270] Trial 16 finished with values: [0.3940621614456177, 0.8551891180305556] and parameters: {'learning_rate': 5.322178448295809e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.002268555667798629}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.270357,0.648833,0.513044,0.734639,0.525366,0.600141
230,No log,0.869197,0.773346,0.766434,0.807413,0.745652,0.741449
345,No log,0.706952,0.816148,0.810459,0.821996,0.812579,0.791458
460,No log,0.613764,0.829767,0.832669,0.835603,0.830322,0.804916
575,1.048200,0.594306,0.832685,0.831638,0.841756,0.823995,0.80821


[I 2021-12-20 14:42:33,768] Trial 17 finished with values: [0.5943055152893066, 0.8082098044259179] and parameters: {'learning_rate': 1.3232556465350068e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.005459364481933602}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,1.124949,0.691634,0.605947,0.759044,0.593864,0.646726
114,No log,0.740494,0.802529,0.791077,0.828413,0.773097,0.773967
171,No log,0.585832,0.839494,0.834822,0.839548,0.839912,0.817568
228,No log,0.519967,0.855058,0.85762,0.861697,0.855432,0.834011
285,No log,0.497096,0.851167,0.850262,0.853504,0.848201,0.829526


[I 2021-12-20 14:47:58,063] Trial 18 finished with values: [0.49709615111351013, 0.8295255868305729] and parameters: {'learning_rate': 2.9163289155964626e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.00454220600341496}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,0.949736,0.740272,0.690293,0.767579,0.672455,0.702981
114,No log,0.65414,0.820039,0.818106,0.856479,0.798516,0.794447
171,No log,0.52123,0.839494,0.837492,0.841055,0.842809,0.817612
228,No log,0.466698,0.860895,0.862338,0.864402,0.8616,0.840748
285,No log,0.447533,0.860895,0.861885,0.862702,0.861716,0.840706


[I 2021-12-20 14:53:22,626] Trial 19 finished with values: [0.4475332796573639, 0.8407064703703742] and parameters: {'learning_rate': 3.9044997141240665e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.00322257376769894}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,1.778819,0.525292,0.376096,0.457028,0.401616,0.467301
114,No log,1.33446,0.618677,0.496798,0.628833,0.508357,0.565096
171,No log,1.096475,0.724708,0.674465,0.761726,0.664396,0.686311
228,No log,0.993813,0.736381,0.695007,0.760095,0.680672,0.69737
285,No log,0.95158,0.752918,0.715455,0.76486,0.698766,0.716365



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.


Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-20 14:58:47,756] Trial 20 finished with values: [0.9515802264213562, 0.7163650275990657] and parameters: {'learning_rate': 1.0437331612272703e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.005645051219523894}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,1.463305,0.618677,0.460584,0.569914,0.488807,0.566251
114,No log,0.977297,0.737354,0.682591,0.759691,0.672257,0.699534
171,No log,0.778251,0.80642,0.792526,0.810229,0.792157,0.779944
228,No log,0.687504,0.82393,0.817876,0.834127,0.806292,0.797979
285,No log,0.652603,0.821984,0.816731,0.829287,0.808169,0.795783



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-20 15:04:12,759] Trial 21 finished with values: [0.6526034474372864, 0.7957828093649362] and parameters: {'learning_rate': 1.777867885453977e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0030175413698191656}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,0.973875,0.729572,0.675303,0.760243,0.656783,0.690694
114,No log,0.663943,0.816148,0.812483,0.851947,0.793501,0.7899
171,No log,0.527842,0.836576,0.834502,0.837561,0.840624,0.814335
228,No log,0.471997,0.858949,0.858758,0.860931,0.858298,0.838545
285,No log,0.453015,0.857004,0.857752,0.858937,0.857619,0.836309


[I 2021-12-20 15:09:38,499] Trial 22 finished with values: [0.4530148506164551, 0.8363091824101395] and parameters: {'learning_rate': 3.78947290474435e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.008572572991134019}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,1.654008,0.574903,0.426735,0.461047,0.449318,0.522796
114,No log,1.16192,0.681907,0.606432,0.749083,0.596755,0.635965
171,No log,0.94155,0.763619,0.73802,0.781197,0.730964,0.731567
228,No log,0.838465,0.777237,0.758402,0.787518,0.744904,0.744398
285,No log,0.798561,0.796693,0.781518,0.806844,0.767395,0.766657



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-20 15:15:03,604] Trial 23 finished with values: [0.7985613346099854, 0.7666573527316356] and parameters: {'learning_rate': 1.328345424486003e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0010854190075026305}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.32434,0.644942,0.491367,0.62385,0.515277,0.595399
230,No log,0.912446,0.759728,0.744328,0.792527,0.722983,0.725658
345,No log,0.742628,0.809339,0.801105,0.815891,0.802511,0.78379
460,No log,0.647291,0.824903,0.829655,0.834393,0.825596,0.799288
575,1.088000,0.625398,0.826848,0.824999,0.836185,0.81683,0.801474



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



[I 2021-12-20 15:20:44,447] Trial 24 finished with values: [0.6253980398178101, 0.8014738861356477] and parameters: {'learning_rate': 1.2135558623303133e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.008495332760303646}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.629773,0.805447,0.807582,0.831205,0.798246,0.778161
230,No log,0.508938,0.838521,0.841522,0.855662,0.833091,0.815775
345,No log,0.516852,0.835603,0.833939,0.849238,0.835575,0.814896
460,No log,0.397223,0.871595,0.871022,0.870991,0.87257,0.852989
575,0.632200,0.388431,0.872568,0.871978,0.872539,0.872079,0.854061


[I 2021-12-20 15:26:24,088] Trial 25 finished with values: [0.38843104243278503, 0.854061253127864] and parameters: {'learning_rate': 9.4705050091967e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.004755481585981095}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.763863,0.785992,0.762431,0.808426,0.748743,0.75573
230,No log,0.541679,0.84144,0.845298,0.853219,0.841771,0.819099
345,No log,0.526516,0.825875,0.828522,0.837857,0.838156,0.80476
460,No log,0.41993,0.86965,0.870788,0.868484,0.874568,0.850836
575,0.744100,0.417173,0.858949,0.859847,0.860242,0.860633,0.838564


[I 2021-12-20 15:32:03,797] Trial 26 finished with values: [0.41717344522476196, 0.8385641849239942] and parameters: {'learning_rate': 3.274120993423763e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.003090711368125064}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,0.709991,0.781128,0.762358,0.811602,0.752814,0.751653
114,No log,0.499562,0.842412,0.842096,0.855366,0.833443,0.819296
171,No log,0.450442,0.851167,0.851689,0.857536,0.857071,0.831587
228,No log,0.379501,0.876459,0.874265,0.875584,0.873164,0.858406
285,No log,0.370018,0.877432,0.876126,0.876942,0.875907,0.859585


[I 2021-12-20 15:37:29,299] Trial 27 finished with values: [0.37001755833625793, 0.8595845151158474] and parameters: {'learning_rate': 9.142476962730016e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.004998050740399115}.


This is a new besttrial 27


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.649613,0.803502,0.799775,0.836604,0.78515,0.776311
230,No log,0.507121,0.839494,0.840425,0.852593,0.835431,0.81728
345,No log,0.53178,0.83463,0.839482,0.85327,0.847487,0.815716
460,No log,0.398517,0.870623,0.870846,0.867715,0.875572,0.85207
575,0.662100,0.393438,0.873541,0.871837,0.873327,0.871367,0.855192


[I 2021-12-20 15:43:24,113] Trial 28 finished with values: [0.39343810081481934, 0.8551916528313239] and parameters: {'learning_rate': 5.296798744857186e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.0013883536718487142}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,1.76468,0.535019,0.386567,0.46034,0.410407,0.4785
114,No log,1.312658,0.629377,0.511942,0.628724,0.520493,0.577197
171,No log,1.076815,0.728599,0.678479,0.758829,0.671042,0.690846
228,No log,0.973558,0.747082,0.713198,0.770017,0.696773,0.709711
285,No log,0.931464,0.761673,0.731368,0.7708,0.715762,0.72642



Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.





[I 2021-12-20 15:48:50,063] Trial 29 finished with values: [0.9314635992050171, 0.7264199682233355] and parameters: {'learning_rate': 1.0759866093782487e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.005551495276979483}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.714612,0.79572,0.776317,0.814723,0.762816,0.766316
230,No log,0.536188,0.83463,0.841732,0.853615,0.836021,0.811639
345,No log,0.520661,0.827821,0.832366,0.841677,0.841872,0.806978
460,No log,0.409419,0.872568,0.872831,0.870623,0.875726,0.854072
575,0.722400,0.407765,0.865759,0.864152,0.865115,0.8643,0.846328


[I 2021-12-20 15:54:30,082] Trial 30 finished with values: [0.40776512026786804, 0.8463282657588233] and parameters: {'learning_rate': 3.609659204157708e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.00729359630545312}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.680873,0.788911,0.787677,0.829555,0.774138,0.760709
230,No log,0.495185,0.835603,0.838305,0.851529,0.831313,0.812433
345,No log,0.527914,0.825875,0.832738,0.850516,0.840356,0.806578
460,No log,0.396821,0.871595,0.870237,0.866672,0.874839,0.853086
575,0.652900,0.387173,0.868677,0.866634,0.867698,0.866342,0.849549


[I 2021-12-20 16:00:09,841] Trial 31 finished with values: [0.3871729373931885, 0.8495493550429445] and parameters: {'learning_rate': 5.817544086562296e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.002303017097867922}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.764105,0.784047,0.761119,0.808334,0.746871,0.753466
230,No log,0.545694,0.837549,0.842064,0.851559,0.837749,0.814706
345,No log,0.523666,0.830739,0.833857,0.843397,0.842534,0.8101
460,No log,0.419699,0.870623,0.871571,0.869742,0.874254,0.85186
575,0.742100,0.417689,0.858949,0.858045,0.858834,0.858367,0.838524


[I 2021-12-20 16:05:50,110] Trial 32 finished with values: [0.4176887273788452, 0.8385239863885064] and parameters: {'learning_rate': 3.296854486700235e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.006883227379410138}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,0.953175,0.733463,0.676515,0.762014,0.660316,0.695228
114,No log,0.65322,0.818093,0.816168,0.853546,0.797541,0.792012
171,No log,0.521223,0.839494,0.837594,0.841097,0.842884,0.817598
228,No log,0.467615,0.855058,0.85735,0.860999,0.855626,0.834114
285,No log,0.448186,0.857004,0.858629,0.85956,0.858616,0.836315


[I 2021-12-20 16:11:16,309] Trial 33 finished with values: [0.448186457157135, 0.836315362516697] and parameters: {'learning_rate': 3.936384777664846e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0021658723591611306}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,1.048518,0.710117,0.640832,0.743724,0.627719,0.669346
230,No log,0.705118,0.809339,0.813102,0.833832,0.799484,0.782335
345,No log,0.596361,0.82393,0.8236,0.834013,0.828787,0.801253
460,No log,0.508086,0.851167,0.855801,0.85669,0.855878,0.829609
575,0.901800,0.497615,0.849222,0.849984,0.853664,0.847663,0.827363


[I 2021-12-20 16:16:57,253] Trial 34 finished with values: [0.49761536717414856, 0.8273625082484904] and parameters: {'learning_rate': 1.915382149844806e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.009482985681700227}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,0.753008,0.789883,0.781164,0.813598,0.774662,0.761793
114,No log,0.531066,0.837549,0.834416,0.854792,0.823665,0.813915
171,No log,0.466266,0.854086,0.850013,0.854019,0.855731,0.834444
228,No log,0.402389,0.870623,0.871856,0.872552,0.871272,0.851736
285,No log,0.390272,0.868677,0.869289,0.868347,0.870776,0.849629


[I 2021-12-20 16:22:23,532] Trial 35 finished with values: [0.390272319316864, 0.8496293886440381] and parameters: {'learning_rate': 6.911060395543888e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.0026170736049645487}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
115,No log,0.659446,0.788911,0.781078,0.818094,0.77239,0.760251
230,No log,0.498177,0.846304,0.851249,0.863029,0.844062,0.824685
345,No log,0.536131,0.828794,0.834611,0.854541,0.840235,0.810025
460,No log,0.399409,0.872568,0.871538,0.8662,0.87848,0.854287
575,0.643300,0.383218,0.870623,0.868392,0.869384,0.868488,0.851874


[I 2021-12-20 16:28:04,342] Trial 36 finished with values: [0.38321807980537415, 0.8518741434752752] and parameters: {'learning_rate': 7.22598117655033e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 16, 'weight_decay': 0.0042315882546448925}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,0.849023,0.749027,0.717892,0.78698,0.699565,0.715262
114,No log,0.575285,0.827821,0.826803,0.85357,0.811657,0.802838
171,No log,0.498549,0.843385,0.842025,0.848638,0.846385,0.822434
228,No log,0.439368,0.860895,0.862432,0.862412,0.863594,0.840778
285,No log,0.421327,0.865759,0.866353,0.86689,0.866537,0.846302


[I 2021-12-20 16:33:30,150] Trial 37 finished with values: [0.42132681608200073, 0.8463016850176639] and parameters: {'learning_rate': 4.951724923311126e-05, 'num_train_epochs': 1, 'per_device_train_batch_size': 32, 'weight_decay': 0.00890986171527578}.


Step,Training Loss,Validation Loss,Acc,F1,Precision,Recall,Mcc
57,No log,0.6887,0.793774,0.78366,0.818563,0.775882,0.766095
114,No log,0.498429,0.842412,0.841791,0.855446,0.834053,0.819363
171,No log,0.457022,0.854086,0.853492,0.860569,0.857312,0.83471


In [None]:
!ls -lahtr 10kgnad_hf__distilbert-base-german-cased/

## Hyperparameter Tuning

https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Trainer.hyperparameter_search
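
`Trainer.hyperparameter_search` expects an `hp_space` callable that maps a trial to a dict of training arguments, and optionally a `compute_objective` callable that turns the evaluation metrics into a single score. The following is a minimal sketch of both, assuming the Optuna backend. The parameter names match those reported in the trial logs; the ranges are assumptions chosen to cover the logged values, and the metric key `eval_mcc` assumes that `compute_metrics` returns an `mcc` entry, which the `Trainer` prefixes with `eval_`.

```python
from typing import Any, Dict


def hp_space(trial: Any) -> Dict[str, Any]:
    """Search space sketch; `trial` is expected to behave like an optuna Trial.

    All ranges below are assumptions, not the notebook's actual settings.
    """
    return {
        # Sample the learning rate on a log scale.
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True),
        # Fixed to one epoch here; widen the range to search over it.
        "num_train_epochs": trial.suggest_int("num_train_epochs", 1, 1),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [16, 32]
        ),
        "weight_decay": trial.suggest_float("weight_decay", 1e-3, 1e-2),
    }


def objective(metrics: Dict[str, float]) -> float:
    """Score a trial by MCC alone instead of the default sum of all metrics.

    The key name "eval_mcc" is an assumption about compute_metrics.
    """
    return metrics["eval_mcc"]
```

Both functions can then be passed as `hp_space=hp_space` and `compute_objective=objective`, together with `direction="maximize"`, since a higher MCC is better.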

In [None]:
# Suppress transformers warnings such as "Some weights of the model checkpoint ..."
logging.set_verbosity_error()


training_args = TrainingArguments(
    output_dir=str(project_name),
    report_to=[],
    log_level="error",
    disable_tqdm=False,

    evaluation_strategy="steps",
    # eval_steps=eval_steps,
    save_strategy="steps",
    # save_steps=eval_steps,
    # load_best_model_at_end=False,
    # metric_for_best_model="eval_loss",
    # greater_is_better=False,
)

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_gnad10k["train"],
    eval_dataset=tokenized_gnad10k["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)


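# When compute_metrics is provided, the default objective is the sum of all
# evaluation metrics (the loss is excluded), so the search direction must be
# "maximize". A custom compute_objective overrides this default.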
# best = trainer.hyperparameter_search(
#     hp_space=hp_space,
#     compute_objective=objective,
#     n_trials=2
# )