# Optuna

Optuna is a framework for automating hyperparameter search. It runs many trials with different hyperparameter values and keeps the one with the best objective value.

Typical hyperparameters include the learning rate (lr), batch size, and number of epochs.

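For Hugging Face Transformers models, the steps are:

    step0: install optuna (pip install optuna)
    step1: init model - pass a function that loads the model (model_init) to the Trainer, instead of a model instance
    step2: call trainer.hyperparameter_search() with:
        - hp_space: a function returning the hyperparameters to search (a built-in default space is used if omitted)
        - compute_objective: a function mapping the evaluation metrics to the value to optimize
        - direction: whether to "minimize" (default) or "maximize" the objective
        - n_trials: the number of trials to run
        - hp_name: a function that names each trial
        - backend: "optuna" (default when installed), "ray", "sigopt", or "wandb"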
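Step 2 can be pictured as a loop. For `n_trials` trials, it samples hyperparameters from `hp_space`, trains and evaluates, turns the metrics into one number with `compute_objective`, and keeps the best run according to `direction`. This is a simplified sketch under stated assumptions: it uses plain random search, and `fake_train_and_evaluate` is a made-up stand-in for real training. The actual Trainer delegates sampling to the backend (with Optuna, its default TPE sampler) and can also stop unpromising trials early, as the "Trial N pruned" lines in the logs show.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, float]


def hp_space(rng: random.Random) -> Params:
    # Hypothetical space shaped like the Trainer's default Optuna space.
    return {
        "learning_rate": 10 ** rng.uniform(-6, -4),  # log-uniform in [1e-6, 1e-4]
        "num_train_epochs": rng.randint(1, 5),
        "per_device_train_batch_size": rng.choice([4, 8, 16, 32, 64]),
    }


def fake_train_and_evaluate(params: Params) -> Dict[str, float]:
    # Hypothetical stand-in for training + evaluation, returning an eval metrics dict.
    # Made-up rule: F1 peaks near lr = 5e-5 and grows with the number of epochs.
    lr_gap = abs(math.log10(params["learning_rate"]) - math.log10(5e-5))
    f1 = max(0.0, 0.9 - 0.3 * lr_gap - 0.05 * (5 - params["num_train_epochs"]))
    return {"eval_loss": 1.0 - f1, "eval_f1": f1}


def random_search(
    n_trials: int,
    compute_objective: Callable[[Dict[str, float]], float],
    direction: str = "minimize",
    seed: int = 0,
) -> Tuple[Params, Optional[float], List[float]]:
    if direction not in ("minimize", "maximize"):
        raise ValueError("direction must be 'minimize' or 'maximize'")
    rng = random.Random(seed)
    sign = 1.0 if direction == "maximize" else -1.0  # compare sign * value; larger is better
    best_params: Params = {}
    best_value: Optional[float] = None
    history: List[float] = []
    for _ in range(n_trials):
        params = hp_space(rng)                                      # 1. sample
        value = compute_objective(fake_train_and_evaluate(params))  # 2. train, evaluate, score
        history.append(value)
        if best_value is None or sign * value > sign * best_value:  # 3. keep the best
            best_params, best_value = params, value
    return best_params, best_value, history


best_params, best_value, history = random_search(
    n_trials=10, compute_objective=lambda m: m["eval_f1"], direction="maximize"
)
print(best_params, best_value)
```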


In [1]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # or "0,1" for multiple GPUs
os.environ["TOKENIZERS_PARALLELISM"] = "false"

## Example

In [2]:
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckp_data = "davidberg/sentiment-reviews"
ckp = "google-bert/bert-base-uncased"

# load data
data = load_dataset(ckp_data)

split_data = data["train"].train_test_split(test_size=0.2)

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained(ckp)

label2id = {"positive": 0, "negative": 1}

# process data: keep rows whose "division" is a known label, tokenize them, attach integer labels

def process(samples):

    _data = {"review": [], "division": []}
    for review, division in zip(samples["review"], samples["division"]):
        if division in label2id:  # rows with any other division value are dropped
            _data["review"].append(review)
            _data["division"].append(division)

    # pad/truncate every review to 128 tokens
    toks = tokenizer(_data["review"], max_length=128, truncation=True, padding="max_length")

    toks["labels"] = [label2id[d] for d in _data["division"]]

    return toks

# removing all original columns lets the batched map return fewer rows than it received
tokenized_data = split_data.map(process, batched=True, remove_columns=split_data["train"].column_names)

Map:   0%|          | 0/3267 [00:00<?, ? examples/s]

Map:   0%|          | 0/817 [00:00<?, ? examples/s]
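`process` both filters and labels. Rows whose `division` is not a key of `label2id` are dropped before tokenization, which a batched `map` accepts here because `remove_columns` removes all of the original columns. The tokenizer-free sketch below isolates that logic on a toy batch. `filter_and_label` and the sample rows are hypothetical; the `"neutral"` value only illustrates what happens if the dataset contains divisions outside `label2id`.

```python
from typing import Dict, List

label2id = {"positive": 0, "negative": 1}


def filter_and_label(samples: Dict[str, List[str]]) -> Dict[str, list]:
    # Mirrors process() minus tokenization: keep mapped rows, convert names to ids.
    reviews, labels = [], []
    for review, division in zip(samples["review"], samples["division"]):
        if division in label2id:
            reviews.append(review)
            labels.append(label2id[division])
    return {"review": reviews, "labels": labels}


batch = {
    "review": ["Great fit.", "Fell apart.", "It is a shirt."],
    "division": ["positive", "negative", "neutral"],  # "neutral" is a made-up unmapped value
}
out = filter_and_label(batch)
print(out)
```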

In [3]:
# load model: the Trainer calls this function to build a fresh model for every trial

def load_model():

    model = AutoModelForSequenceClassification.from_pretrained(ckp, num_labels=len(label2id))
    return model

In [4]:
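# metric
import evaluate

# load the metrics once instead of on every evaluation call
acc_fct = evaluate.load("accuracy")
f1_fct = evaluate.load("f1")  # binary F1 for label 1 ("negative") by default

def metric(pred):

    preds, refs = pred  # EvalPrediction unpacks into (logits, labels)

    preds = preds.argmax(axis=-1)

    acc = acc_fct.compute(predictions=preds, references=refs)
    f1 = f1_fct.compute(predictions=preds, references=refs)
    acc.update(f1)  # {"accuracy": ..., "f1": ...}

    return acc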

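Several epochs in the training logs further down show accuracy 0.862994 together with F1 0.0. Assume that `evaluate`'s `"f1"` metric keeps scikit-learn's defaults (`average="binary"`, `pos_label=1`, which is the `"negative"` class under `label2id`). Under that assumption, this is the result you get from a model that always predicts label 0: accuracy equals the share of label-0 examples, and F1 is 0.0 because there are no true positives. A stdlib sketch of both metrics on made-up logits:

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(preds: List[int], refs: List[int]) -> float:
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def binary_f1(preds: List[int], refs: List[int], pos_label: int = 1) -> float:
    # F1 of the positive class only: 2TP / (2TP + FP + FN), and 0.0 when TP == 0.
    tp = sum(p == pos_label and r == pos_label for p, r in zip(preds, refs))
    fp = sum(p == pos_label and r != pos_label for p, r in zip(preds, refs))
    fn = sum(p != pos_label and r == pos_label for p, r in zip(preds, refs))
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Made-up logits: the model always prefers class 0 ("positive").
logits = [[2.0, -1.0]] * 100
refs = [0] * 86 + [1] * 14  # roughly the class balance suggested by the logs
preds = [argmax(row) for row in logits]
print(accuracy(preds, refs), binary_f1(preds, refs))  # 0.86 0.0
```

When the classes are this imbalanced, accuracy alone can look healthy, which is a reason to tune on `eval_f1` as this notebook does.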


In [5]:
# training
from transformers import DataCollatorWithPadding, TrainingArguments, Trainer

args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=32,
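    gradient_accumulation_steps=32,  # one optimizer update per 32 batches: effective batch = 32 x per-device batch size
    logging_steps=100,
    eval_strategy="epoch",  # named evaluation_strategy in older transformers versions
    save_strategy="epoch",
    learning_rate=2e-5,
    weight_decay=0.01,
    metric_for_best_model="f1",  # looked up as "eval_f1" in the evaluation metrics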
    load_best_model_at_end=True
)
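With `gradient_accumulation_steps=32`, each optimizer update uses 32 batches per device, and the search below changes only the batch size, not the accumulation. A rough count of optimizer steps per epoch follows. It assumes one GPU, a dataloader that keeps the last partial batch, and floor division by the accumulation steps (an assumption about Trainer internals). It also uses the 3,267 rows from the Map output above as an upper bound, since `process` may filter some out.

```python
import math


def optimizer_steps_per_epoch(n_examples: int, per_device_bs: int, grad_accum: int, n_devices: int = 1) -> int:
    # Batches the dataloader yields per epoch (last partial batch kept).
    batches = math.ceil(n_examples / (per_device_bs * n_devices))
    # One optimizer update per `grad_accum` batches, but at least one per epoch.
    return max(batches // grad_accum, 1)


for bs in (4, 8, 16, 32, 64):
    print(bs, optimizer_steps_per_epoch(3267, bs, 32))
```

At most a few dozen updates per epoch, and a single update at batch size 64, means `logging_steps=100` is rarely reached. That is consistent with "No log" in the Training Loss column. So few updates may also explain why many trials stay at majority-class predictions.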

## Step 1. Initialize the model via model_init

In [6]:
trainer = Trainer(
    model_init=load_model, # function called to build a fresh model for each trial
    args=args,
    train_dataset=tokenized_data["train"],
    eval_dataset=tokenized_data["test"],
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer, padding=True),
    compute_metrics=metric
)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google-bert/bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
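The Trainer receives `model_init=load_model` rather than a model object. It calls this function again for each trial, so every trial starts from the same pretrained weights. If the function takes one argument, the Trainer passes it the current trial. The toy sketch below shows why a factory matters. `TinyModel` and `run_trial` are hypothetical stand-ins, not Transformers classes.

```python
from typing import Callable


class TinyModel:
    # Hypothetical stand-in for a pretrained model: one weight with a fixed initial value.
    def __init__(self) -> None:
        self.weight = 1.0

    def train(self, learning_rate: float, steps: int) -> None:
        # Pretend each step shrinks the weight in proportion to the learning rate.
        for _ in range(steps):
            self.weight -= learning_rate * self.weight


def run_trial(get_model: Callable[[], TinyModel], learning_rate: float) -> float:
    model = get_model()
    model.train(learning_rate, steps=10)
    return model.weight


# Shared instance: the second trial starts from the first trial's trained weights.
shared = TinyModel()
shared_results = [run_trial(lambda: shared, lr) for lr in (0.1, 0.1)]

# Factory (the role of model_init): every trial starts from the same initial weights.
fresh_results = [run_trial(TinyModel, lr) for lr in (0.1, 0.1)]
print(shared_results, fresh_results)
```

With identical hyperparameters, only the factory version gives identical results, which is what makes trials comparable.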


## Step 2. Search the hyperparameters

In [7]:
# search the default hyperparameter space (no hp_space given)

trainer.hyperparameter_search(
    compute_objective=lambda x: x["eval_f1"],
    direction="maximize",
    n_trials=10
)

[I 2024-06-18 13:23:34,050] A new study created in memory with name: no-name-dac549a9-e14e-4141-8b5d-c010dfc41256


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.637352 | 0.765537 | 0.245455 |
| 1 | No log | 0.607996 | 0.84322 | 0.125984 |
| 2 | No log | 0.596697 | 0.860169 | 0.108108 |
| 3 | No log | 0.593769 | 0.864407 | 0.09434 |


[I 2024-06-18 13:25:05,029] Trial 0 finished with value: 0.09433962264150944 and parameters: {'learning_rate': 1.1258246037401523e-06, 'num_train_epochs': 4, 'seed': 4, 'per_device_train_batch_size': 16}. Best is trial 0 with value: 0.09433962264150944.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.520897 | 0.862994 | 0.0 |
| 1 | No log | 0.473353 | 0.862994 | 0.0 |
| 2 | No log | 0.405521 | 0.862994 | 0.0 |
| 3 | No log | 0.396973 | 0.862994 | 0.0 |


[I 2024-06-18 13:26:17,390] Trial 1 finished with value: 0.0 and parameters: {'learning_rate': 9.373362713061979e-05, 'num_train_epochs': 5, 'seed': 31, 'per_device_train_batch_size': 64}. Best is trial 0 with value: 0.09433962264150944.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.152062 | 0.944915 | 0.80402 |
| 1 | No log | 0.157956 | 0.936441 | 0.716981 |
| 2 | No log | 0.117502 | 0.95339 | 0.809249 |
| 3 | No log | 0.118258 | 0.960452 | 0.842697 |


[I 2024-06-18 13:30:32,893] Trial 2 finished with value: 0.8426966292134831 and parameters: {'learning_rate': 8.392044000120715e-05, 'num_train_epochs': 4, 'seed': 26, 'per_device_train_batch_size': 4}. Best is trial 2 with value: 0.8426966292134831.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.849904 | 0.137006 | 0.240994 |
| 1 | No log | 0.806446 | 0.137006 | 0.240994 |


[I 2024-06-18 13:31:04,592] Trial 3 finished with value: 0.24099378881987576 and parameters: {'learning_rate': 1.3189956934698335e-05, 'num_train_epochs': 2, 'seed': 13, 'per_device_train_batch_size': 64}. Best is trial 2 with value: 0.8426966292134831.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.37238 | 0.862994 | 0.0 |
| 1 | No log | 0.302973 | 0.862994 | 0.0 |
| 2 | No log | 0.221739 | 0.862994 | 0.0 |
| 3 | No log | 0.195869 | 0.882768 | 0.385185 |


[I 2024-06-18 13:33:42,637] Trial 4 finished with value: 0.3851851851851852 and parameters: {'learning_rate': 2.7339212107014292e-05, 'num_train_epochs': 4, 'seed': 34, 'per_device_train_batch_size': 8}. Best is trial 2 with value: 0.8426966292134831.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.364242 | 0.862994 | 0.0 |


[I 2024-06-18 13:34:02,964] Trial 5 pruned. 


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.495013 | 0.862994 | 0.0 |


[I 2024-06-18 13:34:19,610] Trial 6 pruned. 


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.379334 | 0.862994 | 0.0 |


[I 2024-06-18 13:35:18,963] Trial 7 pruned. 


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.38008 | 0.862994 | 0.0 |


[I 2024-06-18 13:36:23,879] Trial 8 pruned. 


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.677368 | 0.525424 | 0.151515 |


[I 2024-06-18 13:37:25,811] Trial 9 pruned. 


BestRun(run_id='2', objective=0.8426966292134831, hyperparameters={'learning_rate': 8.392044000120715e-05, 'num_train_epochs': 4, 'seed': 26, 'per_device_train_batch_size': 4}, run_summary=None)
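Because no `hp_space` was passed, the Trainer used its built-in Optuna space. The parameters in the trial log (`learning_rate`, `num_train_epochs`, `seed`, `per_device_train_batch_size`) match it. The function below is believed to mirror Transformers' `default_hp_space_optuna`; check it against your installed version. It is exercised with a hypothetical `StubTrial` that returns the lower bound or first choice, so a search space can be checked without running Optuna or training anything.

```python
from typing import Any, Dict, List


def default_like_hp_space(trial) -> Dict[str, Any]:
    # Believed to mirror transformers' default_hp_space_optuna; verify for your version.
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 1, 5),
        "seed": trial.suggest_int("seed", 1, 40),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [4, 8, 16, 32, 64]
        ),
    }


class StubTrial:
    # Hypothetical stand-in for optuna.trial.Trial: records names, returns deterministic values.
    def __init__(self) -> None:
        self.calls: List[str] = []

    def suggest_float(self, name: str, low: float, high: float, log: bool = False) -> float:
        self.calls.append(name)
        return low

    def suggest_int(self, name: str, low: int, high: int) -> int:
        self.calls.append(name)
        return low

    def suggest_categorical(self, name: str, choices: List[Any]) -> Any:
        self.calls.append(name)
        return choices[0]


trial = StubTrial()
space = default_like_hp_space(trial)
print(space)
```

A stub like this is a cheap way to catch misspelled method names, or dictionary keys that differ from the sampled names, in a custom space before launching a long search.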

In [7]:
# custom hyperparameter search space

def default_hp_space_optuna(trial):

    # keys must match TrainingArguments attribute names, e.g. "learning_rate"
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 1, 5),
        "seed": trial.suggest_int("seed", 1, 40),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [4, 8, 16, 32, 64]),
        # "adamw_torch" is PyTorch's AdamW (the older "adamw_hf" implementation is deprecated)
        "optim": trial.suggest_categorical("optim", ["sgd", "adamw_torch"]),
    }

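# search with the custom space passed via hp_space
# (the log below was produced without hp_space, so it searched the default space:
#  no trial reports an "optim" value)
trainer.hyperparameter_search(
    hp_space=default_hp_space_optuna,
    compute_objective=lambda x: x["eval_f1"],
    direction="maximize",
    n_trials=10
)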

[I 2024-06-18 13:57:30,871] A new study created in memory with name: no-name-9464010c-c71e-4ada-b1b9-cd3acacb3583


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.479269 | 0.87535 | 0.0 |
| 1 | No log | 0.412576 | 0.87535 | 0.0 |
| 2 | No log | 0.397649 | 0.87535 | 0.0 |


[I 2024-06-18 13:58:50,024] Trial 0 finished with value: 0.0 and parameters: {'learning_rate': 1.680075891587224e-05, 'num_train_epochs': 3, 'seed': 36, 'per_device_train_batch_size': 16}. Best is trial 0 with value: 0.0.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.432264 | 0.87535 | 0.0 |


[I 2024-06-18 13:59:55,577] Trial 1 finished with value: 0.0 and parameters: {'learning_rate': 6.349786102601948e-06, 'num_train_epochs': 1, 'seed': 34, 'per_device_train_batch_size': 4}. Best is trial 0 with value: 0.0.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.666834 | 0.662465 | 0.136201 |
| 1 | No log | 0.616966 | 0.85014 | 0.036036 |
| 2 | No log | 0.589679 | 0.868347 | 0.020833 |
| 3 | No log | 0.575223 | 0.872549 | 0.021505 |
| 4 | No log | 0.570193 | 0.87535 | 0.021978 |


[I 2024-06-18 14:03:03,678] Trial 2 finished with value: 0.02197802197802198 and parameters: {'learning_rate': 1.0591550097831422e-06, 'num_train_epochs': 5, 'seed': 9, 'per_device_train_batch_size': 8}. Best is trial 2 with value: 0.02197802197802198.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.529404 | 0.87535 | 0.0 |
| 1 | No log | 0.502711 | 0.87535 | 0.0 |
| 2 | No log | 0.488693 | 0.87535 | 0.0 |


[I 2024-06-18 14:04:08,651] Trial 3 finished with value: 0.0 and parameters: {'learning_rate': 1.0156636288259209e-05, 'num_train_epochs': 4, 'seed': 34, 'per_device_train_batch_size': 32}. Best is trial 2 with value: 0.02197802197802198.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.830551 | 0.133053 | 0.221384 |
| 1 | No log | 0.737733 | 0.336134 | 0.188356 |
| 2 | No log | 0.67606 | 0.516807 | 0.148148 |
| 3 | No log | 0.631984 | 0.670868 | 0.126394 |
| 4 | No log | 0.624948 | 0.72409 | 0.139738 |


[I 2024-06-18 14:06:27,181] Trial 4 finished with value: 0.13973799126637554 and parameters: {'learning_rate': 3.110610102632726e-06, 'num_train_epochs': 5, 'seed': 39, 'per_device_train_batch_size': 16}. Best is trial 4 with value: 0.13973799126637554.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.710693 | 0.383754 | 0.249147 |


[I 2024-06-18 14:06:51,109] Trial 5 finished with value: 0.24914675767918087 and parameters: {'learning_rate': 1.237123527047492e-06, 'num_train_epochs': 1, 'seed': 23, 'per_device_train_batch_size': 16}. Best is trial 5 with value: 0.24914675767918087.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.542301 | 0.87535 | 0.0 |
| 1 | No log | 0.394847 | 0.87535 | 0.0 |


[I 2024-06-18 14:07:28,458] Trial 6 pruned. 


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.6369 | 0.673669 | 0.231023 |
| 1 | No log | 0.621115 | 0.752101 | 0.253165 |


[I 2024-06-18 14:08:12,713] Trial 7 finished with value: 0.25316455696202533 and parameters: {'learning_rate': 1.1594816998905184e-06, 'num_train_epochs': 2, 'seed': 22, 'per_device_train_batch_size': 16}. Best is trial 7 with value: 0.25316455696202533.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.530413 | 0.87535 | 0.0 |


[I 2024-06-18 14:08:35,756] Trial 8 pruned. 


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 0 | No log | 0.665744 | 0.763305 | 0.076503 |
| 1 | No log | 0.634429 | 0.847339 | 0.052174 |


[I 2024-06-18 14:10:43,110] Trial 9 finished with value: 0.05217391304347826 and parameters: {'learning_rate': 1.1130265041636542e-06, 'num_train_epochs': 2, 'seed': 40, 'per_device_train_batch_size': 4}. Best is trial 7 with value: 0.25316455696202533.


BestRun(run_id='7', objective=0.25316455696202533, hyperparameters={'learning_rate': 1.1594816998905184e-06, 'num_train_epochs': 2, 'seed': 22, 'per_device_train_batch_size': 16}, run_summary=None)
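`hyperparameter_search` returns a `BestRun` but does not retrain with it. A common follow-up is to copy `best_run.hyperparameters` onto `trainer.args` and call `trainer.train()` again; with `model_init` set, training starts from a freshly built model. The helper below does that copy and rejects keys that are not attributes of the arguments object, so a typo such as `"learning rate"` fails loudly instead of being skipped. `apply_hyperparameters` and the `Args` dataclass are hypothetical stand-ins that let the sketch run without Transformers.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Args:
    # Hypothetical stand-in for transformers.TrainingArguments (only the searched fields).
    learning_rate: float = 2e-5
    num_train_epochs: float = 3.0
    seed: int = 42
    per_device_train_batch_size: int = 8


def apply_hyperparameters(args: Any, hyperparameters: Dict[str, Any]) -> Any:
    # Fail on keys that do not name an existing attribute (e.g. a misspelled key).
    unknown = [k for k in hyperparameters if not hasattr(args, k)]
    if unknown:
        raise KeyError(f"not attributes of {type(args).__name__}: {unknown}")
    for key, value in hyperparameters.items():
        setattr(args, key, value)
    return args


# Values copied from the first search's BestRun above.
best = {"learning_rate": 8.392044000120715e-05, "num_train_epochs": 4, "seed": 26, "per_device_train_batch_size": 4}
args = apply_hyperparameters(Args(), best)
print(args)

# With a real Trainer (not run here):
#   apply_hyperparameters(trainer.args, best_run.hyperparameters)
#   trainer.train()
```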
