<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/4/47/Acronimo_y_nombre_uc3m.png"/>

<img src="https://mirrors.creativecommons.org/presskit/buttons/88x31/png/by-nc-sa.png" width=15%/>
</center> 

# Searching for the best hyperparameters (hyperparameter search)

We already know how to use the **Trainer** class to fine-tune a transformer on a specific task without writing the training loop ourselves.

The **Trainer** class also provides an API for hyperparameter search. **Trainer** currently supports four hyperparameter search backends: **optuna**, **sigopt**, **raytune** and **wandb**.
In this tutorial, we will use **optuna**.

First, we need to install some libraries:


In [None]:
!pip install transformers datasets optuna
#!pip install sigopt/wandb/ray[tune] 




In particular, we will search for the best values of the learning rate and the training batch size. To do so, we define the following function, which specifies the range or the candidate values for each of these hyperparameters:

In [None]:
def optuna_hp_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64, 128]),
    }
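To make the shape of this search space concrete, here is a minimal sketch in plain Python of what `log=True` means: the learning rate is drawn so that its logarithm is uniform, which spreads the candidates evenly across orders of magnitude. The helper names `sample_log_uniform` and `sample_hp_space` are hypothetical, and the plain random draws are an assumption of this sketch; Optuna normally chooses values with its own sampler, so this only illustrates the space, not how Optuna explores it.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform in [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_hp_space(rng):
    """Hypothetical stand-in for one draw from the search space defined above."""
    return {
        "learning_rate": sample_log_uniform(1e-6, 1e-4, rng),
        "per_device_train_batch_size": rng.choice([16, 32, 64, 128]),
    }


rng = random.Random(0)  # fixed seed so the draws are repeatable
for _ in range(3):
    print(sample_hp_space(rng))
```

With a log-uniform draw, about half of the sampled learning rates fall below 1e-5 (the geometric midpoint of the range), whereas a plain uniform draw would put almost all of them above it.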

Now we load a dataset for binary text classification (sentiment analysis). We will use the *distilbert* model.

In [None]:
from datasets import load_dataset
dataset = load_dataset("glue", "sst2")

from transformers import AutoTokenizer
model_name='distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

def tokenize(example):
    return tokenizer(example["sentence"], truncation=True)

encoded_dataset = dataset.map(tokenize, batched=True)
encoded_dataset


DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx', 'input_ids', 'attention_mask'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx', 'input_ids', 'attention_mask'],
        num_rows: 872
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx', 'input_ids', 'attention_mask'],
        num_rows: 1821
    })
})

In [None]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

num_labels = 2 # because this is a binary text classification task 

def model_init():
    return AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)

training_args = TrainingArguments(
    output_dir='./outputs/',
    evaluation_strategy = "epoch",
    save_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,  # reduced from 5 for faster training; use 3 to 5 epochs for a real run
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",   
)

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='macro')
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }
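As a reminder of what `average='macro'` computes, here is a minimal pure-Python sketch: precision, recall and F1 are calculated per class and then averaged with equal weight. The function name `macro_scores` is hypothetical, and setting undefined ratios to 0 is an assumption of this sketch. It is meant as an illustration, not as a replacement for scikit-learn.

```python
def macro_scores(labels, preds, classes=(0, 1)):
    """Return (precision, recall, f1), each averaged over classes with equal weight."""
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        predicted = sum(1 for p in preds if p == c)
        actual = sum(1 for y in labels if y == c)
        # Undefined ratios (division by zero) are set to 0 in this sketch.
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# A degenerate model that always predicts class 0 on a balanced set.
labels = [1, 0, 1, 0]
preds = [0, 0, 0, 0]
print(macro_scores(labels, preds))
```

On balanced data, a model that always predicts the same class gets a macro recall of 0.5 and a macro precision of about half its accuracy. This pattern is worth recognizing when a trial collapses during the search.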





Instead of calling the **train()** method, we will call **hyperparameter_search**.

The search can take a long time if we use the whole training set, because every trial fine-tunes a fresh model. To speed it up, we will search for the best hyperparameters on a smaller portion of the data. To obtain this portion, we can use the **shard** method, which splits a dataset into a predefined number of fragments and returns one of them.
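A quick back-of-the-envelope calculation shows how much work each trial involves when one of 20 shards is used. This is only a sketch: it assumes the shard takes every 20th row starting at the given index (the exact selection rule may depend on the `datasets` version), a single device, and no gradient accumulation. The helper `steps_per_epoch` is hypothetical.

```python
import math

TRAIN_ROWS = 67349  # size of the SST-2 train split shown above
NUM_SHARDS = 20
SHARD_INDEX = 1

# Assumption: the shard keeps every NUM_SHARDS-th row, starting at SHARD_INDEX.
shard_rows = len(range(SHARD_INDEX, TRAIN_ROWS, NUM_SHARDS))


def steps_per_epoch(num_rows, batch_size):
    """Optimizer steps in one epoch on one device, keeping the last partial batch."""
    return math.ceil(num_rows / batch_size)


print("rows in shard:", shard_rows)
for batch_size in [16, 32, 64, 128]:
    print("batch size", batch_size, "->", steps_per_epoch(shard_rows, batch_size), "steps")
```

Under these assumptions, the step counts for batch sizes 16, 32 and 128 agree with the checkpoint numbers that appear in the training logs of this notebook (`checkpoint-211`, `checkpoint-106` and `checkpoint-27`).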




In [None]:
# shard() with index=1 returns the second of the 20 shards
train_dataset = encoded_dataset["train"].shard(index=1, num_shards=20) # num_shards=10

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    # train_dataset=encoded_dataset["train"],
    train_dataset=train_dataset,
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

# Run the search once, with our own search space and the Optuna backend.
best_run = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=optuna_hp_space,
    n_trials=5,  # use 10 or 20 for a more thorough search
)

best_run
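It helps to know which value the search actually maximizes. In the logs of this run, trial 0 reports a value of about 3.42, which matches the sum of its accuracy, F1, precision and recall rather than any single metric. The sketch below compares a sum-style objective with one that returns only the accuracy. The function `summed_objective` is a hypothetical simplification written for this comparison, not the library's code, and the `eval_` key prefix is assumed to follow the naming that **Trainer** uses for evaluation metrics.

```python
def accuracy_objective(metrics):
    """Return only the validation accuracy as the value to optimize."""
    return metrics["eval_accuracy"]


def summed_objective(metrics):
    """Hypothetical 'sum of all metrics' objective, written here for comparison."""
    ignored = {"eval_loss", "epoch"}
    return sum(value for key, value in metrics.items() if key not in ignored)


# Numbers copied from the evaluation log of trial 0.
example_metrics = {
    "eval_loss": 0.345281,
    "eval_accuracy": 0.855505,
    "eval_f1": 0.855477,
    "eval_precision": 0.85545,
    "eval_recall": 0.85554,
    "epoch": 1.0,
}

print("accuracy only:", accuracy_objective(example_metrics))
print("sum of metrics:", summed_objective(example_metrics))
```

To optimize a single metric, a function like `accuracy_objective` can be passed to the search through the `compute_objective` argument, for example `trainer.hyperparameter_search(..., compute_objective=accuracy_objective)`.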


Summary of the five Optuna trials (one epoch each, evaluated on the 872 validation examples with batch size 16):

Trial,Checkpoint,Validation Loss,Accuracy,F1,Precision,Recall,Objective value
0,./outputs/run-0/checkpoint-106,0.345281,0.855505,0.855477,0.85545,0.85554,3.4219722611358314
1,./outputs/run-1/checkpoint-106,0.352717,0.847477,0.847461,0.847458,0.847573,3.3899687695371505
2,./outputs/run-2/checkpoint-106,0.37311,0.841743,0.84173,0.841743,0.841858,3.3670742498654485
3,./outputs/run-3/checkpoint-211,0.44496,0.827982,0.827937,0.827917,0.827966,3.3118022671172405
4,./outputs/run-4/checkpoint-27,0.686073,0.509174,0.337386,0.254587,0.5,(truncated in the log)


## Training the model with the best hyperparameters
Finally, to reproduce the best run, we simply set the best hyperparameters on our **TrainingArguments** object. Once that is done, we can train with **Trainer**:

In [None]:
for n, v in best_run.hyperparameters.items():
    print(n, v)
    setattr(trainer.args, n, v)

trainer.train()
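The loop above overwrites only the arguments that were part of the search; every other training argument keeps its original value. A minimal sketch with stand-in objects illustrates this. The names `fake_best_run` and `fake_args` are hypothetical, and the hyperparameter values are invented purely for illustration; they are not results from this notebook.

```python
from types import SimpleNamespace

# Hypothetical stand-ins; the hyperparameter values below are invented.
fake_best_run = SimpleNamespace(
    hyperparameters={"learning_rate": 5e-5, "per_device_train_batch_size": 32}
)
fake_args = SimpleNamespace(
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    weight_decay=0.01,
)

# Same pattern as the cell above: copy each searched value onto the arguments.
for name, value in fake_best_run.hyperparameters.items():
    setattr(fake_args, name, value)

print(fake_args)
```

Since the search here used a small shard and a single epoch, it may be worth switching back to the full training set and a larger number of epochs before the final training run.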