# Classification Fine-tuning

The following task consists of training a HuggingFace (HF) model to perform the _classification_ task. The dataset used to train the model is predefined. However, the model, the tokenizer, and the trainer can be fully customized. In other words, you will need to do some research and trial and error in order to learn and gain skill with HF.

Recommendations:
- During this process you will have many questions and run into many errors. Try to resolve them on your own first, by understanding the cause of the error; then turn to online resources. Finally, there is always the forum, which can be used collaboratively.
- Do not leave the task for the last day. Models take time to train, and problems are rarely solved on the first iteration.

Finally, the requirements are:
- A rigorously clean presentation of the notebook.
- The notebook is submitted with all cells executed.
- Comments (optional) are best written in the code itself with '#'.

Good luck!

## Dataset

Next, you will download a DatasetDict called _glue_. The target is the column called _label_.

In [1]:
from datasets import load_dataset, DatasetDict
import os

ds = load_dataset("glue", "mnli")
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"
ds


DatasetDict({
    train: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 392702
    })
    validation_matched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 9815
    })
    validation_mismatched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 9832
    })
    test_matched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 9796
    })
    test_mismatched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 9847
    })
})

The first thing you will have to do is build a new DatasetDict, called **ds_tarea**, that filters the previous DatasetDict so that it:
- keeps only the records whose _premise_ column contains strictly fewer than 20 characters.
- contains only the _train_ and _validation_matched_ Datasets.

In [2]:
ds_tarea = None

def filter_short_premise(example):
    return len(example['premise']) < 20

train_filtered = ds['train'].filter(filter_short_premise)
validation_matched_filtered = ds['validation_matched'].filter(filter_short_premise)

ds_tarea = DatasetDict({
    'train': train_filtered,
    'validation_matched': validation_matched_filtered
})

ds_tarea

DatasetDict({
    train: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 13635
    })
    validation_matched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 413
    })
})

In [3]:
# Control cell

assert len(ds_tarea['train']) == 13635
assert len(ds_tarea['validation_matched']) == 413
assert set(ds_tarea.keys()) == {'train', 'validation_matched'}

## EDA

If you need to explore the data, use this section.

In [4]:
# Free-use cells
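
As one possible starting point for the exploration, the sketch below counts how often each label appears and checks the premise lengths. It runs on a few hand-written rows so that it is self-contained; the sample rows and the helper name `label_distribution` are illustrative assumptions, not part of the assignment. The label names follow the GLUE MNLI order (0 = entailment, 1 = neutral, 2 = contradiction).

```python
from collections import Counter

# Hypothetical rows mimicking the columns of ds_tarea["train"].
sample = [
    {"premise": "Yes, I agree.", "hypothesis": "I agree with that.", "label": 0},
    {"premise": "He left early.", "hypothesis": "He stayed all night.", "label": 2},
    {"premise": "It rained.", "hypothesis": "The match was cancelled.", "label": 1},
    {"premise": "She smiled.", "hypothesis": "She was unhappy.", "label": 2},
]

# GLUE MNLI label ids, in the order used by the HuggingFace dataset.
LABEL_NAMES = ["entailment", "neutral", "contradiction"]


def label_distribution(rows):
    """Return the relative frequency of each label name in `rows`."""
    counts = Counter(row["label"] for row in rows)
    total = sum(counts.values())
    return {LABEL_NAMES[label]: counts[label] / total for label in sorted(counts)}


distribution = label_distribution(sample)
max_premise_len = max(len(row["premise"]) for row in sample)
print(distribution)
print(max_premise_len)
```

On the real data, the same function could be called as `label_distribution(ds_tarea["train"])`, since iterating over a `Dataset` yields one dictionary per row.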

## Model and Tokenizer

Store the model and the tokenizer in the variables _model_ and _tokenizer_.
Even if they are not used until later, declare them in this section.

In [5]:
tokenizer = None
model = None

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

#model_name = "bert-base-multilingual-cased" #0.54
#model_name = "distilbert-base-uncased" #0.34
#model_name = "albert-base-v2" #0.34
#model_name = "huawei-noah/TinyBERT_General_4L_312D" #0.56
#model_name = "albert-base-v2" #0.44
model_name = "bert-base-uncased" #0.57
#model_name = "distilbert-base-uncased-finetuned-sst-2-english" #0.34
#model_name = "bert-base-uncased" #0.35

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
tokenizer = AutoTokenizer.from_pretrained(model_name)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Feature Engineering

If you need to modify the data in any way (this is not always necessary, but some pretrained models require it), you can use this section.

At the end of the section, whether or not you have modified the DatasetDict, store it in __ds_tarea_featured__.

In [6]:
# Free-use cells
ds_tarea

DatasetDict({
    train: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 13635
    })
    validation_matched: Dataset({
        features: ['premise', 'hypothesis', 'label', 'idx'],
        num_rows: 413
    })
})

In [7]:
def tokenize_function(examples):
    return tokenizer(examples['premise'], examples['hypothesis'], truncation=True)

ds_tarea_tokenized = ds_tarea.map(tokenize_function, batched=True)

ds_tarea_tokenized = ds_tarea_tokenized.remove_columns(["premise", "hypothesis"])
#ds_tarea_tokenized.set_format("torch")

ds_tarea_featured = ds_tarea_tokenized 
ds_tarea_featured

DatasetDict({
    train: Dataset({
        features: ['label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 13635
    })
    validation_matched: Dataset({
        features: ['label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 413
    })
})
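
To make the three new columns concrete, here is a toy sketch of how a BERT-style tokenizer lays out a sentence pair: `[CLS] premise [SEP] hypothesis [SEP]`, with `token_type_ids` marking which segment each token belongs to. The whitespace splitting and the tiny vocabulary are assumptions made for illustration only; the real tokenizer uses WordPiece and its own vocabulary.

```python
def encode_pair(premise, hypothesis, vocab):
    """Toy version of BERT-style pair encoding (whitespace tokens, no WordPiece)."""
    first = ["[CLS]"] + premise.lower().split() + ["[SEP]"]
    second = hypothesis.lower().split() + ["[SEP]"]
    tokens = first + second
    return {
        # Unknown words map to the [UNK] id.
        "input_ids": [vocab.get(tok, vocab["[UNK]"]) for tok in tokens],
        # 0 marks the premise segment, 1 marks the hypothesis segment.
        "token_type_ids": [0] * len(first) + [1] * len(second),
        # Without padding, every position is a real token.
        "attention_mask": [1] * len(tokens),
    }


# Hypothetical vocabulary, only for this illustration.
toy_vocab = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "it": 3, "rained": 4, "was": 5, "wet": 6}

encoded = encode_pair("It rained", "It was wet", toy_vocab)
print(encoded)
```

Because the tokenization cell does not pad, sequences keep different lengths at this stage, and padding has to happen later, when batches are assembled.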

In [8]:
# Control cell

assert len(ds_tarea_featured['train']) == 13635
assert len(ds_tarea_featured['validation_matched']) == 413

## Fine-tuning

To be able to evaluate the model throughout the process without waiting for the whole run to finish (which could take hours), we propose creating a metric that prints the progress of training to the screen.

This metric is declared in a function, here called _compute_metrics_, and the arguments and the trainer are asked to compute it at the end of each _epoch_ on the _evaluation_dataset_.

In [9]:
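import evaluate

# `datasets.load_metric` is deprecated and was removed in datasets 3.x;
# the `evaluate` library provides the same accuracy metric.
metric = evaluate.load("accuracy")  # loaded once, not on every evaluation

def compute_metrics(eval_pred):
    # eval_pred unpacks into the model logits and the reference labels
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)
    return metric.compute(predictions=predictions, references=labels)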
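
The following sketch spells out what the metric function above computes, using a handful of made-up logits: the predicted class is the index of the largest logit in each row, and accuracy is the fraction of predictions that match the labels. It is written in plain Python instead of using the metric library, and the helper name `accuracy_from_logits` is hypothetical.

```python
def accuracy_from_logits(logits, labels):
    """Return the accuracy obtained by taking the argmax of each row of logits."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Made-up logits for four examples and three classes.
toy_logits = [
    [2.1, 0.3, -1.0],   # predicted class 0
    [0.2, 0.1, 1.5],    # predicted class 2
    [-0.5, 0.9, 0.4],   # predicted class 1
    [1.2, 1.0, 0.1],    # predicted class 0
]
toy_labels = [0, 2, 1, 1]

result = accuracy_from_logits(toy_logits, toy_labels)
print(result)
```

Three of the four predictions match their labels, so the accuracy is 0.75. The Trainer adds the `eval_` prefix when it logs the value, which is why the training log reports `eval_accuracy`.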

Next, train a HuggingFace model of your choice. You must use a HuggingFace Trainer that has at least the following arguments (every variable may take more arguments):

In [10]:
from transformers import TrainingArguments, Trainer

args = TrainingArguments(
    output_dir='./finetuned1',
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_dir='./logs',
    metric_for_best_model="accuracy",
    num_train_epochs=3, 
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=5e-5,
    weight_decay=0.01,
    do_train=True,
    do_eval=True,
    load_best_model_at_end=True,
    dataloader_num_workers=4,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds_tarea_featured["train"],
    eval_dataset=ds_tarea_featured["validation_matched"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
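
These arguments fix both the length of the run and the learning rate at every step. The sketch below works through the arithmetic, assuming a single device, no gradient accumulation, and the Trainer's default linear decay without warmup; these are assumptions about the run, not settings stated explicitly in the notebook.

```python
import math

train_rows = 13635          # rows in ds_tarea["train"]
batch_size = 16             # per_device_train_batch_size
epochs = 3                  # num_train_epochs
base_lr = 5e-5              # learning_rate

# Assumption: one device and no gradient accumulation.
steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs


def linear_lr(step):
    """Learning rate after `step` updates under linear decay to zero, no warmup."""
    return base_lr * (total_steps - step) / total_steps


print(steps_per_epoch, total_steps)
print(linear_lr(500))
```

Under these assumptions the sketch gives 853 steps per epoch and 2559 in total, which agrees with the progress bar of the training cell, and `linear_lr(500)` is about 4.02e-05, in line with the first learning rate reported in the training log.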


The model is trained next. Do not modify this cell; just run it.

In [11]:
# This cell must be executed in the submission

from time import time
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"
start = time()

trainer.train()

end = time()
print(f">>>>>>>>>>>>> elapsed time: {(end-start)/60:.0f}m")

  0%|          | 0/2559 [00:00<?, ?it/s]
{'loss': 0.8225, 'grad_norm': 4.660393714904785, 'learning_rate': 4.023055881203595e-05, 'epoch': 0.59}
 33%|███▎      | 853/2559 [18:54<3:43:01,  7.84s/it]
{'eval_loss': 0.5469948649406433, 'eval_accuracy': 0.7820823244552058, 'eval_runtime': 121.2242, 'eval_samples_per_second': 3.407, 'eval_steps_per_second': 0.107, 'epoch': 1.0}
{'loss': 0.5606, 'grad_norm': 3.5282459259033203, 'learning_rate': 3.0461117624071905e-05, 'epoch': 1.17}
 59%|█████▊    | 1500/2559 [35:05<23:36,  1.34s/it]
{'loss': 0.402, 'grad_norm': 9.524494171142578, 'learning_rate': 2.0691676436107857e-05, 'epoch': 1.76}
 67%|██████▋   | 1706/2559 [39:58<1:49:51,  7.73s/it]
{'eval_loss': 0.5044950842857361, 'eval_accuracy': 0.8135593220338984, 'eval_runtime': 116.2484, 'eval_samples_per_second': 3.553, 'eval_steps_per_second': 0.112, 'epoch': 2.0}
{'loss': 0.2742, 'grad_norm': 9.794675827026367, 'learning_rate': 1.0922235248143807e-05, 'epoch': 2.34}
 98%|█████████▊| 2500/2559 [59:13<01:21,  1.38s/it]
{'loss': 0.2219, 'grad_norm': 6.694991588592529, 'learning_rate': 1.1527940601797578e-06, 'epoch': 2.93}
100%|██████████| 2559/2559 [1:00:56<00:00,  7.74s/it]
{'eval_loss': 0.6466549634933472, 'eval_accuracy': 0.8159806295399515, 'eval_runtime': 118.7122, 'eval_samples_per_second': 3.479, 'eval_steps_per_second': 0.11, 'epoch': 3.0}
100%|██████████| 2559/2559 [1:02:59<00:00,  1.48s/it]
{'train_runtime': 3779.2589, 'train_samples_per_second': 10.824, 'train_steps_per_second': 0.677, 'train_loss': 0.4496669132013533, 'epoch': 3.0}
>>>>>>>>>>>>> elapsed time: 63m

In [12]:
# This cell must be executed in the submission
# An eval_accuracy above 0.75 is expected
# A higher accuracy does not earn a higher grade; exceeding the 0.75 threshold is enough

results = trainer.evaluate()
final_eval_accuracy = results.get("eval_accuracy")

print(f"Final Eval Accuracy: {final_eval_accuracy:.2f}")

Final Eval Accuracy: 0.82
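
With `load_best_model_at_end=True` and `metric_for_best_model="accuracy"`, the Trainer reloads the checkpoint with the highest evaluation accuracy once training ends. The sketch below replays that selection in plain Python on the per-epoch values from the training log (rounded to four decimals) and checks them against the 0.75 threshold; it is an illustration of the idea, not the Trainer's internal code.

```python
# Evaluation results per epoch, rounded from the training log.
eval_history = [
    {"epoch": 1, "eval_loss": 0.5470, "eval_accuracy": 0.7821},
    {"epoch": 2, "eval_loss": 0.5045, "eval_accuracy": 0.8136},
    {"epoch": 3, "eval_loss": 0.6467, "eval_accuracy": 0.8160},
]
THRESHOLD = 0.75  # minimum eval_accuracy required by the assignment

# Checkpoint that maximises accuracy versus the one that minimises loss.
best = max(eval_history, key=lambda entry: entry["eval_accuracy"])
lowest_loss = min(eval_history, key=lambda entry: entry["eval_loss"])
passes = best["eval_accuracy"] > THRESHOLD
print(best["epoch"], lowest_loss["epoch"], passes)
```

The checkpoint with the best accuracy (epoch 3) is not the one with the lowest evaluation loss (epoch 2), which may hint at the onset of overfitting. Selecting on `eval_loss` instead would be a different, equally defensible choice.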



