# Fine-tuning a model with the Trainer API

Hugging Face Transformers provides a ```Trainer``` class that helps you fine-tune any of its pretrained models on your own dataset.

## Step 1: Preprocess Data

In [1]:
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_datasets = load_dataset("glue", "mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)


def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)


tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)




## Step 2: Define ```TrainingArguments``` class

```TrainingArguments``` holds all the hyperparameters that the ```Trainer``` uses for training and evaluation.

The only argument you have to provide is the directory where the trained model, and the checkpoints saved along the way, will be written. You can leave everything else at its default, which should work reasonably well for basic fine-tuning.

In [2]:
from transformers import TrainingArguments

training_args = TrainingArguments("test-trainer")

## Step 3: Define Model

You will notice a warning after instantiating this pretrained model. This is because BERT has not been pretrained on classifying pairs of sentences, so the head of the pretrained model has been discarded and a new head suitable for sequence classification has been added instead. The warnings indicate that some weights were not used (the ones corresponding to the dropped pretraining head) and that others were randomly initialized (the ones for the new head, ```classifier.weight``` and ```classifier.bias```). The message concludes by encouraging you to train the model, which is exactly what we are going to do now.

In [3]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Step 4: Define ```Trainer``` 

Pass the ```Trainer``` all the objects constructed so far: the ```model```, the ```training_args```, the training and validation datasets, the ```data_collator```, and the ```tokenizer```.

Note that when you pass the tokenizer as we do here, the ```Trainer``` already defaults to a ```DataCollatorWithPadding``` like the one defined previously, so you can omit the line ```data_collator=data_collator``` from this call.

In [None]:
from transformers import Trainer

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)
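
For intuition about what the data collator does to each batch, here is a minimal plain-Python sketch of dynamic padding: every sequence in a batch is padded to the length of the longest one in that batch. The helper `pad_batch` is hypothetical and is not how Transformers implements `DataCollatorWithPadding`; the token ids are made up, and a pad token id of 0 is an assumption.

```python
# Illustrative only: a plain-Python stand-in for dynamic padding.
# `pad_batch` is a hypothetical helper, not part of Transformers.

def pad_batch(features, pad_token_id=0):
    """Pad every sequence in one batch to the length of the longest one."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        # 1 marks real tokens, 0 marks padding the model should ignore
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
    return batch


# Two tokenized examples of different lengths (token ids are made up)
features = [
    {"input_ids": [101, 7592, 102]},
    {"input_ids": [101, 7592, 2088, 999, 102]},
]
batch = pad_batch(features)
print(batch["input_ids"])       # [[101, 7592, 102, 0, 0], [101, 7592, 2088, 999, 102]]
print(batch["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Because padding is decided per batch, short batches stay short instead of being padded to the longest example in the whole dataset.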

## Step 5: Fine-tune model on our dataset

Calling ```trainer.train()``` starts the fine-tuning (which should take a couple of minutes on a GPU) and reports the training loss every 500 steps. It won’t, however, tell you how well (or badly) your model is performing. This is because:

1. We didn’t tell the ```Trainer``` to evaluate during training by setting ```evaluation_strategy``` (renamed ```eval_strategy``` in recent Transformers releases) to either ```"steps"``` (evaluate every ```eval_steps```) or ```"epoch"``` (evaluate at the end of each epoch).
2. We didn’t provide the ```Trainer``` with a ```compute_metrics()``` function to calculate a metric during said evaluation (otherwise the evaluation would just have printed the loss, which is not a very intuitive number).

In [None]:
trainer.train()
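
To get a feel for how often the training loss will be reported, the arithmetic below estimates the number of optimization steps. The figures are assumptions not stated in this notebook: an MRPC training split of 3,668 examples and the `TrainingArguments` defaults of a batch size of 8 per device, 3 epochs, and logging every 500 steps, on a single device.

```python
import math

# Assumptions (not stated above): MRPC's training split has 3,668 examples,
# and the TrainingArguments defaults are a per-device batch size of 8,
# 3 epochs, and logging every 500 steps, running on a single device.
num_train_examples = 3668
batch_size = 8
num_epochs = 3
logging_steps = 500

# The last, partially filled batch still counts as a step
steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
logged_at = list(range(logging_steps, total_steps + 1, logging_steps))

print(steps_per_epoch)  # 459
print(total_steps)      # 1377
print(logged_at)        # [500, 1000]
```

Under these assumptions the loss is reported only twice during the whole run, which is one more reason to add per-epoch evaluation.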

## Evaluation

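Build a ```compute_metrics()``` function. It must take an ```EvalPrediction``` object (a named tuple with a ```predictions``` field and a ```label_ids``` field) and return a dictionary mapping metric names (strings) to their values (floats).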

### Getting predictions (logits) from the model

Call ```trainer.predict()``` on the validation set:
```
predictions = trainer.predict(tokenized_datasets["validation"])
print(predictions.predictions.shape, predictions.label_ids.shape)
# Output: (408, 2) (408,)
```
The output of the ```predict()``` method is a named tuple with three fields: ```predictions```, ```label_ids```, and ```metrics```. The ```metrics``` field contains the loss on the dataset passed, along with some time metrics. Once we complete our ```compute_metrics()``` function and pass it to the ```Trainer```, that field will also contain the metrics returned by ```compute_metrics()```.

```predictions``` is a 2D array with shape ```(408, 2)```, where 408 is the number of examples in the dataset we passed to ```predict()```. Those are the logits for each element of that dataset. To transform them into predictions that we can compare to our labels, we need to take the index with the maximum value on the second axis.

### Converting predictions (logits) to predicted labels

```
import numpy as np
preds = np.argmax(predictions.predictions, axis=-1)
```

With this, we can now compare those ```preds``` to the labels.
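
As a small self-contained illustration of that step, the sketch below applies the same ```np.argmax``` call to made-up logits and compares the result to made-up labels. None of these numbers come from the model above.

```python
import numpy as np

# Made-up logits for four examples and two classes, shape (4, 2)
logits = np.array([
    [2.1, -1.3],
    [-0.4, 0.9],
    [0.2, 0.1],
    [-1.5, 3.0],
])
labels = np.array([0, 1, 1, 1])

# For each row, pick the index (class) with the largest logit
preds = np.argmax(logits, axis=-1)

# Fraction of examples where the predicted class matches the label
accuracy = (preds == labels).mean()

print(preds)     # [0 1 0 1]
print(accuracy)  # 0.75
```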

### Build ```compute_metrics()``` function

We will use the metrics from Hugging Face's Evaluate library.

```
import evaluate
metric = evaluate.load("glue", "mrpc")  # load the metrics associated with the MRPC dataset
metric.compute(predictions=preds, references=predictions.label_ids)
# Output: {'accuracy': 0.8578431372549019, 'f1': 0.8996539792387542}
```

Wrapping everything together gives our ```compute_metrics()``` function:

```
def compute_metrics(eval_preds):
    metric = evaluate.load("glue", "mrpc")
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
```
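
To make the returned dictionary less of a black box, here is a NumPy-only sketch that computes accuracy and F1 by hand on made-up data. ```compute_metrics_numpy``` is a hypothetical stand-in, not the Evaluate implementation, and it assumes binary labels with 1 as the positive class.

```python
import numpy as np


def compute_metrics_numpy(eval_preds):
    """Hypothetical stand-in: accuracy and binary F1 computed by hand."""
    logits, labels = eval_preds
    preds = np.argmax(logits, axis=-1)
    labels = np.asarray(labels)

    accuracy = float((preds == labels).mean())

    # Confusion counts, assuming label 1 is the positive class
    tp = int(((preds == 1) & (labels == 1)).sum())
    fp = int(((preds == 1) & (labels == 0)).sum())
    fn = int(((preds == 0) & (labels == 1)).sum())

    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Made-up logits and labels for five examples
logits = np.array([[2.0, -1.0], [-0.5, 1.5], [0.3, 0.2], [-1.0, 2.0], [0.1, 0.9]])
labels = np.array([0, 1, 1, 1, 0])
result = compute_metrics_numpy((logits, labels))
print(result)
```

Here three of five predictions are correct (accuracy 0.6), with two true positives, one false positive, and one false negative (F1 of 2/3).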

### Using ```compute_metrics()``` to report metrics at the end of each epoch

```
training_args = TrainingArguments("test-trainer", evaluation_strategy="epoch")
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
```

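Note that we create a new ```TrainingArguments``` with ```evaluation_strategy``` set to ```"epoch"``` and a new model; otherwise we would just continue training the model we have already trained. When you then call ```trainer.train()```, it reports the validation loss and metrics at the end of each epoch, in addition to the training loss.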