# Fine-tune a Pretrained Model Using Hugging Face Transformers
Fine-tuning a pretrained model lets you reuse the knowledge the model acquired during its initial training on large datasets. It takes far less time and compute than training a model from scratch, and it can reach strong performance with a relatively small amount of task-specific data.

## Steps for Fine-Tuning a Pretrained Model
#### Choose a Pretrained Model: 
Select a model from the Hugging Face Model Hub that suits your task. For example, if you're working on text classification, models like BERT or RoBERTa are popular choices.

#### Prepare Your Dataset: 
Ensure your dataset is properly formatted. For text tasks, this usually means tokenizing the text. You can use a tokenizer from the Transformers library (for example, AutoTokenizer) to convert your text into input IDs and attention masks.

#### Set Up Training Arguments: 
Define your training parameters using TrainingArguments. This includes specifying the output directory, evaluation strategy, learning rate, batch size, and number of epochs.

#### Create a Trainer: 
Instantiate a Trainer object, which will handle the training process. You need to provide your model, training arguments, training dataset, evaluation dataset, and a function to compute metrics.

#### Train the Model: 
Call the train() method on your Trainer object to start the fine-tuning process.

#### Evaluate the Model: 
After training, you can evaluate the model's performance on the validation dataset to check its accuracy and other metrics.

## Goal of Fine-tuning
We are going to fine-tune a model on the Yelp review dataset. The goal is for the pretrained model to accurately predict the star rating of a Yelp review. The dataset has five classes, labeled 0 to 4, so this is a fine-grained form of sentiment classification rather than a simple positive/negative split.

### Install all the necessary libraries

In [10]:
!pip install transformers datasets evaluate accelerate



### You’ll also need to install your preferred machine learning framework, PyTorch or TensorFlow. This tutorial uses PyTorch.

In [11]:
!pip install torch



### Begin by loading the Yelp Reviews dataset:

In [12]:
from datasets import load_dataset

dataset = load_dataset("yelp_review_full")
dataset["train"][100]

{'label': 0,
 'text': 'My expectations for McDonalds are t rarely high. But for one to still fail so spectacularly...that takes something special!\\nThe cashier took my friends\'s order, then promptly ignored me. I had to force myself in front of a cashier who opened his register to wait on the person BEHIND me. I waited over five minutes for a gigantic order that included precisely one kid\'s meal. After watching two people who ordered after me be handed their food, I asked where mine was. The manager started yelling at the cashiers for \\"serving off their orders\\" when they didn\'t have their food. But neither cashier was anywhere near those controls, and the manager was the one serving food to customers and clearing the boards.\\nThe manager was rude when giving me my order. She didn\'t make sure that I had everything ON MY RECEIPT, and never even had the decency to apologize that I felt I was getting poor service.\\nI\'ve eaten at various McDonalds restaurants for over 30 years. 

### Tokenize the text data to prepare it for the model.

In [13]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")


def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)


tokenized_datasets = dataset.map(tokenize_function, batched=True)
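
To make `padding="max_length"` and `truncation=True` concrete, here is a minimal pure-Python sketch of what those two options do to each example. The whitespace splitting, the tiny `VOCAB`, and the helper `toy_encode` are hypothetical stand-ins for illustration only. The real BERT tokenizer splits text into WordPiece subwords and adds special tokens such as `[CLS]` and `[SEP]`.

```python
# Toy illustration only: VOCAB and toy_encode are hypothetical, not part of Transformers.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "the": 2, "food": 3, "was": 4, "great": 5, "cold": 6}


def toy_encode(text, max_length):
    """Map words to IDs, then truncate or pad to exactly max_length."""
    ids = [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]
    ids = ids[:max_length]             # truncation: drop tokens past max_length
    mask = [1] * len(ids)              # 1 marks a real token
    pad = max_length - len(ids)
    ids += [VOCAB["[PAD]"]] * pad      # padding: fill up to max_length
    mask += [0] * pad                  # 0 marks padding the model should ignore
    return {"input_ids": ids, "attention_mask": mask}


short = toy_encode("The food was great", max_length=6)
long = toy_encode("The food was cold and the service was slow", max_length=6)
print(short)
print(long)
```

Every example ends up with the same length, which is what allows the model to process reviews in fixed-size batches.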




#### If you like, you can create a smaller subset of the full dataset to fine-tune on to reduce the time it takes:

In [18]:
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))

#### Train with PyTorch Trainer

In [19]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5)



Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google-bert/bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Training hyperparameters

Create a TrainingArguments object, which holds all the hyperparameters you can tune as well as flags for activating different training options. This example keeps the default hyperparameters and only specifies where to save the checkpoints.

In [20]:
from transformers import TrainingArguments

training_args = TrainingArguments(output_dir="test_trainer")
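
The call above relies entirely on default values. As a sketch, the dictionary below spells out the defaults that matter most for this run. The values are believed to match the library defaults, but they are an assumption worth checking against your installed version. The name `hyperparams` is introduced here for illustration.

```python
# Values believed to match the TrainingArguments defaults; verify against your version.
hyperparams = {
    "output_dir": "test_trainer",        # where checkpoints are written
    "learning_rate": 5e-5,               # initial learning rate of the optimizer
    "per_device_train_batch_size": 8,    # samples per training step, per device
    "per_device_eval_batch_size": 8,     # samples per evaluation step, per device
    "num_train_epochs": 3,               # full passes over the training set
}

# With Transformers installed, the explicit form of the call would be:
#     training_args = TrainingArguments(**hyperparams)
for name, value in hyperparams.items():
    print(f"{name} = {value}")
```

Writing the values out makes it easier to change one of them later, for example lowering the batch size if you run out of memory.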

In [24]:
!pip install scikit-learn


#### Evaluate
Trainer does not automatically evaluate model performance during training. You’ll need to pass Trainer a function to compute and report metrics.

In [25]:
import numpy as np
import evaluate

metric = evaluate.load("accuracy")

In [26]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
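
To see what this function computes without loading a model, the following dependency-free sketch applies the same argmax-then-compare logic to made-up logits. The helper `toy_accuracy` and the sample numbers are illustrative assumptions, not output from this notebook.

```python
def toy_accuracy(logits, labels):
    """Pick the highest-scoring class per row and compare it with the true label."""
    # Same idea as np.argmax(logits, axis=-1), written with plain lists.
    predictions = [row.index(max(row)) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Three hypothetical reviews scored over the five star classes (0 to 4).
logits = [
    [2.1, 0.3, -1.0, -0.5, -2.0],   # highest score at index 0
    [-1.2, 0.1, 0.4, 1.9, 0.8],     # highest score at index 3
    [0.0, 0.2, 0.1, -0.3, 1.5],     # highest score at index 4
]
labels = [0, 3, 2]

result = toy_accuracy(logits, labels)
print(result)
```

Two of the three predictions match their labels, so the reported accuracy is two thirds.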

#### Trainer
Create a Trainer object with your model, training arguments, training and test datasets, and evaluation function.

In [28]:
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)


#### Fine-tune your model by calling train()

In [30]:
trainer.train()


TrainOutput(global_step=375, training_loss=0.977566650390625, metrics={'train_runtime': 4605.0362, 'train_samples_per_second': 0.651, 'train_steps_per_second': 0.081, 'total_flos': 789354427392000.0, 'train_loss': 0.977566650390625, 'epoch': 3.0})

#### Here's a detailed explanation of each component:

##### global_step=375:
The total number of optimization steps performed during training. With the settings used here, each step corresponds to one batch of data passed through the model followed by one update of its weights.
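
The value 375 can be reproduced by hand. The sketch below assumes a per-device batch size of 8, a single device, no gradient accumulation, and 3 epochs. None of these are set explicitly in this notebook, so they are taken to be the library defaults.

```python
import math

num_train_samples = 1000   # size of small_train_dataset
batch_size = 8             # assumed default per-device batch size, single device
num_epochs = 3             # assumed default number of epochs

steps_per_epoch = math.ceil(num_train_samples / batch_size)
global_step = steps_per_epoch * num_epochs
print(steps_per_epoch, global_step)   # 125 375
```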

##### training_loss=0.977566650390625:
The average training loss over all batches and epochs. Loss is a measure of how well the model is performing on the training data; a lower value indicates better performance. Here, the training loss is approximately 0.978.
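
For context, it helps to compare this number with the loss of a model that guesses uniformly. The sketch assumes the loss is the usual cross-entropy over the five classes, which is what sequence classification heads typically use for integer labels. The `cross_entropy` helper is written here for illustration.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one example: -log(softmax(logits)[label])."""
    top = max(logits)
    shifted = [x - top for x in logits]   # subtract the max for numerical stability
    log_sum_exp = math.log(sum(math.exp(x) for x in shifted))
    return log_sum_exp - shifted[label]


# Equal logits mean a uniform guess over 5 classes, so the loss is ln(5).
uniform_loss = cross_entropy([0.0] * 5, label=2)
print(round(uniform_loss, 3))   # 1.609
print(0.978 < uniform_loss)     # True
```

A training loss near 0.98 is therefore well below the roughly 1.61 that uniform guessing would produce.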

##### metrics:train_runtime=4605.0362: 
The total time taken to complete the training, in seconds (approximately 4605 seconds, or about 1 hour and 17 minutes).

##### train_samples_per_second=0.651: 
The number of training samples processed per second. This value is relatively low, indicating the process might be computationally intensive or the hardware may not be optimal.

##### train_steps_per_second=0.081: 
The number of training steps (batches) processed per second.
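
Both throughput figures follow from the runtime and the amount of work done. The sketch below recomputes them from the values in the output above, assuming the 1,000-sample training subset was processed for 3 epochs.

```python
train_runtime = 4605.0362   # seconds, from the TrainOutput above
num_samples = 1000          # size of the training subset
num_epochs = 3
global_step = 375

samples_per_second = num_samples * num_epochs / train_runtime
steps_per_second = global_step / train_runtime
print(round(samples_per_second, 3), round(steps_per_second, 3))   # 0.651 0.081
```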

##### total_flos=789354427392000.0: 
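The total number of floating-point operations performed during training, as estimated by the Trainer. Despite the similar-sounding name, this is a total count rather than a per-second rate. It indicates the overall computational workload, here roughly 7.9 × 10^14 operations.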

##### train_loss=0.977566650390625: 
The same as the training loss mentioned earlier.

##### epoch=3.0: 
Indicates that the training process ran for 3 epochs (full passes over the training dataset).

#### Evaluate the model
After training, you can evaluate the model to see its performance on the evaluation dataset.

In [81]:
eval_results = trainer.evaluate()
print(eval_results)

{'eval_loss': 1.0017979145050049, 'eval_accuracy': 0.606, 'eval_runtime': 306.0542, 'eval_samples_per_second': 3.267, 'eval_steps_per_second': 0.408, 'epoch': 3.0}


#### Here's a detailed explanation of each component:

##### eval_loss=1.0017979145050049:
The loss computed on the evaluation (validation) dataset. Similar to training loss, it indicates how well the model is performing on unseen data. Here, the evaluation loss is approximately 1.002.

##### eval_accuracy=0.606:
The accuracy of the model on the evaluation dataset. It represents the proportion of correctly classified instances. An accuracy of 0.606 means the model correctly classified 60.6% of the evaluation samples.

##### eval_runtime=306.0542:
The total time taken to complete the evaluation, in seconds (approximately 306 seconds, or about 5 minutes and 6 seconds).

##### eval_samples_per_second=3.267:
The number of evaluation samples processed per second. This value is higher than the training samples per second, which is common since evaluation usually involves forward passes only, without backpropagation.

##### eval_steps_per_second=0.408:
The number of evaluation steps (batches) processed per second. This value is also higher than the training steps per second, for similar reasons.
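
The evaluation figures can be cross-checked the same way. The sketch assumes the 1,000-sample evaluation subset and an evaluation batch size of 8, which is taken to be the default because the notebook does not set it.

```python
import math

eval_runtime = 306.0542   # seconds, from the evaluation output above
num_eval_samples = 1000   # size of small_eval_dataset
eval_batch_size = 8       # assumed default per-device evaluation batch size
eval_accuracy = 0.606

eval_steps = math.ceil(num_eval_samples / eval_batch_size)
samples_per_second = num_eval_samples / eval_runtime
steps_per_second = eval_steps / eval_runtime
num_correct = round(eval_accuracy * num_eval_samples)

print(round(samples_per_second, 3))   # 3.267
print(round(steps_per_second, 3))     # 0.408
print(num_correct)                    # 606
```

In other words, the model assigned the correct star rating to 606 of the 1,000 evaluation reviews.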

You have successfully fine-tuned a pretrained BERT model on a 1,000-example subset of the Yelp review dataset. The model adapted its general language understanding to the specific task of predicting the star rating of a review, reaching about 60.6% accuracy across five classes.