
# Exercise 3: Fine-Tuning a Pretrained Transformer on a Text Classification Task

In this lab, you will apply the concepts you have learned by fine-tuning a pre-trained Transformer model (`bert-base-uncased`) on a text classification task (sentiment analysis of IMDb movie reviews) using the Hugging Face `transformers` library.

## Objectives:
- Learn to load a pre-trained model from Hugging Face.
- Fine-tune the model on a text classification dataset.
- Evaluate and save the fine-tuned model.

### Instructions:
Follow the steps below to complete the lab.


## Installing Necessary Libraries

In this lab, we will use the following Python libraries:

- **Transformers**: Hugging Face's `transformers` library provides pre-trained models for various Natural Language Processing (NLP) tasks. We will use this to load and fine-tune a pre-trained transformer model.
  
- **Datasets**: Hugging Face's `datasets` library allows easy access to a wide range of datasets and provides tools for efficient data processing.
  
- **Scikit-learn**: A widely-used library for machine learning, which includes tools for metrics, evaluation, and preprocessing. In this lab, we'll use it to calculate evaluation metrics such as accuracy, precision, recall, and F1 score.
  
- **Accelerate**: This library allows seamless usage of different hardware (CPU, GPU) for training, helping to speed up the training process by utilizing all available resources efficiently.

You can install these libraries using the following command:

```bash
pip install transformers datasets scikit-learn accelerate
```


In [1]:

# Install necessary libraries
!pip install transformers datasets scikit-learn accelerate




## Step 1: Loading and Preparing the Dataset

In [2]:

# Importing necessary libraries
from datasets import load_dataset

# Load IMDb dataset
dataset = load_dataset('imdb')
print(dataset)




DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})


## Step 2: Tokenizing the Dataset

In [3]:

# Importing tokenizer from Hugging Face
from transformers import AutoTokenizer

# Load pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Define tokenization function
def tokenize_function(examples):
    return tokenizer(examples['text'], padding='max_length', truncation=True, max_length=128)

# Apply tokenization
tokenized_datasets = dataset.map(tokenize_function, batched=True)
print(tokenized_datasets)



DatasetDict({
    train: Dataset({
        features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 50000
    })
})
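The `padding='max_length'`, `truncation=True`, and `max_length=128` arguments make every example the same length. The sketch below is a toy re-implementation of that idea in plain Python, not the tokenizer's actual code: the helper `pad_or_truncate` and the word-piece ids passed to it are hypothetical, while the special-token ids are the ones `bert-base-uncased` uses for `[CLS]`, `[SEP]`, and `[PAD]`. A small `max_length` of 8 is used so the result is easy to read.

```python
# Toy illustration of padding='max_length' combined with truncation=True.
# Special-token ids as used by bert-base-uncased.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def pad_or_truncate(token_ids, max_length):
    """Return (input_ids, attention_mask), each exactly max_length long."""
    # Reserve two positions for [CLS] and [SEP], then cut whatever does not fit.
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    # Fill the remaining positions with [PAD]; the mask marks them with 0.
    n_pad = max_length - len(input_ids)
    return input_ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad


# Hypothetical word-piece ids for a short review and a long review.
short_ids, short_mask = pad_or_truncate([2023, 3185], max_length=8)
long_ids, long_mask = pad_or_truncate(list(range(1000, 1020)), max_length=8)

print(short_ids, short_mask)  # padded: trailing zeros in both lists
print(long_ids, long_mask)    # truncated: mask is all ones
```

The `attention_mask` column in the tokenized dataset plays the same role as the mask here: it tells the model which positions hold real tokens and which are padding.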





## Step 3: Setting up the Trainer

In [4]:

# Import pre-trained model and Trainer utilities
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# Load pre-trained BERT model for sequence classification
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Define training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=10,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy='epoch',  # evaluate after every epoch; newer transformers releases name this argument eval_strategy
    logging_dir='./logs',
)

# Create trainer instance
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'].shuffle().select(range(1000)),
    eval_dataset=tokenized_datasets['test'].shuffle().select(range(1000)),
)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

## Step 4: Training the Model

In [5]:

# Start training
trainer.train()


{'eval_loss': 0.3791567385196686, 'eval_runtime': 1.0562, 'eval_samples_per_second': 946.791, 'eval_steps_per_second': 118.349, 'epoch': 1.0}
{'eval_loss': 0.532483696937561, 'eval_runtime': 1.037, 'eval_samples_per_second': 964.355, 'eval_steps_per_second': 120.544, 'epoch': 2.0}
{'eval_loss': 0.7745172381401062, 'eval_runtime': 1.0525, 'eval_samples_per_second': 950.128, 'eval_steps_per_second': 118.766, 'epoch': 3.0}
{'loss': 0.266, 'grad_norm': 0.022533729672431946, 'learning_rate': 3e-05, 'epoch': 4.0}
{'eval_loss': 0.9026833772659302, 'eval_runtime': 1.0554, 'eval_samples_per_second': 947.469, 'eval_steps_per_second': 118.434, 'epoch': 4.0}
{'eval_loss': 0.9903187155723572, 'eval_runtime': 1.0541, 'eval_samples_per_second': 948.69, 'eval_steps_per_second': 118.586, 'epoch': 5.0}
{'eval_loss': 1.055458426475525, 'eval_runtime': 1.0417, 'eval_samples_per_second': 959.996, 'eval_steps_per_second': 119.999, 'epoch': 6.0}
{'eval_loss': 1.0952606201171875, 'eval_runtime': 1.0462, 'eval_samples_per_second': 955.809, 'eval_steps_per_second': 119.476, 'epoch': 7.0}
{'loss': 0.0148, 'grad_norm': 0.007087570149451494, 'learning_rate': 1e-05, 'epoch': 8.0}
{'eval_loss': 1.1020959615707397, 'eval_runtime': 1.0447, 'eval_samples_per_second': 957.207, 'eval_steps_per_second': 119.651, 'epoch': 8.0}
{'eval_loss': 1.1080827713012695, 'eval_runtime': 1.0256, 'eval_samples_per_second': 975.043, 'eval_steps_per_second': 121.88, 'epoch': 9.0}
{'eval_loss': 1.1191223859786987, 'eval_runtime': 1.0287, 'eval_samples_per_second': 972.099, 'eval_steps_per_second': 121.512, 'epoch': 10.0}
{'train_runtime': 58.3887, 'train_samples_per_second': 171.266, 'train_steps_per_second': 21.408, 'train_loss': 0.11383090591430664, 'epoch': 10.0}





TrainOutput(global_step=1250, training_loss=0.11383090591430664, metrics={'train_runtime': 58.3887, 'train_samples_per_second': 171.266, 'train_steps_per_second': 21.408, 'total_flos': 657777638400000.0, 'train_loss': 0.11383090591430664, 'epoch': 10.0})
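The `global_step=1250` and the logged `learning_rate` values can be reproduced with a little arithmetic. The sketch below assumes a single device, no gradient accumulation, and the library defaults that this lab does not override: an initial learning rate of 5e-5 that decays linearly to zero with no warmup. The helper `linear_lr` is a hypothetical name used only for this illustration.

```python
import math

# Values taken from the Trainer setup in Step 3.
num_examples = 1000
batch_size = 8
num_epochs = 10

# Assumed library defaults (not set explicitly in this lab):
# initial learning rate 5e-5, linear decay to zero, no warmup.
initial_lr = 5e-5

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs


def linear_lr(step):
    """Learning rate after `step` optimizer updates under linear decay."""
    return initial_lr * (1 - step / total_steps)


print(steps_per_epoch, total_steps)
# Steps 500 and 1000 fall at the end of epochs 4 and 8.
print(linear_lr(500), linear_lr(1000))
```

Under these assumptions the two learning rates come out at about 3e-05 and 1e-05, which agrees with the values logged at epochs 4.0 and 8.0 above.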

## Step 5: Evaluating the Model

### Evaluation Metrics

We will evaluate the performance of the fine-tuned model using the following metrics:

- **Accuracy**: This measures the percentage of correctly predicted labels out of all predictions. It is useful for understanding the overall correctness of the model. 
  - **Formula**: 
    $$
    \text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}
    $$

- **Precision**: Precision measures how many of the positive predictions made by the model are actually correct. It is especially important when the cost of false positives is high (i.e., incorrectly labeling negative examples as positive).
  - **Formula**: 
    $$
    \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}
    $$

- **Recall**: Recall (also known as sensitivity or true positive rate) measures how many actual positive cases the model is able to correctly identify. It is important when the cost of missing positive examples (false negatives) is high.
  - **Formula**: 
    $$
    \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}
    $$

- **F1 Score**: The F1 score is the harmonic mean of precision and recall, providing a balance between the two. It is particularly useful when you need to balance both false positives and false negatives.
  - **Formula**: 
    $$
    \text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
    $$
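The four formulas above can be checked by hand from a confusion matrix. The following is a minimal sketch in plain Python; the function name `classification_metrics` is hypothetical, and the counts are not printed anywhere in this lab. They were back-computed for a 1,000-example evaluation set so that the formulas reproduce the accuracy, precision, recall, and F1 values reported by the evaluation cell in this step.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {'accuracy': accuracy, 'precision': precision, 'recall': recall, 'f1': f1}


# Back-computed counts (an assumption, see the text above):
# 418 true positives, 90 false positives, 74 false negatives, 418 true negatives.
metrics = classification_metrics(tp=418, fp=90, fn=74, tn=418)
print(metrics)
```

Note that accuracy and F1 coincide here only because the number of true positives happens to equal the number of true negatives; in general the two metrics differ.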


In [6]:

# Importing evaluation metrics
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Define function to compute metrics
def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
    acc = accuracy_score(labels, preds)
    return {'accuracy': acc, 'f1': f1, 'precision': precision, 'recall': recall}

# Update trainer to include custom metrics
trainer.compute_metrics = compute_metrics

# Evaluate the model
eval_result = trainer.evaluate()
print(eval_result)



{'eval_loss': 1.1191223859786987, 'eval_accuracy': 0.836, 'eval_f1': 0.836, 'eval_precision': 0.8228346456692913, 'eval_recall': 0.8495934959349594, 'eval_runtime': 1.0377, 'eval_samples_per_second': 963.644, 'eval_steps_per_second': 120.455, 'epoch': 10.0}
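The `eval_loss` in this result is the epoch-10 value from the training log in Step 4. In that log the evaluation loss rises every epoch while the training loss falls (0.266 at epoch 4.0, 0.0148 at epoch 8.0), a pattern that is usually read as overfitting on the 1,000-example training subset. The short sketch below, which only re-uses the logged numbers, finds the epoch with the lowest evaluation loss; the variable names are hypothetical.

```python
# Per-epoch eval_loss values copied from the training log in Step 4.
eval_loss_by_epoch = {
    1: 0.3791567385196686,
    2: 0.532483696937561,
    3: 0.7745172381401062,
    4: 0.9026833772659302,
    5: 0.9903187155723572,
    6: 1.055458426475525,
    7: 1.0952606201171875,
    8: 1.1020959615707397,
    9: 1.1080827713012695,
    10: 1.1191223859786987,
}

# Epoch with the lowest evaluation loss versus the last epoch trained.
best_epoch = min(eval_loss_by_epoch, key=eval_loss_by_epoch.get)
final_epoch = max(eval_loss_by_epoch)

print(f"best epoch: {best_epoch}, eval_loss={eval_loss_by_epoch[best_epoch]:.4f}")
print(f"final epoch: {final_epoch}, eval_loss={eval_loss_by_epoch[final_epoch]:.4f}")
```

Judged by evaluation loss alone, the model after the first epoch looks better than the one evaluated here. Training for fewer epochs is one way to act on that; whether accuracy would also improve is not something these logs can answer, because accuracy was only computed after the final epoch.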





## Step 6: Saving the Fine-Tuned Model

In [7]:

# Save the fine-tuned model and tokenizer
trainer.save_model('my-fine-tuned-bert')
tokenizer.save_pretrained('my-fine-tuned-bert')


('my-fine-tuned-bert/tokenizer_config.json',
 'my-fine-tuned-bert/special_tokens_map.json',
 'my-fine-tuned-bert/vocab.txt',
 'my-fine-tuned-bert/added_tokens.json',
 'my-fine-tuned-bert/tokenizer.json')

## Step 7: Loading and Testing the Saved Model

In [8]:

# Load saved model and tokenizer
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

# Load the fine-tuned model and tokenizer
new_model = AutoModelForSequenceClassification.from_pretrained('my-fine-tuned-bert')
new_tokenizer = AutoTokenizer.from_pretrained('my-fine-tuned-bert')

# Create a classification pipeline
classifier = TextClassificationPipeline(model=new_model, tokenizer=new_tokenizer)

# Add label mapping for sentiment analysis (assuming LABEL_0 = 'negative' and LABEL_1 = 'positive')
label_mapping = {0: 'negative', 1: 'positive'}

# Test the model
result = classifier("This movie was excellent.")

# Map the result to more meaningful labels
mapped_result = {'label': label_mapping[int(result[0]['label'].split('_')[1])], 'score': result[0]['score']}
print(mapped_result)


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


{'label': 'positive', 'score': 0.9996389150619507}
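The `score` in this output is a probability, not a raw model output. For a two-label, single-label model like this one, the pipeline is understood to apply a softmax to the model's logits and report the largest probability together with its label. The sketch below reproduces that post-processing in plain Python under that assumption; the logits are made-up numbers, not values taken from the fine-tuned model.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


label_mapping = {0: 'negative', 1: 'positive'}

# Hypothetical logits for one review: index 0 = negative, index 1 = positive.
logits = [-3.9, 4.0]
probs = softmax(logits)

# Pick the class with the highest probability and report it with its score.
pred_id = max(range(len(probs)), key=probs.__getitem__)
result = {'label': label_mapping[pred_id], 'score': probs[pred_id]}
print(result)
```

A score close to 1, as in the cell output above, means the two logits are far apart, not that the prediction is guaranteed to be correct.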


## Exercise: Add a Gradio UI

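This exercise wraps the saved classifier in a small Gradio web interface: a text box for the review, and two outputs for the prediction score and the predicted label.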

In [1]:
import gradio as gr
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

# Load the fine-tuned model and tokenizer saved in Step 6
new_model = AutoModelForSequenceClassification.from_pretrained('my-fine-tuned-bert')
new_tokenizer = AutoTokenizer.from_pretrained('my-fine-tuned-bert')

# Create a classification pipeline
classifier = TextClassificationPipeline(model=new_model, tokenizer=new_tokenizer)

# Map the pipeline's LABEL_0 / LABEL_1 names to sentiment names
label_mapping = {0: 'negative', 1: 'positive'}

def evaluate_rating(prompt):
    # The pipeline returns a list with one dict per input text
    result = classifier(prompt)[0]
    label = label_mapping[int(result['label'].split('_')[1])]
    return result['score'], label

interface = gr.Interface(
    fn=evaluate_rating,
    inputs=[
        gr.Textbox(lines=2, placeholder="Enter a movie review here",
                   label="Review"),
    ],
    outputs=[
        gr.Number(label="Score"),
        gr.Textbox(label="Positive or Negative?"),
    ],
    title="Pretrained transformer with imdb dataset",
    description="Classifies a movie review as positive or negative using the fine-tuned BERT model."
)

interface.launch()