
# Text Classification with Representation Model

Here, we'll focus on binary sentiment classification of Rotten Tomatoes movie reviews.

We can accomplish this task in two ways:

1. Perform classification directly with a task-specific model
2. Perform classification indirectly with general-purpose embeddings

We'll use pre-trained models for now.

In [1]:
# Importing the dataset
!pip install datasets

from datasets import load_dataset

# Load the Rotten Tomatoes movie review dataset
data = load_dataset('rotten_tomatoes')
data





DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 8530
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
})

## Using a Task-specific model

We'll use the `Twitter-RoBERTa-base for Sentiment Analysis` model. This is a RoBERTa model fine-tuned on tweets for sentiment analysis.

In [2]:
!pip install transformers



In [3]:
# Loading the model
from transformers import pipeline

# Path to our model
model_path = "cardiffnlp/twitter-roberta-base-sentiment-latest"

# Load model into pipeline
pipe = pipeline(
    model=model_path,
    tokenizer=model_path,
    top_k=None,
)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


In [4]:
import numpy as np
from tqdm import tqdm
from transformers.pipelines.pt_utils import KeyDataset

In [5]:
sample = data['train']['text'][0]

In [6]:
output = pipe(sample)
output

[[{'label': 'positive', 'score': 0.9073736071586609},
  {'label': 'neutral', 'score': 0.0880218893289566},
  {'label': 'negative', 'score': 0.00460449093952775}]]

In [7]:
# Get the best sentiment prediction
best_prediction = max(output[0], key=lambda x: x['score'])

# Print only the most confident sentiment
print(f"Predicted Sentiment: {best_prediction['label']} (Confidence: {best_prediction['score']:.2f})")

Predicted Sentiment: positive (Confidence: 0.91)


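The model classifies text into `positive`, `negative` and `neutral` categories. Since the Rotten Tomatoes labels are binary, we ignore the `neutral` score and compare only the `negative` and `positive` scores.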

In [30]:
# A list to store predictions
y_pred = []

# Iterate through test dataset
for output in tqdm(pipe(data['test']['text']), total=len(data['test'])):
    # Ensure correct label matching
    if output[0]['label']=='positive':
        if output[1]['label']=='negative':
            negative_score = output[1]['score']
            positive_score = output[0]['score']
        else:
            negative_score = output[2]['score']
            positive_score = output[0]['score']
    elif output[0]['label']=='negative':
        if output[1]['label']=='positive':
            negative_score = output[0]['score']
            positive_score = output[1]['score']
        else:
            negative_score = output[0]['score']
            positive_score = output[2]['score']
    else:
        if output[1]['label']=='negative':
            negative_score = output[1]['score']
            positive_score = output[2]['score']
        else:
            negative_score = output[2]['score']
            positive_score = output[1]['score']
    # Get predicted class (0=Negative, 1=Positive)
    assignment = np.argmax([negative_score, positive_score])
    y_pred.append(assignment)

100%|██████████| 1066/1066 [00:00<00:00, 139557.03it/s]


In [32]:
# A list to store predictions
y_pred = []

# Iterate through test dataset
for output in tqdm(pipe(data['test']['text']), total=len(data['test'])):
    # Convert output into a dictionary for easy lookup
    scores = {entry['label']: entry['score'] for entry in output}

    # Extract scores safely
    negative_score = scores.get('negative', 0)  # Default to 0 if not found
    positive_score = scores.get('positive', 0)

    # Get predicted class (0 = Negative, 1 = Positive)
    assignment = np.argmax([negative_score, positive_score])
    y_pred.append(assignment)

100%|██████████| 1066/1066 [00:00<00:00, 109699.40it/s]


### Understanding the General Output Format

By default, `pipe(text)` returns only the highest-scoring label for each text, as a list containing a single dictionary:

```
[{'label': 'POSITIVE', 'score': 0.98}]
```
or

```
[{'label': 'NEGATIVE', 'score': 0.85}]
```
Because we set `top_k=None`, the pipeline instead returns one dictionary per label for each text, sorted by score in descending order. The problem is:

We can't assume that 'NEGATIVE' is always at index 0 and 'POSITIVE' at index 1. The order changes from text to text, depending on which label scores higher.

Hence, we need to check the order of the class labels in the output list before assigning the scores.

For a two-label model, we check the first prediction (`output[0]`).
If it's 'NEGATIVE', we take `output[0]['score']` as the negative score and `output[1]['score']` as the positive score. Otherwise, we swap them. For the three-label Twitter-RoBERTa model, looking up each score by its label name (as in the cell above) is simpler.

**Example Scenarios**

1. Example 1: Model Outputs NEGATIVE First

```
output = [{'label': 'NEGATIVE', 'score': 0.80}, {'label': 'POSITIVE', 'score': 0.20}]
```
`output[0]['label'] == 'NEGATIVE'`, so:

```
negative_score = 0.80  # output[0]['score']
positive_score = 0.20  # output[1]['score']
```

2. Example 2: Model Outputs POSITIVE First

```
output = [{'label': 'POSITIVE', 'score': 0.75}, {'label': 'NEGATIVE', 'score': 0.25}]
```

`output[0]['label'] == 'POSITIVE'`, so we enter the else block:

```
negative_score = 0.25  # output[1]['score']
positive_score = 0.75  # output[0]['score']
```

**The entries in each output list are ordered by score, highest first.**

This means the label found at a given index is not the same for every text.
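
One way to avoid index-based branching altogether is to look the scores up by label name. The sketch below is only an illustration: `to_binary_label` is a hypothetical helper, and the outputs are hand-written stand-ins for real pipeline results. It lowercases the label names, on the assumption that the model uses either the `positive`/`negative` or the `POSITIVE`/`NEGATIVE` naming seen in this notebook.

```python
def to_binary_label(output):
    """Map one pipeline result (a list of {'label', 'score'} dicts) to 0 or 1.

    Hypothetical helper: returns 1 if the positive score is strictly higher
    than the negative score, otherwise 0. Any other label (e.g. 'neutral')
    is ignored, and a missing label counts as a score of 0.
    """
    # Index the scores by lowercased label so the list order no longer matters
    scores = {entry['label'].lower(): entry['score'] for entry in output}
    negative_score = scores.get('negative', 0.0)
    positive_score = scores.get('positive', 0.0)
    return int(positive_score > negative_score)


# Hand-written example outputs (not real model results)
two_label = [{'label': 'POSITIVE', 'score': 0.75},
             {'label': 'NEGATIVE', 'score': 0.25}]
three_label = [{'label': 'neutral', 'score': 0.60},
               {'label': 'negative', 'score': 0.30},
               {'label': 'positive', 'score': 0.10}]

print(to_binary_label(two_label))
print(to_binary_label(three_label))
```

With a helper like this, the same loop body could serve both models, since the position of each label in the list is never used.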

In [9]:
# Evaluation
from sklearn.metrics import classification_report

def evaluate_performance(y_true, y_pred):
    """Create and print classification report"""
    report = classification_report(
        y_true, y_pred,
        target_names=['Negative Review', 'Positive Review']
    )
    print(report)

In [33]:
evaluate_performance(data['test']['label'], y_pred)

                 precision    recall  f1-score   support

Negative Review       0.76      0.88      0.81       533
Positive Review       0.86      0.72      0.78       533

       accuracy                           0.80      1066
      macro avg       0.81      0.80      0.80      1066
   weighted avg       0.81      0.80      0.80      1066
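
To make the columns of the report concrete, here is a minimal pure-Python sketch of how precision, recall and F1 are computed for a single class. The function name `per_class_metrics` and the toy labels are illustrative assumptions, not part of the notebook; in practice `classification_report` does this for us.

```python
def per_class_metrics(y_true, y_pred, cls):
    """Return (precision, recall, f1) for class `cls` (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)  # correctly predicted as cls
    fp = sum(1 for t, p in pairs if t != cls and p == cls)  # wrongly predicted as cls
    fn = sum(1 for t, p in pairs if t == cls and p != cls)  # cls examples that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy data (0 = negative review, 1 = positive review)
y_true_toy = [0, 0, 0, 1, 1, 1]
y_pred_toy = [0, 0, 1, 1, 0, 0]

precision, recall, f1 = per_class_metrics(y_true_toy, y_pred_toy, cls=0)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision answers "of the reviews predicted as this class, how many were right?", while recall answers "of the reviews that truly belong to this class, how many were found?".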



### Using another pre-trained model

Let's use the `distilbert/distilbert-base-uncased-finetuned-sst-2-english` model this time.

In [11]:
# Loading the model
model_path = 'distilbert/distilbert-base-uncased-finetuned-sst-2-english'

# Creating a pipeline
pipe_distilbert = pipeline(
    'sentiment-analysis',
    model=model_path,
    tokenizer=model_path,
    top_k=None
)

Device set to use cuda:0


In [13]:
another_sample = data['validation']['text'][-1]
sample_label = data['validation']['label'][-1]
print('Text\n',another_sample)
print('Label: ', sample_label)

Text
 the feature-length stretch . . . strains the show's concept .
Label:  0


In [14]:
model_output = pipe_distilbert(another_sample)
model_output

[[{'label': 'NEGATIVE', 'score': 0.9998082518577576},
  {'label': 'POSITIVE', 'score': 0.00019182452524546534}]]

In [16]:
# Get the best sentiment prediction
best_prediction = max(model_output[0], key=lambda x: x['score'])

# Print only the most confident sentiment
print(f"Predicted Sentiment: {best_prediction['label']} (Confidence: {best_prediction['score']:.2f})")

Predicted Sentiment: NEGATIVE (Confidence: 1.00)


In [25]:
# Make predictions on the test data

# A list to store predictions
y_pred = []

# Iterate through test dataset
for output in tqdm(pipe_distilbert(data['test']['text']), total=len(data['test'])):
    # Ensure correct label matching
    if output[0]['label']=='NEGATIVE':
        negative_score = output[0]['score']
        positive_score = output[1]['score']
    else:
        negative_score = output[1]['score']
        positive_score = output[0]['score']
    # Get predicted class (0=Negative, 1=Positive)
    assignment = np.argmax([negative_score, positive_score])
    y_pred.append(assignment)

100%|██████████| 1066/1066 [00:00<00:00, 117794.56it/s]


In [26]:
evaluate_performance(data['test']['label'], y_pred)

                 precision    recall  f1-score   support

Negative Review       0.89      0.90      0.90       533
Positive Review       0.90      0.89      0.90       533

       accuracy                           0.90      1066
      macro avg       0.90      0.90      0.90      1066
   weighted avg       0.90      0.90      0.90      1066



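We can see that both models perform well, considering they weren't fine-tuned on this dataset. DistilBERT performs better (0.90 vs. 0.80 accuracy), most likely because it was fine-tuned on SST-2, which also consists of movie-review sentences, whereas the Twitter-RoBERTa model was fine-tuned on tweets.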



---

## Text Classification Using Embedding Models
