In [1]:
from datasets import load_dataset

from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset

from sentence_transformers import SentenceTransformer

from sklearn.metrics import classification_report
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics.pairwise import cosine_similarity

import numpy as np

In this notebook, we will use the Rotten Tomatoes dataset to classify movie reviews as positive or negative.

In [2]:
dataset = load_dataset('rotten_tomatoes')

# Representation models

Here, we will use a representation model that is based on RoBERTa (a BERT variant), was pre-trained on Twitter data, and was fine-tuned for sentiment analysis. We will use it as is, without any further fine-tuning on our dataset.

In [3]:
model_path = "cardiffnlp/twitter-roberta-base-sentiment-latest"

model = pipeline(
    task="sentiment-analysis",
    model=model_path,
    tokenizer=model_path,
    device="cuda:0"
)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


Generating the predictions.

In [4]:
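# The model predicts three classes but the dataset has only two,
# so "neutral" predictions are counted as positive.
label_map = {
    "negative": 0,
    "neutral": 1,
    "positive": 1
}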

y_pred = []

for output in model(KeyDataset(dataset["test"], "text")):
    label_text = output["label"]
    y_pred.append(label_map[label_text])
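
Mapping "neutral" to the positive class is a modelling choice: Rotten Tomatoes has only two labels, so every neutral prediction ends up counted as positive. One possible alternative, sketched below, is to look at the scores of all three classes and compare only the negative and positive ones. The sketch assumes the pipeline is called with `top_k=None`, which should return one `{"label", "score"}` dictionary per class. The `scores` values and the helper name `binary_label` are made up for illustration.

```python
def binary_label(class_scores):
    """Return 0 (negative) or 1 (positive), ignoring the neutral score.

    `class_scores` mimics one pipeline output obtained with top_k=None:
    a list of {"label": str, "score": float} dictionaries (assumed shape).
    """
    by_label = {item["label"]: item["score"] for item in class_scores}
    return 1 if by_label["positive"] > by_label["negative"] else 0


# Made-up scores for a review that the model considers mostly neutral.
scores = [
    {"label": "neutral", "score": 0.55},
    {"label": "negative", "score": 0.30},
    {"label": "positive", "score": 0.15},
]

print(binary_label(scores))  # 0: negative outscores positive
```

With the `label_map` approach, the same review would be labelled 1, because its top class is "neutral".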

Printing the classification report.

In [5]:
performance_roberta = classification_report(
    dataset['test']['label'], y_pred,
    target_names=["Negative Review", "Positive Review"]
)

print(performance_roberta)

                 precision    recall  f1-score   support

Negative Review       0.81      0.69      0.75       533
Positive Review       0.73      0.84      0.78       533

       accuracy                           0.77      1066
      macro avg       0.77      0.77      0.77      1066
   weighted avg       0.77      0.77      0.77      1066
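
As a reminder of what the columns in the report mean, the sketch below computes precision, recall and F1 for one class directly from two label lists. The toy `y_true` and `y_hat` lists are invented for illustration and are not taken from the dataset; `precision_recall_f1` is a hypothetical helper, not a scikit-learn function.

```python
def precision_recall_f1(y_true, y_hat, positive_class):
    """Compute precision, recall and F1 for `positive_class`."""
    pairs = list(zip(y_true, y_hat))
    tp = sum(1 for t, p in pairs if t == positive_class and p == positive_class)
    fp = sum(1 for t, p in pairs if t != positive_class and p == positive_class)
    fn = sum(1 for t, p in pairs if t == positive_class and p != positive_class)

    # Guard against empty denominators.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Invented toy labels: 0 = negative review, 1 = positive review.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_hat = [0, 0, 1, 1, 1, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_hat, positive_class=1)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The support column is the number of true examples per class, which is 533 for each class in this test set.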



# Embedding models

The next approach uses an embedding model to generate a vector representation of each review. Then, a lightweight classifier is trained on top of those representations.

In [6]:
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

train_embeddings = model.encode(dataset["train"]["text"])
test_embeddings = model.encode(dataset["test"]["text"])

Training a k-nearest neighbors (KNN) classifier with default settings and printing its performance.

In [7]:
clf = KNeighborsClassifier()
clf.fit(train_embeddings, dataset["train"]["label"])

performance_mpnet = classification_report(
    dataset['test']['label'], clf.predict(test_embeddings),
    target_names=["Negative Review", "Positive Review"]
)

print(performance_mpnet)

                 precision    recall  f1-score   support

Negative Review       0.85      0.75      0.79       533
Positive Review       0.77      0.86      0.82       533

       accuracy                           0.81      1066
      macro avg       0.81      0.81      0.81      1066
   weighted avg       0.81      0.81      0.81      1066
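
To make the classifier less of a black box, here is a minimal NumPy sketch of what a k-nearest-neighbors prediction does: find the k closest training vectors and take a majority vote. It uses Euclidean distance and k = 5 to mirror the `KNeighborsClassifier` defaults. The 2-D points and the function name `knn_predict` are invented for illustration, and this is not scikit-learn's actual implementation.

```python
import numpy as np


def knn_predict(train_x, train_y, query, k=5):
    """Predict the label of `query` by majority vote of its k nearest points."""
    distances = np.linalg.norm(train_x - query, axis=1)  # Euclidean distance
    nearest = np.argsort(distances)[:k]                  # indices of the k closest
    votes = np.bincount(train_y[nearest])                # votes per label
    return int(np.argmax(votes))


# Invented 2-D "embeddings": class 0 near the origin, class 1 near (1, 1).
train_x = np.array([
    [0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2],
    [1.0, 0.9], [0.9, 1.0], [0.8, 0.9], [0.9, 0.8],
])
train_y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(knn_predict(train_x, train_y, np.array([0.15, 0.15])))  # 0
print(knn_predict(train_x, train_y, np.array([0.85, 0.85])))  # 1
```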



## Zero-shot classification

In some situations, no labels are available for the data points, but one may still want to assign the documents to a predefined list of classes. For that, one can embed both the documents and a short text describing each class, and then assign each document to the class whose description has the highest cosine similarity with it.
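
The mechanics can be illustrated on toy vectors before running them on real embeddings. The sketch below computes cosine similarity by hand with NumPy and assigns each document to the most similar label. The 3-D vectors are invented stand-ins for sentence embeddings, and `cosine_sim` is a hypothetical helper, not the scikit-learn function.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine similarity between every row of `a` and every row of `b`."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Invented stand-ins for embeddings (real ones have hundreds of dimensions).
doc_vectors = np.array([
    [0.9, 0.1, 0.0],   # document 0
    [0.1, 0.8, 0.2],   # document 1
])
label_vectors = np.array([
    [1.0, 0.0, 0.0],   # row 0: "negative" label description
    [0.0, 1.0, 0.0],   # row 1: "positive" label description
])

similarities = cosine_sim(doc_vectors, label_vectors)  # shape (2, 2)
predicted = np.argmax(similarities, axis=1)            # best label per document
print(predicted)  # [0 1]
```

Because the row order of the label vectors defines the class indices, the label descriptions must be listed in the same order as the dataset's label ids.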

In [8]:
label_embeddings1 = model.encode(["A negative review", "A positive review"])
label_embeddings2 = model.encode(["A very negative movie review", "A very positive movie review"])

sim_matrix1 = cosine_similarity(test_embeddings, label_embeddings1)
y_pred1 = np.argmax(sim_matrix1, axis=1)

sim_matrix2 = cosine_similarity(test_embeddings, label_embeddings2)
y_pred2 = np.argmax(sim_matrix2, axis=1)

performance_zero_shot1 = classification_report(
    dataset['test']['label'], y_pred1,
    target_names=["Negative Review", "Positive Review"]
)

performance_zero_shot2 = classification_report(
    dataset['test']['label'], y_pred2,
    target_names=["Negative Review", "Positive Review"]
)

print(performance_zero_shot1)
print()
print(performance_zero_shot2)

                 precision    recall  f1-score   support

Negative Review       0.78      0.77      0.78       533
Positive Review       0.77      0.79      0.78       533

       accuracy                           0.78      1066
      macro avg       0.78      0.78      0.78      1066
   weighted avg       0.78      0.78      0.78      1066


                 precision    recall  f1-score   support

Negative Review       0.86      0.73      0.79       533
Positive Review       0.76      0.88      0.82       533

       accuracy                           0.80      1066
      macro avg       0.81      0.80      0.80      1066
   weighted avg       0.81      0.80      0.80      1066



# Generative models

Finally, we will use a generative model to classify the reviews. For that, we need a prompt that makes the model generate text that can then be mapped to a label.

In [9]:
model = pipeline(
    task="text2text-generation",
    model="google/flan-t5-small",
    device="cuda:0"
)

Device set to use cuda:0


Creating the prompt and generating the predictions.

In [10]:
# The trailing space keeps the question and the review text from running together.
prompt = "Is the following sentence positive or negative? "
dataset = dataset.map(lambda example: {"t5": prompt + example['text']})

y_pred = []
for output in model(KeyDataset(dataset["test"], "t5")):
    text = output[0]["generated_text"]
    y_pred.append(0 if text == "negative" else 1)
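
The comparison `text == "negative"` treats any other output, including unexpected ones, as positive. A slightly more defensive mapping, sketched below, normalizes the generated text and flags outputs it does not recognize. The helper name `parse_sentiment` and the fallback value are assumptions for illustration, and the example strings are invented rather than actual model outputs.

```python
def parse_sentiment(generated_text, fallback=1):
    """Map generated text to 0 (negative) or 1 (positive).

    Returns (label, recognized); `recognized` is False when the text
    is neither "negative" nor "positive" and `fallback` was used.
    """
    cleaned = generated_text.strip().lower().rstrip(".!")
    if cleaned == "negative":
        return 0, True
    if cleaned == "positive":
        return 1, True
    return fallback, False


# Invented example strings, not actual model outputs.
for text in ["negative", " Positive.", "it depends"]:
    print(repr(text), parse_sentiment(text))
```

Counting how often `recognized` is False gives a quick check of whether the prompt reliably produces one of the two expected words.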

Printing the performance.

In [11]:
performance_flan = classification_report(
    dataset['test']['label'], y_pred,
    target_names=["Negative Review", "Positive Review"]
)

print(performance_flan)

                 precision    recall  f1-score   support

Negative Review       0.83      0.84      0.83       533
Positive Review       0.84      0.83      0.83       533

       accuracy                           0.83      1066
      macro avg       0.83      0.83      0.83      1066
   weighted avg       0.83      0.83      0.83      1066

