## Classification benchmarks: `ZSL`, `FSL`, `Finetuned BERT`, `Finetuned GPT`, `Prompted GPT`  

GPT models can also be used for classification tasks. Here I put together a classification benchmark that compares a zero-shot (ZSL) classifier, a few-shot (FSL) classifier, a fine-tuned BERT, and fine-tuned and prompted GPT models to see how they perform. I'll rely on transfer learning only, meaning I won't train any models myself (except for the GPT supervised fine-tuning, SFT).

OpenAI docs: 
- https://platform.openai.com/docs/guides/fine-tuning
- https://platform.openai.com/docs/guides/fine-tuning/advanced-usage
- https://github.com/openai/openai-cookbook/blob/main/examples/Fine-tuned_classification.ipynb

In [7]:
import credentials
import json
import os
os.environ["OPENAI_API_KEY"] = credentials.openai_api

import openai

from transformers import pipeline, AutoTokenizer
from datasets import load_dataset

from sklearn.metrics import classification_report, confusion_matrix

import pandas as pd
import warnings
warnings.filterwarnings('ignore')

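Load the English validation split of the `amazon_reviews_multi` dataset from Hugging Face and derive a binary sentiment label from the star rating.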

In [2]:
data = load_dataset('amazon_reviews_multi', 'en', split = 'validation')

Found cached dataset amazon_reviews_multi (C:/Users/Rabay_Kristof/.cache/huggingface/datasets/amazon_reviews_multi/en/1.0.0/724e94f4b0c6c405ce7e476a6c5ef4f87db30799ad49f765094cf9770e0f7609)


In [3]:
data = pd.DataFrame(data)

# Combine the title and body into a single text field
data['review'] = data.apply(lambda x: x['review_title'] + '. ' + x['review_body'], axis = 1)

# Drop neutral 3-star reviews, then label 4-5 stars as positive and 1-2 stars as negative
data = data[data['stars'] != 3]
data['sentiment'] = data['stars'].apply(lambda x: 'positive' if x >= 4 else 'negative')

# Keep only the text and label columns, then sample 12.5% of the remaining rows
data.drop(labels = ['review_id', 'product_id', 'reviewer_id', 'language', 'review_title', 'review_body', 'stars', 'product_category'], axis = 1, inplace = True)
data = data.sample(frac = 0.125, random_state=43)
data.reset_index(drop = True, inplace = True)

print(data.shape)
data.head(3)

(500, 2)


|   | review | sentiment |
|---|--------|-----------|
| 0 | Needed cupcake rings, ended up with breast mil... | negative |
| 1 | One Star. This is the band I received. | negative |
| 2 | Good washer. Great product especially if you l... | positive |
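
The labelling rule used in the cell above can also be written as a small standalone function, which makes the treatment of 3-star reviews explicit. This is only a minimal sketch: `stars_to_sentiment` is a hypothetical helper name that does not appear in the notebook, and the ratings are toy values rather than rows from the dataset.

```python
from typing import Optional


def stars_to_sentiment(stars: int) -> Optional[str]:
    """Map a 1-5 star rating to a binary sentiment label.

    Returns None for 3-star reviews, which the benchmark drops as neutral.
    """
    if stars == 3:
        return None
    return 'positive' if stars >= 4 else 'negative'


# Toy ratings (made up for illustration, not taken from the dataset)
ratings = [1, 2, 3, 4, 5]
labels = [stars_to_sentiment(s) for s in ratings]
print(labels)
```

Returning `None` for neutral ratings keeps the filtering decision in one place, so the rows to drop are simply those without a label.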


### 1. Zero-Shot-Classifier

Model: `MoritzLaurer/DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary`

In [8]:
model_name = "MoritzLaurer/DeBERTa-v3-xsmall-mnli-fever-anli-ling-binary"
tokenizer = AutoTokenizer.from_pretrained(model_name)

classifier = pipeline("zero-shot-classification", model=model_name, tokenizer=tokenizer, use_fast=False)

In [11]:
candidate_labels = ['positive', 'negative']
sequence_to_classify = data['review'].tolist()

In [24]:
%%time
ZSL_output = classifier(sequence_to_classify, candidate_labels, multi_label=False)

CPU times: total: 6min 15s
Wall time: 1min 38s


In [26]:
ZSL_output[0]

{'sequence': 'Needed cupcake rings, ended up with breast milk steam bags- very unhappy. If I could give this 0 stars I would. It’s not at all what we ordered. Needed these cupcake rings for my daughters birthday party tomorrow and instead I am left with breast pump and breast milk accessory micro steam bags. Wtf.',
 'labels': ['negative', 'positive'],
 'scores': [0.9748134016990662, 0.02518662065267563]}
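
Each result holds one score per candidate label, and the predicted class is the label with the highest score. A minimal sketch of extracting that pair without modifying the original result is shown below; `top_prediction` is a hypothetical helper, and the example dictionary reuses the labels and scores printed above with the review text shortened.

```python
from typing import Any, Dict, Tuple


def top_prediction(result: Dict[str, Any]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score.

    Uses max() rather than position 0, so it does not rely on the
    labels already being sorted by score.
    """
    labels = result['labels']
    scores = result['scores']
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Labels and scores copied from the output above; the sequence is shortened
example = {
    'sequence': 'Needed cupcake rings, ended up with breast milk steam bags- very unhappy.',
    'labels': ['negative', 'positive'],
    'scores': [0.9748134016990662, 0.02518662065267563],
}

label, score = top_prediction(example)
print(label, score)
```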

In [27]:
# Keep only the top-scoring label and its score for each review
for result in ZSL_output:
    result['labels'] = result['labels'][0]
    result['scores'] = result['scores'][0]

In [34]:
ZSL_output = pd.DataFrame(ZSL_output)

ZSL_results = data.merge(ZSL_output, left_on = 'review', right_on = 'sequence').drop(labels = ['sequence'], axis = 1)
ZSL_results.head(3)

|   | review | sentiment | labels | scores |
|---|--------|-----------|--------|--------|
| 0 | Needed cupcake rings, ended up with breast mil... | negative | negative | 0.974813 |
| 1 | One Star. This is the band I received. | negative | positive | 0.885268 |
| 2 | Good washer. Great product especially if you l... | positive | positive | 0.738286 |
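
Joining on the review text works here, but identical review texts would be matched many-to-many and inflate the row count. Since the pipeline returns one result per input in the same order, an alternative is to align predictions by position. The sketch below assumes the results have already been reduced to a single label and score; `align_by_position` is a hypothetical helper and the reviews and scores are toy values.

```python
from typing import Any, Dict, List


def align_by_position(reviews: List[str],
                      sentiments: List[str],
                      outputs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Pair each review with the prediction at the same position."""
    if not (len(reviews) == len(sentiments) == len(outputs)):
        raise ValueError('inputs must have the same length')

    rows = []
    for review, sentiment, output in zip(reviews, sentiments, outputs):
        # Sanity check: each result should echo back its own input text
        if output['sequence'] != review:
            raise ValueError('prediction does not match its input review')
        rows.append({
            'review': review,
            'sentiment': sentiment,
            'labels': output['labels'],
            'scores': output['scores'],
        })
    return rows


# Toy data with a duplicated review text (made up for illustration)
reviews = ['Great.', 'Broke after a day.', 'Great.']
sentiments = ['positive', 'negative', 'positive']
outputs = [
    {'sequence': 'Great.', 'labels': 'positive', 'scores': 0.9},
    {'sequence': 'Broke after a day.', 'labels': 'negative', 'scores': 0.8},
    {'sequence': 'Great.', 'labels': 'positive', 'scores': 0.9},
]

rows = align_by_position(reviews, sentiments, outputs)
print(len(rows))
```

The resulting list of dictionaries can be passed to `pd.DataFrame` directly. With this toy input, a text-based merge would return five rows instead of three, because the two identical reviews match each other in both directions.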


In [37]:
print(classification_report(ZSL_results['sentiment'], ZSL_results['labels']))

              precision    recall  f1-score   support

    negative       0.92      0.87      0.89       253
    positive       0.87      0.92      0.89       247

    accuracy                           0.89       500
   macro avg       0.89      0.89      0.89       500
weighted avg       0.89      0.89      0.89       500



In [50]:
pd.DataFrame(confusion_matrix(ZSL_results['sentiment'], ZSL_results['labels'], labels = ['negative', 'positive']),
             columns = ['Pred - Neg', 'Pred - Pos'], 
             index=['True - Neg', 'True - Pos'])

|            | Pred - Neg | Pred - Pos |
|------------|-----------:|-----------:|
| True - Neg | 219 | 34 |
| True - Pos | 20 | 227 |
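
As a cross-check, the precision, recall and F1 values in the classification report can be recomputed from the four confusion-matrix counts. The sketch below uses the standard definitions; `per_class_metrics` is a hypothetical helper name.

```python
from typing import Tuple


def per_class_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for one class from its raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the confusion matrix (rows: true class, columns: predicted class)
neg_as_neg, neg_as_pos = 219, 34
pos_as_neg, pos_as_pos = 20, 227

# For each class, false positives come from the other row, false negatives from its own row
negative = per_class_metrics(tp=neg_as_neg, fp=pos_as_neg, fn=neg_as_pos)
positive = per_class_metrics(tp=pos_as_pos, fp=neg_as_pos, fn=pos_as_neg)

total = neg_as_neg + neg_as_pos + pos_as_neg + pos_as_pos
accuracy = (neg_as_neg + pos_as_pos) / total

print([round(m, 2) for m in negative])
print([round(m, 2) for m in positive])
print(round(accuracy, 2))
```

Rounded to two decimals, these values match the report: the classifier mislabels 34 negative reviews as positive and 20 positive reviews as negative, for 446 correct predictions out of 500.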
