This notebook covers the evaluation of both the few-shot and zero-shot text classifiers described in this project.

A **custom** evaluation dataset was built to match our custom labels.
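Each row of the evaluation sheet pairs one sentence (`Sentence`) with one of the custom labels (`True Label`). Before scoring, it can help to check how many examples each label has, since a skewed label distribution makes a single accuracy figure harder to interpret. A minimal sketch, assuming the rows have already been read into `(sentence, label)` tuples; the sample rows and the `label_counts` helper are hypothetical, not entries or code from the project:

```python
from collections import Counter

# Hypothetical (sentence, label) pairs standing in for rows of the evaluation sheet
sample_rows = [
    ("Neural networks can learn to recognize images.", "Artificial Intelligence"),
    ("Chatbots answer customer questions automatically.", "Artificial Intelligence"),
    ("The riot left several people injured.", "Violence"),
]

def label_counts(rows):
    """Count how many examples each true label has (hypothetical helper)."""
    return Counter(label for _, label in rows)

counts = label_counts(sample_rows)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```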

In [1]:
##### Loading the test set from the corresponding Excel file
import pandas as pd
import random

df = pd.read_excel('../data/textcat_evaluation.xlsx')
# Build (sentence, true label) pairs; selecting the columns explicitly guards against extra columns
evaluation_set = list(df[['Sentence', 'True Label']].itertuples(index=False, name=None))
all_labels = df['True Label'].unique().tolist()
df

|     | Sentence | True Label |
|-----|----------|------------|
| 0   | Artificial intelligence is transforming variou... | Artificial Intelligence |
| 1   | Machine learning is a subset of AI that focuse... | Artificial Intelligence |
| 2   | Will artificial intelligence eventually surpas... | Artificial Intelligence |
| 3   | The Turing Test is a measure of a machine's ab... | Artificial Intelligence |
| 4   | AI-powered virtual assistants, such as Siri an... | Artificial Intelligence |
| ... | ... | ... |
| 795 | Bullying in schools involves repeated acts of ... | Violence |
| 796 | Healthcare worker violence involves physical o... | Violence |
| 797 | Human rights violations in healthcare settings... | Violence |
| 798 | Drug-related violence occurs within the contex... | Violence |


In [2]:
# Shuffle with a fixed seed so the evaluation order is reproducible
random.seed(596)
random.shuffle(evaluation_set)
eval_length = len(evaluation_set)
all_sentences = [item[0] for item in evaluation_set]

##### Zero-Shot Classification Model Evaluation

The first model to be evaluated is the zero-shot classifier, built on the Hugging Face `transformers` zero-shot-classification pipeline. We begin by loading the model.
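A zero-shot classification pipeline of this kind reframes topic classification as natural language inference (NLI): the sentence is the premise, and each candidate label is slotted into a hypothesis template, which in `transformers` defaults to `"This example is {}."`. The model then scores every premise/hypothesis pair. With `multi_label=True`, as used in the evaluation cell further down, each label's score is a softmax over that pair's contradiction and entailment logits only, so labels are scored independently and need not sum to 1; without it, the entailment logits are softmaxed across all labels. The sketch below mimics that logic in plain Python with made-up logits; the helper names are hypothetical and this illustrates the idea rather than reproducing the library's code.

```python
import math

HYPOTHESIS_TEMPLATE = "This example is {}."  # default template of the transformers pipeline

def build_pairs(sentence, labels, template=HYPOTHESIS_TEMPLATE):
    """Pair the sentence (premise) with one hypothesis per candidate label."""
    return [(sentence, template.format(label)) for label in labels]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def multi_label_scores(pair_logits):
    """Each label independently: softmax over its (contradiction, entailment) logits."""
    return [softmax([contra, entail])[1] for contra, entail in pair_logits]

def single_label_scores(pair_logits):
    """Labels compete: softmax of the entailment logits across all labels."""
    return softmax([entail for _, entail in pair_logits])

labels = ["Artificial Intelligence", "Violence"]
pairs = build_pairs("Chatbots answer customer questions automatically.", labels)
for premise, hypothesis in pairs:
    print(hypothesis)

# Hypothetical (contradiction, entailment) logits per pair, not real model output
pair_logits = [(-2.0, 3.0), (1.0, -1.0)]
print(multi_label_scores(pair_logits))
print(single_label_scores(pair_logits))
```

Because multi-label scores do not compete with each other, several labels can clear a fixed threshold such as 0.5 for the same sentence, which matters for how the scoring cell counts a hit.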

In [3]:
from torch import cuda, device
from transformers import pipeline

print('Cuda Device Found? ', cuda.is_available())
my_device = device('cuda' if cuda.is_available() else 'cpu')

if cuda.is_available():
    print('Type of Cuda Device:', cuda.get_device_name(my_device))

# model_name = "facebook/bart-large-mnli"  # BART fine-tuned on MNLI, the usual checkpoint for zero-shot classification
model_name = "facebook/bart-base"  # base BART without an NLI head (see the warnings below)
# Use the first GPU if one is available, otherwise fall back to the CPU (-1)
zero_shot_classifier = pipeline(
    "zero-shot-classification",
    model=model_name,
    device=0 if cuda.is_available() else -1,
    framework="pt",
)
print("'{}' model successfully loaded".format(model_name))

Cuda Device Found?  True
Type of Cuda Device: NVIDIA GeForce GTX 1650 Ti


Some weights of BartForSequenceClassification were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['classification_head.out_proj.bias', 'classification_head.dense.bias', 'classification_head.dense.weight', 'classification_head.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Failed to determine 'entailment' label id from the label2id mapping in the model config. Setting to -1. Define a descriptive label2id mapping in the model config to ensure correct outputs.


'facebook/bart-base' model successfully loaded


In [4]:
import time

# print('Topic Classification Evaluation Started...')
start = time.time()
# Score every sentence against every candidate label; multi_label=True scores each label independently
all_scores = zero_shot_classifier(
    sequences=all_sentences,
    candidate_labels=all_labels,
    multi_label=True,
)
end = time.time() - start  # elapsed time in seconds

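##### Scoring Function
Count the correct predictions and compute the accuracy of the zero-shot classification model. A prediction counts as correct when the true label is among the labels scoring at least 0.5; since several labels can pass this threshold for the same sentence, the resulting figure is a lenient hit rate rather than strict single-label accuracy.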
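The pipeline returns one dict per sentence with `sequence`, `labels` and `scores`, with the labels sorted by descending score. The scoring loop that follows counts a hit whenever the true label is any of the labels at or above 0.5, so a sentence that clears the threshold for several labels is still counted as correct. A minimal sketch, on hypothetical results in that same shape, comparing this thresholded hit rate with stricter top-1 accuracy; both helper functions are illustrative, not part of the project:

```python
def threshold_hit_rate(results, true_labels, threshold=0.5):
    """Fraction of sentences whose true label is among the labels scoring >= threshold."""
    hits = 0
    for result, truth in zip(results, true_labels):
        assigned = [label for label, score in zip(result["labels"], result["scores"]) if score >= threshold]
        hits += truth in assigned
    return hits / len(true_labels)

def top1_accuracy(results, true_labels):
    """Fraction of sentences whose highest-scoring label is the true label."""
    hits = sum(result["labels"][0] == truth for result, truth in zip(results, true_labels))
    return hits / len(true_labels)

# Hypothetical pipeline-style outputs (labels sorted by descending score)
results = [
    {"sequence": "s1", "labels": ["Violence", "Artificial Intelligence"], "scores": [0.81, 0.64]},
    {"sequence": "s2", "labels": ["Artificial Intelligence", "Violence"], "scores": [0.92, 0.10]},
    {"sequence": "s3", "labels": ["Violence", "Artificial Intelligence"], "scores": [0.70, 0.20]},
]
true_labels = ["Artificial Intelligence", "Artificial Intelligence", "Violence"]

print(f"Threshold hit rate: {threshold_hit_rate(results, true_labels):.2%}")
print(f"Top-1 accuracy: {top1_accuracy(results, true_labels):.2%}")
```

In this toy example the first sentence is counted as a hit by the thresholded rule even though its top-ranked label is wrong; reporting both figures gives a fuller picture of multi-label output.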

In [16]:
THRESHOLD = 0.5  # minimum score for a label to be assigned
correct_count = 0
for (_, true_label), result in zip(evaluation_set, all_scores):
    # Labels and scores are aligned; keep every label whose score clears the threshold
    assigned_labels = [label for label, proba in zip(result['labels'], result['scores']) if proba >= THRESHOLD]
    # print('True: {} | Predictions: {}'.format(true_label, assigned_labels))
    if true_label in assigned_labels:
        correct_count += 1

print("Topic Classification | Evaluation Finished | Elapsed Time: {:.4f}s | Time per Sentence: {:.4f}s".format(end, end / eval_length))
print("Accuracy: {:1.2f}%".format((correct_count / eval_length) * 100))

Topic Classification | Evaluation Finished | Elapsed Time: 603.4790s | Time per Sentence: 0.7543s
Accuracy: 92.12%
