# Text Data Explanation Benchmarking: Multi-class Emotion Classification

This notebook demonstrates how to use the benchmark utility to evaluate the Partition explainer on text data. We apply it to a multi-class emotion classifier: the Emo-MobileBERT model (https://huggingface.co/lordtt13/emo-mobilebert) on the emo dataset provided by Hugging Face. The model predicts four classes: happy, sad, angry, and others. The evaluation metrics are "keep positive" and "keep negative", and the masker is the Text masker. Results are shown for each output emotion as well as for the mean output.
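
To give some intuition for the "keep positive" metric, here is a minimal sketch in plain Python. It is a simplified illustration, not the benchmark utility's implementation: `toy_model`, `keep_positive_curve`, the per-token weights, and the `[MASK]` placeholder are all hypothetical names and values chosen for the example. The idea is to keep only the tokens with the most positive attributions, mask everything else, and record the model output as more tokens are kept. With a faithful explanation the output should rise quickly. "Keep negative" is the mirror image: keep the most negatively attributed tokens and expect the output to fall.

```python
# Toy illustration of a "keep positive" curve (assumption: simplified, not shap's code).

def toy_model(tokens):
    """Hypothetical scorer: sums fixed per-token weights, standing in for a class logit."""
    weights = {"love": 2.0, "great": 1.5, "not": -1.0, "bad": -2.0}
    return sum(weights.get(t, 0.0) for t in tokens)

def keep_positive_curve(tokens, attributions, model, mask_token="[MASK]"):
    """Return model outputs as the k most positively attributed tokens are kept."""
    # Indices of positively attributed tokens, most positive first.
    order = sorted(
        (i for i, a in enumerate(attributions) if a > 0),
        key=lambda i: attributions[i],
        reverse=True,
    )
    curve = []
    for k in range(len(order) + 1):
        kept = set(order[:k])
        # Every token that is not kept is replaced by the mask token.
        masked = [t if i in kept else mask_token for i, t in enumerate(tokens)]
        curve.append(model(masked))
    return curve

tokens = ["i", "love", "this", "great", "but", "bad", "day"]
attributions = [0.0, 2.0, 0.0, 1.5, 0.0, -2.0, 0.0]  # made-up attribution values
curve = keep_positive_curve(tokens, attributions, toy_model)
print(curve)
```

In this toy case the attributions match the model's weights exactly, so the curve increases with every kept token. A poor explanation would rank tokens in a different order and the curve would rise more slowly.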

In [1]:
import os
import copy
import shutil
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import shap
import scipy as sp
import nlp
import torch
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

### Load Data

In [2]:
train, test = nlp.load_dataset("emo", split = ["train", "test"])

In [3]:
id2label = {0: 'others', 1: 'happy', 2: 'sad', 3: 'angry'}
labels = list(id2label.values())
label2id = {label: i for i, label in enumerate(labels)}

In [4]:
# Collect training examples, dropping the catch-all 'others' class
data = {'text': [], 'emotion': []}
for val in train:
    if id2label[val['label']] != 'others':
        data['text'].append(val['text'])
        data['emotion'].append(id2label[val['label']])

data = pd.DataFrame(data)

### Load Model and Tokenizer

In [5]:
tokenizer = AutoTokenizer.from_pretrained("lordtt13/emo-mobilebert",use_fast=True)
model = AutoModelForSequenceClassification.from_pretrained("lordtt13/emo-mobilebert")