# **Name : Prisha Sawhney**
# **Roll Number : 102116052**
# **Group: 3CS10**

### **Task at Hand**
*We need to find which, among a given set of pre-trained text-classification models, has the best performance based on different evaluation metrics.*
*For this, we will use the method of TOPSIS - **T**echnique for **O**rder of **P**reference by **S**imilarity to **I**deal **S**olution*

# Step 1: Importing Libraries

In [31]:
from transformers import pipeline
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, hamming_loss, cohen_kappa_score, log_loss
import pandas as pd
import numpy as np


# Step 2: Importing Huggingface Models
Here, four pre-trained text-classification models are loaded from the Hugging Face Hub:
1. distilbert/distilbert-base-uncased-finetuned-sst-2-english
2. lxyuan/distilbert-base-multilingual-cased-sentiments-student
3. cardiffnlp/twitter-roberta-base-sentiment-latest
4. siebert/sentiment-roberta-large-english

In [32]:
model_names = [
    "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "lxyuan/distilbert-base-multilingual-cased-sentiments-student",
    "cardiffnlp/twitter-roberta-base-sentiment-latest",
    "siebert/sentiment-roberta-large-english"
]

models = []

for model_name in model_names:
    model = pipeline(model=model_name)
    models.append(model)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


# Step 3: Importing the Dataset
*We have created a sample dataset for each of four domains, namely Education, Sports, Politics and Finance, in order to test the models on different metrics.*

In [33]:
education_df = pd.read_csv("Education.csv")
sports_df = pd.read_csv("Sports.csv")
politics_df = pd.read_csv("Politics.csv")
finance_df=pd.read_csv("Finance.csv")

In [34]:
education_df.head()

Unnamed: 0,Text,Label
0,The impact of educational reforms remains unce...,positive
1,Critics argue that recent improvements in the ...,negative
2,Innovative teaching methods have led to unexpe...,positive
3,"Despite budget constraints, the school has man...",positive
4,The true effectiveness of online learning plat...,negative


In [35]:
sports_df.head()

Unnamed: 0,Text,Label
0,The team's recent victories have raised suspic...,positive
1,"Despite their recent loss, the team's morale r...",positive
2,Rumors of match-fixing have cast a shadow over...,negative
3,The unexpected resignation of the coach has le...,negative
4,Speculations about doping allegations have led...,negative


In [36]:
politics_df.head()

Unnamed: 0,Text,Label
0,The government's recent policies have received...,positive
1,Political analysts are divided on the long-ter...,negative
2,Efforts to promote unity among political facti...,positive
3,"Despite allegations of corruption, the governm...",negative
4,The recent diplomatic initiatives have been me...,positive


In [37]:
finance_df.head()

Unnamed: 0,Text,Label
0,The financial markets are influenced by a myri...,positive
1,Financial literacy is essential for making inf...,positive
2,"The stock market can be volatile, with prices ...",positive
3,Financial regulations aim to protect investors...,positive
4,Access to credit and capital is essential for ...,positive


# Step 4: Extracting the test Labels
*The four dataframes are collected in a list here. The string labels ("positive" and "negative") are converted to 1's and 0's inside the evaluation loop of Step 6.*

In [38]:
df = [education_df,sports_df,politics_df,finance_df]
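
*The label conversion described above can be sketched as a small helper. This is an illustration rather than the notebook's own code: the name `to_binary` is hypothetical, and it assumes that the positive class is spelled "positive" in any letter case. Every other label, including "neutral" if a model emits one, maps to 0.*

```python
def to_binary(label: str) -> int:
    """Map a sentiment label to 1 (positive) or 0 (anything else)."""
    return 1 if label.strip().lower() == "positive" else 0


# Mixed-case and three-class labels, as different models may return them
labels = ["positive", "negative", "POSITIVE", "neutral"]
encoded = [to_binary(label) for label in labels]
print(encoded)  # [1, 0, 1, 0]
```

*Mapping "neutral" to 0 is a design choice: it counts a neutral prediction as a miss whenever the true label is positive.*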

# Step 5: Creating Metric Dataframes

*Initializing empty dataframes to store the metrics obtained from the different models*

In [39]:
metric_columns = ['Model', 'Accuracy', 'Precision', 'Recall', 'F1_Score', 'Hamming_Loss', "Cohen's_Kappa", 'Log_Loss']

# One empty result table per domain, with one row per model
education_result = pd.DataFrame(columns=metric_columns, index=[0, 1, 2, 3])
sports_result = pd.DataFrame(columns=metric_columns, index=[0, 1, 2, 3])
politics_result = pd.DataFrame(columns=metric_columns, index=[0, 1, 2, 3])
finance_result = pd.DataFrame(columns=metric_columns, index=[0, 1, 2, 3])


# Step 6: Testing the imported models on the created datasets

*Accessing different dataframes and finding out the desired metrics for all the models specified above to store in their respective resultant dataframes*
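
*As a rough guide to what the metrics measure, the sketch below derives six of them from the four cells of a binary confusion matrix. The function name `binary_metrics` and the counts are assumptions for illustration: the counts are not given in the source, but were chosen because they are consistent with the Model 1 row of the Education results reported below. The sketch assumes every denominator is non-zero.*

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute common binary-classification metrics from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    hamming = (fp + fn) / total  # fraction of wrong labels, i.e. 1 - accuracy
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hamming_loss": hamming,
        "cohen_kappa": kappa,
    }


# Hypothetical counts (inferred, not stated in the source)
metrics = binary_metrics(tp=11, fp=7, fn=15, tn=19)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```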

In [40]:
for idx, genre in enumerate(df, start=1):

    for i in range(len(models)):

        model_pred = []

        # Predicted labels: 1 for "positive" (any letter case), 0 for every other label
        for text in genre['Text']:
            model_pred.append(1 if models[i](text)[0]['label'].lower() == "positive" else 0)

        # True labels: 1 for "positive", 0 otherwise
        model_actual = genre['Label'].apply(lambda x: 1 if x == "positive" else 0)

        # Metrics
        accuracy = accuracy_score(model_actual, model_pred)
        precision = precision_score(model_actual, model_pred)
        recall = recall_score(model_actual, model_pred)
        f1 = f1_score(model_actual, model_pred)
        hamming = hamming_loss(model_actual, model_pred)
        kappa = cohen_kappa_score(model_actual, model_pred)
        # Note: log_loss receives hard 0/1 predictions here, not predicted probabilities
        ll = log_loss(model_actual, model_pred)

        if idx==1:
            # Education
            education_result.loc[i] = [f"Model {i+1}", accuracy, precision, recall, f1, hamming, kappa, ll]
        elif idx==2:
            # Sports
            sports_result.loc[i] = [f"Model {i+1}", accuracy, precision, recall, f1, hamming, kappa, ll]
        elif idx==3:
            # Politics
            politics_result.loc[i] = [f"Model {i+1}", accuracy, precision, recall, f1, hamming, kappa, ll]
        else:
            # Finance
            finance_result.loc[i] = [f"Model {i+1}", accuracy, precision, recall, f1, hamming, kappa, ll]
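
*One property of the loop above is worth checking: `log_loss` is given hard 0/1 predictions instead of probabilities. With clipped hard predictions, each wrong label contributes the same large penalty, so the log loss becomes the error rate times a constant. The sketch below is a pure-Python illustration, not scikit-learn's implementation. The clipping constant (machine epsilon) is an assumption, although the reported values are consistent with it: for Model 1 on Education, 0.423077 × 36.04 ≈ 15.249.*

```python
import math
import sys


def hard_log_loss(y_true, y_pred, eps=sys.float_info.epsilon):
    """Binary log loss for hard 0/1 predictions, clipped to [eps, 1 - eps]."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        p = min(max(float(pred), eps), 1 - eps)  # clipped probability of class 1
        total += -(truth * math.log(p) + (1 - truth) * math.log(1 - p))
    return total / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of wrong predictions (equal to the Hamming loss here)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # two mistakes out of eight

scale = -math.log(sys.float_info.epsilon)  # roughly 36.04
print(hard_log_loss(y_true, y_pred))
print(error_rate(y_true, y_pred) * scale)
```

*Under this assumption, Log_Loss and Hamming_Loss carry the same information, which is why their normalized columns coincide in the TOPSIS tables of Step 8.*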

In [41]:
education_result

Unnamed: 0,Model,Accuracy,Precision,Recall,F1_Score,Hamming_Loss,Cohen's_Kappa,Log_Loss
0,Model 1,0.576923,0.611111,0.423077,0.5,0.423077,0.153846,15.249238
1,Model 2,0.634615,0.6,0.807692,0.688525,0.365385,0.269231,13.169796
2,Model 3,0.634615,0.888889,0.307692,0.457143,0.365385,0.269231,13.169796
3,Model 4,0.673077,0.714286,0.576923,0.638298,0.326923,0.346154,11.783502


In [42]:
sports_result

Unnamed: 0,Model,Accuracy,Precision,Recall,F1_Score,Hamming_Loss,Cohen's_Kappa,Log_Loss
0,Model 1,0.839286,0.827586,0.857143,0.842105,0.160714,0.678571,5.79273
1,Model 2,0.803571,0.742857,0.928571,0.825397,0.196429,0.607143,7.080003
2,Model 3,0.892857,1.0,0.785714,0.88,0.107143,0.785714,3.86182
3,Model 4,0.910714,0.896552,0.928571,0.912281,0.089286,0.821429,3.218183


In [43]:
politics_result

Unnamed: 0,Model,Accuracy,Precision,Recall,F1_Score,Hamming_Loss,Cohen's_Kappa,Log_Loss
0,Model 1,0.849057,0.947368,0.72,0.818182,0.150943,0.693198,5.440551
1,Model 2,0.754717,0.772727,0.68,0.723404,0.245283,0.504673,8.840896
2,Model 3,0.54717,1.0,0.04,0.076923,0.45283,0.042169,16.321654
3,Model 4,0.924528,0.92,0.92,0.92,0.075472,0.848571,2.720276


In [44]:
finance_result

Unnamed: 0,Model,Accuracy,Precision,Recall,F1_Score,Hamming_Loss,Cohen's_Kappa,Log_Loss
0,Model 1,0.8125,0.903226,0.823529,0.861538,0.1875,0.573123,6.758185
1,Model 2,0.770833,0.870968,0.794118,0.830769,0.229167,0.478261,8.260004
2,Model 3,0.416667,1.0,0.176471,0.3,0.583333,0.111111,21.025464
3,Model 4,0.895833,0.914286,0.941176,0.927536,0.104167,0.742489,3.754547


# Step 7: Calculating the Performance of the Models using TOPSIS


### TOPSIS Code

*Function for Normalizing the input dataframe*

In [45]:
def normalize(df):
    divisor = df.apply(lambda x: x**2).apply(sum).apply(lambda x: x**0.5)
    df = df.div(divisor)
    return df
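
*To make the normalization step concrete: each column is divided by its Euclidean norm, so that metrics on different scales become comparable. The following is a minimal pure-Python sketch of the same idea; `normalize_columns` and the toy matrix are hypothetical and not part of the notebook.*

```python
import math


def normalize_columns(matrix):
    """Divide every entry by the Euclidean norm of its column."""
    n_cols = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    return [[value / norms[j] for j, value in enumerate(row)] for row in matrix]


# Two alternatives (rows) scored on two criteria (columns)
matrix = [[3.0, 1.0],
          [4.0, 1.0]]
normalized = normalize_columns(matrix)
print(normalized[0][0], normalized[1][0])  # 0.6 0.8, since the column norm is 5
```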

*Function for Weighted Normalization*

In [46]:
def weight_normalized(df, weights):
    df = df.mul(weights)
    return df

*Function for finding out the best and worst ideal outputs from the given dataframes*

In [47]:
def best_worst(df, impacts):
    best=[]
    worst=[]
    for i in range(len(impacts)):
        if impacts[i]=='+':
            best.append(max(df.iloc[:,i]))
            worst.append(min(df.iloc[:,i]))
        else:
            best.append(min(df.iloc[:,i]))
            worst.append(max(df.iloc[:,i]))
    return (best,worst)
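
*The role of the impacts can be seen in a small example: for a "+" criterion the ideal best is the column maximum, for a "-" criterion it is the column minimum. The sketch below is illustrative only; `ideal_points` and the values are hypothetical.*

```python
def ideal_points(matrix, impacts):
    """Return the ideal best and ideal worst value of every column."""
    columns = list(zip(*matrix))
    best = [max(col) if sign == "+" else min(col) for col, sign in zip(columns, impacts)]
    worst = [min(col) if sign == "+" else max(col) for col, sign in zip(columns, impacts)]
    return best, worst


# Column 0 is a benefit criterion (e.g. accuracy), column 1 a cost criterion (e.g. a loss)
matrix = [[0.6, 0.3],
          [0.8, 0.1]]
best, worst = ideal_points(matrix, ["+", "-"])
print(best)   # [0.8, 0.1]
print(worst)  # [0.6, 0.3]
```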

*Function for calculating the performance by finding out the TOPSIS Score*

In [48]:
def calc_performance(df, best, worst):
    s_best=[]
    s_worst=[]
    for i in range(len(df)):
        s_best.append((sum((df.loc[i] - best)**2))**0.5)
        s_worst.append((sum((df.loc[i] - worst)**2))**0.5)
    s_total = [i+j for i,j in zip(s_worst,s_best)]
    performance = [i/j for i,j in zip(s_worst,s_total)]
    df.loc[:,'Topsis Score'] = performance
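
*The TOPSIS Score is the distance to the ideal worst divided by the sum of the distances to the ideal worst and the ideal best, so it lies between 0 (identical to the worst) and 1 (identical to the best). A minimal sketch with hypothetical names and values, assuming Python 3.8+ for `math.dist`:*

```python
import math


def closeness(row, best, worst):
    """Relative closeness to the ideal solution: S- / (S+ + S-)."""
    d_best = math.dist(row, best)    # S+: distance to the ideal best
    d_worst = math.dist(row, worst)  # S-: distance to the ideal worst
    return d_worst / (d_best + d_worst)


best = [0.8, 0.1]
worst = [0.6, 0.3]
print(closeness([0.8, 0.1], best, worst))  # 1.0
print(closeness([0.6, 0.3], best, worst))  # 0.0
print(closeness([0.7, 0.2], best, worst))  # close to 0.5 (midpoint)
```

*The division fails if the best and worst points coincide, which only happens when all alternatives are identical on every criterion.*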

*Function for ranking the Models based on the TOPSIS Scores*

In [49]:
def rank(df):
    sorted_array = df.loc[:,'Topsis Score'].argsort()
    ranks = np.empty_like(sorted_array)
    ranks[sorted_array] = np.arange(len(sorted_array))
    n=len(sorted_array)
    ranks = [n-i for i in ranks]
    df.loc[:,'Rank'] = ranks
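
*The ranking step assigns rank 1 to the highest score. The same logic can be sketched without NumPy. The helper `rank_scores` is hypothetical and breaks ties by position, which may differ from the `argsort`-based version above. The scores are the Sports Topsis Scores reported in Step 8.*

```python
def rank_scores(scores):
    """Return 1-based ranks, with rank 1 for the highest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


scores = [0.335874, 0.125254, 0.795196, 0.904564]
print(rank_scores(scores))  # [3, 4, 2, 1]
```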

*Final Function for TOPSIS which calls all the above defined functions*

In [50]:
def topsis(data, weights, impacts):
    # Drop the 'Model' name column so that only the numeric metrics remain
    df = data.iloc[:, 1:]

    df = normalize(df)
    df = weight_normalized(df,weights)

    (best,worst) = best_worst(df,impacts)
    calc_performance(df,best,worst)
    rank(df)
    return df

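*Initializing the weights and impacts and calling the TOPSIS function for all four domains, namely Education, Sports, Politics and Finance. All metrics are weighted equally. Hamming_Loss and Log_Loss get a "-" impact because lower values are better; every other metric gets "+".*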

In [51]:
weights=[1,1,1,1,1,1,1]
impacts = ["+", "+", "+", "+","-", "+", "-"]
result1 = topsis(education_result, weights, impacts)
result2 = topsis(sports_result, weights, impacts)
result3 = topsis(politics_result, weights, impacts)
result4 = topsis(finance_result, weights, impacts)
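
*As a cross-check, the whole procedure can be condensed into one standalone function and run on the Sports metrics from Step 6. This is a pure-Python sketch under the same weights and impacts, not the notebook's pandas implementation; `topsis_scores` is a hypothetical name, and `math.dist` requires Python 3.8+.*

```python
import math


def topsis_scores(matrix, weights, impacts):
    """Return the TOPSIS closeness score of every row (alternative)."""
    n_cols = len(matrix[0])
    # Vector-normalize each column, then apply the weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    # Ideal best and worst per column, depending on the impact sign
    columns = list(zip(*weighted))
    best = [max(c) if s == "+" else min(c) for c, s in zip(columns, impacts)]
    worst = [min(c) if s == "+" else max(c) for c, s in zip(columns, impacts)]
    # Closeness: distance to worst over the sum of both distances
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Sports metrics for Models 1 to 4, copied from the Step 6 output
sports = [
    [0.839286, 0.827586, 0.857143, 0.842105, 0.160714, 0.678571, 5.79273],
    [0.803571, 0.742857, 0.928571, 0.825397, 0.196429, 0.607143, 7.080003],
    [0.892857, 1.0, 0.785714, 0.88, 0.107143, 0.785714, 3.86182],
    [0.910714, 0.896552, 0.928571, 0.912281, 0.089286, 0.821429, 3.218183],
]
weights = [1] * 7
impacts = ["+", "+", "+", "+", "-", "+", "-"]

scores = topsis_scores(sports, weights, impacts)
for model_number, score in enumerate(scores, start=1):
    print(f"Model {model_number}: {score:.3f}")
```

*Up to rounding of the inputs, these scores should agree with the Sports table in Step 8, with Model 4 ranked first.*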

*Inserting the initial column of models*

In [52]:
result1.insert(0,"Model",education_result['Model'])
result2.insert(0,"Model",sports_result['Model'])
result3.insert(0,"Model",politics_result['Model'])
result4.insert(0,"Model",finance_result['Model'])

# Step 8: Analyzing the Outputs
*The Topsis Scores and ranks for each domain are shown below. The metric columns in these tables hold the normalized values used by TOPSIS, not the raw metrics from Step 6.*

## 1. Education

In [53]:
result1

Unnamed: 0,Model,Accuracy,Precision,Recall,F1_Score,Hamming_Loss,Cohen's_Kappa,Log_Loss,Topsis Score,Rank
0,Model 1,0.457336,0.428517,0.377075,0.431859,0.568987,0.286446,0.568987,0.156381,4
1,Model 2,0.50307,0.420725,0.719871,0.59469,0.491398,0.50128,0.491398,0.677306,1
2,Model 3,0.50307,0.623297,0.274236,0.394842,0.491398,0.50128,0.491398,0.381939,3
3,Model 4,0.533559,0.500864,0.514193,0.551309,0.439672,0.644503,0.439672,0.675294,2


*In this domain, Model 2 has the final rank 1 with the highest Topsis Score of 0.677, only narrowly ahead of Model 4 (0.675). Hence, the best model in this domain is **Model 2 - lxyuan/distilbert-base-multilingual-cased-sentiments-student***

## 2. Sports

In [54]:
result2

Unnamed: 0,Model,Accuracy,Precision,Recall,F1_Score,Hamming_Loss,Cohen's_Kappa,Log_Loss,Topsis Score,Rank
0,Model 1,0.486453,0.474611,0.488678,0.486427,0.554964,0.465916,0.554964,0.335874,3
1,Model 2,0.465753,0.42602,0.529401,0.476775,0.678289,0.416872,0.678289,0.125254,4
2,Model 3,0.517503,0.573488,0.447955,0.508316,0.369976,0.539481,0.369976,0.795196,2
3,Model 4,0.527853,0.514162,0.529401,0.526962,0.308313,0.564003,0.308313,0.904564,1


*In this domain, Model 4 has the final rank 1 with the highest Topsis Score of 0.905, ahead of Model 3 (0.795). Hence, the best model in this domain is **Model 4 - siebert/sentiment-roberta-large-english***

## 3. Politics

In [55]:
result3

Unnamed: 0,Model,Accuracy,Precision,Recall,F1_Score,Hamming_Loss,Cohen's_Kappa,Log_Loss,Topsis Score,Rank
0,Model 1,0.543036,0.518298,0.532414,0.572134,0.278524,0.574271,0.278524,0.80623,2
1,Model 2,0.482699,0.422753,0.502835,0.505858,0.452602,0.41809,0.452602,0.612326,3
2,Model 3,0.349957,0.547093,0.029579,0.05379,0.835573,0.034934,0.835573,0.076622,4
3,Model 4,0.591306,0.503325,0.680307,0.643333,0.139262,0.702988,0.139262,0.97166,1


*In this domain, Model 4 has the final rank 1 with the highest Topsis Score of 0.972, ahead of Model 1 (0.806). Hence, the best model in this domain is **Model 4 - siebert/sentiment-roberta-large-english***

## 4. Finance

In [56]:
result4

Unnamed: 0,Model,Accuracy,Precision,Recall,F1_Score,Hamming_Loss,Cohen's_Kappa,Log_Loss,Topsis Score,Rank
0,Model 1,0.544033,0.489101,0.551999,0.55813,0.283052,0.541347,0.283052,0.810146,2
1,Model 2,0.516134,0.471633,0.532285,0.538197,0.345953,0.451745,0.345953,0.71867,3
2,Model 3,0.278991,0.541505,0.118285,0.194349,0.880608,0.104951,0.880608,0.04785,4
3,Model 4,0.599831,0.49509,0.630856,0.600885,0.157251,0.701324,0.157251,0.9677,1


*In this domain, Model 4 has the final rank 1 with the highest Topsis Score of 0.968, ahead of Model 1 (0.810). Hence, the best model in this domain is **Model 4 - siebert/sentiment-roberta-large-english***


*Hence, we have the following results:*

| Domain | Best Model | Model Name |
|-----------------|-----------------|-----------------|
| Education    | Model 2    | lxyuan/distilbert-base-multilingual-cased-sentiments-student    |
| Sports    | Model 4    | siebert/sentiment-roberta-large-english    |
| Politics    | Model 4    | siebert/sentiment-roberta-large-english    |
| Finance    | Model 4    | siebert/sentiment-roberta-large-english    |
