# Final Masters Project

## Name: Sreekanth Palagiri, Student ID: R00184198

## Project Topic: Evaluation of Ensemble Approach for Sentiment Analysis on a Small Dataset

## Notebook: Ensemble of Models


### **Mount google drive**

In [1]:
from google.colab import drive 
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [2]:
!pip install flair
!pip install sentencepiece
!pip install transformers


### **Load Data and Preprocess**

In [3]:
import pandas as pd
import numpy as np

df=pd.read_csv("/content/gdrive/My Drive/Colab Notebooks/Masters Project/Sentence Polarity Dataset/sentimentpolarity.csv")
print(df.groupby(['label']).size())
df.head()

label
0    1000
1    1000
dtype: int64


| Unnamed: 0 | text | label |
|---|---|---|
| 0 | [ferrera] has the charisma of a young woman wh... | 1 |
| 1 | both flawed and delayed , martin scorcese's ga... | 1 |
| 2 | for his first attempt at film noir , spielberg... | 1 |
| 3 | easily one of the best and most exciting movie... | 1 |
| 4 | this director's cut -- which adds 51 minutes -... | 0 |


**Preprocessor to Remove HTML Tags and All Special Characters Except Emoticons**

In [4]:
import re

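def preprocessor(text):
    # Strip HTML tags such as <br />
    text = re.sub(r'<[^>]*>', '', text)
    # Collect emoticons such as :) ;-( =D before punctuation is removed
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)', text)
    # Lowercase, replace every run of characters other than letters, digits and
    # apostrophes with a single space, then re-append the emoticons without "-"
    text = re.sub(r"[^A-Za-z0-9']+", ' ', text.lower()) + \
        ' '.join(emoticons).replace('-', '')
    return text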

print(df['text'][19])
print(preprocessor(df['text'][19]))

the only fun part of the movie is playing the obvious game . you try to guess the order in which the kids in the house will be gored . 
the only fun part of the movie is playing the obvious game you try to guess the order in which the kids in the house will be gored 
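To see what `preprocessor` does to markup and emoticons (the sentence printed above contains neither), here is a self-contained copy of the same logic run on two made-up inputs. The inputs are hypothetical. Note the second case: because the emoticons are appended with no separating space, a sentence that ends in a word gets the emoticon glued to that word.

```python
import re

def preprocessor(text):
    # Same steps as the notebook's preprocessor
    text = re.sub(r'<[^>]*>', '', text)                             # strip HTML tags
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)', text)  # collect emoticons first
    text = re.sub(r"[^A-Za-z0-9']+", ' ', text.lower())            # keep letters, digits, apostrophes
    return text + ' '.join(emoticons).replace('-', '')              # re-append emoticons without the "nose"

# Hypothetical inputs chosen to exercise each step
print(preprocessor("<br />This movie was GREAT :-) but too long :("))
print(preprocessor("good :) movie"))
```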


In [5]:
df['text'] = df['text'].apply(preprocessor)

### **Separate into Train, Test and Evaluation Sets**

In [6]:
df_train=df.iloc[0:int(len(df)*0.85)].reset_index(drop=True)
df_test=df.iloc[int(len(df)*0.85):].reset_index(drop=True)

In [7]:
from sklearn.model_selection import train_test_split

df_test, df_eval, sentiment_test, sentiment_eval = train_test_split(df_test['text'], df_test['label'], 
                                                                      random_state=1, test_size=.30, 
                                                                      shuffle=False)


print('Length of test set:', len(df_test), 'Length of evaluation set:', len(df_eval))


Length of test set: 210 Length of evaluation set: 90
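The sizes 210 and 90 follow from the two splits. A small sketch of the arithmetic, assuming the 2,000-row dataset shown earlier and scikit-learn's rule of taking `ceil(test_size * n)` rows for a float `test_size`; the helper `split_sizes` is hypothetical. With `shuffle=False`, `train_test_split` slices the rows in order, so `random_state=1` has no effect: the first 210 held-out rows become `df_test` and the last 90 become `df_eval`.

```python
import math

def split_sizes(n_rows, train_frac=0.85, eval_frac=0.30):
    # First split: positional slice, as in df.iloc[0:int(len(df)*0.85)]
    n_train = int(n_rows * train_frac)
    n_holdout = n_rows - n_train
    # Second split: a float test_size takes ceil(test_size * n) rows
    n_eval = math.ceil(eval_frac * n_holdout)
    n_test = n_holdout - n_eval
    return n_train, n_test, n_eval

print(split_sizes(2000))  # 1000 negative + 1000 positive sentences
```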


### **Load All Models and Predict to Prepare for the Ensemble Model**


In [8]:
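from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer()  # defined here so tokenizer() works whenever it is called
def tokenizer(text):  # presumably needed so joblib can unpickle the saved TF-IDF vectorizer
  return [stemmer.stem(word) for word in text.split()]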

**Logistic Regression Model**

In [9]:
from joblib import load
from sklearn.feature_extraction.text import CountVectorizer

tfidf=load('/content/gdrive/My Drive/Colab Notebooks/Masters Project/Sentence Polarity Dataset/Models/tfidf_logistic.joblib')
model_reg=load('/content/gdrive/My Drive/Colab Notebooks/Masters Project/Sentence Polarity Dataset/Models/clf_logistic.joblib')

**LSTM Model**

In [11]:
import io
import json
from tensorflow import keras

with open('/content/gdrive/My Drive/Colab Notebooks/Masters Project/Sentence Polarity Dataset/Models/tokenizer.json') as f:
    data = json.load(f)
    tokenizer = keras.preprocessing.text.tokenizer_from_json(data)

model_lstm=keras.models.load_model('/content/gdrive/My Drive/Colab Notebooks/Masters Project/Sentence Polarity Dataset/Models/model_lstm.h5')

**Flair Model**



In [10]:
from flair.models import TextClassifier

model_flair=TextClassifier.load('/content/gdrive/My Drive/Colab Notebooks/Masters Project/Sentence Polarity Dataset/Models/resources/taggers/trec/best-model.pt')

2021-05-03 16:48:38,376 loading file /content/gdrive/My Drive/Colab Notebooks/Masters Project/Sentence Polarity Dataset/Models/resources/taggers/trec/best-model.pt



**BERT Model**

In [12]:
import torch
from transformers import BertForSequenceClassification 

bertmodel = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=len(df.label.unique()),
                                                      output_attentions=False,
                                                      output_hidden_states=False)

bertmodel.load_state_dict(torch.load('/content/gdrive/My Drive/Colab Notebooks/Masters Project/Sentence Polarity Dataset/Models/BERT_ft_epoch8.model',map_location=torch.device('cpu')))

device= torch.device('cuda' if torch.cuda.is_available() else 'cpu')
bertmodel.to(device)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, element

**RoBERTa Model**

In [13]:
import torch
from transformers import RobertaForSequenceClassification 

robertamodel = RobertaForSequenceClassification.from_pretrained("roberta-base",
                                                      num_labels=len(df.label.unique()),
                                                      output_attentions=False,
                                                      output_hidden_states=False)

robertamodel.load_state_dict(torch.load('/content/gdrive/My Drive/Colab Notebooks/Masters Project/Sentence Polarity Dataset/Models/roBERTa_ft_epoch8.model',map_location=torch.device('cpu')))

device= torch.device('cuda' if torch.cuda.is_available() else 'cpu')
robertamodel.to(device)


Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.weight', 'classifie

RobertaForSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0): RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerN

**XLNet Model**

In [14]:
import torch
from transformers import XLNetForSequenceClassification 

xlnet = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased',
                                                      num_labels=len(df.label.unique()),
                                                      output_attentions=False,
                                                      output_hidden_states=False)

xlnet.load_state_dict(torch.load('/content/gdrive/My Drive/Colab Notebooks/Masters Project/Sentence Polarity Dataset/Models/XLnet_ft_epoch3.model',map_location=torch.device('cpu')))

device= torch.device('cuda' if torch.cuda.is_available() else 'cpu')
xlnet.to(device)


Some weights of the model checkpoint at xlnet-base-cased were not used when initializing XLNetForSequenceClassification: ['lm_loss.weight', 'lm_loss.bias']
- This IS expected if you are initializing XLNetForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLNetForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLNetForSequenceClassification were not initialized from the model checkpoint at xlnet-base-cased and are newly initialized: ['sequence_summary.summary.weight', 'sequence_summary.summary.bias', 'logits_proj.weight', 'logits_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions a

XLNetForSequenceClassification(
  (transformer): XLNetModel(
    (word_embedding): Embedding(32000, 768)
    (layer): ModuleList(
      (0): XLNetLayer(
        (rel_attn): XLNetRelativeAttention(
          (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (ff): XLNetFeedForward(
          (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (layer_1): Linear(in_features=768, out_features=3072, bias=True)
          (layer_2): Linear(in_features=3072, out_features=768, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (1): XLNetLayer(
        (rel_attn): XLNetRelativeAttention(
          (layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (ff): XLNetFeedForward(
          (layer_norm): LayerNorm((768,), eps=1e

### **Getting Predictions on the Test and Evaluation Sets**

**Logistic Regression Model**

In [15]:
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

def stemm(text):
  return ' '.join([stemmer.stem(word) for word in text.split()])

In [17]:
df_test_stem=df_test.apply(stemm)
df_eval_stem=df_eval.apply(stemm)

In [18]:
probas=[]
probas_eval=[]

probas.append(model_reg.predict_proba(tfidf.transform(df_test_stem)))
probas_eval.append(model_reg.predict_proba(tfidf.transform(df_eval_stem)))

In [19]:
predictions=[]
predictions_eval=[]

predictions.append(model_reg.predict(tfidf.transform(df_test_stem)))
predictions_eval.append(model_reg.predict(tfidf.transform(df_eval_stem)))

**LSTM Model**

In [20]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_seq_length= 500

test_sequences = tokenizer.texts_to_sequences(df_test)
test_sequences = pad_sequences(test_sequences,maxlen =max_seq_length)

lstm_pred=model_lstm.predict(test_sequences)
probas.append(lstm_pred)

In [21]:
max_seq_length= 500

eval_sequences = tokenizer.texts_to_sequences(df_eval)
eval_sequences = pad_sequences(eval_sequences,maxlen =max_seq_length)

lstm_pred_eval=model_lstm.predict(eval_sequences)
probas_eval.append(lstm_pred_eval)

In [22]:
preds = np.argmax(lstm_pred, axis=1).flatten()
predictions.append(preds)

preds= np.argmax(lstm_pred_eval, axis=1).flatten()
predictions_eval.append(preds)
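`pad_sequences` turns the variable-length token-id lists into the fixed 500-column matrix the LSTM expects. The sketch below re-implements its default behaviour (`padding='pre'`, `truncating='pre'`, pad value 0) in NumPy to make that explicit; `pad_pre` is a hypothetical helper, not part of Keras. Since single-sentence reviews are typically far shorter than 500 tokens, most positions in `test_sequences` and `eval_sequences` are left-padding zeros.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    # Hypothetical helper mirroring the defaults of Keras pad_sequences:
    # padding is added on the left, and over-long sequences keep only
    # their last `maxlen` token ids.
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        kept = list(seq)[-maxlen:]
        if kept:  # an empty sequence stays all padding
            out[i, -len(kept):] = kept
    return out

print(pad_pre([[5, 8, 2], [7, 1, 4, 9, 3, 6], []], maxlen=4))
```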

**Flair Model**

In [23]:
from flair.data import Sentence

results=[]
for i in df_test.index:
    sentence=Sentence(df_test[i])
    model_flair.predict(sentence)
    label = sentence.get_labels()[0]
    # Flair returns the winning label and its confidence; convert it to
    # [P(negative), P(positive)] so the columns line up with labels 0 and 1
    p_neg = 1 - label.score if label.value == 'Positive' else label.score
    results.append([p_neg, 1 - p_neg])
probas.append(np.array(results))

In [24]:
preds = np.argmax(np.array(results), axis=1).flatten()
predictions.append(preds)

In [25]:
results=[]
for i in df_eval.index:
    sentence=Sentence(df_eval[i])
    model_flair.predict(sentence)
    label = sentence.get_labels()[0]
    # Flair returns the winning label and its confidence; convert it to
    # [P(negative), P(positive)] so the columns line up with labels 0 and 1
    p_neg = 1 - label.score if label.value == 'Positive' else label.score
    results.append([p_neg, 1 - p_neg])
probas_eval.append(np.array(results))

In [26]:
preds = np.argmax(np.array(results), axis=1).flatten()
predictions_eval.append(preds)
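The Flair cells turn the single label Flair returns into a two-column row. A standalone sketch of that conversion, assuming (as the notebook does) that the positive class is named `'Positive'` and that the score is the confidence of the predicted label; `flair_label_to_probs` is a hypothetical name. Column 0 is P(negative) and column 1 is P(positive), matching the dataset's 0/1 encoding, so an argmax over the row recovers the Flair label.

```python
def flair_label_to_probs(label_value, confidence, positive_label="Positive"):
    # Assumes a binary classifier whose returned label carries the
    # confidence of the predicted class.
    # Returns [P(negative), P(positive)] to match labels 0 and 1.
    p_pos = confidence if label_value == positive_label else 1.0 - confidence
    return [1.0 - p_pos, p_pos]

print(flair_label_to_probs("Positive", 0.9))
print(flair_label_to_probs("Negative", 0.8))
```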

**BERT Model**

In [27]:
from transformers import BertTokenizer

tokenizerbert = BertTokenizer.from_pretrained(
                  'bert-base-uncased',
                  do_lower_case=True) 


encoded_data_test=tokenizerbert.batch_encode_plus(
                        df_test.values,              # sentences from the test split
                        add_special_tokens=True,
                        return_attention_mask=True,
                        padding='longest',
                        max_length=256,
                        truncation=True,
                        return_tensors='pt')


In [28]:
from torch.utils.data import TensorDataset, DataLoader, SequentialSampler

input_ids_test= encoded_data_test['input_ids']
attention_masks_test= encoded_data_test['attention_mask']

dataset_test= TensorDataset(input_ids_test, attention_masks_test,)

dataloader_test = DataLoader(
    dataset_test, 
    sampler=SequentialSampler(dataset_test), 
    batch_size=4
    )

In [29]:
encoded_data_eval=tokenizerbert.batch_encode_plus(
                        df_eval.values,              # sentences from the evaluation split
                        add_special_tokens=True,
                        return_attention_mask=True,
                        padding='longest',
                        max_length=256,
                        truncation=True,
                        return_tensors='pt')

input_ids_eval= encoded_data_eval['input_ids']
attention_masks_eval= encoded_data_eval['attention_mask']

dataset_eval= TensorDataset(input_ids_eval, attention_masks_eval,)

dataloader_eval = DataLoader(
    dataset_eval, 
    sampler=SequentialSampler(dataset_eval), 
    batch_size=4
    )

In [30]:
import torch.nn.functional as F

def predict_bert(dataloader):
    bertmodel.eval()
    all_logits = []

    for batch in dataloader:
        batch = tuple(b.to(device) for b in batch)

        inputs = {
            'input_ids':      batch[0],
            'attention_mask': batch[1],
            }

        with torch.no_grad():
            outputs = bertmodel(**inputs)

        # no labels are passed, so there is no loss and outputs[0] holds the logits
        logits = outputs[0]
        all_logits.append(logits)

    all_logits = torch.cat(all_logits, dim=0)
    # the class with the highest logit is the predicted label
    preds_flat = np.argmax(all_logits.cpu().numpy(), axis=1).flatten()
    # softmax converts the logits into class probabilities for the ensemble
    probs = F.softmax(all_logits, dim=1).cpu().numpy()

    return preds_flat, probs



In [31]:
preds, probs=predict_bert(dataloader_test)
probas.append(probs) 
predictions.append(preds)

In [32]:
preds, probs=predict_bert(dataloader_eval)
probas_eval.append(probs) 
predictions_eval.append(preds)
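`predict_bert` (and the RoBERTa and XLNet versions that follow) takes the label from the raw logits but stores softmax probabilities for the ensemble. A NumPy sketch with made-up logits showing why the two agree: softmax preserves the order of values within each row, so the largest logit always has the largest probability.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis (what F.softmax(..., dim=1) computes)
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

logits = np.array([[2.0, -1.0], [0.3, 0.8], [-4.0, 4.0]])  # hypothetical: 3 sentences x 2 classes
probs = softmax(logits)
labels_from_logits = np.argmax(logits, axis=1)
labels_from_probs = np.argmax(probs, axis=1)
print(labels_from_logits, labels_from_probs)
```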

**RoBERTa Model**

In [33]:
from transformers import RobertaTokenizer

tokenizerroberta = RobertaTokenizer.from_pretrained(
                  'roberta-base') 


encoded_data_test_r=tokenizerroberta.batch_encode_plus(
                        df_test.values,              # sentences from the test split
                        add_special_tokens=True,
                        return_attention_mask=True,
                        padding='longest',
                        max_length=256,
                        truncation=True,
                        return_tensors='pt')


In [34]:
from torch.utils.data import TensorDataset, DataLoader, SequentialSampler

input_ids_test_r= encoded_data_test_r['input_ids']
attention_masks_test_r= encoded_data_test_r['attention_mask']

dataset_test_r= TensorDataset(input_ids_test_r, attention_masks_test_r,)

dataloader_test_r = DataLoader(
    dataset_test_r, 
    sampler=SequentialSampler(dataset_test_r), 
    batch_size=4
    )

In [35]:
encoded_data_eval_r=tokenizerroberta.batch_encode_plus(
                        df_eval.values,              # sentences from the evaluation split
                        add_special_tokens=True,
                        return_attention_mask=True,
                        padding='longest',
                        max_length=256,
                        truncation=True,
                        return_tensors='pt')


input_ids_eval_r= encoded_data_eval_r['input_ids']
attention_masks_eval_r= encoded_data_eval_r['attention_mask']

dataset_eval_r= TensorDataset(input_ids_eval_r, attention_masks_eval_r,)

dataloader_eval_r = DataLoader(
    dataset_eval_r, 
    sampler=SequentialSampler(dataset_eval_r), 
    batch_size=4
    )

In [36]:
import torch.nn.functional as F

def predict_roberta(dataloader):
    robertamodel.eval()
    all_logits = []

    for batch in dataloader:
        batch = tuple(b.to(device) for b in batch)

        inputs = {
            'input_ids':      batch[0],
            'attention_mask': batch[1],
            }

        with torch.no_grad():
            outputs = robertamodel(**inputs)

        # no labels are passed, so there is no loss and outputs[0] holds the logits
        logits = outputs[0]
        all_logits.append(logits)

    all_logits = torch.cat(all_logits, dim=0)
    # the class with the highest logit is the predicted label
    preds_flat = np.argmax(all_logits.cpu().numpy(), axis=1).flatten()
    # softmax converts the logits into class probabilities for the ensemble
    probs = F.softmax(all_logits, dim=1).cpu().numpy()

    return preds_flat, probs

In [37]:
preds, probs=predict_roberta(dataloader_test_r)
probas.append(probs) 
predictions.append(preds)

In [38]:
preds, probs=predict_roberta(dataloader_eval_r)
probas_eval.append(probs)
predictions_eval.append(preds) 

**XLNet**

In [39]:
from transformers import XLNetTokenizer

tokenizerxlnet = XLNetTokenizer.from_pretrained(
                  'xlnet-base-cased') 


encoded_data_test_x=tokenizerxlnet.batch_encode_plus(
                        df_test.values,              # sentences from the test split
                        add_special_tokens=True,
                        return_attention_mask=True,
                        padding='longest',
                        max_length=256,
                        truncation=True,
                        return_tensors='pt')


In [40]:
from torch.utils.data import TensorDataset, DataLoader, SequentialSampler

input_ids_test_x= encoded_data_test_x['input_ids']
attention_masks_test_x= encoded_data_test_x['attention_mask']

dataset_test_x= TensorDataset(input_ids_test_x, attention_masks_test_x,)

dataloader_test_x = DataLoader(
    dataset_test_x, 
    sampler=SequentialSampler(dataset_test_x), 
    batch_size=4
    )

In [41]:
encoded_data_eval_x=tokenizerxlnet.batch_encode_plus(
                        df_eval.values,              # sentences from the evaluation split
                        add_special_tokens=True,
                        return_attention_mask=True,
                        padding='longest',
                        max_length=256,
                        truncation=True,
                        return_tensors='pt')

In [42]:
input_ids_eval_x= encoded_data_eval_x['input_ids']
attention_masks_eval_x= encoded_data_eval_x['attention_mask']

dataset_eval_x= TensorDataset(input_ids_eval_x, attention_masks_eval_x,)

dataloader_eval_x = DataLoader(
    dataset_eval_x, 
    sampler=SequentialSampler(dataset_eval_x), 
    batch_size=4
    )

In [43]:
import torch.nn.functional as F

def predict_xlnet(dataloader):
    xlnet.eval()
    all_logits = []

    for batch in dataloader:
        batch = tuple(b.to(device) for b in batch)

        inputs = {
            'input_ids':      batch[0],
            'attention_mask': batch[1],
            }

        with torch.no_grad():
            outputs = xlnet(**inputs)

        # no labels are passed, so there is no loss and outputs[0] holds the logits
        logits = outputs[0]
        all_logits.append(logits)

    all_logits = torch.cat(all_logits, dim=0)
    # the class with the highest logit is the predicted label
    preds_flat = np.argmax(all_logits.cpu().numpy(), axis=1).flatten()
    # softmax converts the logits into class probabilities for the ensemble
    probs = F.softmax(all_logits, dim=1).cpu().numpy()

    return preds_flat, probs

In [45]:
preds, probs=predict_xlnet(dataloader_test_x)
probas.append(probs) 
predictions.append(preds)

In [46]:
preds, probs=predict_xlnet(dataloader_eval_x)
probas_eval.append(probs)
predictions_eval.append(preds) 

**Stack the Predictions of All Models (one row per model, one column per record)**

In [47]:
predictions=np.array(predictions)
predictions_eval=np.array(predictions_eval)

In [48]:
probas=np.array(probas)
probas_eval=np.array(probas_eval)
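`np.array` on these lists stacks the six models along the first axis, so each row belongs to a model rather than to a sentence; that is why Method 2 transposes `predictions` before voting. A sketch with hypothetical random outputs (six models, four sentences) showing the resulting shapes. Stacking into a regular array only works if every model returns the same shape, such as an `(n, 2)` probability array.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_samples = 6, 4

# Hypothetical per-model outputs standing in for the six fitted models
model_probs, model_preds = [], []
for _ in range(n_models):
    p_pos = rng.random(n_samples)
    model_probs.append(np.column_stack([1 - p_pos, p_pos]))  # [P(neg), P(pos)] per sentence
    model_preds.append((p_pos > 0.5).astype(int))

probas = np.array(model_probs)        # (n_models, n_samples, n_classes)
predictions = np.array(model_preds)   # (n_models, n_samples)
print(probas.shape, predictions.shape, predictions.T.shape)
```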

### **Method 1: Weighted Average of Probabilities (Weighted Soft Voting)**

### **Finding the Weight of Each Model**

Reference: https://machinelearningmastery.com/weighted-average-ensemble-for-deep-learning-neural-networks/

In [49]:
import random
from numpy.linalg import norm

# random starting weights (one per model), L1-normalized so they sum to 1
weights = [random.uniform(0, 1) for _ in range(6)]
l1norm = norm(weights, 1)
weights = weights / l1norm
print(weights)


[0.17947549 0.15838622 0.03858824 0.11475357 0.17061668 0.33817979]


**Calculate Accuracy with Initial Weights**

In [52]:
from sklearn.metrics import accuracy_score

weightedavg = np.average(probas, axis=0, weights=weights)
result = np.argmax(weightedavg, axis=1)
accuracy_score(sentiment_test, result)


0.9095238095238095
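Weighted soft voting averages each model's class probabilities with per-model weights and picks the class with the larger average. A toy sketch with three hypothetical models and two sentences, showing how shifting weight onto one confident model can flip a decision.

```python
import numpy as np

# Hypothetical [P(neg), P(pos)] from three models for two sentences
probas_toy = np.array([
    [[0.40, 0.60], [0.70, 0.30]],   # model A
    [[0.45, 0.55], [0.65, 0.35]],   # model B
    [[0.90, 0.10], [0.20, 0.80]],   # model C
])
equal = np.average(probas_toy, axis=0, weights=[1, 1, 1])
skewed = np.average(probas_toy, axis=0, weights=[0.1, 0.1, 0.8])
print(np.argmax(equal, axis=1), np.argmax(skewed, axis=1))
```

With equal weights both sentences come out negative; giving model C most of the weight flips the second sentence to positive.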

**Find Optimal Weights**

In [53]:
from numpy.linalg import norm

def loss_func(weights):
  # differential_evolution minimizes, so return the error rate (1 - accuracy) on the test set
  l1norm = norm(weights,1)
  weights= weights / l1norm
  weightedavg = np.average(probas, axis=0, weights=weights)
  result = np.argmax(weightedavg, axis=1)
  return 1 - accuracy_score(sentiment_test, result)

In [54]:
from scipy.optimize import differential_evolution

bound_w = [(0.0, 1.0)  for _ in range(6)]
result = differential_evolution(loss_func, bound_w, maxiter=1000000, tol=1e-7,disp=True)

differential_evolution step 1: f(x)= 0.0761905
differential_evolution step 2: f(x)= 0.0761905
differential_evolution step 3: f(x)= 0.0761905
differential_evolution step 4: f(x)= 0.0761905
differential_evolution step 5: f(x)= 0.0761905
differential_evolution step 6: f(x)= 0.0761905
differential_evolution step 7: f(x)= 0.0761905
differential_evolution step 8: f(x)= 0.0761905
differential_evolution step 9: f(x)= 0.0761905
differential_evolution step 10: f(x)= 0.0761905
differential_evolution step 11: f(x)= 0.0761905
differential_evolution step 12: f(x)= 0.0761905
differential_evolution step 13: f(x)= 0.0761905
differential_evolution step 14: f(x)= 0.0761905
differential_evolution step 15: f(x)= 0.0761905
differential_evolution step 16: f(x)= 0.0761905
differential_evolution step 17: f(x)= 0.0761905
differential_evolution step 18: f(x)= 0.0761905
differential_evolution step 19: f(x)= 0.0761905
differential_evolution step 20: f(x)= 0.0761905
differential_evolution step 21: f(x)= 0.0761905
d

In [55]:
weights=result['x']
l1norm = norm(weights,1)
final_weights= weights / l1norm
print(final_weights)

[0.2205789  0.20746041 0.16512409 0.08558366 0.0434653  0.27778764]
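The L1 normalization in `loss_func` and in the cell above is cosmetic for non-negative weights: `np.average` already divides by the sum of the weights. A sketch on random, hypothetical data confirming that raw and normalized weights give the same average, which is also why evaluating on `df_eval` gives the same result whether it is passed `weights` or `final_weights`. The one case to guard against is all-zero weights, for which `np.average` raises `ZeroDivisionError`.

```python
import numpy as np

rng = np.random.default_rng(0)
probas_toy = rng.random((6, 5, 2))        # hypothetical (n_models, n_samples, n_classes)
raw_w = rng.uniform(0, 1, size=6)         # unnormalized, like differential_evolution's x
l1_w = raw_w / np.linalg.norm(raw_w, 1)   # L1-normalized, as in loss_func

# np.average divides by the sum of the weights itself, so both calls agree
same = np.allclose(np.average(probas_toy, axis=0, weights=raw_w),
                   np.average(probas_toy, axis=0, weights=l1_w))
print(same)
```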


In [56]:
weightedavg = np.average(probas, axis=0, weights=final_weights)
result = np.argmax(weightedavg, axis=1)
print(accuracy_score(sentiment_test, result))

0.9238095238095239


### **Test on Evaluation Set**

In [57]:
weightedavg_eval = np.average(probas_eval, axis=0, weights=final_weights)
result_eval = np.argmax(weightedavg_eval, axis=1)
print(accuracy_score(sentiment_eval, result_eval))

0.8


### **Method 2: Hard Majority Voting on the Predicted Labels**

In [58]:
predictions_t=predictions.T
predictions_eval_t=predictions_eval.T

In [59]:
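# one row per sentence: count the votes for each label and keep the most frequent (ties go to label 0)
final_preds=np.array([np.argmax(np.bincount(predictions_t[i],weights=[1,1,1,1,1,1])) for i in range(predictions_t.shape[0])])
accuracy_score(sentiment_test, final_preds)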

0.9047619047619048
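With six models and two labels, a 3-3 split is possible. `np.argmax` returns the first maximum, so the `bincount` vote resolves ties to label 0 (negative). A sketch with hypothetical votes showing this, plus a vectorized equivalent for binary labels with the same tie-breaking; `hard_majority_vote` is a hypothetical helper.

```python
import numpy as np

def hard_majority_vote(predictions_t):
    # predictions_t has shape (n_samples, n_models) with labels 0/1
    return np.array([np.argmax(np.bincount(row, minlength=2)) for row in predictions_t])

votes_toy = np.array([
    [1, 1, 1, 1, 0, 0],   # 4 vs 2 -> 1
    [0, 0, 0, 1, 1, 1],   # 3 vs 3 tie -> first maximum, label 0
    [1, 1, 1, 1, 1, 0],   # 5 vs 1 -> 1
])
looped = hard_majority_vote(votes_toy)
# Vectorized equivalent for binary labels: strictly more than half the models must vote 1
vectorized = (votes_toy.sum(axis=1) > votes_toy.shape[1] / 2).astype(int)
print(looped, vectorized)
```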

In [60]:
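# one row per sentence: count the votes for each label and keep the most frequent (ties go to label 0)
final_preds_eval=np.array([np.argmax(np.bincount(predictions_eval_t[i],weights=[1,1,1,1,1,1])) for i in range(predictions_eval_t.shape[0])])
accuracy_score(sentiment_eval, final_preds_eval)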

0.8555555555555555