
In [1]:
%%capture
!pip install transformers
!pip install anytree

# Import essential libraries

In [2]:
import os
import re
import shutil  # High-level operations on files and collections of files

import pandas as pd
import tensorflow as tf
from google.colab import drive, files
from transformers import BertTokenizer, TFBertForSequenceClassification, InputExample, InputFeatures

# Mount Google Drive so the saved tokenizer, model and data can be read
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Declare helper function for cleaning comments

In [3]:
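def cleanComments(comments_array):
    """Strip punctuation, @mentions and URLs from each comment before tokenization."""
    sentences = []

    for comment in comments_array:
        sequence = comment.replace('\n', ' ')    # Replace newline characters with spaces
        sequence = sequence.replace('\\.', '')   # Remove escaped periods (backslash followed by a dot)
        sequence = sequence.replace('.', '')     # Remove periods
        sequence = sequence.replace(',', ' ')    # Replace commas with spaces
        sequence = sequence.replace("'", ' ')    # Replace apostrophes with spaces
        sequence = sequence.replace('\\', '')    # Remove remaining backslashes
        sequence = sequence.replace('&gt;', '')  # Remove the HTML-escaped '>' character
        # Replace @mentions, URLs and any remaining non-alphanumeric characters with spaces
        sequence = re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", sequence)
        sentences.append(sequence)

    return sentences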
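
To see what the final regular-expression step does in isolation, here is a minimal sketch. The pattern is equivalent to the one in `cleanComments`; the helper name `strip_noise` and the sample comment are made up for illustration and do not come from `reddit_data.csv`.

```python
import re

# Equivalent to the pattern used in the last step of cleanComments (raw string).
PATTERN = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+://\S+)")

def strip_noise(text):
    """Replace @mentions, URLs and non-alphanumeric characters with a space."""
    return PATTERN.sub(" ", text)

# Hypothetical comment, used only to illustrate the behaviour
sample = "@someone see https://example.com/post?id=1 it's great!"
cleaned = strip_noise(sample)

print(cleaned.split())  # ['see', 'it', 's', 'great']
```

Note that the apostrophe in "it's" becomes a space, so the contraction is split into two tokens. This matches how the function treats apostrophes in its earlier replacement steps.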

# Load Model

In [4]:
saved_path = '/content/drive/MyDrive/Final Year Project/Key Notebooks/Confirmation Bias Analyser/'

tokenizer = BertTokenizer.from_pretrained(saved_path + 'subjectivity_tokenizer')
model = TFBertForSequenceClassification.from_pretrained(saved_path + 'saved_subjectivity_model')

model.summary()

All model checkpoint layers were used when initializing TFBertForSequenceClassification.

All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at /content/drive/MyDrive/Final Year Project/Key Notebooks/Confirmation Bias Analyser/saved_subjectivity_model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


Model: "tf_bert_for_sequence_classification"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 bert (TFBertMainLayer)      multiple                  109482240 
                                                                 
 dropout_37 (Dropout)        multiple                  0         
                                                                 
 classifier (Dense)          multiple                  1538      
                                                                 
Total params: 109,483,778
Trainable params: 109,483,778
Non-trainable params: 0
_________________________________________________________________


# Predict Sequences

In [5]:
test_df = pd.read_csv(saved_path + 'reddit_data.csv')
pred_sentences = cleanComments(test_df['comment'])

In [6]:
tf_batch = tokenizer(pred_sentences, max_length=128, padding=True, truncation=True, return_tensors='tf')
tf_outputs = model(tf_batch)

# tf_outputs[0] holds the logits; softmax turns them into class probabilities
tf_predictions = tf.nn.softmax(tf_outputs[0], axis=-1)

# Predicted class index (0 or 1) for each comment
label = tf.argmax(tf_predictions, axis=1).numpy()

# Probability of class 1 for each comment
result = tf_predictions[:, 1].numpy().tolist()

# Number of comments predicted as class 1 and class 0 respectively
count_pos = int((label == 1).sum())
count_neg = int((label == 0).sum())
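
The softmax and argmax steps above can be sketched in plain Python to show how logits become probabilities and labels. The logits below are invented for illustration and are not real model output; the names `softmax` and `class1_scores` are likewise only for this sketch.

```python
import math

def softmax(logits):
    """Convert one row of logits into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-class logits for three comments (not real model output)
logits = [[2.0, -1.0], [0.5, 1.5], [-3.0, 4.0]]

probs = [softmax(row) for row in logits]
labels = [row.index(max(row)) for row in probs]  # argmax over the class axis
class1_scores = [row[1] for row in probs]        # the value the notebook stores in `result`

print(labels)  # [0, 1, 1]
```

With two classes, the predicted label is 1 exactly when the class-1 probability exceeds 0.5, so the stored score and the counted label always agree.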

# Understand the subjectivity and objectivity of sequences

In [7]:
# count_pos is the number of comments predicted as class 1, count_neg as class 0
print('Objective:', count_pos)
print('Subjective:', count_neg)

print('Total:', count_pos + count_neg)

Objective: 97
Subjective: 84
Total: 181
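
With 181 comments, a single forward pass over the whole list is workable. For larger files, one option is to run the prediction in fixed-size batches. The sketch below shows only the batching logic: `batched`, `predict_in_batches` and `fake_predict` are hypothetical names, and `fake_predict` stands in for the tokenizer, model and softmax steps.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def predict_in_batches(sentences, predict_fn, batch_size=32):
    """Apply `predict_fn` to each batch and concatenate the per-sentence results."""
    results = []
    for batch in batched(sentences, batch_size):
        results.extend(predict_fn(batch))
    return results

def fake_predict(batch):
    """Stand-in for tokenizer + model + softmax; returns one fake score per sentence."""
    return [len(sentence) / 100 for sentence in batch]

# Placeholder sentences, one per comment in the run above
sentences = [f"comment {i}" for i in range(181)]
scores = predict_in_batches(sentences, fake_predict, batch_size=32)

print(len(scores))  # 181
```

In the notebook, `predict_fn` would wrap the tokenizer call, the model call and the softmax, returning one class-1 probability per sentence. The batch size of 32 is an arbitrary choice here, not a value taken from the original run.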


In [8]:
# The 'sentiment' column holds the model's class-1 probability for each comment
test_df['sentiment'] = result
test_df.to_csv('conversation_sentiment.csv')
test_df

Unnamed: 0,user_name,id,timestamp,reply_to,comment,url,link_title,sentiment
0,MapleViolet,hpr2kav,2021-12-24 08:55:24,rmqevj,All I know is - anyone trying to pull a fast o...,,,0.998633
1,HaddockFillet,hra9zzo,2022-01-05 08:14:35,hpr2kav,Why does she think it is OK to lie about such ...,,,0.000084
2,applescript16,hpntm2t,2021-12-23 16:24:16,rmqevj,Here’s some perspective: \n\n1) The public nat...,,,0.829385
3,iluj13,hpnwekg,2021-12-23 17:01:54,hpntm2t,Well said. It’s only a problem if your party i...,,,0.700985
4,forzenrose,hpnzb4r,2021-12-23 17:41:51,hpnwekg,&gt;Transparency and finding out the truth is ...,,,0.000424
...,...,...,...,...,...,...,...,...
176,neekchan,hpr9xwq,2021-12-24 09:55:08,rmqevj,I agree. \n\nAnd as a swing voter the PAP drag...,,,0.999677
177,[deleted],hpobx5f,2021-12-23 20:25:53,rmqevj,This dum bish needs to learn from the pap fuck...,,,0.987460
178,A-Chicken,hpr73zc,2021-12-24 09:32:25,rmqevj,"I'm sorry, but this is the opposition we're ta...",,,0.471892
179,PublicWar5,hpp9czp,2021-12-24 00:57:35,rmqevj,"Honestly I hate the COP, I hate how much of a ...",,,0.997284
