# Prediction of Twitter Data using Trained Word2Seq and Word2Vec Models

This notebook predicts the sentiment of the Twitter dataset gathered by the University of Malaya Halal research group.  
The data is therefore not public, but it can be made available on request.  
The predictions use the trained **Word2Vec** and **Word2Seq** models.  
The available models are:
* Word2Seq Convolutional Neural Network
* Word2Seq Long Short-Term Memory
* Word2Seq Convolutional Neural Network + Long Short-Term Memory
* Word2Seq Convolutional Neural Network + Bi-directional Recurrent Neural Network + Bi-directional Long Short-Term Memory
* Word2Vec Convolutional Neural Network
* Word2Vec Long Short-Term Memory
* Word2Vec Convolutional Neural Network + Long Short-Term Memory
* Word2Vec Convolutional Neural Network + Bi-directional Recurrent Neural Network + Bi-directional Long Short-Term Memory

###### Import the required libraries

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
from tensorflow.python.keras.models import load_model
from tensorflow.python.keras.preprocessing import text as keras_text, sequence as keras_seq
from sklearn.utils import shuffle
from tensorflow import set_random_seed
import os
import gc

# Set seed
myrand=58584
np.random.seed(myrand)
set_random_seed(myrand)

WORDS_SIZE=8000

# Allow dynamic GPU memory allocation when running the models
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  
config.log_device_placement = True

sess = tf.Session(config=config)
set_session(sess)

Using TensorFlow backend.


###### Load and prepare the collected Twitter data

In [2]:
data = pd.read_excel('../../data.xlsx',sheet_name='Sheet1')
data.text = data.text.astype(str)
data.shape

(105542, 5)

In [3]:
data.head(n=10)

Unnamed: 0,file_name,hash,text,timestamp,user_
0,halal_skincare.json,0e8df15d4bd22ee2a025e2ba244afc4e,Darah ko ni Dah kira Halal JAKIM,2014-01-06T08:46:19.000Z,Mylea_Skincare
1,halal_skincare.json,8b8a05bb640c3a08f13da6beefb2c458,Menurut kajian ada sesetengah pemakan rasuah o...,2014-01-06T08:45:11.000Z,Mylea_Skincare
2,halal_skincare.json,c6863b3517ccedc7c05b9acf9fb71d1b,we have a full range of cleansing and skincare...,2014-01-01T23:15:59.000Z,halalcosco
3,halal_skincare.json,5f5722fc7e900ec7074d702a8f3d3343,Inovasi skin care terkini bebas mercury dan no...,2014-01-01T04:35:54.000Z,rullynursesi
4,halal_trip.json,1bb31c3e8faebddec3c158a017be21f0,Wuih suami istri ikutan open trip I thought th...,2018-04-07T08:49:15.000Z,sayannisa
5,halal_trip.json,0d6140e38f2aa4f4414c3080ba651e6a,Love it check it out halalexpo,2018-04-06T12:11:39.000Z,MillanUS
6,halal_trip.json,23cbd1544d1e4359a248ba9898efa3db,Trip hobi traveling generasi muslim milenial d...,2018-04-06T00:10:58.000Z,Irsyad_af21
7,halal_skincare.json,937a238c510e64d0c33110a6639412dc,Halal is a requirement not only for food and b...,2013-12-30T19:18:30.000Z,Famiza72
8,halal_skincare.json,b1f410a86f23cd081b78c74cae03d74c,Love my pretty purchases from latifahalalbeaut...,2013-12-30T14:08:47.000Z,BlossomAndBean
9,halal_skincare.json,3730d033aaa900c211c296a40da48db2,Soyeux Skin Care adalah produk halal dan selam...,2013-12-25T03:52:14.000Z,soyeuxofficial


###### Load and prepare the tokenizer for Word2Seq and Word2Vec

In [4]:
mydata = pd.read_csv('../../../../../Master (Sentiment Analysis)/Paper/Paper 3/Datasets/eRezeki/eRezeki_(text_class)_unclean.csv',header=0,encoding='utf-8')
mydata = mydata.loc[mydata['sentiment'] != "neutral"]
mydata['sentiment'] = mydata['sentiment'].map({'negative': 0, 'positive': 1})

mydata1 = pd.read_csv('../../../../../Master (Sentiment Analysis)/Paper/Paper 3/Datasets/IMDB/all_random.csv',header=0,encoding='utf-8')
mydata = mydata.append(mydata1)
mydata = shuffle(mydata)

mydata1 = pd.read_csv('../../../../../Master (Sentiment Analysis)/Paper/Paper 3/Datasets/Amazon(sports_outdoors)/Amazon_UCSD.csv',header=0,encoding='utf-8')
mydata1['feedback'] = mydata1['feedback'].astype(str)
mydata = mydata.append(mydata1)
mydata = shuffle(mydata)

mydata1 = pd.read_csv('../../../../../Master (Sentiment Analysis)/Paper/Paper 3/Datasets/Yelp(zhang_paper)/yelp_zhang.csv',header=0,encoding='utf-8')
mydata1['feedback'] = mydata1['feedback'].astype(str)
mydata = mydata.append(mydata1)

del(mydata1)
gc.collect()

mydata = shuffle(mydata)
mydata = shuffle(mydata)
mydata = shuffle(mydata)

###### Create the tokenizer from the full list of texts

In [5]:
tokenizer = keras_text.Tokenizer(char_level=False)
tokenizer.fit_on_texts(list(mydata['feedback']))
tokenizer.num_words=WORDS_SIZE
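
The effect of fitting on the full corpus and then capping `num_words` can be pictured with a simplified, pure-Python stand-in for the Keras tokenizer. This is only a sketch under stated assumptions: `build_word_index` and `to_sequence` are hypothetical helpers, the toy corpus is made up, and words are split on whitespace only, whereas the real `Tokenizer` also filters punctuation.

```python
from collections import Counter

def build_word_index(texts):
    """Rank words by frequency; the most frequent word gets index 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # sorted() is stable, so words with equal counts keep first-seen order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

def to_sequence(text, word_index, num_words):
    """Keep only known words whose index is below num_words."""
    indices = (word_index.get(word) for word in text.lower().split())
    return [i for i in indices if i is not None and i < num_words]

# Made-up toy corpus for illustration.
corpus = ["good good product", "bad product", "good service"]
word_index = build_word_index(corpus)
sequence = to_sequence("good bad service unknown", word_index, num_words=4)
```

In this sketch both the unseen word and the word ranked at `num_words` are dropped, which suggests why the tokenizer has to be rebuilt from the same texts, in the same order, as at training time: a different ranking would feed different indices to the models.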

###### Load the trained models

Create a dictionary that maps each model name to the input sequence length it expects.

In [6]:
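# os.listdir returns entries in arbitrary order; sort them so the column order is reproducible
models_list = sorted(os.listdir('../Models/'))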
input_sizes = {'word2seq_cnn':700,
               'word2seq_cnn_birnn_bilstm':100,
               'word2seq_cnn_lstm':500,
               'word2seq_lstm':100,
               'word2vec_cnn':700,
               'word2vec_cnn_birnn_bilstm':100,
               'word2vec_cnn_lstm':500,
               'word2vec_lstm':100}
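
Because `os.listdir` returns every entry in the directory, a stray file or folder would raise a `KeyError` when its name is looked up in `input_sizes`. A minimal sketch of a guard follows; `select_models` is a hypothetical helper that is not part of the original notebook, and the example listing is made up for illustration.

```python
def select_models(found_names, input_sizes):
    """Return, in sorted order, only the names that have a known input size."""
    return sorted(name for name in found_names if name in input_sizes)

# Hypothetical directory listing containing one unrelated entry.
example_listing = ['word2vec_lstm', '.ipynb_checkpoints', 'word2seq_cnn']
example_sizes = {'word2seq_cnn': 700, 'word2vec_lstm': 100}
usable = select_models(example_listing, example_sizes)
```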

###### Function for sequence data matrix creation from Twitter data

In [7]:
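def create_seq(input_size):
    # Convert each tweet into a list of word indices
    list_tokenized = tokenizer.texts_to_sequences(list(data.text))
    # Zero-pad at the end so that every row has exactly input_size entries
    x_data = keras_seq.pad_sequences(list_tokenized,
                                     maxlen=input_size,
                                     padding='post')
    x_data = x_data.astype(np.int64)
    return x_data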
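
With `padding='post'` and the default `truncating='pre'`, `pad_sequences` appends zeros to short tweets and drops tokens from the start of long ones. The pure-Python sketch below mimics that behaviour for a single sequence; `pad_post` is a hypothetical name used only for illustration.

```python
def pad_post(sequence, maxlen, value=0):
    """Truncate from the front, then pad at the end, to exactly maxlen items."""
    kept = sequence[-maxlen:]  # keep at most the last maxlen tokens
    return kept + [value] * (maxlen - len(kept))

short_row = pad_post([5, 2, 9], maxlen=5)
long_row = pad_post([1, 2, 3, 4, 5, 6], maxlen=4)
```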

###### Function to predict the sentiment of the data

In [8]:
def predict_data(model, x_data):
    # Predicted class per tweet: 1 = positive, 0 = negative
    sentiment = model.predict_classes(x_data)
    sentiment = sentiment.astype(str)
    sentiment[sentiment=='1'] = "Positive"
    sentiment[sentiment=='0'] = "Negative"
    # Class probabilities: column 0 = negative, column 1 = positive
    probability = model.predict_proba(x_data)
    positive_probability = probability[:,1]
    negative_probability = probability[:,0]
    return sentiment, positive_probability, negative_probability
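
The relationship between the two probability columns and the label can be sketched without a model. The example assumes, as the indexing in `predict_data` implies, that each row holds a negative probability followed by a positive one; `label_rows` is a hypothetical helper and the input values are made up.

```python
def label_rows(probabilities):
    """Map each [negative, positive] probability pair to a sentiment label."""
    labels = []
    for negative, positive in probabilities:
        # Like an argmax over the two columns, a tie falls to the first class.
        labels.append("Positive" if positive > negative else "Negative")
    return labels

example_labels = label_rows([[0.2, 0.8], [0.6, 0.4], [0.5, 0.5]])
```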

###### Function to add the new columns to the dataframe

In [9]:
def add_columns(data, model_name, sentiment, positive_probability, negative_probability):
    # One label column and two probability columns per model
    name_1 = '%s_sentiment' % (model_name)
    name_2 = '%s_posProb' % (model_name)
    name_3 = '%s_negProb' % (model_name)
    data[name_1] = sentiment
    data[name_2] = positive_probability
    data[name_3] = negative_probability
    return data

###### Loop over the models to predict the sentiment

In [10]:
for name in models_list:
    x_data = create_seq(input_sizes[name])
    mydir = '../Models/%s/%s.hdf5' % (name,name)
    model = load_model(mydir)
    sentiment, positive_prob, negative_prob = predict_data(model, x_data)
    data = add_columns(data, name, sentiment, positive_prob, negative_prob)

data.head(n=10)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Unnamed: 0,file_name,hash,text,timestamp,user_,word2seq_cnn_sentiment,word2seq_cnn_posProb,word2seq_cnn_negProb,word2seq_cnn_birnn_bilstm_sentiment,word2seq_cnn_birnn_bilstm_posProb,...,word2vec_cnn_negProb,word2vec_cnn_birnn_bilstm_sentiment,word2vec_cnn_birnn_bilstm_posProb,word2vec_cnn_birnn_bilstm_negProb,word2vec_cnn_lstm_sentiment,word2vec_cnn_lstm_posProb,word2vec_cnn_lstm_negProb,word2vec_lstm_sentiment,word2vec_lstm_posProb,word2vec_lstm_negProb
0,halal_skincare.json,0e8df15d4bd22ee2a025e2ba244afc4e,Darah ko ni Dah kira Halal JAKIM,2014-01-06T08:46:19.000Z,Mylea_Skincare,Positive,0.824971,0.175029,Positive,0.74786,...,0.412001,Positive,0.890676,0.109324,Positive,0.838263,0.161737,Positive,0.794887,0.205113
1,halal_skincare.json,8b8a05bb640c3a08f13da6beefb2c458,Menurut kajian ada sesetengah pemakan rasuah o...,2014-01-06T08:45:11.000Z,Mylea_Skincare,Positive,0.865165,0.134835,Positive,0.792321,...,0.143463,Positive,0.896188,0.103812,Positive,0.824261,0.175739,Positive,0.887403,0.112597
2,halal_skincare.json,c6863b3517ccedc7c05b9acf9fb71d1b,we have a full range of cleansing and skincare...,2014-01-01T23:15:59.000Z,halalcosco,Positive,0.894654,0.105346,Positive,0.969022,...,0.072593,Positive,0.781111,0.218889,Positive,0.934829,0.065171,Positive,0.994195,0.005805
3,halal_skincare.json,5f5722fc7e900ec7074d702a8f3d3343,Inovasi skin care terkini bebas mercury dan no...,2014-01-01T04:35:54.000Z,rullynursesi,Positive,0.976185,0.023815,Positive,0.918586,...,0.444528,Positive,0.892971,0.107029,Positive,0.571484,0.428516,Positive,0.84839,0.15161
4,halal_trip.json,1bb31c3e8faebddec3c158a017be21f0,Wuih suami istri ikutan open trip I thought th...,2018-04-07T08:49:15.000Z,sayannisa,Positive,0.925633,0.074367,Positive,0.928461,...,0.111627,Positive,0.697596,0.302404,Positive,0.785474,0.214526,Positive,0.949898,0.050102
5,halal_trip.json,0d6140e38f2aa4f4414c3080ba651e6a,Love it check it out halalexpo,2018-04-06T12:11:39.000Z,MillanUS,Positive,0.997361,0.002639,Positive,0.994626,...,0.001862,Positive,0.997401,0.002599,Positive,0.995919,0.004081,Positive,0.997612,0.002388
6,halal_trip.json,23cbd1544d1e4359a248ba9898efa3db,Trip hobi traveling generasi muslim milenial d...,2018-04-06T00:10:58.000Z,Irsyad_af21,Positive,0.968688,0.031311,Positive,0.964089,...,0.914599,Positive,0.537492,0.462508,Positive,0.73164,0.26836,Positive,0.93358,0.06642
7,halal_skincare.json,937a238c510e64d0c33110a6639412dc,Halal is a requirement not only for food and b...,2013-12-30T19:18:30.000Z,Famiza72,Negative,0.37843,0.62157,Positive,0.731976,...,0.010642,Negative,0.485554,0.514446,Positive,0.86516,0.13484,Positive,0.965564,0.034436
8,halal_skincare.json,b1f410a86f23cd081b78c74cae03d74c,Love my pretty purchases from latifahalalbeaut...,2013-12-30T14:08:47.000Z,BlossomAndBean,Positive,0.956832,0.043168,Positive,0.992863,...,0.000121,Positive,0.997387,0.002613,Positive,0.99453,0.00547,Positive,0.996901,0.003099
9,halal_skincare.json,3730d033aaa900c211c296a40da48db2,Soyeux Skin Care adalah produk halal dan selam...,2013-12-25T03:52:14.000Z,soyeuxofficial,Positive,0.979334,0.020666,Positive,0.941147,...,0.121889,Positive,0.855013,0.144987,Positive,0.976979,0.023021,Positive,0.964069,0.035931


###### Calculate the weighted sentiments by averaging the model probabilities
* Weighted probability of positive sentiment
* Weighted probability of negative sentiment
* Weighted sentiment

In [208]:
# Probability columns of the eight models, selected by name rather than by position
pos_cols = ['%s_posProb' % name for name in input_sizes]
neg_cols = ['%s_negProb' % name for name in input_sizes]

# Unweighted mean of the model probabilities for each tweet
weighted_pos = data[pos_cols].mean(axis=1)
weighted_neg = data[neg_cols].mean(axis=1)

# The label follows the larger mean; exact ties are marked as neutral
data['weighted_sentiment'] = np.select(
    [weighted_pos > weighted_neg, weighted_pos == weighted_neg],
    ["Positive", "Neutral"],
    default="Negative")
data['weighted_posProb'] = weighted_pos
data['weighted_negProb'] = weighted_neg

data.head(n=10)

Unnamed: 0,file_name,hash,text,timestamp,user_,word2seq_cnn_sentiment,word2seq_cnn_posProb,word2seq_cnn_negProb,word2seq_cnn_birnn_bilstm_sentiment,word2seq_cnn_birnn_bilstm_posProb,...,word2vec_cnn_birnn_bilstm_negProb,word2vec_cnn_lstm_sentiment,word2vec_cnn_lstm_posProb,word2vec_cnn_lstm_negProb,word2vec_lstm_sentiment,word2vec_lstm_posProb,word2vec_lstm_negProb,weighted_sentiment,weighted_posProb,weighted_negProb
0,halal_skincare.json,0e8df15d4bd22ee2a025e2ba244afc4e,Darah ko ni Dah kira Halal JAKIM,2014-01-06T08:46:19.000Z,Mylea_Skincare,Positive,0.824971,0.175029,Positive,0.74786,...,0.109324,Positive,0.838263,0.161737,Positive,0.794887,0.205113,Positive,0.780199,0.219801
1,halal_skincare.json,8b8a05bb640c3a08f13da6beefb2c458,Menurut kajian ada sesetengah pemakan rasuah o...,2014-01-06T08:45:11.000Z,Mylea_Skincare,Positive,0.865165,0.134835,Positive,0.792321,...,0.103812,Positive,0.824261,0.175739,Positive,0.887403,0.112597,Positive,0.863168,0.136832
2,halal_skincare.json,c6863b3517ccedc7c05b9acf9fb71d1b,we have a full range of cleansing and skincare...,2014-01-01T23:15:59.000Z,halalcosco,Positive,0.894654,0.105346,Positive,0.969022,...,0.218889,Positive,0.934829,0.065171,Positive,0.994195,0.005805,Positive,0.901803,0.0981971
3,halal_skincare.json,5f5722fc7e900ec7074d702a8f3d3343,Inovasi skin care terkini bebas mercury dan no...,2014-01-01T04:35:54.000Z,rullynursesi,Positive,0.976185,0.023815,Positive,0.918586,...,0.107029,Positive,0.571484,0.428516,Positive,0.84839,0.15161,Positive,0.811292,0.188708
4,halal_trip.json,1bb31c3e8faebddec3c158a017be21f0,Wuih suami istri ikutan open trip I thought th...,2018-04-07T08:49:15.000Z,sayannisa,Positive,0.925633,0.074367,Positive,0.928461,...,0.302404,Positive,0.785474,0.214526,Positive,0.949898,0.050102,Positive,0.881008,0.118992
5,halal_trip.json,0d6140e38f2aa4f4414c3080ba651e6a,Love it check it out halalexpo,2018-04-06T12:11:39.000Z,MillanUS,Positive,0.997361,0.002639,Positive,0.994626,...,0.002599,Positive,0.995919,0.004081,Positive,0.997612,0.002388,Positive,0.996597,0.00340308
6,halal_trip.json,23cbd1544d1e4359a248ba9898efa3db,Trip hobi traveling generasi muslim milenial d...,2018-04-06T00:10:58.000Z,Irsyad_af21,Positive,0.968688,0.031311,Positive,0.964089,...,0.462508,Positive,0.73164,0.26836,Positive,0.93358,0.06642,Positive,0.749523,0.250477
7,halal_skincare.json,937a238c510e64d0c33110a6639412dc,Halal is a requirement not only for food and b...,2013-12-30T19:18:30.000Z,Famiza72,Negative,0.37843,0.62157,Positive,0.731976,...,0.514446,Positive,0.86516,0.13484,Positive,0.965564,0.034436,Positive,0.729718,0.270282
8,halal_skincare.json,b1f410a86f23cd081b78c74cae03d74c,Love my pretty purchases from latifahalalbeaut...,2013-12-30T14:08:47.000Z,BlossomAndBean,Positive,0.956832,0.043168,Positive,0.992863,...,0.002613,Positive,0.99453,0.00547,Positive,0.996901,0.003099,Positive,0.987233,0.0127671
9,halal_skincare.json,3730d033aaa900c211c296a40da48db2,Soyeux Skin Care adalah produk halal dan selam...,2013-12-25T03:52:14.000Z,soyeuxofficial,Positive,0.979334,0.020666,Positive,0.941147,...,0.144987,Positive,0.976979,0.023021,Positive,0.964069,0.035931,Positive,0.927667,0.0723332
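
The averaging rule can be checked by hand on a single tweet. The sketch below is a pure-Python illustration with made-up probabilities for three models rather than values from the dataset; `ensemble_sentiment` is a hypothetical helper.

```python
from statistics import mean

def ensemble_sentiment(pos_probs, neg_probs):
    """Average the per-model probabilities and derive the overall label."""
    avg_pos = mean(pos_probs)
    avg_neg = mean(neg_probs)
    if avg_pos > avg_neg:
        label = "Positive"
    elif avg_pos == avg_neg:
        label = "Neutral"
    else:
        label = "Negative"
    return label, avg_pos, avg_neg

# Two of the three hypothetical models lean positive, one leans negative.
label, avg_pos, avg_neg = ensemble_sentiment([0.9, 0.6, 0.3], [0.1, 0.4, 0.7])
```

Since every model contributes equally, one confident model can outvote several uncertain ones, and the "Neutral" label only appears when the two means are exactly equal.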


###### Save the data to a new Excel file

In [209]:
data.to_excel('../../data_predicted.xlsx')