In this project, we use an additive attention model for sentiment analysis on a dataset of 1,500 customer reviews. The word tokens are first mapped to one-hot vectors; with this encoding the model exceeds 90% accuracy on the training set after a few iterations (~20) and reaches ~87% accuracy on the test set. Alternatively, the tokens are mapped to GloVe word embeddings; in that case the model converges more slowly, to ~98% and ~90% accuracy on the training and test sets respectively.

In [1]:
#Loading the packages required: 
from tensorflow.keras.layers import Bidirectional, Concatenate, Permute, Dot, Input, LSTM, Multiply, Softmax
from tensorflow.keras.layers import RepeatVector, Dense, Activation, Lambda
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.activations import softmax
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import load_model, Model
import tensorflow.keras.backend as K
import tensorflow as tf

import numpy as np
import pandas as pd
from faker import Faker
import random
from tqdm import tqdm
from babel.dates import format_date
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split 

from nltk.corpus import stopwords 
from nltk.tokenize import word_tokenize, RegexpTokenizer
from nltk.stem import PorterStemmer,LancasterStemmer
import re

In [2]:
def pred(y): 
    '''
    Maps the probabilities output by the model back to the rankings list 
    and returns the ranking with the highest probability. 
    
    inputs: 
    y  (m,)      : probability output of the RNN model (1-D NumPy array) 
    
    outputs: 
    res (string) : the ranking corresponding to the most probable outcome. 
    
    Relies on the global list `ranking`, which is defined in a later cell. 
    '''
    y = y.tolist()
    #e.g. ranking = ['Negative', 'Positive']
    res = ranking[y.index(max(y))]
    return res


In [3]:
def vec_output(y): 
    """
    Takes a single label y (e.g. one entry of y_test) and returns its one-hot vector 
    over the global list `ranking`. 
    
    """
    m = len(ranking)
    txt = y
    v = np.zeros(m) 
    j = ranking.index(txt)
    v[j] = 1
    return v 
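
The two helpers above are inverses of each other: one turns a label into a one-hot vector, the other turns a probability vector back into a label. The following is a minimal, self-contained sketch of that round trip. The names `one_hot_label` and `decode_label` are hypothetical, and `ranking` is hard-coded here only for illustration (in the notebook it is built later from the `Sentiment` column).

```python
import numpy as np

# Assumption: a fixed label list; the notebook derives it from df["Sentiment"].
ranking = ["Negative", "Positive"]

def one_hot_label(label, classes):
    """Return a one-hot vector for a single label."""
    v = np.zeros(len(classes))
    v[classes.index(label)] = 1
    return v

def decode_label(probs, classes):
    """Return the class whose probability is highest."""
    return classes[int(np.argmax(probs))]

encoded = one_hot_label("Positive", ranking)            # label -> vector
decoded = decode_label(np.array([0.2, 0.8]), ranking)   # probabilities -> label
print(encoded, decoded)
```

Passing the class list explicitly, instead of reading a global, keeps the helpers usable before `ranking` exists.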

In [4]:
#Loading the data: 
CustomerFeed = 'Canva_reviews.xlsx'
df = pd.read_excel(CustomerFeed)

print(df)

                                               reviewId            userName  \
0     gp:AOqpTOFxf3fttcT5DSvFIn9KPp5FErgH9yC533Fmoxv...      Donna Caritero   
1     gp:AOqpTOEq6rNIWLnPV4KFTctWvm0mpGEQljtD6mvy1H-...  Soumi Mukhopadhyay   
2     gp:AOqpTOE86hSyPRHZgYt28Uk5zGe4FZGb1hkmtFDiYJ2...   Theknown _unknown   
3     gp:AOqpTOHSuKkVTcM3QgCCKysHQlxEnk2ocOKsUMiMIJy...        Anthony Dean   
4     gp:AOqpTOEOrZt5H6jXPiplJyffCd5ZBnVXACTWgwNsF1R...   Neha Diana Wesley   
...                                                 ...                 ...   
1495  gp:AOqpTOHhnXMpylU3f-1V1KbR2hwWArOilxPlKI6K4xY...            Reen Ali   
1496  gp:AOqpTOEcz62DHS-amqTB5xGMhM4_R0UJpcv_HDNny9i...     Shaurya Chilwal   
1497  gp:AOqpTOFMqEqa_kpp29Q8wjcBmKUCAvOQGQx4KZQ8b83...           GK Gaming   
1498  gp:AOqpTOGY4z3pUxeiqGzn2ad3Noxqlbm-9DZ3ksHqD1_...    1203_Vani Sharma   
1499  gp:AOqpTOFVGZ0MXyR-Gv_d2cYf2KD709Hwple_u7OZE4y...           MeLLy EcK   

                                              userI

In [5]:
df = df[["review", "Sentiment"]]
df.head()

Unnamed: 0,review,Sentiment
0,Overall it's really an amazing app. I've been ...,Negative
1,Hey! Yes I gave a 5 star rating... coz I belie...,Positive
2,Canva used to be a good app! But recently I've...,Negative
3,"It's a brilliant app, but I have just one prob...",Negative
4,This was such a great app. I used to make BTS ...,Negative


In [50]:
def edit_txt(review):
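    """
    This function receives a text and returns it edited as follows: 
    1. all words converted to lower case 
    2. digits removed 
    3. text tokenized into words 
    4. punctuation removed 
    5. common stop words removed (this step is commented out in this version of the cell; 
       the outputs shown in the following cells appear to come from a run with it enabled) 
    """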
    
    review_edited = []

    #Converting to lower case: 
    review_edited = review.lower() 
    
    #Removing integers: 
    pattern = r'[0-9]'
    # Match all digits in the string and replace them with an empty string
    review_edited = re.sub(pattern, '', review_edited) 

    #Tokenize the comment: 
    review_edited = word_tokenize(review_edited) 

    #Removing punctuation 
    tokenizer = RegexpTokenizer(r'\w+')
    review_edited = [''.join(tokenizer.tokenize(word)) for word in review_edited if len(tokenizer.tokenize(word))>0]

    #Removing common words: 
    #remove_list = stopwords.words('english') 
    #to_remove = [ "not",'don',"don't",'should',"should've", 'ain','aren',"aren't",'couldn',"couldn't",'didn',"didn't",'doesn',"doesn't",'hadn',"hadn't",'hasn',"hasn't",'haven',"haven't",'isn',"isn't",'mightn',"mightn't",'mustn',"mustn't",'needn',"needn't",'shan',"shan't",'shouldn',"shouldn't",'wasn',"wasn't",'weren',"weren't",'won',"won't",'wouldn', "wouldn't"]
 
    #review_edited = [word for word in review_edited if not word in remove_list]
    return(review_edited) 
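
As a rough illustration of the same cleaning steps without NLTK, the sketch below uses only the `re` module. It is an approximation, not the notebook's pipeline: `simple_edit_txt` is a hypothetical name, the stop-word set is a deliberately tiny stand-in for NLTK's English list, and contractions are split differently than by `word_tokenize`.

```python
import re

# Assumption: a tiny stop-word set for illustration only.
STOP_WORDS = {"to", "my", "the", "a", "is", "it", "i"}

def simple_edit_txt(review, stop_words=STOP_WORDS):
    """Lower-case, drop digits, keep alphabetic tokens, remove stop words."""
    text = review.lower()
    text = re.sub(r"[0-9]", "", text)        # remove digits
    tokens = re.findall(r"[a-z_]+", text)    # tokenize and drop punctuation
    return [t for t in tokens if t not in stop_words]

tokens = simple_edit_txt("Unable to save my work. Nothing works :(")
print(tokens)
```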

   


In [7]:
# Extract the reviews: 
x = df["review"] 

In [51]:
len(x)

1500

In [8]:
# Apply edit_txt to every review and compare one edited review with its original: 
reviews_edited = [edit_txt(review) for review in x]
print(reviews_edited[13])
print(x[13])

['unable', 'save', 'work', 'nothing', 'works']
Unable to save my work. Nothing works :(


In [9]:
# Define the target dataset and extract the unique rankings: 
y = df["Sentiment"].tolist()
ranking = np.unique(y)
ranking = ranking.tolist()
ranking

['Negative', 'Positive']

In [10]:
# Creating the dictionary: 
Split = [] 
Dic = []
dictionary = np.unique([word for review in reviews_edited for word in review]).tolist()

# Add a padding token, used later to bring every input to the same length: 
dictionary = dictionary + ["<pad>"]
dictionary[1:10]

['aa',
 'aap',
 'ability',
 'able',
 'absolutely',
 'acc',
 'accepted',
 'access',
 'accessibilities']

In [11]:
# Split the dataset into training and testing datasets: 
#x = x.to_list()
X_train, X_test, y_train, y_test = train_test_split(x,y, 
                                   random_state=104,  
                                   test_size=0.25,  
                                   shuffle=True) 

In [12]:
# Apply the edit_txt function to both text corpus: 
X_train = [edit_txt(comment) for comment in X_train]
X_test = [edit_txt(comment) for comment in X_test]

In [13]:
X_train[0]

['spend',
 'much',
 'time',
 'working',
 'poster',
 'app',
 'allowing',
 'download',
 'simply',
 'wasted',
 'hard',
 'work']

Our objective is to apply the attention model twice: once to the one-hot vector representation of the dataset and once to the dataset encoded with GloVe (Global Vectors) word embeddings, using the gvec_input function. One-hot vectors are not an efficient encoding of the input: each vector is large (2197 entries, one per dictionary word) and sparse, and it carries no information about how words relate to one another.
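
To put the size difference in numbers, the sketch below estimates the memory needed for the dense training arrays under each encoding. The shapes are the ones printed in this notebook; the helper `dense_size_mb` is hypothetical and assumes 8 bytes per value, which is the default `float64` produced by `np.zeros`.

```python
def dense_size_mb(n_samples, seq_len, n_features, bytes_per_value=8):
    """Size of a dense (n_samples, seq_len, n_features) array in megabytes."""
    return n_samples * seq_len * n_features * bytes_per_value / 1e6

one_hot_mb = dense_size_mb(1125, 30, 2197)   # one-hot training array
glove_mb = dense_size_mb(1125, 30, 50)       # 50-dimensional GloVe training array
ratio = one_hot_mb / glove_mb

print(f"one-hot: {one_hot_mb:.1f} MB, GloVe: {glove_mb:.1f} MB, ratio: {ratio:.1f}x")
```

The ratio is simply 2197 / 50, so the one-hot array is about 44 times larger for the same reviews.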

In [16]:
# Starting with the one-hot vector representations: 
def vec_input(x,h, dictionary): 
    
    """
    Takes a tokenized customer review, x, and returns one one-hot vector per token, based on the 
    words in the dictionary. Each vector has all entries equal to 0 except the entry at the index 
    of the token in the dictionary.

    The length of the sequence is fixed to h: inputs shorter than h are padded with '<pad>' tokens 
    and inputs longer than h are truncated. 
    
    inputs: 
    
    x (list of str)   : the tokens of one customer review. 
    h (int)           : length of the output sequence. 
    dictionary (list) : all known words, including '<pad>'. 
    
    outputs: 
    v (h,n)           : where n is the total number of words in the dictionary. 
    
    """
    m = len(dictionary)
    txt = x
    txt = (txt[:h] if len(txt) > h else txt + ['<pad>'] * (h - len(txt)))
    n = len(txt)
    v = np.zeros((n, m))
    
    for i in range(0, n): 
        j = dictionary.index(txt[i])
        v[i,j] = 1
        
    return(v)
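
Because `dictionary.index` scans the whole list for every token, a word-to-index mapping is a common alternative. The following is a minimal sketch under that assumption; `one_hot_sequence`, `word_to_index` and the four-word vocabulary are hypothetical and only illustrate the idea.

```python
import numpy as np

def one_hot_sequence(tokens, max_len, word_to_index, pad_token="<pad>"):
    """Truncate or pad `tokens` to max_len and one-hot encode each position."""
    padded = tokens[:max_len] + [pad_token] * max(0, max_len - len(tokens))
    v = np.zeros((max_len, len(word_to_index)))
    idx = [word_to_index[t] for t in padded]   # constant-time lookups
    v[np.arange(max_len), idx] = 1             # set one entry per row
    return v

# Assumption: a toy vocabulary in place of the notebook's dictionary.
vocab = ["app", "great", "slow", "<pad>"]
word_to_index = {w: i for i, w in enumerate(vocab)}

v = one_hot_sequence(["great", "app"], 4, word_to_index)
print(v)
```

As with `vec_input`, a token missing from the vocabulary raises an error (`KeyError` here), so the mapping has to cover every token that is encoded.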

In [17]:
Tx = 30 
X_trainmod = np.array([vec_input(X_train[i],Tx, dictionary) for i in range(len(X_train))])
X_testmod = np.array([vec_input(X_test[i],Tx, dictionary) for i in range(len(X_test))])

In [18]:
print(X_trainmod[0])
print(X_trainmod.shape) #( 1125 samples, 30 max length, 2197 length of the dictionary) 

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]]
(1125, 30, 2197)


In [19]:
y_trainmod = (np.array([vec_output(y_train[i]) for i in range(len(y_train))])).reshape(len(y_train),1,len(ranking))
y_testmod  = (np.array([vec_output(y_test[i]) for i in range(len(y_test))])).reshape(len(y_test),1,len(ranking))

In [20]:
print(y_trainmod[0:2])
print(y_trainmod.shape)
print(y_testmod[0:2])
print(y_testmod.shape)

[[[1. 0.]]

 [[1. 0.]]]
(1125, 1, 2)
[[[0. 1.]]

 [[0. 1.]]]
(375, 1, 2)


In [21]:
def NeuralAttention(a,s_prev,b): 
    """
    Implements one step of attention mechanism
    
    Arguments:
    a -- output of the Bi-LSTM of shape (m, Tx, 2* n_a)  #(#samples, #rows, #columns)
    s_prev -- previous hidden state of the LSTM of shape (m, n_s)
    b -- number of neurons in the hidden layer of the attention FFN
    Tx -- length of the input sequence (Global Variable)

    Returns:
    context -- context vector, input of the next LSTM cell
    """
    #Create copies of s_prev 
    s_prev = RepeatVector(Tx)(s_prev) #what about all samples together 
    
    #Concatenate s_prev and a: 
    concat = Concatenate(axis = -1)([a,s_prev])
    
    #Run through the first layer of the FFN with tanh activation and b neurons: 
    dense1 = Dense(b, activation = "tanh")(concat)
    
    #Run through the final layer of FFN with activation ReLU and 1 neuron: 
    energies = Dense(1,activation = "relu")(dense1)
    
    #Run through a Softmax function to find alphas: 
    alphas = Softmax(axis = 1)(energies)
    
    #Multiply the alphas with their respective a<t'>: 
    Context = Dot(axes=1)([alphas,a]) 
    
    return(Context)
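
To make the tensor operations in NeuralAttention concrete, here is a NumPy sketch of the same computation for a single sample: repeat the state, concatenate, apply a tanh layer and a ReLU layer, take a softmax over the time axis, and form the weighted sum. The function name, the random weights and the small sizes are assumptions for illustration; in the notebook the weights are learned by the Keras `Dense` layers.

```python
import numpy as np

def additive_attention(a, s_prev, W1, b1, w2, b2):
    """a: (Tx, d_a), s_prev: (n_s,). Returns context (d_a,) and alphas (Tx,)."""
    Tx = a.shape[0]
    s_rep = np.tile(s_prev, (Tx, 1))                # like RepeatVector(Tx)
    concat = np.concatenate([a, s_rep], axis=-1)    # (Tx, d_a + n_s)
    hidden = np.tanh(concat @ W1 + b1)              # (Tx, b)
    energies = np.maximum(hidden @ w2 + b2, 0.0)    # ReLU, shape (Tx,)
    exp = np.exp(energies - energies.max())         # numerically stable softmax
    alphas = exp / exp.sum()                        # weights over time steps
    context = alphas @ a                            # weighted sum of a<t'>
    return context, alphas

# Assumption: toy sizes and random weights.
rng = np.random.default_rng(0)
Tx, d_a, n_s, b = 5, 4, 3, 6
a = rng.normal(size=(Tx, d_a))
s_prev = np.zeros(n_s)
W1, b1 = rng.normal(size=(d_a + n_s, b)), np.zeros(b)
w2, b2 = rng.normal(size=b), 0.0

context, alphas = additive_attention(a, s_prev, W1, b1, w2, b2)
print(alphas, context)
```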



In [22]:
def modelf(Tx, n_a, n_s, len_dict, len_rank,b):
    """
    Arguments:
    Tx -- length of the input sequence
    n_a -- hidden state size of the Bi-LSTM
    n_s -- hidden state size of the post-attention LSTM
    len_dict -- size of the dictionary of words (length of each one-hot vector) 
    len_rank -- number of categories in the output 
    b -- number of neurons in the hidden layer of the attention FFN

    Returns:
    model -- Keras model instance
    """

    
    # Define the inputs of your model with a shape (Tx, len_dict)
    # Define s0 (initial hidden state) and c0 (initial cell state)
    # for the post-attention LSTM with shape (n_s,)
    X = Input(shape=(Tx, len_dict))
    # initial hidden state
    s0 = Input(shape=(n_s,), name='s0')
    # initial cell state
    c0 = Input(shape=(n_s,), name='c0')
    # hidden state
    s = s0
    # cell state
    c = c0
    
    # Initialize empty list of outputs
    outputs = []
    
    # Define your pre-attention Bi-LSTM. a is a list of all the hidden states. 
    a = Bidirectional(LSTM(units=n_a, return_sequences=True))(X) 
    
    
    # Perform one step of the attention mechanism to get back the context vector at step t 
    context = NeuralAttention(a,s,b)
        
    # Apply the post-attention LSTM cell to the "context" vector while also inputting the previous hidden state and cell state. 
    s, _, c = LSTM(n_s, return_state = True)(context, initial_state=[s, c])
       
    # Apply Dense layer to the hidden state output of the post-attention LSTM 
    out = Dense(len_rank,activation = "tanh")(s)
        #out = output_layer(s)
        # Run through a Softmax function: 
    res = Softmax(axis = 1)(out)
    # Append "res" to the "outputs" list 
    outputs.append(res)
    
    # Create model instance taking three inputs and returning the list of outputs.
    model = Model(inputs = [X,s0,c0], outputs = outputs)
    
    return model

In [23]:
n_a = 32 # number of units for the pre-attention, bi-directional LSTM's hidden state 'a'
n_s = 64 # number of units for the post-attention LSTM's hidden state "s"
Tx  = 30 # maximum length of the input 
b   = 50 # number of neurons in the hidden layer of the attention feed-forward network 
len_dict = len(dictionary)
len_rank = len(ranking)
model = modelf(Tx, n_a, n_s, len_dict, len_rank,b) 
model.summary()

In [24]:
opt = Adam(0.005,beta_1 = 0.9, beta_2 = 0.999, decay = 0.01) 
model.compile(loss = "categorical_crossentropy", optimizer = opt, metrics = ["accuracy"])



In [30]:
m = X_trainmod.shape[0]
s0 = np.zeros((m, n_s))
c0 = np.zeros((m, n_s))
outputs = list(y_trainmod.swapaxes(0,1))

In [31]:
# The loss of the model is significantly less when the text corpus is edited. 
model.fit([X_trainmod, s0, c0], outputs, epochs=100, batch_size=200)

Epoch 1/100
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 66ms/step - accuracy: 0.9971 - loss: 0.1327
Epoch 2/100
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.9964 - loss: 0.1342
Epoch 3/100
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.9956 - loss: 0.1358
Epoch 4/100
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.9965 - loss: 0.1340
Epoch 5/100
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.9965 - loss: 0.1340
Epoch 6/100
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.9973 - loss: 0.1324
Epoch 7/100
6/6 ━━━━━━━━━━━━━━━━━━━━ 1s 81ms/step - accuracy: 0.9953 - loss: 0.1363
Epoch 8/100
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 63ms/step - accuracy: 0.9968 - loss: 0.1333
Epoch 9/100
6/6 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x2851c5450>

In [32]:
m = X_testmod.shape[0]
s00 = np.zeros((m, n_s))
c00 = np.zeros((m, n_s))
outputs = list(y_testmod.swapaxes(0,1))
model.evaluate([X_testmod,s00,c00], outputs)

12/12 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8721 - loss: 0.3808


[0.38185855746269226, 0.8693333268165588]

The reviews the model has the most trouble classifying are usually those in which customers discuss both the pros and the cons of the product. It is worth inspecting the misclassified reviews to see whether the problem lies in the dataset labels or in the model. Even so, the additive attention model works well with one-hot vectors: it exceeds 90% training accuracy in fewer than 100 iterations, and it reaches ~99% accuracy on the training set and ~87% on the test set after ~250 iterations.
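
One way to carry out that inspection is to collect every review whose predicted label differs from its true label, together with a count of each (true, predicted) pair. The sketch below uses made-up toy data; `misclassified` and `confusion_counts` are hypothetical helpers, and in the notebook their inputs would be `X_test`, `y_test` and the decoded predictions.

```python
from collections import Counter

def misclassified(reviews, y_true, y_pred):
    """Return (index, review, true label, predicted label) for each wrong prediction."""
    return [
        (i, reviews[i], t, p)
        for i, (t, p) in enumerate(zip(y_true, y_pred))
        if t != p
    ]

def confusion_counts(y_true, y_pred):
    """Count how often each (true, predicted) pair occurs."""
    return Counter(zip(y_true, y_pred))

# Assumption: toy data standing in for the test set and model output.
reviews = [["great", "app"], ["best", "app", "add", "music"], ["keeps", "crashing"]]
y_true = ["Positive", "Positive", "Negative"]
y_pred = ["Positive", "Negative", "Negative"]

errors = misclassified(reviews, y_true, y_pred)
counts = confusion_counts(y_true, y_pred)
print(errors)
print(counts)
```

Reading the collected reviews side by side with their labels helps separate questionable labels from genuine model errors.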

In [29]:
s00 = np.zeros((20, n_s))
c00 = np.zeros((20, n_s))

predictions = model.predict([X_testmod[0:20], s00, c00])
predictions = np.argmax(predictions, axis = -1)
output = [ranking[int(x)] for x in predictions]
for i in range(len(output)): 
    print(f"Comment: {X_test[i]} \n Ranking: {y_test[i]}, prediction: {output[i]}\n")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 134ms/step
Comment: ['waaa', 'interesting', 'picture', 'thank', 'canva', 'helped', 'edit', 'picture', 'thanks', 'give', 'stars'] 
 Ranking: Positive, prediction: Positive

Comment: ['app', 'best', 'app', 'ever', 'used', 'requesting', 'let', 'us', 'add', 'music', 'also', 'let', 'us', 'search', 'music', 'add', 'keep'] 
 Ranking: Positive, prediction: Negative

Comment: ['great', 'app', 'totally', 'recommend', 'small', 'business', 'owners', 'etc', 'would', 'get', 'like', 'templates', 'resizing', 'options', 'free', 'version', 'plz'] 
 Ranking: Negative, prediction: Positive

Comment: ['love', 'designs', 'may', 'free', 'thanks', 'canva', 'team'] 
 Ranking: Positive, prediction: Positive

Comment: ['useful', 'app', 'indeed', 'everyone', 'making', 'design', 'needs', 'love'] 
 Ranking: Positive, prediction: Positive

Comment: ['nice', 'logo', 'thumbnails', 'editing', 'facebook', 'instagram', 'tube'] 
 Ranking: Positive, prediction: 

# Global Vector Word Embeddings: 

In [33]:
# Load the word embeddings (Glove word embeddings) 
embeddings_dict = {}
with open("glove.6B.50d.txt", 'r', encoding="utf-8") as f:
    for line in f:
        values = line.split()
        word = values[0]
        vector = np.asarray(values[1:], "float32")
        embeddings_dict[word] = vector

words =  list(embeddings_dict.keys())
vectors = [embeddings_dict[word] for word in words]
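
The loading loop above relies on the GloVe text layout, in which each line holds a token followed by its vector values separated by spaces. The sketch below runs the same parsing on two made-up lines held in memory, so it works without the GloVe file; `parse_glove` and the sample values are assumptions for illustration.

```python
import io
import numpy as np

# Assumption: two invented 3-dimensional entries in GloVe's text layout.
sample = "good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n"

def parse_glove(handle):
    """Read 'token v1 v2 ...' lines into a {token: float32 vector} dictionary."""
    table = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        table[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return table

table = parse_glove(io.StringIO(sample))
print(table["good"], table["bad"])
```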

In [35]:
# Encoding the input with Glove Word Embeddings: 
def gvec_input(x,m,e): 
    """
    
    Takes a list of tokenized reviews, x, and returns their GloVe embeddings, looked up in the 
    GloVe vocabulary (400,000 words). Each review becomes m vectors of length e; every vector 
    corresponds to one token and each entry describes a feature of that word. 
    
    inputs: 
    
    x (list of lists of str) : the tokenized customer reviews. 
    m (int)                  : length of each sequence (reviews are truncated or padded to m) 
    e (int)                  : size of the embeddings 
    outputs: 
    gv (n,m,e)               : where n is the number of reviews. 

    
    """
    n = len(x)
    gv = np.zeros((n,m, e))
    # Average of all embeddings, computed once and used for out-of-vocabulary words: 
    mean_vector = np.mean(vectors, axis = 0)
    
    for i in range(0, n): #looping over each comment 
        txt = x[i] #select the ith comment  
        txt = (txt[:m] if len(txt) > m else txt + ['<pad>'] * (m - len(txt))) #shorten or add extra padding
        for l in range(m): #looping over each word 
            
            # pads are represented by a vector of all zeros
            if txt[l] == "<pad>": 
                gv[i,l,:] = np.zeros(e) 
                
            # if a word is not in the GloVe vocabulary, assign the average of all embeddings:    
            elif txt[l] not in embeddings_dict: 
                gv[i,l,:] = mean_vector
            # add the word embeddings: 
            else: 
                gv[i,l,:] = embeddings_dict[txt[l]]
    return gv

In [36]:
# Limit the length of the sequence: 
m = 30 
# The length of the embeddings: 
e = 50
X_trainmod = gvec_input(X_train,m,e) 
X_testmod = gvec_input(X_test,m,e)

In [37]:
print(X_trainmod.shape)
print(X_testmod.shape)

(1125, 30, 50)
(375, 30, 50)


In [38]:
def NeuralAttention(a,s_prev,b): 
    """
    Implements one step of attention mechanism
    
    Arguments:
    a -- output of the Bi-LSTM of shape (m, Tx, 2* n_a)  #(#samples, #rows, #columns)
    s_prev -- previous hidden state of the LSTM of shape (m, n_s)
    b -- number of neurons in the hidden layer of the attention FFN
    Tx -- length of the input sequence (Global Variable)

    Returns:
    context -- context vector, input of the next LSTM cell
    """
    #Create copies of s_prev 
    s_prev = RepeatVector(Tx)(s_prev) #what about all samples together 
    
    #Concatenate s_prev and a: 
    concat = Concatenate(axis = -1)([a,s_prev])
    
    #Run through the first layer of the FFN with tanh activation and b neurons: 
    dense1 = Dense(b, activation = "tanh")(concat)
    #NOTE: dense2 is defined but never used; the energies below are computed from dense1, 
    #so the FFN effectively has a single hidden layer. 
    dense2 = Dense(b,activation = "tanh")(dense1)
    #Run through the final layer of FFN with activation ReLU and 1 neuron: 
    energies = Dense(1,activation = "relu")(dense1)
    
    #Run through a Softmax function to find alphas: 
    alphas = Softmax(axis = 1)(energies)
    
    #Multiply the alphas with their respective a<t'>: 
    Context = Dot(axes=1)([alphas,a]) 
    
    return(Context)



In [39]:
# Needs modification since the size of each input vector has changed from len_dict to e: 
def modelf(Tx, n_a, n_s, e, len_rank,b):
    """
    Arguments:
    Tx -- length of the input sequence
    n_a -- hidden state size of the Bi-LSTM
    n_s -- hidden state size of the post-attention LSTM
    e -- length of each word embedding vector 
    len_rank -- number of categories in the output 
    b -- number of neurons in the hidden layer of the attention FFN

    Returns:
    model -- Keras model instance
    """

    
    # Define the inputs of your model with a shape (Tx, e)
    # Define s0 (initial hidden state) and c0 (initial cell state)
    # for the post-attention LSTM with shape (n_s,)
    X = Input(shape=(Tx, e))
    # initial hidden state
    s0 = Input(shape=(n_s,), name='s0')
    # initial cell state
    c0 = Input(shape=(n_s,), name='c0')
    # hidden state
    s = s0
    # cell state
    c = c0
    
    # Initialize empty list of outputs
    outputs = []
    
    # Define your pre-attention Bi-LSTM. a is a list of all the hidden states. 
    a = Bidirectional(LSTM(units=n_a, return_sequences=True))(X) 
    
    
    # Perform one step of the attention mechanism to get back the context vector at step t 
    context = NeuralAttention(a,s,b)
        
    # Apply the post-attention LSTM cell to the "context" vector while also inputting the previous hidden state and cell state. 
    s, _, c = LSTM(n_s, return_state = True)(context, initial_state=[s, c])
       
    # Apply Dense layer to the hidden state output of the post-attention LSTM 
    out = Dense(len_rank,activation = "tanh")(s)
        #out = output_layer(s)
        # Run through a Softmax function: 
    res = Softmax(axis = 1)(out)
    # Append "res" to the "outputs" list 
    outputs.append(res)
    
    # Create model instance taking three inputs and returning the list of outputs.
    model = Model(inputs = [X,s0,c0], outputs = outputs)
    
    return model

In [40]:
# Redefine the model: 
n_a = 64 # number of units for the pre-attention, bi-directional LSTM's hidden state 'a'
n_s = 128 # number of units for the post-attention LSTM's hidden state "s"
Tx  = 30 # maximum length of the input 
b   = 30 # number of neurons in the hidden layer of the attention feed-forward network 
e = 50
len_rank = len(ranking)
model = modelf(Tx, n_a, n_s, e, len_rank,b) 
model.summary()

In [41]:
# Compile the model introducing the loss function: 
opt = Adam(0.002,beta_1 = 0.9, beta_2 = 0.999, decay = 0.01) 
model.compile(loss = "categorical_crossentropy", optimizer = opt, metrics = ["accuracy"])

In [44]:
m = X_trainmod.shape[0]
s0 = np.zeros((m, n_s))
c0 = np.zeros((m, n_s))
outputs = list(y_trainmod.swapaxes(0,1))

In [47]:
model.fit([X_trainmod, s0, c0], outputs, epochs=200, batch_size=200)

Epoch 1/200
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 38ms/step - accuracy: 0.9844 - loss: 0.1582
Epoch 2/200
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - accuracy: 0.9858 - loss: 0.1554
Epoch 3/200
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 34ms/step - accuracy: 0.9797 - loss: 0.1676
Epoch 4/200
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 34ms/step - accuracy: 0.9813 - loss: 0.1643
Epoch 5/200
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 34ms/step - accuracy: 0.9816 - loss: 0.1638
Epoch 6/200
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - accuracy: 0.9840 - loss: 0.1590
Epoch 7/200
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 34ms/step - accuracy: 0.9846 - loss: 0.1578
Epoch 8/200
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - accuracy: 0.9841 - loss: 0.1586
Epoch 9/200
6/6 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x283fda050>

In [50]:
# Save the weights of the model for future use 
model.save_weights('/Users/apple/attention_GloV1125.weights.h5')

In [51]:
# Load the model 
model.load_weights('/Users/apple/attention_GloV1125.weights.h5')

In [48]:
m = X_testmod.shape[0]
s00 = np.zeros((m, n_s))
c00 = np.zeros((m, n_s))
outputs = list(y_testmod.swapaxes(0,1))
model.evaluate([X_testmod,s00,c00], outputs)

12/12 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9025 - loss: 0.3168


[0.32937073707580566, 0.8933333158493042]

In [49]:
m = X_testmod[0:30].shape[0]
s00 = np.zeros((m, n_s))
c00 = np.zeros((m, n_s))
predictions = model.predict([X_testmod[0:30], s00, c00])
predictions = np.argmax(predictions, axis = -1)
output = [ranking[int(x)] for x in predictions]
for i in range(len(output)): 
    print(f"Comment: {X_test[i]}\n, Ranking: {y_test[i]}, prediction: {output[i]}\n\n")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 131ms/step
Comment: ['waaa', 'interesting', 'picture', 'thank', 'canva', 'helped', 'edit', 'picture', 'thanks', 'give', 'stars']
, Ranking: Positive, prediction: Positive


Comment: ['app', 'best', 'app', 'ever', 'used', 'requesting', 'let', 'us', 'add', 'music', 'also', 'let', 'us', 'search', 'music', 'add', 'keep']
, Ranking: Positive, prediction: Positive


Comment: ['great', 'app', 'totally', 'recommend', 'small', 'business', 'owners', 'etc', 'would', 'get', 'like', 'templates', 'resizing', 'options', 'free', 'version', 'plz']
, Ranking: Negative, prediction: Negative


Comment: ['love', 'designs', 'may', 'free', 'thanks', 'canva', 'team']
, Ranking: Positive, prediction: Positive


Comment: ['useful', 'app', 'indeed', 'everyone', 'making', 'design', 'needs', 'love']
, Ranking: Positive, prediction: Positive


Comment: ['nice', 'logo', 'thumbnails', 'editing', 'facebook', 'instagram', 'tube']
, Ranking: Positive, predict

Note: The length of every sequence is set to Tx. Sequences shorter than Tx are padded, and each padding token is represented by a vector of zeros in our embeddings. However, when the Softmax assigns its weights over the input positions, it also assigns a probability to the padded positions. It is desirable to find a way for the model to ignore the padding embeddings when making its predictions. In addition, because an RNN reads the sequence in order, the padding is the last thing it is fed, so the information about the beginning of the sequence fades after many steps over padding vectors. One solution would be for the sequential model to stop when it reaches a padding vector and jump to making a prediction. For the dot-product attention model, we can replace the padding embeddings by a vector of all zeros.
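
One common way to keep a softmax from assigning weight to padded positions is to mask them before normalizing. The NumPy sketch below illustrates the idea for a single sequence; it is not part of the model above. `padding_mask` and `masked_softmax` are hypothetical helpers, and the sketch assumes that padding rows are exactly zero and that each sequence has at least one real token.

```python
import numpy as np

def padding_mask(x):
    """x: (Tx, e) embeddings. True where the row is a real token (not all zeros)."""
    return np.any(x != 0, axis=-1)

def masked_softmax(energies, mask):
    """Softmax over energies (Tx,) that gives zero weight where mask is False."""
    masked = np.where(mask, energies, -np.inf)   # padded positions -> -inf
    exp = np.exp(masked - masked[mask].max())    # exp(-inf) evaluates to 0
    return exp / exp.sum()

# Assumption: a toy sequence with two real tokens followed by three pads.
x = np.array([[0.5, -0.2], [0.1, 0.3], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
energies = np.array([2.0, 1.0, 0.0, 0.0, 0.0])

mask = padding_mask(x)
alphas = masked_softmax(energies, mask)
print(mask, alphas)
```

With an unmasked softmax the three padded positions would still receive a share of the attention weight; here the full weight is distributed over the two real tokens.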