# DISNEYLAND DREAMS - Analyzing Sentiments and Creating Personalized Chatbot Experiences

This project, "DISNEYLAND DREAMS - Analyzing Sentiments and Creating Personalized Chatbot Experiences", applies Natural Language Processing techniques to analyze customer sentiment expressed in textual data, such as reviews and feedback, about Disneyland park locations.

The analysis aims to gain insight into visitor perceptions, emotions, and preferences regarding attractions and experiences at the Disneyland locations in California, Paris, and Hong Kong. It does so by building a neural network model that learns to understand and predict the emotion a user expresses in a piece of text.

Sentiment analysis of this kind is an important input to customer satisfaction monitoring, brand reputation management, predictive analytics on future trends, and personalized customer experiences.

# Sentiment Analysis using Recurrent Neural Network variants:

**Importing basic libraries**:

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

**Loading the Disney dataset**:

In [3]:
balanced_disneydf = pd.read_csv(r'C:\Users\priya\Downloads\Projectwork\Balanced_Disneydf.csv')  
balanced_disneydf

Unnamed: 0,Rating,Review_Text,Branch,Sentiment,Stemmed_Text,Lemmatized_Text
0,4,Had a better than expected time with my 2.5 ye...,Disneyland_California,1,better expect time year old girl infant son ...,better expected time year old girl infant so...
1,5,Disney Land is the perfect home away from home...,Disneyland_California,1,disney land perfect home away home moment wal...,disney land perfect home away home moment wal...
2,5,Truly Disney..... A place that showcases abou...,Disneyland_HongKong,1,truli disney place showcas disney charact lo...,truly disney place showcase disney character ...
3,4,My wife and I visited Disneyland Park at the s...,Disneyland_Paris,1,wife visit disneyland park start novemb ever ...,wife visited disneyland park start november e...
4,3,As it says I've been here 3 times and this was...,Disneyland_Paris,1,say time worst yet level ride closur peak s...,say time worst yet level ride closure peak ...
...,...,...,...,...,...,...
7621,2,I will start off by saying that comments aroun...,Disneyland_Paris,0,start say comment around non disney thing most...,start saying comment around non disney thing m...
7622,2,Disneyland is a great place to spend time with...,Disneyland_Paris,0,disneyland great place spend time children rea...,disneyland great place spend time child ready ...
7623,2,This was my first trip to Disneyland and I was...,Disneyland_Paris,0,first trip disneyland pleasantli surpris much ...,first trip disneyland pleasantly surprised muc...
7624,2,"The lines are low, thats good, the staff are r...",Disneyland_Paris,0,line low that good staff rude guess that eu...,line low thats good staff rude guess thats ...


In [4]:
balanced_disneydf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7626 entries, 0 to 7625
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Rating           7626 non-null   int64 
 1   Review_Text      7626 non-null   object
 2   Branch           7626 non-null   object
 3   Sentiment        7626 non-null   int64 
 4   Stemmed_Text     7626 non-null   object
 5   Lemmatized_Text  7626 non-null   object
dtypes: int64(2), object(4)
memory usage: 357.6+ KB


In [5]:
# Number of rows and columns in the dataset
balanced_disneydf.shape

(7626, 6)

In [6]:
# Check for null values
balanced_disneydf.isnull().sum()

Rating             0
Review_Text        0
Branch             0
Sentiment          0
Stemmed_Text       0
Lemmatized_Text    0
dtype: int64

## Sentiment Analysis using Recurrent Neural Network:

In [7]:
from keras.preprocessing.text import one_hot, Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM, Dropout, BatchNormalization, Flatten
from keras.callbacks import Callback
from keras.initializers import glorot_normal
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import regularizers
from sklearn.model_selection import train_test_split




In [8]:
rnn_disneydf = balanced_disneydf.copy()
# Drop the columns not needed for modelling (columns= already implies axis=1)
rnn_disneydf.drop(columns=['Rating', 'Review_Text'], inplace=True)
rnn_disneydf

Unnamed: 0,Branch,Sentiment,Stemmed_Text,Lemmatized_Text
0,Disneyland_California,1,better expect time year old girl infant son ...,better expected time year old girl infant so...
1,Disneyland_California,1,disney land perfect home away home moment wal...,disney land perfect home away home moment wal...
2,Disneyland_HongKong,1,truli disney place showcas disney charact lo...,truly disney place showcase disney character ...
3,Disneyland_Paris,1,wife visit disneyland park start novemb ever ...,wife visited disneyland park start november e...
4,Disneyland_Paris,1,say time worst yet level ride closur peak s...,say time worst yet level ride closure peak ...
...,...,...,...,...
7621,Disneyland_Paris,0,start say comment around non disney thing most...,start saying comment around non disney thing m...
7622,Disneyland_Paris,0,disneyland great place spend time children rea...,disneyland great place spend time child ready ...
7623,Disneyland_Paris,0,first trip disneyland pleasantli surpris much ...,first trip disneyland pleasantly surprised muc...
7624,Disneyland_Paris,0,line low that good staff rude guess that eu...,line low thats good staff rude guess thats ...


In [9]:
# Extract the sentiment labels and the stemmed review texts as NumPy arrays
labels = rnn_disneydf['Sentiment'].values
texts = rnn_disneydf['Stemmed_Text'].values
print(labels)
labels.shape

[1 1 1 ... 0 0 0]


(7626,)

In [10]:
# Create a Tokenizer object and fit it on the texts
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
print(tokenizer.word_index)



In [11]:
# Convert texts to sequences of integers
sequenced_txt = tokenizer.texts_to_sequences(texts)
#print(sequenced_txt)

# Word indices start at 1; index 0 is reserved for padding, hence the +1
total_words = len(tokenizer.word_index) + 1
print("Total Unique words: ",total_words)
print("Maximum length of text is ",max([len(x) for x in sequenced_txt]))

Total Unique words:  18528
Maximum length of text is  1350
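
To make the integer encoding concrete, the following dependency-free sketch mimics, under simplifying assumptions, what fitting a tokenizer and converting texts to sequences does: words are ranked by frequency, the most frequent word gets index 1, and index 0 is left free for padding. The helper names `build_word_index` and `to_sequences` and the two toy reviews are hypothetical. The real Keras `Tokenizer` additionally handles lower-casing, character filtering, and out-of-vocabulary tokens.

```python
from collections import Counter


def build_word_index(texts):
    """Map each word to an integer rank, most frequent word first (index 1)."""
    counts = Counter(word for text in texts for word in text.split())
    # Sort by descending frequency; ties keep first-seen order because
    # Counter preserves insertion order and sorted() is stable.
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}


def to_sequences(texts, word_index):
    """Replace every known word with its integer index; unknown words are skipped."""
    return [[word_index[w] for w in text.split() if w in word_index] for text in texts]


# Hypothetical toy reviews, already stemmed and space-separated
toy_texts = ["disney land perfect home away home", "disney ride great"]
word_index = build_word_index(toy_texts)
sequences = to_sequences(toy_texts, word_index)

# +1 because index 0 is never assigned to a word (it is reserved for padding)
vocab_size = len(word_index) + 1

print(word_index)
print(sequences)
print("Vocabulary size:", vocab_size)
```

The same `+ 1` appears in the `total_words` computation above, which is why the reported vocabulary size is one larger than the number of distinct words.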


In [12]:
# Pad each encoded review with trailing zeros up to the length of the longest review
max_length = 1350
padded_txt = pad_sequences(sequenced_txt, maxlen=max_length, padding="post")
print(padded_txt)

[[  83   76    4 ...    0    0    0]
 [   3  151  595 ...    0    0    0]
 [ 419    3   17 ...    0    0    0]
 ...
 [  57   84    6 ...    0    0    0]
 [  14  698 1404 ...    0    0    0]
 [  12  128    8 ...    0    0    0]]
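
Padding every review to the length of the single longest one (1350 tokens) can leave many rows consisting mostly of zeros if typical reviews are much shorter. A common alternative is to pick the cutoff from the length distribution instead. The sketch below is a plain-Python illustration of post-padding combined with a percentile-based cutoff. The helpers `pad_post` and `percentile_length` and the toy sequences are hypothetical, and unlike the default behaviour of `pad_sequences` (which truncates from the front), this sketch truncates from the tail.

```python
import math


def pad_post(sequences, maxlen, value=0):
    """Right-pad each sequence with `value`; sequences longer than maxlen lose their tail."""
    padded = []
    for seq in sequences:
        kept = list(seq[:maxlen])  # keep at most the first maxlen tokens
        padded.append(kept + [value] * (maxlen - len(kept)))
    return padded


def percentile_length(sequences, pct=95):
    """Nearest-rank percentile of the sequence lengths."""
    lengths = sorted(len(seq) for seq in sequences)
    rank = max(1, math.ceil(pct / 100 * len(lengths)))
    return lengths[rank - 1]


# Hypothetical toy sequences of different lengths
toy_sequences = [[83, 76, 4], [3, 151], [419, 3, 17, 9, 12], [57]]

cutoff = percentile_length(toy_sequences, pct=75)
padded = pad_post(toy_sequences, cutoff)

print("Cutoff length:", cutoff)
for row in padded:
    print(row)
```

Whether a shorter cutoff helps here depends on the actual length distribution of the reviews, which would need to be inspected first.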


In [13]:
from keras.utils import to_categorical

# One-hot encode the sentiment labels (0 -> [1, 0], 1 -> [0, 1])
encoded_labels = to_categorical(labels)
print("Encoded Labels:",encoded_labels)
encoded_labels.shape

Encoded Labels: [[0. 1.]
 [0. 1.]
 [0. 1.]
 ...
 [1. 0.]
 [1. 0.]
 [1. 0.]]


(7626, 2)
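
The mapping from integer labels to one-hot rows can be sketched in plain Python. The helpers `one_hot` and `decode` below are hypothetical stand-ins for illustration, not the Keras implementation. They show that label `1` becomes `[0., 1.]` and label `0` becomes `[1., 0.]`, matching the printed output above, and that the original label can be recovered as the position of the largest entry.

```python
def one_hot(labels, num_classes=None):
    """Turn integer class labels into one-hot rows."""
    if num_classes is None:
        num_classes = max(labels) + 1
    return [[1.0 if k == label else 0.0 for k in range(num_classes)] for label in labels]


def decode(rows):
    """Recover the integer label as the index of the largest entry in each row."""
    return [row.index(max(row)) for row in rows]


# Hypothetical toy labels: two positive reviews followed by one negative review
toy_labels = [1, 1, 0]
encoded = one_hot(toy_labels)

print(encoded)
print(decode(encoded))
```

This two-column representation is what pairs with a two-unit softmax output and the `categorical_crossentropy` loss.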

In [14]:
# Split the dataset into training (80%) and testing (20%) sets
rnn_X_train, rnn_X_test, rnn_y_train, rnn_y_test = train_test_split(padded_txt, encoded_labels, test_size=0.2)
print(rnn_X_train.shape)
print(rnn_y_train.shape)
print(rnn_X_test.shape)
print(rnn_y_test.shape)

(6100, 1350)
(6100, 2)
(1526, 1350)
(1526, 2)
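
The split sizes follow from simple arithmetic: 20% of 7626 is 1525.2, which is rounded up to 1526 test rows, leaving 6100 for training. Because the split above is drawn without a fixed seed, a different partition is produced on every run. `train_test_split` accepts a `random_state` argument for reproducibility. The sketch below illustrates the idea with a seeded shuffle in plain Python. The helper `shuffled_split` and the seed value are assumptions for illustration and do not reproduce scikit-learn's exact shuffling.

```python
import math
import random


def shuffled_split(n_samples, test_size=0.2, seed=42):
    """Return (train_indices, test_indices) from a seeded shuffle of range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG, so the global state is untouched
    n_test = math.ceil(test_size * n_samples)  # test share is rounded up
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = shuffled_split(7626)

print(len(train_idx), len(test_idx))

# Same seed, same partition
assert shuffled_split(7626) == (train_idx, test_idx)
```

With a fixed seed, reported metrics can be compared across runs without the partition itself being a source of variation.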


## Model Training and Evaluation:

## Long Short-Term Memory (LSTM) model building and training:

In [20]:
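# Define an LSTM-based recurrent model

model = Sequential()
embedded_vector_size = 1350  # dimensionality of each word embedding
# Vocabulary size 18528 = len(tokenizer.word_index) + 1 (index 0 is reserved for padding)
model.add(Embedding(18528, embedded_vector_size, input_length=max_length, embeddings_initializer=glorot_normal()))
# return_sequences=True makes the LSTM emit one 128-dimensional vector per timestep,
# so the Dense layers below are applied at each of the 1350 positions
model.add(LSTM(128, return_sequences=True, dropout=0.6))
#model.add(LSTM(100, return_sequences=True,dropout=0.6))
#model.add(LSTM(100, return_sequences=True,dropout=0.6))
#model.add(LSTM(100, dropout=0.6))
#model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(2, activation='softmax'))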

In [21]:
# Compiling the model
model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=["accuracy"])
#model.compile(optimizer='adam', loss='binary_crossentropy',metrics=["accuracy"])
print(model.summary())
print("Model Creation Completed !")

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, 1350, 1350)        25012800  
                                                                 
 lstm_1 (LSTM)               (None, 1350, 128)         757248    
                                                                 
 dense_2 (Dense)             (None, 1350, 64)          8256      
                                                                 
 dense_3 (Dense)             (None, 1350, 2)           130       
                                                                 
Total params: 25778434 (98.34 MB)
Trainable params: 25778434 (98.34 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None
Model Creation Completed !
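
The parameter counts in the summary can be checked by hand. An embedding layer stores one vector per vocabulary entry, an LSTM layer has four gates that each carry input weights, recurrent weights, and a bias, and a dense layer has one weight per input-output pair plus a bias per output. The sketch below reproduces the reported numbers. The helper function names are hypothetical, and the 128-dimensional embedding at the end is only an illustrative assumption to show how strongly the embedding width drives the total.

```python
def embedding_params(vocab_size, dim):
    """One dim-sized vector per vocabulary entry."""
    return vocab_size * dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    """One weight per input-output pair plus one bias per output."""
    return input_dim * units + units


def total_params(vocab_size, embed_dim, lstm_units=128, hidden=64, classes=2):
    return {
        "embedding": embedding_params(vocab_size, embed_dim),
        "lstm": lstm_params(embed_dim, lstm_units),
        "dense_hidden": dense_params(lstm_units, hidden),
        "dense_output": dense_params(hidden, classes),
    }


# Configuration taken from the summary above
as_built = total_params(vocab_size=18528, embed_dim=1350)
print(as_built, "total:", sum(as_built.values()))

# Hypothetical alternative: a 128-dimensional embedding (an assumption, not the author's setting)
smaller = total_params(vocab_size=18528, embed_dim=128)
print(smaller, "total:", sum(smaller.values()))
```

About 97% of the parameters sit in the embedding layer. The summary also shows a final output shape of `(None, 1350, 2)`: with `return_sequences=True` the dense layers produce one prediction per timestep, whereas the one-hot labels have shape `(n_samples, 2)`. If a single prediction per review is intended, having the last LSTM layer return only its final state (`return_sequences=False`) is the usual approach. That changes the output shapes but not the parameter counts computed here.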


## Model Training and Evaluation:

In [18]:
history = model.fit(rnn_X_train,
                    rnn_y_train,
                    validation_data=(rnn_X_test,rnn_y_test),
                    #validation_split = 0.2,
                    epochs = 15,
                    batch_size = 64)
                    #callbacks = EarlyStopping(monitor ='val_loss',
                                              #patience = 3,
                                              #restore_best_weights = True))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [19]:
# Evaluate the model on test data
loss, accuracy = model.evaluate(rnn_X_test, rnn_y_test)
print("Test Loss:", loss)
print("Test Accuracy:", accuracy)

Test Loss: 1.5404380559921265
Test Accuracy: 0.8381389379501343
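
Accuracy alone does not show which class the errors fall into, and a test loss of about 1.54 alongside roughly 83.8% accuracy may indicate that some wrong predictions are made with very high confidence. The sketch below illustrates both points on a hypothetical four-review toy example: it converts softmax outputs to class predictions, builds a confusion matrix, and computes the mean categorical cross-entropy. The probabilities are invented for illustration and are not outputs of the model above.

```python
import math


def argmax(row):
    """Index of the largest entry in a row."""
    return max(range(len(row)), key=row.__getitem__)


def confusion_matrix(y_true, y_pred, num_classes=2):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_cross_entropy(one_hot_true, probs, eps=1e-7):
    """Average categorical cross-entropy; eps guards against log(0)."""
    total = 0.0
    for truth, prob in zip(one_hot_true, probs):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(truth, prob))
    return total / len(probs)


# Hypothetical one-hot labels and softmax outputs
toy_true = [[0, 1], [0, 1], [1, 0], [1, 0]]
toy_probs = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [0.001, 0.999]]

true_classes = [argmax(row) for row in toy_true]
pred_classes = [argmax(row) for row in toy_probs]

matrix = confusion_matrix(true_classes, pred_classes)
accuracy = sum(t == p for t, p in zip(true_classes, pred_classes)) / len(true_classes)
loss = mean_cross_entropy(toy_true, toy_probs)

print("Confusion matrix:", matrix)
print("Accuracy:", accuracy)
print("Mean cross-entropy:", round(loss, 3))
```

In this toy case three of four predictions are correct, yet the single confidently wrong prediction dominates the loss. Inspecting the per-class breakdown and the training history would help establish whether something similar is happening with the trained model.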
