
**Project description:**

1) In Google Colab, import the product dataset.

2) Perform tokenization and padding on the dataset.

3) Build a machine learning model from it.

4) Use that model to predict the sentiment of each customer review.

5) Once this works in Google Colab, download the model and the dataset and try them out in the Visual Studio Code editor as well.

6) The link to the project solution for the Visual Studio Code editor is given in the last cell.


In [None]:
!git clone https://github.com/procodingclass/product_dataset.git

fatal: destination path 'product_dataset' already exists and is not an empty directory.


In [None]:
import pandas as pd
dataframe = pd.read_excel("/content/product_dataset/updated_product_dataset.xlsx")
dataframe.head()

| Unnamed: 0 | Emotion | Text |
|---|---|---|
| 0 | Positive | close approximation red octane mat bought one ... |
| 1 | Neutral | little lumpy mat great foam padding it’s use... |
| 2 | Positive | great pad love ddr not want metal pad get work... |
| 3 | Positive | excellent pad great product highly responsive ... |
| 4 | Positive | awesome great ddr pad works perfectly pc stepm... |


In [None]:
dataframe["Emotion"].unique()

array(['Positive', 'Neutral', 'Negative'], dtype=object)

In [None]:
encode_emotions = {"Neutral": 0, "Positive": 1, "Negative": 2}

In [None]:
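# Encode only the Emotion column; replace() on the whole DataFrame could also match cells in other columns
dataframe["Emotion"] = dataframe["Emotion"].map(encode_emotions)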
dataframe.head()

| Unnamed: 0 | Emotion | Text |
|---|---|---|
| 0 | 1 | close approximation red octane mat bought one ... |
| 1 | 0 | little lumpy mat great foam padding it’s use... |
| 2 | 1 | great pad love ddr not want metal pad get work... |
| 3 | 1 | excellent pad great product highly responsive ... |
| 4 | 1 | awesome great ddr pad works perfectly pc stepm... |
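The `replace` step is just a dictionary lookup. The minimal pure-Python sketch below encodes a few labels and builds the inverse mapping that turns predicted integers back into emotion names. The names `decode_emotions` and `encode_labels` are hypothetical helpers and are not part of the notebook.

```python
# Same mapping as in the notebook
encode_emotions = {"Neutral": 0, "Positive": 1, "Negative": 2}

# Hypothetical inverse mapping: integer label -> emotion name
decode_emotions = {code: name for name, code in encode_emotions.items()}

def encode_labels(emotions):
    """Hypothetical helper: map emotion names to integer labels (KeyError on unknown names)."""
    return [encode_emotions[name] for name in emotions]

labels = encode_labels(["Positive", "Neutral", "Positive", "Negative"])
print(labels)                                      # [1, 0, 1, 2]
print([decode_emotions[code] for code in labels])  # ['Positive', 'Neutral', 'Positive', 'Negative']
```

Raising `KeyError` on an unexpected name is a deliberate choice in this sketch. `DataFrame.replace` leaves unmatched values untouched, so a misspelled label in the data would otherwise only surface later.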


In [None]:
# Convert the DataFrame columns into Python lists of sentences and labels

training_sentences = dataframe["Text"].tolist()
training_labels = dataframe["Emotion"].tolist()


In [None]:
training_sentences[10], training_labels[10]

('arrived early included blank case wont able test game get switch tried brothers device recognized card smash digital download already arrived ahead schedule well packed even blank case wasnt expectingread full review',
 1)

In [None]:
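# Tokenization: import the Tokenizer from TensorFlow (Keras)

import tensorflow as tf

from tensorflow.keras.preprocessing.text import Tokenizer

# Parameters for the tokenizer and the embedding layer
vocab_size = 40000   # keep only the most frequent words (indices below vocab_size)
embedding_dim = 16   # length of each word vector in the Embedding layer
oov_tok = "<OOV>"    # token that stands in for words outside the vocabulary

tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(training_sentences)

# word -> integer index dictionary built from the training sentences
word_index = tokenizer.word_index

# Replace every word in each sentence with its integer index
training_sequences = tokenizer.texts_to_sequences(training_sentences)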
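The following simplified pure-Python sketch shows what `num_words` and `oov_token` do. It is not Keras's implementation, and the helpers `build_word_index` and `texts_to_sequences` are hypothetical. Keras's default `filters` strip punctuation, but this sketch does not: it only lowercases and splits on whitespace.

```python
from collections import Counter

def build_word_index(texts, oov_token="<OOV>"):
    """Hypothetical, simplified fit_on_texts: index words by descending frequency."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # Most frequent first; sorted() is stable, so ties keep first-seen order
    ranked = sorted(counts, key=counts.get, reverse=True)
    # Index 1 goes to the OOV token; index 0 is never assigned (reserved for padding)
    word_index = {oov_token: 1}
    for i, word in enumerate(ranked, start=2):
        word_index[word] = i
    return word_index

def texts_to_sequences(texts, word_index, num_words, oov_token="<OOV>"):
    """Hypothetical helper: map words to indices; unseen or too-rare words become the OOV index."""
    oov_index = word_index[oov_token]
    sequences = []
    for text in texts:
        seq = []
        for word in text.lower().split():
            i = word_index.get(word)
            # Only indices below num_words are kept; everything else maps to OOV
            seq.append(i if i is not None and i < num_words else oov_index)
        sequences.append(seq)
    return sequences

word_index = build_word_index(["great pad great product", "bad pad"])
print(word_index)  # {'<OOV>': 1, 'great': 2, 'pad': 3, 'product': 4, 'bad': 5}
print(texts_to_sequences(["great phone pad", "product bad"], word_index, num_words=4))
# [[2, 1, 3], [1, 1]]
```

As in Keras, the word index keeps every word it has seen. `num_words` is applied only when text is converted to sequences, so `word_index` can hold more than `vocab_size` entries.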

In [None]:
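# Pad or truncate every sequence to the same length
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_length = 100
padding_type = 'post'  # add zeros at the end of short sequences
trunc_type = 'post'    # drop tokens from the end of long sequences

training_padded = pad_sequences(training_sequences, maxlen=max_length,
                                padding=padding_type, truncating=trunc_type)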
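Keras's `pad_sequences` defaults to `padding='pre'` and `truncating='pre'`. This notebook switches both to `'post'`, which keeps the start of every review and pads at the end. Below is a minimal pure-Python sketch of that `'post'` behaviour; the helper `pad_post` is hypothetical.

```python
def pad_post(sequences, maxlen, value=0):
    """Hypothetical sketch of pad_sequences(..., padding='post', truncating='post')."""
    padded = []
    for seq in sequences:
        kept = list(seq[:maxlen])                             # truncating='post': drop tokens past maxlen
        padded.append(kept + [value] * (maxlen - len(kept)))  # padding='post': append zeros
    return padded

print(pad_post([[2, 1, 3], [4, 5, 6, 7, 8, 9]], maxlen=5))
# [[2, 1, 3, 0, 0], [4, 5, 6, 7, 8]]
```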

In [None]:
import numpy as np

# Convert the padded inputs and the labels to NumPy arrays for model.fit
training_padded = np.array(training_padded)
training_labels = np.array(training_labels)

In [None]:
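from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.layers import Conv1D, Dropout, MaxPooling1D

model = Sequential([
    # Map each token index to a dense vector of length embedding_dim
    Embedding(vocab_size, embedding_dim, input_length=max_length),
    Dropout(0.2),
    # Two convolution + pooling stages extract local n-gram features and shorten the sequence
    Conv1D(filters=256, kernel_size=3, activation="relu"),
    MaxPooling1D(pool_size=3),
    Conv1D(filters=128, kernel_size=3, activation="relu"),
    MaxPooling1D(pool_size=3),
    # The LSTM summarizes the pooled sequence into a single 128-dimensional vector
    LSTM(128),
    Dense(128, activation="relu"),
    Dropout(0.2),
    Dense(64, activation="relu"),
    # One probability per class: Neutral (0), Positive (1), Negative (2)
    Dense(3, activation="softmax")
])
# Integer labels (0, 1, 2) pair with sparse categorical cross-entropy
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])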

In [None]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, 100, 16)           640000    
                                                                 
 dropout_2 (Dropout)         (None, 100, 16)           0         
                                                                 
 conv1d_2 (Conv1D)           (None, 98, 256)           12544     
                                                                 
 max_pooling1d_2 (MaxPooling  (None, 32, 256)          0         
 1D)                                                             
                                                                 
 conv1d_3 (Conv1D)           (None, 30, 128)           98432     
                                                                 
 max_pooling1d_3 (MaxPooling  (None, 10, 128)          0         
 1D)                                                  
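The output shapes and parameter counts in the summary follow from simple arithmetic. The sketch below reproduces the values shown for the layers listed above. The helper functions are hypothetical. They assume the `Conv1D` defaults (stride 1, `'valid'` padding) and the `MaxPooling1D` defaults (stride equal to `pool_size`, `'valid'` padding).

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """'valid' padding: the kernel must fit entirely inside the input."""
    return (length - kernel_size) // stride + 1

def maxpool1d_output_length(length, pool_size):
    """MaxPooling1D defaults: strides = pool_size, 'valid' padding."""
    return (length - pool_size) // pool_size + 1

def embedding_params(vocab_size, embedding_dim):
    return vocab_size * embedding_dim                    # one vector per vocabulary index

def conv1d_params(in_channels, filters, kernel_size):
    return kernel_size * in_channels * filters + filters  # kernel weights + one bias per filter

l1 = conv1d_output_length(100, 3)    # conv1d_2 (input length = max_length = 100)
l2 = maxpool1d_output_length(l1, 3)  # max_pooling1d_2
l3 = conv1d_output_length(l2, 3)     # conv1d_3
l4 = maxpool1d_output_length(l3, 3)  # max_pooling1d_3
print(l1, l2, l3, l4)  # 98 32 30 10
print(embedding_params(40000, 16), conv1d_params(16, 256, 3), conv1d_params(256, 128, 3))
# 640000 12544 98432
```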

In [None]:
num_epochs = 30
history = model.fit(training_padded, training_labels, epochs=num_epochs, verbose=2)

Epoch 1/30
671/671 - 36s - loss: 0.2740 - accuracy: 0.9266 - 36s/epoch - 53ms/step
Epoch 2/30
671/671 - 33s - loss: 0.1600 - accuracy: 0.9487 - 33s/epoch - 50ms/step
Epoch 3/30
671/671 - 33s - loss: 0.1159 - accuracy: 0.9586 - 33s/epoch - 49ms/step
Epoch 4/30
671/671 - 33s - loss: 0.0900 - accuracy: 0.9666 - 33s/epoch - 50ms/step
Epoch 5/30
671/671 - 34s - loss: 0.0766 - accuracy: 0.9736 - 34s/epoch - 50ms/step
Epoch 6/30
671/671 - 34s - loss: 0.0586 - accuracy: 0.9800 - 34s/epoch - 50ms/step
Epoch 7/30
671/671 - 36s - loss: 0.0529 - accuracy: 0.9818 - 36s/epoch - 53ms/step
Epoch 8/30
671/671 - 35s - loss: 0.0465 - accuracy: 0.9842 - 35s/epoch - 53ms/step
Epoch 9/30
671/671 - 35s - loss: 0.0389 - accuracy: 0.9876 - 35s/epoch - 52ms/step
Epoch 10/30
671/671 - 35s - loss: 0.0361 - accuracy: 0.9885 - 35s/epoch - 52ms/step
Epoch 11/30
671/671 - 35s - loss: 0.0321 - accuracy: 0.9904 - 35s/epoch - 52ms/step
Epoch 12/30
671/671 - 34s - loss: 0.0298 - accuracy: 0.9914 - 34s/epoch - 51ms/step
E
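Each epoch in the log runs 671 steps. `model.fit` is called without `batch_size`, so Keras uses its default of 32, and the step count therefore pins the number of training reviews to a narrow range. The sketch below works through that arithmetic. It assumes the default batch size and that the final partial batch counts as one step; `steps_per_epoch` is a hypothetical helper.

```python
import math

BATCH_SIZE = 32        # Keras's default batch_size for model.fit (assumed; the notebook sets none)
STEPS_PER_EPOCH = 671  # read from the training log

def steps_per_epoch(num_samples, batch_size=BATCH_SIZE):
    """Batches per epoch; the last batch may be smaller than batch_size."""
    return math.ceil(num_samples / batch_size)

# Smallest and largest dataset sizes that still give 671 steps
low = (STEPS_PER_EPOCH - 1) * BATCH_SIZE + 1
high = STEPS_PER_EPOCH * BATCH_SIZE
print(low, high)  # 21441 21472
```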

In [None]:
# Save the trained model in HDF5 format so it can be downloaded and reused
model.save("Customer_Review_Text_Emotion.h5")

In [None]:
sentence = ["Great phone do buy it. It is an awesome purchase with great battery life"]
# Apply the same tokenization and padding used for training
sequences = tokenizer.texts_to_sequences(sentence)
padded = pad_sequences(sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)
result = model.predict(padded)
print(result)

# argmax returns one index per input row; take the value for the single row
label = int(np.argmax(result, axis=1)[0])

# Reverse lookup in encode_emotions = {"Neutral": 0, "Positive": 1, "Negative": 2}
for emotion, code in encode_emotions.items():
  if code == label:
    print(f"sentiment : {emotion} , label : {label}")

[[1.1348995e-07 9.9999988e-01 3.9972776e-09]]
sentiment : Positive , label : 1
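The decoding step above can be sketched without TensorFlow. For each row, take the index of the largest probability (what `np.argmax(..., axis=1)` does) and look it up in an inverted `encode_emotions`. The probabilities below are copied from the printed output, and `decode_predictions` is a hypothetical helper.

```python
# Probabilities printed by model.predict above (one row per input sentence)
result = [[1.1348995e-07, 9.9999988e-01, 3.9972776e-09]]

encode_emotions = {"Neutral": 0, "Positive": 1, "Negative": 2}
decode_emotions = {code: name for name, code in encode_emotions.items()}

def decode_predictions(probabilities):
    """Hypothetical helper: argmax per row, then map the index back to its emotion name."""
    decoded = []
    for row in probabilities:
        label = max(range(len(row)), key=row.__getitem__)  # first index of the largest value
        decoded.append((decode_emotions[label], label))
    return decoded

print(decode_predictions(result))  # [('Positive', 1)]
```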


**Link to the project solution [Visual Studio Code editor]**

1) Link: https://github.com/procodingclass/PRO-C115-Project-Reference-Code.git

2) Refer to the project document to learn how to run this project in the Visual Studio Code editor.