# Sentiment Analysis with Deep Learning

# Phase 3- Prediction

This notebook consists the functions and code for scraping comments from Amazon website, pre-prosessing them and prediction with the model.  

### CHRISP-DM phase

Deployment phase for CRISP-DM can be found in this noteboook.

#### 6.Deployment

Generally this will mean deploying a code representation of the model into an operating system to score or categorize new unseen data as it arises and to create a mechanism for the use of that new information in the solution of the original business problem. Importantly, the code representation must also include all the data prep steps leading up to modeling so that the model will treat new raw data in the same manner as during model development.

### Table of Contents

- 1.Import Libraries
- 2.Define Functions 
- 3.Scraping Comments
- 4.Pre-processing the New Data
- 5.Prediction

## 1. Import Libraries

In [1]:
import pandas as pd 
import numpy as np
import string
from nltk.corpus import stopwords
stop = stopwords.words('english')
from sklearn.metrics import accuracy_score
np.random.seed(0)
import pickle

from keras.models import Model, Sequential, Input
from keras.models import load_model
from keras.preprocessing import text, sequence
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing import text, sequence

import urllib.request
import urllib.parse
import urllib.error
from bs4 import BeautifulSoup
import ssl
import joblib

Using TensorFlow backend.


### Load the trained model

In [2]:
#model = load_model('cnn_2cnv.h5')

model = joblib.load("models/joblib_RL_Model.pkl")

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


In [3]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 128, 64)           524288    
_________________________________________________________________
spatial_dropout1d_1 (Spatial (None, 128, 64)           0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 128, 100)          25700     
_________________________________________________________________
batch_normalization_1 (Batch (None, 128, 100)          400       
_________________________________________________________________
global_max_pooling1d_1 (Glob (None, 100)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 100)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 50)               

### Load the tokenizer

Open the saved tokenizer with pickle. This tokenizer was trained in the pre-prosessing notebook and saved with pickle. We need this for creating the pipe line for the new data we will scrape from Amazon website.

In [4]:
with open('tokenizer.pickle', 'rb') as handle:
    tokenizer=pickle.load(handle)

### 2. Functions

In [5]:
def scrape_reviews ():
    """This function scrapes customer reviews from Amazon web 
    site at a given product page URL. Uses input method to receive
    the URL from the user"""
    # For ignoring SSL certificate errors
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    url=input("Enter Amazon Product Url- ")
    html_page = urllib.request.urlopen(url) #Make a get request to retrieve the page
    soup = BeautifulSoup(html_page, 'html.parser')

    reviews=[]
    ratings=[]

    review_row=soup.findAll('div', attrs={'data-hook': 'review'})
    for row in review_row:
        ratings.append(row.find('span',  attrs={'class':'a-icon-alt'}).text.strip()[0])
        reviews.append(row.find('div', attrs={'data-hook': 'review-collapsed'}).text.strip())
    
    print('There are {} reviews in for this product on this page'.format(len(reviews)) )
    #print(reviews)
    
    return reviews, ratings

def punctuationRemover(p):
    '''
    Input: Takes a string. You may have to use str() to force it. 
    Removes all punctuation by checking every single character.
    Output: Returns a string.
    '''
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~1234567890''' 
    no_punctuations = ''

    for words in p: # You may not have to loop this high
        for char in words:
            if char in punctuations:
                no_punctuations = no_punctuations + ' '
            if char not in punctuations:
                no_punctuations = no_punctuations + char    
    return(no_punctuations)


def removeStopWords(str):
    """it takes a string and removes the stopwords from it. 
    Stopwords are available in the "stop" list"""
    #select english stopwords
    stop = set(stopwords.words("english"))
    #add custom words
    stop.update(('arnt','this','when','cant','these'))
    #remove stop words
    new_str = ' '.join([word.lower() for word in str if word.lower() not in stop]) 
    return new_str

def lemmatize_verbs(words):
    """Lemmatize verbs in list of tokenized words"""
    lemmatizer = WordNetLemmatizer()
    lemmas = []
    for word in words:
        lemma = lemmatizer.lemmatize(word, pos='v')
        lemmas.append(lemma)
    return lemmas  

## 3. Scraping Comments 

In this part we will scrape the comments from the website for a product. We will also scrape the rating to evaluate the model's prediction fro unseen data.

In [6]:
review_list, ratings = scrape_reviews()

Enter Amazon Product Url- https://www.amazon.com/Amplified-Digital-Antenna-Skywire-Antennas/dp/B07DK1M5JF/ref=lp_3230976011_1_2_sspa?s=tv&ie=UTF8&qid=1582172255&sr=1-2-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExUTFQT0xCUVdHOERXJmVuY3J5cHRlZElkPUEwMjY5OTU4OE5BTFdMQVoyNUEmZW5jcnlwdGVkQWRJZD1BMDk4NzQ2MzNPRFQ5SUJDNVVINEYmd2lkZ2V0TmFtZT1zcF9hdGZfYnJvd3NlJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ==
There are 6 reviews in for this product on this page


In [7]:
print(review_list, ratings)

['I like that this antenna is light weight.--- It has a long chord that will reach windows where you can get the most of your viewing experiences...--- The picture quality is nice and works great with HD broadcasting.--- It has a large receptor plane which is nice BC it is attractive and never an eye sore!--- If you spend the time to understand the technology- you will grow to love this NIFTY antenna...--- The antenna works well with my LG smart TV, the TV has CH+ so the Antenna is more than able to comply with all demands.We love the football games on our 43" TV- the Picture quality- is outstanding and the refresh rate is never a problem with this antenna...The antenna works well with my LG smart TV, the TV has CH+ so the Antenna is more than able to comply with all demands.I also like that the seller provides excellent services after the sell.', 'Great amplified had digital antennae. It helps us get the local stations and some others in clearly. It’s easy to set up. We are pleased wi

## 4. Pre-process the New Data

Remove all stopwords and punctuation from the comment by using the pre-processing functions. Also split in to words list. 

In [10]:
review_list_no_punc=[punctuationRemover(p) for p in review_list]

In [13]:
review_list_no_stop=[removeStopWords(p) for p in review_list_no_punc]

In [14]:
test=review_list_no_stop

In [15]:
test_vektor=tokenizer.texts_to_sequences(test)
test_vector=sequence.pad_sequences(test_vektor, maxlen=128)

In [16]:
test_vector

array([[ 675, 1100, 1718,  466,  845,  512,  535,  675,  466, 4186,  535,
        1285,  535,  890,  890,  803,  890, 1718,  466,  675,  466, 1129,
         675,  466, 1718,  675,  466,  890,  466, 1100,  466,  675,  845,
         675,  435, 1285,  466,  816, 1718, 1718,  890,  466,  890,  890,
        1718,  466,  890,  466,  890,  890,  816,  675,  988,  816,  466,
        1285, 1285,  816, 1718, 1285,  803,  675, 1100, 1718,  466, 1100,
        1718,  512, 1718, 1718,  466,  890,  466,  890,  890,  675,  466,
        1718,  890,  435, 1285,  466,  512,  845, 1285,  816, 1718, 1285,
        1285,  466,  890, 1285, 1285,  988,  466, 1718, 1718,  466,  466,
        1285, 1285,  466,  675,  845,  675, 1100,  466,  466,  531,  512,
         466, 1285, 1285,  466,  890,  466,  675, 1100,  512,  466, 1129,
         466,  675, 1718,  466,  466, 1285, 1285],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,

## 5. Prediction

In [17]:
prediction=model.predict([test_vector])

In [None]:
print(model.predict([test_vector]))
print([round(prediction[0][0]) for prediction[0][0] in prediction])

In [None]:
ratings

### Scraping and predicting multiple customer reviews from a website

In [20]:
def web_comments ():
    review_list, ratings = scrape_reviews() 
    review_list_no_punc=[punctuationRemover(p) for p in review_list]
    review_list_no_stop=[removeStopWords(p) for p in review_list_no_punc]
    test=review_list_no_stop
    test_vektor=tokenizer.texts_to_sequences(test)
    test_vector=sequence.pad_sequences(test_vektor, maxlen=128)
    prediction=model.predict([test_vector])
    
    neg_com=[]
    pos_com=[]
    label=[int(round(prediction[0][0])) for prediction[0][0] in prediction]
    labels=[]
    for i in label:
        if i==1:
            labels.append('P')
        else: 
            labels.append('N')
        
    for j,i in enumerate(labels):
        if i=='N':
            neg_com.append(review_list[j])
            
            
    for j,i in enumerate(labels):
        if i=='P':
            pos_com.append(review_list[j])
   
    print ("")
    print ("Predictions")
    print(labels)
    print ("Actual Rates")
    print(ratings)
    print ("")
    
    print("List of negative comments")
    print("============================")
    for i in neg_com:
        print (i)
        print ("")
    


## Live Demonstration

In [21]:
web_comments()


Enter Amazon Product Url- https://www.amazon.com/Amplified-Digital-Antenna-Skywire-Antennas/dp/B07DK1M5JF/ref=lp_3230976011_1_2_sspa?s=tv&ie=UTF8&qid=1582172255&sr=1-2-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExUTFQT0xCUVdHOERXJmVuY3J5cHRlZElkPUEwMjY5OTU4OE5BTFdMQVoyNUEmZW5jcnlwdGVkQWRJZD1BMDk4NzQ2MzNPRFQ5SUJDNVVINEYmd2lkZ2V0TmFtZT1zcF9hdGZfYnJvd3NlJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ==
There are 6 reviews in for this product on this page

Predictions
['P', 'P', 'P', 'P', 'P', 'P']
Actual Rates
['5', '5', '5', '5', '5', '5']

List of negative comments


## Conclusion

I have created a function that scrapes comments from a product page on Amazon website.

You can easily use this model for classification for any amazon product. 
All you need is;

- Run the web-comments  function
- It will ask you to input the amazon product page URL
- Just copy and paste the url and press ENTER

The function will give you how many comments are there in total. It also labels each one of them for you as P or N. 
And filters the Negative one so that you would see why your customer is complaining .

To get valuable insight to revise your customer service you would read and  analyze  only 3 comments instead of 20.

I provides huge time and money saving in the long run


## Future Work
***Work in progress***

### Predicting the tone of a single comment such as email or text message

In [None]:
def single_comment ():
    comment=input("Enter comment here- ")
    
    comment_nopunc=[punctuationRemover(comment)]
    comment_no_stop= "".join([word.lower() for word in comment_nopunc if  word.lower() not in (stop)])
    print(comment_no_stop)

    test=comment_no_stop
    test_vektor=tokenizer.texts_to_sequences(test)
    print(test_vektor)
    test_vector=sequence.pad_sequences(test_vektor, maxlen=128)
    print(test_vector)
    prediction=model.predict([test_vector])

    print (int(round(prediction[0][0])))

single_comment()

In [None]:
#comment=input("Enter comment here- ")

comment=["I love this product!"]
    
comment_nopunc=[punctuationRemover(comment)]
comment_no_stop= "".join([word.lower() for word in comment_nopunc if  word.lower() not in (stop)])
print(comment_no_stop)

test=comment_no_stop
test_vektor=tokenizer.texts_to_sequences (test)                             
print(test_vektor)
test_vector=sequence.pad_sequences(test_vektor, maxlen=128)
prediction=model.predict([test_vector])

print (int(round(prediction[0][0])))

## Creating Web App for the model 

***Work in progress***

For customer use I would  Create a web app out of this prediction model.
Web scraping can be adapted to any product website other than Amazon. 


### Connecting the prediction to Anvil 
I would like to run this prediction as a web app. I used one interface to get input and give the results to the user. It is called Anvil. 

In [None]:
import anvil.server

anvil.server.connect("O5X2QNJXLPWEQ2MQIAJTAQHP-AYZTXNLHKK3ZZOB2")


In [None]:
@anvil.server.callable
def say_hello(name):
  print("Hello from the uplink, %s!" % name)

anvil.server.wait_forever()

In [None]:
import anvil.media
@anvil.server.collable
def sentiment(file):
    with anvil.media.Tempfile(file) as filename:
        text = url_box(filename)
        
        
    test_vektor=tokenizer.texts_to_sequences(text)
    test_vector=sequence.pad_sequences(test_vektor, maxlen=128)
    
    score=model.predict(text_vector)
    return score

In [None]:
$ ipython nbconvert --to FORMAT notebook.ipynb


There is a problem in the connection. It kept throwing errors on the Anvil platform. So moved on. 

### Creatin app with SPYRE 

In [None]:
from spyre import server

app=server.App()

app.Launch()

In [1]:
!pip install spyre

Collecting spyre
[31m  ERROR: Could not find a version that satisfies the requirement spyre (from versions: none)[0m
[31mERROR: No matching distribution found for spyre[0m


There is a version missmatch here I guess. 

### Using MLFlow for machine learnig life cycle 

I will try to redo this project in this a platform to see the difference and also for easy deployment. 