# Sentiment Analysis with Deep Learning

# Phase 3: Prediction

This notebook contains the functions and code for scraping comments from the Amazon website, pre-processing them, and predicting their sentiment with the trained model.

### CRISP-DM Phase

This notebook covers the Deployment phase of CRISP-DM.

#### 6. Deployment

Generally this will mean deploying a code representation of the model into an operating system to score or categorize new unseen data as it arises and to create a mechanism for the use of that new information in the solution of the original business problem. Importantly, the code representation must also include all the data prep steps leading up to modeling so that the model will treat new raw data in the same manner as during model development.
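The requirement that the deployed code include every data-prep step can be sketched as one object that owns the cleaning function, the tokenizer, the padding step and the model, so every new input passes through the same sequence of steps used in training. This is a minimal sketch; the class name `SentimentPipeline` and the toy components in the usage lines are hypothetical stand-ins, not the notebook's real tokenizer or model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SentimentPipeline:
    """Hypothetical wrapper that applies the training-time prep steps before predicting."""
    clean: Callable[[str], str]                           # text cleaning (punctuation, stopwords)
    to_sequences: Callable[[List[str]], List[List[int]]]  # fitted tokenizer
    pad: Callable[[List[List[int]]], List[List[int]]]     # fixed-length padding
    predict: Callable[[List[List[int]]], List[float]]     # model returning P(positive)
    threshold: float = 0.5

    def __call__(self, texts: Sequence[str]) -> List[int]:
        # Same order of steps as during model development
        cleaned = [self.clean(t) for t in texts]
        padded = self.pad(self.to_sequences(cleaned))
        return [int(p >= self.threshold) for p in self.predict(padded)]


# Usage with toy stand-ins (not the real tokenizer or model)
vocab = {"love": 1, "great": 2, "broken": 3}
pipeline = SentimentPipeline(
    clean=lambda t: t.lower().strip("!. "),
    to_sequences=lambda ts: [[vocab[w] for w in t.split() if w in vocab] for t in ts],
    pad=lambda seqs: [[0] * (4 - len(s)) + s[-4:] for s in seqs],
    predict=lambda rows: [0.1 if 3 in r else 0.9 for r in rows],
)
print(pipeline(["Love it!", "Arrived broken."]))  # [1, 0]
```

Because the pipeline object holds all four steps, swapping in a retrained model without its matching tokenizer becomes an explicit, visible change rather than a silent mismatch.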

### Table of Contents

- 1. Import Libraries
- 2. Define Functions
- 3. Scraping Comments
- 4. Pre-processing the New Data
- 5. Prediction

## 1. Import Libraries

In [None]:
import pandas as pd
import numpy as np
import string
import pickle
import ssl
import urllib.request
import urllib.parse
import urllib.error

import joblib
from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from sklearn.metrics import accuracy_score

from keras.models import load_model
from keras.preprocessing import sequence
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

np.random.seed(0)

# English stopword list (run nltk.download('stopwords') once if it is missing)
stop = set(stopwords.words('english'))

### Load the trained model

In [None]:
#model = load_model('cnn_2cnv.h5')

model = joblib.load("models/joblib_RL_Model.pkl")

In [None]:
model.summary()

### Load the tokenizer

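Open the saved tokenizer with pickle. This tokenizer was fitted in the pre-processing notebook and saved with pickle. New text has to be converted to integer sequences with the same word index the model was trained on, so we need this tokenizer to build the pipeline for the new data we scrape from the Amazon website.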

In [None]:
with open('tokenizer.pickle', 'rb') as handle:
    tokenizer=pickle.load(handle)
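The reason to reload the fitted tokenizer instead of fitting a new one: the integer IDs depend on the training corpus, and the model only understands IDs from that exact word index. The sketch below uses a hypothetical `WordIndexTokenizer`, a simplified stand-in for the Keras `Tokenizer` (most frequent word gets ID 1, unknown words are dropped), and shows that a pickle round trip keeps the mapping intact. An in-memory buffer replaces the `tokenizer.pickle` file so the example runs on its own.

```python
import io
import pickle
from collections import Counter
from typing import Dict, List


class WordIndexTokenizer:
    """Simplified, hypothetical stand-in for a fitted Keras Tokenizer."""

    def __init__(self) -> None:
        self.word_index: Dict[str, int] = {}

    def fit_on_texts(self, texts: List[str]) -> None:
        counts = Counter(w for t in texts for w in t.lower().split())
        # Most frequent word gets ID 1; ties keep first-seen order
        ranked = sorted(counts, key=counts.get, reverse=True)
        self.word_index = {w: i + 1 for i, w in enumerate(ranked)}

    def texts_to_sequences(self, texts: List[str]) -> List[List[int]]:
        # Words never seen during fitting are dropped
        return [[self.word_index[w] for w in t.lower().split() if w in self.word_index]
                for t in texts]


tok = WordIndexTokenizer()
tok.fit_on_texts(["good product", "bad product", "good price"])

buffer = io.BytesIO()           # stands in for open('tokenizer.pickle', 'wb')
pickle.dump(tok, buffer)
buffer.seek(0)
restored = pickle.load(buffer)  # stands in for open('tokenizer.pickle', 'rb')

print(restored.texts_to_sequences(["good cheap product"]))  # [[1, 2]]
```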

## 2. Define Functions

In [None]:

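def scrape_reviews():
    # For ignoring SSL certificate errors
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    url = input("Enter Amazon Product Url- ")
    # A browser-like User-Agent; Amazon may reject requests without one
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    # Make a GET request with the SSL context defined above
    html_page = urllib.request.urlopen(request, context=ctx)
    soup = BeautifulSoup(html_page, 'html.parser')

    reviews = []
    ratings = []

    review_rows = soup.find_all('div', attrs={'data-hook': 'review'})
    for row in review_rows:
        rating = row.find('span', attrs={'class': 'a-icon-alt'})
        review = row.find('div', attrs={'data-hook': 'review-collapsed'})
        # Skip rows that lack the expected rating or text element
        if rating is None or review is None:
            continue
        # Keep the first character of the label, e.g. "4.0 out of 5 stars" -> "4"
        ratings.append(rating.text.strip()[0])
        reviews.append(review.text.strip())

    print('There are {} reviews for this product on this page'.format(len(reviews)))

    return reviews, ratings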

def punctuationRemover(p):
    '''
    Input: a string (other types are converted with str()).
    Replaces every punctuation character and digit with a space.
    Output: the cleaned string.
    '''
    punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~1234567890'''
    return ''.join(' ' if char in punctuations else char for char in str(p))

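def no_stopword(p):
    '''Lower-case the text and drop English stopwords.'''
    # word.lower() is called so each word is compared as lower-case text
    token = ' '.join([word.lower() for word in p.split() if word.lower() not in stop])
    return token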

def removeStopWords(words):
    '''Input: a list of words. Output: the non-stopwords joined into one lower-case string.'''
    #select english stopwords
    cachedStopWords = set(stopwords.words("english"))
    #add custom words
    cachedStopWords.update(('arnt','this','when','cant','these'))
    #remove stop words
    new_str = ' '.join([word.lower() for word in words if word.lower() not in cachedStopWords])
    return new_str



from nltk.stem import WordNetLemmatizer


def lemmatize_verbs(words):
    """Lemmatize verbs in a list of tokenized words (needs nltk.download('wordnet'))."""
    lemmatizer = WordNetLemmatizer()
    lemmas = []
    for word in words:
        lemma = lemmatizer.lemmatize(word, pos='v')
        lemmas.append(lemma)
    return lemmas

    

## 3. Scraping Comments 

In this part we scrape the comments for a product from the website. We also scrape the ratings so we can evaluate the model's predictions on unseen data.
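The scraper stores each rating as a one-character string, so it has to be turned into a number before it can be compared with the model's labels. A slightly more defensive sketch parses the leading number and returns `None` when the label does not start with one. The `"4.0 out of 5 stars"` wording is an assumption about how Amazon currently labels ratings, and `parse_rating` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches a leading number such as "4", "4.0" or "4,0"
_RATING = re.compile(r"\s*(\d+(?:[.,]\d+)?)")


def parse_rating(text: str) -> Optional[float]:
    """Return the star value at the start of a rating label, or None if there is none."""
    match = _RATING.match(text)
    if match is None:
        return None
    # Accept the comma decimal separator used on some localized pages
    return float(match.group(1).replace(",", "."))


print(parse_rating("4.0 out of 5 stars"))  # 4.0
print(parse_rating("5,0 von 5 Sternen"))   # 5.0
print(parse_rating("no rating"))           # None
```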

In [None]:
review_list, ratings = scrape_reviews()



In [None]:
print(review_list, ratings)

## 4. Pre-process the New Data

Remove punctuation and stopwords from each comment with the pre-processing functions, then convert the cleaned text into padded integer sequences with the saved tokenizer.
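The two cleaning steps can also be written with `str.maketrans` and `str.translate`, which map every punctuation character and digit to a space in a single pass instead of building the string character by character. This is an alternative sketch, not what the notebook runs: `string.punctuation` is a slightly larger set than the one in `punctuationRemover`, and the small `STOP` set is a hypothetical stand-in for NLTK's English list.

```python
import string

# Punctuation and digits both become spaces, as in punctuationRemover
_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation + string.digits})

# Hypothetical stand-in for stopwords.words('english')
STOP = {"i", "this", "the", "it", "is", "a", "and"}


def clean(text: str) -> str:
    """Replace punctuation/digits with spaces, lower-case, and drop stopwords."""
    words = text.translate(_TO_SPACE).lower().split()
    return " ".join(w for w in words if w not in STOP)


print(clean("I love this product!"))          # love product
print(clean("It's 100% broken... avoid it"))  # s broken avoid
```

The second example shows a side effect shared with the notebook's own function: splitting on the apostrophe leaves a stray `s` token.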

In [None]:
review_list_no_punc=[punctuationRemover(p) for p in review_list]

In [None]:
review_list_no_stop=[no_stopword(p) for p in review_list_no_punc]

In [None]:
test=review_list_no_stop

In [None]:
test_vektor=tokenizer.texts_to_sequences(test)
test_vector=sequence.pad_sequences(test_vektor, maxlen=128)

In [None]:
test_vector
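`pad_sequences` pads and truncates at the front by default (`padding='pre'`, `truncating='pre'`, `value=0`), which is why the padded vectors start with zeros and end with the word IDs. A pure-Python sketch of that default behaviour, handy for checking shapes without Keras, could look like this; `pad_pre` is a hypothetical helper, not part of Keras.

```python
from typing import List


def pad_pre(sequences: List[List[int]], maxlen: int, value: int = 0) -> List[List[int]]:
    """Mimic pad_sequences' defaults: keep the last `maxlen` items, left-pad with `value`."""
    padded = []
    for seq in sequences:
        # truncating='pre': drop items from the front when the sequence is too long
        kept = seq[len(seq) - maxlen:] if len(seq) > maxlen else seq
        # padding='pre': add the filler value in front
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


print(pad_pre([[3, 7], [5, 6, 7, 8, 9]], maxlen=4))
# [[0, 0, 3, 7], [6, 7, 8, 9]]
```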

## 5. Prediction

In [None]:
prediction=model.predict([test_vector])

In [None]:
print(prediction)
# One row per review; round its single probability to a 0/1 label
print([int(round(p[0])) for p in prediction])
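If the model ends in a single sigmoid unit (an assumption; the architecture is not shown here), `predict` returns one probability per review with shape `(n_reviews, 1)`. Thresholding explicitly is an alternative to `round()`: Python's `round()` uses round-half-to-even, so `round(0.5)` is `0`, whereas `p >= 0.5` counts exactly 0.5 as positive. `to_labels` is a hypothetical helper.

```python
from typing import List, Sequence


def to_labels(prediction: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Turn rows holding one sigmoid probability each into 0/1 labels."""
    return [int(row[0] >= threshold) for row in prediction]


probs = [[0.93], [0.12], [0.5]]
print(to_labels(probs))              # [1, 0, 1]
print([round(p[0]) for p in probs])  # [1, 0, 0]  (round(0.5) == 0)
```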

In [None]:
ratings
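To use the scraped ratings as a rough check on the predictions, the star values have to be mapped to the same 0/1 labels. The rule below (4 to 5 stars positive, 1 to 2 stars negative, 3 stars skipped as neutral) is an assumption for illustration, since this notebook does not state how the training labels were derived. `accuracy_score` from scikit-learn, imported above, would give the same figure on the filtered pairs; the sketch computes it by hand to stay self-contained.

```python
from typing import List, Optional, Sequence, Tuple


def rating_to_label(stars: int) -> Optional[int]:
    """Assumed mapping: 4-5 stars -> 1, 1-2 stars -> 0, 3 stars -> None (skipped)."""
    if stars >= 4:
        return 1
    if stars <= 2:
        return 0
    return None


def agreement(predicted: Sequence[int], ratings: Sequence[str]) -> Tuple[float, int]:
    """Share of non-neutral reviews where the prediction matches the rating label."""
    pairs: List[Tuple[int, int]] = []
    for pred, rating in zip(predicted, ratings):
        label = rating_to_label(int(rating))  # ratings are scraped as strings like '4'
        if label is not None:
            pairs.append((pred, label))
    if not pairs:
        return 0.0, 0
    hits = sum(p == t for p, t in pairs)
    return hits / len(pairs), len(pairs)


print(agreement([1, 0, 1, 1, 0], ["5", "1", "3", "2", "2"]))  # (0.75, 4)
```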

### Scraping and predicting multiple customer reviews from a website

In [142]:
def web_comments():
    review_list, ratings = scrape_reviews()
    # Same pre-processing as in section 4
    review_list_no_punc = [punctuationRemover(p) for p in review_list]
    review_list_no_stop = [no_stopword(p) for p in review_list_no_punc]
    test_vektor = tokenizer.texts_to_sequences(review_list_no_stop)
    test_vector = sequence.pad_sequences(test_vektor, maxlen=128)
    prediction = model.predict([test_vector])

    # One 0/1 label per review (0 = negative, 1 = positive)
    label = [int(round(p[0])) for p in prediction]
    labels = ['P' if i == 1 else 'N' for i in label]

    # Pair each review with its own label
    neg_com = [review for review, i in zip(review_list, label) if i == 0]
    pos_com = [review for review, i in zip(review_list, label) if i == 1]

    print("")
    print("Predictions")
    print(labels)
    print("Actual ratings")
    print(ratings)
    print("")

    print("List of negative comments")
    print("============================")
    for comment in neg_com:
        print(comment)
        print("")

web_comments()

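Keeping each review together with its predicted label and scraped rating in one record makes the results easy to filter and count without juggling parallel lists. A sketch using a hypothetical `ReviewResult` named tuple and `build_report` helper:

```python
from collections import Counter
from typing import List, NamedTuple, Sequence


class ReviewResult(NamedTuple):
    """Hypothetical record tying a review to its prediction and scraped rating."""
    text: str
    predicted: str  # 'P' or 'N'
    rating: str     # scraped star value, e.g. '4'


def build_report(reviews: Sequence[str], labels: Sequence[str],
                 ratings: Sequence[str]) -> List[ReviewResult]:
    # zip keeps each review aligned with its own label and rating
    return [ReviewResult(t, p, r) for t, p, r in zip(reviews, labels, ratings)]


report = build_report(["Great value", "Stopped working", "Fine"],
                      ["P", "N", "P"], ["5", "1", "3"])
print(Counter(r.predicted for r in report))            # Counter({'P': 2, 'N': 1})
print([r.text for r in report if r.predicted == "N"])  # ['Stopped working']
```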

### Predicting the tone of a single comment 

In [129]:
def single_comment():
    comment = input("Enter comment here- ")

    # Same cleaning steps as for the scraped reviews
    comment_no_stop = no_stopword(punctuationRemover(comment))
    print(comment_no_stop)

    # texts_to_sequences expects a list of texts; a bare string
    # would be tokenized character by character
    test_vektor = tokenizer.texts_to_sequences([comment_no_stop])
    print(test_vektor)
    test_vector = sequence.pad_sequences(test_vektor, maxlen=128)
    print(test_vector)
    prediction = model.predict([test_vector])

    print(int(round(prediction[0][0])))


single_comment()



In [135]:
#comment=input("Enter comment here- ")

comment = "I love this product!"

# Same cleaning steps as for the scraped reviews
comment_no_stop = no_stopword(punctuationRemover(comment))
print(comment_no_stop)

# Wrap the text in a list so it is tokenized word by word
test_vektor = tokenizer.texts_to_sequences([comment_no_stop])
print(test_vektor)
test_vector = sequence.pad_sequences(test_vektor, maxlen=128)
prediction = model.predict([test_vector])

print(int(round(prediction[0][0])))

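The tokenizer has to receive a list because a Python string is itself an iterable of characters, so any function that loops over "each text" in its argument treats a bare string as one text per character. The sketch below uses a toy `texts_to_sequences` stand-in (hypothetical, not the Keras method) to show the difference.

```python
from typing import Dict, Iterable, List

VOCAB: Dict[str, int] = {"love": 1, "product": 2}  # toy word index


def texts_to_sequences(texts: Iterable[str]) -> List[List[int]]:
    """Toy stand-in: one sequence per item in `texts`, unknown words dropped."""
    return [[VOCAB[w] for w in text.split() if w in VOCAB] for text in texts]


print(texts_to_sequences("love product"))    # 12 empty sequences, one per character
print(texts_to_sequences(["love product"]))  # [[1, 2]]
```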


### Connecting the prediction to Anvil 
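I wanted to run this prediction as a web app, so I used Anvil, a platform for building web apps in Python, to take input from the user and return the result. The notebook connects to the app through Anvil's Uplink, which lets functions running here be called from the web app.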

In [None]:
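import os
import anvil.server

# Read the Uplink key from an environment variable instead of hard-coding it
# (ANVIL_UPLINK_KEY is a hypothetical variable name; set it before running)
anvil.server.connect(os.environ["ANVIL_UPLINK_KEY"])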


In [None]:
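@anvil.server.callable
def say_hello(name):
    print("Hello from the uplink, %s!" % name)

In [None]:
import anvil.media

@anvil.server.callable
def sentiment(file):
    # Assumes the uploaded file contains plain UTF-8 text
    with anvil.media.TempFile(file) as filename:
        with open(filename, encoding='utf-8') as f:
            text = f.read()

    # Same cleaning steps as for the scraped reviews
    cleaned = no_stopword(punctuationRemover(text))
    # texts_to_sequences expects a list of texts
    test_vektor = tokenizer.texts_to_sequences([cleaned])
    test_vector = sequence.pad_sequences(test_vektor, maxlen=128)

    score = model.predict(test_vector)
    # Return a plain float so Anvil can serialize it
    return float(score[0][0])

# Keep the uplink running so the Anvil app can call the functions above
anvil.server.wait_forever()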