# Sentiment Analysis with Deep Learning

# Phase 3- Prediction

This notebook contains the functions and code for scraping comments from the Amazon website, pre-processing them, and predicting their sentiment with the model.

### CRISP-DM phase

The Deployment phase of CRISP-DM is covered in this notebook.

#### 6. Deployment

Generally this will mean deploying a code representation of the model into an operating system to score or categorize new unseen data as it arises and to create a mechanism for the use of that new information in the solution of the original business problem. Importantly, the code representation must also include all the data prep steps leading up to modeling so that the model will treat new raw data in the same manner as during model development.

### Table of Contents

- 1. Import Libraries
- 2. Define Functions
- 3. Scraping Comments
- 4. Pre-process the New Data
- 5. Prediction

## 1. Import Libraries

In [None]:
import pandas as pd 
import numpy as np
import string
from nltk.corpus import stopwords
stop = stopwords.words('english')
from sklearn.metrics import accuracy_score
np.random.seed(0)
import pickle

from keras.models import Model, Sequential, Input
from keras.models import load_model
from keras.preprocessing import text, sequence
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
import nltk
from nltk.stem import WordNetLemmatizer

import urllib.request
import urllib.parse
import urllib.error
from bs4 import BeautifulSoup
import ssl
import joblib

### Load the trained model

In [None]:
#model = load_model('cnn_2cnv.h5')

model = joblib.load("models/joblib_RL_Model.pkl")

In [None]:
model.summary()

### Load the tokenizer

Open the saved tokenizer with pickle. This tokenizer was fitted in the pre-processing notebook and saved with pickle. We need it to build the pipeline for the new data we will scrape from the Amazon website.

In [None]:
with open('tokenizer.pickle', 'rb') as handle:
    tokenizer=pickle.load(handle)
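
For reference, the save-and-reload round trip can be sketched as follows. This is only an illustration, not the code from the pre-processing notebook: a plain dictionary (`word_index`, a hypothetical name) stands in for the fitted Keras tokenizer, and the file is written to a temporary directory instead of the project folder.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the fitted tokenizer: a word-to-index mapping.
word_index = {"great": 1, "product": 2, "comfortable": 3}

def save_and_reload(obj, path):
    """Pickle obj to path, then load it back and return the copy."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "rb") as handle:
        return pickle.load(handle)

# Write to a temporary directory so no project file is overwritten.
with tempfile.TemporaryDirectory() as tmp_dir:
    restored = save_and_reload(word_index, os.path.join(tmp_dir, "tokenizer.pickle"))

print(restored == word_index)
```

The reloaded object is an independent copy with the same content, which is why the vocabulary learned during training can be reused unchanged at prediction time.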

## 2. Define Functions

In [None]:
def scrape_reviews ():
    """Scrape customer reviews and star ratings from an Amazon
    product page. The page URL is read from the user with input()."""
    # Ignore SSL certificate errors
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    url=input("Enter Amazon Product Url- ")
    # Make a GET request to retrieve the page, using the SSL context defined above
    html_page = urllib.request.urlopen(url, context=ctx)
    soup = BeautifulSoup(html_page, 'html.parser')

    reviews=[]
    ratings=[]

    review_row=soup.find_all('div', attrs={'data-hook': 'review'})
    for row in review_row:
        # Keep only the first character of the rating text (the star count)
        ratings.append(row.find('span',  attrs={'class':'a-icon-alt'}).text.strip()[0])
        reviews.append(row.find('div', attrs={'data-hook': 'review-collapsed'}).text.strip())
    
    print('There are {} reviews for this product on this page'.format(len(reviews)) )
    #print(reviews)
    
    return reviews, ratings

def punctuationRemover(p):
    '''
    Input: Takes a string. You may have to use str() to force it. 
    Replaces every punctuation character and digit with a space.
    Output: Returns a string.
    '''
    # Raw string so that the backslash is kept as a literal character
    punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~1234567890'''
    return ''.join(' ' if char in punctuations else char for char in p)


def removeStopWords(words):
    """Takes a list of tokens, removes the stopwords and returns the 
    remaining tokens as one lower-case, space-separated string."""
    #select english stopwords
    stop = set(stopwords.words("english"))
    #add custom words
    stop.update(('arnt','this','when','cant','these'))
    #remove stop words
    new_str = ' '.join([word.lower() for word in words if word.lower() not in stop]) 
    return new_str

def lemmatize_verbs(words):
    """Lemmatize verbs in list of tokenized words"""
    lemmatizer = WordNetLemmatizer()
    lemmas = []
    for word in words:
        lemma = lemmatizer.lemmatize(word, pos='v')
        lemmas.append(lemma)
    return lemmas  

## 3. Scraping Comments 

In this part we will scrape the comments for a product from the website. We will also scrape the ratings, so that the model's predictions on unseen data can be evaluated.

In [None]:
review_list, ratings = scrape_reviews()

In [None]:
print(review_list, ratings)

## 4. Pre-process the New Data

Remove all stopwords and punctuation from the comments by using the pre-processing functions, and split each comment into a list of words.
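
The idea behind these steps can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the notebook's pipeline: `SMALL_STOPWORDS` is a hypothetical, deliberately tiny stop-word set (the notebook uses NLTK's English list), `string.punctuation` covers a few more characters than `punctuationRemover`, and whitespace splitting stands in for `nltk.word_tokenize`.

```python
import string

# Hypothetical, deliberately tiny stop-word set for illustration only.
SMALL_STOPWORDS = {"this", "is", "a", "to", "the", "for"}

# Translation table that maps every punctuation character and digit to a space.
_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation + string.digits})

def clean_review(text, stopwords=SMALL_STOPWORDS):
    """Lower-case text, drop punctuation, digits and stop words, return tokens."""
    tokens = text.translate(_TO_SPACE).lower().split()
    return [tok for tok in tokens if tok not in stopwords]

print(clean_review("Great product! Comfortable to use!"))
# ['great', 'product', 'comfortable', 'use']
```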

In [None]:
#test
review_list=['This is for practicing your chords or fingering for a guitar, and does not make a pleasant sound if strummed. This is a neat tool to use on the go, if youre waiting in line at the DMV, passing time in a waiting room or something like that. Its great in the sense that you dont have to disturb other people around you while pracitcing.','Great product! Comfortable to use!']

In [None]:
review_list_no_punc=[punctuationRemover(p) for p in review_list]
review_list_no_punc=[nltk.word_tokenize(words) for words in review_list_no_punc]
review_list_no_stop=[removeStopWords(p) for p in review_list_no_punc]
review_list_no_stop=[nltk.word_tokenize(words) for words in review_list_no_stop]
#lemmatize the verbs (the prediction below uses review_list_no_stop)
review_list_lemm=[lemmatize_verbs(p) for p in review_list_no_stop]

In [None]:
test=review_list_no_stop

In [None]:
print(test)

In [None]:
test_vektor=tokenizer.texts_to_sequences(test)
test_vector=sequence.pad_sequences(test_vektor, maxlen=128)

In [None]:
test_vector
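
What the two calls above do can be sketched in plain Python. The sketch assumes the Keras defaults (padding and truncation both happen at the start of the sequence, and the tokenizer has no out-of-vocabulary token, so unknown words are dropped). `encode_and_pad` and the small `word_index` are hypothetical names used only for illustration.

```python
def encode_and_pad(token_lists, word_index, maxlen=128):
    """Map tokens to integer ids, then left-pad or left-truncate to maxlen."""
    padded = []
    for tokens in token_lists:
        # Words missing from the vocabulary are skipped.
        ids = [word_index[tok] for tok in tokens if tok in word_index]
        # If the review is too long, keep only the last maxlen ids.
        ids = ids[-maxlen:]
        # Fill the front with zeros so every row has the same length.
        padded.append([0] * (maxlen - len(ids)) + ids)
    return padded

# Hypothetical vocabulary; the real one comes from the saved tokenizer.
word_index = {"great": 1, "product": 2, "use": 3}
print(encode_and_pad([["great", "product", "comfortable", "use"]], word_index, maxlen=6))
# [[0, 0, 0, 1, 2, 3]]
```

Every review therefore becomes a row of exactly `maxlen` integers, which is the fixed input shape the model expects.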

## 5. Prediction

In [None]:
prediction=model.predict([test_vector])

In [None]:
print(prediction)
print([round(p[0]) for p in prediction])

In [None]:
ratings
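
One possible way to compare the predictions with the scraped ratings is sketched below. The cut-offs are assumptions: a score above 0.5 counts as positive, and a rating of 4 stars or more counts as positive (the labelling used during training may differ). The function names and the sample values are hypothetical and only illustrate the calculation.

```python
def score_to_label(score, threshold=0.5):
    """Return 1 (positive) if the model score is above the threshold, else 0."""
    return 1 if score > threshold else 0

def rating_to_label(rating, positive_from=4):
    """Return 1 (positive) if the star rating reaches positive_from, else 0."""
    return 1 if int(rating) >= positive_from else 0

def accuracy(scores, ratings):
    """Share of reviews where the predicted label matches the rating label."""
    predicted = [score_to_label(s) for s in scores]
    actual = [rating_to_label(r) for r in ratings]
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# Illustrative values only: two model scores and two star ratings as strings.
print(accuracy([0.9, 0.2], ["5", "1"]))
```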

## Web Comments

### Scraping and predicting multiple customer reviews from a website

In [None]:
def web_comments ():
    """Scrape the reviews of a product, predict their sentiment
    and print the negative ones."""
    #scrape comments from the website
    review_list, ratings = scrape_reviews() 
    #pre-process the data
    review_list_no_punc=[punctuationRemover(p) for p in review_list]
    review_list_no_punc=[nltk.word_tokenize(words) for words in review_list_no_punc]
    review_list_no_stop=[removeStopWords(p) for p in review_list_no_punc]
    review_list_no_stop=[nltk.word_tokenize(words) for words in review_list_no_stop]
    
    #vectorize with the saved tokenizer and predict
    test=review_list_no_stop
    test_vektor=tokenizer.texts_to_sequences(test)
    test_vector=sequence.pad_sequences(test_vektor, maxlen=128)
    prediction=model.predict([test_vector])

    #label each review: P (positive) if the rounded score is 1, otherwise N (negative)
    labels=['P' if int(round(p[0]))==1 else 'N' for p in prediction]
    neg_com=[review for review, label in zip(review_list, labels) if label=='N']
    pos_com=[review for review, label in zip(review_list, labels) if label=='P']
   
    print ("")
    print ("Predictions")
    print(labels)
    print ("Actual Rates")
    print(ratings)
    print ("")
    
    print("List of negative comments")
    print("============================")
    for i in neg_com:
        print (i)
        print ("")
    


In [80]:
web_comments()

Enter Amazon Product Url- https://www.amazon.com/Practice-Fretboard-Ohuhu-Fingerings-Changes/dp/B07X41KVZP?ref_=BSellerC&pf_rd_p=ee73cb28-41b3-5fca-80dd-bbb0229a3312&pf_rd_s=merchandised-search-10&pf_rd_t=101&pf_rd_i=11971311&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=FKB15JYBFWS3QSPPA0K1&pf_rd_r=FKB15JYBFWS3QSPPA0K1&pf_rd_p=ee73cb28-41b3-5fca-80dd-bbb0229a3312


HTTPError: HTTP Error 503: Service Unavailable
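
A 503 response like the one above is often reported when a site rejects requests that do not look like they come from a browser; whether that is the cause here is an assumption. One thing to try is sending a browser-like `User-Agent` header. The sketch below only builds the request and makes no network call. `build_request` and `BROWSER_USER_AGENT` are hypothetical names, and there is no guarantee that Amazon accepts such a request.

```python
import urllib.request

# Hypothetical browser-like identifier; any realistic value can be substituted.
BROWSER_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"

def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a urllib Request for url that carries a User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

request = build_request("https://www.amazon.com/")
# urllib stores header names capitalized, hence "User-agent" here.
print(request.get_header("User-agent"))
```

The resulting object could be passed to `urllib.request.urlopen` in place of the plain URL string.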

## Single Comment

### Predicting the tone of a single comment such as email or text message

In [None]:
#test
#review_list=['This is for practicing your chords or fingering for a guitar, and does not make a pleasant sound if strummed. This is a neat tool to use on the go, if youre waiting in line at the DMV, passing time in a waiting room or something like that. Its great in the sense that you dont have to disturb other people around you while pracitcing.','Great product! Comfortable to use!']

In [78]:
def single_comment ():
    """Read one text from the user and print its tokens,
    the model score and the rounded label."""
    review_list=[input("Enter Text - ")]
    review_list_no_punc=[punctuationRemover(p) for p in review_list]
    review_list_no_punc=[nltk.word_tokenize(words) for words in review_list_no_punc]
    review_list_no_stop=[removeStopWords(p) for p in review_list_no_punc]
    review_list_no_stop=[nltk.word_tokenize(words) for words in review_list_no_stop]
    test=review_list_no_stop
    print(test)
    test_vektor=tokenizer.texts_to_sequences(test)
    test_vector=sequence.pad_sequences(test_vektor, maxlen=128)
    prediction=model.predict([test_vector])
    print(prediction)
    print([round(p[0]) for p in prediction])

In [79]:
single_comment()

Enter Text - This is for practicing your chords or fingering for a guitar, and does not make a pleasant sound if strummed. This is a neat tool to use on the go, if youre waiting in line at the DMV, passing time in a waiting room or something like that. Its great in the sense that you dont have to disturb other people around you while pracitcing.
[['practicing', 'chords', 'fingering', 'guitar', 'make', 'pleasant', 'sound', 'strummed', 'neat', 'tool', 'use', 'go', 'youre', 'waiting', 'line', 'dmv', 'passing', 'time', 'waiting', 'room', 'something', 'like', 'great', 'sense', 'dont', 'disturb', 'people', 'around', 'pracitcing']]
[[0.80374455]]
[1.0]


## Live Demonstration

In [None]:
web_comments()

In [None]:
single_comment()

## Conclusion

I have created a function that scrapes comments from a product page on the Amazon website.

You can use this model to classify the reviews of any Amazon product. 
All you need to do is:

- Run the web_comments function
- It will ask you to input the Amazon product page URL
- Copy and paste the URL and press ENTER

The function reports how many comments there are on the page and labels each one as P (positive) or N (negative). 
It then lists only the negative ones, so that you can see why your customers are complaining.

To get insight for revising your customer service, you would read and analyze, for example, only 3 comments instead of 20.

This can save a lot of time and money in the long run.


## Future Work
***Work in progress***

### Creating a Web App for the Model

For customer use, I would create a web app out of this prediction model.
The web scraping can be adapted to product websites other than Amazon.


#### Connecting the prediction to Anvil
I would like to run this prediction as a web app. To get input from the user and return the results, I tried an interface called Anvil.

In [None]:
import anvil.server

anvil.server.connect("O5X2QNJXLPWEQ2MQIAJTAQHP-AYZTXNLHKK3ZZOB2")


In [None]:
@anvil.server.callable
def say_hello(name):
  print("Hello from the uplink, %s!" % name)

anvil.server.wait_forever()

In [None]:
import anvil.media
@anvil.server.callable
def sentiment(file):
    with anvil.media.TempFile(file) as filename:
        text = url_box(filename)  # url_box is not defined in this notebook
    test_vektor=tokenizer.texts_to_sequences(text)
    test_vector=sequence.pad_sequences(test_vektor, maxlen=128)
    score=model.predict(test_vector)
    return score

In [None]:
!jupyter nbconvert --to FORMAT notebook.ipynb

The connection did not work: it kept throwing errors on the Anvil platform, so I moved on.

#### Creating an app with SPYRE

In [None]:
from spyre import server
app=server.App()
app.launch()

In [None]:
!pip install spyre

There seems to be a version mismatch here.

#### Using MLflow for the machine learning life cycle

I will try to redo this project on this platform to see the difference and to make deployment easier.