# What is Sentiment Analysis?

Sentiment Analysis (also known as opinion mining or emotion AI) is a sub-field of NLP that measures the inclination of people’s opinions (Positive/Negative/Neutral) within unstructured text.

Sentiment Analysis can be performed using two approaches: rule-based and machine-learning-based.

A few applications of Sentiment Analysis:

    Market analysis
    Social media monitoring
    Customer feedback analysis – Brand sentiment or reputation analysis
    Market research

# What is Natural Language Processing (NLP)?

Natural language is the way we humans communicate with each other, in speech or in text. NLP is the automatic manipulation of natural language by software. It is a higher-level term that combines Natural Language Understanding (NLU) and Natural Language Generation (NLG).

    NLP = NLU + NLG

Some of the Python Natural Language Processing (NLP) libraries are:

    Natural Language Toolkit (NLTK)
    TextBlob
    SpaCy
    Gensim
    CoreNLP

We should now have a basic understanding of the terms Sentiment Analysis and NLP.

This article focuses on the rule-based approach to Sentiment Analysis.
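To make "rule-based" concrete, here is a minimal sketch using only the standard library. The tiny `LEXICON`, the `NEGATIONS` set, and both function names are hypothetical and chosen purely for illustration; libraries such as TextBlob and VADER rely on much larger lexicons and more rules.

```python
import re

# Hypothetical mini-lexicon: word -> sentiment score (illustration only)
LEXICON = {"good": 1.0, "best": 1.5, "love": 1.5,
           "cold": -1.0, "poor": -1.5, "late": -1.0}
NEGATIONS = {"not", "never", "no"}


def rule_based_score(text):
    """Sum lexicon scores, flipping the sign of a word that follows a negation."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0.0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0.0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    return score


def label(score):
    """Turn a numeric score into a sentiment label."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


for sentence in ["The best pizza, I love it",
                 "The food was cold and late",
                 "The pizza was not good",
                 "I ordered a pizza"]:
    print(sentence, "->", label(rule_based_score(sentence)))
```

No training data is involved: the result depends entirely on the word list and the hand-written rules, which is what separates this approach from the machine-learning one.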

Data preprocessing steps:

    Cleaning the text
    Tokenization
    Enrichment – POS tagging
    Stopwords removal
    Obtaining the stem words (lemmatization)


 

In [None]:
# https://www.analyticsvidhya.com/blog/2021/06/web-scraping-with-python-beautifulsoup-library/
# https://www.consumeraffairs.com/food/dominos.html?page=2#scroll_to_reviews=true

# Web Scraping With Python: BeautifulSoup Library

base_url = "https://www.consumeraffairs.com/food/dominos.html"
i = 2 # i represents the page number
query_parameter = "?page="+str(i)


In [None]:
!pip install pandas requests BeautifulSoup4

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

In [2]:
# Define the base URL
base_url = "https://www.consumeraffairs.com/food/dominos.html"

# Create an empty list to store all reviews
all_pages_reviews = []


In [3]:
# Create a Scraper function
def scraper():
	# Web scraping - fetching the reviews from the webpage using BeautifulSoup

	# loop through a range of page numbers 
	for i in range(1,6): # fetching reviews from five pages

		# Creating an empty list to store the reviews of each page
		pagewise_reviews = [] 

		# Query parameter
		query_parameter = "?page="+str(i)

		# Constructing the URL
		url = base_url + query_parameter
		
		# Send HTTP request to the URL
		response = requests.get(url)

		# Create a soup object and parse the HTML page
		soup = bs(response.content, 'html.parser') 

		# Find all divs that hold a review, matching on the class attribute
		rev_div = soup.find_all("div", attrs={"class": "rvw-bd"})

		# Keep only the review text from the first <p> tag of each div
		for div in rev_div:
			paragraph = div.find("p")
			if paragraph is not None:
				pagewise_reviews.append(paragraph.text)

		# Add this page's reviews to the combined list
		all_pages_reviews.extend(pagewise_reviews)

	# return the final list of reviews
	return all_pages_reviews

# Driver code
reviews = scraper()
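The extraction step can be tried offline, without sending any HTTP request. The sketch below swaps BeautifulSoup for the standard library's `html.parser` so it runs with no extra packages. `ReviewParser` and `sample_html` are made up for illustration, and the markup is only an assumption about how a page using the `rvw-bd` class might look.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collects the text of the first <p> inside each <div class="rvw-bd">."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._div_depth = 0    # > 0 while inside a review div
        self._in_p = False     # True while reading the first <p> of that div
        self._took_p = False   # True once the first <p> has been read
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self._div_depth or "rvw-bd" in classes:
                if self._div_depth == 0:
                    self._took_p = False
                self._div_depth += 1
        elif tag == "p" and self._div_depth and not self._took_p:
            self._in_p = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.reviews.append("".join(self._chunks).strip())
            self._in_p = False
            self._took_p = True
        elif tag == "div" and self._div_depth:
            self._div_depth -= 1

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


# Made-up markup standing in for one downloaded page
sample_html = """
<div class="rvw-bd"><p>Great pizza, fast delivery.</p><p>Helpful?</p></div>
<div class="ad"><p>Sponsored</p></div>
<div class="rvw-bd"><p>Cold and late.</p></div>
"""

parser = ReviewParser()
parser.feed(sample_html)
print(parser.reviews)
```

Only the first paragraph of each review div is kept, which mirrors what `find("p")` does in the scraper.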

In [4]:
# Storing in a dataframe
i = range(1, len(reviews)+1)
reviews_df = pd.DataFrame({'review':reviews}, index=i)

# Writing to a text file
reviews_df.to_csv('datasets\\webreviews.txt', sep='\t')

In [7]:
# Problem 2
# https://www.analyticsvidhya.com/blog/2021/06/rule-based-sentiment-analysis-in-python/
# Using Sentiment Analysis


# Creating a pandas dataframe from the webreviews.txt file
data = pd.read_csv('datasets\\webreviews.txt',sep='\t')

In [8]:
data.head(10)

,Unnamed: 0,review
0,1,Omg they have the best gluten free pizza ever!...
1,2,Just another good experience with the Domino's...
2,3,"Review for Dominos del Amo bl, Lakewood CA. Ve..."
3,4,I called because my food was cold and not done...
4,5,"OMG, hands down the best pizza I've had from D..."
5,6,The location that was sending my pizza to make...
6,7,For the price is poor value. Delivery plus tip...
7,8,"Versova, I have paid extra amount. When I ask ..."
8,9,The Domino's pizza in Lomita Ca. 90717 on Waln...
9,10,I’m the last year or so they have added jalape...


In [9]:
data.columns

Index(['Unnamed: 0', 'review'], dtype='object')

In [10]:
mydata = data.drop('Unnamed: 0', axis=1)
mydata.head()

,review
0,Omg they have the best gluten free pizza ever!...
1,Just another good experience with the Domino's...
2,"Review for Dominos del Amo bl, Lakewood CA. Ve..."
3,I called because my food was cold and not done...
4,"OMG, hands down the best pizza I've had from D..."
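The `Unnamed: 0` column appears because the dataframe index was written to the file and then read back as an ordinary column. The sketch below reproduces this with an in-memory buffer and two made-up reviews, and shows `index_col=0` as one way to avoid the extra column. It assumes pandas is installed; the variable names are illustrative.

```python
import io

import pandas as pd

# Two made-up reviews with a 1-based index, like reviews_df above
reviews_df = pd.DataFrame({"review": ["Great pizza", "Cold food"]}, index=[1, 2])

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
reviews_df.to_csv(buffer, sep="\t")

# Default read: the saved index comes back as a column named 'Unnamed: 0'
buffer.seek(0)
with_extra = pd.read_csv(buffer, sep="\t")
print(list(with_extra.columns))

# index_col=0 restores the first column as the index instead
buffer.seek(0)
restored = pd.read_csv(buffer, sep="\t", index_col=0)
print(list(restored.columns))
```

Passing `index=False` to `to_csv` is another option when the index carries no information.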


Step 1: Cleaning the text

In this step, we remove special characters and numbers from the text. Python's regular expression module, re, handles this.

In [12]:
import re

In [13]:
# Define a function to clean the text
def clean(text):
    # Replace every run of non-letter characters (digits, punctuation, symbols) with a single space
    text = re.sub('[^A-Za-z]+', ' ', text)
    return text

# Cleaning the text in the review column
mydata['Cleaned Reviews'] = mydata['review'].apply(clean)
mydata.head()

,review,Cleaned Reviews
0,Omg they have the best gluten free pizza ever!...,Omg they have the best gluten free pizza ever ...
1,Just another good experience with the Domino's...,Just another good experience with the Domino s...
2,"Review for Dominos del Amo bl, Lakewood CA. Ve...",Review for Dominos del Amo bl Lakewood CA Very...
3,I called because my food was cold and not done...,I called because my food was cold and not done...
4,"OMG, hands down the best pizza I've had from D...",OMG hands down the best pizza I ve had from Do...
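To see what the pattern `[^A-Za-z]+` does to a single string, here is a small standalone check. The sample sentence is made up for illustration.

```python
import re


def clean(text):
    # Same pattern as above: each run of non-letters becomes one space
    return re.sub('[^A-Za-z]+', ' ', text)


sample = "OMG, hands down the best pizza I've had in 2021!!"
print(repr(clean(sample)))
# 'OMG hands down the best pizza I ve had in '
```

Two side effects are worth knowing: the apostrophe splits "I've" into "I" and "ve", and trailing punctuation leaves a trailing space, which `str.strip()` would remove if it matters downstream.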


In [15]:
!pip install nltk

Collecting nltk
  Downloading nltk-3.6.7-py3-none-any.whl (1.5 MB)
Collecting regex>=2021.8.3
  Downloading regex-2021.11.10-cp36-cp36m-win_amd64.whl (272 kB)
Installing collected packages: regex, nltk
  Attempting uninstall: regex
    Found existing installation: regex 2020.11.13
    Uninstalling regex-2020.11.13:
      Successfully uninstalled regex-2020.11.13
Successfully installed nltk-3.6.7 regex-2021.11.10


In [16]:
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
from nltk import pos_tag
nltk.download('stopwords')
from nltk.corpus import stopwords
nltk.download('wordnet')
from nltk.corpus import wordnet

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\rajkumar.mo\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt.zip.
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\rajkumar.mo\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\rajkumar.mo\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\wordnet.zip.


In [19]:
nltk.download('omw-1.4')

[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\rajkumar.mo\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\omw-1.4.zip.


True

In [21]:
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\rajkumar.mo\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.


True

In [22]:
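# Map the first letter of a Penn Treebank tag to the matching WordNet POS constant
pos_dict = {'J':wordnet.ADJ, 'V':wordnet.VERB, 'N':wordnet.NOUN, 'R':wordnet.ADV}
# Build the stopword set once instead of on every token
stop_words = set(stopwords.words('english'))
def token_stop_pos(text):
    # Tokenize, POS-tag, then drop stopwords; tags outside pos_dict map to None
    tags = pos_tag(word_tokenize(text))
    newlist = []
    for word, tag in tags:
        if word.lower() not in stop_words:
            newlist.append((word, pos_dict.get(tag[0])))
    return newlist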

mydata['POS tagged'] = mydata['Cleaned Reviews'].apply(token_stop_pos)
mydata.head()

,review,Cleaned Reviews,POS tagged
0,Omg they have the best gluten free pizza ever!...,Omg they have the best gluten free pizza ever ...,"[(Omg, None), (best, a), (gluten, a), (free, a..."
1,Just another good experience with the Domino's...,Just another good experience with the Domino s...,"[(another, None), (good, a), (experience, n), ..."
2,"Review for Dominos del Amo bl, Lakewood CA. Ve...",Review for Dominos del Amo bl Lakewood CA Very...,"[(Review, n), (Dominos, n), (del, None), (Amo,..."
3,I called because my food was cold and not done...,I called because my food was cold and not done...,"[(called, v), (food, n), (cold, a), (done, v),..."
4,"OMG, hands down the best pizza I've had from D...",OMG hands down the best pizza I ve had from Do...,"[(OMG, n), (hands, v), (best, a), (pizza, n), ..."
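The `None` entries in the output come from the tag mapping: only tags starting with J, V, N or R have a WordNet equivalent. The sketch below isolates that lookup. It hard-codes the single-letter values (`'a'`, `'v'`, `'n'`, `'r'`) that appear in the output above so that it runs without NLTK; `to_wordnet_pos` is a hypothetical helper name.

```python
# Single-letter WordNet POS codes, as seen in the 'POS tagged' column
POS_DICT = {"J": "a", "V": "v", "N": "n", "R": "r"}


def to_wordnet_pos(treebank_tag):
    """Map a Penn Treebank tag to a WordNet POS code using its first letter."""
    return POS_DICT.get(treebank_tag[0])


for tag in ["JJS", "VBD", "NNP", "RB", "DT"]:
    print(tag, "->", to_wordnet_pos(tag))
```

Determiners, pronouns, prepositions and similar tags return `None`, and the lemmatization step later leaves those words unchanged.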


In [23]:
# Example: word tokenization
from nltk.tokenize import word_tokenize
text="Hello there! Welcome to the programming world."
print(word_tokenize(text))

['Hello', 'there', '!', 'Welcome', 'to', 'the', 'programming', 'world', '.']


In [24]:
# Example: sentence tokenization

from nltk.tokenize import sent_tokenize
text="It’s easy to point out someone else’s mistake. Harder to recognize your own."
print(sent_tokenize(text))

['It’s easy to point out someone else’s mistake.', 'Harder to recognize your own.']


In [26]:
#https://www.geeksforgeeks.org/part-speech-tagging-stop-words-using-nltk-python/
# pos_tag expects a list of tokens, not a raw string, so tokenize first
tokens = word_tokenize("This is an article on Sentiment Analysis")
pos = nltk.pos_tag(tokens)
pos

In [28]:
# lemmatize takes the (word, POS) tuples produced above and returns the lemma of each
# word based on its POS, joined into one string. Applying it to the 'POS tagged'
# column creates the 'Lemma' column.

from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()
def lemmatize(pos_data):
    lemma_rew = " "
    for word, pos in pos_data:
        # Words without a WordNet POS are kept unchanged
        lemma = wordnet_lemmatizer.lemmatize(word, pos=pos) if pos else word
        lemma_rew = lemma_rew + " " + lemma
    return lemma_rew

mydata['Lemma'] = mydata['POS tagged'].apply(lemmatize)
mydata.head()

,review,Cleaned Reviews,POS tagged,Lemma
0,Omg they have the best gluten free pizza ever!...,Omg they have the best gluten free pizza ever ...,"[(Omg, None), (best, a), (gluten, a), (free, a...",Omg best gluten free pizza ever love us glut...
1,Just another good experience with the Domino's...,Just another good experience with the Domino s...,"[(another, None), (good, a), (experience, n), ...",another good experience Domino Pizza store K...
2,"Review for Dominos del Amo bl, Lakewood CA. Ve...",Review for Dominos del Amo bl Lakewood CA Very...,"[(Review, n), (Dominos, n), (del, None), (Amo,...",Review Dominos del Amo bl Lakewood CA highly...
3,I called because my food was cold and not done...,I called because my food was cold and not done...,"[(called, v), (food, n), (cold, a), (done, v),...",call food cold do right miss item call answe...
4,"OMG, hands down the best pizza I've had from D...",OMG hands down the best pizza I ve had from Do...,"[(OMG, n), (hands, v), (best, a), (pizza, n), ...",OMG hand best pizza Domino pizza Southaven M...


In [29]:
mydata[['review','Lemma']]

,review,Lemma
0,Omg they have the best gluten free pizza ever!...,Omg best gluten free pizza ever love us glut...
1,Just another good experience with the Domino's...,another good experience Domino Pizza store K...
2,"Review for Dominos del Amo bl, Lakewood CA. Ve...",Review Dominos del Amo bl Lakewood CA highly...
3,I called because my food was cold and not done...,call food cold do right miss item call answe...
4,"OMG, hands down the best pizza I've had from D...",OMG hand best pizza Domino pizza Southaven M...
...,...,...
125,S. P road Gaya Domino's denied to provide ever...,P road Gaya Domino deny provide every day va...
126,I ordered a 12inch pizza and a pasta bowl. I f...,order inch pizza pasta bowl find long black ...
127,I place the order at Domino's at 1801 Valley V...,place order Domino Valley View Drive p Satur...
128,After an hour passed and refused cold uncooked...,hour pass refuse cold uncooked way pizza man...


Sentiment Analysis using TextBlob:

TextBlob is a Python library for processing textual data. It provides a consistent API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more.

The two measures that are used to analyze the sentiment are:

    Polarity – how positive or negative the opinion is
    Subjectivity – how subjective the opinion is

    TextBlob(text).sentiment returns the polarity and subjectivity values.
    Polarity ranges from -1 to 1 (1 is most positive, 0 is neutral, -1 is most negative).
    Subjectivity ranges from 0 to 1 (0 is very objective, 1 is very subjective).


In [31]:
!pip install textblob

Collecting textblob
  Downloading textblob-0.17.1-py2.py3-none-any.whl (636 kB)
Installing collected packages: textblob
Successfully installed textblob-0.17.1


In [33]:
from textblob import TextBlob
res = TextBlob("I love horror films").sentiment
res

Sentiment(polarity=0.5, subjectivity=0.6)

In [37]:
# function to calculate subjectivity
def getSubjectivity(review):
    return TextBlob(review).sentiment.subjectivity

# function to calculate polarity
def getPolarity(review):
    return TextBlob(review).sentiment.polarity

# function to analyze the reviews
def analysis(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'

In [38]:
fin_data = pd.DataFrame(mydata[['review', 'Lemma']])

In [39]:
# fin_data['Subjectivity'] = fin_data['Lemma'].apply(getSubjectivity) 
fin_data['Polarity'] = fin_data['Lemma'].apply(getPolarity) 
fin_data['Analysis'] = fin_data['Polarity'].apply(analysis)
fin_data.head()

,review,Lemma,Polarity,Analysis
0,Omg they have the best gluten free pizza ever!...,Omg best gluten free pizza ever love us glut...,0.470455,Positive
1,Just another good experience with the Domino's...,another good experience Domino Pizza store K...,0.233333,Positive
2,"Review for Dominos del Amo bl, Lakewood CA. Ve...",Review Dominos del Amo bl Lakewood CA highly...,0.3735,Positive
3,I called because my food was cold and not done...,call food cold do right miss item call answe...,0.217143,Positive
4,"OMG, hands down the best pizza I've had from D...",OMG hand best pizza Domino pizza Southaven M...,0.538889,Positive


In [40]:
fin_data['Analysis'].value_counts()

Positive    67
Negative    56
Neutral      7
Name: Analysis, dtype: int64

In [44]:
# Sentiment Analysis using VADER

# VADER stands for Valence Aware Dictionary and Sentiment Reasoner. It reports not only whether a statement is positive or negative but also the intensity of the emotion.
!pip install vaderSentiment


Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2


In [45]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

vs = analyzer.polarity_scores("I love horror films")
vs


{'neg': 0.374, 'neu': 0.202, 'pos': 0.424, 'compound': 0.128}
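The `compound` value is a single score squeezed into the range -1 to 1. To my understanding, the vaderSentiment implementation does this by normalizing the summed, rule-adjusted word valences with x / sqrt(x² + α), where α = 15. The sketch below reimplements only that normalization in plain Python; the function name `normalize_valence` is illustrative, and the lexicon lookup and heuristics that produce the sum are not shown.

```python
import math


def normalize_valence(total, alpha=15):
    """Map an unbounded valence sum into the open interval (-1, 1)."""
    return total / math.sqrt(total * total + alpha)


for total in [0.0, 0.5, 3.0, -3.0, 10.0]:
    print(total, "->", round(normalize_valence(total), 3))
```

A valence sum of 0.5 normalizes to about 0.128, which is consistent with the compound score in the output above. Larger sums approach 1 or -1 without ever reaching them.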

In [48]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
# function to calculate vader sentiment
def vadersentimentanalysis(review):
    vs = analyzer.polarity_scores(review)
    return vs['compound']

fin_data['Vader Sentiment'] = fin_data['Lemma'].apply(vadersentimentanalysis)

# function to analyse
def vader_analysis(compound):
    if compound >= 0.5:
        return 'Positive'
    elif compound <= -0.5 :
        return 'Negative'
    else:
        return 'Neutral'

fin_data['Vader Analysis'] = fin_data['Vader Sentiment'].apply(vader_analysis)
fin_data.head()

,review,Lemma,Polarity,Analysis,Vader Sentiment,Vader Analysis
0,Omg they have the best gluten free pizza ever!...,Omg best gluten free pizza ever love us glut...,0.470455,Positive,0.9872,Positive
1,Just another good experience with the Domino's...,another good experience Domino Pizza store K...,0.233333,Positive,0.9371,Positive
2,"Review for Dominos del Amo bl, Lakewood CA. Ve...",Review Dominos del Amo bl Lakewood CA highly...,0.3735,Positive,0.9897,Positive
3,I called because my food was cold and not done...,call food cold do right miss item call answe...,0.217143,Positive,0.7579,Positive
4,"OMG, hands down the best pizza I've had from D...",OMG hand best pizza Domino pizza Southaven M...,0.538889,Positive,0.9312,Positive


In [49]:
vader_counts = fin_data['Vader Analysis'].value_counts()
vader_counts

Neutral     58
Positive    47
Negative    25
Name: Vader Analysis, dtype: int64
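The large Neutral group is partly a consequence of the ±0.5 cut-off used in `vader_analysis`. The VADER documentation suggests ±0.05 as the typical thresholds, so the choice here is a stricter one. The sketch below compares the two settings on a few scores; `label_compound` is a hypothetical helper, and the last two scores are made up.

```python
def label_compound(compound, threshold):
    """Label a compound score using a symmetric threshold."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# 0.9872 and 0.128 appear in the outputs above; -0.3 and 0.0 are made up
sample_scores = [0.9872, 0.128, -0.3, 0.0]

for score in sample_scores:
    strict = label_compound(score, 0.5)
    typical = label_compound(score, 0.05)
    print(f"{score:>7}: threshold 0.5 -> {strict:<8} threshold 0.05 -> {typical}")
```

With the stricter threshold, mildly positive or mildly negative reviews fall into Neutral, so the label counts from TextBlob and VADER in this notebook are not directly comparable.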