# Sentiment Analysis using NLTK Vader, TextBlob and Pattern

***Let's Import the libraries and the data set***

In [None]:
#installing the third-party regex package (the code below only uses Python's built-in re module)
!pip install regex

import warnings
warnings.filterwarnings('ignore')


In [None]:
# importing the libraries
import re           #importing the regular expression library
import pandas as pd
import numpy as np

***Reading the data***

In [None]:
with open("hotel_reviews.txt", "r") as f:
    text = f.readlines()

***Cleaning the Data***

In [None]:
#Function to remove special characters and numbers from the text

def remove(x):
    pattern = r"[\n',@?.$%_0-9]"
    x = [re.sub(pattern, '', i) for i in x]
    return x
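
As a quick check of what this character class strips, here is a minimal sketch on two made-up lines. The sample strings are invented for illustration and are not taken from `hotel_reviews.txt`; the pattern is the same class written as a raw string.

```python
import re

# Same character class as in remove(), written as a raw string.
# It deletes newlines, apostrophes, commas, @ ? . $ % _ and all digits.
PATTERN = r"[\n',@?.$%_0-9]"

def remove(lines):
    """Strip the unwanted characters from every line."""
    return [re.sub(PATTERN, "", line) for line in lines]

# Hypothetical sample lines, invented for illustration.
sample = ["Great stay, 10/10!\n", "Room wasn't clean @ check-in.\n"]
cleaned = remove(sample)
print(cleaned)
```

Characters outside the class, such as `!`, `/`, `-` and tab characters, are left untouched, which matters because the text is later split on `'\t'`.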

In [None]:
clean = remove(text)

In [None]:
#Converting the list type data to string type
def listToString(s):
    # join all the list elements into a single string
    return "".join(s)

In [None]:
clean_text = listToString(clean)


In [None]:
#Splitting the text
splitted_text = clean_text.split('\t')


In [None]:
#loading the clean data in pandas dataframe

data = clean_text
df = pd.DataFrame([x.split(',') for x in data.split('\t')])

#Naming the column of the dataframe 
df.columns = ["Reviews"]
df.head()

In [None]:
#Dropping the rows whose text still ends with a newline character '\n'
df_clean = df[~df['Reviews'].astype(str).str.endswith('\n')]

In [None]:
df_clean.head(10)

In [None]:
#Removing the empty row from the top
df_final = df_clean.drop([0])

### Sentiment Analysis Using NLTK Vader

**Downloading the NLTK library**

In [None]:
!pip install nltk
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [None]:
sia = SentimentIntensityAnalyzer()

NLTK Vader describes the sentiment of a text with scores in three categories: negative, neutral, and positive.

Along with these, a compound score is calculated for each text passed to NLTK Vader. The compound score summarizes the overall sentiment of the text in a single number.

The compound score ranges from -1 (most extreme negative) to +1 (most extreme positive). Because it is normalized, scores from different texts can be compared and averaged directly.
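
To make the normalization concrete, here is a minimal sketch of how an unbounded score can be squeezed into the range -1 to +1. The formula and the default `alpha=15` follow our understanding of the VADER reference implementation; treat them as an assumption and check the library source before relying on them. The `label` helper and its 0.05 threshold follow the convention suggested by the VADER authors and are not part of the NLTK API.

```python
import math

def normalize(score, alpha=15):
    """Map an unbounded valence sum onto the interval [-1, 1]."""
    norm = score / math.sqrt(score * score + alpha)
    # Clip defensively so rounding can never push the value out of range.
    return max(-1.0, min(1.0, norm))

def label(compound, threshold=0.05):
    """Turn a compound score into a coarse label (hypothetical helper)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Larger raw sums move towards +1 or -1 but never pass them.
for raw in (-10, -1, 0, 1, 10):
    print(raw, round(normalize(raw), 3), label(normalize(raw)))
```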

In [None]:
# score each review once, then spread the four values into separate columns
scores = df_final['Reviews'].apply(sia.polarity_scores)
for key in ['neg', 'neu', 'pos', 'compound']:
    df_final[key] = scores.apply(lambda s, k=key: s[k])

In [None]:
df_final.head()

In [None]:
vader_comp = df_final["compound"].mean()

#### NLTK Vader Results

In [None]:
print("The average compound score of the hotel reviews is:", vader_comp)

### Sentiment Analysis Using TextBlob

**Let's download the TextBlob library**

In [None]:
!pip install textblob

In [None]:
#importing the textblob library 
from textblob import TextBlob

Let's create a function to calculate the polarity of each review in our dataset.

We will store the resulting polarity scores, one per review, in a new pandas Series called df_textblob.

In [None]:
def detect_polarity(text):
    return TextBlob(text).sentiment.polarity

In [None]:
df_textblob = df_final.Reviews.apply(detect_polarity)

Unlike NLTK Vader, which gives us four scores, our TextBlob function returns a single polarity score for each review. TextBlob also computes a subjectivity score, which we do not use here.

In [None]:
df_textblob.head()

Let's calculate the mean of all of these scores to get a summarized view of all the polarity scores.


In [None]:
tb_polarity = df_textblob.mean()

#### TextBlob Results

In [None]:
print("The average polarity of the hotel reviews is:", tb_polarity)

### Sentiment Analysis Using Pattern

**Let's download the Pattern library**

In [None]:
!pip install pattern

In [None]:
#importing the sentiment function from the pattern library
#(aliased so that a variable named sentiment cannot overwrite it later)
nltk.download('omw-1.4')
from pattern.en import sentiment as pattern_sentiment

Let's create our own custom function, senti_pattern, to find the sentiment scores for each of the reviews in our hotel review data set.

In [None]:
def senti_pattern(text):
    return pattern_sentiment(text)

In [None]:
df_pattern = df_final.Reviews.apply(senti_pattern)

In the results, the first value of each pair is the sentiment score (also known as polarity) and the second value is the subjectivity score.

In [None]:
df_pattern.head()

Each entry of the output is a (polarity, subjectivity) tuple. To work with the two scores separately, we unpack the entries with Python's zip function. This gives us one tuple holding all the sentiment scores and another holding all the subjectivity scores.
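
The unpacking step can be sketched on a small, made-up list of pairs. The numbers below are invented for illustration and are not results from the hotel reviews; the name `polarity` is used here instead of `sentiment` only to keep the example from reusing the name of the function imported from `pattern.en`.

```python
# Hypothetical (polarity, subjectivity) pairs, invented for illustration.
pairs = [(0.5, 0.6), (-0.25, 0.4), (0.0, 0.1)]

# zip(*pairs) passes every pair as a separate argument to zip, which then
# groups all first elements together and all second elements together.
polarity, subjectivity = zip(*pairs)

print(polarity)
print(subjectivity)
```

Note that the two-name unpacking only works when every entry has exactly two values and there is at least one entry.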

In [None]:
sentiment,subjectivity = zip(*df_pattern)

In [None]:
sentiment

To find the average sentiment score and the average subjectivity score, let's build a custom function that computes the mean of a tuple of values.

In [None]:
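def average(values):
    total = 0   # named total so the built-in sum() is not shadowed
    count = 0
    for num in values:
        try:
            total += float(num)
            count += 1
        except (TypeError, ValueError):
            pass  # skip entries that cannot be converted to a number
    return total / count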
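
As an alternative, the same idea can be sketched with the standard library's `statistics.fmean`. This variant is only an illustration: it skips entries that cannot be read as numbers, as the function above does, and additionally returns NaN for an empty input instead of raising a division error. The sample values are invented.

```python
from statistics import fmean

def average(values):
    """Mean of the entries that can be read as floats; NaN if there are none."""
    numbers = []
    for value in values:
        try:
            numbers.append(float(value))
        except (TypeError, ValueError):
            continue  # skip entries such as None or non-numeric strings
    return fmean(numbers) if numbers else float("nan")

# Hypothetical scores, invented for illustration; None is skipped.
scores = (0.5, -0.25, None, "0.5")
print(average(scores))
```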

#### Pattern Results

In [None]:
print("The average sentiment of the hotel reviews is:",'%7.2f'%(average(sentiment)))
print("The average subjectivity of the hotel reviews is:",'%7.2f'%(average(subjectivity)))

### NLTK Vader vs TextBlob vs Pattern

Now let's compare the results of the three libraries.

In [None]:
#importing matplotlib to visualize the results (numpy is already imported as np)
import matplotlib.pyplot as plt

In [None]:
labels = ['NLTK Vader','TextBlob','Pattern']
# average score per library, hard-coded here rather than read from the variables computed above
sentiment_count = [0.59,0.31,0.32]

x = np.arange(len(labels))  # the label locations
width = 0.35  # the width of the bars

In [None]:
fig, ax = plt.subplots()
rects = ax.bar(x, sentiment_count, width, label='NLP_Libraries')

ax.set_ylabel('Polarity_Score')
ax.set_title('Comparison of NLP Libraries')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()

plt.show()