# Import Packages

In [1]:
# Importing Packages
import pandas as pd

# Import NLP Packages
from textblob import TextBlob 

# Sentiment Analysis

## Import DataFrame

In [2]:
# Import csv file
df = pd.read_csv('../csv/Hotel_Reviews.csv')

In [3]:
# Selecting only the columns that I will use
features = ['Negative_Review','Positive_Review', 'Reviewer_Score']
df = df[features]

## Create a function for Sentiment Analysis

In this step, I will run sentiment analysis on the reviews. Normally, this step would come after cleaning the text for NLP. However, previous tests showed that cleaning the text does not change the sentiment scores TextBlob produces.

Running sentiment analysis takes a long time because the dataset has more than 515K observations. For this reason, once the sentiment scores are computed, I will save the DataFrame to a CSV file and load it back, so the analysis does not have to run again.
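The same compute-once-then-reload idea can be sketched with the standard library's `pickle` module instead of CSV. This is a minimal sketch under stated assumptions, not the notebook's code: the helper name `load_or_compute`, the cache path, and the stand-in `slow_scores` function are all hypothetical.

```python
import os
import pickle
import tempfile


def load_or_compute(cache_path, compute):
    """Return the cached result if the file exists; otherwise compute and cache it.

    Hypothetical helper: `compute` is any zero-argument callable whose result
    can be pickled (a list, a dict, a pandas DataFrame, ...).
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def slow_scores():
    """Stand-in for the slow step; records each time it really runs."""
    calls.append(1)
    return [0.15, -0.07, 0.3]  # made-up polarity scores for illustration


# Hypothetical cache location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "scores.pkl")
first = load_or_compute(path, slow_scores)   # runs slow_scores and writes the cache
second = load_or_compute(path, slow_scores)  # reads the cache instead
print(len(calls))  # 1
```

For a DataFrame, pandas also offers `df.to_pickle(path)` and `pd.read_pickle(path)`; unlike a CSV round trip, a pickle keeps column dtypes and the index. Only unpickle files you created yourself, because loading a pickle can execute arbitrary code.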

In [4]:
# Create a function to get subjectivity (0 = very objective, 1 = very subjective)
def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity

# Create a function to get polarity (-1 = very negative, 1 = very positive)
def getPolarity(text):
    return TextBlob(text).sentiment.polarity

<b>NOTE:</b>

The following cell takes around 10 minutes to run. For this reason, I will save the DataFrame to a CSV file and load it again.

In [5]:
# # Create new columns with the polarity of the Negative and Positive Reviews
# df['Polarity_Net'] = df['Negative_Review'].apply(getPolarity)
# df['Polarity_Pos'] = df['Positive_Review'].apply(getPolarity)

In [6]:
# # Saving csv with sentiment analysis
# df.to_csv("../csv/sentiment_analysis.csv")

In [7]:
# Importing DataFrame with the new Polarity columns
df = pd.read_csv("../csv/sentiment_analysis.csv", index_col=0)

In [8]:
# Classify each polarity score: 0 = negative, 1 = neutral, 2 = positive.
# Every negative score is class 0, so the neutral band is 0 <= x < 0.1.
df['Sent_Analysis_Neg'] = df['Polarity_Net'].apply(lambda x: 0 if x < 0 else 1 if x < 0.1 else 2)
df['Sent_Analysis_Pos'] = df['Polarity_Pos'].apply(lambda x: 0 if x < 0 else 1 if x < 0.1 else 2)
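Written as a named function, the classification rule is easier to test at its boundaries. This is a minimal sketch; `classify_polarity` is a hypothetical name, and it follows the rule used in the cell above, where any negative score is class 0 and the neutral band is 0 <= x < 0.1.

```python
def classify_polarity(x):
    """Map a TextBlob polarity score to 0 (negative), 1 (neutral) or 2 (positive)."""
    if x < 0:
        return 0  # any negative polarity
    if x < 0.1:
        return 1  # 0 <= x < 0.1: close to neutral
    return 2      # x >= 0.1: positive


# Polarity_Net of the first five rows shown in the next output.
neg_polarity = [0.028671, 0.15, 0.032653, -0.07037, -0.009091]
print([classify_polarity(p) for p in neg_polarity])  # [1, 2, 1, 0, 0]
```

If a symmetric neutral band around zero was intended instead, the first test would become `x <= -0.1`; with that change, a slightly negative score such as -0.009091 would be classed as neutral rather than negative.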

In [9]:
df.head()

| | Negative_Review | Positive_Review | Reviewer_Score | Polarity_Net | Polarity_Pos | Sent_Analysis_Neg | Sent_Analysis_Pos |
|---|---|---|---|---|---|---|---|
| 0 | I am so angry that i made this post available... | Only the park outside of the hotel was beauti... | 2.9 | 0.028671 | 0.283333 | 1 | 2 |
| 1 | No Negative | No real complaints the hotel was great great ... | 7.5 | 0.15 | 0.24196 | 2 | 2 |
| 2 | Rooms are nice but for elderly a bit difficul... | Location was good and staff were ok It is cut... | 7.1 | 0.032653 | 0.46 | 1 | 2 |
| 3 | My room was dirty and I was afraid to walk ba... | Great location in nice surroundings the bar a... | 3.8 | -0.07037 | 0.625 | 0 | 2 |
| 4 | You When I booked with your company on line y... | Amazing location and building Romantic setting | 6.7 | -0.009091 | 0.3 | 0 | 2 |


### Findings and Takeaways:

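- Polarity features (`Polarity_Net` and `Polarity_Pos`) were created with TextBlob for the Negative and Positive Reviews and then grouped into three sentiment classes. A subjectivity function was also defined, but only polarity was applied to the data.
- Polarity ranges from -1 to 1, where -1 means the review is very negative and 1 means it is very positive.
- Sentiment analysis seems to do a good job identifying positive reviews, but it struggles with negative ones. In the first five rows, every Positive_Review scores above 0.24, while the Negative_Review scores stay close to zero (between -0.07 and 0.15), and the placeholder text "No Negative" even receives a positive polarity of 0.15.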

## Evaluation Results

## Target Variable

In this section, I will create a target variable and use it to train my models. I will turn the Reviewer_Score feature into three classes:

- <b>0 - Bad:</b> Scores below 5
- <b>1 - Regular:</b> Scores from 5 up to (but not including) 7
- <b>2 - Good:</b> Scores of 7 and above
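The class boundaries can be checked with a small plain-Python version of the rule applied to `Reviewer_Score` in the `Score` cell below. This is a sketch; `score_to_class` is a hypothetical name.

```python
def score_to_class(score):
    """Turn a Reviewer_Score into 0 (Bad), 1 (Regular) or 2 (Good)."""
    if score < 5:
        return 0  # Bad: below 5
    if score < 7:
        return 1  # Regular: 5 up to, but not including, 7
    return 2      # Good: 7 and above


scores = [2.9, 7.5, 7.1, 3.8, 6.7]  # Reviewer_Score of the first five rows
print([score_to_class(s) for s in scores])  # [0, 2, 2, 0, 1]
```

A score of exactly 7 lands in Good and exactly 5 in Regular, which is why the list above says "7 and above" rather than "above 7".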


In [16]:
# Turn the Reviewer Score into a classification target with 3 classes
df['Score'] = df['Reviewer_Score'].apply(lambda x: 0 if x < 5 else 1 if x < 7 else 2)

In [17]:
# Combine the two sentiment classes into one score from 0 (both negative) to 4 (both positive)
df['SA_Score'] = df.Sent_Analysis_Neg + df.Sent_Analysis_Pos
# df.drop(columns=['Polarity_Net','Polarity_Pos','Sent_Analysis_Neg','Sent_Analysis_Pos'], inplace=True)
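`SA_Score` adds two classes that each run from 0 to 2, so it can take any value from 0 to 4, including 0 when both halves of a review are classed as negative. The standard-library sketch below enumerates the possible combinations (not their frequency in the data) and shows one way to collapse them into three classes that covers 0 explicitly. `sa_to_sentiment` is a hypothetical name, and grouping 0 with 1 is an assumption about the intended mapping.

```python
from collections import Counter
from itertools import product

# Each review half is classed 0 (negative), 1 (neutral) or 2 (positive),
# so their sum, SA_Score, ranges from 0 to 4.
combos = list(product([0, 1, 2], repeat=2))  # (neg_class, pos_class) pairs
sa_counts = Counter(neg + pos for neg, pos in combos)
print(sorted(sa_counts.items()))  # [(0, 1), (1, 2), (2, 3), (3, 2), (4, 1)]


def sa_to_sentiment(sa_score):
    """Collapse SA_Score (0-4) into 3 classes; 0 and 1 both map to class 0."""
    if sa_score <= 1:
        return 0
    if sa_score == 2:
        return 1
    return 2  # 3 or 4


print([sa_to_sentiment(s) for s in range(5)])  # [0, 0, 1, 2, 2]
```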

In [18]:
# Collapse SA_Score into 3 classes: 0-1 -> 0, 2 -> 1, 3-4 -> 2 (SA_Score 0 must also map to 0)
df['Sentiment_Analysis'] = df['SA_Score'].apply(lambda x: 0 if x <= 1 else 1 if x == 2 else 2)
df.head()

| | Negative_Review | Positive_Review | Reviewer_Score | Polarity_Net | Polarity_Pos | Sent_Analysis_Neg | Sent_Analysis_Pos | Score | SA_Score | Sentiment_Analysis |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | I am so angry that i made this post available... | Only the park outside of the hotel was beauti... | 2.9 | 0.028671 | 0.283333 | 1 | 2 | 0 | 3 | 2 |
| 1 | No Negative | No real complaints the hotel was great great ... | 7.5 | 0.15 | 0.24196 | 2 | 2 | 2 | 4 | 2 |
| 2 | Rooms are nice but for elderly a bit difficul... | Location was good and staff were ok It is cut... | 7.1 | 0.032653 | 0.46 | 1 | 2 | 2 | 3 | 2 |
| 3 | My room was dirty and I was afraid to walk ba... | Great location in nice surroundings the bar a... | 3.8 | -0.07037 | 0.625 | 0 | 2 | 0 | 2 | 1 |
| 4 | You When I booked with your company on line y... | Amazing location and building Romantic setting | 6.7 | -0.009091 | 0.3 | 0 | 2 | 1 | 2 | 1 |
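One way to see how well the sentiment classes line up with the score-based target is to compare `Score` and `Sentiment_Analysis` row by row. The sketch below does this in plain Python for the five rows shown above only; it illustrates the check and is not a result for the full dataset.

```python
from collections import Counter

# Score and Sentiment_Analysis for the five rows shown above.
score = [0, 2, 2, 0, 1]
sentiment = [2, 2, 2, 1, 1]

matches = sum(s == p for s, p in zip(score, sentiment))
accuracy = matches / len(score)
confusion = Counter(zip(score, sentiment))  # (Score, Sentiment_Analysis) -> count

print(accuracy)  # 0.6
print(sorted(confusion.items()))  # [((0, 1), 1), ((0, 2), 1), ((1, 1), 1), ((2, 2), 2)]
```

In this sample both Bad reviews (`Score` 0) end up as neutral or positive sentiment, in line with the observation that negative reviews are the weak spot. On the whole DataFrame, `(df['Score'] == df['Sentiment_Analysis']).mean()` and `pd.crosstab(df['Score'], df['Sentiment_Analysis'])` compute the same two quantities.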
