# SENTIMENT ANALYSIS FOR ZOMATO REVIEWS

![](https://www.expressanalytics.com/wp-content/uploads/2021/06/sentimentanalysishotelgeneric-2048x803-1.jpg)

Sentiment analysis is the contextual mining of text: it identifies and extracts subjective information from source material, helping a business understand the social sentiment around its brand, product or service while monitoring online conversations. It is one of the most common text classification tasks: an incoming message is analysed and assigned a sentiment score between -1 (most negative) and 1 (most positive), with 0 being neutral.

We have data on 51,717 restaurants in Bangalore, India. An important part of our restaurant recommendation system is the sentiment analysis of the reviews people have given each restaurant. In this notebook we assign a sentiment score to each restaurant based on the reviews it has received. At the end we export the result as a new dataset for further work.

## Importing Libraries and Data

In [None]:
#basic libraries
import pandas as pd
import numpy as np

#ast library to unwrap string wrapped python objects
import ast

#regular expression library to preprocess strings
import re

#NLTK library for text preprocessing and other nlp tasks
import nltk
from nltk.corpus import stopwords

#textblob library for lemmatising and generating sentiment scores
from textblob import TextBlob
from textblob import Word

In [None]:
data=pd.read_csv('../input/zomato-bangalore-dataset/zomato.csv')
data.head()

In [None]:
data.shape

## Extracting Reviews Column for Sentiment Analysis

In [None]:
reviews=data.reviews_list

In [None]:
reviews[0]

Each entry is a list stored as a string, so we first convert it to a Python list object.

## Converting string of list to list object using ast module

In [None]:
reviews=reviews.apply(ast.literal_eval)

In [None]:
type(reviews[0])

In [None]:
reviews[0]
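
To make the conversion concrete, here is a minimal sketch on a hand-written string shaped like one `reviews_list` entry. The sample text is made up for illustration and is not taken from the dataset.

```python
import ast

# Hypothetical sample: a string that only *looks* like a list of (rating, text) tuples.
raw = "[('Rated 4.0', 'RATED\\n  Great food and service'), ('Rated 2.0', 'RATED\\n  Too crowded')]"

# literal_eval parses Python literals only, so it is safer than eval() on untrusted text.
parsed = ast.literal_eval(raw)

print(type(raw).__name__)     # the input is a plain string
print(type(parsed).__name__)  # the result is a real list
print(parsed[0][1])           # review text of the first tuple
```

Because `ast.literal_eval` refuses anything that is not a literal (function calls, names, operators), a malformed cell raises an exception instead of executing code.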

## Concatenating reviews in one paragraph for each restaurant

Every review text begins with a 'RATED\n' prefix that we do not need, so we strip the first 8 characters of each review before joining.


In [None]:
def concatinating_reviews(lst):
    #tup[1] holds the review text; [8:] drops the leading 'RATED\n' prefix
    #joining with a space keeps the last word of one review from fusing with the first word of the next
    return ' '.join(tup[1][8:] for tup in lst).lower() #converting to lower case

reviews=reviews.apply(concatinating_reviews)

In [None]:
reviews[0]

Now all of a restaurant's reviews are concatenated into a single string.
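
The fixed 8-character slice relies on every review starting with exactly the same prefix. As a hedged alternative, the sketch below strips the prefix only when it is present. The prefix value (`'RATED'`, a newline, then two spaces) is an assumption inferred from the slice length, and `strip_prefix` and `join_reviews` are hypothetical helper names, not part of the notebook.

```python
PREFIX = "RATED\n  "  # assumed prefix: 'RATED', a newline, then two spaces

def strip_prefix(text, prefix=PREFIX):
    """Drop the prefix only when the text actually starts with it."""
    return text[len(prefix):] if text.startswith(prefix) else text

def join_reviews(pairs):
    """Join the texts of (rating, text) tuples with single spaces, lower-cased."""
    return " ".join(strip_prefix(text) for _, text in pairs).lower()

# Made-up sample in the same shape as one parsed reviews_list entry.
sample = [("Rated 4.0", "RATED\n  Great food"), ("Rated 2.0", "RATED\n  Too crowded")]

print(len(PREFIX))           # length of the assumed prefix
print(join_reviews(sample))  # one lower-cased string per restaurant
```

Checking with `startswith` means a review that lacks the prefix keeps its first 8 characters instead of losing them silently.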

# Preprocessing steps

## Objectives in Preprocessing:
* Remove all non-alphabetic characters from the reviews
* Remove stopwords
* Remove non-English words
* Bring all words to their root form by lemmatization

In [None]:
#Removing all non-alphabetic characters from the reviews

def cleaning(strng):
    strng=re.sub("[^A-Za-z]"," ",strng) #replace every non-letter with a space
    strng = re.sub(' +', ' ', strng) #collapse runs of spaces into a single space
    return strng

reviews=reviews.apply(cleaning)

In [None]:
reviews[0]
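
As a small worked example of the two substitutions, the sketch below applies the same regular expressions to a made-up review. The sample sentence is hypothetical.

```python
import re

def cleaning(text):
    text = re.sub("[^A-Za-z]", " ", text)  # every non-letter becomes a space
    text = re.sub(" +", " ", text)         # runs of spaces collapse to one
    return text

sample = "great food!!! 10/10, would visit again :)"
cleaned = cleaning(sample)

print(repr(cleaned))          # repr makes any trailing space visible
print(repr(cleaned.strip()))  # strip() removes leading/trailing spaces
```

Note that digits and punctuation disappear entirely, so a rating such as "10/10" leaves no trace, and a single leading or trailing space can remain unless the result is stripped.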

In [None]:
cleaned_reviews_in_df=reviews.copy()

In [None]:
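#importing stopwords and english words from nltk
#(if the corpora are missing locally, run nltk.download('stopwords') and nltk.download('words') first)
stopwrds = set(stopwords.words('english')) #a set makes membership tests fast
eng_words = set(nltk.corpus.words.words())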

We use the TextBlob library for lemmatization and for generating sentiment scores.

In [None]:
def preprocess(strng):
    #tokenising the text and removing stopwords and non-English words
    cleaned_word_list=[word for word in strng.split(' ') if word not in stopwrds and word in eng_words]
    #lemmatising words
    cleaned_word_list2=[Word(w).lemmatize() for w in cleaned_word_list]
    #returning joined words
    return ' '.join(cleaned_word_list2)

reviews=reviews.apply(preprocess)


In [None]:
reviews[0]
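
To see what the filtering step does in isolation, here is a minimal sketch that uses two tiny hand-made sets in place of NLTK's stopword list and English vocabulary. Both sets, the sample sentence, and the `filter_tokens` name are hypothetical, and lemmatization is left out so the example needs no corpora.

```python
# Hypothetical miniature stand-ins for NLTK's stopword list and English vocabulary.
toy_stopwords = {"the", "was", "and", "a"}
toy_vocabulary = {"the", "food", "was", "tasty", "and", "service", "slow"}

def filter_tokens(text, stop, vocab):
    """Keep tokens that are in the vocabulary and are not stopwords."""
    return [w for w in text.split(" ") if w not in stop and w in vocab]

sample = "the biryani was tasty and the service was slow"
print(filter_tokens(sample, toy_stopwords, toy_vocabulary))
```

The word "biryani" is dropped because it is missing from the toy vocabulary. The same effect applies with a real dictionary: dish names and other words outside the word list are removed along with misspellings.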

## Sentiment Analysis using TextBlob Library

In [None]:
def sentiment_analysis(strng):
    #polarity lies between -1 (most negative) and 1 (most positive)
    return TextBlob(strng).sentiment.polarity

#one polarity score per restaurant, kept as a plain list
sentiment_scores=reviews.apply(sentiment_analysis).tolist()

Inspecting the first few polarity values:

In [None]:
sentiment_scores[:5]

In [None]:
len(sentiment_scores)
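
If coarse labels are easier to work with than raw polarity, the scores can be bucketed. The sketch below is one possible mapping: the `label_polarity` name and the 0.05 threshold are assumptions chosen for illustration, and the example scores are made up rather than taken from the notebook output.

```python
def label_polarity(score, threshold=0.05):
    """Map a polarity in [-1, 1] to a coarse label; the threshold is an arbitrary choice."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

example_scores = [0.42, -0.30, 0.0, 0.03]  # made-up values, not notebook output
print([label_polarity(s) for s in example_scores])
```

A non-zero threshold treats scores very close to 0 as neutral, which avoids labelling a restaurant positive or negative on the strength of a barely non-zero average.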

## Exporting Final Dataset with Sentiment Scores

In [None]:
data['Sentiment_score']=pd.Series(sentiment_scores)
data['cleaned_reviews']=cleaned_reviews_in_df
data.to_csv("Cleaned-Zomato-data-with-sentiment-scores.csv",index=False)