## Restaurant Reviews


### Problem Statement: Build a model that reads the text of a review and classifies the restaurant as good or bad
Consider a rating above 3 as "Positive" and a rating below 3 as "Negative". In the dataset used here, this label is already encoded in the binary `Liked` column (1 = positive, 0 = negative).
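
The thresholding rule above can be sketched as a small helper. This is an illustration only: the function name `rating_to_label` is hypothetical, and the source does not say how a rating of exactly 3 should be treated, so the sketch assumes it stays unlabeled.

```python
from typing import Optional


def rating_to_label(rating: float) -> Optional[str]:
    """Map a numeric rating to a sentiment label (hypothetical helper)."""
    if rating > 3:
        return "Positive"
    if rating < 3:
        return "Negative"
    # Assumption: a rating of exactly 3 is neither positive nor negative.
    return None


labels = [rating_to_label(r) for r in (1, 3, 4.5)]
```

Returning `None` for the boundary case keeps ambiguous reviews out of both classes; dropping them before training is one option, not something the problem statement prescribes.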

## Load the Dataset

In [135]:
from warnings import filterwarnings
filterwarnings('ignore')

In [136]:
import pandas as pd
df = pd.read_csv('Restaurant_Reviews.tsv',sep='\t')
df.head()

| | Review | Liked |
|---|---|---|
| 0 | Wow... Loved this place. | 1 |
| 1 | Crust is not good. | 0 |
| 2 | Not tasty and the texture was just nasty. | 0 |
| 3 | Stopped by during the late May bank holiday of... | 1 |
| 4 | The selection on the menu was great and so wer... | 1 |


In [137]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Review  1000 non-null   object
 1   Liked   1000 non-null   int64 
dtypes: int64(1), object(1)
memory usage: 15.8+ KB


## Data Cleaning - Perform basic data quality checks

In [138]:
df.shape

(1000, 2)

In [139]:
# checking for missing values
df.isna().sum()

Review    0
Liked     0
dtype: int64
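
Missing values are only one kind of quality problem. A minimal sketch of a few further checks (duplicate rows, whitespace-only reviews, class balance) is shown here on a small stand-in frame; the `sample` data and the variable names are assumptions for illustration, and in the notebook the same calls would be applied to `df`.

```python
import pandas as pd

# Stand-in data (hypothetical); in the notebook this would be the loaded `df`.
sample = pd.DataFrame({
    "Review": ["Great food", "Great food", "   ", "Too salty"],
    "Liked": [1, 1, 0, 0],
})

# Rows that repeat an earlier row exactly
n_duplicates = int(sample.duplicated().sum())

# Reviews that contain nothing but whitespace
n_blank = int(sample["Review"].str.strip().eq("").sum())

# Number of reviews per label, to spot class imbalance
class_counts = sample["Liked"].value_counts().to_dict()
```

A strongly imbalanced `class_counts` would suggest stratifying the later train/test split.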

## Data Pre-Processing
### Feature Extraction using TF-IDF Vectorizer

In [141]:
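# Use double quotes for the f-string so the single-quoted column names inside
# it do not end the string early (a SyntaxError before Python 3.12)
print(f"Review datatype: {df['Review'].dtype}")
print(f"Liked datatype: {df['Liked'].dtype}")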

Review datatype: object
Liked datatype: int64


### Removing any special characters or numbers from the column.

In [143]:
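import re
# Lowercase the text so the pattern only has to handle a-z
df['Review'] = df['Review'].str.lower()
# Match every character that is not a lowercase letter or whitespace
pattern = r'[^a-z\s]'
# Delete the matched characters (punctuation, digits, symbols)
df['Review'] = df['Review'].str.replace(pattern, '', regex=True)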
df['Review']

0                                   wow loved this place
1                                      crust is not good
2               not tasty and the texture was just nasty
3      stopped by during the late may bank holiday of...
4      the selection on the menu was great and so wer...
                             ...                        
995    i think food should have flavor and texture an...
996                              appetite instantly gone
997    overall i was not impressed and would not go back
998    the whole experience was underwhelming and i t...
999    then as if i hadnt wasted enough of my life th...
Name: Review, Length: 1000, dtype: object
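
To see what the cleaning step does to a single string, the same lowercase-then-strip logic can be sketched with the standard `re` module. The helper name `clean_review` is hypothetical; the pattern is the one used in the cell above.

```python
import re

# Same pattern as in the notebook: anything that is not a-z or whitespace
PATTERN = re.compile(r"[^a-z\s]")


def clean_review(text: str) -> str:
    """Lowercase the text and drop every character except a-z and whitespace."""
    return PATTERN.sub("", text.lower())


cleaned = clean_review("Wow... Loved this place.")
contraction = clean_review("I hadn't wasted")
```

Because apostrophes are removed as well, contractions collapse into a single token (for example `hadn't` becomes `hadnt`), which matches the cleaned output shown above.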

In [144]:
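from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer()

# Learn the vocabulary and convert each review into a dense TF-IDF vector
# (one row per review, one column per vocabulary term)
X = tfidf.fit_transform(df['Review']).toarray()
X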

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [145]:
X.shape

(1000, 2046)
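
To make the numbers in `X` less opaque, here is a pure-Python sketch of the weighting that `TfidfVectorizer` applies with its default settings: raw term counts, a smoothed inverse document frequency `ln((1 + n) / (1 + df)) + 1`, and L2 normalisation of each row. The three toy documents and all helper names are assumptions for illustration, and tokenisation is simplified to whitespace splitting, whereas the default scikit-learn tokenizer also drops single-character tokens.

```python
import math
from collections import Counter

# Toy corpus (hypothetical), already cleaned and lowercased
docs = ["crust is not good", "food is good", "not tasty"]
tokenized = [d.split() for d in docs]
vocab = sorted({t for doc in tokenized for t in doc})
n_docs = len(docs)

# Document frequency: in how many documents does each term appear?
doc_freq = Counter(t for doc in tokenized for t in set(doc))

# Smoothed idf: rarer terms get a larger weight
idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}


def tfidf_vector(tokens):
    """Return the L2-normalised TF-IDF row for one tokenised document."""
    counts = Counter(tokens)
    raw = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm if norm else 0.0 for v in raw]


X_small = [tfidf_vector(doc) for doc in tokenized]
```

Each row has one entry per vocabulary term, which is why the real matrix has as many columns as there are distinct terms in the reviews, and why most entries are zero.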

In [146]:
Y = df[['Liked']]
Y.head()

| | Liked |
|---|---|
| 0 | 1 |
| 1 | 0 |
| 2 | 0 |
| 3 | 1 |
| 4 | 1 |


## Splitting the dataset into train and test sets in order to detect overfitting

In [147]:
from sklearn.model_selection import train_test_split
# Hold out 25% of the reviews for evaluation; random_state makes the shuffle reproducible
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, train_size=0.75, test_size=0.25, random_state=21)

In [148]:
xtrain.shape

(750, 2046)

In [149]:
xtest.shape

(250, 2046)

In [151]:
ytrain.shape

(750, 1)

In [152]:
ytest.shape

(250, 1)
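
In the cells above, the vectorizer is fitted on all 1000 reviews before the split, so the vocabulary and idf weights also reflect the test reviews. One common variation, shown here as a sketch rather than as what this notebook does, is to split the raw text first and fit the vectorizer on the training part only. The toy `reviews` and `liked` lists are made up for illustration, and `stratify` is an optional addition that keeps the label ratio equal in both parts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy data (hypothetical); in the notebook these would be df['Review'] and df['Liked']
reviews = [
    "wow loved this place", "crust is not good",
    "the food was great", "not tasty at all",
    "friendly staff and great menu", "service was slow",
    "best burger in town", "would not go back",
]
liked = [1, 0, 1, 0, 1, 0, 1, 0]

# Split the raw text first, keeping the class ratio the same in both parts
train_text, test_text, y_train, y_test = train_test_split(
    reviews, liked, test_size=0.25, random_state=21, stratify=liked
)

vectorizer = TfidfVectorizer()
# Learn vocabulary and idf weights from the training reviews only
X_train = vectorizer.fit_transform(train_text)
# Reuse the fitted vocabulary for the held-out reviews
X_test = vectorizer.transform(test_text)
```

Words that appear only in the test reviews are ignored by `transform`, which mirrors how the model would treat unseen vocabulary in new reviews.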

## Building a Neural Network model to perform sentiment analysis