# Sentiment Analysis using Natural Language Processing (NLP)

## Introduction
*   Natural Language Processing (NLP) applies machine learning algorithms to natural language, i.e. the language humans use to communicate with each other.
*   With NLP, we can even interact with the digital world.
*   This project uses the dataset "Restaurant_Reviews.tsv", which contains customer reviews of a single restaurant.
*   The model is trained to predict the sentiment of each review, i.e. positive or negative.
*   1 represents positive sentiment and 0 represents negative sentiment.

## Motivation
*   With this model, restaurant customer feedback can be reviewed in seconds instead of reading every review.
*   A further extension could respond to customers automatically based on the predicted sentiment.

## Importing the libraries

In [68]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

## Importing the dataset

In [69]:
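#reading the tab-separated file; quoting=3 (csv.QUOTE_NONE) treats quote characters as ordinary text
dataset = pd.read_csv("Restaurant_Reviews.tsv", sep="\t", quoting=3)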
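
The `quoting=3` argument is the numeric value of `csv.QUOTE_NONE`, which tells pandas to keep quotation marks as literal characters instead of interpreting them as field delimiters. The sketch below illustrates the difference on a small in-memory TSV; the two sample rows and the names `raw`, `default_df` and `literal_df` are made up for illustration and are not taken from "Restaurant_Reviews.tsv".

```python
import csv
import io

import pandas as pd

# Hypothetical two-row TSV; the first review is wrapped in double quotes.
raw = 'Review\tLiked\n"Great food"\t1\nSlow service\t0\n'

# Default quoting: the surrounding double quotes are consumed by the parser.
default_df = pd.read_csv(io.StringIO(raw), sep="\t")

# quoting=csv.QUOTE_NONE (the same as quoting=3): quotes stay in the text.
literal_df = pd.read_csv(io.StringIO(raw), sep="\t", quoting=csv.QUOTE_NONE)

print(default_df["Review"][0])  # Great food
print(literal_df["Review"][0])  # "Great food"
```

Keeping the quotes is harmless here because the cleaning step removes every non-letter character anyway.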

## Cleaning Each Document (Review)

In [70]:
#importing the required libraries for this step
import re
import nltk

#downloading the stop words
nltk.download('stopwords')

#importing the stop word list and the stemmer
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

#creating an empty corpus that will hold the cleaned reviews
corpus = []

#creating the PorterStemmer object once, outside the loop
ps = PorterStemmer()
#loading the English stop words
all_stopwords = stopwords.words('english')
#"not" is important for sentiment analysis, so it is removed from the stop words
all_stopwords.remove("not")
#converting to a set once for fast membership tests
all_stopwords = set(all_stopwords)

#iterating over the dataset
for i in range(len(dataset)):
  #taking the review text (first column) and replacing every non-letter with a space
  review = re.sub('[^a-zA-Z]', ' ', str(dataset.iloc[i, 0]))
  #converting the review to lowercase and splitting it into words
  review = review.lower().split()
  #stemming each word and keeping only the words that are not stop words
  review = [ps.stem(word) for word in review if word not in all_stopwords]
  #joining the words back into one string and appending it to the corpus
  corpus.append(' '.join(review))

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
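
To make the effect of the cleaning loop concrete, here is a minimal sketch of the same steps applied to a single sentence. It is a simplification under stated assumptions: the helper `clean_review`, the tiny `SAMPLE_STOPWORDS` set and the sample sentence are hypothetical, the set only stands in for the NLTK stop word list, and stemming is left out so the snippet runs without downloading NLTK data.

```python
import re

# Hypothetical, much smaller stand-in for stopwords.words('english').
# "not" is deliberately absent, mirroring all_stopwords.remove("not").
SAMPLE_STOPWORDS = {"the", "was", "and", "it", "i", "a", "is", "this"}


def clean_review(text):
    """Keep letters only, lowercase, split, and drop stop words (no stemming)."""
    letters_only = re.sub("[^a-zA-Z]", " ", text)
    words = letters_only.lower().split()
    kept = [word for word in words if word not in SAMPLE_STOPWORDS]
    return " ".join(kept)


print(clean_review("The food was NOT good, and it was cold!"))
# prints: food not good cold
```

Because "not" survives the filtering, "not good" stays distinguishable from "good", which is the reason it is removed from the stop word list.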


## Creating the Bag of Words model

In [71]:
#importing the library for the CountVectorizer model
from sklearn.feature_extraction.text import CountVectorizer
#creating the vectorizer, keeping only the 1500 most frequent words
cv = CountVectorizer(max_features=1500)
#building the feature matrix X from the corpus
X = cv.fit_transform(corpus).toarray()
#taking the sentiment labels (last column) as y
y = dataset.iloc[:,-1].values
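
As an illustration of what the Bag of Words matrix looks like, the following sketch runs `CountVectorizer` on three made-up cleaned reviews. The names `toy_corpus`, `toy_cv` and `toy_X` are hypothetical and the sentences are not from the dataset; each row of the matrix counts how often each vocabulary word occurs in one review.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical cleaned reviews, in the same form as the entries of `corpus`.
toy_corpus = ["love place", "not love place", "food good"]

toy_cv = CountVectorizer(max_features=1500)
toy_X = toy_cv.fit_transform(toy_corpus).toarray()

# vocabulary_ maps each word to its column index; sorting by index gives the column order.
columns = sorted(toy_cv.vocabulary_, key=toy_cv.vocabulary_.get)
print(columns)  # ['food', 'good', 'love', 'not', 'place']
print(toy_X)
# [[0 0 1 0 1]
#  [0 0 1 1 1]
#  [1 1 0 0 0]]
```

The first two rows differ only in the "not" column, which is the signal a classifier can use to tell the two reviews apart.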

## Splitting the dataset into the Training set and Test set

In [72]:
#importing train_test_split
from sklearn.model_selection import train_test_split
#splitting the dataset: 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
#keeping references to the training set so every model can start from the same data
temp_x, temp_y = X_train, y_train
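
Since the split above does not set `random_state`, it produces a different partition on every run, so the scores further down will vary between runs. A possible variant, sketched here on made-up stand-in data (`X_demo`, `y_demo` and the other names are hypothetical), fixes the seed and uses `stratify` to keep the ratio of positive to negative labels the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 samples, 3 features, balanced 0/1 labels.
X_demo = np.arange(30).reshape(10, 3)
y_demo = np.array([0, 1] * 5)

# random_state fixes the shuffle; stratify keeps the label ratio in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

print(X_tr.shape, X_te.shape)  # (8, 3) (2, 3)
print(sorted(y_te.tolist()))   # [0, 1]
```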

# Comparative Analysis of Different Classifiers

In [73]:
#final result values will be stored in the results list
results = []

#importing the metric used below
from sklearn.metrics import confusion_matrix

#defining the method that evaluates a fitted classifier on the test set
def Model_results(X_test, y_test, classifier):
  #predicting the test set results
  y_pred = classifier.predict(X_test)
  #creating the confusion matrix (rows = actual label, columns = predicted label)
  cm = confusion_matrix(y_test, y_pred)
  #note: the counts below treat label 0 (negative sentiment) as the "positive" class,
  #so Precision and Recall describe how well negative reviews are detected
  TP = cm[0][0]  #actual 0, predicted 0
  TN = cm[1][1]  #actual 1, predicted 1
  FP = cm[1][0]  #actual 1, predicted 0
  FN = cm[0][1]  #actual 0, predicted 1
  #getting the value for Accuracy
  Accuracy = (TP + TN) / (TP + TN + FP + FN)
  #getting the value for Precision
  Precision = TP / (TP + FP)
  #getting the value for Recall
  Recall = TP / (TP + FN)
  #printing the result
  print(f"Accuracy: {Accuracy}\nPrecision: {Precision}\nRecall:{Recall}")
  #appending the result to the main list, labelled with the classifier's class name
  results.append([type(classifier).__name__, Accuracy, Precision, Recall])
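
scikit-learn's `confusion_matrix` puts the actual labels in rows and the predicted labels in columns, ordered `[0, 1]`. `Model_results` reads its counts from `cm[0][0]`, `cm[1][1]`, `cm[1][0]` and `cm[0][1]`, which means its precision and recall are computed for label 0 (negative sentiment). The sketch below uses made-up labels (`y_true` and `y_hat` are hypothetical) to show both views side by side with `precision_score` and `recall_score`, whose `pos_label` parameter selects the class of interest.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Made-up labels for illustration only (1 = positive, 0 = negative).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 1, 1, 0, 0]

# Rows are actual labels, columns are predicted labels, ordered [0, 1].
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
print(tn, fp, fn, tp)  # 2 2 1 3

# Accuracy does not depend on which class is called "positive".
print(accuracy_score(y_true, y_hat))  # 0.625

# Metrics with label 1 (positive sentiment) as the positive class.
p_pos = precision_score(y_true, y_hat, pos_label=1)  # tp / (tp + fp) = 3/5
r_pos = recall_score(y_true, y_hat, pos_label=1)     # tp / (tp + fn) = 3/4

# Metrics with label 0 as the positive class, matching Model_results.
p_neg = precision_score(y_true, y_hat, pos_label=0)  # tn / (tn + fn) = 2/3
r_neg = recall_score(y_true, y_hat, pos_label=0)     # tn / (tn + fp) = 2/4

print(p_pos, r_pos)
print(p_neg, r_neg)
```

Which view to report depends on the use case: if the goal is to catch unhappy customers, the label 0 metrics are the relevant ones.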

## Decision Tree

In [74]:
#resetting the training set before fitting the Decision Tree
X_train, y_train = temp_x, temp_y
#importing the library
from sklearn.tree import DecisionTreeClassifier
#creating the model object
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
#fitting the model to the training set
classifier.fit(X_train, y_train)

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='entropy',
                       max_depth=None, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=0, splitter='best')

In [75]:
Model_results(X_test,y_test, classifier)

Accuracy: 0.775
Precision: 0.7047619047619048
Recall:0.8409090909090909


## GaussianNB

In [76]:
#resetting the training set before fitting Gaussian Naive Bayes
X_train, y_train = temp_x, temp_y
#importing the library
from sklearn.naive_bayes import GaussianNB
#creating the model object
classifier = GaussianNB()
#fitting the model to the training set
classifier.fit(X_train, y_train) 

GaussianNB(priors=None, var_smoothing=1e-09)

In [77]:
Model_results(X_test,y_test, classifier)

Accuracy: 0.715
Precision: 0.6962025316455697
Recall:0.625


## KNeighbors

In [78]:
#resetting the training set before fitting K-Nearest Neighbors
X_train, y_train = temp_x, temp_y
#importing the library
from sklearn.neighbors import KNeighborsClassifier
#creating the model object (minkowski with p = 2 is the Euclidean distance)
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
#fitting the model to the training set
classifier.fit(X_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

In [79]:
Model_results(X_test,y_test, classifier)

Accuracy: 0.78
Precision: 0.72
Recall:0.8181818181818182


## Logistic Regression

In [80]:
#resetting the training set before fitting Logistic Regression
X_train, y_train = temp_x, temp_y
#importing the library
from sklearn.linear_model import LogisticRegression
#creating the model object
classifier = LogisticRegression()
#fitting the model to the training set
classifier.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

In [81]:
Model_results(X_test,y_test, classifier)

Accuracy: 0.83
Precision: 0.78125
Recall:0.8522727272727273


## Random Forest

In [82]:
#resetting the training set before fitting the Random Forest
X_train, y_train = temp_x, temp_y
#importing the library
from sklearn.ensemble import RandomForestClassifier
#creating the model object
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
#fitting the model to the training set
classifier.fit(X_train, y_train)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='entropy', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=10,
                       n_jobs=None, oob_score=False, random_state=0, verbose=0,
                       warm_start=False)

In [83]:
Model_results(X_test,y_test, classifier)

Accuracy: 0.825
Precision: 0.7387387387387387
Recall:0.9318181818181818


## SVM (RBF Kernel)

In [84]:
#resetting the training set before fitting the SVM with an RBF kernel
X_train, y_train = temp_x, temp_y
#importing the library
from sklearn.svm import SVC
#creating the model object
classifier = SVC(kernel = 'rbf', random_state = 0)
#fitting the model to the training set
classifier.fit(X_train, y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=0, shrinking=True, tol=0.001,
    verbose=False)

In [85]:
Model_results(X_test,y_test, classifier)

Accuracy: 0.805
Precision: 0.7094017094017094
Recall:0.9431818181818182


## SVM (Linear Kernel)

In [86]:
#resetting the training set before fitting the SVM with a linear kernel
X_train, y_train = temp_x, temp_y
#importing the library
from sklearn.svm import SVC
#creating the model object
classifier = SVC(kernel = 'linear', random_state = 0)
#fitting the model to the training set
classifier.fit(X_train, y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=0, shrinking=True, tol=0.001,
    verbose=False)

In [87]:
Model_results(X_test,y_test, classifier)

Accuracy: 0.86
Precision: 0.8333333333333334
Recall:0.8522727272727273


# Result

In [88]:
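#giving the two SVC entries (the last two results, in training order) distinct names
results[-2][0] = "SVC (RBF kernel)"
results[-1][0] = "SVC (linear kernel)"
pd.DataFrame(results, columns=["Model", "Accuracy", "Precision","Recall"])

|   | Model | Accuracy | Precision | Recall |
|---|---|---|---|---|
| 0 | DecisionTreeClassifier | 0.775 | 0.704762 | 0.840909 |
| 1 | GaussianNB | 0.715 | 0.696203 | 0.625 |
| 2 | KNeighborsClassifier | 0.78 | 0.72 | 0.818182 |
| 3 | LogisticRegression | 0.83 | 0.78125 | 0.852273 |
| 4 | RandomForestClassifier | 0.825 | 0.738739 | 0.931818 |
| 5 | SVC (RBF kernel) | 0.805 | 0.709402 | 0.943182 |
| 6 | SVC (linear kernel) | 0.86 | 0.833333 | 0.852273 |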
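
The per-model cells all repeat the same fit-and-evaluate pattern, so the comparison could also be written as a single loop over a dictionary of classifiers. The sketch below is one possible way to do that, not the notebook's method: it runs on synthetic stand-in data from `make_classification`, the names `models`, `rows` and `summary` are hypothetical, and its scores are not comparable to the table above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; in the notebook this would be X and y.
X_syn, y_syn = make_classification(n_samples=200, n_features=20, random_state=0)
X_a, X_b, y_a, y_b = train_test_split(X_syn, y_syn, test_size=0.2, random_state=0)

# Same classifiers and hyperparameters as in the cells above.
models = {
    "DecisionTree": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "GaussianNB": GaussianNB(),
    "KNeighbors": KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2),
    "LogisticRegression": LogisticRegression(),
    "RandomForest": RandomForestClassifier(
        n_estimators=10, criterion="entropy", random_state=0
    ),
    "SVC (RBF kernel)": SVC(kernel="rbf", random_state=0),
    "SVC (linear kernel)": SVC(kernel="linear", random_state=0),
}

# Fit every model on the same training data and record its test accuracy.
rows = []
for name, model in models.items():
    model.fit(X_a, y_a)
    rows.append([name, accuracy_score(y_b, model.predict(X_b))])

summary = pd.DataFrame(rows, columns=["Model", "Accuracy"])
print(summary.sort_values("Accuracy", ascending=False))
```

Naming each entry explicitly in the dictionary also avoids having to rename result rows afterwards.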


# Conclusion

> Based on the accuracy comparison above, we can select the SVC with a linear kernel, which reaches the highest accuracy (0.86).



# Future enhancement

The model can be attached to an application so that a bot can respond to customers based on the predicted sentiment.
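
A rough sketch of how such a bot could pick a reply is given below. Everything in it is an assumption made for illustration: the tiny training set, the canned texts in `REPLIES` and the helper `respond` are hypothetical, and a scikit-learn `Pipeline` is used so that raw text can be passed straight to `predict`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny made-up training set (1 = positive, 0 = negative), for illustration only.
texts = ["love food", "great place", "tasti fresh",
         "bad servic", "not good", "cold food slow"]
labels = [1, 1, 1, 0, 0, 0]

# Vectorizer and classifier chained so raw text can be passed to predict().
sentiment_model = make_pipeline(
    CountVectorizer(), SVC(kernel="linear", random_state=0)
)
sentiment_model.fit(texts, labels)

# Hypothetical canned replies keyed by the predicted label.
REPLIES = {
    1: "Thank you for the kind words! We hope to see you again.",
    0: "We are sorry about your experience and will look into it.",
}


def respond(review):
    """Return a canned reply based on the predicted sentiment of one review."""
    predicted = int(sentiment_model.predict([review])[0])
    return REPLIES[predicted]


print(respond("great food"))
```

In a real application the incoming review would first go through the same cleaning steps as the training corpus, and the model would be trained on the full dataset rather than on six phrases.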