## Sentiment Analysis

In [1]:
#For data preprocessing
import pandas as pd

#For natural language processing
import spacy
nlp = spacy.load('en_core_web_sm')

#For regular expressions or pattern matching in python
import re

In [2]:
#Loading the sample dataset from the amazon_ratings.csv file using pd.read_csv
sample = pd.read_csv('amazon_ratings.csv')

In [0]:
#Analysing the top 2 rows
sample.head(2)

Unnamed: 0,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date
0,16010470,R2EYF5O4W313NW,B00J46XO9U,744008282,"iXCC Lightning Cable 3ft, iPhone charger, for ...",Mobile_Electronics,4,0,0,N,Y,Four Stars,Very good quality. So far so good.,8/30/2015
1,50828061,R5F0MONILUKRP,B00J46XO9U,744008282,"iXCC Lightning Cable 3ft, iPhone charger, for ...",Mobile_Electronics,5,0,0,N,Y,Five Stars,Good product and good seller,8/29/2015


In [3]:
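#The review text is split across review_headline and review_body, so both are concatenated to get the final review
#fillna('') guards against missing values, which would otherwise turn the whole review into NaN
sample['Final Review'] = sample['review_headline'].fillna('') + ' ' + sample['review_body'].fillna('')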

In [4]:
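def func(x):
    """Lowercase the text, drop punctuation tokens and return the space-joined lemmas."""
    document = nlp(x.lower())
    #Keeping the lemma of every token that is not punctuation
    return ' '.join(token.lemma_ for token in document if not token.is_punct)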

In [7]:
sample['Final Cleaned Text'] = sample['Final Review'].apply(func)
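
The cleaning step above lowercases the text and removes punctuation before lemmatizing. As a rough illustration of the first two parts only, here is a minimal sketch that uses the `re` module instead of spaCy. The name `simple_clean` is hypothetical, and because the sketch does not lemmatize, it is not equivalent to `func`.

```python
import re

def simple_clean(text):
    """Lowercase text, drop punctuation and collapse whitespace (no lemmatization)."""
    lowered = text.lower()
    # Replace every character that is neither a word character nor whitespace
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    # Collapse runs of whitespace into a single space
    return re.sub(r"\s+", " ", no_punct).strip()

print(simple_clean("Four Stars Very good quality. So far so good."))
```

A regex pass like this is fast but keeps inflected forms ("stars", "working") as separate features, which is the gap that lemmatization is meant to close.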

## Naive Bayes Classifier

#### Naive Bayes (or Naive Bayesian) is a classification algorithm that assigns a new object to one of a fixed set of categories. In the two-class case used here, the new object belongs to either the <b>First Category</b> or the <b>Second Category</b>.<br>

#### It uses probability to find the class of an unknown object. <br>

#### Bayes' theorem is used to compute the probability of each class for a new object.


![image.png](attachment:image.png)






#### Posterior Probability: <font color='red'>P(A|B)</font> The probability of 'A' being True given that 'B' is True. 
#### Likelihood: <font color='red'>P(B|A)</font> The probability of 'B' being True given that 'A' is True.
#### Prior Probability: <font color='red'>P(A)</font> The probability of 'A' being True.
#### Evidence: <font color='red'>P(B)</font> The probability of 'B' being True. 
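
The four quantities above are related by Bayes' theorem, P(A|B) = P(B|A) · P(A) / P(B). The sketch below works through one case. All of the numbers are hypothetical and chosen only for illustration: A stands for "the review is positive" and B for "the review contains the word 'good'".

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical probabilities, not estimated from the dataset
p_a = 0.6              # prior P(A): share of positive reviews
p_b_given_a = 0.5      # likelihood P(B|A): 'good' appears in a positive review
p_b_given_not_a = 0.1  # 'good' appears in a negative review

# Evidence P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = posterior(p_b_given_a, p_a, p_b)
print(round(p_a_given_b, 3))
```

With these assumed inputs, seeing the word raises the probability of a positive review from the prior of 0.6 to roughly 0.88.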

![image.png](attachment:image.png)


#### The model is first trained on sets of features, each paired with its respective sentiment.<br>
### <font color='blue'> Feature Set 1 - </font>  <font color='red'>    Positive Sentiment </font><br>
### <font color='blue'> Feature Set 2 - </font>  <font color='red'>    Negative Sentiment </font><br>
### <font color='blue'> Feature Set 3 - </font>  <font color='red'>    Positive Sentiment </font><br>

#### One of the major advantages that Naive Bayes has over other classification algorithms is its ability to handle an extremely large number of features. In our case, each word is treated as a feature and there are thousands of different words.
#### Also, it performs well even with the presence of irrelevant features and is relatively unaffected by them.
#### Its other major advantage is its simplicity relative to other classification algorithms.
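
To make the "each word is a feature" idea concrete, here is a minimal from-scratch sketch of a word-count (multinomial) Naive Bayes classifier with add-one smoothing. It is not the model trained later in this notebook; the function names and the toy reviews are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents, labels):
    """Count documents and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(documents, labels):
        word_counts[label].update(text.split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocabulary

def predict_sentiment(text, class_counts, word_counts, vocabulary):
    """Return the class with the highest log prior + log likelihood."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log P(class)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocabulary:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing keeps unseen word/class pairs above zero
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration (1 = positive, 0 = negative)
toy_docs = ["good product good seller", "very good quality",
            "bad cable stop work", "poor quality bad seller"]
toy_sentiments = [1, 1, 0, 0]
model_parts = train_naive_bayes(toy_docs, toy_sentiments)

print(predict_sentiment("good quality", *model_parts))
print(predict_sentiment("bad cable", *model_parts))
```

Each word contributes one independent term to the score, which is why adding thousands of words to the vocabulary only adds more terms to a sum rather than making the model harder to fit.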



## Model Training and Testing

In [9]:

from sklearn.naive_bayes import GaussianNB
from sklearn.feature_extraction.text import TfidfVectorizer


In [10]:
#Performing the tfidf vectorization 
vectorizer = TfidfVectorizer(analyzer='word',stop_words='english')   

#fitting and transforming sample reviews and converting their values to list
text_tfidfs = vectorizer.fit_transform(sample['Final Cleaned Text'].values.tolist())
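
TF-IDF gives a word a higher weight when it is frequent within one review but rare across all reviews. The sketch below computes such weights by hand on two toy reviews. It assumes the smoothed formula idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalisation, which to my understanding matches the documented defaults of `TfidfVectorizer`; stop-word removal is left out, and the name `tfidf_rows` is hypothetical.

```python
import math
from collections import Counter

def tfidf_rows(documents):
    """Return (vocabulary, rows); each row is an L2-normalised TF-IDF vector."""
    tokenized = [doc.split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(documents)
    # Document frequency: number of documents that contain each word
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1 for w in vocabulary}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = [counts[w] * idf[w] for w in vocabulary]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocabulary, rows

vocab, weights = tfidf_rows(["good quality", "good product good seller"])
for word, weight in zip(vocab, weights[0]):
    print(word, round(weight, 3))
```

In the first toy review, "quality" outweighs "good" because "good" appears in both reviews, and every word that is absent from the review gets a weight of exactly 0.0. With a real vocabulary most entries in each row are zero for the same reason.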


In [11]:
#Converting ratings above 3 to positive sentiment (1) and ratings of 3 or below to negative sentiment (0)
sample['star_rating'] = sample['star_rating'].apply(lambda x:1 if x > 3 else 0)
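
The same thresholding rule can be checked on every possible star value. This small sketch uses a hypothetical helper name, `to_sentiment`, and makes the treatment of the boundary case explicit.

```python
def to_sentiment(star_rating):
    """Map a 1-5 star rating to a binary label: 1 = positive, 0 = negative."""
    return 1 if star_rating > 3 else 0

ratings = [1, 2, 3, 4, 5]
sentiments = [to_sentiment(r) for r in ratings]
print(dict(zip(ratings, sentiments)))
```

Note that a neutral 3-star review is counted as negative under this rule.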

In [18]:
#Printing the TF-IDF weights of the second review (row index 1), one value per vocabulary term
a = text_tfidfs.toarray()
for i in a[1]:
    print(i)

0.0
0.0
0.0
...
[250 values printed in this output, all 0.0]


In [19]:
#Splitting the dataset into training and testing sets so the model is evaluated on reviews it has not seen
from sklearn.model_selection import train_test_split



In [20]:

#Splitting the data by calling train_test_split
x_train, x_test, y_train, y_test = train_test_split(text_tfidfs.toarray(), 
                                                    sample['star_rating'], test_size = 0.10, random_state = 251)
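
What a seeded hold-out split does can be sketched in a few lines of plain Python. This is an illustration of the idea, not scikit-learn's implementation, and it will not reproduce the same rows as `train_test_split`; the function name and toy data are hypothetical.

```python
import random

def simple_train_test_split(features, labels, test_size=0.10, seed=251):
    """Shuffle row indices with a fixed seed and hold out a fraction for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([features[i] for i in train_idx], [features[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])

# 20 hypothetical feature rows with alternating labels
toy_features = [[i] for i in range(20)]
toy_labels = [i % 2 for i in range(20)]

train_rows, test_rows, train_labels, test_labels = simple_train_test_split(
    toy_features, toy_labels)
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split repeatable, so a later change in accuracy can be attributed to the model rather than to a different choice of test rows.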

In [21]:
model = GaussianNB()          #Gaussian Naive Bayes, which models each feature as normally distributed within each class

In [22]:
model.fit(x_train, y_train)    #Fitting the model with vectors of text and their label(positive or negative)


GaussianNB(priors=None, var_smoothing=1e-09)
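
Gaussian Naive Bayes scores a class by adding the log prior to the log of a normal density for each feature, using the mean and variance of that feature within the class. Below is a minimal sketch of that calculation on made-up two-feature vectors. The function names are hypothetical, and the fixed `smoothing` constant is a simplification of the `var_smoothing` idea rather than a copy of the library's behaviour.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def fit_gaussian_nb(rows, labels, smoothing=1e-9):
    """Estimate the prior and the per-feature mean and variance for each class."""
    params = {}
    for label in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == label]
        columns = list(zip(*members))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps variances above zero
        variances = [sum((v - m) ** 2 for v in col) / len(col) + smoothing
                     for col, m in zip(columns, means)]
        params[label] = (len(members) / len(rows), means, variances)
    return params

def predict_one(row, params):
    """Pick the class that maximises log prior + summed log likelihoods."""
    def score(label):
        prior, means, variances = params[label]
        return math.log(prior) + sum(
            gaussian_log_pdf(x, m, v) for x, m, v in zip(row, means, variances))
    return max(params, key=score)

# Hypothetical vectors, e.g. weights for the words "good" and "bad"
toy_rows = [[0.9, 0.0], [0.7, 0.1], [0.1, 0.8], [0.0, 0.9]]
toy_classes = [1, 1, 0, 0]
gnb_params = fit_gaussian_nb(toy_rows, toy_classes)

print(predict_one([0.8, 0.05], gnb_params))
print(predict_one([0.05, 0.9], gnb_params))
```

When a feature is zero for every training row of a class, its variance collapses to the smoothing constant, so the density becomes extremely peaked. That can help a model fit its training data perfectly, so the training score is best read alongside the test score.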

In [24]:
model.score(x_train, y_train)

1.0

In [25]:
from sklearn.metrics import accuracy_score      #Importing the accuracy score available in sklearn
y_prediction = model.predict(x_test)            #Predicting labels for the held-out test data to validate the model

In [26]:
#Finding the accuracy of the model 
print("Accuracy of this sentiment analysis is: ", accuracy_score(y_test, y_prediction)* 100)

Accuracy of this sentiment analysis is:  100.0
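
Accuracy is the fraction of predictions that match the true labels. The sketch below computes it by hand and also counts each (true, predicted) pair, since a single accuracy figure can hide mistakes on the less frequent class. The labels are invented for illustration and are not taken from this notebook's results.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))

# Hypothetical labels (1 = positive, 0 = negative)
true_labels = [1, 1, 1, 1, 0, 0]
predicted_labels = [1, 1, 1, 1, 1, 0]

print(accuracy(true_labels, predicted_labels) * 100)
print(confusion_counts(true_labels, predicted_labels))
```

In this toy case five of six predictions are right, yet half of the negative reviews are misclassified, which only the pair counts reveal.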


In [28]:
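# Making Predictions

string = input("Enter review: ")
#Applying the same cleaning used for training (lowercasing, punctuation removal, lemmatization) before vectorizing
matrix = vectorizer.transform([func(string)]).toarray()
print()
print('Positive Sentiment' if model.predict(matrix)[0] == 1 else 'Negative Sentiment')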

Enter review: May be my first negative review about the product & Amazon both. I was much elated to receive the iPhone 11 so fast, next day of dispatch i.e. 28/09/19, but the thing I got started heating up every now and then. Contacted Applecare, just to be consoled that it's quite normal. As it continued, tried to return the product by speaking to Amazon customer support but in vain. Some body called me back to convey that only Apple will decide which one to take back. Why is then Amazon took up the sacred duty of selling such an item which they can't exchange/ have no control ? The product developed new issues like proximity sensor malfunction and last but most importantly loosing mobile network every other minute(even had two software updates). It was handed over to the Apple ASP as the return window closed on 10/10/19 (what use it was for??) and diagnosed as having issues and has further been sent to Apple repair facility at Bengaluru. So I'm here w/out my first iPhone after using 
