# Sentiment Analysis Using Naive Bayes Classifier

<img src="Intro2.PNG">

## Data
Let's have a look at our training data.

In [10]:
import pandas as pd

df = pd.read_csv("data.csv")
df.tail()

Unnamed: 0,Sentiment,SentimentText
124984,0,kid 50's 60's anything connected Disney defini...
124985,1,course reading review seen film already. 'Raja...
124986,0,"read ""There's Girl Soup"" came Peter Sellers's ..."
124987,0,film quite boring. snippets naked flesh tossed...
124988,1,Although film somewhat filled eighties cheese ...


The *"SentimentText"* column holds the reviews/tweets, and the *"Sentiment"* column holds the corresponding label: "0" is negative and "1" is positive.

## Data pre-processing

Let's prepare our data for the model.

scikit-learn's **TfidfVectorizer**, which turns each text into a vector of weighted word counts, is useful for this task.

In [11]:
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer

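# NLTK's English stopword list (run nltk.download('stopwords') once if it is missing)
stopset = set(stopwords.words('english'))
# Some scikit-learn versions reject a set for stop_words, so pass a list
vectorizer = TfidfVectorizer(use_idf=True, lowercase=True, strip_accents='ascii', stop_words=list(stopset))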

The *TfidfVectorizer* mainly does two things:
1. For each unique word in a document, count how many times it appears in that document. That's the **"Term Frequency" (TF)**.
2. Then, count how many documents in the whole collection contain that word. That's the **"Document Frequency" (DF)**.

Words that appear in many documents tell us less about any one of them, so TF is scaled by the inverse of DF. To keep the range reasonable, we take the **log** of that ratio. That's the **"Inverse Document Frequency" (IDF)**.
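
To make the two steps concrete, here is a minimal hand-rolled sketch that uses the textbook weight `tf * log(N / df)`. The function name `tf_idf` and the three toy documents are made up for illustration. scikit-learn's `TfidfVectorizer` uses a smoothed IDF and normalizes each row by default, so its numbers will differ from these.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each word.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency within this document
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights

# Toy documents, invented for this example
docs = ["good film good cast", "boring film", "good fun film"]
scores = tf_idf(docs)

print(scores[2]["film"])                      # 0.0
print(scores[0]["cast"] > scores[0]["good"])  # True
```

"film" appears in every document, so its IDF is `log(3 / 3) = 0` and it gets no weight. "cast" appears in only one document, so it outweighs "good" even though "good" occurs twice in the first document.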

**Stopwords** are words that are common and do not provide much information about the label of the text. 
For example, "the", "a", "of", etc.
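
As a quick illustration of what stopword removal does to a sentence, here is a small sketch. The `STOPWORDS` set and the `remove_stopwords` helper are hypothetical stand-ins; NLTK's English list is much longer, and the vectorizer applies its own tokenization.

```python
# Hypothetical, tiny stopword list for illustration only
STOPWORDS = {"the", "a", "of", "is", "and"}

def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, split on whitespace, and drop the stopwords."""
    return [word for word in text.lower().split() if word not in stopwords]

tokens = remove_stopwords("The plot of the film is a mess")
print(tokens)  # ['plot', 'film', 'mess']
```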

## Fitting the model

Here the target variable is *Sentiment*, and the features are built from *SentimentText*.

In [12]:
y = df.Sentiment  # target labels: 0 = negative, 1 = positive

X = vectorizer.fit_transform(df.SentimentText)  # Fitting and transforming using the vectorizer

We will use the **MultinomialNB** classifier from the **naive_bayes** module of **scikit-learn**.

In [13]:
from sklearn import naive_bayes

clf=naive_bayes.MultinomialNB()
clf.fit(X,y)
print("Ready")

Ready
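
For intuition about what the fit step learns, here is a rough pure-Python sketch of the multinomial Naive Bayes idea: a log prior per class, plus a Laplace-smoothed log likelihood per word. The names `train_nb` and `predict_nb` and the four toy reviews are made up for this example. It works on raw word counts, whereas the cell above feeds TF-IDF weights to scikit-learn's implementation, so treat it as an illustration rather than a reproduction.

```python
import math
from collections import Counter, defaultdict

def train_nb(texts, labels, alpha=1.0):
    """Collect log priors and Laplace-smoothed log likelihoods per class."""
    word_counts = defaultdict(Counter)  # class -> word -> count
    class_counts = Counter(labels)      # class -> number of documents
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}

    model = {}
    for c in class_counts:
        # Smoothed denominator: words seen in class c plus alpha per vocabulary word
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        model[c] = {
            "log_prior": math.log(class_counts[c] / len(labels)),
            "log_lik": {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab},
        }
    return model

def predict_nb(model, text):
    """Return the class with the highest log posterior; unseen words are skipped."""
    scores = {}
    for c, params in model.items():
        score = params["log_prior"]
        for w in text.lower().split():
            if w in params["log_lik"]:
                score += params["log_lik"][w]
        scores[c] = score
    return max(scores, key=scores.get)

# Toy training data, invented for this example (1 = positive, 0 = negative)
texts = ["great film loved it", "wonderful acting great story",
         "boring film hated it", "terrible acting dull story"]
labels = [1, 1, 0, 0]
model = train_nb(texts, labels)

print(predict_nb(model, "great story"))      # 1
print(predict_nb(model, "boring and dull"))  # 0
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long texts, and the smoothing term `alpha` keeps a word that never appeared in one class from zeroing out that class entirely.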


The classifier is ready!
Let's try some reviews.

In [14]:
from IPython.display import display                               
from ipywidgets import interactive
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np

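def ren(Review):
    # Ask for input before doing any work on an empty string
    if Review == "":
        print("Write something!")
        return
    review_vector = vectorizer.transform([Review])
    # predict returns one label per input; take the single entry
    if clf.predict(review_vector)[0] == 0:
        plt.imshow(mpimg.imread('t_d.jpg'))  # image shown for a negative prediction
    else:
        plt.imshow(mpimg.imread('t_u.jpg'))  # image shown for a positive prediction
    plt.axis('off')
    plt.show()

# Named initial_review rather than re to avoid shadowing the standard-library re module
initial_review = input("Press Enter when you are ready to write a review: ")

inter = interactive(ren, Review=initial_review)
display(inter)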

Press Enter when you are ready to write a review: 



## Sentiment Analysis for tweets

We can apply sentiment analysis to a whole batch of reviews and opinions at once.

Let's jump to the place which is just right for us!

<img src="twitter-logo.png">

Twitter is like a constantly updating dataset with public opinions on a topic.

---

To access the **Twitter API**, we use the **tweepy** module.

In [15]:
import tweepy

Authorization setup for the API.

In [16]:
import os

# Read the credentials from environment variables instead of hard-coding them in the notebook.
# The variable names below are placeholders; use whatever names you set in your environment.
consumer_key = os.environ['TWITTER_CONSUMER_KEY']
consumer_secret = os.environ['TWITTER_CONSUMER_SECRET']

access_token = os.environ['TWITTER_ACCESS_TOKEN']
access_token_secret = os.environ['TWITTER_ACCESS_TOKEN_SECRET']

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

In [18]:
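from textblob import TextBlob


def tweet_analyzer(Search):
    # Counters for positive, negative, and neutral tweets
    positive = 0
    negative = 0
    neutral = 0
    if Search == "":
        print("Search a hashtag or a topic")
        return
    # Note: each tweet is scored with TextBlob's polarity,
    # not with the Naive Bayes classifier trained above.
    # In tweepy 4.0 and later, api.search is named api.search_tweets.
    public_tweets = api.search(q=Search, count=20, lang="en")
    for tweet in public_tweets:
        print(tweet.text)
        analysis = TextBlob(tweet.text)
        if analysis.sentiment.polarity > 0:
            positive += 1
            print("1")
        elif analysis.sentiment.polarity == 0:
            neutral += 1
            print("0")
        else:
            negative += 1
            print("-1")
        print("##############################################")
    # Report the totals so the counters are actually used
    print("Positive:", positive, "Neutral:", neutral, "Negative:", negative)

se = input("Press Enter when you are ready to search")

inter_tweet = interactive(tweet_analyzer, Search=se)
display(inter_tweet)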


Press Enter when you are ready to search


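
Once each tweet has a polarity score, the scores can be summarized as shares of positive, neutral, and negative tweets. Here is a small sketch of that step, using the same thresholds as the cell above. The helper names `label_polarity` and `summarize` and the sample scores are made up for illustration; real scores would come from `TextBlob(tweet.text).sentiment.polarity`, which lies between -1 and 1.

```python
from collections import Counter

def label_polarity(polarity):
    """Map a polarity score to 1 (positive), 0 (neutral), or -1 (negative)."""
    if polarity > 0:
        return 1
    if polarity < 0:
        return -1
    return 0

def summarize(polarities):
    """Return the share of each label, or an empty dict when there are no scores."""
    if not polarities:
        return {}
    counts = Counter(label_polarity(p) for p in polarities)
    total = len(polarities)
    return {label: counts[label] / total for label in (1, 0, -1)}

# Hypothetical polarity scores standing in for four tweets
sample_scores = [0.5, 0.0, -0.2, 0.8]
print(summarize(sample_scores))  # {1: 0.5, 0: 0.25, -1: 0.25}
```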