<a href="https://colab.research.google.com/github/nitin-barthwal/TextAnalytics/blob/master/TwitterSentimentAnalysisUsingTextBlob.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Twitter Sentiment Analysis using TextBlob

TextBlob provides an API that can perform different Natural Language Processing (NLP) tasks like Part-of-Speech Tagging, Noun Phrase Extraction, Sentiment Analysis, Classification (Naive Bayes, Decision Tree), Language Translation and Detection, Spelling Correction, etc.


TextBlob is built on top of the Natural Language Toolkit (NLTK).

Sentiment analysis means analyzing the sentiment of a given text or document and assigning it to a specific class or category, such as positive or negative. Most often the classification uses just these two classes, but more classes can be added, such as neutral, highly positive, and highly negative.

**Installing TextBlob**


In [0]:
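# Uncomment to install TextBlob and download its NLTK corpora
# (the leading "!" runs a shell command from a notebook cell)
# !pip install -U textblob
# !python -m textblob.download_corpora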

**TextBlob Sentiment Analysis**

I will use TextBlob to perform sentiment analysis on a given text.
Its `sentiment` property returns two scores for the text: polarity and subjectivity.

**The polarity score is a float within the range [-1.0, 1.0], where a negative value indicates negative text and a positive value indicates positive text.**



**The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.**
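The polarity score can be bucketed into the extra classes mentioned above (neutral, highly positive, highly negative). The sketch below uses a hypothetical helper, `polarity_to_label`, which is not part of TextBlob; its cut-offs (0.05 for the neutral band, 0.5 for "highly") are arbitrary assumptions chosen for illustration. With TextBlob installed you would call it as `polarity_to_label(TextBlob(text).sentiment.polarity)`.

```python
def polarity_to_label(polarity, neutral_band=0.05, strong=0.5):
    """Map a TextBlob-style polarity score in [-1.0, 1.0] to a class label.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values defined by TextBlob.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if abs(polarity) < neutral_band:
        return "neutral"
    if polarity >= strong:
        return "highly positive"
    if polarity > 0:
        return "positive"
    if polarity <= -strong:
        return "highly negative"
    return "negative"


# 0.55 and -0.35 match two of the polarity scores printed in the next cell
for score in (0.55, -0.35, 0.0):
    print(score, "->", polarity_to_label(score))
```

Tuning `neutral_band` and `strong` changes how many texts land in each class, so the thresholds are best chosen by inspecting scores on your own data.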

In [6]:
import nltk
from textblob import TextBlob

nltk.download('punkt')  # Punkt sentence tokenizer models used by TextBlob

text = TextBlob("I liked machine learning.I am working on it and want to build innovative solutions")
print(text.sentiment)
# Each score can also be read individually:
# print('polarity: {}'.format(text.sentiment.polarity))
# print('subjectivity: {}'.format(text.sentiment.subjectivity))

text = TextBlob("I dont like Non vegitarian food.It is not good for your health")
print(text.sentiment)

text = TextBlob("I liked to travel and meet new people.Its a great learning excercise.")
print(text.sentiment)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
Sentiment(polarity=0.55, subjectivity=0.9)
Sentiment(polarity=-0.35, subjectivity=0.6000000000000001)
Sentiment(polarity=0.5121212121212121, subjectivity=0.6681818181818183)


**Using NLTK’s Twitter Corpus**

– I am using the twitter_samples corpus to train TextBlob's NaiveBayesClassifier.
– From the twitter_samples corpus, I created a train set and a test set, each containing equal numbers of positive and negative tweets.
– Finally, I test the accuracy of the trained classifier.

In [7]:
import nltk
nltk.download('twitter_samples')

from nltk.corpus import twitter_samples

print('Dataset ::', twitter_samples.fileids())

pos_tweets = twitter_samples.strings('positive_tweets.json')
print('No of positive tweets ', len(pos_tweets))

neg_tweets = twitter_samples.strings('negative_tweets.json')
print('No of negative tweets ', len(neg_tweets))

# Label every tweet: lists of (tweet_text, label) pairs, the input format TextBlob's classifiers expect
pos_tweets_set = [(tweet, 'pos') for tweet in pos_tweets]
neg_tweets_set = [(tweet, 'neg') for tweet in neg_tweets]

# Randomize the order of pos_tweets_set and neg_tweets_set.
# No seed is set, so the split, and therefore the accuracy, differs every time the program runs.
from random import shuffle
shuffle(pos_tweets_set)
shuffle(neg_tweets_set)

[nltk_data] Downloading package twitter_samples to /root/nltk_data...
[nltk_data]   Package twitter_samples is already up-to-date!
Dataset :: ['negative_tweets.json', 'positive_tweets.json', 'tweets.20150430-223406.json']
No of positive tweets  5000
No of negative tweets  5000


**Create Train and Test Set**

Creating a train and test set:

– test set = 1000 tweets (500 positive + 500 negative)

– train set = 3000 tweets (1500 positive + 1500 negative)
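The split described above can be sketched as a small, reusable helper. `balanced_split` is a hypothetical function, not part of TextBlob or NLTK. Unlike the notebook cell below, it takes a `seed` so the shuffle (and therefore the accuracy) is reproducible, and it starts the train slice right after the test slice instead of leaving tweets 500 to 999 unused. It returns `(train, test)`; the toy data stands in for the labelled tweets.

```python
import random


def balanced_split(pos_items, neg_items, n_test, n_train, seed=None):
    """Shuffle each class separately and cut disjoint test/train slices.

    Hypothetical helper for illustration. Each input list holds
    (text, label) pairs; n_test and n_train are counts *per class*.
    """
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    pos_items, neg_items = list(pos_items), list(neg_items)  # leave the caller's lists untouched
    rng.shuffle(pos_items)
    rng.shuffle(neg_items)
    if n_test + n_train > min(len(pos_items), len(neg_items)):
        raise ValueError("not enough items in one of the classes")
    test = pos_items[:n_test] + neg_items[:n_test]
    train = (pos_items[n_test:n_test + n_train]
             + neg_items[n_test:n_test + n_train])
    return train, test


# Toy data standing in for the 5000 positive + 5000 negative labelled tweets
pos = [(f"pos tweet {i}", "pos") for i in range(5000)]
neg = [(f"neg tweet {i}", "neg") for i in range(5000)]
train, test = balanced_split(pos, neg, n_test=500, n_train=1500, seed=42)
print(len(test), len(train))  # 1000 3000
```

Shuffling each class separately keeps both sets balanced, which matters because accuracy on an imbalanced test set can look good for a classifier that mostly predicts the majority class.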


In [8]:
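# Test set: the first 500 tweets of each (shuffled) class
test_set = pos_tweets_set[:500] + neg_tweets_set[:500]
# Train set: tweets 1000-2499 of each class, disjoint from the test set (tweets 500-999 are left unused)
train_set = pos_tweets_set[1000:2500] + neg_tweets_set[1000:2500]

print('Test Data Size ', len(test_set))
print('Train Data Size ', len(train_set))

# train the classifier on the labelled (tweet, label) pairs
from textblob.classifiers import NaiveBayesClassifier
classifier = NaiveBayesClassifier(train_set)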

Test Data Size  1000
Train Data Size  3000
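The classifier above is trained on raw tweet strings, but Naive Bayes works on features. By default, TextBlob's classifiers turn each text into boolean bag-of-words features, one `contains(word)` entry per word in the training vocabulary; these are the features listed in the informative-features output further down. The sketch below imitates that idea in plain Python and is a simplification, not TextBlob's code: the helper names are hypothetical and it splits on whitespace, whereas TextBlob tokenizes with NLTK (which is likely why the emoticon `:D` shows up as the feature `contains(D)`). Like the features in that output, it is case-sensitive, so `Thanks` and `Thank` are separate words.

```python
def build_vocabulary(labelled_docs):
    """Collect every whitespace-separated token seen in the training texts."""
    vocab = set()
    for text, _label in labelled_docs:
        vocab.update(text.split())
    return vocab


def contains_features(text, vocabulary):
    """Simplified bag-of-words extractor: one boolean per vocabulary word.

    Case-sensitive; tokenization is a plain whitespace split
    (an assumption for this sketch, TextBlob uses NLTK's tokenizer).
    """
    tokens = set(text.split())
    return {f"contains({word})": (word in tokens) for word in sorted(vocabulary)}


train_docs = [("Thanks for the follow :D", "pos"), ("I miss you so much", "neg")]
vocab = build_vocabulary(train_docs)
features = contains_features("I miss the sun", vocab)

# Only the features that are True for this text
print({name: value for name, value in features.items() if value})
```

Words that never appeared in training (here `sun`) get no feature at all, so they cannot influence the prediction.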


**Evaluating the Classifier & Classifying New Text**

In [12]:
# calculate accuracy on the held-out test set (classifier.accuracy returns a fraction between 0 and 1)
accuracy = classifier.accuracy(test_set)
print('Accuracy  :: ', accuracy*100)

# show the 10 most informative features, i.e. the words whose presence best separates pos from neg
# (e.g. "pos : neg = 77.0 : 1.0" means the feature is about 77 times more likely in positive tweets).
# show_informative_features() prints its table itself and returns None, so it is not wrapped in print().
classifier.show_informative_features(10)

text = "It was a nice place , I enjoyed visting and planning to come again."
print(text, '  :: ', classifier.classify(text))

text = "I dont want to talk to you . You are mean and selfish"
print(text, '  :: ', classifier.classify(text))

Accuracy  ::  74.1
Most Informative Features
             contains(D) = True              pos : neg    =     77.0 : 1.0
        contains(Thanks) = True              pos : neg    =     45.7 : 1.0
          contains(miss) = True              neg : pos    =     29.4 : 1.0
           contains(via) = True              pos : neg    =     17.7 : 1.0
          contains(Love) = True              pos : neg    =     16.3 : 1.0
           contains(sad) = True              neg : pos    =     14.6 : 1.0
         contains(Thank) = True              pos : neg    =     12.4 : 1.0
           contains(See) = True              pos : neg    =     12.3 : 1.0
           contains(TOO) = True              neg : pos    =     12.3 : 1.0
           contains(AND) = True              neg : pos    =     12.3 : 1.0
It was a nice place , I enjoyed visting and planning to come again.   ::  pos
I dont want to talk to you . You are mean and selfish   ::  neg
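`classifier.accuracy(test_set)` reports the fraction of test tweets whose predicted label matches the true label; the notebook multiplies it by 100 to print a percentage. A minimal sketch of that computation follows. So that it runs without NLTK data, a hypothetical keyword-based `toy_classify` function stands in for the trained classifier; with the real model you would pass `classifier.classify` and `test_set` instead.

```python
def accuracy(classify, labelled_docs):
    """Fraction of (text, label) pairs for which classify(text) == label."""
    if not labelled_docs:
        raise ValueError("need at least one labelled document")
    correct = sum(1 for text, label in labelled_docs if classify(text) == label)
    return correct / len(labelled_docs)


def toy_classify(text):
    """Hypothetical stand-in classifier: 'neg' if a negative keyword appears."""
    negative_words = {"miss", "sad", "mean", "selfish"}
    return "neg" if negative_words & set(text.lower().split()) else "pos"


toy_test_set = [
    ("It was a nice place, I enjoyed it", "pos"),
    ("You are mean and selfish", "neg"),
    ("So sad today", "neg"),
    ("Not happy at all", "neg"),  # misclassified: no keyword matches
]
print("Accuracy  :: ", accuracy(toy_classify, toy_test_set) * 100)  # 75.0
```

Because the test set is balanced, a classifier that always guessed one label would score about 50%, which is a useful baseline when reading the 74.1% above.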
