# Sentiment Analysis with TextBlob

Sentiment analysis is a method for discerning emotions in text. [TextBlob](https://textblob.readthedocs.io/en/latest/index.html) is a Python library for processing textual data. I use it here in two ways: the Twitter example relies on TextBlob's default polarity score, and the Ubuntu IRC example uses the Naive Bayes analyzer, a classifier trained on a dataset of movie reviews.

In this notebook I explore two different textual datasets. First, I experiment with the Twitter API to analyze a small dataset. Second, I analyze text from a massive dataset of Ubuntu IRC conversations.

In [4]:
import re
import tweepy
from tweepy import OAuthHandler
from textblob import TextBlob
 
class TwitterClient(object):
    '''
    Generic Twitter Class for sentiment analysis.
    '''
    def __init__(self):
        '''
        Class constructor or initialization method.
        '''
        # keys and tokens from the Twitter Dev Console
        # (placeholders: never commit real credentials to a notebook)
        consumer_key = 'YOUR_CONSUMER_KEY'
        consumer_secret = 'YOUR_CONSUMER_SECRET'
        access_token = 'YOUR_ACCESS_TOKEN'
        access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'
 
        # attempt authentication
        try:
            # create OAuthHandler object
            self.auth = OAuthHandler(consumer_key, consumer_secret)
            # set access token and secret
            self.auth.set_access_token(access_token, access_token_secret)
            # create tweepy API object to fetch tweets
            self.api = tweepy.API(self.auth)
        except:
            print("Error: Authentication Failed")
 
    def clean_tweet(self, tweet):
        '''
        Utility function to clean tweet text by removing links, special characters
        using simple regex statements.
        '''
        return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

    def get_tweet_sentiment(self, tweet):
        '''
        Utility function to classify sentiment of passed tweet
        using textblob's sentiment method
        '''
        # create TextBlob object of passed tweet text
        analysis = TextBlob(self.clean_tweet(tweet))
        # set sentiment
        if analysis.sentiment.polarity > 0:
            return 'positive'
        elif analysis.sentiment.polarity == 0:
            return 'neutral'
        else:
            return 'negative'
 
    def get_tweets(self, query, count = 10):
        '''
        Main function to fetch tweets and parse them.
        '''
        # empty list to store parsed tweets
        tweets = []
 
        try:
            # call twitter api to fetch tweets
            fetched_tweets = self.api.search(q = query, count = count)
 
            # parsing tweets one by one
            for tweet in fetched_tweets:
                # empty dictionary to store required params of a tweet
                parsed_tweet = {}
 
                # saving text of tweet
                parsed_tweet['text'] = tweet.text
                # saving sentiment of tweet
                parsed_tweet['sentiment'] = self.get_tweet_sentiment(tweet.text)
 
                # appending parsed tweet to tweets list
                if tweet.retweet_count > 0:
                    # if tweet has retweets, ensure that it is appended only once
                    if parsed_tweet not in tweets:
                        tweets.append(parsed_tweet)
                else:
                    tweets.append(parsed_tweet)
 
            # return parsed tweets
            return tweets
 
        except tweepy.TweepError as e:
            # print error (if any) and return whatever was parsed so far
            print("Error : " + str(e))
            return tweets
 
def main():
    # creating object of TwitterClient Class
    api = TwitterClient()
    # calling function to get tweets
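    tweets = api.get_tweets(query = '@iandownard', count = 200)
    # stop early if nothing came back, to avoid dividing by zero below
    if not tweets:
        print("No tweets fetched.")
        return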
 
    # picking positive tweets from tweets
    ptweets = [tweet for tweet in tweets if tweet['sentiment'] == 'positive']
    # percentage of positive tweets
    print("Positive tweets percentage: {} %".format(100*len(ptweets)/len(tweets)))
    # picking negative tweets from tweets
    ntweets = [tweet for tweet in tweets if tweet['sentiment'] == 'negative']
    # percentage of negative tweets
    print("Negative tweets percentage: {} %".format(100*len(ntweets)/len(tweets)))
    # percentage of neutral tweets
    print("Neutral tweets percentage: {} %".format(100*(len(tweets) - len(ntweets) - len(ptweets))/len(tweets)))

    # printing first 10 positive tweets
    print("\n\nPositive tweets:")
    for tweet in ptweets[:10]:
        print(tweet['text'])
 
    # printing first 10 negative tweets
    print("\n\nNegative tweets:")
    for tweet in ntweets[:10]:
        print(tweet['text'])
 

if __name__ == "__main__":
    # calling main function
    main()

Positive tweets percentage: 14 %
Negative tweets percentage: 0 %
Neutral tweets percentage: 85 %


Positive tweets:
RT @iandownard: @KirkDBorne @mapr Here's a bit of Drill in Action &gt; https://t.co/AYY0vBQz0U


Negative tweets:

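The cleaning step in `clean_tweet` can be tried on its own. This is a minimal sketch, assuming a made-up sample tweet. Whitespace around `|` inside a pattern is matched literally, so the sketch writes the three alternatives with no spaces between them.

```python
import re

# Three alternatives: @mentions, any character that is not
# alphanumeric/space/tab, and URLs of the form scheme://...
PATTERN = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")

def clean_tweet(tweet):
    """Replace every match with a space, then collapse runs of whitespace."""
    return " ".join(PATTERN.sub(" ", tweet).split())

# Hypothetical tweet text, not taken from the fetched data
sample = "RT @someuser: Drill in action! https://t.co/abc123 #bigdata"
cleaned = clean_tweet(sample)
print(cleaned)  # RT Drill in action bigdata
```

Note that the hashtag word survives while the `#` itself is dropped, and the `RT` prefix is kept because it is plain text.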

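The labeling and percentage logic above can also be sketched without calling the Twitter API. The polarity scores below are hypothetical stand-ins for `analysis.sentiment.polarity`, which TextBlob reports in the range -1.0 to 1.0. The helper names `label_from_polarity` and `summarize` are my own, not part of TextBlob.

```python
from collections import Counter

def label_from_polarity(polarity):
    """Map a polarity score to a label using the same thresholds as get_tweet_sentiment."""
    if polarity > 0:
        return "positive"
    if polarity == 0:
        return "neutral"
    return "negative"

def summarize(labels):
    """Return the percentage of each label; an empty input yields all zeros."""
    counts = Counter(labels)
    total = len(labels)
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in ("positive", "negative", "neutral")
    }

# Hypothetical polarity scores, one per tweet
scores = [0.5, 0.0, 0.0, -0.25, 0.0, 0.8, 0.0, 0.0]
summary = summarize([label_from_polarity(s) for s in scores])
print(summary)  # {'positive': 25.0, 'negative': 12.5, 'neutral': 62.5}
```

Counting all three labels in one pass avoids deriving the neutral share by subtraction, and the `total` check keeps an empty result set from raising `ZeroDivisionError`.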
# Sentiment Analysis from Ubuntu IRC data

The dataset I'm using is the [Ubuntu Dialogue Corpus v2.0](https://github.com/rkadlec/ubuntu-ranking-dataset-creator), which consists of two-person chat conversations from an Ubuntu IRC channel.

Here's how I generated the dataset:

```
git clone https://github.com/rkadlec/ubuntu-ranking-dataset-creator
pip install unicodecsv
cd ubuntu-ranking-dataset-creator/src
./generate.sh -t -s -l
tar -xzvf ubuntu_dialogs.tgz
```

## References
[http://irclogs.ubuntu.com](http://irclogs.ubuntu.com)<br>
[https://arxiv.org/pdf/1506.08909.pdf](https://arxiv.org/pdf/1506.08909.pdf)<br>
[https://github.com/rkadlec/ubuntu-ranking-dataset-creator](https://github.com/rkadlec/ubuntu-ranking-dataset-creator)

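Each dialog file is tab-separated, and the cell that follows reads the last column of every row as the utterance. Here is a minimal sketch of that reading step on in-memory data. The sample rows and the column layout (timestamp, sender, recipient, utterance) are assumptions for illustration; only the "last column is the text" part comes from the code in this notebook.

```python
import csv
import io

# Hypothetical sample rows in the assumed layout: timestamp, sender, recipient, utterance
SAMPLE_TSV = (
    "2012-01-01T10:00:00.000Z\talice\t\tmy wifi stopped working after the update\n"
    "2012-01-01T10:01:00.000Z\tbob\talice\twhich release are you running?\n"
    "2012-01-01T10:02:00.000Z\talice\tbob\tthanks, that fixed it\n"
)

def read_utterances(handle):
    """Return the last column of every non-empty row."""
    # QUOTE_NONE keeps stray double quotes in chat text from being parsed as CSV quoting
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row[-1] for row in reader if row]

utterances = read_utterances(io.StringIO(SAMPLE_TSV))
text = ". ".join(utterances)
print(text)
```

Keeping the utterances in a list before joining them also makes it easy to score each line separately later, instead of only the combined text.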


In [30]:
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
import csv

# join the last column (the utterance) of every row into one block of text
text = ""
with open('/Users/idownard/tmp/ubuntu-ranking-dataset-creator/src/dialogs/9/1.tsv', 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        text = text + ". " + row[-1]

# classify the combined text with the Naive Bayes analyzer
blob = TextBlob(text, analyzer=NaiveBayesAnalyzer())
print(blob.sentiment)

Sentiment(classification='pos', p_pos=0.5449551130839605, p_neg=0.45504488691603534)

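The two probabilities in the result printed above sum to 1, so their difference gives a rough sense of how confident the classification is. The sketch below uses a stand-in `namedtuple` with the same field names rather than importing anything from TextBlob. The `describe` helper and its 0.2 threshold are hypothetical choices of mine, not part of the library.

```python
from collections import namedtuple

# Stand-in with the same field names as the printed result; not imported from TextBlob
Sentiment = namedtuple("Sentiment", ["classification", "p_pos", "p_neg"])

result = Sentiment(classification="pos",
                   p_pos=0.5449551130839605,
                   p_neg=0.45504488691603534)

# The class probabilities are complementary, so the margin reflects confidence
margin = result.p_pos - result.p_neg

def describe(sentiment, threshold=0.2):
    """Call a result 'weak' when the probability margin is below a hypothetical threshold."""
    gap = abs(sentiment.p_pos - sentiment.p_neg)
    strength = "weak" if gap < threshold else "strong"
    return "{} {}".format(strength, sentiment.classification)

print(round(margin, 4))  # 0.0899
print(describe(result))  # weak pos
```

With a margin of about 0.09, this dialog is only slightly more likely positive than negative, so the `pos` label should be read as a weak signal.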