# Twitter Sentiment Analysis

This project uses Python 3 to download tweets, store them in a MongoDB database, and perform sentiment analysis on them.

### The Approach

The project performs the following tasks:
1. Download the tweets.
2. Save the downloaded (raw) tweets in a CSV file.
3. Import the CSV file as a dataframe and clean the tweet text column along with the other text columns.
4. Save the cleaned dataframe as a second CSV file, which serves as a backup copy.
5. Import the data from the cleaned CSV file, convert it to JSON format and save it in a MongoDB database.
6. Read the tweets back from the MongoDB database, perform sentiment analysis (classify each tweet as positive, negative or neutral) and update each document with the result as a new field.

__Import libraries__

We begin by importing all the libraries required for this project.

In [None]:
import pymongo
from pymongo import MongoClient
import tweepy
import csv
import pandas as pd
import json
import re
from textblob import TextBlob
import jdc # provides the %%add_to magic, which lets a class be defined across multiple Jupyter Notebook cells

__Define Class__

Next, we define a class `Twitter` that holds all the methods needed for this project.
Its `__init__` method stores the keys required to connect to the Twitter API.

In [None]:
class Twitter:
    #initializing the relevant keys for tweets downloads
    def __init__(self):
        self.consumer_key = 'Your_Consumer_Key'
        self.consumer_secret = 'Your_Consumer_Secret'
        self.access_token = 'Your_Access_Token'
        self.access_token_secret = 'Your_Access_Token_Secret'
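
Hardcoding keys in a notebook makes them easy to leak when the notebook is shared. As a minimal sketch of one alternative, the keys could be read from environment variables. The variable names and the `load_credentials` helper below are assumptions made for illustration; they are not defined by Tweepy or Twitter.

```python
import os

# Hypothetical environment variable names (an assumption of this sketch).
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}

def load_credentials(environ=os.environ):
    """Return the four keys as a dict, or raise if any of them is missing."""
    missing = [name for name in ENV_NAMES.values() if not environ.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {attr: environ[name] for attr, name in ENV_NAMES.items()}
```

The returned dictionary uses the same attribute names as `__init__`, so each value could be assigned to the matching `self.` attribute.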

__Search and Download Tweets__

The `searchTweets()` method connects to the Twitter API, searches for tweets on a specific topic and saves them in a CSV file. This raw file contains all the downloaded tweet information in unprocessed form.

In [None]:
%%add_to Twitter
def searchTweets(self):
        auth = tweepy.OAuthHandler(self.consumer_key, self.consumer_secret)
        auth.set_access_token(self.access_token, self.access_token_secret)
        api = tweepy.API(auth,wait_on_rate_limit=True)
        
        # Note: api.search is the Tweepy 3 name; Tweepy 4 renamed it to api.search_tweets
        fieldnames = ["Tweet ID", "Username", "User's ScreenName","User's Location","User's Description","User's Language","User's Followers","User's Friends","Tweet Creation Time","Tweet Text"]

        # newline='' stops the csv module from writing blank rows on Windows;
        # the with block closes the file even if the download fails part-way
        with open('rawTweets.csv', 'a', newline='') as csvFile:
            csvWriter = csv.writer(csvFile)

            # write the header only when the file is still empty, so re-runs do not repeat it
            if csvFile.tell() == 0:
                csvWriter.writerow(fieldnames)

            # write the downloaded tweets to the csv file
            for tweet in tweepy.Cursor(api.search,q="#aircanada",lang="en",since="2018-07-25").items(100):
                csvWriter.writerow([tweet.id,tweet.user.name.encode("utf-8"),tweet.user.screen_name.encode("utf-8"),tweet.user.location.encode("utf-8"),tweet.user.description.encode("utf-8"),tweet.user.lang.encode("utf-8"),tweet.user.followers_count,tweet.user.friends_count,tweet.created_at, tweet.text.encode('utf-8')])
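
Because each text value is passed through `.encode("utf-8")` before it is written, the CSV file ends up holding the printed form of a bytes object, such as `b'...'`, rather than the text itself. The following small sketch reproduces this with an in-memory buffer; the sample name is made up for illustration.

```python
import csv
import io

buffer = io.StringIO()          # in-memory stand-in for rawTweets.csv
writer = csv.writer(buffer)

name = "Air Canada"             # made-up sample value
# First column: encoded bytes, second column: the plain string
writer.writerow([name.encode("utf-8"), name])

line = buffer.getvalue().strip()
print(line)
```

The first column is stored with a leading `b'` and a trailing quote, which is why the cleaning step later slices two characters from the front and one from the end of each text column.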

__Clean Individual Tweets__

The `cleanSingleTweet()` method uses a regular expression to clean a single tweet.
It is called later from the `cleanAllTweets()` method.

In [None]:
%%add_to Twitter
def cleanSingleTweet(self,tweet):
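        # remove @, :, #, the literal escape text \t, \n and \x.., and URLs (each URL ends at the next whitespace)
        return re.sub(r"(@|:|#|\\t|\\n|https?://\S+|\\x\S+)","",tweet)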
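
How the URL part of the pattern is written changes how much text is removed. The sketch below compares two variants on a made-up tweet: one where the URL match stops at the next whitespace, and one where `.*` runs to the end of the string. The constant names and the sample text are assumptions made for illustration.

```python
import re

# URL match stops at the first whitespace character
BOUNDED = re.compile(r"(@|:|#|\\t|\\n|https?://\S+|\\x\S+)")
# URL match continues to the end of the string
GREEDY = re.compile(r"(@|:|#|\\t|\\n|https://.*|\\x\S+)")

sample = "RT @AirCanada: delayed again https://t.co/abc123 #aircanada"

bounded_result = BOUNDED.sub("", sample)
greedy_result = GREEDY.sub("", sample)

print(repr(bounded_result))
print(repr(greedy_result))
```

With the bounded form the hashtag word after the link is kept, while the greedy form drops everything that follows the link.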

__Clean Entire Data__

The `cleanAllTweets()` method cleans every text column, including the tweet text that is later used for sentiment analysis. The cleaned data is saved as a CSV file.

In [None]:
%%add_to Twitter
def cleanAllTweets(self):     
        df = pd.read_csv("rawTweets.csv")              # read the raw csv into a pandas dataframe

        # each text column holds the printed form of a bytes object, e.g. b'text';
        # .str[2:-1] drops the leading b' and the trailing quote
        textColumns = ["Username", "User's ScreenName", "User's Location", "User's Description", "User's Language", "Tweet Text"]
        for column in textColumns:
            df[column] = df[column].str[2:-1]

        df["Clean Tweet Text"] = df["Tweet Text"].apply(self.cleanSingleTweet)

        #print(df.head())     # uncomment to check the contents of the dataframe
        df.to_csv("cleanTweets.csv", index = False)    # save the cleaned tweets in a csv file
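
Slicing with `[2:-1]` removes the `b'` wrapper but leaves any escape text, such as `\xc3\xa9`, in place. As a hedged alternative sketch, the standard library's `ast.literal_eval` can parse the stored text back into a bytes object, which can then be decoded. The `decode_bytes_repr` helper and the sample value are hypothetical, and the helper assumes the cell is a string (empty cells that pandas reads as NaN would need to be skipped first).

```python
import ast

def decode_bytes_repr(cell):
    """Turn text such as "b'...'" back into the string it was encoded from."""
    value = ast.literal_eval(cell)
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)

# Build a cell the same way the raw CSV stores it
cell = str("Montréal".encode("utf-8"))

sliced = cell[2:-1]                 # wrapper removed, escape text remains
decoded = decode_bytes_repr(cell)   # original characters recovered

print(sliced)
print(decoded)
```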

__Save Data into MongoDB Database__

The cleaned CSV file is imported as a pandas dataframe and converted to JSON format. A connection to MongoDB is then established, and the database and collection are created if they do not already exist. Any documents already in the collection are removed, and the JSON records are inserted.

In [None]:
%%add_to Twitter
def saveTweetsToDB(self):
        #read from csv and save the contents in desired database (mongoDB in this case)
        client = MongoClient('localhost', 27017)  
        myDB = client['twitterDB'] 
        collection_name = 'tweets_collection' 
        db_cm = myDB[collection_name]
        
        data = pd.read_csv("cleanTweets.csv")
        data_json = json.loads(data.to_json(orient='records'))
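        db_cm.delete_many({})            # clear documents left over from an earlier run
        if data_json:                    # insert_many raises an error on an empty list
            db_cm.insert_many(data_json)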
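
The data handed to MongoDB is a list with one dictionary per CSV row, keyed by column name. The sketch below illustrates that shape using only the standard library instead of pandas; the two sample rows are made up, and unlike pandas, `csv.DictReader` keeps every value as a string.

```python
import csv
import io

# Made-up two-row stand-in for cleanTweets.csv
csv_text = "Tweet ID,Tweet Text\n1,great flight\n2,lost my bag\n"

# One dictionary per row, keyed by the header names
records = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]

print(records)
```

Each dictionary in the list becomes one document in the collection.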

__Sentiment Analysis of Individual Tweet__

The `sentimentAnalysis()` method takes a single tweet as input and returns a label (positive, negative or neutral) based on the polarity score computed by the TextBlob library.

In [None]:
%%add_to Twitter
def sentimentAnalysis(self,tweet):  #a function to calculate the polarity in order to determine the sentiment of the tweet
        analysis = TextBlob(tweet)
        tweetPolarity = analysis.sentiment.polarity
        if tweetPolarity > 0:
            return 'Positive'
        elif tweetPolarity < 0:
            return 'Negative'
        else:
            return 'Neutral'
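
TextBlob reports polarity as a number between -1.0 and 1.0, so the labelling step is a comparison against zero. The sketch below isolates that step so it can be tried without TextBlob installed. The `label_polarity` name and the optional `neutral_band` parameter are assumptions made for illustration; with the default of 0.0 it gives the same labels as the method above.

```python
def label_polarity(polarity, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    Scores whose absolute value is at most neutral_band count as Neutral.
    """
    if polarity > neutral_band:
        return "Positive"
    if polarity < -neutral_band:
        return "Negative"
    return "Neutral"

print(label_polarity(0.5))
print(label_polarity(-0.2))
print(label_polarity(0.05, neutral_band=0.1))
```

A non-zero band could be used to treat weakly scored tweets as neutral, although the project itself uses a strict comparison with zero.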

__Sentiment Analysis of all Tweets__

The `allTweetsAnalysis()` method reads all the tweets from the database and performs sentiment analysis on each one. The result is stored in a new field on each document, and the MongoDB collection is updated accordingly.

In [None]:
%%add_to Twitter
def allTweetsAnalysis(self):
        client = MongoClient('localhost', 27017)
        #allDBs = client.list_database_names()
        myDB = client['twitterDB']
        collection_name = 'tweets_collection'
        dbCollection = myDB[collection_name]
        
        documents =dbCollection.find()
        #print(documents)
        
        #tweet = [] #Use these lists if you want to check sentiment analysis performed by textblob
        #result = []
        for document in documents:
            tempPolarity = self.sentimentAnalysis(document["Tweet Text"])
            # update_one replaces the deprecated update(), which newer PyMongo releases no longer provide
            dbCollection.update_one({"_id": document["_id"]}, {"$set": {"SentimentAnalysis": tempPolarity}})
            #tweet.append(document['Tweet Text'])
            #result.append(tempPolarity)
        #print(result)

#End of class 'Twitter'
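
Each update consists of two small dictionaries: a filter that selects one document by its `_id`, and a `$set` instruction that adds the new field. The sketch below builds both for a made-up document so their shape can be inspected without a running MongoDB server. The `build_sentiment_update` helper is hypothetical.

```python
def build_sentiment_update(document, label):
    """Return the (filter, update) pair for storing a sentiment label."""
    filter_doc = {"_id": document["_id"]}                 # match exactly one document
    update_doc = {"$set": {"SentimentAnalysis": label}}   # add or overwrite the field
    return filter_doc, update_doc

# Made-up document standing in for one read from the collection
document = {"_id": 42, "Tweet Text": "great flight"}
filter_doc, update_doc = build_sentiment_update(document, "Positive")

print(filter_doc)
print(update_doc)
```

The pair could then be passed to PyMongo's `update_one(filter_doc, update_doc)` on the collection.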

__Calling the main() method__

Lastly, we define a `main()` function that creates an object of the `Twitter` class and calls its methods in order. We then call `main()` to run the whole pipeline.

In [None]:
def main():
    t = Twitter()
    t.searchTweets()
    t.cleanAllTweets()
    t.saveTweetsToDB()
    t.allTweetsAnalysis()
main()