# Twitter Sentiment Analysis

In [2]:
# import from uw_ischool_sample file in the `data/` package (folder)
from data.uw_ischool_sample import SAMPLE_TWEETS

The data is represented as one list of dictionaries where each dictionary represents a tweet. Each dictionary contains many different values, some of which themselves may be dictionaries.

In [3]:
#print first 3 elements of sample tweets
print(SAMPLE_TWEETS[0:3])

[{'created_at': 'Mon Oct 10 18:39:51 +0000 2016', 'retweet_count': 9, 'entities': {'hashtags': [{'indices': [20, 41], 'text': 'IndigenousPeoplesDay'}]}, 'user': {'screen_name': 'UW_iSchool'}, 'text': 'RT @UWAPress: Happy #IndigenousPeoplesDay https://t.co/YmU9e9lj7v'}, {'created_at': 'Mon Oct 10 18:00:00 +0000 2016', 'retweet_count': 0, 'entities': {'hashtags': [{'indices': [16, 29], 'text': 'IdealistFair'}]}, 'user': {'screen_name': 'UW_iSchool'}, 'text': "We'll be at the #IdealistFair this evening on the Seattle U. campus. Come and learn about our graduate programs: https://t.co/et1HrQshmr"}, {'created_at': 'Mon Oct 10 15:10:36 +0000 2016', 'retweet_count': 1, 'entities': {'hashtags': []}, 'user': {'screen_name': 'UW_iSchool'}, 'text': 'RT @iYouthUW: iYouth Tips for 1st\xa0Years https://t.co/K4SCIEhJ8k https://t.co/p4lbC6Jb5o'}]
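
Because each tweet is a nested dictionary, a field is reached by chaining keys. The sketch below copies the first tweet from the printed sample inline so that it runs on its own; the variable names are only illustrative.

```python
# First tweet from the printed sample, copied inline so this runs on its own
tweet = {
    'created_at': 'Mon Oct 10 18:39:51 +0000 2016',
    'retweet_count': 9,
    'entities': {'hashtags': [{'indices': [20, 41], 'text': 'IndigenousPeoplesDay'}]},
    'user': {'screen_name': 'UW_iSchool'},
    'text': 'RT @UWAPress: Happy #IndigenousPeoplesDay https://t.co/YmU9e9lj7v',
}

# Top-level value
text = tweet['text']
# Value inside a nested dictionary
screen_name = tweet['user']['screen_name']
# Values inside a list of dictionaries, two levels down
hashtags = [tag['text'] for tag in tweet['entities']['hashtags']]

print(screen_name)
print(hashtags)
```

Only the `text` field is used by the analysis that follows; the other fields are carried along but never read.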


In [4]:
#import from data.sentiments_nrc module
#SENTIMENTS maps each word to a dictionary of its emotions
from data.sentiments_nrc import SENTIMENTS

#list of emotions
from data.sentiments_nrc import EMOTIONS

In [1]:
#print(SENTIMENTS)

In [6]:
print(EMOTIONS)

['positive', 'negative', 'anger', 'anticipation', 'disgust', 'fear', 'joy', 'sadness', 'surprise', 'trust']


## Text Sentiment
All of the sentiment analysis is based on the individual words in the text.

In [7]:
import re

#function that takes a tweet's text (a string) and splits it up into a list of individual words.

def tokenizeWords(input_text):
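    #split on runs of non-word characters (raw string avoids an invalid-escape warning)
    split_words = re.split(r'\W+', input_text.lower())
    #remove tokens with fewer than 2 characters (this also drops empty strings)
    tokenize_list = [elem for elem in split_words if len(elem)>1]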
    return tokenize_list
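
To see what this tokenization keeps and discards, here is a small standalone sketch of the same split-and-filter steps. The name `tokenize_sketch` and the input string are made up for illustration.

```python
import re

def tokenize_sketch(text):
    # Lower-case, then split on every run of non-word characters
    pieces = re.split(r'\W+', text.lower())
    # Keep only tokens of two or more characters (this also drops '')
    return [piece for piece in pieces if len(piece) > 1]

tokens = tokenize_sketch("I'm happy, a #GoodDay!")
print(tokens)  # ['happy', 'goodday']
```

Note that the contraction "I'm" disappears entirely, because both halves are single characters. By the same rule, a URL is broken into fragments, and pieces such as `https` and `co` survive as tokens and count toward the total number of words.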

This function filters a list of words, keeping for each emotion only the words that have that emotion.
To determine whether a word has a particular emotion, the word is looked up in the imported SENTIMENTS variable: the word is the key to that word's dictionary of emotions, and the emotion is then the key into that dictionary.

In [8]:
#function determines which words from a list have _each_ emotion (i.e., the "emotional" words), producing a dictionary of that information.

def filterWords(tokenize_list):
    filtered_dict = {}
    for emotion in EMOTIONS:
        #keep the words from the tokenized list that are marked with this emotion
        filtered_dict[emotion] = [word for word in tokenize_list
                                  if SENTIMENTS.get(word, {}).get(emotion) == 1]
    return filtered_dict
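
The two-level lookup can be tried on a miniature lexicon. Everything in this sketch is hypothetical: `TOY_SENTIMENTS`, `TOY_EMOTIONS`, and the word entries are invented, and it is an assumption that the real SENTIMENTS stores a flag of 1 for each emotion a word has.

```python
# Hypothetical miniature lexicon; the real SENTIMENTS is much larger
TOY_SENTIMENTS = {
    'happy': {'positive': 1, 'joy': 1, 'anger': 0},
    'rejection': {'negative': 1, 'anger': 1, 'joy': 0},
}
TOY_EMOTIONS = ['positive', 'negative', 'anger', 'joy']

def filter_words_sketch(words):
    result = {}
    for emotion in TOY_EMOTIONS:
        # .get(word, {}) covers words missing from the lexicon,
        # .get(emotion) covers emotions missing for a known word
        result[emotion] = [w for w in words
                           if TOY_SENTIMENTS.get(w, {}).get(emotion) == 1]
    return result

by_emotion = filter_words_sketch(['happy', 'school', 'rejection', 'happy'])
print(by_emotion)
```

A word that occurs twice is listed twice, and a word with several emotions appears under each of them. Both properties matter later when percentages are computed.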

This function gets a list of the "most common" words in a list. It returns a new list containing each distinct word from the original list, in descending order of how many times that word appears in the original list.

In [9]:
def commonWords(words_list):
    #count how many times each word appears
    count_words = {}
    for word in words_list:
        count_words[word] = count_words.get(word, 0) + 1

    #sort the words in desc order of count (ties keep their original order)
    sortedList = sorted(words_list, key=count_words.__getitem__, reverse=True)

    #drop duplicates while keeping the sorted order
    return list(dict.fromkeys(sortedList))
    
w =  ['a','b','c','c','c','a']
commonWords(w)  

['c', 'a', 'b']
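
As a point of comparison, the standard library's `collections.Counter` produces the same ranking in a few lines. This is an alternative sketch, not the notebook's implementation, and `common_words_sketch` is a made-up name.

```python
from collections import Counter

def common_words_sketch(words):
    # most_common() sorts by count, highest first; words with equal
    # counts keep the order in which they were first seen
    return [word for word, _count in Counter(words).most_common()]

ranked = common_words_sketch(['a', 'b', 'c', 'c', 'c', 'a'])
print(ranked)  # ['c', 'a', 'b']
```

The tie-breaking rule matters for the "example words" shown later: among equally frequent words, the ones seen first win.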

## Tweet Statistics

This function takes a list of tweet data as an argument and returns the data of interest to display in a tabular format. For each emotion, the following things are produced:

1. The most common words across all tweets that have that emotion
2. The percentage of words across all tweets that have that emotion

In [10]:
from collections import defaultdict

def analyze_tweets(tweets):
    #words grouped by emotion, collected across all tweets
    emotion_dict = {emotion: [] for emotion in EMOTIONS}
    tweet_stats = defaultdict(list)
    #total number of words across all tweets
    total = 0

    for tweet in tweets:
        #get the text of every tweet, tokenize and filter the words
        tokenized = tokenizeWords(tweet['text'])
        total += len(tokenized)
        filtered = filterWords(tokenized)
        #associate the emotion with the words
        for emotion in EMOTIONS:
            emotion_dict[emotion].extend(filtered[emotion])

    #call the func to get the 3 most common words per emotion
    mostCommonWords = {key: commonWords(value)[0:3] for key, value in emotion_dict.items()}
    #percentage of all words that have each emotion (0.0 if there are no words at all)
    emotionPerc = {key: round(100*len(value)/total, 2) if total else 0.0
                   for key, value in emotion_dict.items()}

    #combine both results as emotion -> [percentage, example words]
    for element in (emotionPerc, mostCommonWords):
        for key, value in element.items():
            tweet_stats[key].append(value)

    return tweet_stats

In [11]:
analyze_tweets(SAMPLE_TWEETS)

defaultdict(list,
            {'positive': [6.16, ['learn', 'faculty', 'happy']],
             'negative': [0.88, ['fall', 'rejection', 'outstanding']],
             'anger': [0.33, ['rejection', 'disaster', 'involvement']],
             'anticipation': [2.53, ['happy', 'top', 'ready']],
             'disgust': [0.44, ['rejection', 'weird', 'finally']],
             'fear': [0.44, ['rejection', 'surprise', 'problem']],
             'joy': [1.76, ['happy', 'peace', 'deal']],
             'sadness': [0.55, ['fall', 'rejection', 'problem']],
             'surprise': [0.99, ['deal', 'award', 'surprised']],
             'trust': [3.08, ['school', 'faculty', 'happy']]})
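
The first number in each entry is a percentage of all words, not of all tweets. A minimal sketch of that arithmetic, using invented counts rather than figures from the sample data:

```python
# Hypothetical counts for illustration, not taken from the sample data
total_words = 20  # all tokens across all tweets
emotion_words = {'positive': ['learn', 'happy', 'happy'], 'anger': []}

# Each occurrence counts, so 'happy' contributes twice to 'positive'
percentages = {
    emotion: round(100 * len(words) / total_words, 2)
    for emotion, words in emotion_words.items()
}
print(percentages)  # {'positive': 15.0, 'anger': 0.0}
```

Since one word can carry several emotions and many words carry none, the percentages are not expected to add up to 100.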

### Displaying Tweets 

In [12]:
def show(tweet_stats):
    print("{0:14} {1:11} {2:35}".format("EMOTION","% WORDS","EXAMPLE WORDS"))
    
    #display tweets,emotion percentage and words
    for key,val in sorted(tweet_stats.items(), key=lambda x: x[1][0],reverse=True):
        print("{0:14} {1:<11} {2:35}".format(key, str(round(val[0],2))+"%", ', '.join(val[1]))) 
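
The column layout comes entirely from the width fields in the format string. The sketch below isolates one row; `format_row` is a hypothetical helper, and the values are the "joy" entry from the sample results above.

```python
def format_row(emotion, percent, words):
    # {0:14}  -> emotion padded to 14 characters (strings align left by default)
    # {1:<11} -> percentage text, explicitly left-aligned in 11 characters
    # {2:35}  -> comma-separated example words padded to 35 characters
    return "{0:14} {1:<11} {2:35}".format(emotion, str(percent) + "%", ", ".join(words))

row = format_row("joy", 1.76, ["happy", "peace", "deal"])
print(row)
```

Each row is 62 characters wide (14 + 1 + 11 + 1 + 35) as long as no field exceeds its width; a longer value is not truncated and simply pushes the later columns to the right.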

### Fetching Data

In [13]:
import requests

#function to get the live data
def downloadData(username):
    parameters = {'screen_name':username}
    response = requests.get('https://faculty.washington.edu/joelross/proxy/twitter/timeline/',params = parameters)
    #stop on an HTTP error instead of trying to parse an error page
    response.raise_for_status()
    #parse the JSON body into a list of tweet dictionaries
    return response.json()
     

The main function prompts the user for a Twitter username, calls the download function to fetch the tweets, and passes them to the analyze and show functions to display a sentiment analysis of the user's timeline. Entering `sample` skips the download and analyzes the bundled sample tweets instead.

In [15]:
#main function
def main():
    name = input("Please enter your Twitter username-->")
    name = name.lower()
    #give sample_tweet as inputs if user enters sample
    if name == 'sample':
        analyzedData = analyze_tweets(SAMPLE_TWEETS)
    else: 
        tweetData = downloadData(name)
        analyzedData = analyze_tweets(tweetData)
        
    show(analyzedData)
    

if __name__ == '__main__':
    main()

Please enter your Twitter username-->sample
EMOTION        % WORDS     EXAMPLE WORDS                      
positive       6.16%       learn, faculty, happy              
trust          3.08%       school, faculty, happy             
anticipation   2.53%       happy, top, ready                  
joy            1.76%       happy, peace, deal                 
surprise       0.99%       deal, award, surprised             
negative       0.88%       fall, rejection, outstanding       
sadness        0.55%       fall, rejection, problem           
disgust        0.44%       rejection, weird, finally          
fear           0.44%       rejection, surprise, problem       
anger          0.33%       rejection, disaster, involvement   


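Based on the following output, the current sentiments of ischool and CSE are very different.
For CSE, the only example word for five of the ten emotions (negative, anger, disgust, fear, and sadness) is "mad", while positive, joy, surprise, and trust do not appear at all.
For ischool, on the other hand, most emotional words fall under the positive emotion, which has the highest percentage in its tweets (7.81%). Every nonzero CSE emotion sits at the same 9.09%, which suggests that very few words were analyzed for that account, so the comparison should be read with caution.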

In [17]:
main()

Please enter your Twitter username-->ischool
EMOTION        % WORDS     EXAMPLE WORDS                      
positive       7.81%       build, enhance, feature            
trust          2.02%       assessment, school, show           
negative       0.76%       subject, remove                    
anticipation   0.76%       attendance, start                  
surprise       0.76%       assessment                         
anger          0.25%       remove                             
fear           0.25%       remove                             
joy            0.25%       create                             
sadness        0.25%       remove                             
disgust        0.0%                                           


In [18]:
main()

Please enter your Twitter username-->CSE
EMOTION        % WORDS     EXAMPLE WORDS                      
negative       9.09%       mad                                
anger          9.09%       mad                                
anticipation   9.09%       time                               
disgust        9.09%       mad                                
fear           9.09%       mad                                
sadness        9.09%       mad                                
positive       0.0%                                           
joy            0.0%                                           
surprise       0.0%                                           
trust          0.0%                                           
