# Twitter Sentiment Analysis

This is a quick project that performs sentiment analysis on tweets using the Natural Language Toolkit (NLTK) and its VADER sentiment analyzer. The script `TweetGrabber.py` saves any number of real-time tweets matching a search term from a Twitter stream. The example file used here contains 3692 tweets gathered in about 10 minutes for the search term "Trump".

```python
import json
from collections import Counter

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# VADER needs its lexicon; download it once (skipped if already up to date)
nltk.download('vader_lexicon')
```

Create a VADER `SentimentIntensityAnalyzer` object, then initialize the running maximum scores for the most positive and most negative tweets.

```python
sia = SentimentIntensityAnalyzer()
pos_score = 0  # highest 'pos' score seen so far
neg_score = 0  # highest 'neg' score seen so far
```

The filename is hardcoded so the notebook runs as-is on GitHub; normally it would be passed as a command-line argument.

```python
fname = "tweets_2-26-2017.json"
```
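Since the filename would normally come from the command line, here is a minimal sketch using the standard-library `argparse` module that falls back to the hardcoded file when no argument is given. The function name `get_filename` is hypothetical and not part of the original project.

```python
import argparse

DEFAULT_FILE = "tweets_2-26-2017.json"


def get_filename(argv=None):
    """Return the tweet file named on the command line, or the default."""
    parser = argparse.ArgumentParser(description="Sentiment analysis on saved tweets")
    # nargs='?' makes the positional argument optional, so the default applies
    parser.add_argument("fname", nargs="?", default=DEFAULT_FILE,
                        help="file with one JSON-encoded tweet per line")
    args = parser.parse_args(argv)  # argv=None means "read sys.argv[1:]"
    return args.fname
```

From a script, `get_filename()` reads `sys.argv`. Inside a notebook, pass an explicit list such as `get_filename([])`, because the kernel's own arguments would otherwise be parsed.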

Open the file and parse it line by line (each line holds one tweet in JSON format).

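```python
with open(fname, 'r', encoding='utf-8') as f:
	count_all = Counter()
	for line in f:
		# skip blank lines, which json.loads would reject
		if not line.strip():
			continue
		tweet = json.loads(line).get('text')
```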
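Parsing each line on its own treats the file as JSON Lines (one JSON object per line). The sketch below pulls that reading step into a hypothetical generator, `iter_tweet_texts`, which skips blank lines and records with no `text` field (a live stream can, for instance, deliver status messages that are not tweets). `io.StringIO` stands in for the saved file.

```python
import io
import json


def iter_tweet_texts(lines):
    """Yield the 'text' field of each JSON-encoded tweet, one tweet per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, which json.loads would reject
        text = json.loads(line).get("text")
        if text is not None:  # skip records that have no 'text' field
            yield text


# Stand-in for the saved file: two tweets, a blank line, and a non-tweet record
sample = io.StringIO(
    '{"text": "first tweet"}\n'
    '\n'
    '{"limit": {"track": 5}}\n'
    '{"text": "second tweet"}\n'
)
texts = list(iter_tweet_texts(sample))
print(texts)  # ['first tweet', 'second tweet']
```

With the real file, this would be used as `for tweet in iter_tweet_texts(f):` inside the `with open(...)` block.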

Skip records without a `text` field (`get` returns `None` for them), then compute a sentiment score for the tweet.

```python
		# skip records without a 'text' field
		if tweet is not None:
			# get VADER sentiment scores for the tweet
			score = sia.polarity_scores(tweet)
```

Check whether this tweet has a higher positive score, or a higher negative score, than every previous tweet.

```python
			# store most positive tweet
			if score['pos'] > pos_score:
				pos_score = score['pos']
				pos = {'score': pos_score, 'tweet': tweet}

			# store most negative tweet
			if score['neg'] > neg_score:
				neg_score = score['neg']
				neg = {'score': neg_score, 'tweet': tweet}
```
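VADER's `polarity_scores` returns a dict with `neg`, `neu`, `pos` and `compound` keys, where `pos`, `neu` and `neg` are the proportions of the text rated positive, neutral and negative. The comparisons above keep a separate running maximum for `pos` and for `neg`. Here is the same logic as a standalone function run on made-up score dicts, so it does not need NLTK; `track_extremes` is a hypothetical name.

```python
def track_extremes(scored_tweets):
    """Return (most_positive, most_negative) dicts, mirroring the loop above.

    scored_tweets is an iterable of (tweet, score) pairs, where score has
    the 'pos' and 'neg' keys produced by VADER's polarity_scores.
    """
    pos_score = neg_score = 0
    pos = neg = None  # stay None if no tweet scores above zero
    for tweet, score in scored_tweets:
        # strict '>' keeps the first tweet when several share the top score
        if score['pos'] > pos_score:
            pos_score = score['pos']
            pos = {'score': pos_score, 'tweet': tweet}
        if score['neg'] > neg_score:
            neg_score = score['neg']
            neg = {'score': neg_score, 'tweet': tweet}
    return pos, neg


# Made-up scores for illustration only (not real VADER output)
sample = [
    ("tweet A", {'neg': 0.0, 'neu': 0.6, 'pos': 0.4, 'compound': 0.5}),
    ("tweet B", {'neg': 0.7, 'neu': 0.3, 'pos': 0.0, 'compound': -0.6}),
    ("tweet C", {'neg': 0.1, 'neu': 0.1, 'pos': 0.8, 'compound': 0.7}),
]
pos, neg = track_extremes(sample)
print(pos)  # {'score': 0.8, 'tweet': 'tweet C'}
print(neg)  # {'score': 0.7, 'tweet': 'tweet B'}
```

Initializing `pos` and `neg` to `None` also avoids the `NameError` that printing them would raise if no tweet ever scored above zero.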

Count the most common terms across all saved tweets, counting a word only if it has more than four characters. This filters out most emojis and other short, uninformative words.

```python
			# store a count of most common terms
			words = tweet.split()
			terms_all = [term for term in words if len(term) > 4]
			count_all.update(terms_all)
```
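The length filter alone still lets URLs and @mentions through, as the saved output further down shows (`https://t.co/...`, `@THR:`). A sketch of a stricter filter, assuming you want to drop tokens that start with `http` or `@` and strip surrounding punctuation; these rules and the name `count_terms` are illustrative choices, not the original project's.

```python
import string
from collections import Counter


def count_terms(tweets, min_len=5):
    """Count words of at least min_len characters, skipping URLs and @mentions."""
    counts = Counter()
    for tweet in tweets:
        for word in tweet.split():
            if word.startswith(("http", "@")):
                continue  # drop links and user mentions
            word = word.strip(string.punctuation)  # "dinner," -> "dinner"
            if len(word) >= min_len:  # same as the notebook's len(term) > 4
                counts[word] += 1
    return counts


tweets = [
    "RT @THR: White House dinner, https://t.co/abc",
    "White House press dinner tonight!",
]
counts = count_terms(tweets)
print(counts.most_common(2))  # [('White', 2), ('House', 2)]
```

Counting stays case-sensitive here, as in the notebook; calling `word.lower()` before counting would merge "Dinner" and "dinner".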

Print the most frequently tweeted words, the most positive tweet, and the most negative tweet.

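```python
print('Most used terms:')
# count_all accumulates terms over every tweet; terms_all holds only the last tweet's terms
print(count_all.most_common(10))
```

The saved output below is from a run that printed `nltk.FreqDist(terms_all).keys()` instead, so it lists only the terms of the last tweet read:

```
Most used terms:
dict_keys(['White', 'House', 'Social', '@THR:', 'Trump', 'https://t.co/Z69wgwywiC', 'media', 'Dinner', 'Baldwin', 'suggests', 'https://t.co/WrqNRD…', 'replace', "Correspondents'"])
```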
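Why `terms_all` and `count_all` differ: `terms_all` is reassigned on every pass through the loop, so after the loop it holds only the last tweet's terms, whereas `count_all.update()` adds each tweet's terms to a running total. Calling `.keys()` on an NLTK `FreqDist` also lists terms without ranking them; `most_common(n)` is the call that orders them by frequency. A small demonstration with made-up tweets:

```python
from collections import Counter

tweets = ["alpha bravo charlie", "bravo delta", "bravo alpha"]

count_all = Counter()
for tweet in tweets:
    # rebound each iteration: holds only the current tweet's terms
    terms_all = [term for term in tweet.split() if len(term) > 4]
    # accumulates counts across every tweet
    count_all.update(terms_all)

print(terms_all)                 # ['bravo', 'alpha']  (last tweet only)
print(count_all.most_common(2))  # [('bravo', 3), ('alpha', 2)]
```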


```python
print('\nMost positive tweet is: ' + str(pos['score']))
print(pos['tweet'])
```

```
Most positive tweet is: 0.733
Keep hope alive. https://t.co/huOJ8nwf6c
```


```python
print('\nMost negative tweet is: ' + str(neg['score']))
print(neg['tweet'])
```

```
Most negative tweet is: 0.853
Frightening gullibility. https://t.co/aW5eH8qomH
```
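Ranking by `pos` or `neg` tends to favor short tweets in which a large share of the words carry sentiment, like the two results above. VADER also reports `compound`, a normalized overall score between -1 (most negative) and +1 (most positive). A sketch of picking the extremes by `compound` instead, using `max` and `min` with a key function; the score dicts are made up for illustration, and with real data they would come from `[(t, sia.polarity_scores(t)) for t in texts]`.

```python
# Made-up (tweet, score) pairs in the shape returned by polarity_scores
scored = [
    ("tweet A", {'neg': 0.0, 'neu': 0.6, 'pos': 0.4, 'compound': 0.51}),
    ("tweet B", {'neg': 0.7, 'neu': 0.3, 'pos': 0.0, 'compound': -0.62}),
    ("tweet C", {'neg': 0.2, 'neu': 0.3, 'pos': 0.5, 'compound': 0.34}),
]

# Pick the pair with the highest and lowest overall (compound) score
most_positive = max(scored, key=lambda pair: pair[1]['compound'])
most_negative = min(scored, key=lambda pair: pair[1]['compound'])

print(most_positive[0], most_positive[1]['compound'])  # tweet A 0.51
print(most_negative[0], most_negative[1]['compound'])  # tweet B -0.62
```

In this sample, tweet C has the highest `pos` but tweet A has the highest `compound`, so the two criteria can disagree.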
