# Collecting and Analyzing Tweets

To replicate my setup and run this notebook, please visit https://github.com/galletti94/Tweet-Monitor


### Let's get some tweets!

For this we use the Twitter API, which requires a Twitter account. Twitter gives you access keys that are unique to your account. These keys go in a file called keys.txt, which we list in the .gitignore file so that they are not made publicly available. Let's get started!
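The collector cell reads keys.txt by position: one credential per line, in the order consumer key, consumer secret, access token, access token secret. Keeping the file out of git only takes a `keys.txt` line in .gitignore. If you want a clearer error when the file is incomplete, a small helper could replace the positional reads. `load_keys` below is a hypothetical sketch, not code from the original notebook; unlike the original it also skips blank lines.

```python
from pathlib import Path


def load_keys(path="keys.txt"):
    """Read the four Twitter credentials from a plain-text file, one per line:
    consumer key, consumer secret, access token, access token secret.
    Blank lines are ignored (an assumption; the original reads lines 0-3 as-is)."""
    lines = [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]
    if len(lines) < 4:
        raise ValueError(f"{path} should contain 4 non-empty lines, found {len(lines)}")
    names = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")
    return dict(zip(names, lines[:4]))
```

With this helper, `keys = load_keys()` followed by `tweepy.OAuthHandler(keys["consumer_key"], keys["consumer_secret"])` would stand in for the four `lines[i].rstrip()` assignments.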

In [None]:
from __future__ import print_function  # must come before any other statement in the cell

import json

import tweepy
from pymongo import MongoClient

MONGO_HOST = 'mongodb://localhost/twitterdb'

WORDS = ['#deeplearning', '#computervision', '#datascience', '#bigdata']

# bounding box for (roughly) the contiguous United States:
# [west longitude, south latitude, east longitude, north latitude]
LOCATION = [-127.3, 24.1, -65.9, 51.8]

# get credentials from the keys.txt file (one per line, in this order)
with open("keys.txt") as keys_file:
    lines = keys_file.readlines()
consumer_key = lines[0].rstrip()
consumer_secret = lines[1].rstrip()
access_token = lines[2].rstrip()
access_token_secret = lines[3].rstrip()

# open one MongoDB connection up front instead of reconnecting for every tweet
client = MongoClient(MONGO_HOST)
db = client.twitterdb


class StreamListener(tweepy.StreamListener):
    # Our subclass of tweepy's StreamListener (tweepy 3.x), which receives
    # messages from the Twitter Streaming API.

    def on_connect(self):
        # Called once the connection to the Streaming API is established
        print("You are now connected to the streaming API.")

    def on_error(self, status_code):
        print('An error has occurred: ' + repr(status_code))
        return False  # returning False disconnects the stream

    def on_data(self, data):
        try:
            datajson = json.loads(data)  # decode the JSON from Twitter

            # only keep tweets that have geolocation enabled; non-tweet messages
            # (e.g. delete notices) have no 'coordinates' key, hence .get()
            if datajson.get('coordinates'):
                # show the tweet's 'created_at' timestamp as progress output
                print("Tweet collected at " + str(datajson.get('created_at')))
                db.twitter_search.insert_one(datajson)  # insert into the db
        except Exception as e:
            print(e)


auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
listener = StreamListener(api=tweepy.API(auth, wait_on_rate_limit=True))
streamer = tweepy.Stream(auth=auth, listener=listener)

# To store only United States tweets, uncomment the next two lines
# and comment out the last two
# print("Tracking: " + 'United States')
# streamer.filter(locations=LOCATION)

# track tweets that include any of the words in WORDS
print("Tracking: " + str(WORDS))
streamer.filter(track=WORDS)

Awesome! Now we can let this run and collect some tweets.

After waiting a few minutes for your database to fill up, hit CTRL + C (or interrupt the kernel if you are running this in Jupyter) to stop the collection.
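The filtering step inside `on_data` can be pulled out into a plain function so it can be checked without a live stream. The following minimal sketch uses a hypothetical `parse_geotagged` helper that decodes one raw stream message and keeps it only if it carries exact coordinates. The stream also delivers non-tweet messages such as delete and limit notices, which is why the function uses `.get()` rather than indexing.

```python
import json


def parse_geotagged(raw):
    """Return the decoded tweet if it has exact coordinates, otherwise None.

    Messages that are not valid JSON objects, or that lack a non-null
    'coordinates' field (delete/limit notices, untagged tweets), are dropped.
    """
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(message, dict):
        return None
    return message if message.get("coordinates") else None
```

Inside `on_data`, the body could then shrink to `tweet = parse_geotagged(data)` followed by `db.twitter_search.insert_one(tweet)` whenever `tweet` is not `None`.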

### What do we have here?

Let's take a look at the tweets we collected!

#### Where are they from?

We can use folium to plot the tweets' coordinates on a map:
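A tweet's `coordinates` field is GeoJSON, so the point is stored as [longitude, latitude], while folium expects [latitude, longitude]. This sketch shows that swap, together with a hypothetical `map_center` helper that centers the map on the mean of the collected points instead of a fixed location. Both function names are illustrative, and the plain average of longitudes is only sensible for points that do not straddle the 180° meridian.

```python
def to_folium_latlon(tweet):
    """Convert a tweet's GeoJSON point ([lon, lat]) to folium's [lat, lon],
    or return None when the tweet is not geotagged."""
    geo = tweet.get("coordinates")
    if not geo:
        return None
    lon, lat = geo["coordinates"]
    return [lat, lon]


def map_center(points, default=(45.372, -121.6972)):
    """Average a list of [lat, lon] points to pick a starting map center.
    Falls back to `default` (the notebook's fixed center) when there are no points."""
    if not points:
        return list(default)
    lats, lons = zip(*points)
    return [sum(lats) / len(lats), sum(lons) / len(lons)]
```

In the map cell, `points = [p for p in map(to_folium_latlon, collection.find()) if p]` and `folium.Map(location=map_center(points), zoom_start=4)` would be one way to use them.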

In [5]:
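from pymongo import MongoClient
import folium

client = MongoClient('localhost', 27017)
db = client['twitterdb']
collection = db['twitter_search']
tweets_iterator = collection.find()

# folium takes [latitude, longitude]; this starting view is centered
# over the US Pacific Northwest
mymap = folium.Map(location=[45.372, -121.6972], zoom_start=4)

for tweet in tweets_iterator:
    if tweet.get('coordinates'):
        # tweet coordinates are GeoJSON, i.e. [longitude, latitude], so swap them
        lon, lat = tweet['coordinates']['coordinates']
        folium.CircleMarker(location=[lat, lon]).add_to(mymap)

# write the map to an HTML file; evaluating `mymap` as the last line
# of a cell also renders it inline in Jupyter
mymap.save('map.html')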

Notice anything? Very few people who tweet about deep learning, data science and the like seem to have location enabled! In fact, I first collected over 10,000 tweets without checking whether location was enabled, and the map folium produced showed no points at all: none of those tweets carried coordinates.
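To put a number on "very few", you can count how many stored tweets actually carry coordinates. This only makes sense for a collection saved *without* the geotag check, since the collector above keeps geotagged tweets only. The sketch below uses a hypothetical `geotag_share` helper and treats a tweet as geotagged when its `coordinates` field is present and non-null.

```python
def geotag_share(tweets):
    """Return (geotagged, total) for an iterable of tweet dicts.

    A tweet counts as geotagged when its 'coordinates' field exists and is not null.
    """
    total = geotagged = 0
    for tweet in tweets:
        total += 1
        if tweet.get("coordinates"):
            geotagged += 1
    return geotagged, total
```

Calling `geotag_share(collection.find())` works directly on a PyMongo cursor. Alternatively, MongoDB can do the counting itself: `collection.count_documents({'coordinates': {'$ne': None}})` for geotagged tweets and `collection.count_documents({})` for the total.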

Let's compare that with tweets from the US collected without any keywords (using the `locations` filter in the collector cell):

![Image](./tweetsUSA.jpg)

Nice. Now let's look at what kinds of emojis people use!

#### Emojis


In [7]:
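from collections import Counter

# recent versions of the emoji package expose EMOJI_DATA; older releases used UNICODE_EMOJI
from emoji import EMOJI_DATA
from pymongo import MongoClient

# assumes the US-only tweets were saved to a separate database under these names,
# e.g. by rerunning the collector with the locations filter
client = MongoClient('localhost', 27017)
db = client['usa_db']
collection = db['usa_tweets_collection']
tweets_iterator = collection.find()

d = Counter()
for tweet in tweets_iterator:
    for ch in tweet['text']:  # emojis are characters, not words, so count single characters
        if ch in EMOJI_DATA:
            d[ch] += 1

d = d.most_common()  # list of (emoji, count) pairs, most frequent first
print(d[:15])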

[('❤', 164), ('🔥', 160), ('🎄', 121), ('😍', 118), ('🏾', 112), ('🏼', 112), ('😂', 104), ('🏻', 90), ('🏽', 81), ('✨', 79), ('🙌', 68), ('💪', 67), ('💯', 62), ('🙏', 60), ('❄', 60)]


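When this was written it was close to Christmas, so the 🎄 in third place (and the ❄ in fifteenth) makes sense. The 🏾, 🏼, 🏻 and 🏽 entries are skin-tone modifiers: the loop counts single characters, so an emoji like 👍🏽 contributes one count to 👍 and one to 🏽.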
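If you would rather count an emoji and its skin tone as one entry, the Fitzpatrick modifiers (U+1F3FB to U+1F3FF) can be glued onto the character before them before counting. This is a stdlib-only sketch. The names `tokens_with_skin_tones` and `count_emoji` are hypothetical, and the caller supplies the `is_emoji` check (in the notebook that could be `lambda ch: ch in EMOJI_DATA`). Other multi-character sequences such as ZWJ families or variation selectors are not handled.

```python
from collections import Counter

# Fitzpatrick skin-tone modifiers occupy code points U+1F3FB..U+1F3FF
SKIN_TONES = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}


def tokens_with_skin_tones(text):
    """Split text into code points, attaching each skin-tone modifier to the
    preceding character (so '👍' + '🏽' becomes the single token '👍🏽')."""
    tokens = []
    for ch in text:
        if ch in SKIN_TONES and tokens:
            tokens[-1] += ch
        else:
            tokens.append(ch)
    return tokens


def count_emoji(texts, is_emoji):
    """Count emoji tokens across texts; a token counts when is_emoji accepts
    its first code point (the base emoji)."""
    counts = Counter()
    for text in texts:
        for tok in tokens_with_skin_tones(text):
            if is_emoji(tok[0]):
                counts[tok] += 1
    return counts
```

For example, `count_emoji((t['text'] for t in collection.find()), lambda ch: ch in EMOJI_DATA).most_common(15)` would list toned and untoned emojis as separate entries instead of listing bare modifiers.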


#### Sentiment Analysis


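TextBlob can classify the text of each tweet as having positive or negative sentiment, and aggregating those labels gives a rough sense of the general mood in the US at this time of year. The cell below looks only at US tweets that mention "data" and uses TextBlob's `NaiveBayesAnalyzer`, which gives each text either a positive (`'pos'`) or a negative (`'neg'`) label; it has no neutral class.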

In [8]:
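from collections import Counter

from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer

# NaiveBayesAnalyzer trains itself on NLTK's movie_reviews corpus the first time it
# is used, so create it once and reuse it instead of retraining for every tweet.
# (Run `python -m textblob.download_corpora` first if the corpora are missing.)
analyzer = NaiveBayesAnalyzer()

# the emoji cell exhausted its cursor, so query the collection again
tweets_iterator = collection.find()

tweetCnt = 0
locEnabled = 0
sentiment_counts = Counter()
for tweet in tweets_iterator:
    if 'data' in tweet['text'].lower():
        tweetCnt += 1
        # user.location is the free-text location on the profile, not a geotag
        if tweet['user']['location']:
            locEnabled += 1

        blob = TextBlob(tweet['text'], analyzer=analyzer)
        # NaiveBayesAnalyzer only ever returns 'pos' or 'neg'
        label = blob.sentiment.classification
        sentiment_counts[label] += 1
        if label == 'pos':
            print('positive sentiment for the tweet: ', tweet['text'])
        else:
            print('negative sentiment for the tweet: ', tweet['text'])

print(tweetCnt, 'tweets mention "data";', locEnabled, 'of them have a profile location')
print('Sentiment counts:', dict(sentiment_counts))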
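`NaiveBayesAnalyzer` only ever returns `'pos'` or `'neg'`. If you also want a neutral bucket, one option (a swapped-in approach, not what the cell above does) is TextBlob's default analyzer, whose `sentiment.polarity` is a score in [-1, 1]. Scores close to zero can then be treated as neutral. This sketch assumes a neutral band of ±0.05, which is an arbitrary choice rather than a TextBlob default. The helper names are hypothetical.

```python
from collections import Counter


def label_from_polarity(polarity, neutral_band=0.05):
    """Map a polarity score in [-1, 1] to 'pos', 'neg' or 'neu'.
    Scores within +/- neutral_band of zero are treated as neutral (assumed band)."""
    if polarity > neutral_band:
        return "pos"
    if polarity < -neutral_band:
        return "neg"
    return "neu"


def sentiment_shares(polarities, neutral_band=0.05):
    """Return the fraction of scores labelled pos, neg and neu."""
    counts = Counter(label_from_polarity(p, neutral_band) for p in polarities)
    total = sum(counts.values())
    if total == 0:
        return {"pos": 0.0, "neg": 0.0, "neu": 0.0}
    return {label: counts[label] / total for label in ("pos", "neg", "neu")}

# In the notebook the scores could come from TextBlob's default analyzer, e.g.
# polarities = [TextBlob(t['text']).sentiment.polarity for t in collection.find()]
```

Reporting shares rather than printing every tweet makes it easier to compare the overall mood between, say, the keyword stream and the US-wide stream.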

Stay tuned for more!