# Twitter Stream Analysis

### Reference:
* https://github.com/sridharswamy/Twitter-Sentiment-Analysis-Using-Spark-Streaming-And-Kafka
* https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object
* https://github.com/jasonqng/twitter-streaming-python/blob/master/twitter_streaming.ipynb

In [8]:
import configparser
import os

path = os.path.normpath(os.path.join(os.getcwd(), 'credentials.txt'))
config = configparser.ConfigParser()
config.read(path)
consumer_key = config['DEFAULT']['consumerKey']
consumer_secret = config['DEFAULT']['consumerSecret']
access_token = config['DEFAULT']['accessToken']
access_token_secret = config['DEFAULT']['accessTokenSecret']
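
The cell above expects `credentials.txt` to be an INI-style file with four keys in the `[DEFAULT]` section. The following is a minimal sketch of that layout, parsed from an in-memory string rather than from disk. The key names come from the code above; the placeholder values and the `load_credentials` helper are hypothetical.

```python
import configparser

# Hypothetical contents of credentials.txt; the values are placeholders.
SAMPLE_CREDENTIALS = """
[DEFAULT]
consumerKey = YOUR_CONSUMER_KEY
consumerSecret = YOUR_CONSUMER_SECRET
accessToken = YOUR_ACCESS_TOKEN
accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET
"""

REQUIRED_KEYS = ['consumerKey', 'consumerSecret', 'accessToken', 'accessTokenSecret']

def load_credentials(text):
    """Parse INI text and return the four credentials as a dict."""
    config = configparser.ConfigParser()
    config.read_string(text)
    # Fail early with a readable message if a key is absent.
    missing = [k for k in REQUIRED_KEYS if k not in config['DEFAULT']]
    if missing:
        raise KeyError("missing credential keys: {}".format(", ".join(missing)))
    return {k: config['DEFAULT'][k] for k in REQUIRED_KEYS}

credentials = load_credentials(SAMPLE_CREDENTIALS)
print(sorted(credentials))
```

Checking for missing keys up front gives a clearer error than a bare `KeyError` raised halfway through the cell.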

In [9]:
import tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

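Look up a user by ID and print some basic account information. The variable is named after realDonaldTrump, although the ID used here resolves to a different account, as the output shows.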

In [11]:
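api = tweepy.API(auth)
# Look up the account by its numeric user ID.
realDonaldTrump = api.get_user('933173737')
# The label says "User Id", but the value printed is the account's screen name.
print("User Id: {}".format(realDonaldTrump.screen_name))
print("Number of followers: {}".format(realDonaldTrump.followers_count))
# friends() returns only the first page of results, so this counts at most one page;
# realDonaldTrump.friends_count holds the account's total.
print("Number of friends: {}".format(len(realDonaldTrump.friends())))
# for friend in realDonaldTrump.friends():
#     print(friend.screen_name)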

User Id: girlsgolds
Number of followers: 2108
Number of friends: 20


Recent tweets from my home timeline.

In [5]:
public_tweets = api.home_timeline()
for tweet in public_tweets:
    print(tweet.text)

Find out the winners from tonight's #GoldenGlobes as they are announced here: https://t.co/HTybCsUOAw https://t.co/KihLWS28Fg
Planet Computers' Gemini is a tiny Android laptop with the spirit of Psion https://t.co/g00nUbLc1q https://t.co/ctbfmYi8oR
Samsung's Notebook 9 Pen is a super-light Galaxy Note/laptop mashup https://t.co/74hrhrf7vC
Best Supporting Actor in Limited Series or TV Movie: 

Alexander Skarsgård

#GoldenGlobes https://t.co/UcmZRvmjB0
Why “The Room” is a better movie than James Franco’s “The Disaster Artist”: https://t.co/U0878T3UHr https://t.co/55EUHUIexo
Seth Meyers nailed his Golden Globes monologue—by giving away the mic https://t.co/zPd9OpdtXZ
#GoldenGlobes #MeToo #TIMESUP https://t.co/zLEMCoEQiR
.@SpaceX successfully launches top-secret Zuma spacecraft https://t.co/StSLMVG16m https://t.co/OWIdHFJvgD
Margaret Atwood, the prophet of dystopia: https://t.co/DnYlnUTuPS https://t.co/fEMWzVNh4O
Elisabeth Moss wins the Golden Globe for Best Actress in a TV Drama: https://

Create a custom stream listener.

In [25]:
import pandas as pd
import json
import numbers
import re

class MyStreamListener(tweepy.StreamListener):
    """ Options you can set by passing to the MyStreamListener object:
        limit: int, how many tweets to capture
        print_output: bool, whether to print the tweet to screen
        save_output: bool, whether to save the tweet data to a csv file
        filename: str, the filename to name the saved output, by default it's file.csv
        include_rts: bool, whether to capture retweets
        strict_text_search: bool, occasionally the stream will capture a tweet that doesn't actually include the search query;
            set to True to filter out these "accidental" tweets
        search_terms: str or array, pass in the search query or an array of terms you want to use for filtering
            if strict_text_search = True. Script checks and turns any string into array of strings
    """
    def __init__(self, limit=20, print_output=True, save_output=True,
                 filename='file.csv', include_rts=True,strict_text_search=False,
                 search_terms=None):
        self.df = pd.DataFrame()
        self.limit = limit
        self.counter = 0
        self.print_output = print_output
        self.header=False
        self.save_output=save_output
        self.filename=filename
        self.include_rts=include_rts
        self.strict_text_search = strict_text_search
        self.search_terms = search_terms

    def on_data(self, data):
        d = {}
        decoded = json.loads(data)
        # Skip non-tweet messages (e.g. delete or limit notices), which have no 'text' field.
        if 'text' not in decoded:
            return True
        # full list of fields you can collect: https://dev.twitter.com/overview/api/tweets
        tweet_fields_to_collect = ['created_at','id','text','source','favorite_count','coordinates','lang','place','retweet_count','retweeted','truncated']
        user_fields_to_collect = ['name','screen_name','location','id_str','statuses_count','followers_count','friends_count','favourites_count','description']
        if self.strict_text_search:
            # Turn a plain search string into a list of terms before matching.
            if not isinstance(self.search_terms, list):
                self.search_terms = re.findall(r"[\w']+", self.search_terms)
            # Skip tweets whose text contains none of the search terms (case-insensitive).
            if not any(term.lower() in decoded['text'].lower() for term in self.search_terms):
                print("skipped")
                print(decoded['text'])
                return True
        for k,v in decoded.items():
            if k in tweet_fields_to_collect:
                if isinstance(v, numbers.Number):
                    v = str(v)
                try:
                    d['tweet_' + k.strip()] = v
                except:
                    print("Failure collecting tweet field", v.encode('ascii', 'ignore'))
            if k=='user':
                for user_k,user_v in v.items():
                    if user_k in user_fields_to_collect:
                        if isinstance(user_v, numbers.Number):
                            user_v = str(user_v)
                        try:
                            d[user_k.strip()]=user_v
                        except:
                            print("Failure collecting user field",user_v.encode('ascii', 'ignore'))
            if k=='retweeted_status':
                for retweet_k,retweet_v in v.items():
                    if retweet_k in tweet_fields_to_collect:
                        if isinstance(retweet_v, numbers.Number):
                            retweet_v = str(retweet_v)
                        try:
                            d['retweet_'+retweet_k.strip()]=retweet_v
                        except:
                            print("Failure collecting retweet field",retweet_v.encode('ascii', 'ignore'))
        if not self.include_rts:
            if ('retweet_text' in d and len(d['retweet_text'])>0) or d['tweet_text'].startswith('RT @'):
                return True
        tweet_df = pd.DataFrame(d, index=[0])
        frames = [self.df, tweet_df]
        self.df = pd.concat(frames)
        self.counter+=1
        if self.print_output:
            try:
                print(json.dumps(decoded, indent=1))
            except:
                print("Failure outputting tweet text",decoded['text'].encode('ascii', 'ignore'))
        if self.counter>=self.limit:
            print("finished collecting %s tweets, ending" % self.limit)
            if self.include_rts and 'retweet_text' in self.df.columns:
                self.df = self.df[['tweet_' + x for x in tweet_fields_to_collect] + user_fields_to_collect + ['retweet_' + x for x in tweet_fields_to_collect]]
            else:
                self.df = self.df[['tweet_' + x for x in tweet_fields_to_collect] + user_fields_to_collect]
            self.df.rename(columns={'id_str':'user_id'},inplace=True)
            # Only write the CSV when save_output is set, as the docstring describes.
            if self.save_output:
                self.df.to_csv(self.filename, index=False, encoding='utf-8')
            return False
        else:
            return True
        
    def on_error(self, status_code):
        # 420 means the client is being rate limited; returning False disconnects the stream.
        if status_code == 420:
            return False
        
    def on_disconnect(self, notice):
        print("disconnecting due to " + str(notice))
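
The core of `on_data` is flattening one decoded tweet into a single row: top-level fields get a `tweet_` prefix, user fields keep their own names, and fields of a retweeted status get a `retweet_` prefix. The following is a condensed sketch of that step using plain dicts and a hand-written sample payload. The `flatten_tweet` helper, the shortened field lists, and the sample values are hypothetical and are not part of the listener.

```python
import numbers

# Shortened field lists for illustration; the listener above collects more.
TWEET_FIELDS = ['created_at', 'id', 'text', 'lang']
USER_FIELDS = ['screen_name', 'id_str', 'followers_count']

def _cell(value):
    # Numbers are converted to strings, as in the listener above.
    return str(value) if isinstance(value, numbers.Number) else value

def flatten_tweet(decoded):
    """Flatten a decoded tweet dict into one flat row (a dict)."""
    row = {}
    for key in TWEET_FIELDS:
        if key in decoded:
            row['tweet_' + key] = _cell(decoded[key])
    user = decoded.get('user', {})
    for key in USER_FIELDS:
        if key in user:
            row[key] = _cell(user[key])
    retweet = decoded.get('retweeted_status', {})
    for key in TWEET_FIELDS:
        if key in retweet:
            row['retweet_' + key] = _cell(retweet[key])
    return row

# Hand-written sample payload, not real API output.
sample = {
    'created_at': 'Mon Jan 08 01:58:25 +0000 2018',
    'id': 1,
    'text': 'RT @someone: hello',
    'lang': 'en',
    'user': {'screen_name': 'example_user', 'id_str': '42', 'followers_count': 10},
    'retweeted_status': {'id': 2, 'text': 'hello', 'lang': 'en'},
}

row = flatten_tweet(sample)
print(row)
```

A row built this way can be passed to `pd.DataFrame(row, index=[0])`, which is how the listener turns each tweet into a one-row frame before concatenating.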

In [8]:
import datetime
search_query = 'android'
filename = '%s_%s.csv' % (search_query, datetime.datetime.now().strftime("%Y-%m-%d_%H.%M.%S"))
# Pass the filename to the listener so the CSV read back below is the one it writes.
myStreamListener = MyStreamListener(limit=10,
                                    print_output=True,
                                    save_output=True,
                                    filename=filename,
                                    search_terms=search_query,
                                    strict_text_search=True)
myStream = tweepy.Stream(auth, listener=myStreamListener)
myStream.filter(track=[search_query])
android_df = pd.read_csv(filename)

skipped
@BadabunOficial @YouTube Comen así  #Rayito #Rayito2 #Rayito #Rayito #Rayito2 #Rayito #Rayito #Rayito #Rayito… https://t.co/JObZsZNeuj
skipped
Saya sedang mendengarkan "We Are Young (feat. Janelle Monáe)-fun.;Janelle Monáe". Nikmati  https://t.co/WMXL1CjGPk https://t.co/DA0W5HQB68
skipped
RT @RedLinkAR: Enviar una #SolicitudGrupal a tus contactos divide el monto total y lleva la cuenta de quienes envían su parte. ¡Probalo!
⒱…
skipped
@RickyRoma0 c) People would be forced to make a budget and live within their means. 

How many people on food stamp… https://t.co/shJBAv0gxQ
skipped
It’s really too clear, I can’t take selfies unprepared now. https://t.co/CFcMqyCVEL
skipped
Esse ano ❤ https://t.co/R55KQpnctz
skipped
RT @Tarekizoo: G offert un chargeur a ma meuf pr son anversaire el a pleure psk c nul alors que ca fai 1 semaine elle msoule pr voler mon c…
skipped
I need my upgrade already https://t.co/lH1cQhhk9C
skipped
@BadabunOficial #Rayito #Rayito #Rayito #Rayito #Rayito #Rayito
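
Each tweet in the output above was skipped because the search term does not appear in its `text` field; the streaming API can also match on other parts of a tweet, such as links. The check behind the "skipped" message can be sketched as a standalone function. The name `matches_any_term` is a hypothetical helper; the tokenizing regex and the case-insensitive substring test are taken from the listener.

```python
import re

def matches_any_term(text, search_terms):
    """Return True if any search term occurs in text, ignoring case."""
    # A plain string is split into word-like tokens, as in the listener.
    if not isinstance(search_terms, list):
        search_terms = re.findall(r"[\w']+", search_terms)
    lowered = text.lower()
    return any(term.lower() in lowered for term in search_terms)

# A tweet from the skipped output: the term is absent from the text.
print(matches_any_term("I need my upgrade already https://t.co/lH1cQhhk9C", "android"))  # False
# A made-up tweet that does contain the term.
print(matches_any_term("New Android phone released", "android"))  # True
```

Because the test is a substring match, a multi-word query such as `"android phone"` is split into two terms and a tweet passes if it contains either one.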

In [10]:
android_df

Unnamed: 0,tweet_created_at,tweet_id,tweet_text,tweet_source,tweet_favorite_count,tweet_coordinates,tweet_lang,tweet_place,tweet_retweet_count,tweet_retweeted,...,retweet_id,retweet_text,retweet_source,retweet_favorite_count,retweet_coordinates,retweet_lang,retweet_place,retweet_retweet_count,retweet_retweeted,retweet_truncated
0,Mon Jan 08 01:58:24 +0000 2018,950184882747183105,#Rayito #Rayito2\n➡️Participando por ese iphon...,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",0,,es,,0,False,...,,,,,,,,,,
1,Mon Jan 08 01:58:24 +0000 2018,950184884072538113,Vai toma no cu com iPhone x kkkkkkkkk,"<a href=""http://twitter.com/download/android"" ...",0,,pt,,0,False,...,,,,,,,,,,
2,Mon Jan 08 01:58:25 +0000 2018,950184885829931009,RT @ShaianaTalia: Front camera vs. Back camera...,"<a href=""http://twitter.com/download/iphone"" r...",0,,en,,0,False,...,9.501101e+17,Front camera vs. Back camera on the iPhone 8+ ...,"<a href=""http://twitter.com/download/iphone"" r...",3640.0,,en,,677.0,False,False
3,Mon Jan 08 01:58:25 +0000 2018,950184887327305730,"hola joel,tú!! si tú el q lee esto en la panta...","<a href=""http://twitter.com/download/android"" ...",0,,es,,0,False,...,,,,,,,,,,
4,Mon Jan 08 01:58:25 +0000 2018,950184887587352577,I liked a @YouTube video https://t.co/zyd0LaKn...,"<a href=""http://www.google.com/"" rel=""nofollow...",0,,en,,0,False,...,,,,,,,,,,
5,Mon Jan 08 01:58:25 +0000 2018,950184888837296128,RT @ShaianaTalia: Front camera vs. Back camera...,"<a href=""http://twitter.com/download/iphone"" r...",0,,en,,0,False,...,9.501101e+17,Front camera vs. Back camera on the iPhone 8+ ...,"<a href=""http://twitter.com/download/iphone"" r...",3640.0,,en,,678.0,False,False
6,Mon Jan 08 01:58:26 +0000 2018,950184890766712838,RT @MySocialBrew: Two Apple investors want the...,"<a href=""http://twitter.com/download/android"" ...",0,,en,,0,False,...,9.501803e+17,Two Apple investors want the company to study ...,"<a href=""http://bufferapp.com"" rel=""nofollow"">...",0.0,,en,,1.0,False,False
7,Mon Jan 08 01:58:26 +0000 2018,950184892616380417,The iPhone X is literally the shit. 😫😻,"<a href=""http://twitter.com/download/iphone"" r...",0,,en,,0,False,...,,,,,,,,,,
8,Mon Jan 08 01:58:27 +0000 2018,950184894151438337,#Rayito #Rayito2\n➡️Participando por ese iphon...,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",0,,es,,0,False,...,,,,,,,,,,
9,Mon Jan 08 01:58:27 +0000 2018,950184897926385664,@DebRyanShow Quiero el IPhone x👉📱 #Rayito 🍀¡PO...,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",0,,es,,0,False,...,,,,,,,,,,
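
The `tweet_created_at` column stores timestamps as strings such as `Mon Jan 08 01:58:24 +0000 2018`. The following is a small sketch of parsing them with the standard library. The format string is an assumption inferred from the values shown in the table, and `parse_created_at` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed format, inferred from the created_at values in the table above.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def parse_created_at(value):
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)

# First and last timestamps from the table above.
first = parse_created_at("Mon Jan 08 01:58:24 +0000 2018")
last = parse_created_at("Mon Jan 08 01:58:27 +0000 2018")

print(first.isoformat())               # 2018-01-08T01:58:24+00:00
print((last - first).total_seconds())  # 3.0
```

With parsed timestamps, the ten rows in the table can be seen to span three seconds of the stream.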


In [26]:
import datetime
search_query = 'android'
myStreamListener = MyStreamListener(limit=3,
                                    print_output=True,
                                    save_output=False,
                                    search_terms=search_query,
                                    strict_text_search=True)
myStream = tweepy.Stream(auth, listener=myStreamListener)
myStream.filter(track=[search_query])

{
 "created_at": "Mon Jan 08 03:08:01 +0000 2018",
 "id": 950202403613114368,
 "id_str": "950202403613114368",
 "text": "\u3010\u4e43\u6728\u604b\u3011\n\u4e43\u6728\u604b\u306e\u300c\u304b\u305a\u8ecd\u56e3\u300d\u3067\u3059\u3002\n\u8ecd\u56e3\u54e1\u307f\u3093\u306a\u307e\u3063\u305f\u308a\u30d7\u30ec\u30a4\u4e2d\uff01\n\u8208\u5473\u306e\u3042\u308b\u65b9\u3001\u8ecd\u56e3ID\u3067\u691c\u7d22\u3057\u3066\u307f\u3066\u306d\uff01\n\u8ecd\u56e3ID\uff1ag8f5be77\n\u203b\u672c\u8ecd\u56e3ID\u306fiOS/Android\u5bfe\u8c61\u3068\u306a\u308a\u307e\u3059\nhttps://t.co/82DYOT4R59  #\u4e43\u6728\u604b",
 "source": "<a href=\"https://nogikoi.jp/\" rel=\"nofollow\">[\u4e43\u6728\u574246\u516c\u5f0f]\u4e43\u6728\u604b\uff5e\u5742\u9053\u306e\u4e0b\u3067\u3042\u306e\u65e5\u50d5\u306f\u604b\u3092\u3057\u305f\uff5e</a>",
 "truncated": false,
 "in_reply_to_status_id": null,
 "in_reply_to_status_id_str": null,
 "in_reply_to_user_id": null,
 "in_reply_to_user_id_str": null,
 "in_reply_to_screen_name": nu