# Prerequisites

https://towardsdatascience.com/real-time-twitter-sentiment-analysis-for-brand-improvement-and-topic-tracking-chapter-1-3-e02f7652d8ff

## Install Dependencies

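    pip install "tweepy<4"   # the code below uses tweepy.StreamListener, which was removed in Tweepy 4.0
    pip install dash         # installs plotly as a dependency
    pip install plotly
    pip install textblob
    pip install pandas pytz  # used by the dashboard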

## Setup Twitter API

Go to [developer.twitter.com](https://developer.twitter.com), apply for a developer account, and get your API credentials. Save the credentials in `credentials.py`.
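
A minimal sketch of what `credentials.py` could look like. The four variable names are the ones the streamer code reads later in this document. Loading them from environment variables, and the `TWITTER_*` variable names themselves, are assumptions made here so that the keys stay out of version control.

```python
# credentials.py (sketch)
import os

# The TWITTER_* environment variable names are hypothetical; use any names you like.
API_KEY = os.environ.get("TWITTER_API_KEY", "")
API_SECRET_KEY = os.environ.get("TWITTER_API_SECRET_KEY", "")
ACCESS_TOKEN = os.environ.get("TWITTER_ACCESS_TOKEN", "")
ACCESS_TOKEN_SECRET = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET", "")
```

Hard-coding the four strings works as well, as long as the file is never committed.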

## Preferences
Create `preferences.py` to define which words or topics to track and the database settings. Let's say we want to track `Facebook` and `Google`.

```python 
TRACK_WORDS = ['Facebook', 'Google']
TABLE_NAME = "fbgoogle"
TABLE_ATTRIBUTES = "id_str VARCHAR(255), created_at DATETIME, text VARCHAR(255), \
            polarity DOUBLE, subjectivity DOUBLE, user_created_at VARCHAR(255), user_location VARCHAR(255), \
            user_description VARCHAR(255), user_followers_count INT, longitude DOUBLE, latitude DOUBLE, \
            retweet_count INT, favorite_count INT"
```

# Create Streamer to Collect Data

![](res/streamer.gif)

**Note : See the full code in `streamer.py`.**

To create a streamer, we need several pieces of functionality:

## Database Initialization

We want the data to be stored in a database, so we need one. This function creates the table if it doesn't already exist and returns the connection.

```python
import sqlite3
from sqlite3 import Error


def init_database(db_file, table_name, attributes):
    """Open the database and create the table if it does not exist yet."""
    # Table and column definitions cannot be bound as SQL parameters,
    # so they are interpolated from preferences.py (trusted input only)
    qry_create_table = f"CREATE TABLE IF NOT EXISTS {table_name} ({attributes});"
    conn = sqlite3.connect(db_file)
    try:
        conn.execute(qry_create_table)
        conn.commit()
    except Error:
        # Close the connection before re-raising so it does not leak
        conn.close()
        raise
    return conn
```
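
As a quick check, the sketch below runs the same create-if-missing idea against an in-memory SQLite database. The `demo` table and its two columns are made up for illustration, and `create_table_if_missing` is a hypothetical stand-in for `init_database` that works on an already open connection.

```python
import sqlite3


def create_table_if_missing(conn, table_name, attributes):
    """Hypothetical stand-in for init_database that works on an open connection."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table_name} ({attributes});")
    conn.commit()


conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
create_table_if_missing(conn, "demo", "id_str VARCHAR(255), polarity DOUBLE")
create_table_if_missing(conn, "demo", "id_str VARCHAR(255), polarity DOUBLE")  # second call is a no-op

# PRAGMA table_info returns one row per column; index 1 holds the column name
columns = [row[1] for row in conn.execute("PRAGMA table_info(demo)")]
print(columns)  # ['id_str', 'polarity']
conn.close()
```

Calling the function twice without an error is the point: the streamer can be restarted without dropping the data collected so far.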

## Data Preprocessing

Raw tweet text may contain non-ASCII or otherwise unnecessary characters, so we clean it before analysis. Here are two functions we will use (you can add more) to clean a tweet.

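```python
import re

# Matches @mentions, URLs, and any character that is not alphanumeric, space, or tab
NOISE_PATTERN = re.compile(r"(@[A-Za-z0-9]+)|(\w+://\S+)|([^0-9A-Za-z \t])")


def clean_tweet(tweet):
    """Replace mentions, URLs, and special characters with spaces, then collapse whitespace."""
    return ' '.join(NOISE_PATTERN.sub(' ', tweet).split())


def to_ascii(text):
    """Drop non-ASCII characters; return None for empty or missing text."""
    if text:
        return text.encode('ascii', 'ignore').decode('ascii')
    else:
        return None
```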
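
To see what the cleaning step does, here is a small self-contained sketch. It assumes the pattern is written as a single raw string (mentions, then URLs, then any other special character), and the sample tweet is invented.

```python
import re

# Mentions, URLs, and any character that is not alphanumeric, space, or tab
NOISE = re.compile(r"(@[A-Za-z0-9]+)|(\w+://\S+)|([^0-9A-Za-z \t])")


def clean_tweet(tweet):
    return ' '.join(NOISE.sub(' ', tweet).split())


def to_ascii(text):
    return text.encode('ascii', 'ignore').decode('ascii') if text else None


raw = "@Google loves #AI! Read more: https://t.co/abc123 \u2615"  # invented sample tweet
cleaned = clean_tweet(to_ascii(raw))
print(cleaned)  # loves AI Read more
```

The mention, the hashtag sign, the punctuation, the link, and the emoji are all removed, which leaves only the words that the sentiment analysis should see.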

## Tweepy StreamListener Override
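To customize how incoming data is handled, we override the `tweepy.StreamListener` class and add the logic we need. Note that `StreamListener` belongs to Tweepy 3.x; it was removed in Tweepy 4.0, where its methods moved to `tweepy.Stream`.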

```python
from textblob import TextBlob
import tweepy

import sqlite3
from sqlite3 import Error

import preferences


# Override tweepy.StreamListener to add logic to on_status
class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        # Get information of each tweet
        # Skip retweets: they carry a retweeted_status attribute
        if hasattr(status, 'retweeted_status'):
            return True

        id_str = status.id_str
        created_at = status.created_at  # UTC
        # Pre-process the text; to_ascii returns None for empty text
        text = clean_tweet(to_ascii(status.text) or '')
        sentiment = TextBlob(text).sentiment
        polarity = sentiment.polarity          # float in [-1, 1]
        subjectivity = sentiment.subjectivity  # float in [0, 1]

        user_created_at = status.user.created_at  # UTC
        user_location = to_ascii(status.user.location)
        user_description = to_ascii(status.user.description)
        user_followers_count = status.user.followers_count
        longitude = None
        latitude = None
        if status.coordinates:
            # GeoJSON order: [longitude, latitude]
            longitude = status.coordinates['coordinates'][0]
            latitude = status.coordinates['coordinates'][1]

        retweet_count = status.retweet_count
        favorite_count = status.favorite_count

        print(f"ID: {id_str}\tCreated: {created_at}\tPolarity: {polarity} subjectivity: {subjectivity}")

        # The table name cannot be a bound parameter; all values are bound with ?
        query = (
            f"INSERT INTO {preferences.TABLE_NAME} "
            "(id_str, created_at, text, polarity, subjectivity, user_created_at, user_location, "
            "user_description, user_followers_count, longitude, latitude, retweet_count, favorite_count) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
        )

        val = (id_str, created_at, text, polarity, subjectivity, user_created_at, user_location,
               user_description, user_followers_count, longitude, latitude, retweet_count, favorite_count)

        # db is the connection created by init_database in the main script
        try:
            db.execute(query, val)
            db.commit()  # commit only after a successful insert
        except Error:
            db.rollback()
            raise

    def on_error(self, status_code):
        if status_code == 420:
            # Rate limited: return False to disconnect the stream
            return False
```
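
To see how the `?` placeholders in the `INSERT` statement line up with the value tuple, here is a small self-contained sketch. It uses an in-memory database and a made-up three-column `tweets` table rather than the full schema, and the row values are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # throwaway database for the demonstration
db.execute("CREATE TABLE tweets (id_str VARCHAR(255), created_at DATETIME, polarity DOUBLE)")

query = "INSERT INTO tweets (id_str, created_at, polarity) VALUES (?, ?, ?)"
val = ("1", "2020-01-01 12:00:00", 0.5)  # one value per placeholder, in column order

try:
    db.execute(query, val)
    db.commit()  # commit only after a successful insert
except sqlite3.Error:
    db.rollback()
    raise

rows = db.execute("SELECT id_str, created_at, polarity FROM tweets").fetchall()
print(rows)  # [('1', '2020-01-01 12:00:00', 0.5)]
```

Binding values this way lets SQLite handle quoting, so tweet text containing quotes or semicolons cannot break the statement.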

## Main Streamer

Put all the functions above together to start the stream.

```python
import credentials
import preferences
import re
from textblob import TextBlob
import tweepy

import sqlite3
from sqlite3 import Error

# Authenticate with the credentials saved in credentials.py
auth = tweepy.OAuthHandler(credentials.API_KEY, credentials.API_SECRET_KEY)
auth.set_access_token(credentials.ACCESS_TOKEN, credentials.ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

# Open (or create) the database; the dashboard must read the same file
db = init_database('fbgoogle.db', preferences.TABLE_NAME, preferences.TABLE_ATTRIBUTES)
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)
# Blocks while streaming English tweets that match the tracked words
myStream.filter(languages=["en"], track=preferences.TRACK_WORDS)
```

# Create Web for Data Visualization

**Note: Full code in `server.py`** \
This is the core of the application. It may seem complex because it combines UI and processing, but the framework is simple. There are two main parts: `app.layout`, which defines the shape and components of the page, and `callbacks`, which update the page whenever a certain condition occurs. Take a look at this code:

![](res/web.gif)

```python
import credentials, preferences
import dash
from dash import dcc   # Dash >= 2.0 (previously dash_core_components)
from dash import html  # Dash >= 2.0 (previously dash_html_components)
from dash.dependencies import Input, Output
import plotly.graph_objs as go
import pandas as pd
import pytz
import sqlite3
import datetime

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
app.title = 'Live Twitter Visualization'

server = app.server

app.layout = html.Div(children=[
    html.H2('Live Tweet Sentiment Dashboard', style={'textAlign': 'center'}),
    html.Div(id='live-update-graph'),  # Here's where your visualization will be placed
    dcc.Interval(
        id='interval-component-slow',
        interval=5*1000,  # in milliseconds
        n_intervals=0
    )
    ], style={'padding': '20px'})


# Update the graph every time the interval fires
@app.callback(Output('live-update-graph', 'children'),
              [Input('interval-component-slow', 'n_intervals')])
def update_graph_live(n):
    # TODO: build and return the list of components to display
    return []

if __name__ == '__main__':
    app.run_server(debug=True)
```

The `app.layout` part consists of two HTML components and one Dash core component. The HTML components are the title and an **empty** div that will hold the graph later. The core component is `dcc.Interval`, an object that fires at a fixed interval (every 5 seconds here). This is useful because we want the graph to be updated periodically.

The `callbacks` part is the function decorated with `@app.callback`. We define a callback that writes to the `children` property of the `live-update-graph` HTML component and takes its input (a trigger) from the `dcc.Interval` component. Basically, every time the interval elapses, the function attached to this callback, `update_graph_live`, is called, and Dash passes it the current `n_intervals` value as an argument. How you build the visualization is up to you, but here's an example of mine:

```python
def update_graph_live(n):

    # 1. Create Database Connection (must be the file the streamer writes to)
    db = sqlite3.connect('fbgoogle.db')

    # 2. Query the last 30 minutes, then keep the most recent 15 for plotting
    time_diff = datetime.timedelta(minutes=15)
    now = datetime.datetime.now(tz=pytz.utc)
    last_15min = now - time_diff
    last_30min = now - time_diff * 2

    # Bind the time range as parameters, formatted like the stored UTC timestamps
    fmt = '%Y-%m-%d %H:%M:%S'
    query = f"""SELECT id_str, created_at, polarity, user_location, text FROM {preferences.TABLE_NAME} WHERE created_at >= ? AND created_at <= ?;"""

    df30 = pd.read_sql(query, con=db, params=(last_30min.strftime(fmt), now.strftime(fmt)),
                       parse_dates=['created_at'])
    db.close()
    df30['created_at'] = df30['created_at'].dt.tz_localize('UTC')
    df = df30[df30['created_at'] > last_15min]

    # 3. Apply Preprocessing for Area Plot
    # to_sentiment (not shown here) must map a polarity score to
    # 'positive', 'neutral', or 'negative'
    labels = ['positive', 'neutral', 'negative']
    result = df.copy()
    result['sentiment'] = df['polarity'].apply(to_sentiment)
    # reindex guarantees all three columns exist even if a label is absent
    dummies = pd.get_dummies(result['sentiment']).reindex(columns=labels, fill_value=0)
    result = result.join(dummies)
    result['total_tweets'] = result[labels].sum(axis=1)
    # Count tweets per label in 5-second bins
    result = result.set_index('created_at')[labels + ['total_tweets']].resample('5s').sum()

    # Create the graph html object
    children = [
                html.Div([
                    # Line Plot
                    html.Div([
                        dcc.Graph(
                            id='line-plot',
                            figure={
                                'data': [
                                    go.Scatter(
                                        x=result.index,
                                        y=result['neutral'],
                                        name='Neutrals',
                                        opacity=0.8,
                                        mode='lines',
                                        line=dict(shape='spline', smoothing=0.5, width=0.5, color='#323232'),
                                        stackgroup='one'
                                    ),
                                    go.Scatter(
                                        x=result.index,
                                        y=result['negative']*-1,  # drawn below the axis
                                        name='Negatives',
                                        opacity=0.8,
                                        mode='lines',
                                        line=dict(shape='spline', smoothing=0.5, width=0.5, color='#891921'),
                                        stackgroup='two'
                                    ),
                                    go.Scatter(
                                        x=result.index,
                                        y=result['positive'],
                                        name='Positives',
                                        opacity=0.8,
                                        mode='lines',
                                        line=dict(shape='spline', smoothing=0.5, width=0.5, color='#119dff'),
                                        stackgroup='three'
                                    )
                                ],
                                'layout': {
                                    'showlegend': False,
                                    'title': 'Number of Tweets in 15min',
                                }
                            }
                        )
                    ], style={'width': '73%', 'display': 'inline-block', 'padding': '0 0 0 20px'}),

                    # Pie Plot
                    html.Div([
                        dcc.Graph(
                            id='pie-chart',
                            figure={
                                'data': [
                                    go.Pie(
                                        labels=['Positives', 'Negatives', 'Neutrals'],
                                        values=[result['positive'].sum(), result['negative'].sum(), result['neutral'].sum()],
                                        marker_colors=['#119dff', '#891921', '#323232'],
                                        opacity=0.8,
                                        textinfo='value',
                                        hole=.65)
                                ],
                                'layout': {
                                    'showlegend': True,
                                    'title': 'Tweets Percentage',
                                    'annotations': [
                                        dict(
                                            # Total tweet count in thousands, shown in the donut hole
                                            text='{0:.1f}K'.format(result[['positive', 'negative', 'neutral']].sum().sum()/1000),
                                            font=dict(
                                                size=40
                                            ),
                                            showarrow=False
                                        )
                                    ]
                                }

                            }
                        )
                    ], style={'width': '27%', 'display': 'inline-block'})

                ]),

            ]
    return children
```
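
The example calls `to_sentiment`, which is not shown in this document. Below is a minimal sketch of such a helper. The three labels are the column names the example relies on, while the thresholds (anything above zero is positive, anything below zero is negative) are an assumption, and the polarity values are invented.

```python
from collections import Counter


def to_sentiment(polarity):
    """Map a polarity score in [-1, 1] to a label (thresholds are an assumption)."""
    if polarity > 0:
        return 'positive'
    if polarity < 0:
        return 'negative'
    return 'neutral'


scores = [0.8, 0.0, -0.3, 0.1]  # invented polarity values
counts = Counter(to_sentiment(s) for s in scores)
print(counts['positive'], counts['neutral'], counts['negative'])  # 2 1 1
```

A wider neutral band (for example, treating scores between -0.1 and 0.1 as neutral) may suit noisy tweets better; that choice is left to you.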