<a href="https://colab.research.google.com/github/gmelaku/GM/blob/master/PS12_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tweepy for Social Media Analytics

### by Gebremedhin Melaku

In this project we use Tweepy for social media analytics. Our goal is to analyze tweets related to the 2020 US elections.

### 1. Install libraries, import packages

Tweepy must be installed in the notebook before we can import its modules. We use !pip install to install it, then import the objects needed for API authentication and streaming. The libraries for analyzing and visualizing tweets are imported in later sections.

In [0]:
# StreamListener, used below, was removed in Tweepy 4.0, so pin an earlier release.
!pip install "tweepy<4"
import tweepy
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

### 2. Authenticate our APIs

From the Twitter app created under our Twitter developer account, fetch the consumer key, consumer secret, access token, and access token secret.

In [0]:
# Placeholders: paste your own credentials here, and never publish real keys in a shared notebook.
CONSUMER_KEY = "YOUR_CONSUMER_KEY"
CONSUMER_SECRET = "YOUR_CONSUMER_SECRET"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET"
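
Credentials typed directly into a notebook are exposed to anyone who can open it. One common alternative is to read them from environment variables. The following is a minimal sketch of that idea; the variable names (TWITTER_CONSUMER_KEY and so on) and the load_twitter_credentials helper are our own assumptions, not part of Tweepy.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

With the variables set, creds = load_twitter_credentials() returns a dictionary whose values can be assigned to CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN and ACCESS_TOKEN_SECRET.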

We then complete the authentication with OAuthHandler() and create the API object.

In [0]:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)


### 3. Twitter Streaming

Once authentication is done, we subclass tweepy.StreamListener and override on_status to define what happens each time a tweet arrives.

In [0]:
class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        print(status.text)
        
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)

The filter method narrows the stream to tweets that match the terms we pass in track. In our case, these are the account names realDonaldTrump and JoeBiden, the two main candidates in the 2020 election.

In [0]:
myStream.filter(track=['realDonaldTrump','JoeBiden'])
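
To build intuition for what track selects, here is a rough local approximation of the matching rule. It assumes the semantics described in Twitter's streaming documentation: the phrases in the list are OR-ed, the terms inside one phrase are AND-ed regardless of order, and case is ignored. The matches_track helper is hypothetical and much simpler than Twitter's real tokenizer.

```python
import re


def matches_track(text, track):
    """Return True if any track phrase has all of its terms present in text."""
    # Split on non-word characters so "@JoeBiden" and "#JoeBiden" both yield "joebiden".
    tokens = set(re.findall(r"\w+", text.lower()))
    for phrase in track:
        terms = re.findall(r"\w+", phrase.lower())
        if terms and all(term in tokens for term in terms):
            return True
    return False


print(matches_track("Thanks @JoeBiden for the speech", ["realDonaldTrump", "JoeBiden"]))
```

Under this rule a phrase such as "election 2020" would only match tweets that contain both words.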

### 4. Analyzing Twitter data

As we can see above, streaming raw tweets does not support decision making until we convert the data into a structured format that can be analyzed. We will do this with a TweetAnalyzer class.

##### 4.1 Import additional libraries used for analysis.

In [0]:
from tweepy import API 
from tweepy import Cursor
import numpy as np
import pandas as pd

##### 4.2 Define Twitter Client

In this section we define a Twitter client that wraps the API and retrieves a user's timeline tweets, friend list, and home timeline tweets.

In [0]:
class TwitterClient():
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)

        self.twitter_user = twitter_user

    def get_twitter_client_api(self):
        return self.twitter_client

    def get_user_timeline_tweets(self, num_tweets):
        tweets = []
        for tweet in Cursor(self.twitter_client.user_timeline, id=self.twitter_user).items(num_tweets):
            tweets.append(tweet)
        return tweets

    def get_friend_list(self, num_friends):
        friend_list = []
        for friend in Cursor(self.twitter_client.friends, id=self.twitter_user).items(num_friends):
            friend_list.append(friend)
        return friend_list

    def get_home_timeline_tweets(self, num_tweets):
        home_timeline_tweets = []
        # home_timeline always returns the authenticated user's feed, so no user id is passed.
        for tweet in Cursor(self.twitter_client.home_timeline).items(num_tweets):
            home_timeline_tweets.append(tweet)
        return home_timeline_tweets
    
class TwitterAuthenticator():

    def authenticate_twitter_app(self):
        auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
        auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
        return auth

##### 4.3 Twitter stream listener

A basic listener simply prints received tweets to stdout. In this step we also define on_data, which prints each incoming tweet and appends it to a file, and on_error, which stops the stream when Twitter returns status 420 (rate limiting) and prints any other error status.

In [0]:
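class TwitterListener(StreamListener):
    def __init__(self, fetched_tweets_filename):
        # Initialize the base listener before storing our own state.
        super().__init__()
        self.fetched_tweets_filename = fetched_tweets_filename

    def on_data(self, data):
        try:
            # Echo the raw JSON payload and append it to the output file.
            print(data)
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
        except Exception as e:
            # Report the problem but keep the stream alive.
            print("Error on_data %s" % str(e))
        return True

    def on_error(self, status):
        # Status 420 means we are being rate limited; returning False stops the stream.
        if status == 420:
            return False
        print(status)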
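
Since on_error stops the stream on status 420, the notebook has to reconnect by hand. When doing so, it helps to wait progressively longer between attempts. The sketch below generates such a schedule; the one-minute start and the doubling reflect our understanding of Twitter's reconnection guidance for 420 responses, while the cap, the number of attempts, and the backoff_delays name are our own assumptions.

```python
def backoff_delays(initial=60, factor=2, cap=960, attempts=5):
    """Yield wait times in seconds for successive reconnection attempts."""
    delay = initial
    for _ in range(attempts):
        # Never wait longer than the cap, however many attempts have failed.
        yield min(delay, cap)
        delay *= factor


print(list(backoff_delays()))  # → [60, 120, 240, 480, 960]
```

A reconnection loop could call time.sleep() with each value before restarting the stream.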


###### Introduce class TweetAnalyzer and define tweet data.

The pandas and numpy libraries imported earlier let us organize the tweets' contents and metadata into a DataFrame for analysis.

In [0]:
class TweetAnalyzer():
    def tweets_to_data_frame(self, tweets):
        df = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['Tweets'])

        df['id'] = np.array([tweet.id for tweet in tweets])
        df['len'] = np.array([len(tweet.text) for tweet in tweets])
        df['date'] = np.array([tweet.created_at for tweet in tweets])
        df['source'] = np.array([tweet.source for tweet in tweets])
        df['likes'] = np.array([tweet.favorite_count for tweet in tweets])
        df['retweets'] = np.array([tweet.retweet_count for tweet in tweets])

        return df
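
TwitterListener appends raw JSON text to a file, while tweets_to_data_frame expects tweet objects with attributes. The sketch below bridges the two by reading a saved stream file and extracting the same columns as plain dictionaries. It assumes each JSON payload sits on its own line and uses the standard tweet field names (text, id, created_at, source, favorite_count, retweet_count); the rows_from_stream_file helper is hypothetical.

```python
import json


def rows_from_stream_file(path):
    """Parse a file of streamed tweet JSON, one object per line, into dict rows."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # ignore blank lines between payloads
            payload = json.loads(line)
            if "text" not in payload:
                continue  # skip non-tweet messages such as limit notices
            rows.append({
                "Tweets": payload["text"],
                "id": payload["id"],
                "len": len(payload["text"]),
                "date": payload["created_at"],
                "source": payload.get("source"),
                "likes": payload.get("favorite_count", 0),
                "retweets": payload.get("retweet_count", 0),
            })
    return rows
```

The resulting list can be passed to pd.DataFrame(rows) to obtain a table with the same column names as the one TweetAnalyzer builds.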

##### 4.4 Twitter Client Analysis

Now let's look at the results for specific accounts, in our case "realDonaldTrump" and "JoeBiden". We fetch the last 200 tweets from each candidate and call .head(10) to compare their 10 most recent tweets. We also print the retweet count of the tweet at index 50 (the 51st most recent) for each candidate.

In [0]:
if __name__ == '__main__':

    twitter_client = TwitterClient()
    tweet_analyzer = TweetAnalyzer()

    api = twitter_client.get_twitter_client_api()

    tweets = api.user_timeline(screen_name="realDonaldTrump", count=200)

    print(tweets[50].retweet_count)

    df = tweet_analyzer.tweets_to_data_frame(tweets)
    
    print(df.head(10))

In [0]:
if __name__ == '__main__':

    twitter_client = TwitterClient()
    tweet_analyzer = TweetAnalyzer()

    api = twitter_client.get_twitter_client_api()

    tweets = api.user_timeline(screen_name="JoeBiden", count=200)

    print(tweets[50].retweet_count)

    df = tweet_analyzer.tweets_to_data_frame(tweets)
    
    print(df.head(10))

### 5. Visualizing Twitter data for the two candidates

We will now use visualization to compare how the two candidates are doing in terms of likes and retweets over time.

##### 5.1 Import visualization libraries

In [0]:
import matplotlib.pyplot as plt

###### Visualizing Donald Trump's tweets

Set up the client and fetch the 50 most recent tweets.

In [0]:
if __name__ == '__main__':

    twitter_client = TwitterClient()
    tweet_analyzer = TweetAnalyzer()
    api = twitter_client.get_twitter_client_api()

    tweets = api.user_timeline(screen_name="realDonaldTrump", count=50)

    df = tweet_analyzer.tweets_to_data_frame(tweets)


##### Summary statistics

Get the average length of Donald Trump's tweets:

In [0]:
print(np.mean(df['len']))


Get the number of likes for the most liked tweet:

In [0]:
print(np.max(df['likes']))

Get the number of retweets for the most retweeted tweet:

In [0]:
print(np.max(df['retweets']))
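
np.max reports the highest count but not which tweet earned it. The sketch below looks up the tweet itself, using plain dictionaries with made-up values in place of the DataFrame rows. Both the sample records and the most_by helper are assumptions for illustration only.

```python
# Made-up sample rows mirroring the DataFrame columns; not real Twitter data.
sample_rows = [
    {"Tweets": "first example tweet", "likes": 120, "retweets": 30},
    {"Tweets": "second example tweet", "likes": 450, "retweets": 25},
    {"Tweets": "third example tweet", "likes": 80, "retweets": 95},
]


def most_by(rows, column):
    """Return the row with the largest value in the given column."""
    return max(rows, key=lambda row: row[column])


print(most_by(sample_rows, "likes")["Tweets"])
print(most_by(sample_rows, "retweets")["Tweets"])
```

On the DataFrame itself, df.loc[df['likes'].idxmax()] should return the corresponding row.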

#### Plots

Let's plot the number of likes and retweets over time for the fetched tweets.

In [0]:
time_likes = pd.Series(data=df['likes'].values, index=df['date'])
time_likes.plot(figsize=(16, 4), label="likes", legend=True)
time_retweets = pd.Series(data=df['retweets'].values, index=df['date'])
time_retweets.plot(figsize=(16, 4), label="retweets", legend=True)
plt.show()

###### Visualizing Joe Biden's tweets

In [0]:
if __name__ == '__main__':

    twitter_client = TwitterClient()
    tweet_analyzer = TweetAnalyzer()
    api = twitter_client.get_twitter_client_api()

    tweets = api.user_timeline(screen_name="JoeBiden", count=50)

    df = tweet_analyzer.tweets_to_data_frame(tweets)

##### Summary statistics

In [0]:
print(np.mean(df['len']))

In [0]:
print(np.max(df['likes']))

In [0]:
print(np.max(df['retweets']))
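
Because df is overwritten when the second candidate's tweets are loaded, the first candidate's numbers are no longer available for comparison. One option is to compute the three statistics per account and keep them side by side. The sketch below does this with plain dictionaries; the summarize helper, the account labels, and all numbers are made up for illustration.

```python
from statistics import mean


def summarize(rows):
    """Compute the three summary statistics used above for one account."""
    return {
        "mean_len": mean(row["len"] for row in rows),
        "max_likes": max(row["likes"] for row in rows),
        "max_retweets": max(row["retweets"] for row in rows),
    }


# Made-up numbers standing in for each candidate's rows; not real Twitter data.
accounts = {
    "account_a": [
        {"len": 100, "likes": 10, "retweets": 4},
        {"len": 140, "likes": 30, "retweets": 2},
    ],
    "account_b": [
        {"len": 80, "likes": 25, "retweets": 9},
        {"len": 120, "likes": 15, "retweets": 1},
    ],
}

for name, rows in accounts.items():
    print(name, summarize(rows))
```

In the notebook, the same effect could be had by storing each candidate's DataFrame under its own name instead of reusing df.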

###### Plots

Let's plot the number of likes and retweets over time for the fetched tweets.

In [0]:
time_likes = pd.Series(data=df['likes'].values, index=df['date'])
time_likes.plot(figsize=(16, 4), label="likes", legend=True)
time_retweets = pd.Series(data=df['retweets'].values, index=df['date'])
time_retweets.plot(figsize=(16, 4), label="retweets", legend=True)
plt.show()