<h1>Scraping the Twitter API using Tweepy</h1>
<h5>Created by Matt Steele, WVU</h5>
<h5>Contact: <a href ="https://directory.lib.wvu.edu/employee/210" target="_blank">https://directory.lib.wvu.edu/employee/210</a></h5>
<hr />


<h2>Part I: Setting up your Twitter Developers Account</h2>
<p>To access Twitter's API you will need a Twitter developer account. The basic access level limits how much data you can download and how quickly; for example, the recent search endpoint used below only covers roughly the last seven days of tweets.</p>
    <ul>
       <li><a href="https://developer.twitter.com/">Create an Account</a></li>
       <li><em>Note: If you have an academic project you want to pursue, you can apply for an Academic Research account, which provides full access to the entire Twitter archive.</em></li>
    </ul>


<h3>Configure your Developers Account</h3>
<ol>
    <li>Sign in with your developer account;</li>
    <li>Create a project through the <a href ="https://developer.twitter.com/en/portal/dashboard">Developer Dashboard</a> and click on the <strong>Add App</strong> button;</li>
    <li>Once your project is set up, save your <strong>Consumer Keys and Authentication Tokens</strong> to a text file or Word document;
        <ul><li>Note: Copy each key and token exactly, and keep them private, since anyone who has them can make API requests as you.</li></ul>
            </li>
</ol>
 <div style = "margin-left: 5%; margin-top:2%; padding: 2%; border: 2px gray solid; width:50%;">
     <ul>
         <li>consumer_key = 'YourConsumerKey'</li>
         <li>consumer_secret = 'YourConsumerSecret'</li>
         <li>access_token = 'YourAccessToken'</li>
         <li>access_token_secret = 'YourAccessTokenSecret'</li>
         <li>bearer_token = 'YourBearerToken'</li>
     </ul>
        </div>

<h3>Explore the API features</h3>
<p>For this workshop we will focus on Twitter API v2 features and the <code>tweepy.Client</code> interface, which requires Tweepy 4.0 or later.</p>
    <ul>
        <li><a href="https://developer.twitter.com/en/docs/twitter-api">Twitter API v2 Documentation</a></li>
        <li><a href="https://developer.twitter.com/en/docs/api-reference-index">Twitter API v2 Reference Index</a></li>
        <li><a href ="https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet">Tweet Objects</a></li>
        <li><a href = "https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user">User Objects</a></li>
        <li><a href = "https://docs.tweepy.org/en/stable/client.html">Tweepy Documentation</a></li>
    </ul>
    

<h2>Part II: Authentication</h2>
<p>To access Twitter, you will need to authenticate with your API keys and tokens. We do this by storing our credentials in a Python file (keys.py) and importing it.</p>
 <div style = "margin-left: 5%; margin-top:2%; padding: 2%; border: 2px gray solid; width:50%;">
    <p>Let's open the keys.py file, update it with our credentials, save it, and then import it.</p>
 </div>

In [None]:
import keys  # authentication file that stores your API keys and tokens
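<p>A minimal sketch of what <code>keys.py</code> could contain, assuming you keep the five variable names listed in Part I. As an assumption beyond the original workshop, it first looks for each value in an environment variable (the <code>TWITTER_*</code> names are hypothetical) and falls back to the placeholder string, so real credentials never have to be typed into a file you might share.</p>

```python
# keys.py: sketch of a credentials module (hypothetical layout).
# Each value is read from an environment variable if one is set;
# otherwise the placeholder string is used. The TWITTER_* variable
# names are assumptions, not something Twitter or Tweepy requires.
import os

consumer_key = os.environ.get("TWITTER_CONSUMER_KEY", "YourConsumerKey")
consumer_secret = os.environ.get("TWITTER_CONSUMER_SECRET", "YourConsumerSecret")
access_token = os.environ.get("TWITTER_ACCESS_TOKEN", "YourAccessToken")
access_token_secret = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET", "YourAccessTokenSecret")
bearer_token = os.environ.get("TWITTER_BEARER_TOKEN", "YourBearerToken")
```

<p>This notebook only reads <code>keys.bearer_token</code>, so the other four values matter only if you later switch to user-context authentication.</p>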

<h2>Part III: The Tweepy Library</h2>
<p>The Tweepy library lets you call the Twitter API directly from Python: it sends your requests and converts the JSON the API returns into Python objects.</p>
    <ul>
    <li><a href ="https://docs.tweepy.org/en/stable/">Tweepy documentation</a></li>
        </ul>

<h3>Install Tweepy</h3>

In [None]:
# Installs Tweepy from conda-forge; outside a conda environment, use %pip install tweepy instead
%conda install -c conda-forge tweepy

<h4>View installed libraries</h4>

In [None]:
conda list
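<p>The <code>conda list</code> output can be long. As an alternative that is not part of the original workshop, this sketch uses the standard library's <code>importlib.metadata</code> to check only Tweepy's version against 4.0, the release that introduced <code>tweepy.Client</code>. The <code>meets_minimum()</code> helper is hypothetical.</p>

```python
# Sketch: check that the installed Tweepy is new enough for tweepy.Client,
# which was introduced in Tweepy 4.0.
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile


def meets_minimum(version_string, minimum):
    """Return True if a dotted version string is >= `minimum` (a tuple of ints)."""
    numbers = []
    for piece in version_string.split("."):
        leading = "".join(takewhile(str.isdigit, piece))  # "0rc1" -> "0"
        if not leading:
            break
        numbers.append(int(leading))
    numbers += [0] * (len(minimum) - len(numbers))  # treat "4" as "4.0"
    return tuple(numbers) >= tuple(minimum)


try:
    installed = version("tweepy")
    status = "OK" if meets_minimum(installed, (4, 0)) else "too old for tweepy.Client"
    print("tweepy", installed, status)
except PackageNotFoundError:
    print("tweepy is not installed in this environment")
```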

<h3>Call Tweepy library</h3>

In [None]:
import tweepy

<div style = "background-color:#f0f0f0; margin-top: 5%; margin-bottom:5%; padding: 2%;"><h6><em>Hint: Press the Tab key to see available methods and functions</em></h6></div>

<h3>Authenticate your Account</h3>

In [None]:
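# App-only authentication with the bearer token; wait_on_rate_limit=True makes
# Tweepy sleep until the rate limit resets instead of raising an error
client = tweepy.Client(bearer_token=keys.bearer_token,
                       wait_on_rate_limit=True)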

<h3>Make your First Call</h3>
<p>Here we are going to search recent tweets. We do this by building a search query and passing it to the <code>search_recent_tweets()</code> method.</p>
<ul><li><a href="https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query">Learn more about building a search query</a></li>
<li><a href="https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent">Learn more about search_recent_tweets()</a></li></ul>
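<p>Search queries are plain strings that combine keywords with operators. The helper below is a hypothetical convenience function, not part of Tweepy: it assembles a query from a keyword string plus two operators described in the query-building guide linked above, <code>-is:retweet</code> (exclude retweets) and <code>lang:</code> (restrict by language). Wrapping the keywords in double quotes matches the exact phrase; without quotes, every word must appear, in any order.</p>

```python
def build_query(keywords, exact_phrase=False, exclude_retweets=False, lang=None):
    """Assemble a Twitter API v2 search query string (hypothetical helper)."""
    parts = [f'"{keywords}"' if exact_phrase else keywords]
    if exclude_retweets:
        parts.append("-is:retweet")  # keep original tweets, drop retweets
    if lang:
        parts.append(f"lang:{lang}")  # e.g. "en" for English
    return " ".join(parts)


print(build_query("elon musk", exclude_retweets=True))      # elon musk -is:retweet
print(build_query("elon musk", exact_phrase=True, lang="en"))  # "elon musk" lang:en
```

<p>The returned string can be passed as the <code>query</code> argument of <code>search_recent_tweets()</code>. Operators such as <code>-is:retweet</code> and <code>lang:</code> cannot be used on their own, so the query should always keep at least one keyword.</p>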


In [None]:
elon_search = client.search_recent_tweets(query="elon musk", max_results=20)
print(elon_search)

<h4>What have we received?</h4>
<p>The API sends back JSON containing metadata about the matching tweets, and Tweepy wraps it in a <code>Response</code> object whose parts are <code>data</code>, <code>includes</code>, <code>errors</code>, and <code>meta</code>. To see which fields can appear, look at the data dictionary for tweets in the Twitter developer documentation. When a response also contains information about user accounts, look at the data dictionary for users.</p>

<ul><li><a href ="https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet">Tweet Objects</a></li>
<li><a href = "https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user">User Objects</a></li></ul>
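<p>To make that structure concrete, here is roughly what the raw JSON of a search response looks like. The IDs and text are made-up placeholders; the field names follow the Tweet object data dictionary and the search endpoint's <code>meta</code> section. Tweepy turns the <code>data</code> list into <code>Tweet</code> objects on <code>response.data</code> and keeps <code>meta</code> as the <code>response.meta</code> dictionary; this sketch reads the same structure with Python's built-in <code>json</code> module.</p>

```python
import json

# Hypothetical response body: field names follow the v2 data dictionary,
# but every value is a placeholder invented for illustration.
raw = """
{
  "data": [
    {"id": "1000000000000000001",
     "text": "Placeholder tweet text",
     "edit_history_tweet_ids": ["1000000000000000001"]}
  ],
  "meta": {"newest_id": "1000000000000000001",
           "oldest_id": "1000000000000000001",
           "result_count": 1}
}
"""

payload = json.loads(raw)
for tweet in payload["data"]:
    # In the raw JSON, IDs are strings (too large for some JSON parsers' numbers)
    print(tweet["id"], tweet["text"])
print("tweets returned:", payload["meta"]["result_count"])
```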

<h4>How do we view this information?</h4>
<p>We will now use Python code that goes through the data we have retrieved so we can view the information that was returned.</p>

<p style = "margin-left:4%;"><strong>for loop</strong>: a statement used to iterate over a sequence (a list, a tuple, a dictionary, a set, or a string). It takes each item in turn and runs the indented commands on it; here, each item is one tweet in <code>elon_search.data</code>.</p>

In [None]:
for tweet in elon_search.data:
    display(tweet.id, tweet.text, tweet.edit_history_tweet_ids)
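<p>One caveat: when a request matches no tweets, Tweepy sets <code>response.data</code> to <code>None</code> rather than an empty list, and looping over <code>None</code> raises a <code>TypeError</code>. A defensive pattern is to loop over <code>response.data or []</code>. The sketch below demonstrates it with hypothetical stand-in objects built from <code>types.SimpleNamespace</code>, so it runs without an API call; <code>tweet_texts()</code> is an illustrative helper, not a Tweepy function.</p>

```python
from types import SimpleNamespace

# Stand-ins for tweepy Response objects (hypothetical, for offline testing).
empty_response = SimpleNamespace(data=None, meta={"result_count": 0})
one_tweet = SimpleNamespace(id=1, text="Placeholder tweet text")
full_response = SimpleNamespace(data=[one_tweet], meta={"result_count": 1})


def tweet_texts(response):
    """Return the text of each tweet, treating data=None as an empty result."""
    return [tweet.text for tweet in (response.data or [])]


print(tweet_texts(empty_response))  # []
print(tweet_texts(full_response))   # ['Placeholder tweet text']
```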

<h3>Expansions</h3>
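<p>Expansions let you request related objects beyond the default fields. In this example, the <code>author_id</code> expansion adds each tweet's author ID (<code>author_id</code>) to the tweet objects and returns the matching user objects in the response's <code>includes</code>. We also add <code>-is:retweet</code> to the query to exclude retweets.</p>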

<ul><li><a href ="https://developer.twitter.com/en/docs/twitter-api/expansions">Learn more about expansions</a></li></ul>

In [None]:
elon_search = client.search_recent_tweets(query="elon musk -is:retweet",
                                          max_results=20,
                                          expansions=["author_id"])
for tweet in elon_search.data:
    display(tweet.id, tweet.text, tweet.author_id)

<h4>Expand your Query further and add User Fields to your Retrieval</h4>
<p>Right now we are only retrieving information about the tweets, but we can also learn more about the users who wrote them. The available user fields are listed in the User object data dictionary in the Twitter developer documentation.</p>

<ul><li><a href ="https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user">Learn more about user objects</a></li></ul>

In [None]:
elon_search = client.search_recent_tweets(query="elon musk -is:retweet",
                                          max_results=50,
                                          expansions=["author_id"],
                                          user_fields=["profile_image_url", "description"])

<h4>Create a dictionary that maps each user ID to its user object</h4>

In [None]:
# key each user object by its ID so tweets can look up their author
elon_users = {user.id: user for user in elon_search.includes['users']}

In [None]:
for tweet in elon_search.data:
    user = elon_users.get(tweet.author_id)  # None if the author was not returned
    if user:
        print(tweet.id, user.username, user.profile_image_url, user.description)

<h3>Converting Information to DataFrame and Exporting as CSV</h3>
<p>The following operations will allow you to convert the JSON information that you have retrieved to a readable dataframe and then export that dataframe as CSV using Pandas.</p>
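<p>A variation on the list-of-lists approach used in this section is to build one dictionary per tweet; <code>pd.DataFrame</code> accepts a list of dictionaries and uses the keys as column names. The hypothetical <code>tweet_to_row()</code> helper sketched here also flattens the nested <code>public_metrics</code> dictionary (which includes <code>like_count</code>, <code>retweet_count</code>, <code>reply_count</code>, and <code>quote_count</code>) into separate columns, using <code>.get()</code> so a tweet without metrics does not raise an error. The stand-in tweet is a placeholder, not real data.</p>

```python
from types import SimpleNamespace


def tweet_to_row(tweet):
    """Flatten a tweet-like object into a dict suitable for pd.DataFrame (hypothetical helper)."""
    metrics = getattr(tweet, "public_metrics", None) or {}
    return {
        "ID": tweet.id,
        "Tweet": tweet.text,
        "Date Posted": getattr(tweet, "created_at", None),
        "Likes": metrics.get("like_count"),
        "Retweets": metrics.get("retweet_count"),
        "Replies": metrics.get("reply_count"),
        "Quote Tweets": metrics.get("quote_count"),
    }


# Placeholder stand-in for a tweepy Tweet requested with tweet_fields=["public_metrics"]
sample = SimpleNamespace(id=1, text="Placeholder tweet text",
                         public_metrics={"like_count": 5, "retweet_count": 2,
                                         "reply_count": 1, "quote_count": 0})
rows = [tweet_to_row(sample)]
print(rows[0]["Likes"], rows[0]["Retweets"])  # 5 2
# With pandas imported, pd.DataFrame(rows) gives one column per dictionary key.
```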

In [43]:
#import pandas library to help create and export dataframes

import pandas as pd

In [None]:
elon_search = client.search_recent_tweets(query="elon musk -is:retweet",
                                          max_results=50,
                                          expansions=["author_id"],
                                          tweet_fields=["created_at", "public_metrics", "in_reply_to_user_id"],
                                          user_fields=["profile_image_url", "description"])

In [None]:
#create an empty list to hold one row per tweet
data = []

#set the column names
columns = ['ID', 'Tweet', 'Date Posted', 'Author', 'Liked', 'Reply_To']

#add a row for each tweet in our retrieval
for tweet in elon_search.data:
    data.append([tweet.id, tweet.text, tweet.created_at, tweet.author_id, tweet.public_metrics['like_count'], tweet.in_reply_to_user_id])

#create the dataframe
elon_df = pd.DataFrame(data, columns=columns)

#export the data (index=False leaves out pandas' row numbers)
elon_df.to_csv("elon_tweets_1.csv", index=False)

<h4>Include User fields in our dataframe</h4>

In [None]:
#create an empty list to hold one row per tweet
data = []

#set the column names
columns = ['ID', 'Tweet', 'Date Posted', 'Author', 'Author Bio', 'Author Image', 'Liked', 'Reply_To']

# create a dictionary that uses the author_id field to look up more information about the users
elon_users = {user.id: user for user in elon_search.includes['users']}

#add a row for each tweet whose author was returned in includes
for tweet in elon_search.data:
    user = elon_users.get(tweet.author_id)
    if user:
        data.append([tweet.id,
                     tweet.text,
                     tweet.created_at,
                     user.username,
                     user.description,
                     user.profile_image_url,
                     tweet.public_metrics['like_count'],
                     tweet.in_reply_to_user_id])

#create the dataframe
elon_df = pd.DataFrame(data, columns=columns)

#export the data (index=False leaves out pandas' row numbers)
elon_df.to_csv("elon_tweets_2.csv", index=False)

<h3>Get information about a user account</h3>
<p>Let's look up information about the Twitter account <a href="https://twitter.com/elonmusk">@elonmusk</a>.</p>
<ul><li><a href="https://developer.twitter.com/en/docs/twitter-api/users/lookup/api-reference/get-users">get_users() documentation</a></li></ul>

In [None]:
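elon_id = client.get_users(usernames=["elonmusk"])
# the user objects are in .data; iterating the Response itself would loop over data, includes, errors, and meta
for user in elon_id.data:
    display(user.id, user.username, user.name)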

<h4>Save the user_id as a variable</h4>

In [None]:
user_id = "44196397"  # the numeric ID for @elonmusk, as returned by get_users() above

<h3>Get User's Tweets</h3>

This endpoint/method returns Tweets composed by a single user, specified by the requested user ID.

<ul><li><a href="https://developer.twitter.com/en/docs/twitter-api/tweets/timelines/api-reference/get-users-id-tweets">get_users_tweets() documentation</a></li></ul>

In [None]:
# By default, only basic fields such as the ID and text of each tweet are returned, so we use tweet_fields to add the creation date and public metrics

elon_tweets = client.get_users_tweets(user_id, exclude=["replies", "retweets"],
                                      max_results=100,
                                      tweet_fields=["created_at", "public_metrics"])
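<p>A single call returns at most 100 tweets, the <code>max_results</code> ceiling for this endpoint. To collect more, the API returns a <code>next_token</code> in <code>response.meta</code>, which you pass back as the <code>pagination_token</code> argument of <code>get_users_tweets()</code>. Tweepy's built-in <code>tweepy.Paginator</code> handles this for you; the sketch below spells out the same loop with a generic, hypothetical <code>collect_pages()</code> function and fake pages so it runs offline.</p>

```python
from types import SimpleNamespace


def collect_pages(fetch_page, limit):
    """Follow next_token links until `limit` items are gathered or pages run out.

    `fetch_page(token)` must return an object with `.data` (list or None)
    and `.meta` (dict), like a tweepy Response.
    """
    items, token = [], None
    while len(items) < limit:
        response = fetch_page(token)
        items.extend(response.data or [])
        token = (response.meta or {}).get("next_token")
        if token is None:  # no more pages
            break
    return items[:limit]


# Offline stand-in: three fake pages linked by next_token values.
pages = {
    None: SimpleNamespace(data=[1, 2], meta={"next_token": "p2"}),
    "p2": SimpleNamespace(data=[3, 4], meta={"next_token": "p3"}),
    "p3": SimpleNamespace(data=[5], meta={}),
}
print(collect_pages(pages.get, limit=10))  # [1, 2, 3, 4, 5]
print(collect_pages(pages.get, limit=3))   # [1, 2, 3]

# With a live client (not run here), something like:
# tweets = collect_pages(
#     lambda token: client.get_users_tweets(user_id, max_results=100,
#                                           pagination_token=token),
#     limit=300)
```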

In [None]:
#set the data dictionary
data = []

#set the columns
columns = ['Tweet ID', 'Text', "Date Posted", "Replies", "Retweets", "Likes", "Quote Tweets"]

#add the data from our retieval to the data dictionary
for tweet in elon_tweets.data:
    data.append([tweet.id, 
                 tweet.text, 
                 tweet.created_at, 
                 tweet.public_metrics['reply_count'], 
                 tweet.public_metrics['retweet_count'], 
                 tweet.public_metrics['like_count'], 
                 tweet.public_metrics['quote_count']])
    

#create the dataframe
elon_tweets_df = pd.DataFrame(data, columns=columns)

#export the data (index=False leaves out pandas' row numbers)
elon_tweets_df.to_csv("elon_acct_tweets.csv", index=False)

<h3>Plotting Data</h3>
<p>Here we will use Matplotlib to plot how many times each Elon Musk tweet has been retweeted and liked.</p>

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
tl_em_df = pd.read_csv("elon_acct_tweets.csv")
tl_em_df.dtypes

<h4>Clean the Date Posted field</h4>
<p>Use the <code>pd.to_datetime()</code> function to convert the Date Posted column to a datetime type.</p>
<ul><li><a href ="https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html">Pandas to_datetime documentation</a></li></ul>

In [None]:
tl_em_df["Date Posted"]  = pd.to_datetime(tl_em_df["Date Posted"])
tl_em_df.dtypes
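<p>Why the conversion is needed: after a round trip through CSV, the Date Posted values come back as plain strings. Tweepy returns <code>created_at</code> as a timezone-aware UTC <code>datetime</code>, which is written to the CSV in a form like the placeholder below. This standard-library sketch parses one such string to show what <code>pd.to_datetime()</code> recovers for the whole column: a real date with its UTC offset intact.</p>

```python
from datetime import datetime, timezone

# Placeholder value in the format a timezone-aware timestamp takes in the CSV
raw_value = "2022-10-27 12:00:00+00:00"

parsed = datetime.fromisoformat(raw_value)  # Python 3.7+
print(type(raw_value).__name__, "->", type(parsed).__name__)  # str -> datetime
print(parsed.tzinfo == timezone.utc, parsed.date())           # True 2022-10-27
```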

<h4>Create a line plot of the tweets</h4>
<p>Use the pandas <code>plot()</code> method, which draws with Matplotlib, to plot the variables.</p>

In [None]:
tl_em_df.plot(x="Date Posted", y=["Retweets", "Likes"])
plt.show()

<h3>Get User's Mentions</h3>

This endpoint/method retrieves Tweets that mention a single user, specified by the requested user ID.

<ul><li><a href="https://developer.twitter.com/en/docs/twitter-api/tweets/timelines/api-reference/get-users-id-mentions">get_users_mentions() documentation</a></li></ul>

In [None]:
elon_mentions = client.get_users_mentions(user_id, tweet_fields=["created_at"])

for tweet in elon_mentions.data:
    display(tweet.id, tweet.text, tweet.created_at)

<h3>Get User's Followers</h3>

This endpoint/method returns a list of users who are followers of the specified user ID.

<ul><li><a href ="https://developer.twitter.com/en/docs/twitter-api/users/follows/api-reference/get-users-id-followers">get_users_followers() documentation</a></li></ul>

In [None]:
elon_followers = client.get_users_followers(user_id, user_fields=["profile_image_url", "description"])

for user in elon_followers.data:
    display(user.username, user.profile_image_url, user.description)

<h3>Get Following</h3>

This endpoint/method returns a list of users that the specified user ID follows.

<ul><li><a href = "https://developer.twitter.com/en/docs/twitter-api/users/follows/api-reference/get-users-id-following">get_users_following() documentation</a></li></ul>

In [None]:
elon_following = client.get_users_following(user_id, user_fields=["description", "profile_image_url", "location"])

for user in elon_following.data:
    display(user.name, user.username, user.description, user.profile_image_url, user.location)

In [None]:
#create an empty list to hold one row per account
data = []

#set the column names
columns = ['Name', "Username", 'Bio', "Image", "Location"]

#add a row for each account in our retrieval
for user in elon_following.data:
    data.append([user.name, user.username, user.description, user.profile_image_url, user.location])

#create the dataframe
elon_following_df = pd.DataFrame(data, columns=columns)

#export the data (index=False leaves out pandas' row numbers)
elon_following_df.to_csv("elon_follows.csv", index=False)

In [None]:
# count how often each location appears; accounts with no location are skipped
loc = elon_following_df["Location"].value_counts()
loc.head(50)

<h3>Get information about a tweet</h3>
<p>Let's retrieve information about a tweet using its tweet ID.</p>
<ul><li><a href ="https://twitter.com/dril/status/831805955402776576?lang=en">Example tweet</a></li></ul>

In [None]:
tweet_id = 831805955402776576

<h4>Users that liked a tweet</h4>
<p>Find the user accounts that liked a tweet.</p>
<ul><li><a href = "https://developer.twitter.com/en/docs/twitter-api/tweets/likes/api-reference/get-tweets-id-liking_users">get_liking_users() documentation</a></li></ul>

In [None]:
dril_likes = client.get_liking_users(id=tweet_id)

for user in dril_likes.data:
    print(user.username)

<h4>Users that retweeted a tweet</h4>
<p>Find the users that retweeted a tweet.</p>
<ul><li><a href="https://developer.twitter.com/en/docs/twitter-api/tweets/retweets/api-reference/get-tweets-id-retweeted_by">get_retweeters() documentation</a></li></ul>

In [None]:
dril_rts = client.get_retweeters(id=tweet_id)

for user in dril_rts.data:
    print(user.username)

<h2>Cleaning Text for Sentiment Analysis</h2>

In [None]:
elon_sentiment_df = pd.read_csv("elon_tweets_cleaned.csv")
elon_sentiment_df.head()

In [None]:
from textblob import TextBlob
import re

def clean_tweet(tweet):
    '''
    Utility function to clean the text in a tweet by removing
    @mentions, links, and special characters using regex.
    '''
    # raw string so the regex escapes (\w, \S, \/) are not treated as string escapes
    return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

def analize_sentiment(tweet):
    '''
    Utility function to classify the polarity of a tweet
    using textblob.
    '''
    analysis = TextBlob(clean_tweet(tweet))
    if analysis.sentiment.polarity > 0:
        return 1
    elif analysis.sentiment.polarity == 0:
        return 0
    else:
        return -1
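<p>To see what <code>clean_tweet()</code> strips, the sketch below applies the same regular expression to a made-up tweet through a hypothetical <code>clean_text()</code> function that mirrors it. Each of the three alternatives replaces its match with a space: an @mention, any character that is not an ASCII letter, digit, space, or tab, and a URL. Splitting and re-joining then collapses the leftover spaces. Two side effects are worth knowing: accented letters and emoji are removed too, and because the mention pattern stops at an underscore, a handle such as <code>@elon_musk</code> leaves <code>musk</code> behind.</p>

```python
import re

# Same pattern as clean_tweet() above, compiled once
TWEET_NOISE = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")


def clean_text(text):
    """Replace mentions, symbols, and URLs with spaces, then collapse whitespace."""
    return " ".join(TWEET_NOISE.sub(" ", text).split())


sample = "@elonmusk Great launch today!! https://t.co/abc123"  # made-up tweet
print(clean_text(sample))           # Great launch today
print(clean_text("@elon_musk hi"))  # musk hi
```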

In [None]:
# Create a column with the result of the analysis
# (fillna("") guards against empty cleaned tweets read back from the CSV as NaN):
elon_sentiment_df['SA'] = elon_sentiment_df['Cleaned Text'].fillna("").apply(analize_sentiment)

# Display the updated dataframe with the new column:
display(elon_sentiment_df.head(10))

In [None]:
total = len(elon_sentiment_df)
pos_tweets = elon_sentiment_df.loc[elon_sentiment_df['SA'] > 0, 'Text']
neu_tweets = elon_sentiment_df.loc[elon_sentiment_df['SA'] == 0, 'Text']
neg_tweets = elon_sentiment_df.loc[elon_sentiment_df['SA'] < 0, 'Text']

print("Percentage of positive tweets: {}%".format(len(pos_tweets) * 100 / total))
print("Percentage of neutral tweets: {}%".format(len(neu_tweets) * 100 / total))
print("Percentage of negative tweets: {}%".format(len(neg_tweets) * 100 / total))
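<p>The three percentages can also be computed in one pass. This standard-library sketch uses <code>collections.Counter</code> on a list of sentiment labels (1, 0, and -1, as returned by <code>analize_sentiment()</code>); the labels shown are placeholders, not results from the Elon Musk data, and <code>sentiment_shares()</code> is a hypothetical helper. In the notebook, <code>elon_sentiment_df['SA'].value_counts(normalize=True)</code> gives the same shares as fractions.</p>

```python
from collections import Counter


def sentiment_shares(labels):
    """Return the percentage of positive (1), neutral (0), and negative (-1) labels."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return {"positive": 0.0, "neutral": 0.0, "negative": 0.0}
    return {
        "positive": counts[1] * 100 / total,
        "neutral": counts[0] * 100 / total,
        "negative": counts[-1] * 100 / total,
    }


labels = [1, 0, -1, 1]  # placeholder labels
for name, pct in sentiment_shares(labels).items():
    print(f"Percentage of {name} tweets: {pct:.1f}%")
```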