<img src="figures/tweepy.png">

### **Table of Contents:**

 1. [Installation](#head-1)  
 2. [Configuration](#head-2)   
 3. [Search Twitter for Tweets](#head-3)
     3.1. [Tweets List to Pandas DataFrame](#head-3-1) 
 4. [Get tweets from a specific user](#head-4)
 
In this notebook, I'll explore Tweepy, one of the most popular Python wrapper libraries for the Twitter API. I'll start from scratch to set up the scraping pipeline, then save the ready-to-use results to a CSV file for further modeling.

To get started, you’ll need to do the following things:

- Set up a Twitter account
- Apply for Developer Access (https://developer.twitter.com) and then create an application that will generate the API credentials that you will use to access Twitter from Python.


A tutorial on applying for Twitter API access: https://cran.r-project.org/web/packages/rtweet/vignettes/auth.html

# 1. Installation <a class="anchor" id="head-1"></a>

Required libraries:

- pandas
- tweepy
- python-dotenv

In [None]:
!pip install -r requirements.txt

# 2. Configuration <a class="anchor" id="head-2"></a>

Create a `.env` file in the notebook's working directory that contains your four credentials:

consumer_key= 'yourkeyhere'  <br>
consumer_secret= 'yourkeyhere' <br>
access_token= 'yourkeyhere' <br>
access_token_secret= 'yourkeyhere'

In [None]:
import os
import tweepy as tw
import pandas as pd

In [3]:
from dotenv import dotenv_values

# read the credentials from the .env file into a dict (os.environ is not modified)
config = dotenv_values(".env")
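`dotenv_values` returns a plain dictionary of the key/value pairs in the file (a key written without `=` maps to `None`), so a typo in `.env` only surfaces later as a `KeyError` or an authentication failure. A minimal sketch of checking the four credentials up front; `REQUIRED_KEYS` and `check_credentials` are hypothetical names, not part of python-dotenv or Tweepy:

```python
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")


def check_credentials(config):
    """Hypothetical helper: raise early if any credential is missing or empty.

    `config` is the dict returned by dotenv_values(); a missing key, an empty
    value and a key written without "=" (value None) all count as missing.
    """
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"missing credentials in .env: {', '.join(missing)}")
    return config


# usage with a made-up dict in place of dotenv_values(".env")
example = {"consumer_key": "abc", "consumer_secret": "", "access_token": "def"}
try:
    check_credentials(example)
except ValueError as err:
    print(err)  # → missing credentials in .env: consumer_secret, access_token_secret
```

In the notebook this would be called as `check_credentials(config)` before building the `OAuthHandler`.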

In [None]:
# configure Tweepy with OAuth 1.0a user credentials and create the API client
auth = tw.OAuthHandler(config["consumer_key"], config["consumer_secret"])
auth.set_access_token(config["access_token"], config["access_token_secret"])
api = tw.API(auth, wait_on_rate_limit=True)
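With `wait_on_rate_limit=True`, Tweepy sleeps until the current rate-limit window resets instead of raising an error once the limit is reached. The arithmetic behind that wait can be sketched as below, assuming the reset time comes from the v1.1 API's `x-rate-limit-reset` header (Unix time in seconds); `seconds_until_reset` and its one-second buffer are hypothetical illustrations, not Tweepy's internal code:

```python
import time
from typing import Optional


def seconds_until_reset(reset_epoch: int, now: Optional[float] = None, buffer: float = 1.0) -> float:
    """Hypothetical helper: how long to sleep before the rate-limit window resets.

    `reset_epoch` is the Unix time (in seconds) at which the window resets,
    as reported in the `x-rate-limit-reset` response header.
    """
    if now is None:
        now = time.time()
    # never sleep a negative amount; the small buffer guards against clock skew
    return max(0.0, reset_epoch - now) + buffer


print(seconds_until_reset(1_000, now=940))    # → 61.0
print(seconds_until_reset(1_000, now=2_000))  # → 1.0
```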

# 3. Search Twitter for Tweets <a class="anchor" id="head-3"></a>

In [None]:
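# Define the search term and the start date as variables
search_words = "#wildfires"
date_since = "2018-11-16"

# Collect tweets. Tweepy 4.0 renamed API.search to API.search_tweets.
# The standard search endpoint has no `since` parameter, so the date goes
# into the query as the `since:` operator; note that this endpoint only
# searches roughly the last 7 days, so older start dates have no effect.
tweets = tw.Cursor(api.search_tweets,
                   q=f"{search_words} since:{date_since}",
                   lang="en").items(5)

# Collect the text of each tweet in a list
[tweet.text for tweet in tweets]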
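`tw.Cursor` hides pagination: for the search endpoint it requests page after page, each time asking only for tweets older than the oldest one seen so far (via `max_id`), and `.items(5)` stops after five tweets. The result is a one-shot iterator, so the list comprehension above consumes it. The following simplified, offline stand-in shows the idea; `paginate_by_max_id` and `fake_fetch_page` are hypothetical and not Tweepy code:

```python
from typing import Callable, Iterator, List, Optional


def paginate_by_max_id(fetch_page: Callable[[Optional[int]], List[dict]], limit: int) -> Iterator[dict]:
    """Hypothetical sketch of max_id pagination: each request returns tweets
    with id <= max_id (newest first), and the next max_id is one less than
    the smallest id on the previous page."""
    yielded = 0
    max_id = None
    while yielded < limit:
        page = fetch_page(max_id)
        if not page:
            return  # no more results
        for tweet in page:
            if yielded >= limit:
                return
            yield tweet
            yielded += 1
        max_id = min(t["id"] for t in page) - 1


# made-up tweet ids, newest first, served three per "page"
FAKE_IDS = list(range(10, 0, -1))


def fake_fetch_page(max_id):
    remaining = [i for i in FAKE_IDS if max_id is None or i <= max_id]
    return [{"id": i} for i in remaining[:3]]


demo = paginate_by_max_id(fake_fetch_page, limit=5)
print([t["id"] for t in demo])  # → [10, 9, 8, 7, 6]
print(list(demo))               # → [] (the iterator is already exhausted)
```

To reuse search results in several cells, store them once, e.g. `tweet_list = list(tw.Cursor(...).items(5))`.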

## 3.1. Tweets List to Pandas DataFrame <a class="anchor" id="head-3-1"></a>

In [None]:
# function to extract selected attributes from an iterable of tweet (Status) objects
def extract_tweet_attributes(tweets):
    columns = ['tweet_id', 'text', 'favorite_count', 'retweet_count', 'created_at',
               'source', 'reply_to_status', 'reply_to_user', 'retweets', 'favorites']
    # one dict of attributes per tweet
    tweet_list = []
    for tweet in tweets:
        tweet_list.append({
            'tweet_id': tweet.id,                            # unique integer identifier of the tweet
            'text': tweet.text,                              # UTF-8 text of the tweet
            'favorite_count': tweet.favorite_count,          # number of times the tweet was liked
            'retweet_count': tweet.retweet_count,            # number of times the tweet was retweeted
            'created_at': tweet.created_at,                  # UTC time the tweet was created
            'source': tweet.source,                          # client used to post the tweet
            'reply_to_status': tweet.in_reply_to_status_id,  # if a reply, ID of the original tweet, else None
            'reply_to_user': tweet.in_reply_to_screen_name,  # if a reply, screen name of the original author, else None
            'retweets': tweet.retweet_count,                 # same value as retweet_count
            'favorites': tweet.favorite_count,               # same value as favorite_count
        })
    # passing columns explicitly fixes the column order and keeps the columns
    # even when no tweets were returned
    return pd.DataFrame(tweet_list, columns=columns)


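# trump_tweets is only created in section 4, so fetch the timeline here before building the DataFrame
trump_tweets = api.user_timeline(screen_name='realdonaldtrump')
df = extract_tweet_attributes(trump_tweets)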
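The introduction mentions saving the results to a CSV file for later modeling, which pandas' `DataFrame.to_csv` covers. Below is a minimal sketch with a small placeholder DataFrame standing in for the real `df`; the values and the `tweets.csv` filename are made up:

```python
import pandas as pd

# placeholder rows standing in for the DataFrame returned by extract_tweet_attributes
sample_df = pd.DataFrame({
    "tweet_id": [1, 2],
    "text": ["first example tweet", "second example tweet"],
    "created_at": pd.to_datetime(["2020-01-01 12:00:00", "2020-01-02 08:30:00"], utc=True),
})

# index=False keeps the row index from being written as an extra unnamed column
sample_df.to_csv("tweets.csv", index=False)

# reload for modeling, parsing the timestamp column back into datetimes
loaded = pd.read_csv("tweets.csv", parse_dates=["created_at"])
print(loaded.dtypes)
```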

# 4. Get tweets from a specific user <a class="anchor" id="head-4"></a>

In [None]:
# fetch the most recent tweets (20 by default) from a specific user's timeline
trump_tweets = api.user_timeline(screen_name='realdonaldtrump')
for tweet in trump_tweets:
    print(tweet.text)
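`tweet.text` can be cut short: in the v1.1 API's default compatibility mode, tweets longer than 140 characters are truncated. Requesting `tweet_mode="extended"`, e.g. `api.user_timeline(screen_name='realdonaldtrump', tweet_mode="extended")`, returns the whole text in a `full_text` attribute instead of `text`. A small hypothetical helper, `tweet_text`, can read either form; the stand-in objects below are made up for illustration:

```python
from types import SimpleNamespace


def tweet_text(status):
    """Hypothetical helper: return `full_text` for tweets fetched with
    tweet_mode="extended", falling back to `text` otherwise."""
    full = getattr(status, "full_text", None)
    return full if full is not None else status.text


# stand-in objects with made-up text; real ones are tweepy Status objects
compat = SimpleNamespace(text="Example tweet returned in compatibility mode")
extended = SimpleNamespace(full_text="Example tweet returned in extended mode")
print(tweet_text(compat))    # → Example tweet returned in compatibility mode
print(tweet_text(extended))  # → Example tweet returned in extended mode
```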