# Twitter API

Typical ways of collecting text and non-text data from the web are:
* APIs
* HTML scraping

The following example shows how to retrieve tweets from Twitter through its API and store them in a pandas DataFrame.

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
%load_ext watermark
%watermark -v -m -p numpy,pandas,tweepy -g

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import watermark
import yaml
import tweepy as tw
from tqdm import tqdm

CPython 3.7.3
IPython 7.6.0

numpy 1.16.4
pandas 0.24.2
tweepy 3.7.0

compiler   : GCC 7.3.0
system     : Linux
release    : 5.0.0-19-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 8
interpreter: 64bit
Git hash   : c077a50bb88d22f95b9db6e256624c4701eb7011


### Constants

Modify these values to update the config file, the query or the output file.

In [2]:
CONFIG_FILE = "twitter_config.yml"
SEARCH_WORDS = "#film"
DATE_SINCE = "2019-06-25"
LIMIT = 1000
OUTPUT_FILE = "tweets.csv"

### How to get the Twitter consumer key and consumer secret

1. Go to https://dev.twitter.com/apps/new and log in if necessary.
2. Apply for a new developer account.
3. Fill in the required fields, accept the Terms of Service, and solve the CAPTCHA.
4. Submit the form.
5. Create a new app.
6. Open the "Keys and tokens" tab.
7. Copy the consumer key (API key) and consumer secret from the screen into our application.

Load the Twitter keys from the YAML file. The file contains the following variables:
* access_token
* access_token_secret
* consumer_api_key
* consumer_api_secret_key

In [3]:
with open(CONFIG_FILE, 'r') as f:
    twitter_keys = yaml.safe_load(f)
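
Before authenticating, it can help to confirm that the loaded file really contains the four variables listed above. The following is a minimal sketch; `REQUIRED_KEYS` and `missing_keys` are hypothetical helper names introduced only for illustration and are not part of the original notebook.

```python
# Hypothetical helper: the key names are the ones the config file is said to contain.
REQUIRED_KEYS = (
    "access_token",
    "access_token_secret",
    "consumer_api_key",
    "consumer_api_secret_key",
)

def missing_keys(config):
    """Return the required key names that are absent or empty in config."""
    config = config or {}  # yaml.safe_load returns None for an empty file
    return [key for key in REQUIRED_KEYS if not config.get(key)]

# Example with an incomplete config
example = {"access_token": "abc", "consumer_api_key": "def"}
print(missing_keys(example))
# prints ['access_token_secret', 'consumer_api_secret_key']
```

Calling `missing_keys(twitter_keys)` right after loading should return an empty list when the file is complete.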

Use the Twitter tokens to authenticate:

In [4]:
auth = tw.OAuthHandler(twitter_keys['consumer_api_key'], twitter_keys['consumer_api_secret_key'])
auth.set_access_token(twitter_keys['access_token'], twitter_keys['access_token_secret'])

api = tw.API(auth, api_root='/1.1', wait_on_rate_limit=True)
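
With `wait_on_rate_limit=True`, the client pauses instead of failing when the rate limit is exhausted. Conceptually, the pause lasts until the current rate-limit window resets; Twitter's v1.1 API reports that moment as epoch seconds in the `x-rate-limit-reset` response header. The sketch below only illustrates that arithmetic. It is not Tweepy's implementation, and `seconds_until_reset` is a hypothetical name.

```python
import time

def seconds_until_reset(reset_epoch, now=None):
    """Seconds to sleep until the rate-limit window resets (never negative)."""
    if now is None:
        now = time.time()
    return max(0.0, float(reset_epoch) - now)

# A window that resets 60 seconds from "now"
print(seconds_until_reset(1000, now=940))   # prints 60.0
# A window that has already reset needs no waiting
print(seconds_until_reset(1000, now=2000))  # prints 0.0
```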

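Get public tweets that match a specific search, limited to a maximum number of tweets. <br>
Note: Twitter caps how far back its endpoints reach. For example, a user timeline returns at most a user's 3,200 most recent tweets. The helper below applies the same cap to every request.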

In [5]:
def max_limit(limit):
    # Cap the requested number of tweets at 3200
    return min(limit, 3200)

In [6]:
public_tweets = api.home_timeline(count=20)

Define some functions to retrieve search results or tweets from the timeline.
Note that, because of `wait_on_rate_limit`, a search may pause until the rate limit resets.

In [7]:
def tweet_query(api, query, lang="en", limit=LIMIT):
    tweets = []
    for tweet in tqdm(tw.Cursor(api.search, q=query, lang=lang).items(max_limit(limit))):
        tweets.append(tweet)

    return tweets
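
The `DATE_SINCE` constant defined earlier is not used by `tweet_query` as written. One way to apply it, assuming the standard search operators `since:` and `-filter:retweets`, is to fold it into the query string. `build_query` is a hypothetical helper added for illustration.

```python
def build_query(words, since=None, exclude_retweets=False):
    """Compose a search string from the search words and optional operators."""
    parts = [words]
    if exclude_retweets:
        parts.append("-filter:retweets")  # drop retweets from the results
    if since:
        parts.append("since:" + since)    # date formatted as YYYY-MM-DD
    return " ".join(parts)

print(build_query("#film", since="2019-06-25", exclude_retweets=True))
# prints: #film -filter:retweets since:2019-06-25
```

The result could then be passed along as `tweet_query(api, build_query(SEARCH_WORDS, since=DATE_SINCE))`. Keep in mind that the standard search endpoint only covers roughly the last week, so an older date has no additional effect.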

Similarly, to retrieve tweets from the home timeline:

In [8]:
def getTimeline(api, limit=LIMIT, resultType="recent"):
    lim = max_limit(limit)
    tweets = []  # defined before the try block so the final return always works
    try:
        tweetsObj = tw.Cursor(api.home_timeline,
                result_type=resultType,
                exclude_replies = False).items(lim)

        # tqdm advances the bar on every iteration, so no manual update() is needed;
        # items(lim) already stops after lim tweets
        pBar = tqdm(tweetsObj, ascii=True, total=lim, desc="Getting Tweets!")
        for tweet in pBar:
            tweets.append(tweet)
    except tw.error.TweepError as et:
        print(et)
    except Exception as e:
        print(e)
    return tweets

Extract some attributes from the tweets:

In [9]:
tweet_columns = ["screen_name", "location", "source", "coordinates", "favorite_count", 
                 "favorited", "lang", "hashtags", "created_at", "text"]
def get_tweet_info(tweet_list):
    return [[tweet.user.screen_name, tweet.user.location, tweet.source, tweet.coordinates,
             tweet.favorite_count, tweet.favorited, tweet.lang, tweet.entities['hashtags'], 
             tweet.created_at, tweet.text] for tweet in tweet_list]
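
The `hashtags` column produced above holds the raw entity dictionaries (text plus character indices). If only the hashtag words are needed, they can be pulled out of the entities first. This is a minimal sketch; `hashtag_texts` is a hypothetical helper, and the sample mirrors the shape of `tweet.entities` seen in the output of this notebook.

```python
def hashtag_texts(entities):
    """Return just the hashtag words from a tweet's entities dictionary."""
    return [tag["text"] for tag in entities.get("hashtags", [])]

sample_entities = {"hashtags": [{"text": "Plato", "indices": [70, 76]}]}
print(hashtag_texts(sample_entities))  # prints ['Plato']
print(hashtag_texts({}))               # prints []
```

Inside `get_tweet_info`, `tweet.entities['hashtags']` could be swapped for `hashtag_texts(tweet.entities)` to store the flattened list instead.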

### Create a Pandas Dataframe From A List of Tweet Data
Once you have a list of items that you wish to work with, you can create a pandas dataframe.

In [10]:
tweet_search_results = tweet_query(api, SEARCH_WORDS)

1000it [00:29, 33.66it/s]


In [11]:
tweet_df = pd.DataFrame(data=get_tweet_info(tweet_search_results), 
                        columns=tweet_columns)

In [12]:
tweet_df.to_csv(OUTPUT_FILE, index=False)
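
When the CSV file is read back later, the `hashtags` column arrives as plain text, because a list of dictionaries is written as its string representation. A small converter can restore it. This is a sketch under that assumption; `parse_hashtags` is a hypothetical name.

```python
import ast

def parse_hashtags(cell):
    """Turn the stringified hashtags column back into a list of dicts."""
    if not cell:
        return []
    # literal_eval only evaluates Python literals, so it is safer than eval()
    return ast.literal_eval(cell)

restored = parse_hashtags("[{'text': 'sfx', 'indices': [89, 93]}]")
print(restored[0]["text"])  # prints sfx

# Possible use when reloading the file with pandas:
# pd.read_csv(OUTPUT_FILE, converters={"hashtags": parse_hashtags},
#             parse_dates=["created_at"])
```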

In [13]:
tweet_df.head(5)

| | screen_name | location | source | coordinates | favorite_count | favorited | lang | hashtags | created_at | text |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ccchapman3103 | MN, AZ, TX, USA | Twitter Web Client | | 0 | False | en | [{'text': 'Plato', 'indices': [70, 76]}, {'tex... | 2019-07-02 07:50:54 | RT @DaviesWriter: At the touch of a lover, eve... |
| 1 | DonRon777 | | Twitter Web App | | 0 | False | en | [] | 2019-07-02 07:48:36 | RT @777Liquid: What do you think about the new... |
| 2 | lavenderlens | sightseeing at the cathedral💀 | Twitter for iPhone | | 0 | False | en | [{'text': 'sfx', 'indices': [89, 93]}, {'text'... | 2019-07-02 07:47:27 | RT @katesfxmakeup: Time to write a new blog bu... |
| 3 | AYoungNegus | Chicago | Twitter for iPhone | | 0 | False | en | [{'text': 'blackdynamite', 'indices': [97, 111... | 2019-07-02 07:45:22 | RT @jaiganticstudio: One of the greatest scene... |
| 4 | EmpireDynamic | 50 MILLION VIEWS MONTHLY | Twibble.io | | 0 | False | en | [{'text': 'boxoffice', 'indices': [93, 103]}] | 2019-07-02 07:45:11 | Cineflix Acquires Global Rights to Israel-Iran... |
