# Twitter API Tutorial

The Twitter API allows you to search for tweets with search queries through Python. It returns a JSON formatted response to the query.

The API is limited in the following ways:
- Max: 7-days in the past
- 180 queries / 15 minutes for user authorized accounts
- 450 queries / 15 minutes for app authorized accounts

It API call has the following relevant parameters:
- q (str): The search query (**mandatory**)
- geocode (str): Location about which to search with the format latitude,longitude,range
    - Allows you to search by location
- result_type (str): 'mixed', 'recent', or 'popular': type of results to return
- count (int): Number of tweets to return - default: 15, max: 100
- until (YYYY-MM-DD): Returns tweets created before the provided date
    - Good for looking for tweets in the past
- since_id (int): Returns tweets with tweet IDs higher than the given id
- max_id (int): Return tweets with tweet IDs lower than the given id
    - Good for search back through pages
    
**Note**: The Twitter API usually returns the first search page result for the query entered but the max_id parameter can be used to traverse back through pages to get more results.
    
This API also has paid versions with less strict limits.

More information about this API can be found at:
https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets


#### Download necessary libraries

In [4]:
!pip install twitter
!pip install numpy
!pip install pandas

Collecting twitter
  Downloading twitter-1.18.0-py2.py3-none-any.whl (54kB)
[K    100% |████████████████████████████████| 61kB 2.7MB/s 
[?25hInstalling collected packages: twitter
Successfully installed twitter-1.18.0


#### Import necessary libraries

In [5]:
import numpy as np
import pandas as pd
from twitter import *
import json

### Insert API credentials from Twitter

**Copy the credentials and replace the contents of *twitter_credentials.json*. Alternatively, you can just copy and paste the keys in the variables for username and password.**

In [6]:
credentials = {}

with open('twitter_credentials.json') as f:
    data = json.load(f)
    try:
        credentials['ACCESS_TOKEN'] = data['ACCESS_TOKEN']
        credentials['ACCESS_SECRET'] = data['ACCESS_SECRET']
        credentials['CONSUMER_KEY'] = data['CONSUMER_KEY']
        credentials['CONSUMER_SECRET'] = data['CONSUMER_SECRET']
    except:
        print("Cannot access twitter_credentials.json to load credentials")

#### Get username and password for twitter
Enter API username and password manually or add to twitter credentials file.

In [7]:
ACCESS_TOKEN = ''
ACCESS_SECRET = ''
CONSUMER_KEY = ''
CONSUMER_SECRET = ''

if ACCESS_TOKEN == '':
    ACCESS_TOKEN = credentials['ACCESS_TOKEN']
    ACCESS_SECRET = credentials['ACCESS_SECRET']
    CONSUMER_KEY = credentials['CONSUMER_KEY']
    CONSUMER_SECRET = credentials['CONSUMER_SECRET']   

## Create Twitter Instance

In [8]:
oauth = OAuth(ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, CONSUMER_SECRET)
twitter = Twitter(auth=oauth)

## Perform a Query

Here a twitter query for 'canada' will be done. The raw results are shown.


In [10]:
response = twitter.search.tweets(q='canada')
print ('Raw Results: ')
print(json.dumps(response, indent=2))

Raw Results: 
{
  "statuses": [
    {
      "id": 962457922667143168,
      "geo": null,
      "in_reply_to_status_id_str": null,
      "lang": "en",
      "entities": {
        "urls": [
          {
            "url": "https://t.co/DBn4WMoAQy",
            "display_url": "dlvr.it/QFrznG",
            "expanded_url": "http://dlvr.it/QFrznG",
            "indices": [
              52,
              75
            ]
          }
        ],
        "symbols": [],
        "user_mentions": [],
        "hashtags": []
      },
      "possibly_sensitive": false,
      "is_quote_status": false,
      "text": "US concerned violence escalating at Israel's border https://t.co/DBn4WMoAQy",
      "in_reply_to_screen_name": null,
      "retweeted": false,
      "created_at": "Sat Feb 10 22:47:05 +0000 2018",
      "place": null,
      "coordinates": null,
      "source": "<a href=\"https://dlvrit.com/\" rel=\"nofollow\">dlvr.it</a>",
      "contributors": null,
      "favorited": false,
      "id_str"

## Perform Multiple Queries Through Pages and Store Results into Pandas DataFrame

### Specify options for search

In [11]:
search_query = 'canada' # What to search for
num_pages = 10 # Number of pages to return results for
num_tws_per_page = 100 # Number of tweets per page
search_type = 'mixed' # Type of results to return
include_retweets = True # Include retweets in search
 

### Create a DataFrame to Store Tweets

In [12]:
# Specify the columns for the dataframe
columns = ['tweet_id', 'created_at', 'screen_name',
         'user_location', 'place', 'latitude', 'longitude',
         'tweet', 'hashtags', 'fav_count', 'retweet_count']

# Create a dataframe
tweetsDF = pd.DataFrame(data=np.random.randn(num_tws_per_page*num_pages, len(columns)),columns=columns)

### Perform the specified number of queries per page, and store the results in the created dataframe

Here multiple queries will be done where each query will use the lowest tweet id from the previous query as its max_id (in all cases except the first one). Some of the parameters in the response are stored in the pandas dataframe created above.

In [13]:
idx = 0 # Index for iterating through dataframe
max_id = 0 # Variable to store the lowest tweet id from the previous search
for page in range(num_pages):
    query = ''
    if page == 0:
        query = twitter.search.tweets(q=search_query, result_type=search_type, 
                                     count=num_tws_per_page)
    else:
        query = twitter.search.tweets(q=search_query, result_type=search_type, 
                                     count=num_tws_per_page, max_id=max_id)
    
    for tweet in query['statuses']:
        max_id = tweet['id'] - 1
        tweetsDF.loc[idx, 'tweet_id'] = tweet['id']
        tweetsDF.loc[idx, 'created_at'] = tweet['created_at']
        tweetsDF.loc[idx, 'screen_name'] = tweet['user']['screen_name']
        tweetsDF.loc[idx, 'tweet'] = tweet['text'].replace('\n',' ').replace('\t',' ')
        tweetsDF.loc[idx, 'user_location'] = tweet['user']['location'].replace('\n',' ').replace('\t',' ')
        tweetsDF.loc[idx, 'fav_count'] = tweet['favorite_count']
        tweetsDF.loc[idx, 'retweet_count'] = tweet['retweet_count']
        
        tweetsDF.loc[idx, 'longitude'] = ''
        tweetsDF.loc[idx, 'latitude'] = ''
        tweetsDF.loc[idx, 'hashtags'] = '[]'
        tweetsDF.loc[idx, 'place'] = ''
        
        if tweet['coordinates']:
            tweetsDF.loc[idx, 'longitude'] = tweet['coordinates']['coordinates'][0]
            tweetsDF.loc[idx, 'latitude'] = tweet['coordinates']['coordinates'][1]
            
        if tweet['entities']['hashtags']:
            hashtag_list = []
            for hashtag in tweet['entities']['hashtags']:
                hashtag_list.append(hashtag['text'])
            tweetsDF.loc[idx, 'hashtags'] = '[' + ','.join(hashtag_list) + ']'
            
        if tweet['place']:
            tweetsDF.loc[idx, 'place'] = tweet['place']['full_name'].replace('\n', ' ')\
                                                  .replace('\t', ' ') \
                                                  + ' ' + tweet['place']['country'].replace('\n', ' ')\
                                                  .replace('\t', ' ')            
            

        idx += 1
        

### Show the Populated DataFrame

In [14]:
tweetsDF

Unnamed: 0,tweet_id,created_at,screen_name,user_location,place,latitude,longitude,tweet,hashtags,fav_count,retweet_count
0,9.619925e+17,Fri Feb 09 15:57:49 +0000 2018,chuckwoolery,Texas,,,,STRANGE? Every time the Left doesn't get it's ...,[],10537.0,3801.0
1,9.623042e+17,Sat Feb 10 12:36:14 +0000 2018,CharlieAngusNDP,Timmins-James-Bay,,,,Canada needs to wake up and understand what ju...,[],1061.0,664.0
2,9.619620e+17,Fri Feb 09 13:56:38 +0000 2018,nationalpost,Canada,,,,Minimum wage hike fallout? Canada just lost th...,[],217.0,283.0
3,9.624580e+17,Sat Feb 10 22:47:12 +0000 2018,yllekhattan,"Ottawa, Ontario",,,,"RT @CFRAOttawa: Freezing rain expected Sunday,...","[ottnews,ottwx]",0.0,1.0
4,9.624579e+17,Sat Feb 10 22:47:10 +0000 2018,DJBLAZEORLANDO,"ÜT: 28.535077,-81.30776",,,,"Boston then Canada , fuck the cold",[],0.0,0.0
5,9.624579e+17,Sat Feb 10 22:47:10 +0000 2018,wanobokujou,古河市,,,,RT @CNN: 2 hurt after officer in Canadian Prim...,[],0.0,74.0
6,9.624579e+17,Sat Feb 10 22:47:09 +0000 2018,EliseRonan,,,,,RT @vncnt_csmr: One example of how lazy Americ...,[],0.0,2.0
7,9.624579e+17,Sat Feb 10 22:47:08 +0000 2018,moon_boy9,,,,,RT @HamsedaCA: #Israel’s military attacked 12 ...,"[Israel,Syrian,Iranian]",0.0,2.0
8,9.624579e+17,Sat Feb 10 22:47:07 +0000 2018,Hamilton__Jobs,Canada,,,,Senior Server Side Core Java Developer: Citi C...,"[hamilton,eluta]",0.0,0.0
9,9.624579e+17,Sat Feb 10 22:47:05 +0000 2018,Egy_U,Middle East,,,,US concerned violence escalating at Israel's b...,[],0.0,0.0
