### Twitter Full Archive Search API v2
This Jupyter notebook allows you to use the full archive search endpoint for Twitter's API v2.
You must be approved for the Academic Research track to use this script.

The last cell of the notebook allows you to specify the parameters for your search. You will need a bearer token to authenticate.

<b>NOTE:</b>
- The endpoint allows only 1 request per second and 300 requests per 15 minutes. The 15-minute cap is the stricter of the two (900 seconds / 300 requests = one request every 3 seconds), so the script pauses for 3.1 seconds between requests to stay within both limits.
- The search endpoint returns Tweet objects. The v2 API has significantly revamped the object model, so extended fields need to be requested explicitly using the new fields and expansions parameters. Although additional information about the user is available for each tweet, the expansions parameter is slightly tricky to deal with. I'm currently working on a solution to add more user detail fields apart from the user id. If you're feeling adventurous, [you can refer to the documentation for fields and expansions](https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet) to modify the code below!
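
As a rough illustration of why expansions need extra handling, here is a minimal sketch. It assumes the request also sends `expansions=author_id` and `user.fields=username,name`, in which case the user objects come back once per page under `includes.users` rather than inside each tweet, and have to be joined back by `author_id`. The `sample_response` dictionary and the `attach_user_fields` helper are hypothetical, written only for this example; they are not part of the notebook's code.

```python
import pandas as pd

# Hypothetical, hand-written response shaped like a v2 search result that was
# requested with expansions=author_id and user.fields=username,name.
sample_response = {
    "data": [
        {"id": "1", "author_id": "10", "text": "first tweet"},
        {"id": "2", "author_id": "11", "text": "second tweet"},
        {"id": "3", "author_id": "10", "text": "third tweet"},
    ],
    "includes": {
        "users": [
            {"id": "10", "username": "alice", "name": "Alice"},
            {"id": "11", "username": "bob", "name": "Bob"},
        ]
    },
    "meta": {"result_count": 3},
}


def attach_user_fields(json_response):
    """Join user objects from includes.users onto the tweets by author_id."""
    tweets = pd.json_normalize(json_response["data"])
    users = pd.json_normalize(json_response.get("includes", {}).get("users", []))
    if users.empty:
        # No expansion data on this page: return the tweets unchanged
        return tweets
    # Prefix user columns so they do not collide with the tweet's own id column
    users = users.add_prefix("user.")
    return tweets.merge(users, how="left", left_on="author_id", right_on="user.id")


merged = attach_user_fields(sample_response)
print(merged[["id", "author_id", "user.username"]])
```

A left join keeps every tweet even if its author is missing from `includes.users`; the user columns are simply empty for that row.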

#### Execute the next three cells

In [None]:
import requests
import pandas as pd
import json
import time

In [None]:
def search_endpoint_connect(bearer_token, query, st, et, next_token):
    """Request one page of results from the full-archive search endpoint."""

    url = "https://api.twitter.com/2/tweets/search/all"
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    query_params = {
        'query': query,
        'start_time': st,
        'end_time': et,
        'max_results': 500,
        'tweet.fields': 'id,text,author_id,created_at,geo,lang,public_metrics,in_reply_to_user_id',
    }

    # Send the pagination token as a regular query parameter when there is one
    if next_token is not None:
        query_params['next_token'] = next_token

    response = requests.get(url, params=query_params, headers=headers)

    # Surface any non-200 response together with the API's error message
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)

    return response.json()

In [None]:
def main(bearer_token, n, fn, sq, st, et):

    count = 0
    next_token = None        # no pagination token for the first request
    header_written = False   # the CSV header goes out with the first non-empty page

    while True:
        # Stop once the requested number of tweets is reached (n == 0 means no limit)
        if n != 0 and count >= n:
            break

        json_response = search_endpoint_connect(bearer_token, sq, st, et, next_token)
        meta = json_response['meta']
        result_count = meta.get('result_count') or 0

        if result_count > 0:
            df = pd.json_normalize(json_response['data'])
            df = df.reindex(columns=['id', 'author_id', 'created_at', 'text', 'lang', 'public_metrics.retweet_count',
                                     'public_metrics.reply_count', 'public_metrics.like_count', 'public_metrics.quote_count',
                                     'in_reply_to_user_id', 'geo.place_id', 'geo.coordinates.type', 'geo.coordinates.coordinates'])
            # Create the file on the first write, then append without a header
            df.to_csv('%s.csv' % fn, mode='a' if header_written else 'w',
                      encoding='utf-8', index=False, header=not header_written)
            header_written = True

        time.sleep(3.1)  # stay within 300 requests per 15 minutes
        count += result_count
        print('Tweets downloaded: ' + str(count))

        # A missing next_token means this was the last page of results
        next_token = meta.get('next_token')
        if next_token is None:
            break

### Setting your bearer token and search parameters

- You should have access to Twitter's Academic Research Product Track to obtain the bearer token.
- You can limit the number of tweets returned for the query by setting a desired value. Note that the script checks this limit only between requests, and Twitter returns up to 500 results <b>per request</b>, so even a small limit can download up to about 500 tweets. Setting it to 0 will return the full set of results. <b>Be careful about your search parameters, as this can easily exhaust your monthly quota of 10 million tweets.</b>
- The output will be written to a CSV file. You can change the file name below. <b>Do not include .csv in the file name here.</b>
- You can craft your search query using a variety of different parameters. You can find a list of [supported search query parameters here](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query).
- For the start and end time, you should follow the YYYY-MM-DDTHH:MM:SSZ format. Note that the date and time should be separated by a <b>capital T</b>. The <b>capital Z</b> after the time marks the timestamp as UTC (zero offset). <b>Example:</b> 2019-11-27T00:00:00Z specifies midnight UTC on 27 November 2019.
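
If you would rather build the start and end times from Python `datetime` objects than type them by hand, something like the following sketch should work. The `to_twitter_timestamp` helper is a hypothetical name introduced here, and treating timezone-naive datetimes as UTC is an assumption of this example, not something the API does for you.

```python
from datetime import datetime, timezone


def to_twitter_timestamp(dt):
    """Format a datetime as YYYY-MM-DDTHH:MM:SSZ, converting to UTC first."""
    if dt.tzinfo is None:
        # Assumption: a naive datetime is already expressed in UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


example_start = to_twitter_timestamp(datetime(2019, 11, 27))
example_end = to_twitter_timestamp(datetime(2019, 11, 28, 12, 30, tzinfo=timezone.utc))
print(example_start)  # 2019-11-27T00:00:00Z
print(example_end)    # 2019-11-28T12:30:00Z
```

The resulting strings can be assigned to `start_time` and `end_time` in the last cell.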

In [None]:
#Enter your bearer token
bearer_token = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'

#Set number of tweets to be downloaded. Enter 0 for no limits
no_of_tweets = 20

#Specify the name of the output csv file. Do not include .csv
file_name = 'downloaded_tweets'

#Enter your search query. Refer to https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query
search_query = '#your #query #here'

#Set the beginning date and time in YYYY-MM-DDTHH:MM:SSZ format
start_time = "YYYY-MM-DDTHH:MM:SSZ"

#Set the ending date and time in YYYY-MM-DDTHH:MM:SSZ format
end_time = "YYYY-MM-DDTHH:MM:SSZ"

main(bearer_token, no_of_tweets, file_name, search_query, start_time, end_time)