# **Recent search**

The recent search endpoint allows you to programmatically access filtered public Tweets posted over the last week, and is available to all developers who have a developer account and are using keys and tokens from an App within a Project.

You can authenticate your requests with OAuth 1.0a User Context, OAuth 2.0 App-Only, or OAuth 2.0 Authorization Code with PKCE. However, if you would like to receive private metrics, or a breakdown of organic and promoted metrics within your Tweet results, you will have to use OAuth 1.0a User Context or OAuth 2.0 Authorization Code with PKCE, and pass user Access Tokens that are associated with the user that published the given content. 

This endpoint can deliver up to 100 Tweets per request in reverse-chronological order, and pagination tokens are provided for paging through large sets of matching Tweets. 

When using a Project with Essential or Elevated access, you can use the basic set of operators and can make queries up to 512 characters long. When using a Project with Academic Research access or Enterprise access, you have access to additional operators.

Rate limit (app-level, with application-only authentication): 450 requests per 15-minute window, shared among all users of your app.
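To get a feel for what this limit means in practice, here is a minimal sketch that turns a "requests per window" limit into a constant delay between requests. The function name `min_seconds_between_requests` is made up for this illustration and is not part of any library.

```python
def min_seconds_between_requests(requests_per_window, window_seconds=15 * 60):
    """Smallest constant delay that keeps a steady stream of requests within the limit."""
    return window_seconds / requests_per_window


# 450 requests per 15-minute window, as stated above
delay = min_seconds_between_requests(450)
print(delay)  # 2.0
```

Any pause of at least this length between consecutive requests should keep a single download loop under the limit, as long as nothing else is using the same app at the same time.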

## Set up

1. Tokens and keys should not be kept in Colab notebooks. I advise you to delete them from the code after every use. Here we create an environment variable and save the token in it. It will be deleted when the runtime restarts.

2. To save your data safely, it's best to connect the notebook to your Google Drive (n.b.: this doesn't work with a university account; you need to use Gmail). This way you don't risk losing your data if the notebook malfunctions, and you can access it from a notebook again later.

3. Libraries that we need (for now) are *requests*, to handle our API requests, and *pandas* to easily view and store the results.

4. The Colab environment offers you loads of pre-installed Python libraries (you can check which ones by executing `!pip freeze`); *however*, if your library is not on the list, you need to install it manually.
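As a sketch of point 1, the helper below reads the token back from the environment and fails with a clear message if it is missing. The name `get_bearer_token` is an assumption made for this example; only `os.environ` comes from the standard library.

```python
import os


def get_bearer_token(var_name="TOKEN"):
    """Return the bearer token stored in an environment variable, or raise a clear error."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return token
```

In an interactive session you could fill the variable with `os.environ["TOKEN"] = getpass.getpass()` (from the standard `getpass` module), so that the token is typed in at run time and never appears in the saved notebook.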


In [10]:
import os
os.environ['TOKEN'] = "AAAAAAAAAAAAAAAAAAAAACiskgEAAAAACR0Ob5x7EAcEyWB1yK08z38ONeE%3DFDLKfajgEnOwF4Ult0L1HcalsR01XnaKUztJgdClTQJaWTRqbE"

## Create path

The code below stores, as plain strings, the destination directory for the downloaded data and the directory for error logs. Both currently point to the same folder; change them to match your own setup:

In [11]:
path = 'D:/other/job/students_project/network_science/maryam_project/data/gymnastics/'
error_log_path = 'D:/other/job/students_project/network_science/maryam_project/data/gymnastics/'
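The cell above keeps the directories as plain strings and later builds file names by string concatenation. A possible alternative, sketched here under the assumption that you want one pickle file per batch, is `pathlib.Path` from the standard library. The helper name `batch_file` and the example directory are hypothetical.

```python
from pathlib import Path


def batch_file(data_dir, day, batch):
    """Build the pickle file path for one batch, e.g. <data_dir>/tweets_2021-08-14_1.pkl."""
    return Path(data_dir) / f"tweets_{day}_{batch}.pkl"


example = batch_file("data/gymnastics", "2021-08-14", 1)
print(example.name)  # tweets_2021-08-14_1.pkl
```

With `Path` objects you do not have to remember the trailing slash at the end of the directory string.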


### Import libraries

In [12]:
import requests 
import pandas as pd 
import time

If you need a library that is not pre-installed, install it with `!pip install` followed by the name of the library (for example, `!pip install emoji` for the emoji library).

## Step 1: authenticate

To authenticate our request, we need to create a request header and add an authorization field. You can authorize a request with the bearer token or with the API key and secret. Here we use the bearer token for the sake of simplicity.

You can read more about it here: https://developer.twitter.com/en/docs/authentication/overview


### Set up headers

In [13]:
def create_headers(bearer_token):
        headers = {"Authorization": "Bearer {}".format(bearer_token)}
        return headers

In [14]:
headers = create_headers(os.environ['TOKEN'])

## Step 2: build a search query

**Ingredients**: endpoint, parameters and operators

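For the endpoint we use: https://api.twitter.com/2/tweets/search/recent

Note that the example call in Step 3 passes the full-archive endpoint, https://api.twitter.com/2/tweets/search/all, because its date range is older than the last week. That endpoint is not available with Essential or Elevated access.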

**Example parameters**: 

* query: the text of your search (required) - max 512 chars
* end_time: the newest date for your tweets
* start_time: the oldest date for your tweets (both dates use the format YYYY-MM-DDTHH:mm:ssZ, ISO 8601/RFC 3339)
* max_results: between 10 (default) and 100
* tweet_fields: which fields to get (if empty, you only get id, text and edit history)
* user_fields, place_fields, expansions
* next_token: to get the next page of results 
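The date format for start_time and end_time is easy to get wrong by hand. The following sketch formats a Python `datetime` in the YYYY-MM-DDTHH:mm:ssZ layout; the helper name `to_api_time` is made up for this example, and it assumes that datetimes without time zone information are already in UTC.

```python
from datetime import datetime, timezone


def to_api_time(dt):
    """Format a datetime as YYYY-MM-DDTHH:mm:ssZ in UTC."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are treated as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_api_time(datetime(2021, 7, 15)))  # 2021-07-15T00:00:00Z
```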


**Example operators**: keyword (menstruation), exact phrase ("sexual education"), hashtag (#metoo), emoji (😬), logical operators (AND = a blank space, OR, NOT = a leading minus sign, e.g. -is:retweet), from: or to: (tweets from a user or directed to a user), @ (tweets that mention the user, e.g. @NASA), is:retweet, is:reply, is:quote, lang: (e.g. lang:en)

Grouping is done with parentheses, e.g. (#prolife abortion) OR (#prochoice abortion)
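To see how these pieces combine, here is a small sketch that assembles a query string from a list of alternative terms and two optional filters, and checks it against the 512-character limit mentioned earlier. The function `build_query` and its parameters are hypothetical helpers for this tutorial, not part of the Twitter API.

```python
MAX_QUERY_LENGTH = 512  # limit stated above for Essential and Elevated access


def build_query(terms, lang=None, exclude_retweets=False):
    """Join alternative terms with OR, then AND the group with optional filters."""
    if not terms:
        raise ValueError("at least one term is required")
    parts = ["(" + " OR ".join(terms) + ")"]
    if lang:
        parts.append(f"lang:{lang}")
    if exclude_retweets:
        parts.append("-is:retweet")  # a leading minus negates an operator
    query = " ".join(parts)  # a blank space means AND
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query is {len(query)} characters, limit is {MAX_QUERY_LENGTH}")
    return query


print(build_query(["#gymnastics", '"artistic gymnastics"'], lang="en", exclude_retweets=True))
# (#gymnastics OR "artistic gymnastics") lang:en -is:retweet
```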

See more here: 

Operators: https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query

Parameters: https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent




In [15]:
def create_url(query, start_time, end_time, max_results, expansions, tweet_fields, user_fields, place_fields, endpoint):
    
    search_url = endpoint #Change to the endpoint you want to collect data from

    #change params based on the endpoint you are using
    #also can request different fields, e.g ids of users ... 
    query_params = {'query': query,
                    'end_time': end_time,
                    'start_time': start_time,
                    'max_results': max_results,
                    'expansions': expansions,
                    'tweet.fields': tweet_fields,
                    'user.fields': user_fields,
                    'place.fields': place_fields}

    return (search_url, query_params)

In [16]:
def connect_to_endpoint(url, headers, params, next_token = None):
    if next_token is not None and next_token != '':
        params['next_token'] = next_token
    response = requests.request("GET", url, headers = headers, params = params)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()
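`connect_to_endpoint` raises an exception for every status code other than 200, which includes 429 (rate limit exceeded). As far as I know, the API reports when the current window ends in an `x-rate-limit-reset` response header (a Unix timestamp in seconds). Under that assumption, the sketch below computes how long to wait before retrying; `seconds_until_reset` is a hypothetical helper name.

```python
import time


def seconds_until_reset(response_headers, now=None):
    """Seconds to wait until the rate-limit window resets, based on x-rate-limit-reset."""
    reset_at = response_headers.get("x-rate-limit-reset")
    if reset_at is None:
        return 0.0  # header missing: nothing to wait for
    now = time.time() if now is None else now
    return max(0.0, float(reset_at) - now)


print(seconds_until_reset({"x-rate-limit-reset": "1000"}, now=940))  # 60.0
```

In the notebook you could pass `response.headers` from *requests* and call `time.sleep()` with the result before sending the same request again.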

In [17]:
def get_data(query, start_time, end_time, max_results, expansions, tweet_fields, user_fields, place_fields, endpoint, next_token=""):
  
  results = []
  count = 0
  batch = 1
  lastTweetTime = "NULL"

  while next_token is not None:
  # while count < 1:
    try:
      url = create_url(query, start_time, end_time, max_results, expansions, tweet_fields, user_fields, place_fields, endpoint)
      json_response = connect_to_endpoint(url[0], headers, url[1], next_token)
      #if we have results, they will be in the field 'data' of our response
      if "data" in json_response:
        results.extend(json_response["data"])
        print(str(count) + ": " + str(len(json_response["data"])) + " Tweets downloaded in this batch. with last tweet: " + lastTweetTime)
        lastTweetTime = json_response['data'][0]['created_at']
        count += 1
      #the next_token is added to the field 'meta' of our response
      if "meta" in json_response:
        if "next_token" in json_response["meta"].keys():
          next_token = json_response["meta"]["next_token"]          
        else:
          next_token = None
      else:
        next_token = None

      if count >= 30:
         tweets_df = pd.DataFrame(results)
         tweets_df.to_pickle(path + "tweets_" + lastTweetTime[:10] + "_" + str(batch) + ".pkl")
         print("Tweets batch %d stored in pkl file" % batch)
         batch += 1
         count = 0
      
      #to control the rate limit we need to slow down our download
      time.sleep(3)

    except Exception as e:
      print("Error occurred", e)
      print("Next token value", next_token)
      error_log = {"Error":e, "Next token":next_token, "Day":start_time, 
                   "Downloaded":len(results)}
      # pd.DataFrame.from_dict(error_log, orient="index").to_csv(error_log_path+start_time+"_"+next_token+".csv")
      return results

  print("Done")
  
  return results
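The core of `get_data` is the pagination loop: request a page, read `meta.next_token`, and repeat until no token is returned. The sketch below isolates just that idea so you can try it without calling the API. `paginate` and `fake_pages` are made up for this illustration; the dictionary stands in for real responses.

```python
def paginate(fetch_page):
    """Yield every page returned by fetch_page(next_token) until no next_token is left."""
    next_token = None
    while True:
        page = fetch_page(next_token)
        yield page
        next_token = page.get("meta", {}).get("next_token")
        if next_token is None:
            break


# Hypothetical stand-in for the API: two pages linked by a token
fake_pages = {
    None: {"data": [{"id": "1"}, {"id": "2"}], "meta": {"next_token": "abc"}},
    "abc": {"data": [{"id": "3"}], "meta": {}},
}

demo_tweets = [t for page in paginate(fake_pages.get) for t in page.get("data", [])]
print(len(demo_tweets))  # 3
```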

## Step 3: download and save the data

We call the function, filling in the desired parameters. We convert the data into a pandas dataframe to easily manipulate it (view, edit, save). We save the data in the PICKLE format, so we can recover the exact data types later.

In [18]:
tweets = get_data("(#gymnastics) lang:en -is:retweet",
         start_time = "2021-07-15T00:00:00Z",
         end_time = "2021-08-15T00:00:00Z",
         max_results=100,
         expansions='entities.mentions.username,referenced_tweets.id.author_id',
         tweet_fields='id,text,author_id,created_at,entities',
         user_fields='id,name,username,description',
         place_fields='full_name,id,country,country_code,geo,name,place_type',
         endpoint="https://api.twitter.com/2/tweets/search/all")

0: 97 Tweets downloaded in this batch. with last tweet: NULL
1: 95 Tweets downloaded in this batch. with last tweet: 2021-08-14T22:02:43.000Z
2: 95 Tweets downloaded in this batch. with last tweet: 2021-08-13T09:32:21.000Z
3: 96 Tweets downloaded in this batch. with last tweet: 2021-08-11T15:42:43.000Z
4: 99 Tweets downloaded in this batch. with last tweet: 2021-08-10T08:06:09.000Z
5: 96 Tweets downloaded in this batch. with last tweet: 2021-08-09T06:00:04.000Z
6: 95 Tweets downloaded in this batch. with last tweet: 2021-08-08T12:32:00.000Z
7: 100 Tweets downloaded in this batch. with last tweet: 2021-08-08T01:35:11.000Z
8: 95 Tweets downloaded in this batch. with last tweet: 2021-08-07T15:03:55.000Z
9: 100 Tweets downloaded in this batch. with last tweet: 2021-08-07T11:49:22.000Z
10: 98 Tweets downloaded in this batch. with last tweet: 2021-08-07T00:09:46.000Z
11: 94 Tweets downloaded in this batch. with last tweet: 2021-08-06T19:29:32.000Z
12: 97 Tweets downloaded in this batch. with
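Note that `get_data` only keeps the `data` field of each response. To my understanding, objects requested through `expansions` (such as users) arrive in a separate `includes` field, so they are not part of the results above. The sketch below shows how a user lookup could be built from one response page; `usernames_by_id` and the sample page are hypothetical and hand-written for illustration.

```python
def usernames_by_id(response):
    """Map user id -> username from the 'includes.users' part of one response page."""
    users = response.get("includes", {}).get("users", [])
    return {user["id"]: user["username"] for user in users}


# Hypothetical, hand-written response page
sample = {
    "data": [{"id": "10", "author_id": "7", "text": "hello"}],
    "includes": {"users": [{"id": "7", "name": "Example", "username": "example_user"}]},
}

lookup = usernames_by_id(sample)
for tweet in sample["data"]:
    print(tweet["id"], lookup.get(tweet["author_id"]))
```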

In [None]:
tweets_df = pd.DataFrame(tweets)

In [None]:
tweets_df.to_pickle(path+"tweets.pkl")
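To illustrate why pickle preserves data types, here is a small comparison using the standard `pickle` and `json` modules rather than pandas (pandas' `to_pickle` relies on the same pickle serialization). The record is invented for the example.

```python
import json
import pickle
from datetime import datetime

# Invented example record with a non-string value
record = {"id": 1, "created_at": datetime(2021, 8, 14, 22, 2, 43)}

# Pickle round trip: the datetime comes back as a datetime
restored = pickle.loads(pickle.dumps(record))
print(type(restored["created_at"]).__name__)  # datetime

# JSON round trip: the datetime has to be converted to text first
as_json = json.loads(json.dumps(record, default=str))
print(type(as_json["created_at"]).__name__)  # str
```

Only load pickle files that you created yourself, since unpickling can execute arbitrary code.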

## Step 4: individual work

- think of a topic you'd like to read some tweets from
- build a query (play around with the logic - can you get only tweets that are not a retweet?)
- how many tweets did you get?
- what if you changed the date range?