*Hello and welcome!* 👋

This notebook is the <u>first</u> part of a **tutorial** on how to **get started with Twitter API v2 using Python** 🤓! Read our Medium blog post [here](https://medium.com/data-analytics-at-nesta).

In this notebook, we start by making a very simple request to the **recent search** endpoint, using heat pump mentions on Twitter as our use case.

**More on the use case**

The [sustainable future mission](https://www.nesta.org.uk/sustainable-future/) at [Nesta](https://www.nesta.org.uk/) is focused on projects to help decarbonise UK homes, with special interest in greener heating systems such as heat pumps. For those who are not familiar with the concept, a heat pump is a low-carbon heating system that captures heat from outside and moves it into your home.

Let us have a go at collecting tweets mentioning heat pumps!

### Importing packages and loading credentials
We start by importing the necessary packages to run the code.

In [None]:
import requests
import json
import time
import random
import os

We load our *bearer_token* from an environment variable that we defined beforehand. This way you do not have to expose your credentials in your code.

In [None]:
bearer_token = os.environ.get("BEARER_TOKEN")
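
Note that `os.environ.get` returns `None` when the variable is missing, so a forgotten token only shows up later as a failed request. A minimal sketch of a stricter loader is shown here; `load_bearer_token` is a hypothetical helper and not part of the original code.

```python
import os


def load_bearer_token(variable_name: str = "BEARER_TOKEN") -> str:
    """Return the token stored in the environment, or fail with a clear message."""
    token = os.environ.get(variable_name)
    if not token:
        # Covers both a missing variable (None) and an empty string.
        raise RuntimeError(
            "Environment variable {} is not set or is empty.".format(variable_name)
        )
    return token
```

With this helper, the cell above could instead read `bearer_token = load_bearer_token()`.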

### Preparing our API request
We will use the recent search endpoint to collect our first set of tweets. To do that we need to define the endpoint URL, the rule specifying which tweets we want to collect, and other query parameters (such as the fields to include and the maximum number of results).

In [None]:
endpoint_url = "https://api.twitter.com/2/tweets/search/recent"

We create a dictionary with query parameters, where we pass the following fields:
- **query**: the rule used to select the data. In this case we collect tweets that match either of the expressions "heat pump" or "heat pumps", are written in English and are not retweets;
- **tweet.fields**: the fields of the tweet object that we want to collect. In this example: the tweet's unique identifier, the tweet text, the identifier of the user who posted the tweet and the date/time the tweet was created;
- **max_results**: the maximum number of tweets to be retrieved per request to the API (defaults to 10, with a maximum of 100).

In [None]:
query_parameters = {
    "query": '("heat pump" OR "heat pumps") lang:en -is:retweet',
    "tweet.fields": "id,text,author_id,created_at",
    "max_results": 10,
}
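
To see what these parameters look like once they are attached to the endpoint URL, here is a small sketch that builds the URL by hand with the standard library. It is for illustration only: when we pass the dictionary to `requests` later on, the encoding is handled for us. The names `full_url` and `decoded` are introduced here and are not part of the original code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

endpoint_url = "https://api.twitter.com/2/tweets/search/recent"
query_parameters = {
    "query": '("heat pump" OR "heat pumps") lang:en -is:retweet',
    "tweet.fields": "id,text,author_id,created_at",
    "max_results": 10,
}

# Percent-encode the parameters and append them to the endpoint URL.
full_url = endpoint_url + "?" + urlencode(query_parameters)
print(full_url)

# Decoding the query string gives back the original values (as strings).
decoded = parse_qs(urlsplit(full_url).query)
print(decoded["query"][0])
```

Quotes, brackets and spaces in the rule are escaped in the URL, which is why it is easier to keep the rule readable in a dictionary and let the library do the encoding.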

### Authentication
Requests are authenticated with a bearer token, which is sent in the *Authorization* header.

In [None]:
def request_headers(bearer_token: str) -> dict:
    """
    Sets up the request headers. 
    Returns a dictionary summarising the bearer token authentication details.
    """
    return {"Authorization": "Bearer {}".format(bearer_token)}

In [None]:
headers = request_headers(bearer_token)

### Connecting to endpoint and requesting data
We connect to the endpoint and retrieve our first page of data.

In [None]:
def connect_to_endpoint(endpoint_url: str, headers: dict, parameters: dict) -> dict:
    """
    Connects to the endpoint and requests data.
    Returns the parsed JSON response (a dictionary) if a 200 status code is yielded.
    Raises an exception on a 4xx status code (a problem with the request).
    On any other non-200 status code (a temporary problem accessing the
    endpoint), sleeps for a random interval and tries again.
    """
    response = requests.request(
        "GET", url=endpoint_url, headers=headers, params=parameters
    )
    response_status_code = response.status_code
    if response_status_code != 200:
        if response_status_code >= 400 and response_status_code < 500:
            raise Exception(
                "Cannot get data, the program will stop!\nHTTP {}: {}".format(
                    response_status_code, response.text
                )
            )
        
        sleep_seconds = random.randint(5, 60)
        print(
            "Cannot get data, your program will sleep for {} seconds...\nHTTP {}: {}".format(
                sleep_seconds, response_status_code, response.text
            )
        )
        time.sleep(sleep_seconds)
        return connect_to_endpoint(endpoint_url, headers, parameters)
    return response.json()
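
The function above keeps retrying for as long as the temporary problem persists. If you would rather give up after a few attempts, the same branching can be written as a bounded loop. This is a sketch under stated assumptions, not the tutorial's method: `fetch_with_retries`, `send`, `max_retries` and `sleep` are hypothetical names, and `send` stands in for the `requests` call so that the example runs without network access.

```python
import random
import time
from typing import Callable, Tuple


def fetch_with_retries(
    send: Callable[[], Tuple[int, dict]],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it returns HTTP 200, giving up after max_retries retries."""
    for attempt in range(max_retries + 1):
        status_code, payload = send()
        if status_code == 200:
            return payload
        if 400 <= status_code < 500:
            # A 4xx code points at the request itself, so retrying will not help.
            raise RuntimeError("HTTP {}: request rejected".format(status_code))
        if attempt < max_retries:
            # Temporary problem: wait a random 5 to 60 seconds, then try again.
            sleep(random.randint(5, 60))
    raise RuntimeError("Gave up after {} retries".format(max_retries))


# Usage with a fake sender: one temporary failure followed by a success.
fake_replies = iter([(503, {}), (200, {"data": []})])
result = fetch_with_retries(lambda: next(fake_replies), sleep=lambda seconds: None)
print(result)
```

In real use, `send` could wrap the `requests` call and return `(response.status_code, response.json())`.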

In [None]:
json_response = connect_to_endpoint(endpoint_url, headers, query_parameters)

### Taking a look at the collected data

The **json_response** variable contains our Twitter data. It is a dictionary with two keys, *data* and *meta* (standing for metadata).

In [None]:
type(json_response)

In [None]:
json_response.keys()

If we take a look at *meta*, we can see that it contains information about the newest and oldest tweet identifiers collected, the number of tweets collected and a *next_token* identifier. We will learn more about the *next_token* identifier in the next notebook of this tutorial.

In [None]:
json_response["meta"]

And now, let's finally take a look at the tweets!

`json_response["data"]` is a list of dictionaries whose length equals the *result_count* value in *meta* (i.e. the number of tweets we collected, in this case 10, as set by the *max_results* query parameter). Each dictionary in the list represents one tweet and contains the fields we requested in **tweet.fields**.

With this first request we get the most recent tweets matching our rule from the past 7 days, but not all of them!

In [None]:
len(json_response["data"])

In [None]:
json_response["data"][0]
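
To skim through all the collected tweets rather than just the first one, we can loop over the list. The sketch below uses a made-up response with the same shape as `json_response` so that it runs on its own; `summarise_tweets` and `sample_response` are hypothetical names, and the sample tweet is not real data. It reads *data* with `.get`, on the assumption that a search with no matches may come back without a *data* key.

```python
def summarise_tweets(response: dict, max_chars: int = 60) -> list:
    """Return one (created_at, shortened text) pair per tweet in the response."""
    tweets = response.get("data", [])
    return [
        (tweet.get("created_at"), tweet.get("text", "")[:max_chars])
        for tweet in tweets
    ]


# Made-up response with the same shape as json_response (not real tweets).
sample_response = {
    "data": [
        {
            "id": "1",
            "author_id": "10",
            "created_at": "2022-01-01T00:00:00.000Z",
            "text": "Example tweet about heat pumps",
        },
    ],
    "meta": {"result_count": 1},
}

for created_at, text in summarise_tweets(sample_response):
    print(created_at, text)
```

In the notebook, calling `summarise_tweets(json_response)` would list the creation time and the start of the text for each of our tweets.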

**Here we are! We have collected our first 10 tweets** 💪🤓

In the next notebook, **"Tweets from the past 7 days"**, we will build on this code and use the recent search endpoint to its full potential to retrieve the remaining tweets from the last 7 days.

This code was inspired by the official Twitter code in this [GitHub repo](https://github.com/twitterdev/Twitter-API-v2-sample-code).