## Using APIs in Python

### How to use APIs

APIs differ considerably, but they usually consist of the following components:

**Request:** Like any other interaction with the web, using an API involves sending a request (typically GET or POST) to the API.

**Endpoint:** An API usually consists of several endpoints, which can be thought of as different outlets. Endpoints are simply the URLs we send requests to.

**Parameters:** Parameters are the arguments the endpoint accepts. Some may be required; others are optional.

**Authentication:** Most APIs require some kind of authentication. This can be either HTTP basic authentication (username and password) or authentication via tokens. Tokens are essentially unique keys that identify who is making the request.
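
To make the four components concrete, here is a minimal sketch that builds (but does not send) a request using only the standard library. The endpoint URL, the parameter names, and the token are all hypothetical and only serve as illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, parameters, and token (illustration only)
endpoint = "https://api.example.com/v1/search"   # Endpoint: the URL we send the request to
params = {"q": "denmark", "size": 10}            # Parameters: arguments the endpoint accepts
token = "my-secret-token"                        # Authentication: a token identifying the caller

# Request: a GET request with the parameters encoded into the URL
url = f"{endpoint}?{urlencode(params)}"
req = Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")

print(req.get_method())
print(req.full_url)
print(req.get_header("Authorization"))
```

With the `requests` package used in the rest of this notebook, the same pieces are passed as the `params=` and `headers=` arguments of `requests.get`.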

## Example: Using Statistics Denmark's API for StatBank

Link to API documentation: https://www.dst.dk/en/Statistik/brug-statistikken/muligheder-i-statistikbanken/api

Statistics Denmark's API for StatBank makes it possible to access the data in StatBank programmatically.

The following demonstrates how to interact with the API directly from Python.

*Note*: The StatBank API does not require authentication.

### Extracting data from the StatBank


In [None]:
import requests

statbank_api = "https://api.statbank.dk/v1/data"  # Endpoint of the data API

# Request body as a dictionary (sent to the API as JSON)
data_req = {'table': 'folk1c',
            'format': 'CSV',
            'variables': [{'code': 'OMRÅDE', 'values': ['101', '851']},
                          {'code': 'ALDER', 'values': ['20-24', '25-29']}]
           }

# Send the request. Note that data_req is reused: from here on it holds the response
data_req = requests.post(statbank_api, json=data_req)

print(data_req.text)  # Print the raw text output

Because the request sets `'format': 'CSV'`, the data API returns delimited text. The values are separated by semicolons rather than commas.

This output can be read directly by the `pandas` package (`pd.read_csv`) by passing `sep=";"`.

In [None]:
from io import StringIO
import pandas as pd

dstdata = StringIO(data_req.text)  # Wrap the response text in a file-like object
dstdf = pd.read_csv(dstdata, sep=";")  # Parse the semicolon-separated text into a DataFrame
dstdf  # Display the data

In [None]:
dstdf.groupby(['OMRÅDE']).sum(numeric_only=True)  # Group by municipality and sum the numeric columns only
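
To show what the parsing and grouping steps do without calling the API, the sketch below works on a small hand-written sample in the same semicolon-separated layout. The numbers are made up, and the column names `TID` and `INDHOLD` are assumptions about the API output rather than something confirmed here. It uses only the standard library.

```python
import csv
from collections import defaultdict
from io import StringIO

# Made-up sample in a semicolon-separated layout (values are NOT real statistics)
sample = """OMRÅDE;ALDER;TID;INDHOLD
Copenhagen;20-24 years;2022Q3;100
Copenhagen;25-29 years;2022Q3;150
Aalborg;20-24 years;2022Q3;40
Aalborg;25-29 years;2022Q3;30
"""

# Sum the value column per area, mirroring groupby(...).sum()
totals = defaultdict(int)
for row in csv.DictReader(StringIO(sample), delimiter=";"):
    totals[row["OMRÅDE"]] += int(row["INDHOLD"])

print(dict(totals))
```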

## Example: Using the Pushshift API for Reddit data

Link to API documentation: https://github.com/pushshift/api

The Pushshift API provides endpoints for extracting submissions and comments from continuously updated datasets covering all of reddit.com.

There are two main endpoints used to search all publicly available comments and submissions on Reddit:
- https://api.pushshift.io/reddit/search/comment/
- https://api.pushshift.io/reddit/search/submission/

Searching the comments of a specific submission requires retrieving the *comment ids* for that submission. These can be retrieved via the endpoint:
- https://api.pushshift.io/reddit/submission/comment_ids/

### Searching for submissions in a subreddit

In the code below, we extract data on submissions from the `denmark` subreddit. We pass the API the following parameters:
- The subreddit to search in (via the `subreddit` parameter)
- A search query (via the `q` parameter)
- A timeframe (via the `before` and `after` parameters)

In [None]:
import requests
from datetime import datetime

endpoint = 'https://api.pushshift.io/reddit/search/submission/'
    
subreddit = 'denmark'
start_time = int(datetime(2022,9,21,0,0).timestamp())
end_time = int(datetime(2022,9,28,0,0).timestamp())
q = 'mette'

params = {'subreddit': subreddit,
          'after': start_time,
          'before': end_time,
          'size': 499,
          'q': q
         }

r = requests.get(endpoint, params = params)
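
The `after` and `before` values above are Unix timestamps (seconds since 1970-01-01 UTC). A `datetime` without a time zone is interpreted in the local time zone of the machine running the code, so the same cell can produce slightly different timeframes on different computers. A minimal sketch of a reproducible variant, assuming the timeframe is meant in UTC:

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Attach an explicit time zone so the result does not depend on the local machine
start_time = int(datetime(2022, 9, 21, tzinfo=timezone.utc).timestamp())
end_time = int(datetime(2022, 9, 28, tzinfo=timezone.utc).timestamp())

params = {'subreddit': 'denmark', 'after': start_time, 'before': end_time, 'q': 'mette'}

# This is the query string that gets appended to the endpoint URL
print(urlencode(params))
```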

In [None]:
r.status_code

In [None]:
submissions = r.json().get('data')

In [None]:
submissions[0]

### Collecting comments from a submission

To collect comments, we first need the comment ids. In the code below, we first extract the submission id of one of the collected submissions. We then retrieve its comment ids using the "comment_ids" endpoint. Finally, we retrieve the comment data using the "comment" endpoint.

In [None]:
subcomment_end = "https://api.pushshift.io/reddit/submission/comment_ids/" # endpoint for getting comment ids

subid = submissions[5].get('id') # get the id of the sixth collected submission (index 5)

request_url = f"{subcomment_end}{subid}" # build the request URL by appending the submission id to the endpoint

r = requests.get(request_url) # send request

In [None]:
comment_ids = r.json().get('data') # collect comment ids

In [None]:
comment_end = "https://api.pushshift.io/reddit/search/comment/" # endpoint for comment data

params = {'ids': comment_ids} # set parameters - the comment ids to collect
       
r = requests.get(comment_end, params = params) # API call

In [None]:
comments = r.json().get('data') # get comment from response

In [None]:
comments[0]
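
A submission can have many comments, and putting every id into one request can produce a very long URL. One possible approach is to split the ids into batches and send one request per batch. The sketch below only prepares the parameter dictionaries. The ids are made up, the batch size is arbitrary, and it assumes the endpoint accepts a comma-separated `ids` value, as the Pushshift documentation describes.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Made-up comment ids (illustration only)
comment_ids = ["iq1aaa", "iq1aab", "iq1aac", "iq1aad", "iq1aae"]

# One parameter dictionary per batch; each would be passed as params= in a separate request
batches = [{"ids": ",".join(batch)} for batch in chunked(comment_ids, 2)]

for params in batches:
    print(params)
```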

## Example: Using the Twitter API

***NOTE***: This notebook uses a token that is not included in the notebook. You will not be able to reproduce this on your own computer without proper authentication. For this you need access to the Twitter enterprise API; see https://developer.twitter.com/en/docs/twitter-api/getting-started/getting-access-to-the-twitter-api

The Twitter API contains a wide variety of endpoints for both interacting with Twitter (sending tweets, replying) and for retrieving data.

The example below uses the "Search Tweets" endpoint (full-archive search): https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-all. The example retrieves Elon Musk's tweets (excluding retweets) from the last seven days.

It is adapted from Twitter's own sample code: https://github.com/twitterdev/Twitter-API-v2-sample-code/blob/main/Full-Archive-Search/full-archive-search.py

In [None]:
import requests
import os
import json
import time
from datetime import datetime, timedelta

# token and endpoint
with open(os.path.join("C:/", "repos", "tokens", "twitter_bearer.txt"), 'r') as f:
    bearer_token = f.read().strip()  # strip a trailing newline so the header stays valid

search_url = "https://api.twitter.com/2/tweets/search/all"

# set start_time
d = datetime.today() - timedelta(days=7)
start_time = f"{str(d.date())}T00:00:00Z"

query_params = {'query': 'from:elonmusk -is:retweet',
                'tweet.fields': 'entities,public_metrics,created_at,referenced_tweets',
                'expansions': 'author_id',
                'user.fields': 'created_at,description,public_metrics,url,verified', 
                'max_results': 500,
                'start_time': start_time}


def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    """

    r.headers["Authorization"] = f"Bearer {bearer_token}"
    r.headers["User-Agent"] = "v2FullArchiveSearchPython"
    return r


def connect_to_endpoint(url, params):
    # use the url argument rather than the global search_url
    response = requests.get(url, auth=bearer_oauth, params=params)
    #print(response.status_code)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()


def initial():
    # first request, without a pagination token
    json_response = connect_to_endpoint(search_url, query_params)
    return json_response

def continued(next_token):
    # follow-up request: same query plus the token pointing to the next page
    new_params = query_params.copy()
    new_params['next_token'] = next_token
    json_response = connect_to_endpoint(search_url, new_params)
    return json_response

data = initial()
all_data = data.copy()
all_data.pop('meta', None)

used_next_tokens = []
next_token = data.get('meta', {}).get('next_token')

# keep requesting pages until the API stops returning a next_token
while next_token is not None:
    time.sleep(1)  # pause between requests
    data = continued(next_token)

    # fall back to empty lists so a page without results does not raise an error
    all_data['data'] = all_data.get('data', []) + data.get('data', [])
    includes = all_data.setdefault('includes', {})
    includes['users'] = includes.get('users', []) + data.get('includes', {}).get('users', [])

    used_next_tokens.append(next_token)
    next_token = data.get('meta', {}).get('next_token')
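
The loop above follows a common token-based pagination pattern: request a page, store its results, and repeat with the returned `next_token` until none is returned. The sketch below isolates that pattern so it can be run without a Twitter token. `fetch_page` is a hypothetical stand-in for the API call and returns made-up pages.

```python
def fetch_page(next_token=None):
    """Stand-in for an API call: returns one page of made-up results."""
    pages = {
        None: {"data": [1, 2], "meta": {"next_token": "a"}},
        "a": {"data": [3, 4], "meta": {"next_token": "b"}},
        "b": {"data": [5], "meta": {}},  # last page: no next_token
    }
    return pages[next_token]


def collect_all(fetch):
    """Request pages until the response no longer contains a next_token."""
    results = []
    next_token = None
    while True:
        page = fetch(next_token)
        results.extend(page.get("data", []))
        next_token = page.get("meta", {}).get("next_token")
        if next_token is None:
            break
    return results


all_results = collect_all(fetch_page)
print(all_results)  # [1, 2, 3, 4, 5]
```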