# Using an API

An application programming interface (API) lets companies open up their applications' data and functionality to external third-party developers, business partners, and internal departments. This allows services and products to communicate with each other and use each other's data and functionality through a documented interface. Developers don't need to know how an API is implemented; they simply use the interface to communicate with other products and services. API use has surged over the past decade, to the degree that many of the most popular web applications today would not be possible without APIs.

Basically, it's a way to query databases without needing direct permissions on them.

We'll be using the [Schiphol API](https://developer.schiphol.nl/login). You need to register to get an app ID and an app key. Keep these private: don't share them or commit them to a repository.

In [None]:
with open('schiphol_api_key.txt') as f:
    app_key = f.readline().strip()

with open('schiphol_app_key.txt') as f:
    app_id = f.readline().strip()


Next is getting to know what Schiphol is willing to tell us. This is all in the [documentation](https://developer.schiphol.nl/documentation). There is a quickstart section there, so let's start by copying that and reformatting it a little.

In [None]:
import requests

url = 'https://api.schiphol.nl/public-flights/flights'

headers = {
    'accept': 'application/json',
    'resourceversion': 'v4',
    'app_id': app_id,
    'app_key': app_key
}

try:
    response = requests.request('GET', url, headers=headers)
except requests.exceptions.ConnectionError as error:
    # If the connection itself fails there is no response object at all
    response = None
    print(error)

if response is None:
    print('No response received.')
elif response.status_code == 200:
    flightList = response.json()
    print('found {} flights.'.format(len(flightList['flights'])))
    for flight in flightList['flights']:
        print('Found flight with name: {} scheduled on: {} at {}'.format(
            flight['flightName'],
            flight['scheduleDate'],
            flight['scheduleTime']))
else:
    print('Oops, something went wrong. HTTP response code: {} {}'.format(response.status_code, response.text))


This gives us the first 20 flights that arrived at or took off from Schiphol airport. Could we get this for another date? More [documentation](https://developer.schiphol.nl/apis/flight-api/v4/flights?version=latest):

![](images/2022-08-31-09-35-24.png)

This screenshot shows how a date can be passed. Also note how ResourceVersion is a **header** and scheduleDate is a **query** parameter. This means we have to add it to the URL. You could build the query string by hand, but `requests` can also do it for you, as described in this [blog post](https://www.markhneedham.com/blog/2019/01/11/python-add-query-parameters-url/).

In [None]:
from requests.models import PreparedRequest
req = PreparedRequest()

url = 'https://api.schiphol.nl/public-flights/flights' # same as before
params = {'scheduleDate':'2022-07-23','page':'2'}
req.prepare_url(url, params)

print(req.url)

url = 'https://api.schiphol.nl/public-flights/flights' # same as before
params = {'scheduleDate':'2022-07-23','page':'2','test_space':'a name'}
req.prepare_url(url, params)

print(req.url)
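
For comparison, here is roughly what building the query string "manually" could look like with only the standard library. This is a minimal sketch, not a description of what `requests` does internally; the helper name `add_query` is made up for illustration.

```python
from urllib.parse import urlencode

def add_query(base_url, params):
    """Append URL-encoded query parameters to a URL that has no query string yet."""
    # urlencode escapes special characters and turns spaces into '+'
    return base_url + '?' + urlencode(params)

url = 'https://api.schiphol.nl/public-flights/flights'  # same as before
params = {'scheduleDate': '2022-07-23', 'page': '2', 'test_space': 'a name'}
manual_url = add_query(url, params)
print(manual_url)
```

Note how the space in `'a name'` has to be encoded, which is exactly the kind of detail that is easy to forget when gluing strings together by hand.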

This is promising, but does it work?

In [None]:
url = 'https://api.schiphol.nl/public-flights/flights' # same as before
params = {'scheduleDate':'2022-07-23','page':'2'}
req.prepare_url(url, params)

print(req.url)

try:
    response = requests.request('GET', req.url, headers=headers)
except requests.exceptions.ConnectionError as error:
    print(error)

if response.status_code == 200:
    flightList = response.json()
    print('found {} flights.'.format(len(flightList['flights'])))
    for flight in flightList['flights']:
        print('Found flight with name: {} scheduled on: {} at {}'.format(flight['flightName'],flight['scheduleDate'],flight['scheduleTime']))
else:
    print('''Oops something went wrong Http response code: {} {}'''.format(response.status_code, response.text))

We get flights for a date in mid-summer, and they are not the first flights of that day, so both the date parameter and the pagination work.

Which leads to another question: how many pages are there? Luckily, there is some [documentation](https://developer.schiphol.nl/apis/flight-api/overview?version=latest) on that. We need the HTTP response headers. They are part of the response (we can reuse the one we got earlier, so there is no need to send another request).

In [None]:
print(response.headers)

We get four links:

* <https://api.schiphol.nl:443/public-flights/flights?scheduleDate=2022-07-23>; rel="first"
* <https://api.schiphol.nl:443/public-flights/flights?scheduleDate=2022-07-23&page=1>; rel="prev"
* <https://api.schiphol.nl:443/public-flights/flights?scheduleDate=2022-07-23&page=3>; rel="next"
* <https://api.schiphol.nl:443/public-flights/flights?scheduleDate=2022-07-23&page=199>; rel="last"

They still arrive as one big string though, so let's split them up. Good news: we get to use regex!

In [None]:
import re

links = response.headers['link']
separate = links.split(',')

links_dict = {}
for link in separate:
    url, name = link.split(";")
    url = url.strip(" <>")  # drop the leading space and the angle brackets around the URL
    # name = re.sub(" rel=\"([a-z]*)\"", r'\1', name)
    name = re.sub(" rel=\"(first|next|prev|last)\"", r'\1', name) # much stricter, so better
    links_dict[name] = url

print(links_dict)
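
As an alternative sketch, each URL and its `rel` name can be pulled out of the header in one pass with `re.findall`, which also leaves out the angle brackets around the URLs. The sample header below is typed over from the four links listed above rather than fetched live, and `parse_link_header` is a hypothetical helper name.

```python
import re

def parse_link_header(raw):
    """Return a dict mapping each rel name (first, prev, next, last) to its URL."""
    # Each entry in the header looks like: <URL>; rel="name"
    pairs = re.findall(r'<([^>]+)>;\s*rel="(first|next|prev|last)"', raw)
    return {name: link_url for link_url, name in pairs}

sample_header = (
    '<https://api.schiphol.nl:443/public-flights/flights?scheduleDate=2022-07-23>; rel="first", '
    '<https://api.schiphol.nl:443/public-flights/flights?scheduleDate=2022-07-23&page=1>; rel="prev", '
    '<https://api.schiphol.nl:443/public-flights/flights?scheduleDate=2022-07-23&page=3>; rel="next", '
    '<https://api.schiphol.nl:443/public-flights/flights?scheduleDate=2022-07-23&page=199>; rel="last"'
)

sample_links = parse_link_header(sample_header)
print(sample_links['next'])
```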


What we could do to get all pages is always get the "next" page until we run out of next-pages. Another angle would be to look for the last page number and write a for-loop to get all pages in between. Let's first check what the header in this last page looks like.

In [None]:
url = 'https://api.schiphol.nl/public-flights/flights'
params = {'scheduleDate':'2022-07-23','page':'199'}
req.prepare_url(url, params)

try:
    response = requests.request('GET', req.url, headers=headers)
except requests.exceptions.ConnectionError as error:
    print(error)

if response.status_code == 200:
    flightList = response.json()
    print('found {} flights.'.format(len(flightList['flights'])))
    # for flight in flightList['flights']:
    #     print('Found flight with name: {} scheduled on: {} at {}'.format(flight['flightName'],flight['scheduleDate'],flight['scheduleTime']))

print(response.headers)


There are no longer links for "next" or "last". This is a recipe for a very nice recursive function, something like:

<pre>
def get_all_pages(date, url=None):
    if url is None:
        create url from date
    download the page from url
    store its flights in list

    if header has "next":
        list += get_all_pages(date, next_url)

    return list
</pre>
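
A runnable version of that idea could look like the sketch below. To keep it independent of the network, the function takes a `fetch` argument (an assumption for this example, not part of the Schiphol API) that returns the flights on a page plus the URL of the next page, or `None` on the last page. The `fake_pages` data is invented purely to exercise the recursion.

```python
def get_all_pages(url, fetch):
    """Recursively collect the flights from url and every page reachable via "next"."""
    flights, next_url = fetch(url)
    if next_url is None:
        # Base case: the last page has no "next" link
        return list(flights)
    return list(flights) + get_all_pages(next_url, fetch)

# Invented stand-in for the API: url -> (flights on that page, url of the next page)
fake_pages = {
    'page0': (['KL1001', 'KL1002'], 'page1'),
    'page1': (['HV5131'], 'page2'),
    'page2': (['U27921'], None),
}

def fake_fetch(url):
    return fake_pages[url]

print(get_all_pages('page0', fake_fetch))
```

With roughly 200 pages for one day, the recursion depth stays well below Python's default limit of 1000 frames.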

But let's use a loop for now, and start dividing the code into functions.

* `get_page`: get a page based on a URL. Returns the flights and the response headers in one dict.
* `create_url`: create a URL based on a date and a page.
* `get_links`: convert the "link" part of the headers to a dict with all links.

In [None]:
def get_page(url):
    print(url)
    try:
        response = requests.request('GET', url, headers=headers)
    except requests.exceptions.ConnectionError as error:
        print(error)
        return {}  # no response at all, so there are no headers or flights to return

    return_list = {'headers': response.headers}

    if response.status_code == 200:
        flightList = response.json()
        return_list['flights'] = flightList['flights']
    
    return return_list

def create_url(base_url, date, page=0):
    req = PreparedRequest()
    params = {'scheduleDate':date}
    if page > 0:
        params['page'] = page
    
    req.prepare_url(base_url, params)
    return req.url

def get_links(raw):
    separate = raw.split(',')

    links_dict = {}
    for link in separate:
        url, name = link.split(";")
        url = url.strip(" <>")  # drop the leading space and the angle brackets around the URL
        # name = re.sub(" rel=\"([a-z]*)\"", r'\1', name)
        name = re.sub(" rel=\"(first|next|prev|last)\"", r'\1', name) # much stricter, so better
        links_dict[name] = url

    return links_dict

And now all that is left is to use these functions...

In [None]:
url = create_url('https://api.schiphol.nl/public-flights/flights', '2022-07-23')

page = get_page(url)

all_flights = list()

if 'flights' in page:
    all_flights.extend(page['flights'])

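last_page = 1  # fall back to a single page if no "last" link is found
if 'headers' in page and 'link' in page['headers']:
    links = get_links(page['headers']['link'])
    print(links)
    match = re.search(r"page=(\d{1,4})", links.get('last', ''))
    if match:
        last_page = int(match[1])


# for i in range(2,last_page+1):
for i in range(max(2, last_page-5),last_page+1):  # only the last few pages, to keep the demo short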
    url = create_url('https://api.schiphol.nl/public-flights/flights', '2022-07-23', i)
    page = get_page(url)
    if 'flights' in page:
        all_flights.extend(page['flights'])

print(len(all_flights))

Note how there are a lot of seemingly unnecessary ifs there. They become useful when you enter a non-existent date and there is no response. They're not perfect yet; you could play around with them to make the code more robust. But keep in mind the tradeoff when making code robust: <quote>robustness of code should be proportional to the level of intelligence of the users</quote>. In practice, the less technical the audience, the more robust the code needs to be:

* Code written for users (aka non-IT) should be very robust.
* Code written for colleagues should be pretty robust.
* Code written for yourself should be somewhat robust. Or not robust at all. Much depends on how much time you have to write the code.
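
As one small step in that direction, the last page number could be read with `urllib.parse` instead of a regex, with a fallback for when there is no "last" link at all. This is only a sketch: `last_page_number` is a hypothetical helper, and the fallback value is left to the caller.

```python
from urllib.parse import urlparse, parse_qs

def last_page_number(links, default=1):
    """Return the page number in the "last" link, or default if it is missing."""
    last_url = links.get('last')
    if last_url is None:
        return default
    # Strip whitespace and angle brackets that may be left over from the header
    query = urlparse(last_url.strip(' <>')).query
    pages = parse_qs(query).get('page')
    return int(pages[0]) if pages else default

example_links = {
    'first': 'https://api.schiphol.nl:443/public-flights/flights?scheduleDate=2022-07-23',
    'last': '<https://api.schiphol.nl:443/public-flights/flights?scheduleDate=2022-07-23&page=199>',
}
print(last_page_number(example_links))
print(last_page_number({}))
```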

You could print the flights, or store them in a CSV or pickle file. That is all up to you!

In [None]:
print(all_flights)
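
Storing the flights as CSV could look like the sketch below. It keeps only the three fields printed earlier (`flightName`, `scheduleDate`, `scheduleTime`); the sample records are made up, and the helper name `flights_to_csv` is hypothetical. It writes to an in-memory buffer so it runs anywhere; passing a file opened with `open('flights.csv', 'w', newline='')` (an example file name) would write to disk instead.

```python
import csv
import io

def flights_to_csv(flights, fields, out):
    """Write the chosen fields of each flight dict as CSV rows to the file-like object out."""
    # extrasaction='ignore' skips any keys that are not listed in fields
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    for flight in flights:
        writer.writerow(flight)

# Made-up records that mimic the shape of the flight dicts used above
sample_flights = [
    {'flightName': 'KL1001', 'scheduleDate': '2022-07-23', 'scheduleTime': '07:00:00', 'other': 'x'},
    {'flightName': 'HV5131', 'scheduleDate': '2022-07-23', 'scheduleTime': '07:05:00'},
]

buffer = io.StringIO()
flights_to_csv(sample_flights, ['flightName', 'scheduleDate', 'scheduleTime'], buffer)
print(buffer.getvalue())
```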

So is this a finished product? No. This notebook has tried to show the process of understanding and querying an API. If you want a real script that downloads all flights for a certain month, you'll have some more copy-pasting to do into a .py file (don't forget the imports), and you'll need to figure out which parts of the flight data you want to store. But the difficult part is done!