# 1. APIs

## Learning objectives:
- Learn what a REST API is
- Use REST APIs to obtain data

Before we start scraping data ourselves, remember that you don't always have to start there. Some websites offer web APIs that we can access to pull information from. Information returned by an API comes in a structured format, such as JSON.

The word API keeps popping up... but what is it? And what is a REST API?

**API** stands for Application Programming Interface. When we write a program, we often need to interface with other people's code (e.g. a library). An API defines the rules we need to follow to talk to that code (e.g. function names and their arguments). In other words, an API defines a standardized syntax that allows one piece of software to communicate with another.

This notebook focuses on web APIs and uses the term API to refer specifically to that type. Keep in mind that, in other contexts, API is also a generic term that can be used to, say, allow a Java program to communicate with a Python program running on the same machine.

A **REST API** allows communication over HTTP. The client sends a request, and the server sends back a response. Requests usually take one of four main types (HTTP methods): GET, PUT, POST, and DELETE. The one most relevant to pulling data from other services is **GET**: as the name implies, this is the HTTP method we use when we want to request some data. You can think of GET as saying, “Hey, web server, please retrieve/get me this information.” Technically, more methods exist (such as HEAD, OPTIONS, and CONNECT), but they are less common in APIs, and you are unlikely to need them for collecting data.
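To see that the HTTP method is simply part of the request a client builds, here is a small sketch using the standard library's `urllib.request.Request` rather than `requests` (chosen so that nothing is actually sent over the network). The URL is the GitHub endpoint used below, and the POST body is made up for illustration.

```python
from urllib.request import Request

# Build (but do not send) a few requests to see how the HTTP method is attached
url = "https://api.github.com/users/ai-core/repos"

get_req = Request(url, headers={"Accept": "application/json"})
print(get_req.get_method())          # GET: the default when no body is supplied
print(get_req.get_header("Accept"))  # application/json

post_req = Request(url, data=b'{"name": "demo"}')  # made-up body
print(post_req.get_method())         # POST: urllib switches to POST when a body is given

delete_req = Request(url, method="DELETE")
print(delete_req.get_method())       # DELETE: set explicitly
```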

**So how do we request data?**
Well, we need a place to request data from, and this comes in the form of an endpoint URL. An endpoint URL usually looks something along these lines:

![](images/api_url_structure.png)

Let's visit the GitHub API endpoint to see what the **response** is: https://api.github.com/users/ai-core/repos?sort=pushed&direction=desc
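As a sketch, Python's standard `urllib.parse` module can split that endpoint URL into its parts: the scheme, the host (base URL), the path (the endpoint itself) and the query string of `key=value` parameters.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://api.github.com/users/ai-core/repos?sort=pushed&direction=desc"
parts = urlsplit(url)

print(parts.scheme)           # https
print(parts.netloc)           # api.github.com  (the host / base URL)
print(parts.path)             # /users/ai-core/repos  (the endpoint path)
print(parse_qs(parts.query))  # {'sort': ['pushed'], 'direction': ['desc']}  (query parameters)
```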

As we can see, the response from the GitHub API is JSON (here, an array of objects, one per repository). However, this doesn't have to be the case: the developer who built the API could have chosen to return any format (XML, CSV, images, etc.). For gathering data through APIs, JSON is typically the easiest to work with, so where possible, we should favour it.
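A minimal illustration of what "returns JSON" means in Python, using the standard `json` module on a small hand-written string (the data and field names are made up, loosely modelled on a list of repositories). The `.json()` method of a `requests` response does essentially this decoding for you.

```python
import json

# Hand-written JSON text (illustrative data, not real API output)
raw = '[{"name": "repo-a", "stargazers_count": 3, "private": false}, {"name": "repo-b", "stargazers_count": 10, "private": false}]'

repos = json.loads(raw)              # JSON array -> Python list, JSON objects -> dicts
print(type(repos), type(repos[0]))   # <class 'list'> <class 'dict'>
print(repos[1]["name"])              # repo-b
print(repos[0]["private"])           # False (JSON false -> Python False)

# Once decoded, it is ordinary Python data: e.g. sort by a field
names_by_stars = [r["name"] for r in sorted(repos, key=lambda r: r["stargazers_count"], reverse=True)]
print(names_by_stars)                # ['repo-b', 'repo-a']
```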

In [None]:
import requests

weather = requests.get('https://samples.openweathermap.org/data/2.5/weather?q=London,uk&appid=b6907d289e10d714a6e88b30761fae22')

In [None]:
weather.json()

In [None]:
weather.status_code

Read the docs! https://api.stackexchange.com/docs

Let's collect data from StackExchange's API. Here we'll be working in a slightly roundabout fashion to pull the data we want. This is for teaching purposes, so that we understand the structure of JSON and you get some hands-on experience with using a REST API.

We'll be collecting the body/contents of questions posted on StackOverflow. To do this, we'll first pull some posts within a date range. If the type of the post is a question, we'll make another API request to StackExchange's questions endpoint to pull the body of the question.
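The `fromdate` and `todate` values in the requests below are Unix timestamps (seconds since 1970-01-01 UTC). As a sketch, this is one way to produce them from calendar dates and to assemble the query string with `urllib.parse.urlencode`; the parameters mirror the first request below, minus its `filter`. `requests` can also build the query string for you if you pass a dict via `requests.get(url, params=...)`.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Calendar dates -> Unix timestamps (UTC)
fromdate = int(datetime(2021, 7, 5, tzinfo=timezone.utc).timestamp())
todate = int(datetime(2021, 7, 6, tzinfo=timezone.utc).timestamp())
print(fromdate, todate)  # 1625443200 1625529600

# Going the other way is handy for reading timestamps in responses
print(datetime.fromtimestamp(1628581578, tz=timezone.utc))  # 2021-08-10 07:46:18+00:00

params = {
    "pagesize": 10,
    "fromdate": fromdate,
    "todate": todate,
    "order": "desc",
    "sort": "votes",
    "site": "stackoverflow",
}
posts_url = "https://api.stackexchange.com/2.2/posts?" + urlencode(params)
print(posts_url)
```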

## Challenge 1

In [1]:

import requests

# Base URL of the StackExchange API; the endpoint path and query string are appended to it
ROOT_URL = "https://api.stackexchange.com"
POSTS_ENDPOINT = "/2.2/posts?pagesize=10&fromdate=1625443200&todate=1625529600&order=desc&sort=votes&site=stackoverflow&filter=!nKzQUR-u0g"
r = requests.get(ROOT_URL+POSTS_ENDPOINT)

In [None]:
r.status_code

In [None]:
r.json()

In [9]:
ROOT_URL = "https://api.stackexchange.com"
POSTS_ENDPOINT = "/2.3/posts?pagesize=10&fromdate=1628553600&todate=1628640000&order=desc&sort=votes&site=stackoverflow"
highly_voted = requests.get(ROOT_URL+POSTS_ENDPOINT)


In [11]:
highly_voted.json()

{'items': [{'owner': {'account_id': 5941570,
    'reputation': 10862,
    'user_id': 4672588,
    'user_type': 'registered',
    'profile_image': 'https://www.gravatar.com/avatar/edcbc85fb3b43202dab37afb2bd5281e?s=128&d=identicon&r=PG&f=1',
    'display_name': 'cpplearner',
    'link': 'https://stackoverflow.com/users/4672588/cpplearner'},
   'score': 15,
   'last_activity_date': 1628581578,
   'creation_date': 1628581578,
   'post_type': 'answer',
   'post_id': 68722862,
   'content_license': 'CC BY-SA 4.0',
   'link': 'https://stackoverflow.com/a/68722862'},
  {'owner': {'account_id': 274825,
    'reputation': 139359,
    'user_id': 567292,
    'user_type': 'registered',
    'accept_rate': 84,
    'profile_image': 'https://www.gravatar.com/avatar/b55da07dea89fa675c7e2894c77f8024?s=128&d=identicon&r=PG',
    'display_name': 'ecatmur',
    'link': 'https://stackoverflow.com/users/567292/ecatmur'},
   'score': 13,
   'last_activity_date': 1628581578,
   'creation_date': 1628580752,
   ...
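The response is a nested structure: a dict whose `items` key holds a list of dicts, some of which contain further dicts (such as `owner`). A small sketch of walking it, using two items copied from the output above and trimmed. The second item's remaining fields are cut off in that output, so it is left incomplete here, which also shows why `.get()` with a default is handy.

```python
# Two items copied (and trimmed) from the output above
response_json = {
    "items": [
        {"owner": {"display_name": "cpplearner", "reputation": 10862},
         "score": 15, "post_type": "answer", "post_id": 68722862},
        {"owner": {"display_name": "ecatmur", "reputation": 139359},
         "score": 13},  # remaining fields were cut off in the output above
    ]
}

for item in response_json["items"]:
    # Chain keys to reach nested values; .get() avoids a KeyError when a field is missing
    print(item["owner"]["display_name"], item["score"], item.get("post_type", "unknown"))
# cpplearner 15 answer
# ecatmur 13 unknown

names = [item["owner"]["display_name"] for item in response_json["items"]]
print(names)  # ['cpplearner', 'ecatmur']
```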

In [12]:
ROOT_URL = "https://api.stackexchange.com"
BADGES_ENDPOINT = "/2.3/badges/name?pagesize=3&fromdate=1628553600&todate=1628640000&order=asc&sort=name&site=stackoverflow"
first3_badges = requests.get(ROOT_URL+BADGES_ENDPOINT)

In [13]:
ROOT_URL = "https://api.stackexchange.com"
RECIPIENTS_ENDPOINT = "/2.3/badges/recipients?pagesize=3&fromdate=1628553600&todate=1628640000&site=stackoverflow&filter=!9XUxqaWjU"
three_recent_bronze = requests.get(ROOT_URL+RECIPIENTS_ENDPOINT)

Each response has a status code.

### HTTP Codes

You might have heard the term HTTP (Hypertext Transfer Protocol), which is essentially a request-response protocol. When you send an HTTP request to a server (which can host many URLs), the server sends a document back in response. Each response carries a code, called the status code. It's a quick way for the server to tell the client roughly what happened to the client's request.

Here we can see a bunch of different status codes. You've probably seen a 404 error when you request a page that doesn't exist. What else do you recognise?


In [None]:
r.status_code


![](images/http-codes.png)
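Status codes are grouped by their first digit. A small sketch that uses the standard library's `http.HTTPStatus` enum to look up each code's reason phrase; `describe_status` is a hypothetical helper name.

```python
from http import HTTPStatus

# Status code classes, keyed by the first digit
CLASSES = {1: "informational", 2: "success", 3: "redirection", 4: "client error", 5: "server error"}

def describe_status(code):
    """Return e.g. '404 Not Found (client error)' for a numeric HTTP status code."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for codes the stdlib does not know
    return f"{code} {phrase} ({CLASSES[code // 100]})"

for code in (200, 301, 404, 500):
    print(describe_status(code))
# 200 OK (success)
# 301 Moved Permanently (redirection)
# 404 Not Found (client error)
# 500 Internal Server Error (server error)
```

With a response object from earlier you could call `describe_status(r.status_code)`. `requests` itself also offers `r.ok` (false for 4xx and 5xx codes) and `r.raise_for_status()`.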

In [None]:
print(r.json().keys())
print(r.json())

In [None]:
def get_questions(items_object):
    data = {"display_name": [], "profile_image_url": [], "post_id": [], "post_contents": []}
    
    ## Loop over the items object. For the relevant fields in the 'data' variable defined above,
    ## populate those fields IF the type of the post is a question.
    ## If the post is a question, additionally make a request to the relevant API method to obtain the question body.
    ## READ READ READ the documentation (or Google it 🙄) to find out how to do so.
    ## The question body should be populated in the 'post_contents' field.
    ## Return the data object.
    for item in items_object:
        if item["post_type"] != "question":
            continue  # skip answers

        # Request the question body (a separate name avoids confusion with the global `r`)
        question_endpoint = "/2.2/questions/{}?order=desc&sort=activity&site=stackoverflow&filter=withbody".format(item["post_id"])
        question_r = requests.get(ROOT_URL + question_endpoint)

        body = None  # placeholder so all four lists stay the same length even if the request fails
        if question_r.status_code == 200 and question_r.json()["items"]:
            body = question_r.json()["items"][0]["body"]

        data["display_name"].append(item["owner"].get("display_name"))
        data["profile_image_url"].append(item["owner"].get("profile_image"))
        data["post_id"].append(item["post_id"])
        data["post_contents"].append(body)

    return data

In [None]:
r.json()['items']

In [None]:
import pprint

questions = get_questions(r.json()["items"])
pprint.pprint(questions)
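`get_questions` makes one request per question. According to the StackExchange docs, methods that take `{ids}` accept up to 100 semicolon-delimited ids, so the bodies could instead be fetched in batches. A sketch under that assumption; `build_questions_urls` is a hypothetical helper, and the ids in the example are arbitrary.

```python
from urllib.parse import urlencode

ROOT_URL = "https://api.stackexchange.com"

def build_questions_urls(question_ids, batch_size=100):
    """Hypothetical helper: one /questions/{ids} URL per batch of up to `batch_size` ids."""
    query = urlencode({"site": "stackoverflow", "filter": "withbody"})
    urls = []
    for start in range(0, len(question_ids), batch_size):
        batch = question_ids[start:start + batch_size]
        ids = ";".join(str(i) for i in batch)  # ids are joined with semicolons
        urls.append(f"{ROOT_URL}/2.3/questions/{ids}?{query}")
    return urls

for url in build_questions_urls([11, 22, 33], batch_size=2):
    print(url)
```

In practice you would pass `questions["post_id"]`, request each URL, and match bodies back to posts by each returned item's `question_id`, since results come back in the API's sort order rather than the order of the ids.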

## Are we really getting all of the data we asked for? 

If we look in the docs, we'll see that the default page size is 30 items and the maximum is 100. We'll also see that each response has a key called `has_more`. When `has_more` is true, we haven't got all of the data, just the current page. Each request also accepts a `page` query parameter that indicates which page of results we want. So we can combine the `has_more` response key and the `page` query string parameter to keep making requests while more items are available. Let's write a function to do that now and see how it works.

In [None]:
def get_all(endpoint, max_pages=10):
    r = requests.get(endpoint).json()  # initial request: the page query param defaults to 1 if not provided

    page = 1
    results = r['items']  # initially the results are the items returned

    # keep going while there are more items, stopping after max_pages pages (just for time's sake in this example)
    while r['has_more'] and page < max_pages:
        page += 1  # increment the page number that we are asking for
        e = f'{endpoint}&page={page}'  # set the page query param (assumes endpoint already contains a '?')
        print('making request to:', e)
        r = requests.get(e).json()  # make the request and parse the JSON
        results.extend(r['items'])  # extend our list of results with the new results

    print(f'I found {len(results)} results')
    return results
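The paging loop in `get_all` can be exercised without the network if the function that fetches a page is passed in as a parameter. A sketch of that design; `get_all_with`, `fake_fetch` and the three fake pages are hypothetical.

```python
def get_all_with(fetch_page, max_pages=10):
    """Hypothetical variant of get_all: fetch_page(page) must return a dict with 'items' and 'has_more'."""
    page = 1
    response = fetch_page(page)
    results = list(response["items"])
    # Same stopping rule as get_all: stop when has_more is false or the page cap is reached
    while response.get("has_more") and page < max_pages:
        page += 1
        response = fetch_page(page)
        results.extend(response["items"])
    return results

# A fake "API" with 3 pages of 2 items each, so the loop can be tried offline
FAKE_PAGES = {1: [1, 2], 2: [3, 4], 3: [5, 6]}

def fake_fetch(page):
    return {"items": FAKE_PAGES[page], "has_more": page < len(FAKE_PAGES)}

print(get_all_with(fake_fetch))               # [1, 2, 3, 4, 5, 6]
print(get_all_with(fake_fetch, max_pages=2))  # [1, 2, 3, 4]
```

Against the real API, the fetch function would be something like `lambda page: requests.get(f"{endpoint}&page={page}").json()`.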

In [None]:
for dn, piu, pi, pc in zip(questions["display_name"], questions["profile_image_url"], questions["post_id"], questions["post_contents"]):
    print("Display Name:", dn)
    print("Profile Image URL:", piu)
    print("Post ID:", pi)
    print("Post Body:", pc)
    print()

# Challenge: Pokemon API

1. Go to the [Pokemon API](https://pokeapi.co/). I will guide you through the process a little.
2. Read the docs for the API. Specifically, look at how to retrieve data for Pokemon of a specific type.
3. Retrieve all Pokemon of type 'psychic' whose Pokedex number is lower than 151.
4. Store that information in a dictionary like this one: `psychic_ls = {'Number': [], 'Name': [], 'Link': []}`.

As a starting point, you can use the following code.

In [22]:
import requests
psy = requests.get('https://pokeapi.co/api/v2/type/psychic')
# Create a dictionary with empty lists that we will populate
psychic_ls = {'Number': [], 'Name': [], 'Link': []}
psy.json().keys()



dict_keys(['damage_relations', 'game_indices', 'generation', 'id', 'move_damage_class', 'moves', 'name', 'names', 'pokemon'])
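Before calling the API in a loop, it can help to try the filtering step on a small hand-written list shaped like `psy.json()['pokemon']` (the entries below are illustrative and trimmed; `pokedex_number` is a hypothetical helper). Note that `< 151` leaves out Mew, #151; use `<= 151` if you want all of the original 151.

```python
# Hand-written stand-in for psy.json()['pokemon'] (illustrative, trimmed)
sample = [
    {"pokemon": {"name": "abra", "url": "https://pokeapi.co/api/v2/pokemon/63/"}, "slot": 1},
    {"pokemon": {"name": "slowpoke", "url": "https://pokeapi.co/api/v2/pokemon/79/"}, "slot": 2},
    {"pokemon": {"name": "mew", "url": "https://pokeapi.co/api/v2/pokemon/151/"}, "slot": 1},
    {"pokemon": {"name": "chimecho", "url": "https://pokeapi.co/api/v2/pokemon/358/"}, "slot": 1},
]

def pokedex_number(url):
    # '.../pokemon/63/' -> 63: strip the trailing slash, take the last path segment
    return int(url.rstrip("/").split("/")[-1])

selected = {"Number": [], "Name": [], "Link": []}
for entry in sample:
    number = pokedex_number(entry["pokemon"]["url"])
    if number < 151:
        selected["Number"].append(number)
        selected["Name"].append(entry["pokemon"]["name"])
        selected["Link"].append(entry["pokemon"]["url"])

print(selected["Name"])  # ['abra', 'slowpoke']
```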

In [60]:
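for pokemon in psy.json()['pokemon']:
    # The Pokedex number is the last path segment of the URL, e.g. '.../pokemon/64/' -> '64'
    poke_id = pokemon['pokemon']['url'].split('/')[-2]  # avoid shadowing the built-in id()
    if int(poke_id) < 151:
        psychic_ls['Number'].append(poke_id)
        psychic_ls['Name'].append(pokemon['pokemon']['name'])
        psychic_ls['Link'].append(pokemon['pokemon']['url'])

        # Extra points (step 5): fetch the full entry to read this Pokemon's types
        current_pokemon = requests.get('https://pokeapi.co/api/v2/pokemon/' + poke_id + '/')
        poketype = [slot['type']['name'] for slot in current_pokemon.json()['types']]
        psychic_ls.setdefault('Type', []).append(poketype)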

            


{'abilities': [{'ability': {'name': 'overgrow',
    'url': 'https://pokeapi.co/api/v2/ability/65/'},
   'is_hidden': False,
   'slot': 1},
  {'ability': {'name': 'chlorophyll',
    'url': 'https://pokeapi.co/api/v2/ability/34/'},
   'is_hidden': True,
   'slot': 3}],
 'base_experience': 64,
 'forms': [{'name': 'bulbasaur',
   'url': 'https://pokeapi.co/api/v2/pokemon-form/1/'}],
 'game_indices': [{'game_index': 153,
   'version': {'name': 'red', 'url': 'https://pokeapi.co/api/v2/version/1/'}},
  {'game_index': 153,
   'version': {'name': 'blue', 'url': 'https://pokeapi.co/api/v2/version/2/'}},
  {'game_index': 153,
   'version': {'name': 'yellow',
    'url': 'https://pokeapi.co/api/v2/version/3/'}},
  {'game_index': 1,
   'version': {'name': 'gold', 'url': 'https://pokeapi.co/api/v2/version/4/'}},
  {'game_index': 1,
   'version': {'name': 'silver',
    'url': 'https://pokeapi.co/api/v2/version/5/'}},
  {'game_index': 1,
   'version': {'name': 'crystal',
    'url': 'https://pokeapi.co/


5. Extra points: Add a new key to the dictionary called `Type`. For each Pokemon, its entry under `Type` will be a list of strings: for some Pokemon this is a single-element list, for others it has multiple elements. For example, the types of 'Slowpoke' are `['water', 'psychic']` (the API returns type names in lowercase).
6. Extra points: Using `urllib.request.urlretrieve`, save the images of the psychic Pokemon in a folder called `images`. The image URLs are stored under the `sprites` key of each Pokemon's API entry; read the docs to see how to get to them. Use the following code to create the new folder. For this challenge, getting the `front_default` sprite is more than enough.
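For steps 5 and 6, the relevant parts of a Pokemon's entry are its `types` list (each element has a `slot` and a nested `type` with a `name`) and its `sprites` dict. A sketch on a trimmed, hand-written stand-in for Slowpoke's entry; the types are listed out of order on purpose to show a defensive sort by `slot`, and the sprite URL is a placeholder, not the real one, so the download line is left commented out.

```python
from pathlib import Path

# Trimmed, hand-written stand-in for requests.get('https://pokeapi.co/api/v2/pokemon/79/').json()
slowpoke = {
    "name": "slowpoke",
    "types": [
        {"slot": 2, "type": {"name": "psychic"}},
        {"slot": 1, "type": {"name": "water"}},
    ],
    "sprites": {"front_default": "https://example.com/sprites/79.png"},  # placeholder URL
}

# Sort by slot so the primary type comes first, then keep just the names
types = [t["type"]["name"] for t in sorted(slowpoke["types"], key=lambda t: t["slot"])]
print(types)  # ['water', 'psychic']

sprite_url = slowpoke["sprites"]["front_default"]
target = Path("images") / f"{slowpoke['name']}.png"
if sprite_url:  # guard in case a sprite is missing (None)
    print(sprite_url, "->", target)
    # urllib.request.urlretrieve(sprite_url, target)  # uncomment with a real entry to download
```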

In [None]:
import urllib.request
import os
from pathlib import Path
curr_dir = os.getcwd()
Path(f"{curr_dir}/images").mkdir(parents=True, exist_ok=True)

Since navigating through the API is a little more complicated than the previous challenges, I'll guide you through it.

In [None]:
## For example, to retrieve the image of a Pokemon, we need its name (or Pokedex number)
bulbasaur = requests.get('https://pokeapi.co/api/v2/pokemon/bulbasaur/')
urllib.request.urlretrieve(bulbasaur.json()['sprites']['front_default'], 'images/bulbasaur.png')

Now, try the same with the rest of the Pokemon you extracted previously!