<h1>Working With APIs in Python - Pagination and Data Extraction</h1>
<h2>1. Introduction</h2>
<p>In this notebook, I follow the <a href='https://www.youtube.com/watch?v=-oPuGc05Lxs'>YouTube tutorial</a> by <a href='https://www.youtube.com/c/JohnWatsonRooney'>John Watson Rooney</a> on how to extract data from a paginated API. I decided to take these notes after struggling with similar APIs. I am doing this as a pedagogical exercise and, in case anybody is interested, as a way to share some valuable knowledge (valuable to me, at least!).</p>
<p>John used the <strong>Rick and Morty</strong> API. Here's the <a href='https://rickandmortyapi.com/documentation/'>link</a> to its documentation.</p>
<p><strong>Base url</strong>: https://rickandmortyapi.com/api</p>


<p>If you open the base URL in a browser, you get the following links, each of which you can visit like a regular web page.</p>

```json
{
  "characters": "https://rickandmortyapi.com/api/character",
  "locations": "https://rickandmortyapi.com/api/location",
  "episodes": "https://rickandmortyapi.com/api/episode"
}
```

<p>By selecting, say, <code>"characters"</code>, you get to see how the data is structured in the API. Exploring the responses like this, together with <strong>reading the docs</strong>, should be the first thing you do when trying to consume any API.</p>
<p>Let's take a look at a few parts of the result of the main query:</p>

```json
{
  "info": {
    "count": 826,
    "pages": 42,
    "next": "https://rickandmortyapi.com/api/character?page=2",
    "prev": null
  },
 ```

<p>In the above excerpt, we get some important data on the dataset. We get to see the number of characters (<code>"count"</code>), the number of pages (<code>"pages"</code>), and the <code>"next"</code> key, which is the main reason I'm here typing all this info.</p>

```json
 "results": [
    {
      "id": 1,
      "name": "Rick Sanchez",
      "status": "Alive",
      "species": "Human",
      "type": "",
      "gender": "Male",
      "origin": {
        "name": "Earth (C-137)",
        "url": "https://rickandmortyapi.com/api/location/1"
      },
      "location": {
        "name": "Citadel of Ricks",
        "url": "https://rickandmortyapi.com/api/location/3"
      },
      "image": "https://rickandmortyapi.com/api/character/avatar/1.jpeg",
      "episode": [
        "https://rickandmortyapi.com/api/episode/1",
        "https://rickandmortyapi.com/api/episode/2",
        "https://rickandmortyapi.com/api/episode/3",
        "https://rickandmortyapi.com/api/episode/4",
        "https://rickandmortyapi.com/api/episode/5",
        "https://rickandmortyapi.com/api/episode/6",
        "https://rickandmortyapi.com/api/episode/7",
        "https://rickandmortyapi.com/api/episode/8",
        "https://rickandmortyapi.com/api/episode/9",
        "https://rickandmortyapi.com/api/episode/10",
        "https://rickandmortyapi.com/api/episode/11",
        "https://rickandmortyapi.com/api/episode/12",
        "https://rickandmortyapi.com/api/episode/13",
        "https://rickandmortyapi.com/api/episode/14",
        "https://rickandmortyapi.com/api/episode/15",
        "https://rickandmortyapi.com/api/episode/16",
        "https://rickandmortyapi.com/api/episode/17",
        "https://rickandmortyapi.com/api/episode/18",
        "https://rickandmortyapi.com/api/episode/19",
        "https://rickandmortyapi.com/api/episode/20",
        "https://rickandmortyapi.com/api/episode/21",
        "https://rickandmortyapi.com/api/episode/22",
        "https://rickandmortyapi.com/api/episode/23",
        "https://rickandmortyapi.com/api/episode/24",
        "https://rickandmortyapi.com/api/episode/25",
        "https://rickandmortyapi.com/api/episode/26",
        "https://rickandmortyapi.com/api/episode/27",
        "https://rickandmortyapi.com/api/episode/28",
        "https://rickandmortyapi.com/api/episode/29",
        "https://rickandmortyapi.com/api/episode/30",
        "https://rickandmortyapi.com/api/episode/31",
        "https://rickandmortyapi.com/api/episode/32",
        "https://rickandmortyapi.com/api/episode/33",
        "https://rickandmortyapi.com/api/episode/34",
        "https://rickandmortyapi.com/api/episode/35",
        "https://rickandmortyapi.com/api/episode/36",
        "https://rickandmortyapi.com/api/episode/37",
        "https://rickandmortyapi.com/api/episode/38",
        "https://rickandmortyapi.com/api/episode/39",
        "https://rickandmortyapi.com/api/episode/40",
        "https://rickandmortyapi.com/api/episode/41",
        "https://rickandmortyapi.com/api/episode/42",
        "https://rickandmortyapi.com/api/episode/43",
        "https://rickandmortyapi.com/api/episode/44",
        "https://rickandmortyapi.com/api/episode/45",
        "https://rickandmortyapi.com/api/episode/46",
        "https://rickandmortyapi.com/api/episode/47",
        "https://rickandmortyapi.com/api/episode/48",
        "https://rickandmortyapi.com/api/episode/49",
        "https://rickandmortyapi.com/api/episode/50",
        "https://rickandmortyapi.com/api/episode/51"
      ],
      "url": "https://rickandmortyapi.com/api/character/1",
      "created": "2017-11-04T18:48:46.250Z"
    },
```

<p>Now, we are actually dealing with the data on Rick and Morty characters. We got 20 entries for this page, each one containing different features of each character.</p>
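
<p>As a quick sanity check, the numbers in <code>"info"</code> should agree with the page size. The sketch below only redoes that arithmetic; it assumes every page except the last holds 20 entries, which is what the first page suggests but which I have not checked against the docs.</p>

```python
import math

count = 826      # "count" from the "info" block above
per_page = 20    # entries returned on the first page (assumed for all full pages)

pages = math.ceil(count / per_page)               # 826 / 20 = 41.3, rounded up
last_page_size = count - (pages - 1) * per_page   # what is left for the final page

print(pages)            # 42, matching "pages" in the response
print(last_page_size)   # 6
```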

<h2>2. Importing libraries, doing our request</h2>

In [37]:
import requests
import pandas as pd

<p>Let's keep the base URL separate from the endpoint (here, <code>character</code>).</p>

In [2]:
baseurl = 'https://rickandmortyapi.com/api/'
endpoint = 'character'

In [3]:
r = requests.get(baseurl + endpoint)

In [4]:
print(r)

<Response [200]>


<p>We got a good response (200).</p>
<p>To get the actual data, we call <code>.json()</code> on the response, which parses the JSON body into Python dictionaries and lists.</p>

In [5]:
r.json()

{'info': {'count': 826,
  'pages': 42,
  'next': 'https://rickandmortyapi.com/api/character?page=2',
  'prev': None},
 'results': [{'id': 1,
   'name': 'Rick Sanchez',
   'status': 'Alive',
   'species': 'Human',
   'type': '',
   'gender': 'Male',
   'origin': {'name': 'Earth (C-137)',
    'url': 'https://rickandmortyapi.com/api/location/1'},
   'location': {'name': 'Citadel of Ricks',
    'url': 'https://rickandmortyapi.com/api/location/3'},
   'image': 'https://rickandmortyapi.com/api/character/avatar/1.jpeg',
   'episode': ['https://rickandmortyapi.com/api/episode/1',
    'https://rickandmortyapi.com/api/episode/2',
    'https://rickandmortyapi.com/api/episode/3',
    'https://rickandmortyapi.com/api/episode/4',
    'https://rickandmortyapi.com/api/episode/5',
    'https://rickandmortyapi.com/api/episode/6',
    'https://rickandmortyapi.com/api/episode/7',
    'https://rickandmortyapi.com/api/episode/8',
    'https://rickandmortyapi.com/api/episode/9',
    'https://rickandmortyapi.
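
<p>Printing the response and then calling <code>.json()</code> works fine when everything goes well. A slightly more defensive version of the same step could look like the sketch below. The helper name <code>fetch_json</code>, its <code>session</code> parameter, and the 10-second timeout are my own choices for illustration, not something from the tutorial.</p>

```python
import requests

def fetch_json(url, session=None, timeout=10):
    """Request `url` and return the parsed JSON body.

    Hypothetical helper: the name, the optional `session`, and the
    10-second timeout are illustrative assumptions.
    """
    # Use the given session (any object with a compatible .get()),
    # otherwise fall back to the module-level requests.get.
    http = session if session is not None else requests
    r = http.get(url, timeout=timeout)   # timeout keeps the call from hanging
    r.raise_for_status()                 # raises requests.HTTPError on 4xx/5xx
    return r.json()                      # parsed into dicts and lists

# Usage (needs network access):
# data = fetch_json('https://rickandmortyapi.com/api/character')
```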

<p>Now, let's store our result in a new variable, so that we can access the keys of the response.</p>

In [6]:
data = r.json()

In [7]:
data['info']

{'count': 826,
 'pages': 42,
 'next': 'https://rickandmortyapi.com/api/character?page=2',
 'prev': None}

<p>Great! Now, we can get hold of the number of pages, since we want to get all of the responses for all characters.</p>
<p><strong>Reminder</strong>: it's good practice to do an initial request just to see what kind of data you are dealing with. In my experience, this is particularly important, as not all APIs will let you extract data freely; you might run into query limits, for instance (they are the rule, not the exception).</p>
<p>Let's get the number of pages and store it in a variable:</p>

In [9]:
pages = data['info']['pages']
print(pages)

42
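
<p>On the query limits mentioned in the reminder above: many APIs answer with HTTP status 429 ("Too Many Requests") when you go too fast. I have not checked how this particular API behaves, so take the following as a generic sketch. The function name <code>get_with_retry</code>, the number of tries, and the wait times are all assumptions of mine.</p>

```python
import time
import requests

def get_with_retry(url, get=requests.get, sleep=time.sleep, max_tries=4, wait=1.0):
    """Return parsed JSON from `url`, retrying while the server answers 429.

    Hypothetical helper: `get` and `sleep` are parameters only so that the
    function can be tried out with stand-ins instead of real requests.
    """
    for attempt in range(max_tries):
        r = get(url, timeout=10)
        if r.status_code != 429:
            r.raise_for_status()       # any other 4xx/5xx is a real error
            return r.json()
        sleep(wait * 2 ** attempt)     # back off: 1s, 2s, 4s, ...
    raise RuntimeError(f'still rate-limited after {max_tries} tries: {url}')

# Usage (needs network access):
# data = get_with_retry('https://rickandmortyapi.com/api/character')
```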


<p>Now, let's check the actual data by tapping into <code>data['results']</code>. Let's get the first item of the list by indexing it with <code>[0]</code>.</p>

In [10]:
data['results'][0]

{'id': 1,
 'name': 'Rick Sanchez',
 'status': 'Alive',
 'species': 'Human',
 'type': '',
 'gender': 'Male',
 'origin': {'name': 'Earth (C-137)',
  'url': 'https://rickandmortyapi.com/api/location/1'},
 'location': {'name': 'Citadel of Ricks',
  'url': 'https://rickandmortyapi.com/api/location/3'},
 'image': 'https://rickandmortyapi.com/api/character/avatar/1.jpeg',
 'episode': ['https://rickandmortyapi.com/api/episode/1',
  'https://rickandmortyapi.com/api/episode/2',
  'https://rickandmortyapi.com/api/episode/3',
  'https://rickandmortyapi.com/api/episode/4',
  'https://rickandmortyapi.com/api/episode/5',
  'https://rickandmortyapi.com/api/episode/6',
  'https://rickandmortyapi.com/api/episode/7',
  'https://rickandmortyapi.com/api/episode/8',
  'https://rickandmortyapi.com/api/episode/9',
  'https://rickandmortyapi.com/api/episode/10',
  'https://rickandmortyapi.com/api/episode/11',
  'https://rickandmortyapi.com/api/episode/12',
  'https://rickandmortyapi.com/api/episode/13',
  'htt

<p>We can now store relevant info in different variables.</p>

In [12]:
name = data['results'][0]['name']
episodes = data['results'][0]['episode']

In [48]:
print(name)

Rick Sanchez


In [13]:
print(episodes)

['https://rickandmortyapi.com/api/episode/1',
 'https://rickandmortyapi.com/api/episode/2',
 'https://rickandmortyapi.com/api/episode/3',
 'https://rickandmortyapi.com/api/episode/4',
 'https://rickandmortyapi.com/api/episode/5',
 'https://rickandmortyapi.com/api/episode/6',
 'https://rickandmortyapi.com/api/episode/7',
 'https://rickandmortyapi.com/api/episode/8',
 'https://rickandmortyapi.com/api/episode/9',
 'https://rickandmortyapi.com/api/episode/10',
 'https://rickandmortyapi.com/api/episode/11',
 'https://rickandmortyapi.com/api/episode/12',
 'https://rickandmortyapi.com/api/episode/13',
 'https://rickandmortyapi.com/api/episode/14',
 'https://rickandmortyapi.com/api/episode/15',
 'https://rickandmortyapi.com/api/episode/16',
 'https://rickandmortyapi.com/api/episode/17',
 'https://rickandmortyapi.com/api/episode/18',
 'https://rickandmortyapi.com/api/episode/19',
 'https://rickandmortyapi.com/api/episode/20',
 'https://rickandmortyapi.com/api/episode/21',
 'https://rickandmorty

<p>Let's figure out how many episodes this character has appeared in.</p>

In [15]:
print(len(episodes))

51


<h2>3. Make the API request</h2>

<p>Now, we can start writing some functions to make a proper API request.</p>

<p><code>main_request()</code> takes three arguments: <code>baseurl</code>, <code>endpoint</code>, and <code>x</code>. It makes a request to the API using the base URL, the endpoint (in this case, <code>character</code>), and the number of the page you want (<code>x</code>). This will be particularly important when combined with the other functions. It returns the response body parsed by <code>.json()</code>, that is, as a Python dictionary.</p>

In [29]:
def main_request(baseurl, endpoint, x):
    r = requests.get(baseurl + endpoint + f'?page={x}')
    return r.json()
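
<p>Instead of gluing <code>?page=...</code> onto the URL by hand, <code>requests</code> can build the query string from a <code>params</code> dictionary. The sketch below shows that variant. The names <code>page_url</code> and <code>main_request_with_params</code> are mine, and <code>page_url</code> is only there to show which URL would be called, without sending anything.</p>

```python
import requests

baseurl = 'https://rickandmortyapi.com/api/'
endpoint = 'character'

def page_url(baseurl, endpoint, x):
    """Build (but do not send) the request, and return its final URL."""
    req = requests.Request('GET', baseurl + endpoint, params={'page': x})
    return req.prepare().url

def main_request_with_params(baseurl, endpoint, x):
    """Same idea as main_request(), letting requests encode the query string."""
    r = requests.get(baseurl + endpoint, params={'page': x}, timeout=10)
    return r.json()

print(page_url(baseurl, endpoint, 2))
```

<p>The printed URL should be the same one the f-string version produces, so the two functions are interchangeable for this API.</p>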

<p><code>get_pages()</code> takes <code>response</code> as its argument and returns the number of pages reported by the API. This is important when looping through the API and collecting the data from each page.</p>

In [21]:
def get_pages(response):
    return response['info']['pages']
    

<p>Finally, the function <code>parse_json()</code> iterates over <code>response['results']</code> and, for each character, builds a dictionary from the keys we choose (here, <code>item['id']</code>, <code>item['name']</code>, and the length of <code>item['episode']</code>, stored as <code>no_ep</code>). The dictionaries are collected in the list <code>charlist</code>, which the function returns.</p>

In [42]:
def parse_json(response):
    charlist = []
    for item in response['results']:
        char = {
            'id': item['id'],
            'name': item['name'],
            'no_ep': len(item['episode']),
        }

        charlist.append(char)
    return charlist
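
<p>To see what <code>parse_json()</code> returns without calling the API, you can feed it a small hand-built response. The <code>sample</code> dictionary below is trimmed to the keys the function reads, and the helper <code>episode_urls</code> is a made-up shortcut that just generates placeholder URLs so the episode counts come out at 51.</p>

```python
def parse_json(response):
    # Copied from the cell above so this block runs on its own.
    charlist = []
    for item in response['results']:
        char = {
            'id': item['id'],
            'name': item['name'],
            'no_ep': len(item['episode']),
        }
        charlist.append(char)
    return charlist

def episode_urls(n):
    # Hypothetical helper: n placeholder episode URLs, only their count matters.
    return [f'https://rickandmortyapi.com/api/episode/{i}' for i in range(1, n + 1)]

# Hand-built stand-in for one page of the API response.
sample = {
    'results': [
        {'id': 1, 'name': 'Rick Sanchez', 'episode': episode_urls(51)},
        {'id': 2, 'name': 'Morty Smith', 'episode': episode_urls(51)},
    ]
}

print(parse_json(sample))
# [{'id': 1, 'name': 'Rick Sanchez', 'no_ep': 51}, {'id': 2, 'name': 'Morty Smith', 'no_ep': 51}]
```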

<p>Now, using the number of pages as the upper bound of a range, we can iterate over all pages and collect the data from each one. We print each page number to confirm that the loop went through the whole dataset, and store all the results in a new variable <code>mainlist</code>.</p>

In [49]:
mainlist = []
for x in range(1, get_pages(data)+1):
    print(x)
    mainlist.extend(parse_json(main_request(baseurl, endpoint, x)))


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
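
<p>The loop above counts pages. Another option, which the tutorial does not use, is to follow the <code>"next"</code> link in <code>"info"</code> until it is <code>None</code>. Here is a minimal sketch of that idea. The name <code>iter_results</code> and its <code>fetch</code> parameter are my own; <code>fetch</code> is any function that takes a URL and returns the parsed JSON, and <code>fake_pages</code> is a two-page stand-in so the block runs offline.</p>

```python
def iter_results(first_url, fetch):
    """Yield every item in 'results', following info['next'] until it is None."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page['results']
        url = page['info']['next']   # None on the last page ends the loop

# Tiny fake API with two pages, so the sketch runs without network access.
fake_pages = {
    'page1': {'info': {'next': 'page2'}, 'results': [{'id': 1}, {'id': 2}]},
    'page2': {'info': {'next': None}, 'results': [{'id': 3}]},
}

ids = [item['id'] for item in iter_results('page1', fake_pages.__getitem__)]
print(ids)   # [1, 2, 3]

# With the real API (needs network access and `import requests`):
# fetch = lambda url: requests.get(url, timeout=10).json()
# everyone = list(iter_results('https://rickandmortyapi.com/api/character', fetch))
```

<p>The upside of this approach is that you never need to know the page count in advance; the downside is that you cannot show progress as "page x of 42".</p>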


<h2>4. Store results in a DataFrame, save them in a .csv file</h2>

<p>Finally, we can load the list into a pandas <code>DataFrame</code> and save it as a <code>.csv</code> file.</p>

In [45]:
char_df = pd.DataFrame(mainlist)

In [46]:
char_df.head()

Unnamed: 0,id,name,no_ep
0,1,Rick Sanchez,51
1,2,Morty Smith,51
2,3,Summer Smith,42
3,4,Beth Smith,42
4,5,Jerry Smith,39


In [41]:
char_df.tail()

Unnamed: 0,name,no_ep
821,Young Jerry,1
822,Young Beth,1
823,Young Beth,1
824,Young Jerry,1
825,Butter Robot,1
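
<p>Before saving, it may be worth checking that nothing got lost along the way: the number of rows should equal the <code>"count"</code> reported by the API, and no <code>id</code> should appear twice. The function <code>check_complete</code> is a hypothetical helper of mine, and <code>small_df</code> is a five-row stand-in built from the <code>head()</code> output above.</p>

```python
import pandas as pd

def check_complete(df, expected_count):
    """True if df has exactly expected_count rows and no repeated ids."""
    return len(df) == expected_count and bool(df['id'].is_unique)

# Five-row stand-in; with the real data you would call
# check_complete(char_df, data['info']['count']).
small_df = pd.DataFrame([
    {'id': 1, 'name': 'Rick Sanchez', 'no_ep': 51},
    {'id': 2, 'name': 'Morty Smith', 'no_ep': 51},
    {'id': 3, 'name': 'Summer Smith', 'no_ep': 42},
    {'id': 4, 'name': 'Beth Smith', 'no_ep': 42},
    {'id': 5, 'name': 'Jerry Smith', 'no_ep': 39},
])

print(check_complete(small_df, 5))   # True
print(check_complete(small_df, 6))   # False: one row would be missing
```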


In [47]:
char_df.to_csv('data/charlist.csv', index=False)
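
<p>One thing that tripped me up: <code>to_csv()</code> does not create the <code>data/</code> folder for you, so the call fails if the folder is missing. A small sketch that creates it first follows; <code>save_csv</code> is a hypothetical helper name, and the temporary directory is only there so the example does not leave files behind.</p>

```python
import tempfile
from pathlib import Path

import pandas as pd

def save_csv(df, path):
    """Create the parent folder if needed, then write df without the index."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return path

# Demo in a throwaway directory; with the notebook's data this would be
# save_csv(char_df, 'data/charlist.csv').
demo_df = pd.DataFrame([{'id': 1, 'name': 'Rick Sanchez', 'no_ep': 51}])
with tempfile.TemporaryDirectory() as tmp:
    out = save_csv(demo_df, Path(tmp) / 'data' / 'charlist.csv')
    reloaded = pd.read_csv(out)   # read it back to confirm the round trip

print(list(reloaded.columns))   # ['id', 'name', 'no_ep']
```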

<p>And that's it! That was a great tutorial, and hopefully I'll get to use it in my side projects!</p>