# Getting data from Web APIs

Most websites and web applications nowadays provide access to their data and functionality through the REST architectural style. The essential feature of REST is that it is accessed over the HTTP protocol, i.e. you invoke an operation by sending a request to its URL.
While a REST operation can be used for reading data or for creating and modifying it, here we focus only on reading data.

NOTE: The data in this notebook is downloaded live, so some cells may fail or show different output if data has been removed from the site.

## A simple example

With a REST API, we first need to learn which endpoints or objects it offers. As an example, let's look at the RESTful API of 4chan: https://github.com/4chan/4chan-API

Its documentation lists several endpoints, among them:

A list of threads:

http(s)://a.4cdn.org/board/threads.json

A list of archived thread IDs can be found here:

http(s)://a.4cdn.org/board/archive.json

A list of boards is exposed at the following URL:

http(s)://a.4cdn.org/boards.json
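
The endpoints above follow a regular pattern, so their URLs can be assembled programmatically. The sketch below is only an illustration: it assumes that `board` in the URLs above is a placeholder for a board's short name (such as `an`), and the names `BASE_URL` and `board_endpoint` are hypothetical, not part of the 4chan API. The code only builds strings and does not contact the server.

```python
# Hypothetical helper: builds the endpoint URLs listed above for a given board.
BASE_URL = "https://a.4cdn.org"


def board_endpoint(board, resource):
    """Return the URL of a per-board resource such as 'threads' or 'archive'."""
    return "{}/{}/{}.json".format(BASE_URL, board, resource)


# The list of boards is global, so it does not take a board name.
boards_url = BASE_URL + "/boards.json"

print(board_endpoint("an", "threads"))
print(board_endpoint("an", "archive"))
print(boards_url)
```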

## Getting a piece of data

Getting data is as simple as requesting one of these URIs.

In [1]:
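import requests as rq

# Request the list of boards and inspect the response.
r = rq.get('https://a.4cdn.org/boards.json')
print(r.status_code)                # HTTP status; 200 means success
print(r.headers['content-type'])    # media type of the body
print(r.text[:1000])                # first 1000 characters of the body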

200
application/json
{"boards":[{"board":"3","title":"3DCG","ws_board":1,"per_page":15,"pages":10,"max_filesize":4194304,"max_webm_filesize":3145728,"max_comment_chars":2000,"max_webm_duration":120,"bump_limit":310,"image_limit":150,"cooldowns":{"threads":600,"replies":60,"images":60},"meta_description":"\u0026quot;\/3\/ - 3DCG\u0026quot; is 4chan's board for 3D modeling and imagery.","is_archived":1},{"board":"a","title":"Anime \u0026 Manga","ws_board":1,"per_page":15,"pages":10,"max_filesize":4194304,"max_webm_filesize":3145728,"max_comment_chars":2000,"max_webm_duration":120,"bump_limit":500,"image_limit":250,"cooldowns":{"threads":600,"replies":60,"images":60},"meta_description":"\u0026quot;\/a\/ - Anime \u0026amp; Manga\u0026quot; is 4chan's imageboard dedicated to the discussion of Japanese animation and manga.","spoilers":1,"custom_spoilers":1,"is_archived":1},{"board":"aco","title":"Adult Cartoons","ws_board":0,"per_page":15,"pages":10,"max_filesize":4194304,"max_webm_filesize"
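
A live request can fail or hang, so it may be worth wrapping the call in a small helper. The following is a minimal sketch, not the notebook's method: `fetch_json` and its `timeout` and `get` parameters are hypothetical names chosen here, while `raise_for_status()` and `json()` are standard methods of a `requests` response.

```python
def fetch_json(url, timeout=10, get=None):
    """Request `url` and return the decoded JSON body.

    `get` defaults to requests.get; it is a parameter only so that the
    function can be tried with a stand-in that does not touch the network.
    """
    if get is None:
        import requests  # imported lazily: only needed for real requests
        get = requests.get
    response = get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx answers instead of parsing an error page.
    response.raise_for_status()
    return response.json()
```

With network access, `fetch_json('https://a.4cdn.org/boards.json')` would be expected to return the parsed list of boards as a Python dictionary.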

## Parsing JSON

JavaScript Object Notation (JSON) is a format for encoding data hierarchically, similar in purpose to XML. Many REST APIs return JSON objects, as seen in the example above. We can parse JSON objects from inside Python.

In [2]:
import json
boards = json.loads(r.text)   # parse the response body into Python objects
print(type(boards))
# print(json.dumps(boards, indent=2))   # uncomment to pretty-print the whole document

<type 'dict'>


## Dictionaries in Python

JSON objects are essentially dictionaries, i.e. maps from keys (strings) to values, and a value can itself be another dictionary. For example, in the JSON above the key "board" has a string as its value, whereas the key "cooldowns" maps to another dictionary whose key-value pairs group related settings.

Python has a dictionary type that is very similar to JSON structures.

In [3]:
myoffer = {"title": "Data scientist", "job_type": "full-time"}
print(myoffer["job_type"])        # look up a value by its key
myoffer["salary_max"] = 200000    # assigning to a new key adds an entry
print(len(myoffer))               # number of key-value pairs

full-time
3
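
Nested dictionaries such as "cooldowns" are read by chaining lookups. The sketch below uses a hand-trimmed copy of the first board from the response shown earlier, so it runs without a network connection. The variable name `board` is chosen here for illustration.

```python
# A trimmed copy of one board entry from the response shown earlier.
board = {
    "board": "3",
    "title": "3DCG",
    "cooldowns": {"threads": 600, "replies": 60, "images": 60},
}

# Chained indexing walks into the nested dictionary.
print(board["cooldowns"]["threads"])

# .get() returns a default instead of raising KeyError when a key is missing.
print(board.get("spoilers", 0))

# Iterate over the key-value pairs of the inner dictionary.
for action, seconds in sorted(board["cooldowns"].items()):
    print("{}: {}".format(action, seconds))
```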


## Converting a JSON document to a Dictionary

The <code>json.loads</code> function converts JSON documents into Python data types using the following conventions:
    https://docs.python.org/3/library/json.html#json.loads

The boards response above is a JSON object, so it is converted directly to a dict. Its "boards" key holds a JSON array, which becomes a list of dicts, one per board.

In [4]:
boards["boards"][4]

{u'board': u'an',
 u'bump_limit': 310,
 u'cooldowns': {u'images': 60, u'replies': 60, u'threads': 600},
 u'image_limit': 150,
 u'is_archived': 1,
 u'max_comment_chars': 2000,
 u'max_filesize': 4194304,
 u'max_webm_duration': 120,
 u'max_webm_filesize': 3145728,
 u'meta_description': u"&quot;/an/ - Animals &amp; Nature&quot; is 4chan's imageboard for posting pictures of animals, pets, and nature.",
 u'pages': 10,
 u'per_page': 15,
 u'title': u'Animals & Nature',
 u'ws_board': 1}
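
To make the conversion conventions concrete, here is a small sketch using an invented JSON document. Its keys are made up for illustration and do not come from the 4chan API. Under Python 3, objects become `dict`, arrays become `list`, strings become `str`, numbers become `int` or `float`, `true`/`false` become `bool`, and `null` becomes `None`.

```python
import json

# A small hand-written document covering every JSON value type.
document = '''
{
  "name": "example",
  "count": 10,
  "ratio": 0.5,
  "active": true,
  "missing": null,
  "nested": {"threads": 600},
  "items": ["first", "second"]
}
'''

parsed = json.loads(document)

# Print the Python type that each JSON value was converted to.
for key in sorted(parsed):
    print("{:8} -> {}".format(key, type(parsed[key]).__name__))
```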

## Transforming data into DataFrames

We can build a data frame from JSON data by first creating an empty frame with the desired columns (Series) and then adding one row per JSON fragment retrieved.

In [5]:
import pandas as pd

# Start with an empty frame that only defines the columns we want to keep.
boardsframe = pd.DataFrame(columns=["title", "meta_description", "image_limit", "threads_cd"])

def add_row(df, board):
    """Add one board (a dict parsed from JSON) as a row indexed by its short name."""
    title = board["title"]
    metadesc = board["meta_description"]
    imagelim = int(board["image_limit"])
    threadscd = int(board["cooldowns"]["threads"])   # nested dictionary lookup

    df.loc[board["board"]] = [title, metadesc, imagelim, threadscd]

for row in boards["boards"]:
    add_row(boardsframe, row)

boardsframe.head(10)


| | title | meta_description | image_limit | threads_cd |
|---|---|---|---|---|
| 3 | 3DCG | &quot;/3/ - 3DCG&quot; is 4chan's board for 3D... | 150.0 | 600.0 |
| a | Anime & Manga | &quot;/a/ - Anime &amp; Manga&quot; is 4chan's... | 250.0 | 600.0 |
| aco | Adult Cartoons | &quot;/aco/ - Adult Cartoons&quot; is 4chan's ... | 250.0 | 600.0 |
| adv | Advice | &quot;/adv/ - Advice&quot; is 4chan's board fo... | 150.0 | 600.0 |
| an | Animals & Nature | &quot;/an/ - Animals &amp; Nature&quot; is 4ch... | 150.0 | 600.0 |
| asp | Alternative Sports & Wrestling | &quot;/asp/ - Alternative Sports &amp; Wrestli... | 150.0 | 600.0 |
| b | Random | &quot;/b/ - Random&quot; is the birthplace of ... | 150.0 | 60.0 |
| biz | Business & Finance | &quot;/biz/ - Business &amp; Finance&quot; is ... | 150.0 | 600.0 |
| c | Anime/Cute | &quot;/c/ - Anime/Cute&quot; is 4chan's imageb... | 150.0 | 600.0 |
| cgl | Cosplay & EGL | &quot;/cgl/ - Cosplay &amp; EGL&quot; is 4chan... | 150.0 | 600.0 |
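
Adding rows one at a time works, but pandas can also build the frame from the list of dictionaries in one step. The sketch below is an alternative, not the approach used in this notebook. It assumes pandas 1.0 or later, where `pd.json_normalize` is available at the top level, and it uses two hand-trimmed records copied from the response shown earlier so that it runs offline. The names `records` and `frame` are illustrative.

```python
import pandas as pd

# Two entries trimmed from the boards response shown earlier.
records = [
    {"board": "3", "title": "3DCG", "image_limit": 150,
     "cooldowns": {"threads": 600, "replies": 60, "images": 60}},
    {"board": "a", "title": "Anime & Manga", "image_limit": 250,
     "cooldowns": {"threads": 600, "replies": 60, "images": 60}},
]

# json_normalize flattens nested dictionaries into dotted column names,
# e.g. cooldowns -> cooldowns.threads, cooldowns.replies, cooldowns.images.
frame = pd.json_normalize(records).set_index("board")
frame = frame.rename(columns={"cooldowns.threads": "threads_cd"})

print(frame[["title", "image_limit", "threads_cd"]])

# Boolean indexing keeps only the rows where the condition holds.
print(len(frame[frame["image_limit"] > 200]))
```

Because the values arrive as integers in the JSON, the numeric columns keep an integer dtype here instead of turning into floats.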


In [6]:
print(len(boardsframe))   # number of boards (rows)

71


In [7]:
print(len(boardsframe[boardsframe["image_limit"] > 200]))   # boards allowing more than 200 images

31


## Paginating

Most REST APIs use pagination so that you can retrieve the data in small chunks. In the case of 4chan, the following operation uses pagination:
```
http(s)://a.4cdn.org/board/pagenumber.json (1 is main index)
```


We know from the examples above that the /an/ board has 10 pages, so we can iterate over them as follows:

In [8]:
for p in range(1, 11):   # pages 1 to 10; the end of range() is exclusive
    page = rq.get('https://a.4cdn.org/an/' + str(p) + '.json')
    contents = json.loads(page.text)
    # print(contents)
    first_post = contents["threads"][0]["posts"][0]   # opening post of the first thread
    print(first_post["name"])
    print("---------------------------")
    print(first_post["com"])
    print("---------------------------")
    print(contents["threads"][0]["posts"][0:2])

Anonymous
---------------------------
So, about a week and a half ago, I was opening the fridge for something or other, and my pet rabbit, Dobbit, dashes into the fridge and clambers onto one of the shelves.<br><br>I pull him out, get my food, forget about it. Over the past period, he keeps running inside, on average 1-2 times a day, and I would shoo him out. I figured that there was something in there he could smell, so today I just tried rewarding his persistence and letting him nibble whatever it was he so badly wanted.<br><br>So today, when he hopped in, I didn&#039;t chase him out. And he didn&#039;t go for anything. He just kind of sat on one of the shelves and grunted contentedly.<br><br>Anyone ever deal with similar behavior? I&#039;m stumped at this one.
---------------------------
[{u'bumplimit': 0, u'name': u'Anonymous', u'no': 2260014, u'tn_w': 177, u'h': 376, u'time': 1478566948, u'fsize': 23863, u'replies': 5, u'imagelimit': 0, u'filename': u'Confusedface', u'tim': 147856
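
The page loop can also be packaged as a generator that is independent of how pages are downloaded. This is a sketch under stated assumptions: `iter_first_posts`, `fetch` and `fake_fetch` are hypothetical names, the page structure is the one visible in the output above, and the idea that "com" may be absent for posts without comment text is an assumption handled defensively with `.get()`.

```python
def iter_first_posts(board, pages, fetch):
    """Yield (page number, opening post) for the first thread of each page.

    `fetch` is any callable that takes a URL and returns the decoded JSON
    dictionary, for instance a wrapper around requests.get(...).json().
    """
    for number in range(1, pages + 1):  # pages are numbered from 1
        url = "https://a.4cdn.org/{}/{}.json".format(board, number)
        contents = fetch(url)
        threads = contents.get("threads", [])
        if not threads:
            continue  # nothing on this page; skip it rather than fail
        yield number, threads[0]["posts"][0]


# Stand-in for the network: returns a canned page in the shape shown above.
def fake_fetch(url):
    return {"threads": [{"posts": [{"name": "Anonymous", "com": url}]}]}


for number, post in iter_first_posts("an", 3, fake_fetch):
    # Assumption: "com" may be missing, so fall back to an empty string.
    print("{} {} {}".format(number, post["name"], post.get("com", "")))
```

Passing a real download function in place of `fake_fetch` would run the same loop against the live API.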