# HTTP and APIs

In [2]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

## Web Servers and the Web

![](images/wikipedia_main.png)

## HTTP

![inline](images/wikipedia_404.png)



### HTTP Status Codes

(from  http://www.garshol.priv.no/download/text/http-tut.htm)

- 200 OK:
    Means that the server did whatever the client wanted it to, and all is well.

- 201 Created:
    The request has been fulfilled and resulted in a new resource being created. The newly created resource can be referenced by the URI(s) returned in the entity of the response, with the most specific URI for the resource given by a Location header field.

- 400 Bad Request:
    The request sent by the client didn't have the correct syntax.

- 401 Unauthorized:
    Means that the client is not allowed to access the resource. This may change if the client retries with an authorization header.

- 403 Forbidden:
    The client is not allowed to access the resource and authorization will not help.

- 404 Not Found:
    Seen this one before? :) It means that the server has not heard of the resource and has no further clues as to what the client should do about it. In other words: dead link.

- 500 Internal Server Error:
    Something went wrong inside the server.

- 501 Not Implemented:
    The request method is not supported by the server.
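The codes listed above fall into families by their first digit (2xx success, 4xx client error, 5xx server error). As a small sketch, the standard library's `http.HTTPStatus` enum can look up the official phrase for a code; the helper name `describe_status` is made up for this illustration.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a label such as '404 Not Found (client error)'."""
    status = HTTPStatus(code)  # raises ValueError for codes the enum does not know
    if 200 <= code < 300:
        family = "success"
    elif 400 <= code < 500:
        family = "client error"
    elif 500 <= code < 600:
        family = "server error"
    else:
        family = "other"
    return f"{code} {status.phrase} ({family})"


# The codes discussed in the list above
for code in (200, 201, 400, 401, 403, 404, 500, 501):
    print(describe_status(code))
```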

## Using `requests`

You will often need to retrieve data from the Internet. Several Python libraries have been developed over the years to do exactly that: `urllib` in the standard library (which absorbed Python 2's `urllib2`) and the third-party `urllib3`.

However, these libraries are very low-level and somewhat hard to use. They become especially cumbersome when you need to issue POST requests or authenticate against a web service.

Luckily, as with most tasks in Python, someone has developed a library that simplifies these tasks. Get acquainted with `requests` as soon as possible, since you will probably need it in the future.

In [5]:
import requests

Now that the `requests` library has been imported into our namespace, we can use the functions it offers.

In this case we'll use the appropriately named `get` function to issue a *GET* request. This is equivalent to typing a URL into your browser and hitting enter.

### GET

In [10]:
resp = requests.get("https://en.wikipedia.org/wiki/Harvard_University")
resp

<Response [200]>

Python is an object-oriented language, and everything in it is an object. Even built-in functions such as `len` simply delegate to a special method on the object: `len(x)` calls the object's `__len__` method.

We will not dwell too long on OO concepts, but some of Python's idiosyncrasies will be easier to understand if we spend a few minutes on this subject.

When you evaluate an object by itself in a notebook cell or at the interactive prompt, such as the `resp` object we created above, Python displays the result of that object's `__repr__()` method (`print` uses `__str__()`, which falls back to `__repr__()` when it is not defined). The default values for these methods are usually very simple and boring. The `resp` object, however, has a custom implementation that shows the object type (i.e. `Response`) and the HTTP status code (200 means the request was successful).
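To make the special-method mechanism concrete, here is a minimal sketch with a toy class. `Reply` is a hypothetical stand-in invented for this example; it is not part of `requests` and only mimics the way a custom `__repr__` changes what gets displayed.

```python
class Reply:
    """Toy stand-in for a response object (hypothetical, not from requests)."""

    def __init__(self, status_code, body):
        self.status_code = status_code
        self.body = body

    def __repr__(self):
        # Used when the object is displayed at the prompt or in a notebook cell
        return f"<Reply [{self.status_code}]>"

    def __len__(self):
        # Called by the built-in len()
        return len(self.body)


reply = Reply(200, "hello")
print(repr(reply))  # uses __repr__
print(str(reply))   # no __str__ defined, so Python falls back to __repr__
print(len(reply))   # delegates to __len__
```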

Just to confirm, we will call the `type` function on the object to make sure it agrees with the value above.

In [51]:
type(resp)

requests.models.Response

In [11]:
resp.status_code

200

![](images/hwiki.png)

Right now `resp` holds a reference to a *Response* object, but we are interested in the text associated with the web page, not the object itself.

So the next step is to look at the `text` property of this `Response` object. The page is long, so we show only a 2,000-character slice of it.

In [16]:
resp.text[6000:8000]

'pload.wikimedia.org"/>\n<link rel="alternate" media="only screen and (max-width: 720px)" href="//en.m.wikipedia.org/wiki/Harvard_University"/>\n<link rel="apple-touch-icon" href="/static/apple-touch/wikipedia.png"/>\n<link rel="shortcut icon" href="/static/favicon/wikipedia.ico"/>\n<link rel="search" type="application/opensearchdescription+xml" href="/w/opensearch_desc.php" title="Wikipedia (en)"/>\n<link rel="EditURI" type="application/rsd+xml" href="//en.wikipedia.org/w/api.php?action=rsd"/>\n<link rel="license" href="//creativecommons.org/licenses/by-sa/3.0/"/>\n<link rel="canonical" href="https://en.wikipedia.org/wiki/Harvard_University"/>\n<link rel="dns-prefetch" href="//login.wikimedia.org"/>\n<link rel="dns-prefetch" href="//meta.wikimedia.org" />\n</head>\n<body class="mediawiki ltr sitedir-ltr mw-hide-empty-elt ns-0 ns-subject page-Harvard_University rootpage-Harvard_University skin-vector action-view skin-vector-legacy"><div id="mw-page-base" class="noprint"></div>\n<div id

## APIs

In [36]:
#http://www.mediawiki.org/wiki/API:Main_page
import requests
#from urllib.parse import quote
WIKIPEDIA='http://en.wikipedia.org/w/api.php'
querydict={'action':'query', 'format':'json', 'prop':'revisions', 'rvprop':'content', 'titles':'Capitol Attack'}
#querydict={k:quote(v) for k, v in querydict.items()}
querydict

{'action': 'query',
 'format': 'json',
 'prop': 'revisions',
 'rvprop': 'content',
 'titles': 'Capitol Attack'}

In [37]:
query_string = "?"+"&".join([k+"="+v for k,v in querydict.items()])
r = requests.get(WIKIPEDIA+query_string)
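Joining keys and values by hand leaves characters such as the space in `'Capitol Attack'` unescaped in the string you build. As a hedged alternative, the sketch below uses `urllib.parse.urlencode` from the standard library to escape each value; `requests.get` also accepts a `params=` dictionary and performs this encoding for you. The variable names mirror the cell above.

```python
from urllib.parse import urlencode

WIKIPEDIA = "http://en.wikipedia.org/w/api.php"
querydict = {
    "action": "query",
    "format": "json",
    "prop": "revisions",
    "rvprop": "content",
    "titles": "Capitol Attack",
}

# urlencode escapes each key and value and joins the pairs with "&"
query_string = "?" + urlencode(querydict)
print(WIKIPEDIA + query_string)
```

With `requests`, the same request can be written as `requests.get(WIKIPEDIA, params=querydict)`.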

In [38]:
r.json()

{'batchcomplete': '',
 'query': {'pages': {'-1': {'ns': 0,
    'title': 'Capitol Attack',
    'missing': ''}}}}
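In the reply above, the only entry under `pages` has the id `'-1'` and carries a `'missing'` key, which suggests that no page with exactly that title was found. A minimal sketch for detecting this case in code follows; the helper name `missing_titles` is made up for illustration, and the payload is copied from the output above.

```python
# Payload copied from the r.json() output above
payload = {
    "batchcomplete": "",
    "query": {"pages": {"-1": {"ns": 0, "title": "Capitol Attack", "missing": ""}}},
}


def missing_titles(payload):
    """Return the titles of all pages flagged with a 'missing' key."""
    pages = payload.get("query", {}).get("pages", {})
    return [page["title"] for page in pages.values() if "missing" in page]


print(missing_titles(payload))
```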

In [47]:
APISTART="https://api.github.com/"
resp_api = requests.get(APISTART+"users/rahuldave")
resp_api

<Response [200]>

In [48]:
resp_api.text

'{"login":"rahuldave","id":43227,"node_id":"MDQ6VXNlcjQzMjI3","avatar_url":"https://avatars3.githubusercontent.com/u/43227?v=4","gravatar_id":"","url":"https://api.github.com/users/rahuldave","html_url":"https://github.com/rahuldave","followers_url":"https://api.github.com/users/rahuldave/followers","following_url":"https://api.github.com/users/rahuldave/following{/other_user}","gists_url":"https://api.github.com/users/rahuldave/gists{/gist_id}","starred_url":"https://api.github.com/users/rahuldave/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/rahuldave/subscriptions","organizations_url":"https://api.github.com/users/rahuldave/orgs","repos_url":"https://api.github.com/users/rahuldave/repos","events_url":"https://api.github.com/users/rahuldave/events{/privacy}","received_events_url":"https://api.github.com/users/rahuldave/received_events","type":"User","site_admin":false,"name":"Rahul Dave","company":"Harvard University/univ.ai","blog":"https://univ.ai","loca

In [49]:
resp_api.json()

{'login': 'rahuldave',
 'id': 43227,
 'node_id': 'MDQ6VXNlcjQzMjI3',
 'avatar_url': 'https://avatars3.githubusercontent.com/u/43227?v=4',
 'gravatar_id': '',
 'url': 'https://api.github.com/users/rahuldave',
 'html_url': 'https://github.com/rahuldave',
 'followers_url': 'https://api.github.com/users/rahuldave/followers',
 'following_url': 'https://api.github.com/users/rahuldave/following{/other_user}',
 'gists_url': 'https://api.github.com/users/rahuldave/gists{/gist_id}',
 'starred_url': 'https://api.github.com/users/rahuldave/starred{/owner}{/repo}',
 'subscriptions_url': 'https://api.github.com/users/rahuldave/subscriptions',
 'organizations_url': 'https://api.github.com/users/rahuldave/orgs',
 'repos_url': 'https://api.github.com/users/rahuldave/repos',
 'events_url': 'https://api.github.com/users/rahuldave/events{/privacy}',
 'received_events_url': 'https://api.github.com/users/rahuldave/received_events',
 'type': 'User',
 'site_admin': False,
 'name': 'Rahul Dave',
 'company': 'H
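The two cells above show the same data in two forms: `resp_api.text` is one long string, while `resp_api.json()` is a Python dictionary. Conceptually, the second is what you get by parsing the first as JSON. The sketch below illustrates this with the standard library's `json` module on a shortened string whose fields are taken from the output above.

```python
import json

# A shortened JSON string with a few fields from the output above
raw = '{"login":"rahuldave","id":43227,"type":"User","site_admin":false}'

parsed = json.loads(raw)  # str -> dict

print(type(raw).__name__, type(parsed).__name__)
print(parsed["login"], parsed["id"])
# JSON false becomes Python False
print(parsed["site_admin"])
```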

Most APIs have security restrictions:

![](images/github_personal_token.png)

You can create your own at [https://github.com/settings/tokens](https://github.com/settings/tokens). Here I chose a very low access scope so you can't mess with my information; you can only see my public repositories.

In [50]:
access_token="b325143eca7a84e28b4d217a6455f09014076ecb"
headers = {'Authorization': 'token '+access_token}
the_json = requests.get(APISTART+"user", headers=headers).json()
the_json

{'login': 'rahuldave',
 'id': 43227,
 'node_id': 'MDQ6VXNlcjQzMjI3',
 'avatar_url': 'https://avatars3.githubusercontent.com/u/43227?v=4',
 'gravatar_id': '',
 'url': 'https://api.github.com/users/rahuldave',
 'html_url': 'https://github.com/rahuldave',
 'followers_url': 'https://api.github.com/users/rahuldave/followers',
 'following_url': 'https://api.github.com/users/rahuldave/following{/other_user}',
 'gists_url': 'https://api.github.com/users/rahuldave/gists{/gist_id}',
 'starred_url': 'https://api.github.com/users/rahuldave/starred{/owner}{/repo}',
 'subscriptions_url': 'https://api.github.com/users/rahuldave/subscriptions',
 'organizations_url': 'https://api.github.com/users/rahuldave/orgs',
 'repos_url': 'https://api.github.com/users/rahuldave/repos',
 'events_url': 'https://api.github.com/users/rahuldave/events{/privacy}',
 'received_events_url': 'https://api.github.com/users/rahuldave/received_events',
 'type': 'User',
 'site_admin': False,
 'name': 'Rahul Dave',
 'company': 'H
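Pasting a token into a notebook exposes it to anyone who can read the file. One common alternative, sketched here, is to read it from an environment variable. The variable name `GITHUB_TOKEN` and the helper `build_auth_headers` are assumptions made for this example; the header format is the one used in the cell above.

```python
import os


def build_auth_headers(token):
    """Build the Authorization header in the same format as the cell above."""
    return {"Authorization": "token " + token}


# GITHUB_TOKEN is an assumed name; set it in your shell before starting the notebook
token = os.environ.get("GITHUB_TOKEN", "")
if token:
    headers = build_auth_headers(token)
    print("Authorization header ready")
else:
    print("GITHUB_TOKEN is not set; authenticated requests will not work")
```

The resulting `headers` dictionary can be passed to `requests.get(..., headers=headers)` exactly as before.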

In [45]:
the_json['followers']

317