#  Web APIs

An API, or application programming interface, is the way programs communicate with one another.

Web APIs are the way programs communicate with one another _over the internet_.

[RESTful](https://en.wikipedia.org/wiki/Representational_state_transfer) APIs respect a series of design principles that make them simple to use.

The basic tools we are going to use are `POST` and `GET` requests to URLs we'll specify, and JSON objects that we'll either receive in the response or send as the payload (in a `POST` request, for example).

In [40]:
import requests

response = requests.get('http://elpais.com')

response.content[:1000]

b'<!DOCTYPE html>\n<html lang="es">\n<head>\n<meta charset="utf-8">\n<meta http-equiv="X-UA-Compatible" content="IE=edge"><script type="text/javascript">(window.NREUM||(NREUM={})).loader_config={xpid:"VQEDUVdSCxAIVVVUBggHVw=="};window.NREUM||(NREUM={}),__nr_require=function(t,e,n){function r(n){if(!e[n]){var o=e[n]={exports:{}};t[n][0].call(o.exports,function(e){var o=t[n][1][e];return r(o||e)},o,o.exports)}return e[n].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<n.length;o++)r(n[o]);return r}({1:[function(t,e,n){function r(t){try{c.console&&console.log(t)}catch(e){}}var o,i=t("ee"),a=t(20),c={};try{o=localStorage.getItem("__nr_flags").split(","),console&&"function"==typeof console.log&&(c.console=!0,o.indexOf("dev")!==-1&&(c.dev=!0),o.indexOf("nr_dev")!==-1&&(c.nrDev=!0))}catch(s){}c.nrDev&&i.on("internal-error",function(t){r(t.stack)}),c.dev&&i.on("fn-err",function(t,e,n){r(n.stack)}),c.dev&&(r("NR AGENT IN DEVELOPMENT MODE"),r("flags: "+a(c,function(t

[This](http://api.open-notify.org/) is an API that returns the current position of the ISS:

In [8]:
r = requests.get('http://api.open-notify.org/iss-now.json')

r.status_code

200

In [9]:
r.content

b'{"message": "success", "timestamp": 1527927314, "iss_position": {"longitude": "69.2460", "latitude": "-11.0559"}}'

In [10]:
type(r.content)

bytes

## JSON

We can convert a json-formatted string such as the one we get in the response into a Python object with the json library.

JSON (JavaScript Object Notation) is a text format whose syntax is derived from JavaScript object literals.

JSON is widely used as a data exchange format. You convert to it when you need to make the data available externally. 

Its syntax is very similar to Python's dict and list literals, with small differences. For example, `true` and `false` are written in lower case, `None` becomes `null`, and single quotes are not acceptable string delimiters.

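The top level of a JSON document is almost always an object (a dictionary in Python) or an array (a list), within which anything can be nested as deeply as we want. Strictly speaking, a bare string, number, `true`, `false` or `null` is also a valid JSON document, but APIs rarely return one.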
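The differences from Python syntax mentioned above can be checked directly with the standard `json` module. This is a minimal sketch; the keys in the sample document are made up for illustration.

```python
import json

# JSON literals map onto Python values: true/false/null -> True/False/None
parsed = json.loads('{"open": true, "closed": false, "owner": null}')
print(parsed)

# Single-quoted strings are valid Python but not valid JSON
try:
    json.loads("{'open': true}")
    single_quotes_ok = True
except json.JSONDecodeError as err:
    single_quotes_ok = False
    print("Invalid JSON:", err)
```

Catching `json.JSONDecodeError` is a reasonable way to handle responses that claim to be JSON but are not.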

In [15]:
import json

json.loads(r.content)

{'iss_position': {'latitude': '-11.0559', 'longitude': '69.2460'},
 'message': 'success',
 'timestamp': 1527927314}

In [16]:
type(json.loads(r.content))

dict

In [18]:
type(json.loads('[1,2,3]'))

list

In [19]:
json.loads(r.content)['iss_position']

{'latitude': '-11.0559', 'longitude': '69.2460'}
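Note that the coordinates come back as strings, not numbers. The sketch below works offline on a copy of the response body shown earlier (so the variable `raw` is a stand-in for `r.content`) and converts them to floats.

```python
import json

# Copy of the response body shown earlier, so no network access is needed
raw = b'{"message": "success", "timestamp": 1527927314, "iss_position": {"longitude": "69.2460", "latitude": "-11.0559"}}'

# json.loads accepts bytes as well as str (Python 3.6 and later)
data = json.loads(raw)

# The API sends the coordinates as strings, so convert them before doing arithmetic
latitude = float(data['iss_position']['latitude'])
longitude = float(data['iss_position']['longitude'])
print(latitude, longitude)
```

With a live response, `r.json()` performs the decoding and parsing in a single call.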

We also can go in the other direction and generate json-formatted strings from Python objects:

In [21]:
my_dict = {'Acelgas' : False, 'Bacon' : True}

json.dumps(my_dict)

'{"Acelgas": false, "Bacon": true}'

#### Exercise:
Write a function that returns the duration of the next 5 overhead passes of the ISS for a given latitude and longitude. Use http://open-notify.org/Open-Notify-API/ISS-Pass-Times/. We are going to need to encode the parameters in the URL as per the specification.

For example, for Madrid:

http://api.open-notify.org/iss-pass.json?lat=40.4&lon=-3.7

In [31]:
def next_5_durations(latitude, longitude):
    # Encode the coordinates by hand as query parameters in the URL
    url = 'http://api.open-notify.org/iss-pass.json?lat=%.3f&lon=%.3f' % (latitude, longitude)

    r = requests.get(url)

    # The body is a JSON document; the passes are listed under 'response'
    data = json.loads(r.content)

    durations = [iss_pass['duration'] for iss_pass in data['response']]

    return durations

next_5_durations(40.4, -3.7)

[466, 640, 593, 537, 590]

Although we managed to get the response, encoding larger sets of parameters by hand quickly becomes tedious and error-prone. Thankfully, the `requests` library can do that work for us.

In [36]:
madrid_coords = {'lat': 40.4, 'lon' : -3.7}

r = requests.get('http://api.open-notify.org/iss-pass.json', params=madrid_coords)

json.loads(r.content)

{'message': 'success',
 'request': {'altitude': 100,
  'datetime': 1527928098,
  'latitude': 40.4,
  'longitude': -3.7,
  'passes': 5},
 'response': [{'duration': 466, 'risetime': 1527950162},
  {'duration': 641, 'risetime': 1527955830},
  {'duration': 593, 'risetime': 1527961661},
  {'duration': 537, 'risetime': 1527967536},
  {'duration': 590, 'risetime': 1527973358}]}
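To see what this kind of encoding involves, here is a small offline sketch using `urllib.parse.urlencode` from the standard library. It illustrates the idea only and is not a claim about how `requests` implements it internally; the `q` parameter is a made-up example.

```python
from urllib.parse import urlencode

madrid_coords = {'lat': 40.4, 'lon': -3.7}

# Build the query string from a dict instead of formatting it by hand
query = urlencode(madrid_coords)
print(query)  # lat=40.4&lon=-3.7

# Characters with a special meaning in URLs are escaped for us
tricky = urlencode({'q': 'S&P 500'})
print(tricky)  # q=S%26P+500

url = 'http://api.open-notify.org/iss-pass.json?' + query
print(url)
```

The second call shows the sort of detail that is easy to get wrong by hand: an unescaped `&` inside a value would be read as the start of a new parameter.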

Some APIs need even more complicated sets of parameters. When that is the case, API designers often require them in JSON format, sent in the body of a `POST` request.

For example, take a look at the [QPX API from Google](https://developers.google.com/qpx-express/v1/trips/search). In the documentation, they define the body of the request, which we will have to provide, and of the response, which they'll provide back.

In [39]:
help(requests.post)

Help on function post in module requests.api:

post(url, data=None, json=None, **kwargs)
    Sends a POST request.
    
    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
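To make the idea of a JSON body concrete, the sketch below builds (but does not send) a `POST` request with the standard library, so it runs without network access. The URL and the payload fields are hypothetical. As far as the `requests` side goes, passing `json=payload` to `requests.post` serialises the payload and sets the `Content-Type` header for you, which is roughly what is spelled out by hand here.

```python
import json
from urllib.request import Request

# Hypothetical endpoint and payload, for illustration only
url = 'https://example.com/api/search'
payload = {'origin': 'MAD', 'destination': 'SFO', 'passengers': 2}

# The body of the request is the JSON text, encoded as bytes
body = json.dumps(payload).encode('utf-8')

# Build the request object without sending it
req = Request(url, data=body,
              headers={'Content-Type': 'application/json'},
              method='POST')

print(req.get_method())
print(req.get_header('Content-type'))
print(req.data)
```

Decoding `req.data` with `json.loads` gives back the original dictionary, which is what the server would do on its side.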



# Web scraping

![HTML to DOM](http://www.cs.toronto.edu/~shiva/cscb07/img/dom/treeStructure.png)

![DOM TREE](http://www.openbookproject.net/tutorials/getdown/css/images/lesson4/HTMLDOMTree.png)



In [41]:
from bs4 import BeautifulSoup

In [46]:
url = 'https://aflcio.org/what-unions-do/social-economic-justice/advocacy/legislative-alerts'

r = requests.get(url)

page = r.content

page[:1000]

b'<!DOCTYPE html>\n<html lang="en" dir="ltr" xmlns:article="http://ogp.me/ns/article#" xmlns:book="http://ogp.me/ns/book#" xmlns:product="http://ogp.me/ns/product#" xmlns:profile="http://ogp.me/ns/profile#" xmlns:video="http://ogp.me/ns/video#" prefix="content: http://purl.org/rss/1.0/modules/content/  dc: http://purl.org/dc/terms/  foaf: http://xmlns.com/foaf/0.1/  og: http://ogp.me/ns#  rdfs: http://www.w3.org/2000/01/rdf-schema#  schema: http://schema.org/  sioc: http://rdfs.org/sioc/ns#  sioct: http://rdfs.org/sioc/types#  skos: http://www.w3.org/2004/02/skos/core#  xsd: http://www.w3.org/2001/XMLSchema# ">\n  <head>\n    <meta charset="utf-8" /><script type="text/javascript">window.NREUM||(NREUM={}),__nr_require=function(e,t,n){function r(n){if(!t[n]){var o=t[n]={exports:{}};e[n][0].call(o.exports,function(t){var o=e[n][1][t];return r(o||t)},o,o.exports)}return t[n].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<n.length;o++)r(n[o]);return r}({1:[func

In [55]:
soup = BeautifulSoup(page, 'html5lib')

In [53]:
print(soup.prettify()[:1000])

<!DOCTYPE html>
<html dir="ltr" lang="en" prefix="content: http://purl.org/rss/1.0/modules/content/  dc: http://purl.org/dc/terms/  foaf: http://xmlns.com/foaf/0.1/  og: http://ogp.me/ns#  rdfs: http://www.w3.org/2000/01/rdf-schema#  schema: http://schema.org/  sioc: http://rdfs.org/sioc/ns#  sioct: http://rdfs.org/sioc/types#  skos: http://www.w3.org/2004/02/skos/core#  xsd: http://www.w3.org/2001/XMLSchema# " xmlns:article="http://ogp.me/ns/article#" xmlns:book="http://ogp.me/ns/book#" xmlns:product="http://ogp.me/ns/product#" xmlns:profile="http://ogp.me/ns/profile#" xmlns:video="http://ogp.me/ns/video#">
 <head>
  <meta charset="utf-8"/>
  <script type="text/javascript">
   window.NREUM||(NREUM={}),__nr_require=function(e,t,n){function r(n){if(!t[n]){var o=t[n]={exports:{}};e[n][0].call(o.exports,function(t){var o=e[n][1][t];return r(o||t)},o,o.exports)}return t[n].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<n.length;o++)r(n[o]);return r}({1:[functi

We want to write a function that, when called, retrieves the legislative alerts page and returns a list of the alerts. Each alert will be represented by a dict with the keys 'date' (the date of the alert in human-readable format), 'link' (the link to the full letter), and 'title' (the title of the letter).

In [57]:
print(soup.prettify()[28800:30000])

:48:51-0400">
                   May 9, 2018
                  </time>
                 </div>
                </a>
                <div>
                </div>
               </div>
              </div>
              <div class="block block-content col-12 col-lg-4">
               <div class="content-details ">
                <a class="b-inner" href="/about/advocacy/legislative-alerts/letter-opposing-legislation-would-cut-federal-retirement-benefits">
                 <div class="b-text">
                  <h5 class="content-type">
                   Legislative Alert
                  </h5>
                  <h2 class="content-title">
                   <span>
                    Letter Opposing Legislation that Would Cut Federal Retirement Benefits
                   </span>
                  </h2>
                  <time datetime="2018-05-08T17:18:08-0400">
                   May 8, 2018
                  </time>
                 </div>
                </a>
                <div>
 

In [59]:
help(soup.find_all)

Help on method find_all in module bs4.element:

find_all(name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs) method of bs4.BeautifulSoup instance
    Extracts a list of Tag objects that match the given
    criteria.  You can specify the name of the Tag and any
    attributes you want the Tag to have.
    
    The value of a key-value pair in the 'attrs' map can be a
    string, a list of strings, a regular expression object, or a
    callable that takes a string and returns whether or not the
    string matches for some custom definition of 'matches'. The
    same is true of the tag name.



In [61]:
type(soup.find_all('div'))

bs4.element.ResultSet

In [64]:
alerts = soup.find_all('div', class_='content-details')
len(alerts)

18

In [68]:
alerts[0].find('a')

<a class="b-inner" href="/about/advocacy/legislative-alerts/letter-opposing-legislation-would-put-consumers-risk">
	  <div class="b-text">
              <h5 class="content-type">Legislative Alert</h5>
        <h2 class="content-title"><span>Letter Opposing Legislation That Would Put Consumers At Risk</span>
</h2>
              <time datetime="2018-05-22T10:37:18-0400">May 22, 2018</time>
          </div>
	</a>

In [69]:
alerts[0].find('a').get_text()

'\n\t  \n              Legislative Alert\n        Letter Opposing Legislation That Would Put Consumers At Risk\n\n              May 22, 2018\n          \n\t'
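The text extracted from the whole `<a>` tag carries all the whitespace of the original markup. One way to tidy it up, sketched here on a copy of the string above so that it runs offline, is to split it into lines and keep only the non-empty ones.

```python
# Copy of the get_text() output shown above
text = ('\n\t  \n              Legislative Alert\n        '
        'Letter Opposing Legislation That Would Put Consumers At Risk'
        '\n\n              May 22, 2018\n          \n\t')

# Strip each line and drop the ones that contain only whitespace
parts = [line.strip() for line in text.splitlines() if line.strip()]
print(parts)
```

BeautifulSoup's `get_text` also accepts `separator` and `strip` arguments that achieve a similar result without post-processing.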

In [70]:
alerts[0].find('a')['href']

'/about/advocacy/legislative-alerts/letter-opposing-legislation-would-put-consumers-risk'

In [72]:
test_alert = alerts[0]

In [75]:
full_link = 'https://aflcio.org' + test_alert.find('a')['href']
full_link

'https://aflcio.org/about/advocacy/legislative-alerts/letter-opposing-legislation-would-put-consumers-risk'
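String concatenation works here because every `href` on this page starts at the site root. A slightly more general alternative, shown as a sketch, is `urllib.parse.urljoin`, which resolves a link against the URL of the page it was found on. The `example.com` address is only a placeholder.

```python
from urllib.parse import urljoin

page_url = 'https://aflcio.org/what-unions-do/social-economic-justice/advocacy/legislative-alerts'
href = '/about/advocacy/legislative-alerts/letter-opposing-legislation-would-put-consumers-risk'

# A root-relative href replaces the whole path of the page URL
full_link = urljoin(page_url, href)
print(full_link)

# An href that is already absolute is returned unchanged
absolute = urljoin(page_url, 'https://example.com/other-page')
print(absolute)
```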

In [78]:
test_alert.find('span').get_text()

'Letter Opposing Legislation That Would Put Consumers At Risk'

In [80]:
test_alert.find('time')['datetime']

'2018-05-22T10:37:18-0400'
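The `datetime` attribute is a machine-readable timestamp that includes a UTC offset. As a sketch, it can be parsed with the standard library and then rendered in whatever form is needed; the format strings below are one possible choice.

```python
from datetime import datetime, timezone

raw_date = '2018-05-22T10:37:18-0400'

# %z parses the UTC offset (-0400), giving a timezone-aware datetime
parsed = datetime.strptime(raw_date, '%Y-%m-%dT%H:%M:%S%z')

# A human-readable rendering of the date
human = parsed.strftime('%B %d, %Y')
print(human)

# The same instant expressed in UTC
as_utc = parsed.astimezone(timezone.utc)
print(as_utc.isoformat())
```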

In [85]:
def aflcio_alerts():
    # Download and parse the legislative alerts page
    url = 'https://aflcio.org/what-unions-do/social-economic-justice/advocacy/legislative-alerts'

    r = requests.get(url)
    page = r.content
    soup = BeautifulSoup(page, 'html5lib')

    # Each alert lives in a <div class="content-details"> block
    alerts = soup.find_all('div', class_='content-details')

    alerts_list = []

    for alert in alerts:
        # The href is relative, so prefix it with the site root
        full_link = 'https://aflcio.org' + alert.find('a')['href']
        title = alert.find('span').get_text()
        date = alert.find('time')['datetime']
        result = {'link' : full_link, 'title' : title, 'date' : date}

        alerts_list.append(result)

    return alerts_list

aflcio_alerts()

[{'date': '2018-05-22T10:37:18-0400',
  'link': 'https://aflcio.org/about/advocacy/legislative-alerts/letter-opposing-legislation-would-put-consumers-risk',
  'title': 'Letter Opposing Legislation That Would Put Consumers At Risk'},
 {'date': '2018-05-21T16:54:33-0400',
  'link': 'https://aflcio.org/about/advocacy/legislative-alerts/letter-opposing-bill-would-make-it-more-difficult-americans-feed',
  'title': 'Letter Opposing Bill That Would Make It More Difficult for Americans to Feed Their Families'},
 {'date': '2018-05-21T16:48:34-0400',
  'link': 'https://aflcio.org/about/advocacy/legislative-alerts/letter-opposing-legislation-would-help-privatize-va',
  'title': 'Letter Opposing Legislation That Would Help Privatize the VA'},
 {'date': '2018-05-21T16:43:40-0400',
  'link': 'https://aflcio.org/about/advocacy/legislative-alerts/letter-opposing-michael-truncales-nomination-eastern-district',
  'title': "Letter Opposing Michael Truncale's Nomination to the Eastern District of Texas"},

In [87]:
import pandas as pd

df = pd.DataFrame(aflcio_alerts())
df.head()

| | date | link | title |
|---|---|---|---|
| 0 | 2018-05-22T10:37:18-0400 | https://aflcio.org/about/advocacy/legislative-... | Letter Opposing Legislation That Would Put Con... |
| 1 | 2018-05-21T16:54:33-0400 | https://aflcio.org/about/advocacy/legislative-... | Letter Opposing Bill That Would Make It More D... |
| 2 | 2018-05-21T16:48:34-0400 | https://aflcio.org/about/advocacy/legislative-... | Letter Opposing Legislation That Would Help Pr... |
| 3 | 2018-05-21T16:43:40-0400 | https://aflcio.org/about/advocacy/legislative-... | Letter Opposing Michael Truncale's Nomination ... |
| 4 | 2018-05-21T16:34:44-0400 | https://aflcio.org/about/advocacy/legislative-... | Letter in Support of Amendment that Will Prote... |


In [88]:
df['date'] = pd.to_datetime(df['date'])

In [89]:
df.head()

| | date | link | title |
|---|---|---|---|
| 0 | 2018-05-22 14:37:18 | https://aflcio.org/about/advocacy/legislative-... | Letter Opposing Legislation That Would Put Con... |
| 1 | 2018-05-21 20:54:33 | https://aflcio.org/about/advocacy/legislative-... | Letter Opposing Bill That Would Make It More D... |
| 2 | 2018-05-21 20:48:34 | https://aflcio.org/about/advocacy/legislative-... | Letter Opposing Legislation That Would Help Pr... |
| 3 | 2018-05-21 20:43:40 | https://aflcio.org/about/advocacy/legislative-... | Letter Opposing Michael Truncale's Nomination ... |
| 4 | 2018-05-21 20:34:44 | https://aflcio.org/about/advocacy/legislative-... | Letter in Support of Amendment that Will Prote... |


In [90]:
json.dumps(aflcio_alerts())

'[{"link": "https://aflcio.org/about/advocacy/legislative-alerts/letter-opposing-legislation-would-put-consumers-risk", "title": "Letter Opposing Legislation That Would Put Consumers At Risk", "date": "2018-05-22T10:37:18-0400"}, {"link": "https://aflcio.org/about/advocacy/legislative-alerts/letter-opposing-bill-would-make-it-more-difficult-americans-feed", "title": "Letter Opposing Bill That Would Make It More Difficult for Americans to Feed Their Families", "date": "2018-05-21T16:54:33-0400"}, {"link": "https://aflcio.org/about/advocacy/legislative-alerts/letter-opposing-legislation-would-help-privatize-va", "title": "Letter Opposing Legislation That Would Help Privatize the VA", "date": "2018-05-21T16:48:34-0400"}, {"link": "https://aflcio.org/about/advocacy/legislative-alerts/letter-opposing-michael-truncales-nomination-eastern-district", "title": "Letter Opposing Michael Truncale\'s Nomination to the Eastern District of Texas", "date": "2018-05-21T16:43:40-0400"}, {"link": "https:

# Ultra easy scraping with pandas

When the data we want is already formatted as a table, we can do it even more easily! Just use `pandas.read_html`:

In [97]:
tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_accidents_and_disasters_by_death_toll',
                      header=0)

type(tables)

list

In [99]:
tables[2]

| | Deaths | Date | Incident | Location |
|---|---|---|---|---|
| 0 | 583 | 27 March 1977 | Pan Am Flight 1736 and KLM Flight 4805 | Tenerife, Canary Islands, Spain |
| 1 | 520 | 12 August 1985 | Japan Airlines Flight 123 | Ueno, Japan |
| 2 | 349 | 12 November 1996 | Saudi Arabian Flight 763 and Kazakhstan Airlin... | Charkhi Dadri, Haryana, India |
| 3 | 346 | 3 March 1974 | Turkish Airlines Flight 981 | Fontaine-Chaalis, France |
| 4 | 301 | 19 August 1980 | Saudia Flight 163 | Riyadh, Saudi Arabia |
| 5 | 275 | 19 February 2003 | 2003 Iran Ilyushin Il-76 crash | Kerman, Iran |
| 6 | 273 | 25 May 1979 | American Airlines Flight 191 | Des Plaines, Illinois, United States |
| 7 | 265 | 12 November 2001 | American Airlines Flight 587 | Queens, New York, United States |
| 8 | 264 | 26 April 1994 | China Airlines Flight 140 | Japan-KomakiKomaki, Japan |
| 9 | 261 | 11 July 1991 | Nigeria Airways Flight 2120 | Saudi Arabia-JeddahJeddah, Saudi Arabia |


# Building web services with Flask

[Flask](http://flask.pocoo.org/docs/1.0/) is a framework for building web applications.

Building a simple Web Service is extremely easy: you just create an app, define a function that generates the result and tie it to a route, and run the app.



In [103]:
from flask import Flask
from werkzeug.serving import run_simple

app = Flask('My first web service!')

@app.route('/saludamajo')
def hello_world():
    return 'Holi!'

In [106]:
run_simple('localhost', 5000, app)

 * Running on http://localhost:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [02/Jun/2018 13:10:54] "GET /saludamajo HTTP/1.1" 200 -
127.0.0.1 - - [02/Jun/2018 13:10:55] "GET /favicon.ico HTTP/1.1" 404 -


In [125]:
app = Flask('union-alerts')

@app.route('/latest-alerts')
def aflcio_alerts():
    
    url = 'https://aflcio.org/what-unions-do/social-economic-justice/advocacy/legislative-alerts'

    r = requests.get(url)
    page = r.content
    soup = BeautifulSoup(page, 'html5lib')
    alerts = soup.find_all('div', class_='content-details')

    alerts_list = []

    for alert in alerts:
        full_link = 'https://aflcio.org' + alert.find('a')['href']
        title = alert.find('span').get_text()
        date = alert.find('time')['datetime']
        result = {'link' : full_link, 'title' : title, 'date' : date}

        alerts_list.append(result)

    return json.dumps(alerts_list)

run_simple('localhost', 5000, app)

 * Running on http://localhost:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [02/Jun/2018 13:29:00] "GET /aritmetica?n=5 HTTP/1.1" 404 -
127.0.0.1 - - [02/Jun/2018 13:29:07] "GET /latest-alerts HTTP/1.1" 200 -
127.0.0.1 - - [02/Jun/2018 13:29:21] "GET /latest-alerts HTTP/1.1" 200 -
127.0.0.1 - - [02/Jun/2018 13:29:38] "GET /latest-alerts HTTP/1.1" 200 -
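Returning the output of `json.dumps` works, but Flask then labels the response as HTML. The sketch below compares it with `flask.jsonify`, which sets a JSON content type. It uses Flask's test client so that no server needs to be running, and a stand-in list instead of the scraper; the app name, routes and sample data are all made up for the example.

```python
import json
from flask import Flask, jsonify

app = Flask('json-demo')  # hypothetical app name

# Stand-in data so the example does not depend on the scraper or the network
SAMPLE_ALERTS = [{'title': 'Example alert',
                  'link': 'https://example.com/a',
                  'date': '2018-05-22T10:37:18-0400'}]

@app.route('/as-string')
def as_string():
    # A plain string is served with Flask's default HTML content type
    return json.dumps(SAMPLE_ALERTS)

@app.route('/as-json')
def as_json():
    # jsonify serialises the data and marks the response as JSON
    return jsonify(SAMPLE_ALERTS)

# The test client issues requests in-process
client = app.test_client()
string_response = client.get('/as-string')
json_response = client.get('/as-json')
print(string_response.mimetype)
print(json_response.mimetype)
```

Clients that inspect the content type, such as `r.json()` consumers or browser tooling, tend to behave better with the second form.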


Accepting request parameters is easy too:

In [123]:
from flask import request

app = Flask('My first web service!')

@app.route('/aritmetica', methods=['GET'])
def hello_world():
    # Query parameters arrive as strings, so convert n to an integer
    n = int(request.args.get('n'))
    return '%d squared is %d, in case you did not know' % (n, n ** 2)

run_simple('localhost', 5000, app)

 * Running on http://localhost:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [02/Jun/2018 13:26:14] "GET /aritmetica?n=5 HTTP/1.1" 200 -
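A request without `n`, or with a non-numeric `n`, would make `int(...)` raise an exception and produce a server error. Here is a hedged sketch of one way to handle that, again using Flask's test client instead of a running server. The app name, the `/square` route and the error message are assumptions made for the example.

```python
from flask import Flask, request

app = Flask('parameter-demo')  # hypothetical app name

@app.route('/square')  # hypothetical route
def square():
    # With type=int, get() returns None when n is missing or not an integer
    n = request.args.get('n', type=int)
    if n is None:
        # Returning a (body, status) tuple sets the HTTP status code
        return 'Please provide an integer n, e.g. /square?n=5', 400
    return '%d squared is %d' % (n, n ** 2)

# The test client issues requests in-process, so no server has to be running
client = app.test_client()
ok = client.get('/square?n=5')
missing = client.get('/square')
invalid = client.get('/square?n=abc')

print(ok.status_code, ok.get_data(as_text=True))
print(missing.status_code, invalid.status_code)
```

Answering with a 400 status tells the caller that the problem is in the request, not in the service.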


# Annex II: exercises

#### Exercise:

Get a random fact from the [Internet Chuck Norris Database](http://www.icndb.com/api/).

#### Exercise:

Write a function that uses query parameters to get a Chuck Norris fact that talks about you instead.

#### Exercise:

Extract the date of the worst aviation disaster from: https://en.wikipedia.org/wiki/List_of_accidents_and_disasters_by_death_toll

Prerequisites: pandas, pd.read_html

#### Exercise: 

Assuming the list is exhaustive, calculate how many people died in accidental explosions per decade in the 20th century. Plot it.

Data: 
https://en.wikipedia.org/wiki/List_of_accidents_and_disasters_by_death_toll

Prerequisites: pandas, pd.read_html, pd.to_datetime, matplotlib or seaborn

#### Exercise:

Build a small Flask app that, given a decade in the 20th century as a parameter, serves the total number of deaths by accidental explosion in that decade and a list of the accidents.

#### Exercise: 

Create a function that, given the two tables extracted from http://en.wikipedia.org/wiki/List_of_S%26P_500_companies and a date, returns the list of companies in the S&P 500 at that date.