# Introduction to Data Analysis with Python III


<img src="https://www.python.org/static/img/python-logo.png" alt="Python" style="width: 200px; float: right;"/>
<br>
<br>
<br>
<img src="../assets/yogen-logo.png" alt="yogen" style="width: 200px; float: right;"/>

# Web APIs

An API, or application programming interface, is the way programs communicate with one another.

Web APIs are the way programs communicate with one another _over the internet_.

[RESTful](https://en.wikipedia.org/wiki/Representational_state_transfer) APIs respect a series of design principles that make them simple to use.

The basic tools we are going to use are GET and POST requests to URLs that we specify, and JSON objects that we receive as a response or send as a payload (in a POST request, for example).
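As a rough sketch of what sending a payload looks like, the snippet below builds a POST request with a JSON body without actually sending it, so we can inspect what would go over the wire. The URL `https://example.com/api/people` and the payload are made up for illustration; they do not belong to any API used in this notebook.

```python
import json
import requests

# Hypothetical endpoint and payload, used only for illustration
url = 'https://example.com/api/people'
payload = {'name': 'Daniel', 'surname': 'Mateos'}

# Build the request and prepare it; nothing is sent over the network
prepared = requests.Request('POST', url, json=payload).prepare()

print(prepared.method)                   # the HTTP verb
print(prepared.headers['Content-Type'])  # set by requests because we passed json=
print(json.loads(prepared.body))         # the payload, serialized as JSON
```

To actually send such a request we would call `requests.post(url, json=payload)`.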

In [13]:
import requests

response = requests.get('https://elpais.com/')

response

<Response [200]>
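The `200` in the output is the HTTP status code of the response: codes in the 200s mean success, the 400s signal a problem with our request and the 500s a problem on the server. A minimal sketch, using only the standard library, for looking up what a code means (the helper name `describe_status` is our own):

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    status = HTTPStatus(code)
    if 200 <= code < 300:
        kind = 'success'
    elif 400 <= code < 500:
        kind = 'client error'
    elif 500 <= code < 600:
        kind = 'server error'
    else:
        kind = 'other'
    return f'{code} {status.phrase} ({kind})'

for code in (200, 404, 500):
    print(describe_status(code))
```

With `requests`, the code is available as `response.status_code`, and `response.raise_for_status()` raises an exception for 4xx and 5xx codes.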

In [16]:
response.content[:1000]

b'<!DOCTYPE html><html lang="es"><head><title>EL PA\xc3\x8dS: el peri\xc3\xb3dico global</title><meta name="lang" content="es"/><meta name="author" content="Ediciones El Pa\xc3\xads"/><meta name="robots" content="index,follow"/><meta name="description" content="Noticias de \xc3\xbaltima hora sobre la actualidad en Espa\xc3\xb1a y el mundo: pol\xc3\xadtica, econom\xc3\xada, deportes, cultura, sociedad, tecnolog\xc3\xada, gente, opini\xc3\xb3n, viajes, moda, televisi\xc3\xb3n, los blogs y las firmas de EL PA\xc3\x8dS. Adem\xc3\xa1s especiales, v\xc3\xaddeos, fotos, audios, gr\xc3\xa1ficos, entrevistas, promociones y todos los servicios de EL PA\xc3\x8dS."/><meta http-equiv="Refresh" content="900"/><meta name="organization" content="Ediciones EL PA\xc3\x8dS S.L."/><meta property="article:publisher" content="https://www.facebook.com/elpais/"/><meta property="og:title" content="EL PA\xc3\x8dS: el peri\xc3\xb3dico global"/><meta property="og:description" content="Noticias de \xc3\xbaltima ho
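The `b'...'` prefix means that `response.content` is a `bytes` object, and sequences such as `\xc3\x8d` are the raw UTF-8 bytes of accented characters. A small sketch, using a fragment copied from the output above, of how decoding turns them into readable text:

```python
# A fragment of the raw bytes shown in the output above
raw = b'EL PA\xc3\x8dS: el peri\xc3\xb3dico global'

# Decoding interprets the bytes as UTF-8 and gives us a str
text = raw.decode('utf-8')

print(type(raw).__name__, len(raw))    # bytes, length counted in bytes
print(type(text).__name__, len(text))  # str, length counted in characters
print(text)
```

`response.text` performs this decoding for us, using the encoding that `requests` determines for the response.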

This is an API that returns the current position of the International Space Station (ISS):

In [17]:
response = requests.get('http://api.open-notify.org/iss-now.json')
response.content

b'{"message": "success", "iss_position": {"latitude": "-29.7929", "longitude": "-139.7316"}, "timestamp": 1603533255}'

In [18]:
response.text

'{"message": "success", "iss_position": {"latitude": "-29.7929", "longitude": "-139.7316"}, "timestamp": 1603533255}'

In [20]:
response.content['iss_position']

TypeError: byte indices must be integers or slices, not str

The previous cell fails because `response.content` is a `bytes` object, which cannot be indexed by key. We can convert a JSON-formatted string such as the one in the response into a Python object with the `json` library:

In [23]:
import json 

my_data = json.loads(response.content)
my_data

{'message': 'success',
 'iss_position': {'latitude': '-29.7929', 'longitude': '-139.7316'},
 'timestamp': 1603533255}

In [25]:
my_data['iss_position']['latitude']

'-29.7929'
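Note that the API returns the coordinates as strings and the timestamp as seconds since the Unix epoch. A minimal sketch of converting them into more convenient types, using the response shown above as a literal so that it runs without a network connection:

```python
import json
from datetime import datetime, timezone

# The body returned by the API in the cells above, as raw bytes
body = b'{"message": "success", "iss_position": {"latitude": "-29.7929", "longitude": "-139.7316"}, "timestamp": 1603533255}'

data = json.loads(body)

# Coordinates arrive as strings, so convert them before doing any maths
latitude = float(data['iss_position']['latitude'])
longitude = float(data['iss_position']['longitude'])

# The timestamp is expressed in seconds since 1970-01-01 UTC
when = datetime.fromtimestamp(data['timestamp'], tz=timezone.utc)

print(latitude, longitude)
print(when.isoformat())
```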

We can also go in the other direction and generate JSON-formatted strings from Python objects:

In [27]:
export_data = [{'name': 'Daniel', 'surname': 'Mateos'}, {'name': 'Hermenelgildo', 'surname': 'floriez'}]

json.dumps(export_data)

'[{"name": "Daniel", "surname": "Mateos"}, {"name": "Hermenelgildo", "surname": "floriez"}]'
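`json.dumps` accepts a few optional arguments that are handy when the output is meant for people rather than programs. A short sketch (the sample record is made up) showing `indent`, `ensure_ascii` and a round trip back to Python:

```python
import json

# Made-up sample data containing non-ASCII characters and a missing value
record = {'city': 'Málaga', 'tags': ['España', 'costa'], 'population': None}

compact = json.dumps(record)
pretty = json.dumps(record, indent=2, ensure_ascii=False)

print(compact)  # non-ASCII characters are escaped by default
print(pretty)   # indented, with the characters kept as they are

# Python's None becomes JSON null, and loading restores the original object
restored = json.loads(pretty)
print(restored == record)
```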

#### Exercise:
Write a function that returns the duration of the next 5 overhead passes of the ISS for a given latitude and longitude. Use the API documented at http://open-notify.org/Open-Notify-API/ISS-Pass-Times/ and encode the parameters in the URL as per the specification.

For example, for Madrid:

http://api.open-notify.org/iss-pass.json?lat=40.4&lon=-3.7

In [36]:
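def iss_pass(lat, lon):
    # Encode the coordinates by hand into the query string
    url = 'http://api.open-notify.org/iss-pass.json?lat=%f&lon=%f' % (lat, lon)

    response = requests.get(url)

    # Parse the JSON body into a Python dictionary
    data = json.loads(response.content)

    # Each pass carries a 'duration' (in seconds) and a 'risetime'
    result = [pass_['duration'] for pass_ in data['response']]

    return result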

iss_pass(40, -25)

[642, 339, 493, 653]
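The parsing step can be separated from the network call, which makes it easy to test on its own. The sketch below assumes that a successful response carries a `message` field equal to `"success"` and a `response` list of passes; the helper name `pass_durations` and the error handling are our own additions, not part of the API.

```python
import json

def pass_durations(body):
    """Extract the pass durations, in seconds, from an ISS pass response body."""
    data = json.loads(body)

    # Assumption: anything other than "success" means the request failed
    if data.get('message') != 'success':
        raise ValueError('Unexpected API response: %r' % data)

    return [pass_['duration'] for pass_ in data['response']]

# A shortened sample body in the format the API returns
sample = b'{"message": "success", "response": [{"duration": 388, "risetime": 1603546993}, {"duration": 464, "risetime": 1603595561}]}'

print(pass_durations(sample))
```

Inside `iss_pass`, the last lines could then be reduced to `return pass_durations(response.content)`.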

Although we managed to get the response, encoding more complicated sets of parameters by hand quickly becomes tedious and error-prone. Thankfully, the `requests` library can do that work for us.

In [39]:
madrid = {'lat': 40, 'lon': -3}

response = requests.get('http://api.open-notify.org/iss-pass.json', params=madrid)
response.content

b'{\n  "message": "success", \n  "request": {\n    "altitude": 100, \n    "datetime": 1603535594, \n    "latitude": 40.0, \n    "longitude": -3.0, \n    "passes": 5\n  }, \n  "response": [\n    {\n      "duration": 388, \n      "risetime": 1603546993\n    }, \n    {\n      "duration": 464, \n      "risetime": 1603595561\n    }, \n    {\n      "duration": 651, \n      "risetime": 1603601240\n    }, \n    {\n      "duration": 605, \n      "risetime": 1603607088\n    }\n  ]\n}\n'
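To see why encoding by hand is error-prone, consider parameters that contain spaces, ampersands or accented characters. The standard library's `urllib.parse.urlencode` shows the kind of escaping that is needed; `requests` does comparable work for us when we pass `params`. The search parameters and the `example.com` URL below are made up for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base = 'http://api.open-notify.org/iss-pass.json'

# Simple numeric parameters, as in the example above
madrid = {'lat': 40, 'lon': -3}
print(base + '?' + urlencode(madrid))

# Hypothetical parameters with characters that must be escaped
tricky = {'q': 'El País & más', 'page': 2}
query = urlencode(tricky)
print(query)

# Decoding the query string recovers the original values (as strings)
decoded = parse_qs(urlsplit('http://example.com/search?' + query).query)
print(decoded)
```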

# Web scraping

![HTML to DOM](http://www.cs.toronto.edu/~shiva/cscb07/img/dom/treeStructure.png)

![DOM TREE](http://www.openbookproject.net/tutorials/getdown/css/images/lesson4/HTMLDOMTree.png)



In [43]:
import requests
from bs4 import BeautifulSoup

html = requests.get('https://www.elmundotoday.com/')
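# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html.content, 'html.parser')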

In [46]:
soup.find('ul')

<ul class="td-mobile-main-menu" id="menu-menu-mobile"><li class="menu-item menu-item-type-custom menu-item-object-custom menu-item-first menu-item-65940" id="menu-item-65940"><a href="https://www.elmundotoday.com/login/">Iniciar sesión</a></li>
<li class="menu-item menu-item-type-taxonomy menu-item-object-category menu-item-65931" id="menu-item-65931"><a href="https://www.elmundotoday.com/noticias/internacional/">Internacional</a></li>
<li class="menu-item menu-item-type-taxonomy menu-item-object-category menu-item-65932" id="menu-item-65932"><a href="https://www.elmundotoday.com/noticias/espanya/">España</a></li>
<li class="menu-item menu-item-type-taxonomy menu-item-object-category menu-item-65933" id="menu-item-65933"><a href="https://www.elmundotoday.com/noticias/sociedad/">Sociedad</a></li>
<li class="menu-item menu-item-type-taxonomy menu-item-object-category menu-item-65934" id="menu-item-65934"><a href="https://www.elmundotoday.com/noticias/tecnologia/">Ciencia y Tecnología</a>

In [47]:
section_header = soup.find('ul')

for section in section_header.find_all('li'):
    print(section.text)
    print(section.find('a')['href'])
    

Iniciar sesión
https://www.elmundotoday.com/login/
Internacional
https://www.elmundotoday.com/noticias/internacional/
España
https://www.elmundotoday.com/noticias/espanya/
Sociedad
https://www.elmundotoday.com/noticias/sociedad/
Ciencia y Tecnología
https://www.elmundotoday.com/noticias/tecnologia/
Cultura
https://www.elmundotoday.com/noticias/cultura/
Gente
https://www.elmundotoday.com/noticias/gente/
Deportes
https://www.elmundotoday.com/noticias/deportes/
Vídeos
https://www.elmundotoday.com/noticias/videos/


In [51]:
results = []

for headline in soup.find_all('h3'):
    text = headline.text
    url = headline.find('a')['href']
    
    results.append((text, url))
    
results

[('España se da cuenta ahora de que el «botellón» representa el 80% de su PIB',
  'https://www.elmundotoday.com/2020/10/espana-se-da-cuenta-ahora-de-que-el-botellon-representa-el-80-de-su-pib/'),
 ('Desalojan el Congreso por una inundación de lágrimas de facha',
  'https://www.elmundotoday.com/2020/10/desalojan-el-congreso-por-una-inundacion-de-lagrimas-de-facha/'),
 ('Pablo Casado confirma su «no» a Vox y su adhesión al socialcomunismo bolivariano del Gobierno',
  'https://www.elmundotoday.com/2020/10/pablo-casado-confirma-su-no-a-vox-y-su-adhesion-al-socialcomunismo-bolivariano-del-gobierno/'),
 ('El papa Francisco apoya las uniones civiles entre homosexuales y presenta a su pareja, Antonio Ferrara',
  'https://www.elmundotoday.com/2020/10/el-papa-francisco-apoya-las-uniones-civiles-entre-homosexuales-y-presenta-a-su-pareja-antonio-ferrara/'),
 ('███████████████████████████████████████████████████████████████████████████████████████████',
  'https://www.elmundotoday.com/2020/10/un-re
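Scraping code tends to break when an element is missing: if an `h3` has no link inside, `headline.find('a')` returns `None` and indexing it raises a `TypeError`. A minimal sketch of a more defensive loop, run on a small made-up HTML snippet so that it works offline:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page
html = """
<html><body>
  <h3><a href="https://example.com/first">First headline</a></h3>
  <h3>Headline without a link</h3>
  <h3><a href="https://example.com/second">  Second headline </a></h3>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

results = []
for headline in soup.find_all('h3'):
    link = headline.find('a')
    if link is None or not link.has_attr('href'):
        continue  # skip headlines that do not link anywhere
    # get_text(strip=True) removes the surrounding whitespace
    results.append((headline.get_text(strip=True), link['href']))

print(results)
```

Skipping incomplete headlines is only one possible policy; depending on the analysis, we might prefer to keep them with a URL of `None`.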

# Annex: ultra easy scraping with pandas!

When the data we want is already formatted as a table, we can extract it even more easily! Just use `pandas.read_html`, which returns a list with one DataFrame per table found in the page:

In [53]:
import pandas as pd

tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_accidents_and_disasters_by_death_toll')

tables[4]

|     | Deaths | Date | Incident |
|-----|--------|------|----------|
| 0 | 20000 | 30 May 1626 | Wanggongchang Explosion in Beijing, China in t... |
| 1 | 3000 | 18 August 1769 | A lightning bolt caused the Brescia explosion ... |
| 2 | 3,000? | 1 November 1948 | Boiler and ammunition explosion aboard an unid... |
| 3 | 1,400–2,280 | 6 March 1862 | Ammunition warehouse explodes and kills almost... |
| 4 | 1950 | 6 December 1917 | Halifax Explosion in Nova Scotia, Canada[69] |
| ... | ... | ... | ... |
| 364 | 4 | 19 March 2019 | An explosion in Yilong County, Sichuan, China.... |
| 365 | 4 | 6 May 1971 | DuPont Powder Line Explosion in Louviers, Colo... |
| 366 | 2 | 4 May 1988 | PEPCON rocket fuel chemical plant explosion in... |
| 367 | 1 | 19 September 1980 | 1980 Damascus Titan missile explosion in Van B... |
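Scraped tables usually need some cleaning before analysis. In the output above, `Deaths` contains values such as `3,000?` and `1,400–2,280`, so the column cannot be used as a number directly. A minimal sketch, on a hand-copied subset of the rows above, that keeps the first number in each cell (taking the lower bound of a range is our own choice; the column name `deaths_min` is also ours):

```python
import pandas as pd

# A few rows copied by hand from the table above
df = pd.DataFrame({
    'Deaths': ['20000', '3000', '3,000?', '1,400–2,280', '1950'],
    'Date': ['30 May 1626', '18 August 1769', '1 November 1948',
             '6 March 1862', '6 December 1917'],
})

# Remove thousands separators, then keep the first run of digits in each cell
df['deaths_min'] = (
    df['Deaths']
    .astype(str)
    .str.replace(',', '', regex=False)
    .str.extract(r'(\d+)', expand=False)
    .astype(int)
)

print(df[['Deaths', 'deaths_min']])
```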


# Annex II: exercises

### Exercise:

Extract the date of the worst aviation disaster from: https://en.wikipedia.org/wiki/List_of_accidents_and_disasters_by_death_toll

Prerequisites: pandas, pd.read_html

### Exercise: 

Assuming the list is exhaustive, calculate how many people died in accidental explosions per decade in the 20th century. Plot it.

Data: 
https://en.wikipedia.org/wiki/List_of_accidents_and_disasters_by_death_toll

Prerequisites: pandas, pd.read_html, pd.to_datetime, matplotlib or seaborn

### Exercise: 

Create a function that, given the two tables extracted from http://en.wikipedia.org/wiki/List_of_S%26P_500_companies and a date, returns the list of companies in the S&P 500 on that date.