# Introduction to Data Analysis with Python III


<img src="https://www.python.org/static/img/python-logo.png" alt="Python logo" style="width: 200px; float: right;"/>
<br>
<br>
<br>
<img src="../assets/yogen-logo.png" alt="yogen" style="width: 200px; float: right;"/>

#  Web APIs

An API, or application programming interface, is the way programs communicate with one another.

Web APIs are the way programs communicate with one another _over the internet_.

[RESTful](https://en.wikipedia.org/wiki/Representational_state_transfer) APIs respect a series of design principles that make them simple to use.

The basic tools we are going to use are `GET` and `POST` requests to URLs that we specify, and JSON objects that we receive as the response or send as the payload (in a `POST` request, for example).

This is an API that returns the current position of the ISS:
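A minimal sketch of such a request, assuming the public Open Notify endpoint and its response layout (an `iss_position` object holding `latitude` and `longitude` as strings). The function names are illustrative, and only the standard library is used:

```python
import json
from urllib.request import urlopen

# Assumed endpoint: the public Open Notify service.
ISS_URL = "http://api.open-notify.org/iss-now.json"


def parse_iss_position(payload):
    """Extract (latitude, longitude) as floats from a decoded response."""
    position = payload["iss_position"]
    # The coordinates are assumed to arrive as strings, so convert them.
    return float(position["latitude"]), float(position["longitude"])


def fetch_iss_position(url=ISS_URL, timeout=10):
    """Send a GET request and return the current (latitude, longitude)."""
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return parse_iss_position(payload)
```

Calling `fetch_iss_position()` performs the actual request, so it needs network access; `parse_iss_position` can be tried offline on any dictionary with the same shape.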

We can convert a JSON-formatted string, such as the one we get in the response, into a Python object with the `json` library:
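As a small illustration, the string below is shaped like the ISS response but its values are made up; `json.loads` turns it into ordinary Python dictionaries and strings:

```python
import json

# A JSON-formatted string with illustrative (not real) values.
raw = '{"message": "success", "iss_position": {"latitude": "12.5", "longitude": "-45.25"}}'

data = json.loads(raw)  # str -> dict
latitude = float(data["iss_position"]["latitude"])

print(type(data).__name__, latitude)  # prints: dict 12.5
```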

We can also go in the other direction and generate JSON-formatted strings from Python objects:
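A short sketch with `json.dumps`, using a made-up record. Note how Python's `True` and `None` become JSON's `true` and `null`:

```python
import json

# Illustrative record; the field names are arbitrary.
record = {"name": "ISS", "crew": ["Ana", "Bo"], "docked": True, "next_pass": None}

compact = json.dumps(record)                           # single line
pretty = json.dumps(record, indent=2, sort_keys=True)  # human-readable

print(compact)
print(pretty)
```

The `indent` and `sort_keys` arguments only change the layout of the string; parsing either version gives back the same object.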

#### Exercise:

https://agify.io/ hosts an API that estimates the age of a person based on their name.  

Write a function that wraps the API.
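One possible sketch, assuming the endpoint `https://api.agify.io` with a `name` query parameter. The URL is assembled by hand, which means we are responsible for percent-encoding the name ourselves. The function names are illustrative:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the agify API.
AGIFY_URL = "https://api.agify.io"


def build_agify_url(name):
    """Build the request URL by hand, percent-encoding the name."""
    return f"{AGIFY_URL}?name={quote(name)}"


def estimate_age(name, timeout=10):
    """Return the decoded JSON response for a single name."""
    with urlopen(build_agify_url(name), timeout=timeout) as response:
        return json.load(response)
```

`estimate_age("michael")` needs network access; `build_agify_url("José")` can be inspected offline to see the encoding at work.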

Although we managed to get the response, encoding more complicated sets of parameters by hand is tedious and error-prone. Thankfully, the `requests` library can do that work for us.
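A sketch of the same wrapper with `requests`, which takes a dictionary through its `params` argument and encodes it into the URL. The `country_id` parameter is assumed from the agify documentation, and the helper names are illustrative:

```python
import requests

# Assumed base URL of the agify API.
AGIFY_URL = "https://api.agify.io"


def agify_params(name, country_id=None):
    """Collect the query parameters in a plain dictionary."""
    params = {"name": name}
    if country_id is not None:
        params["country_id"] = country_id
    return params


def estimate_age(name, country_id=None, timeout=10):
    """Query the API and return the decoded JSON response."""
    response = requests.get(
        AGIFY_URL, params=agify_params(name, country_id), timeout=timeout
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


# Inspect the encoded URL without sending anything over the network.
prepared = requests.Request(
    "GET", AGIFY_URL, params=agify_params("José", "ES")
).prepare()
print(prepared.url)
```

Preparing the request without sending it is a convenient way to check what `requests` will actually put in the query string.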

Even more complicated sets of parameters are sometimes required. When that is the case, API designers often require them in JSON format, sent in the body of a `POST` request.

For example, take a look at the [Google Maps API](https://developers.google.com/maps/documentation). In the documentation, they define the body of the request, which we will have to provide, and of the response, which they'll provide back.
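A minimal sketch of sending a JSON body with `requests`. The URL and the payload fields below are hypothetical placeholders, not part of any real Google Maps endpoint; the real field names come from the documentation of whichever API you call:

```python
import json

import requests

# Hypothetical endpoint and payload, for illustration only.
URL = "https://example.com/api/routes"
payload = {
    "origin": {"lat": 40.4168, "lng": -3.7038},
    "destination": {"lat": 41.3874, "lng": 2.1686},
    "mode": "walking",
}


def post_json(url, body, timeout=10):
    """POST a JSON body and return the decoded JSON response."""
    response = requests.post(url, json=body, timeout=timeout)
    response.raise_for_status()
    return response.json()


# Build the request without sending it, to see what would go on the wire.
prepared = requests.Request("POST", URL, json=payload).prepare()
print(prepared.headers["Content-Type"])
print(json.loads(prepared.body))
```

Passing the dictionary through `json=` serializes it and sets the `Content-Type` header, so there is no need to call `json.dumps` by hand.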

## Things that you can do with web APIs

Basically anything, but some examples are:

- Query addresses to get coordinates, or ask what is in some coordinates ([Google Maps](https://developers.google.com/maps/documentation/geocoding/overview))
- Access your files in cloud services, e.g. Dropbox, Google Drive, etc.
- Query user details or song information, modify your playlists... ([Spotify](https://developer.spotify.com/documentation/web-api/)).
- Make or receive payments ([PayPal](https://developer.paypal.com/api/rest/), [Square](https://squareup.com/us/en)...)
- Order pizza: https://apilist.fun/api/order-pizza-api
- Make bookings (https://connect.booking.com/user_guide/site/en-US/res/).

# Web scraping

![HTML to DOM](http://www.cs.toronto.edu/~shiva/cscb07/img/dom/treeStructure.png)

![DOM TREE](http://www.openbookproject.net/tutorials/getdown/css/images/lesson4/HTMLDOMTree.png)



## Basic web scraping
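As a rough illustration of the idea, the sketch below walks a small, made-up HTML snippet and collects the text and `href` of every link. It uses only the standard library's `html.parser`; dedicated libraries such as Beautiful Soup offer a more convenient interface for the same task. The class name and the snippet are illustrative:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the link currently being read, if any
        self._text = []    # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Made-up page fragment standing in for a downloaded document.
html = """
<html><body>
  <h2><a href="/news/1">First headline</a></h2>
  <h2><a href="/news/2">Second <em>headline</em></a></h2>
  <a>No link here</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(html)
print(collector.links)
# prints: [('First headline', '/news/1'), ('Second headline', '/news/2')]
```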

#### Exercise

Get the titles and URLs of all articles on the front page of `elpais.com` into a CSV file.
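The scraping itself depends on the page's current markup, so it is left open here. For the last step, a sketch of writing `(title, url)` pairs with the `csv` module might look like this; the function name and the sample rows are made up:

```python
import csv
import io


def write_articles_csv(articles, fileobj):
    """Write (title, url) pairs to an open text file, with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["title", "url"])
    writer.writerows(articles)


# Made-up rows; the second title contains a comma to show the quoting.
articles = [
    ("First headline", "https://example.com/news/1"),
    ("Title, with comma", "https://example.com/news/2"),
]

buffer = io.StringIO()  # stands in for a real file
write_articles_csv(articles, buffer)
print(buffer.getvalue())
```

With a real file, open it as `open("articles.csv", "w", newline="", encoding="utf-8")` so that the `csv` module controls the line endings.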

## Blocking 
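Sites may block clients that send many requests in a short time or that do not identify themselves. A minimal sketch of two common courtesies, a descriptive `User-Agent` header and a pause between requests, is shown below. The header value, the delay and the function names are assumptions to adapt, and none of this guarantees that a site will not block you:

```python
import time
from urllib.request import Request, urlopen

# Hypothetical identification string; use one that describes your own client.
USER_AGENT = "data-analysis-course-bot/0.1"


def build_request(url):
    """Create a request that identifies the client through its User-Agent."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch_all(urls, delay_seconds=2.0, timeout=10):
    """Download each URL in turn, pausing between consecutive requests."""
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # be gentle with the server
        with urlopen(build_request(url), timeout=timeout) as response:
            pages[url] = response.read()
    return pages
```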

# Annex: ultra easy scraping with pandas!

When the data we want is already formatted as a table, we can do it even more easily! Just use `pandas.read_html`:
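A small sketch on a made-up HTML table, so that it runs without network access. `pandas.read_html` returns a list with one DataFrame per table found, and it needs an HTML parser such as `lxml` installed:

```python
import io

import pandas as pd

# Made-up table standing in for a downloaded page.
html = """
<table>
  <thead><tr><th>name</th><th>age</th></tr></thead>
  <tbody>
    <tr><td>Ana</td><td>34</td></tr>
    <tr><td>Bo</td><td>27</td></tr>
  </tbody>
</table>
"""

tables = pd.read_html(io.StringIO(html))  # one DataFrame per <table>
df = tables[0]
print(df)
```

The same function also accepts a URL, in which case it downloads the page first.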

# Annex II: exercises

### Exercise:

Extract the date of the worst aviation disaster from: https://en.wikipedia.org/wiki/List_of_accidents_and_disasters_by_death_toll

Prerequisites: pandas, pd.read_html

### Exercise: 

Assuming the list is exhaustive, calculate how many people died in accidental explosions per decade in the 20th century. Plot it.

Data: 
https://en.wikipedia.org/wiki/List_of_accidents_and_disasters_by_death_toll

Prerequisites: pandas, pd.read_html, pd.to_datetime, matplotlib or seaborn
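As a hint for the grouping step only, here is a sketch on a toy DataFrame. The column names `date` and `deaths` and all the values are invented; the real table needs cleaning before it looks like this. The century is taken as 1900 to 1999 so that it lines up with whole decades:

```python
import pandas as pd

# Toy data, not taken from the Wikipedia table.
events = pd.DataFrame(
    {
        "date": ["1905-01-01", "1909-06-15", "1912-03-03", "2001-07-20"],
        "deaths": [10, 20, 5, 7],
    }
)
events["date"] = pd.to_datetime(events["date"])

# Keep the years 1900-1999 and label each row with its decade.
years = events["date"].dt.year
in_century = events[(years >= 1900) & (years < 2000)]
decade = in_century["date"].dt.year // 10 * 10

deaths_per_decade = in_century.groupby(decade)["deaths"].sum()
print(deaths_per_decade)
```

Calling `deaths_per_decade.plot(kind="bar")` would then draw the result, assuming matplotlib is installed.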

### Exercise: 

Create a function that, given the two tables extracted from http://en.wikipedia.org/wiki/List_of_S%26P_500_companies and a date, returns the list of companies in the S&P 500 at that date.
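One way to think about it: start from today's members and undo, newest first, every change made after the target date. The sketch below shows that logic on plain Python data. The tuple layout `(change_date, added, removed)` and the tickers are hypothetical, and changes dated exactly on the target date are treated as already in effect:

```python
from datetime import date


def constituents_on(current, changes, when):
    """Return the set of tickers that were in the index on the given date.

    current: iterable of tickers in the index today.
    changes: iterable of (change_date, added, removed) tuples, where
             added and removed are a ticker or None.
    """
    members = set(current)
    newest_first = sorted(changes, key=lambda change: change[0], reverse=True)
    for change_date, added, removed in newest_first:
        if change_date <= when:
            break  # this change and all older ones were already in effect
        if added is not None:
            members.discard(added)  # it had not joined yet
        if removed is not None:
            members.add(removed)    # it had not left yet
    return members


# Hypothetical tickers and change log.
current = {"AAA", "BBB", "CCC"}
changes = [
    (date(2020, 6, 1), "CCC", "DDD"),
    (date(2021, 3, 1), "BBB", "EEE"),
]
print(sorted(constituents_on(current, changes, date(2021, 1, 1))))
# prints: ['AAA', 'CCC', 'EEE']
```

Turning the two Wikipedia tables into `current` and `changes` is the remaining part of the exercise.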

# References / Further reading

https://realpython.com/api-integration-in-python/

https://j2logo.com/flask/tutorial-como-crear-api-rest-python-con-flask/

https://www.scrapingbee.com/blog/selenium-python/

https://www.scrapingbee.com/blog/practical-xpath-for-web-scraping/