## 1 Working with APIs

### 1.1 What is an API?

We've worked with data sets pretty extensively so far. While they're popular resources, there are many cases where it's impractical to use one.

Here are a few situations where data sets don't work well:

- The data change frequently. It doesn't really make sense to regenerate a data set of stock prices, for example, and download it every minute. This approach would require a lot of bandwidth, and be very slow.
- You only want a small piece of a much larger data set. [Reddit comments](https://www.reddit.com/) are one example. What if you want to pull just your own comments from reddit? It doesn't make much sense to download the entire reddit database, then filter it for a few items.
- It involves repeated computation. For example, Spotify has an API that can tell you the genre of a piece of music. You could theoretically create your own classifier and use it to categorize music, but you'll never have as much data as Spotify does.

In cases like these, an **application programming interface (API)** is the right solution. An API is a set of methods and tools that allows different applications to interact with each other. Programmers use APIs to query and retrieve data dynamically, which they can then integrate into their own apps. A client can retrieve information quickly and effectively through an API.

Reddit, Spotify, Twitter, Facebook, and many other companies provide free APIs that enable developers to access the information they store on their servers; others charge for access to their APIs.

In this section, we'll query a basic API to retrieve data from the [IBGE (Instituto Brasileiro de Geografia e Estatística, in Portuguese) API](https://servicodados.ibge.gov.br/api/docs). Using an API saves us the time and effort of doing all the computation ourselves.

### 1.2 Introduction to API Requests

Organizations host their APIs on **Web servers**. When you type www.google.com in your browser's address bar, your computer is actually asking the www.google.com server for a Web page, which it then returns to your browser.

APIs work much the same way, except instead of your Web browser asking for a Web page, your program asks for data. The API usually returns this data in [JavaScript Object Notation](http://json.org/) (JSON) format. We'll discuss JSON more later on in this section.

We make an API request to the Web server we want to get data from. The server then replies and sends it to us. In Python, we use the [requests library](http://www.python-requests.org/en/latest/) to do this.

In [0]:
# install a pinned version of requests
!pip install requests==v2.21.0



In [0]:
import requests
requests.__version__

'2.21.0'

### 1.3 Types of Requests

There are many different types of requests. The most common is a **GET request**, which we use to retrieve data. We'll explore the other types in later missions.

We can use a simple GET request to retrieve information from the [IBGE API](https://servicodados.ibge.gov.br/api/docs).

IBGE has several **API endpoints**. An endpoint is a server route for retrieving specific data from an API. For example, the **/comments** endpoint on the reddit API might retrieve information about comments, while the **/users** endpoint might retrieve data about users.

The first endpoint we'll look at on IBGE is the [localities endpoint](https://servicodados.ibge.gov.br/api/docs/localidades?versao=1). This endpoint returns information about the mesoregions, microregions, municipalities, regions and states of Brazil. A data set wouldn't be a great fit for this task because the information changes often, and involves some calculation on the server.
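
To make the idea of an endpoint concrete, here is a minimal sketch that builds endpoint URLs from a base address. The helper name `endpoint_url` is hypothetical; the base URL and the two paths are the ones used in the requests of this section.

```python
# Base address of the IBGE localities API used in this section.
BASE_URL = "https://servicodados.ibge.gov.br/api/v1/localidades"


def endpoint_url(*parts):
    """Join path parts onto the base URL (hypothetical helper)."""
    path = "/".join(str(part).strip("/") for part in parts)
    return f"{BASE_URL}/{path}"


# All states of Brazil
print(endpoint_url("estados"))
# Municipalities of the state whose id is 24 (Rio Grande do Norte)
print(endpoint_url("estados", 24, "municipios"))
```

Each URL can then be passed to `requests.get` exactly as in the cells of this section.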


Let's practice! We've imported requests for you already.


**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">

- The server will send a **status code** indicating the success or failure of your request. You can get the status code of the response from **response.status_code**.
- Assign the status code to the variable **status_code**.
- Retrieve the content of the response with **response.content**. Assign the content to the variable **content**.

In [0]:
# configure a generic header
headers = {
    'Content-Type': 'application/json;charset=UTF-8',
    'User-Agent': 'google-colab',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7',
    'Connection': 'keep-alive',
}

response = requests.get("https://servicodados.ibge.gov.br/api/v1/localidades/estados",
                       headers=headers)
# put your code here

In [0]:
response.status_code

200

In [0]:
response.content

b'[{"id":11,"sigla":"RO","nome":"Rond\xc3\xb4nia","regiao":{"id":1,"sigla":"N","nome":"Norte"}},{"id":12,"sigla":"AC","nome":"Acre","regiao":{"id":1,"sigla":"N","nome":"Norte"}},{"id":13,"sigla":"AM","nome":"Amazonas","regiao":{"id":1,"sigla":"N","nome":"Norte"}},{"id":14,"sigla":"RR","nome":"Roraima","regiao":{"id":1,"sigla":"N","nome":"Norte"}},{"id":15,"sigla":"PA","nome":"Par\xc3\xa1","regiao":{"id":1,"sigla":"N","nome":"Norte"}},{"id":16,"sigla":"AP","nome":"Amap\xc3\xa1","regiao":{"id":1,"sigla":"N","nome":"Norte"}},{"id":17,"sigla":"TO","nome":"Tocantins","regiao":{"id":1,"sigla":"N","nome":"Norte"}},{"id":21,"sigla":"MA","nome":"Maranh\xc3\xa3o","regiao":{"id":2,"sigla":"NE","nome":"Nordeste"}},{"id":22,"sigla":"PI","nome":"Piau\xc3\xad","regiao":{"id":2,"sigla":"NE","nome":"Nordeste"}},{"id":23,"sigla":"CE","nome":"Cear\xc3\xa1","regiao":{"id":2,"sigla":"NE","nome":"Nordeste"}},{"id":24,"sigla":"RN","nome":"Rio Grande do Norte","regiao":{"id":2,"sigla":"NE","nome":"Nordeste"}}

### 1.4 Understanding the status code

The request we just made returned a status code of 200. Web servers return status codes every time they receive an API request. A status code provides information about what happened with a request. Here are some codes that are relevant to GET requests:

- **200** - Everything went okay, and the server returned a result (if any).
- **301** - The server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint's name has changed.
- **401** - The server thinks you're not authenticated. This happens when you don't send the right credentials to access an API (we'll talk about this in a later mission).
- **400** - The server thinks you made a bad request. This can happen when you don't send the information the API requires to process your request, among other things.
- **403** - The resource you're trying to access is forbidden; you don't have the right permissions to see it.
- **404** - The server didn't find the resource you tried to access.
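
As a rough illustration of how these codes can be interpreted in a program, the sketch below uses `http.HTTPStatus` from the Python standard library to look up the standard phrase for a code and group it by range. The function name `describe_status` is hypothetical.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirection"
    elif 400 <= code < 500:
        category = "client error"
    elif code >= 500:
        category = "server error"
    else:
        category = "informational"
    return f"{code} {status.phrase} ({category})"


# The codes discussed in the list above
for code in (200, 301, 400, 401, 403, 404):
    print(describe_status(code))
```

In a notebook you could call `describe_status(response.status_code)` right after a request to get a readable summary.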


### 1.5 JSON Format

You may have noticed that the **content** of the API response we received earlier was a **string** (strictly speaking a bytes string, which is why it is printed with a `b` prefix). Strings are the way we pass information back and forth through APIs, but it's hard to get the information we want out of them. How do we know how to decode the string we receive and work with it in Python?

Luckily, there's a format we call **JSON**. We mentioned it earlier in the lesson. This format encodes data structures like **lists** and **dictionaries** as strings to ensure that machines can read them easily. JSON is the primary format for sending and receiving data through APIs.

Python offers great support for JSON through its [json library](https://docs.python.org/3/library/json.html). We can convert lists and dictionaries to JSON, and vice versa. Our IBGE data, for example, is a list of dictionaries encoded as a string in JSON format.

<img width="400" src="https://drive.google.com/uc?export=view&id=1f7zUfNDunBJUw5RFEi1-6sscLNjm3cGb">



The JSON library has two main functions:

- **dumps** -- Takes in a Python object, and converts it to a JSON string
- **loads** -- Takes a JSON string, and converts it to a Python object
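
To connect `loads` with the API response from earlier, the following sketch parses a shortened copy of the `response.content` bytes shown above (two states only), so it runs without a network connection. The variable names are illustrative.

```python
import json

# Shortened copy of the bytes returned by the "estados" endpoint above.
content = (
    b'[{"id":11,"sigla":"RO","nome":"Rond\xc3\xb4nia",'
    b'"regiao":{"id":1,"sigla":"N","nome":"Norte"}},'
    b'{"id":12,"sigla":"AC","nome":"Acre",'
    b'"regiao":{"id":1,"sigla":"N","nome":"Norte"}}]'
)

# json.loads accepts UTF-8 encoded bytes as well as str.
states = json.loads(content)

print(type(states))       # the outer JSON array becomes a list
print(type(states[0]))    # each JSON object becomes a dictionary
print(states[0]["nome"])  # non-ASCII characters are decoded
```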



In [0]:
# Import the JSON library.
import json

# Make a list of cities
cities = ["Currais Novos", "Caico", "Acari"]
print(type(cities))

# Use json.dumps to convert cities variable to a string.
cities_string = json.dumps(cities)
print(type(cities_string))

# Convert cities_string back to a list.
print(type(json.loads(cities_string)))

<class 'list'>
<class 'str'>
<class 'list'>




**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">


- Use the JSON function **loads** to convert **cities_population_string** to a Python object.
- Assign the resulting Python object to **cities_population_2**.


In [0]:
# Make a dictionary
cities_population = {
    "Currais Novos": 44000,
    "Caico": 67000,
    "Acari": 11000
}

# We can also dump a dictionary to a string
cities_population_string = json.dumps(cities_population)

# put your code here

In [0]:
cities_population_string

'{"Currais Novos": 44000, "Caico": 67000, "Acari": 11000}'

In [0]:
json.loads(cities_population_string)

{'Acari': 11000, 'Caico': 67000, 'Currais Novos': 44000}

### 1.6 Getting JSON From a Request

We can get the content of a response as a Python object by using the **.json()** method on the response.

In [0]:
response = requests.get("https://servicodados.ibge.gov.br/api/v1/localidades/estados",
                       headers=headers)
json_data = response.json()
print(type(json_data))
json_data
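
Once parsed, the result is made of ordinary Python lists and dictionaries, so nested fields are reached with normal indexing. The sketch below works on a small hand-copied sample with the same structure as the response (values taken from the output shown earlier); `find_state` is a hypothetical helper, not part of requests.

```python
# Two entries with the same structure as the "estados" response.
sample_states = [
    {"id": 23, "sigla": "CE", "nome": "Ceará",
     "regiao": {"id": 2, "sigla": "NE", "nome": "Nordeste"}},
    {"id": 24, "sigla": "RN", "nome": "Rio Grande do Norte",
     "regiao": {"id": 2, "sigla": "NE", "nome": "Nordeste"}},
]


def find_state(states, sigla):
    """Return the first state whose abbreviation matches, or None."""
    return next((s for s in states if s["sigla"] == sigla), None)


rn = find_state(sample_states, "RN")
# Top-level keys and a key nested inside the "regiao" dictionary
print(rn["id"], rn["sigla"], rn["regiao"]["nome"])
```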

**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">

- Get the information about the Rio Grande do Norte state
  - identification (id)
  - name of region
  - abbreviation
- Read the documentation of the [IBGE endpoint 'municípios'](https://servicodados.ibge.gov.br/api/docs/localidades?versao=1#api-Municipios-estadosUFMunicipiosGet) and retrieve the names of all municipalities of RN into a list named **cities_rn** and their respective ids into another list named **cities_id_rn**.
- IBGE has another API for collecting raw data from the [Sidra database](http://api.sidra.ibge.gov.br/); see its [documentation](http://api.sidra.ibge.gov.br/home/ajuda) for help. With the Sidra API you can extract the population estimate for a specific year. For example, to retrieve the **estimated population** of Currais Novos in 2018 (the id of Currais Novos is **2403103**), use:

```python
response2 = requests.get("http://api.sidra.ibge.gov.br/values/t/6579/p/2018/v/9324/N6/2403103",headers=headers)
int(response2.json()[1]["V"])
44664
```
- Using **cities_id_rn** and the Sidra API, retrieve the estimated population of all municipalities of RN. Store the result in the variable **population_rn**.
- Use the [DataFrame.from_dict](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_dict.html) function to create a dataframe from the variables **cities_id_rn**, **cities_rn** and **population_rn**. Use the strings **Id**, **City** and **Population** as column labels. Store the result in the variable **data_rn**.

In [0]:
# put your code here

In [0]:
response = requests.get("https://servicodados.ibge.gov.br/api/v1/localidades/estados/24/municipios",
                       headers=headers)
cities = response.json()
cities_rn = [city["nome"] for city in cities]
cities_id_rn = [city["id"] for city in cities]

In [0]:
cities_id_rn

In [0]:
cities

In [0]:
population_rn = [] 

for city in cities_id_rn:
  endnode = "http://api.sidra.ibge.gov.br/values/t/6579/p/2018/v/9324/N6/" + str(city)
  response = requests.get(endnode,headers=headers)
  population_rn.append(int(response.json()[1]["V"]))
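
The loop above assumes that every response carries a numeric value under `"V"`. A slightly more defensive sketch is shown below. The names `sidra_url` and `parse_population` are hypothetical, and the sample payload is hand-written: only the `"V"` key and the header-in-first-position layout come from the code above, while the header text and the handling of non-numeric values are assumptions.

```python
# URL pattern taken from the request used in this section.
SIDRA_BASE = "http://api.sidra.ibge.gov.br/values/t/6579/p/{year}/v/9324/N6/{city_id}"


def sidra_url(city_id, year=2018):
    """Build the population-estimate URL for one municipality."""
    return SIDRA_BASE.format(year=year, city_id=city_id)


def parse_population(payload):
    """Extract the population from a parsed Sidra response.

    Returns None when the data row is missing or its value is not a number.
    """
    if len(payload) < 2:  # only the header row, no data
        return None
    value = payload[1].get("V")
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


# Hand-written sample: the header text "Valor" is an assumption.
sample = [{"V": "Valor"}, {"V": "44664"}]
print(sidra_url(2403103))
print(parse_population(sample))
```

Inside the loop, `parse_population(response.json())` could replace the direct `int(...)` call so that one unexpected response does not abort the whole run.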


In [0]:
import pandas as pd

data_rn = pd.DataFrame.from_dict({"Id": cities_id_rn, 
                        "City": cities_rn, 
                        "Population": population_rn})
data_rn.head()

|   | Id | City | Population |
|---|---|---|---|
| 0 | 2400109 | Acari | 11152 |
| 1 | 2400208 | Açu | 57644 |
| 2 | 2400307 | Afonso Bezerra | 11041 |
| 3 | 2400406 | Água Nova | 3230 |
| 4 | 2400505 | Alexandria | 13602 |


**Guided Exercise**

<img width="150" src="https://drive.google.com/uc?export=view&id=1G4E_Qy3afI5Q_8r2JlGXAbbaFpGBsi4F">

In [0]:
# households by neighborhood in Natal (2010)
# from Sidra's Table 185 (http://api.sidra.ibge.gov.br/desctabapi.aspx?c=185)
endnode = "http://api.sidra.ibge.gov.br/values/t/185/p/2010/v/allxp/N102/in%20n6%202408102"

response = requests.get(endnode,headers=headers)

# out of curiosity, take a look at this variable
raw_data = response.json()

neigh_id = []
neigh_name = []
neigh_house = []

# first position is only the header
for data in raw_data[1:]:
  neigh_id.append(int(data["D3C"]))
  neigh_house.append(int(data["V"]))
  neigh_name.append(data["D3N"].split(" -")[0])
  
neigh_df = pd.DataFrame.from_dict({"neighborhood_id": neigh_id,
                                  "name":neigh_name,
                                  "households": neigh_house})
neigh_df.head()

|   | neighborhood_id | name | households |
|---|---|---|---|
| 0 | 2408102001 | Santos Reis | 1531 |
| 1 | 2408102002 | Praia do Meio | 1620 |
| 2 | 2408102003 | Rocas | 3067 |
| 3 | 2408102004 | Ribeira | 764 |
| 4 | 2408102005 | Petrópolis | 1733 |


## 2 Working with GeoJSON files

The **GeoJSON** data format is a data format based on the open data standard JSON. As previously described, JSON is the most common data format used for asynchronous browser/server communication and uses text to transmit data objects consisting of attribute–value pairs. The format is easy to read and offers web developers an easy way to extend existing APIs.

The GeoJSON format was designed for the representation of **simple geographical features**, along with their non-spatial attributes and now supports a number of geometry types such as **points**, **line strings**, **polygons** and **multi-part** collections of these types. 
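
A minimal sketch of what these geometry types look like in practice, written as Python dictionaries and serialized with `json.dumps`. The point uses the Natal coordinates from the Folium examples in this notebook; the square polygon and the property values are made up for illustration. Note that GeoJSON positions are ordered `[longitude, latitude]`, the reverse of the `[latitude, longitude]` order Folium expects for `location`.

```python
import json

# A Point feature. GeoJSON positions are [longitude, latitude].
point = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-35.212558, -5.826592]},
    "properties": {"name": "Natal"},
}

# A Polygon feature: a list of linear rings, each ring closed
# (its first and last positions are identical).
square = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-35.3, -5.9], [-35.1, -5.9], [-35.1, -5.7],
            [-35.3, -5.7], [-35.3, -5.9],
        ]],
    },
    "properties": {"name": "Illustrative square"},
}

# Features are usually grouped in a FeatureCollection.
feature_collection = {"type": "FeatureCollection", "features": [point, square]}

geojson_text = json.dumps(feature_collection, indent=2)
print(geojson_text[:200])
```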

<img width="450" src="https://drive.google.com/uc?export=view&id=1BYLE4At7b1XEQrmPX7dofdoXYzrzwwX9">



GeoJSON was created and is maintained by an [internet working group of developers](http://geojson.org/). Since the first GeoJSON format specification in 2008, the adoption of GeoJSON in spatial databases, web APIs, and open data platforms has grown significantly, resulting in a need for standardization. This led to the creation of a Geographic JSON working group, which released an [RFC document](https://tools.ietf.org/html/rfc7946) on GeoJSON in August 2016. GeoJSON files use either **.json** or **.geojson** as the filename extension.




### 2.1 Create your own GeoJSON



A great resource for creating your own GeoJSON data is http://geojson.io, which enables you to draw features on a map, and optionally, add non-spatial attributes to these features in a code editor and save the results in a variety of data formats, such as GeoJSON. 

GeoJSON is supported, to varying degrees, by numerous mapping APIs (including [IBGE](https://servicodados.ibge.gov.br/api/docs/malhas?versao=2)), by GIS (Geographic Information System) software packages, by companies such as Mapbox, Carto and Safe Software, and by libraries such as [Folium](https://python-visualization.github.io/folium/quickstart.html#GeoJSON/TopoJSON-Overlays).


The [IBGE API](https://servicodados.ibge.gov.br/api/docs/malhas?versao=2) implements different endpoints in order to retrieve GeoJSON information about Brazil and its municipalities. You can check some examples as follows:

- https://servicodados.ibge.gov.br/api/v2/malhas/24/?resolucao=5
- https://servicodados.ibge.gov.br/api/v2/malhas/24/?formato=application/vnd.geo+json&resolucao=5
- https://servicodados.ibge.gov.br/api/v2/malhas/2?resolucao=5&f?formato=image/svg+xml
- https://servicodados.ibge.gov.br/api/v2/malhas/?resolucao=2
- https://servicodados.ibge.gov.br/api/v2/malhas/24/?resolucao=3


We can use the same procedure described in Section 1 to retrieve GeoJSON data from IBGE API.

In [0]:
response = requests.get("https://servicodados.ibge.gov.br/api/v2/malhas/24/"+
                        "?formato=application/vnd.geo+json&resolucao=5",
                       headers=headers)
data_json = response.json()

In [0]:
data_json
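
Printing the whole object can be overwhelming, so a quick summary is often more useful. The sketch below counts geometry types in a GeoJSON object. `summarize_geojson` is a hypothetical helper, and the sample is hand-written as a stand-in for `data_json`; whether the IBGE response is a `FeatureCollection` is an assumption you should check against the real data.

```python
from collections import Counter


def summarize_geojson(data):
    """Count geometry types in a Feature or FeatureCollection."""
    if data.get("type") == "FeatureCollection":
        features = data.get("features", [])
    else:
        features = [data]
    # Features with a null geometry are skipped.
    return Counter(
        f["geometry"]["type"] for f in features if f.get("geometry")
    )


# Hand-written sample standing in for data_json.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon", "coordinates": []}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "MultiPolygon", "coordinates": []}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon", "coordinates": []}},
    ],
}
print(summarize_geojson(sample))
```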

### 2.2 Using GeoJSON files in Folium

Folium is a powerful Python data visualisation library built primarily to help people visualize geospatial data. With Folium, one can create a map of any location in the world as long as its latitude and longitude values are known. [Folium supports GeoJSON files](https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/GeoJSONWithoutTitles.ipynb).

In [0]:
# Install a pinned release of folium
!pip install folium==0.8.2

Collecting folium==0.8.2
  Downloading https://files.pythonhosted.org/packages/47/28/b3199bf87100e389c1dff88a44a38936d27e5e99eece870b5308186217c8/folium-0.8.2-py2.py3-none-any.whl (87kB)
Collecting branca>=0.3.0 (from folium==0.8.2)
  Downloading https://files.pythonhosted.org/packages/63/36/1c93318e9653f4e414a2e0c3b98fc898b4970e939afeedeee6075dd3b703/branca-0.3.1-py3-none-any.whl
datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.2 which is incompatible.
Installing collected packages: branca, folium
  Found existing installation: folium 0.2.1
    Uninstalling folium-0.2.1:
      Successfully uninstalled folium-0.2.1
Successfully installed branca-0.3.1 folium-0.8.2


In [0]:
# check the version installed
import folium
folium.__version__

'0.8.2'

It is very easy to add a GeoJSON file to a Folium map. Follow the example below:

In [0]:
# Create a map object
m = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=7,
    tiles='Stamen Terrain'
)

# Configure geojson layer
folium.GeoJson(data_json).add_to(m)

m

**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">


- Using Folium, IBGE API and GeoJSON files:
  - create a map of Brazil considering all municipalities.
  - create a map of Rio Grande do Norte containing only its mesoregions.

In [0]:
# put your code here

### 2.3 Importing GeoJSON files from overpass-turbo 



Another interesting tool for retrieving GeoJSON files is http://overpass-turbo.eu/. Overpass implements a query API that generates objects on the map. For example, the following query generates the shapes of the neighborhoods of Natal-RN.

Query to [Natal neighborhoods](http://wiki.openstreetmap.org/wiki/Natal#Bairros):
```
[out:json][timeout:25];
{{geocodeArea:Natal RN Brasil}}->.searchArea;
(
  relation["admin_level"="10"](area.searchArea);
);
out body;
>;
out skel qt;
```

<img width="800" src="https://drive.google.com/uc?export=view&id=1-NJ9JT6wN0jgAne1giOHKMsebEWbGC_P">

You can edit a GeoJSON file using http://geojson.io. Note that Overpass also generates some icon/circle points, which must be deleted. Paste the Overpass output into geojson.io, delete the points, and then export the file as **natal.geojson**.
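
The point features can also be removed programmatically instead of by hand. The following is a minimal sketch, assuming the exported data is a `FeatureCollection`; `drop_point_features` is a hypothetical helper and the sample data is hand-written for illustration.

```python
def drop_point_features(geojson):
    """Return a copy of a FeatureCollection without Point features."""
    kept = [
        feature for feature in geojson["features"]
        if (feature.get("geometry") or {}).get("type") != "Point"
    ]
    # Shallow copy: the input dictionary is left untouched.
    return {**geojson, "features": kept}


# Hand-written sample: one neighborhood polygon and one stray point.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Ribeira"},
         "geometry": {"type": "Polygon", "coordinates": []}},
        {"type": "Feature", "properties": {"name": "Ribeira"},
         "geometry": {"type": "Point", "coordinates": [-35.2, -5.78]}},
    ],
}

cleaned = drop_point_features(sample)
print(len(sample["features"]), "->", len(cleaned["features"]))
```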

<img width="800" src="https://drive.google.com/uc?export=view&id=16DTptKW5ft9Owquns9T0PcFGz6w9PcaB">


In [0]:
# load the GeoJSON data using UTF-8 encoding
with open('natal.geojson', encoding='utf-8') as geojson_file:
    geo_json_natal = json.load(geojson_file)

neighborhood = []
# list all neighborhoods
for neigh in geo_json_natal['features']:
    neighborhood.append(neigh['properties']['name'])
    
# print neighborhood names
neighborhood

In [0]:
# Create a map object
m = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=11,
    tiles='Stamen Terrain'
)

# Configure geojson layer
folium.GeoJson(geo_json_natal).add_to(m)
m

**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">


- Using Folium, overpass, geojson.io and GeoJSON file:
  - create a map highlighting all neighborhood zones of Natal. Tip: in overpass change the admin_level variable to 9.

In [0]:
# put your code here

## 3 Drawing choropleth maps

Choropleth maps display divided geographical areas or regions that are coloured, shaded or patterned in relation to a data variable. This provides a way to visualise values over a geographical area, which can show variation or patterns across the displayed location.

<img width="600" src="https://drive.google.com/uc?export=view&id=1re5akMUpMH8ju0lPEBYTduyMOpGxTePt">


The data variable uses colour progression to represent itself in each region of the map. Typically, this can be a blending from one colour to another, a single hue progression, transparent to opaque, light to dark or an entire colour spectrum.
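
To illustrate what "blending from one colour to another" means numerically, here is a small pure-Python sketch of linear interpolation between two hex colours. The function names and the default colours are assumptions chosen for illustration; this is not how any particular mapping library implements its colormaps.

```python
def hex_to_rgb(color):
    """'#rrggbb' -> (r, g, b) with components in 0..255."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def blend(value, vmin, vmax, start="#ffffcc", end="#006837"):
    """Linearly interpolate between two colours for a value in [vmin, vmax]."""
    if vmax == vmin:
        t = 0.0  # degenerate range: always use the start colour
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp out-of-range values
    rgb = (
        round(a + (b - a) * t)
        for a, b in zip(hex_to_rgb(start), hex_to_rgb(end))
    )
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Lowest value, a value in between, and the highest value
for households in (0, 2500, 5000):
    print(households, blend(households, 0, 5000))
```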


A choropleth map can be created with Folium and Pandas in three steps: pull the data into a dataframe (this gives the value of each zone), bind it to a feature property of the GeoJSON (this gives the boundaries of every zone you want to represent), and map it. Folium lets you specify any of the ColorBrewer sequential color groups and the bin edges of the quantized color scale. See the examples in the [Folium quickstart tutorial](https://python-visualization.github.io/folium/quickstart.html#Choropleth-maps) and [advanced choropleth maps](https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/plugin-Search.ipynb).
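
Binding only works when the key in the data matches the feature property exactly, including accents. A minimal sketch for spotting mismatches before drawing the map is shown below; `unmatched_keys` is a hypothetical helper, and the sample uses two neighborhood names from this notebook with one made-up value.

```python
def unmatched_keys(geojson, values, prop="name"):
    """Compare feature names with data keys.

    Returns two sorted lists: names present only in the GeoJSON,
    and names present only in the data.
    """
    feature_names = {f["properties"][prop] for f in geojson["features"]}
    value_names = set(values)
    return (sorted(feature_names - value_names),
            sorted(value_names - feature_names))


sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Ribeira"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Pitimbu"}, "geometry": None},
    ],
}
# Household counts keyed by name; the value for "Pitimbú" is made up.
sample_values = {"Ribeira": 764, "Pitimbú": 1000}

only_in_map, only_in_data = unmatched_keys(sample_geojson, sample_values)
print("only in GeoJSON:", only_in_map)
print("only in data:", only_in_data)
```

Any name reported here needs to be renamed on one side before the values can be joined to the shapes.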

### 3.1 Prepare the data

In [0]:
# some neighborhood names differ between the IBGE API and the GeoJSON file
neigh_df.loc[neigh_df.name == "Pitimbú",'name'] = "Pitimbu"
neigh_df.loc[neigh_df.name == "Mãe Luíza",'name'] = "Mãe Luiza"
neigh_df.loc[neigh_df.name == "Filipe Camarão",'name'] = "Felipe Camarão"
neigh_df.loc[neigh_df.name == "Guarapés",'name'] = "Guarapes"

neigh_df.set_index('name',inplace=True)
neigh_df.index.name = None
neigh_df["name"] = neigh_df.index
neigh_df.head()

|   | households | neighborhood_id | name |
|---|---|---|---|
| Santos Reis | 1531 | 2408102001 | Santos Reis |
| Praia do Meio | 1620 | 2408102002 | Praia do Meio |
| Rocas | 3067 | 2408102003 | Rocas |
| Ribeira | 764 | 2408102004 | Ribeira |
| Petrópolis | 1733 | 2408102005 | Petrópolis |


### 3.2 Create a colormap bar

In [0]:
from branca.colormap import linear

# colormap yellow and green (YlGn)
colormap = linear.YlGn_03.scale(
    neigh_df.households.min(),
    neigh_df.households.max())

colormap.caption="#Households in Natal-RN"

print(colormap(5000.0))

colormap


#d8f0a7


### 3.3 Create a choropleth map using folium.GeoJson()

In [0]:
# Create a map object
m = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=11,
    tiles='Stamen Terrain'
)

# Insert additional information ('households') into GeoJSON file
for neigh in geo_json_natal['features']:
    name_aux = neigh['properties']['name']
    neigh['properties']['households'] = str(neigh_df.loc[name_aux,"households"])
    
# Create a Choropleth using folium.GeoJson()
folium.GeoJson(geo_json_natal,
               name='Households',
               style_function=lambda x: {'fillColor': colormap(neigh_df.loc[x['properties']['name'],
                                                                           "households"]),
                                         'color': 'black','weight':2, 'fillOpacity':0.8},
               tooltip=folium.GeoJsonTooltip(fields=['name',"households"], 
                                            aliases=['Name:',"Households:"], 
                                            localize=True)
              ).add_to(m)

# Add a LayerControl.
folium.LayerControl().add_to(m)

# And the Color Map legend.
colormap.add_to(m)

m

In [0]:
with open('data.geojson', 'w') as outfile:  
    json.dump(geo_json_natal, outfile)

from google.colab import files
files.download('data.geojson') 

### 3.4 Create a choropleth map using folium.Choropleth()

In [0]:
import numpy as np

# Create a map object
m = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=11,
    tiles='Stamen Terrain', width='85%',height='85%'
)

# create the bin edges (thresholds) used by the legend
bins = np.linspace(neigh_df.households.min(),
                   neigh_df.households.max(),
                   6).tolist()


folium.Choropleth(
    geo_data=geo_json_natal,
    data=neigh_df,
    name= "neighborhoods",
    columns=['name', 'households'],
    key_on='feature.properties.name',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.5,
    legend_name='#households in Natal-RN',
    bins=bins
).add_to(m)


folium.LayerControl().add_to(m)

m


**Exercise**

<img width="100" src="https://drive.google.com/uc?export=view&id=1E8tR7B9YYUXsU_rddJAyq0FrM0MSelxZ">

- Using IBGE API, Folium and GeoJSON files:
  - Create a choropleth map using all municipalities of Rio Grande do Norte State and **population estimation** as an evaluation metric.

In [0]:
# put your code here