# Choropleth of Brazil's northeast
<hr>

<div style="text-align: justify">In this notebook we use the Folium library to create a choropleth of the northeast of Brazil. According to Wikipedia (https://en.wikipedia.org/wiki/Choropleth_map), a choropleth map (from Greek χῶρος ("area/region") + πλῆθος ("multitude")) is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income. Here we build a choropleth of the population of Brazil's northeast according to Brazil's 2010 Census: https://ww2.ibge.gov.br/english/estatistica/populacao/censo2010/</div>

<strong>Group components:</strong>
    <ul>
        <li>Marco Olimpio - marco.olimpio at gmail</li>
        <li>Rebecca Betwel - bekbetwel at gmail</li>
    </ul>
    
<strong>Short explanation video (PT-BR):</strong> https://youtu.be/2JaCGJ2HU40

<h2>The beginning</h2>
<hr>
Below is the very beginning of the kernel itself. First we load all the necessary libraries and the collected data, and then we start analysing it.

In [192]:
#System libraries
import os
import sys

#Basic libraries for data analysis
import numpy as np
from numpy import random
import pandas as pd

#Choropleth necessary libraries
##GeoJson data
import json

##Necessary to create shapes in folium
from shapely.geometry import Polygon
from shapely.geometry import Point

##Choropleth itself
import folium

##Colormap
from branca.colormap import linear

## Data: Importing, arranging and putting all together
<hr>

In [2]:
# dataset name
dataset_pop_2017 = os.path.join('data', 'population_2017.csv')

# read the data to a dataframe
data2017 = pd.read_csv(dataset_pop_2017)

# replace spaces in the column names with underscores
data2017.columns = [cols.replace(' ', '_') for cols in data2017.columns]

data2017.head()

Unnamed: 0,UF,COD._UF,COD._MUNIC,NOME_DO_MUNICÍPIO,POPULAÇÃO_ESTIMADA
0,RO,11.0,15.0,Alta Floresta D'Oeste,25437.0
1,RO,11.0,23.0,Ariquemes,107345.0
2,RO,11.0,31.0,Cabixi,6224.0
3,RO,11.0,49.0,Cacoal,88507.0
4,RO,11.0,56.0,Cerejeiras,17934.0


In [163]:
# Filtering data about northeast of Brazil
northeast_states = ['RN', 'PB', 'PE', 'MA', 'CE', 'BA', 'AL', 'PI', 'SE']
dataStateNames = data2017[data2017['UF'].isin(northeast_states)]

# Used to diff municipalities
#dataStateNames.to_csv('nomesIBGE_CidadesOrdenado.csv')

# Sort dataset by city name
dataStateNames = dataStateNames.sort_values('NOME_DO_MUNICÍPIO')
dataStateNames

Unnamed: 0,UF,COD._UF,COD._MUNIC,NOME_DO_MUNICÍPIO,POPULAÇÃO_ESTIMADA
891,CE,23.0,101.0,Abaiara,11605.0
1828,BA,29.0,207.0,Abaré,20189.0
1827,BA,29.0,108.0,Abaíra,9199.0
1465,PE,26.0,54.0,Abreu e Lima,99364.0
1829,BA,29.0,306.0,Acajutiba,15727.0
892,CE,23.0,150.0,Acarape,16543.0
893,CE,23.0,200.0,Acaraú,62199.0
1075,RN,24.0,109.0,Acari,11333.0
667,PI,22.0,53.0,Acauã,6966.0
894,CE,23.0,309.0,Acopiara,53572.0


In [164]:
# paths to the per-state municipality files geojson/geojs-xx-mun.json
ma_states = os.path.join('geojson', 'geojs-21-mun.json')
pi_states = os.path.join('geojson', 'geojs-22-mun.json')
ce_states = os.path.join('geojson', 'geojs-23-mun.json')
rn_states = os.path.join('geojson', 'geojs-24-mun.json')
pb_states = os.path.join('geojson', 'geojs-25-mun.json')
pe_states = os.path.join('geojson', 'geojs-26-mun.json')
al_states = os.path.join('geojson', 'geojs-27-mun.json')
se_states = os.path.join('geojson', 'geojs-28-mun.json')
ba_states = os.path.join('geojson', 'geojs-29-mun.json')

# load the data with 'latin-1' encoding because the names contain accented characters
geo_json_data_ma = json.load(open(ma_states,encoding='latin-1'))
geo_json_data_pi = json.load(open(pi_states,encoding='latin-1'))
geo_json_data_ce = json.load(open(ce_states,encoding='latin-1'))
geo_json_data_rn = json.load(open(rn_states,encoding='latin-1'))
geo_json_data_pb = json.load(open(pb_states,encoding='latin-1'))
geo_json_data_pe = json.load(open(pe_states,encoding='latin-1'))
geo_json_data_al = json.load(open(al_states,encoding='latin-1'))
geo_json_data_se = json.load(open(se_states,encoding='latin-1'))
geo_json_data_ba = json.load(open(ba_states,encoding='latin-1'))

#Merging all files in a single json structure
geo_json_data_northeast = geo_json_data_ma
geo_json_data_northeast['features'].extend(geo_json_data_pi['features'])
geo_json_data_northeast['features'].extend(geo_json_data_ce['features'])
geo_json_data_northeast['features'].extend(geo_json_data_rn['features'])
geo_json_data_northeast['features'].extend(geo_json_data_pb['features'])
geo_json_data_northeast['features'].extend(geo_json_data_pe['features'])
geo_json_data_northeast['features'].extend(geo_json_data_al['features'])
geo_json_data_northeast['features'].extend(geo_json_data_se['features'])
geo_json_data_northeast['features'].extend(geo_json_data_ba['features'])
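
The load-and-extend steps above could also be written as a loop. The following is a minimal sketch using only the standard library. `merge_feature_collections` is a hypothetical helper name, and the sketch assumes each file is a GeoJSON FeatureCollection with a top-level `features` list, as in the cell above. Unlike that cell, it builds a fresh dictionary instead of extending the Maranhão data in place.

```python
import json
import os


def merge_feature_collections(paths, encoding='latin-1'):
    """Merge several GeoJSON FeatureCollection files into one dict."""
    merged = {'type': 'FeatureCollection', 'features': []}
    for path in paths:
        # The context manager closes each file once it has been parsed
        with open(path, encoding=encoding) as handle:
            collection = json.load(handle)
        merged['features'].extend(collection['features'])
    return merged


# State codes used in the file names above: 21 (MA) through 29 (BA)
state_codes = range(21, 30)
paths = [os.path.join('geojson', 'geojs-{}-mun.json'.format(code))
         for code in state_codes]

# With the files in place, the merged structure would be built with:
# geo_json_data_northeast = merge_feature_collections(paths)
```

The features keep the order of `paths`, so positions in the merged list still depend on the order in which the states are listed.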


In [166]:
# Used to diff municipalities
i=0
for cities in geo_json_data_northeast['features'][:]:
    #print(str(i)+' '+cities['properties']['name'])
    print(cities['properties']['name'])
    i = i+1


Açailândia
Afonso Cunha
Água Doce do Maranhão
Alcântara
Aldeias Altas
Altamira do Maranhão
Alto Alegre do Maranhão
Alto Alegre do Pindaré
Alto Parnaíba
Amapá do Maranhão
Amarante do Maranhão
Anajatuba
Anapurus
Apicum-Açu
Araguanã
Araioses
Arame
Arari
Axixá
Bacabal
Bacabeira
Bacuri
Bacurituba
Balsas
Barão de Grajaú
Barra do Corda
Barreirinhas
Belágua
Bela Vista do Maranhão
Benedito Leite
Bequimão
Bernardo do Mearim
Boa Vista do Gurupi
Bom Jardim
Bom Jesus das Selvas
Bom Lugar
Brejo
Brejo de Areia
Buriti
Buriti Bravo
Buriticupu
Buritirana
Cachoeira Grande
Cajapió
Cajari
Campestre do Maranhão
Cândido Mendes
Cantanhede
Capinzal do Norte
Carolina
Carutapera
Caxias
Cedral
Central do Maranhão
Centro do Guilherme
Centro Novo do Maranhão
Chapadinha
Cidelândia
Codó
Coelho Neto
Colinas
Conceição do Lago-Açu
Coroatá
Cururupu
Davinópolis
Dom Pedro
Duque Bacelar
Esperantinópolis
Estreito
Feira Nova do Maranhão
Fernando Falcão
Formosa da Serra Negra
Fortaleza dos Nogueiras
Fortuna
Godofredo Viana
Gon

Tabuleiro do Norte
Tamboril
Tarrafas
Tauá
Tejuçuoca
Tianguá
Trairi
Tururu
Ubajara
Umari
Umirim
Uruburetama
Uruoca
Varjota
Várzea Alegre
Viçosa do Ceará
Acari
Açu
Afonso Bezerra
Água Nova
Alexandria
Almino Afonso
Alto do Rodrigues
Angicos
Antônio Martins
Apodi
Areia Branca
Arês
Augusto Severo
Baía Formosa
Baraúna
Barcelona
Bento Fernandes
Bodó
Bom Jesus
Brejinho
Caiçara do Norte
Caiçara do Rio do Vento
Caicó
Campo Redondo
Canguaretama
Caraúbas
Carnaúba dos Dantas
Carnaubais
Ceará-Mirim
Cerro Corá
Coronel Ezequiel
Coronel João Pessoa
Cruzeta
Currais Novos
Doutor Severiano
Parnamirim
Encanto
Equador
Espírito Santo
Extremoz
Felipe Guerra
Fernando Pedroza
Florânia
Francisco Dantas
Frutuoso Gomes
Galinhos
Goianinha
Governador Dix-Sept Rosado
Grossos
Guamaré
Ielmo Marinho
Ipanguaçu
Ipueira
Itajá
Itaú
Jaçanã
Jandaíra
Janduís
Januário Cicco
Japi
Jardim de Angicos
Jardim de Piranhas
Jardim do Seridó
João Câmara
João Dias
José da Penha
Jucurutu
Jundiá
Lagoa d'Anta
Lagoa de Pedras
Lagoa de Velhos


We found some mismatches between the municipality names in the IBGE data and in the GeoJSON data. The table below summarizes the names that did not match:

| State | IBGE                   | GeoJSON                | Current name           | Reference |
|-------|------------------------|------------------------|------------------------|-----------|
| PE    | Belém do São Francisco | Belém de São Francisco | Belém do São Francisco | https://pt.wikipedia.org/wiki/Bel%C3%A9m_do_S%C3%A3o_Francisco |
| PB    | Tacima                 | Campo de Santana       | Tacima                 | https://en.wikipedia.org/wiki/Tacima |
| SE    | Graccho Cardoso        | Gracho Cardoso         | Graccho Cardoso        | https://pt.wikipedia.org/wiki/Graccho_Cardoso |
| PE    | Iguaracy               | Iguaraci               | Iguaracy               | https://pt.wikipedia.org/wiki/Iguaracy |
| CE    | Itapajé                | Itapagé                | Itapajé                | https://pt.wikipedia.org/wiki/Itapajé |
| PB    | Joca Claudino          | Santarém               | Joca Claudino          | https://pt.wikipedia.org/wiki/Joca_Claudino |
| PE    | Lagoa de Itaenga       | Lagoa do Itaenga       | Lagoa de Itaenga       | https://pt.wikipedia.org/wiki/Lagoa_de_Itaenga |
| PI    | Nazária                | (no info)              | Nazária                | https://pt.wikipedia.org/wiki/Naz%C3%A1ria |
| PE    | Quixaba                | Quixabá                | Quixaba                | https://pt.wikipedia.org/wiki/Quixaba_(Pernambuco) |
| PB    | Quixaba                | Quixabá                | Quixaba                | https://pt.wikipedia.org/wiki/Quixaba_(Para%C3%ADba) |
| RN    | Serra Caiada           | Presidente Juscelino   | Serra Caiada           | https://pt.wikipedia.org/wiki/Serra_Caiada |
| PB    | São Vicente do Seridó  | Seridó                 | São Vicente do Seridó  | https://pt.wikipedia.org/wiki/S%C3%A3o_Vicente_do_Serid%C3%B3 |
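
Mismatches like the ones in the table can be found by comparing the two name lists. Below is a minimal sketch of such a comparison using set differences. `find_name_mismatches` is a hypothetical helper, and the sample names are taken from the table for illustration only.

```python
def find_name_mismatches(ibge_names, geojson_names):
    """Return the names found only in the IBGE list and only in the GeoJSON list."""
    ibge = set(ibge_names)
    geo = set(geojson_names)
    only_ibge = sorted(ibge - geo)
    only_geojson = sorted(geo - ibge)
    return only_ibge, only_geojson


# Small illustrative sample; in the notebook the inputs would be
# dataStateNames['NOME_DO_MUNICÍPIO'] and the 'name' property of each feature
ibge_sample = ['Acari', 'Iguaracy', 'Tacima', 'Nazária']
geojson_sample = ['Acari', 'Iguaraci', 'Campo de Santana']

only_ibge, only_geojson = find_name_mismatches(ibge_sample, geojson_sample)
print(only_ibge)      # ['Iguaracy', 'Nazária', 'Tacima']
print(only_geojson)   # ['Campo de Santana', 'Iguaraci']
```

A name-only comparison cannot tell apart municipalities that share a name in different states (Quixaba appears in both PE and PB in the table), so those cases still need a per-state check.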




Other references:
https://ww2.ibge.gov.br/home/estatistica/populacao/estimativa2011/tab_Municipios_TCU.pdf
https://biblioteca.ibge.gov.br/visualizacao/dtbs/pernambuco/quixaba.pdf

We did not find any GeoJSON information about the municipality <strong>Nazária - PI</strong>, so we decided to remove Nazária from the IBGE data: Nazária is a municipality emancipated from Teresina, the capital of Piauí, and the data about its territory is still attached to Teresina.


In [168]:
#Belém de São Francisco -> Belém do São Francisco
geo_json_data_northeast['features'][1031]['properties']['description'] = 'Belém do São Francisco'
geo_json_data_northeast['features'][1031]['properties']['name'] = 'Belém do São Francisco'
print(geo_json_data_northeast['features'][1031]['properties']['name'])

Belém do São Francisco


In [169]:
#Campo de Santana -> Tacima
geo_json_data_northeast['features'][1003]['properties']['description'] = 'Tacima'
geo_json_data_northeast['features'][1003]['properties']['name'] = 'Tacima'
print(geo_json_data_northeast['features'][1003]['properties']['name'])

Tacima


In [170]:
#Gracho Cardoso -> Graccho Cardoso
geo_json_data_northeast['features'][1324]['properties']['description'] = 'Graccho Cardoso'
geo_json_data_northeast['features'][1324]['properties']['name'] = 'Graccho Cardoso'
print(geo_json_data_northeast['features'][1324]['properties']['name'])

Graccho Cardoso


In [171]:
#Iguaraci -> Iguaracy
geo_json_data_northeast['features'][1089]['properties']['description'] = 'Iguaracy'
geo_json_data_northeast['features'][1089]['properties']['name'] = 'Iguaracy'
print(geo_json_data_northeast['features'][1089]['properties']['name'])

Iguaracy


In [172]:
# Itapagé -> Itapajé
geo_json_data_northeast['features'][526]['properties']['description'] = 'Itapajé'
geo_json_data_northeast['features'][526]['properties']['name'] = 'Itapajé'
print(geo_json_data_northeast['features'][526]['properties']['name'])

Itapajé


In [173]:
# Santarém -> Joca Claudino
geo_json_data_northeast['features'][964]['properties']['description'] = 'Joca Claudino'
geo_json_data_northeast['features'][964]['properties']['name'] = 'Joca Claudino'
print(geo_json_data_northeast['features'][964]['properties']['name'])

Joca Claudino


In [174]:
# Lagoa do Itaenga -> Lagoa de Itaenga
geo_json_data_northeast['features'][1111]['properties']['description'] = 'Lagoa de Itaenga'
geo_json_data_northeast['features'][1111]['properties']['name'] = 'Lagoa de Itaenga'
print(geo_json_data_northeast['features'][1111]['properties']['name'])

Lagoa de Itaenga


In [175]:
# Quixabá -> Quixaba
geo_json_data_northeast['features'][1144]['properties']['description'] = 'Quixaba'
geo_json_data_northeast['features'][1144]['properties']['name'] = 'Quixaba'
print(geo_json_data_northeast['features'][1144]['properties']['name'])

Quixaba


In [176]:
# Quixabá -> Quixaba
geo_json_data_northeast['features'][946]['properties']['description'] = 'Quixaba'
geo_json_data_northeast['features'][946]['properties']['name'] = 'Quixaba'
print(geo_json_data_northeast['features'][946]['properties']['name'])

Quixaba


In [177]:
# Presidente Juscelino->Serra Caiada
geo_json_data_northeast['features'][736]['properties']['description'] = 'Serra Caiada'
geo_json_data_northeast['features'][736]['properties']['name'] = 'Serra Caiada'
print(geo_json_data_northeast['features'][736]['properties']['name'])

Serra Caiada


In [178]:
# Seridó->São Vicente do Seridó
geo_json_data_northeast['features'][990]['properties']['description'] = 'São Vicente do Seridó'
geo_json_data_northeast['features'][990]['properties']['name'] = 'São Vicente do Seridó'
print(geo_json_data_northeast['features'][990]['properties']['name'])

São Vicente do Seridó
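
The cells above address each feature by a hard-coded position, which depends on the order in which the files were merged. As an alternative, here is a minimal sketch that looks features up by their current name. `rename_municipality` is a hypothetical helper, and it assumes every feature carries `name` and `description` properties, as the cells above do.

```python
def rename_municipality(geo_data, old_name, new_name):
    """Rename every feature called old_name and return how many were changed."""
    changed = 0
    for feature in geo_data['features']:
        props = feature['properties']
        if props.get('name') == old_name:
            props['name'] = new_name
            props['description'] = new_name
            changed += 1
    return changed


# Tiny stand-in for geo_json_data_northeast, for illustration only
sample = {'features': [
    {'properties': {'name': 'Quixabá', 'description': 'Quixabá'}},
    {'properties': {'name': 'Iguaraci', 'description': 'Iguaraci'}},
    {'properties': {'name': 'Quixabá', 'description': 'Quixabá'}},
]}

print(rename_municipality(sample, 'Quixabá', 'Quixaba'))    # 2
print(rename_municipality(sample, 'Iguaraci', 'Iguaracy'))  # 1
```

The returned count is worth checking before relying on a rename: zero means the old name was not found, and more than one means the name occurs in several features, possibly in different states.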


In [181]:
dataStateNames[(dataStateNames['NOME_DO_MUNICÍPIO']=='Nazária')]

Unnamed: 0,UF,COD._UF,COD._MUNIC,NOME_DO_MUNICÍPIO,POPULAÇÃO_ESTIMADA
803,PI,22.0,6720.0,Nazária,8425.0


Removing Nazária from the municipalities of IBGE

In [233]:
# Removing Nazária from the municipalities of IBGE
dataStateNames = dataStateNames[dataStateNames['NOME_DO_MUNICÍPIO']!='Nazária']
len(dataStateNames)

1793

In [191]:
dataStateNames[dataStateNames['NOME_DO_MUNICÍPIO']=='Nazária']

Unnamed: 0,UF,COD._UF,COD._MUNIC,NOME_DO_MUNICÍPIO,POPULAÇÃO_ESTIMADA


In [232]:
cities_ne = []
# list all cities in the state
for city in geo_json_data_northeast['features']:
    cities_ne.append(city['properties']['description'])

len(cities_ne)

1793

In [30]:

# Per-state frames; note that dataAL (Alagoas) is not in this list
frames = [dataRN, dataPB, dataPE, dataMA, dataCE, dataBA, dataPI, dataSE]
dataNordeste = pd.concat(frames)
print(len(dataNordeste))

#adjusting to the correct data type
dataNordeste['COD._UF'] = dataNordeste['COD._UF'].astype(int)
dataNordeste['COD._MUNIC'] = dataNordeste['COD._MUNIC'].astype(int)
dataNordeste['POPULAÇÃO_ESTIMADA'] = dataNordeste['POPULAÇÃO_ESTIMADA'].astype(int)
dataNordeste.dtypes

1692


UF                    object
COD._UF                int64
COD._MUNIC             int64
NOME_DO_MUNICÍPIO     object
POPULAÇÃO_ESTIMADA     int64
dtype: object

## Choropleth
<hr>

After all the steps needed to make the municipality names in the population data and the GeoJSON data match, we can now create the choropleth itself.

In [226]:
dataNordeste.head()

Unnamed: 0,UF,COD._UF,COD._MUNIC,NOME_DO_MUNICÍPIO,POPULAÇÃO_ESTIMADA
1075,RN,24,109,Acari,11333
1077,RN,24,307,Afonso Bezerra,11211
1079,RN,24,505,Alexandria,13827
1080,RN,24,604,Almino Afonso,4854
1081,RN,24,703,Alto do Rodrigues,14365


In [231]:
dataNordeste_dictionary = dataNordeste.set_index('NOME_DO_MUNICÍPIO')['POPULAÇÃO_ESTIMADA']
print(len(dataNordeste))
dataNordeste['id'] = dataNordeste['UF']+dataNordeste['NOME_DO_MUNICÍPIO']

dataNordeste_dict = dataNordeste.set_index('id')['POPULAÇÃO_ESTIMADA']
print(len(dataNordeste_dictionary))
print(len(dataNordeste_dict))

1692
1692
1692


In [195]:
colorscale = linear.YlGnBu.scale(dataNordeste['POPULAÇÃO_ESTIMADA'].min(), dataNordeste['POPULAÇÃO_ESTIMADA'].max())
colorscale

In [241]:
# Create a map object
#Centering at Brazil's northeast
m8 = folium.Map(
    location = [-10.116657, -42.542580],
    zoom_start=6,   
    tiles='cartodbpositron'
)

We could use a threshold scale function to differentiate the cities by color. A common practice is to split the range of the data linearly with a function such as NumPy's `linspace`:

```python
np.linspace(MIN, MAX, STEPS, dtype=TYPE).tolist()
```

The Branca library also has a function to create a threshold scale; however, we did not use it because we did not want to split the population range linearly and assign the colors on that basis. A linear split shows only the extremes: all the villages and towns on one side and the megacities on the other. So we split the range manually, using the breakpoints in the "our division" row of the table below (20K, 100K, 300K, 1M, 1.5M and 2.5M). Dividing the range this way lets the main cities stand out, while the great majority of the other cities, with fewer than 150K people, are classified in a similar manner/color.

|Threshold Scale |Min  | 2    |  3    |    4   |    5   |   MAX  |
|----------------|-----|------|-------|--------|--------|--------|
|np.linspace     |1228 |591779|1182331| 1772882| 2363434| 2953986|
|our division    |20000|100000|300000 | 1000000| 1500000| 2500000|
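
To illustrate the difference between the two rows of the table, the sketch below counts how many municipalities fall into each class under each scale. It is plain Python rather than the notebook's own code: `linear_breaks` is a hypothetical stand-in for `np.linspace(..., dtype=int)`, `class_counts` is a hypothetical helper, and the ten populations are the ones shown in the `dataStateNames` output earlier in this notebook.

```python
from bisect import bisect_right
from collections import Counter


def linear_breaks(low, high, steps):
    """Evenly spaced integer breakpoints from low to high, inclusive."""
    return [int(low + (high - low) * i / (steps - 1)) for i in range(steps)]


def class_counts(values, breaks):
    """Count values per class.

    Class k holds values with breaks[k-1] <= value < breaks[k];
    class 0 holds values below the first breakpoint.
    """
    return Counter(bisect_right(breaks, value) for value in values)


# Ten populations from the dataStateNames output shown earlier
sample = [11605, 20189, 9199, 99364, 15727, 16543, 62199, 11333, 6966, 53572]

linear = linear_breaks(1228, 2953986, 6)
manual = [20000, 100000, 300000, 1000000, 1500000, 2500000]

print(linear)                         # [1228, 591779, 1182331, 1772882, 2363434, 2953986]
print(class_counts(sample, linear))   # Counter({1: 10})
print(class_counts(sample, manual))   # Counter({0: 6, 1: 4})
```

With the linear breakpoints all ten sample municipalities land in the same class, while the manual breakpoints already separate those below 20,000 inhabitants from the rest.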

In [242]:
m8.add_child(folium.LatLngPopup())

# create the threshold scale used for the colors and the legend
threshold_scale = np.linspace(dataNordeste['POPULAÇÃO_ESTIMADA'].min(),
                              dataNordeste['POPULAÇÃO_ESTIMADA'].max(), 6, dtype=int).tolist()
print(threshold_scale)
#threshold_scale = [dataNordeste['POPULAÇÃO_ESTIMADA'].min(), 250000, 800000, 150000, 200000, dataNordeste['POPULAÇÃO_ESTIMADA'].max()]
threshold_scale = [20000,100000,300000,1000000,1500000,2500000]
print(threshold_scale)

[1228, 591779, 1182331, 1772882, 2363434, 2953986]
[20000, 100000, 300000, 1000000, 1500000, 2500000]


In [243]:
m8.choropleth(
    geo_data=geo_json_data_northeast,
    data=dataNordeste,
    columns=['NOME_DO_MUNICÍPIO', 'POPULAÇÃO_ESTIMADA'],
    key_on='feature.properties.name',
    fill_color='YlGnBu',
    legend_name='Population estimation (2017)',
    highlight=True,
    threshold_scale = threshold_scale,
    line_color='green',
    line_weight=0.2,
    line_opacity=0.6
)

m8.save('outputFolium.html')