# Welcome to the Brazilian Apocalypse!

We are currently on the second year of the Coronavirus pandemic. It is obvious the institutions aren't ready to face extreme conditions for a prolonged period of time, and neither are the people. While there is a large community of paranoid people in the US who are ready to survive the extinction of civilization, in Brazil this scenario is not something we consider when thinking about the future.

Brazil is a country with continental dimensions, with more than 220 million people spread over 8.5 million square kilometers. There must be some dark forgotten places where one could ride out an extinction level event. There are more than five thousand cities and we will sort through different types of data to determine the best places to enjoy the end of the world.

After ranking the cities from best to worst, we will try to optimize our escape route, choosing our starting city and defining the sequence of cities to get to our final destination. 

## 1. Definition of the problem

There are several different causes for an apocalypse, and people are getting more creative everyday. Zombies, nuclear war, large asteroids, deadly viruses, global warming, exploding volcanoes, massive earthquakes, alien invasion! Although there are huge variations on each of these scenarios, we can look for some information that will help no matter what comes our way. 

Crowded places should be avoided, food must be plentiful, fresh water must be available. We also need medication to cure and prevent diseases, and firearms for protection. It would be smart to be away from nuclear power plants, but solar energy might be useful. Average temperatures around 22 degrees are a good choice. Being close to an airport shoud be interesting, even a small one, for a quick getaway, if you can find and fly a plane. 

We should also be worried about rebuilding civilization, so we also looking at libraries and universities.

We will load data that helps to determine the potential for each city to fulfill one of this requirements, describing its source and how we create a metric to rank the cities. After processing all this information, our final data set will be a table where each row is a city and each column will show a score between 0 and 1 for a certain feature. Our last feature will be the sum of all scores, and we will rank the cities by this metric. Hopefully, we will have a clear winner for the whole country. 

We also have a list of neighboring cities for each location. So we can map a route from any place to the our sanctuary or try to find the best place at a certain distance from a starting point.

## 2. Extraction, Transformation and Loading of the Data

We are gathering here several different information for each Brazilian city. They were all gathered after a long, hard search through several government websites and other sources. We will present the source, the available info and the transformations needed to get our clean table.

All the libraries needed will be loaded here.

In [1]:
import pandas as pd


### 2.1. 2013 Brazilian Human Development Atlas

The Human Development Index (HDI) is a statistic composite index of life expectancy, education, and per capita income indicators. In 2013, the latest report was released, showing the scores for each city, state and the country as a whole, for the years 1990, 2000 and 2010. It can be accessed at <a href="http://www.atlasbrasil.org.br/">Atlas Brasil</a>.

Each row of the table represents a city for one of the given years. There are some ID columns and then all the features that are used to compute the score, the Human Development Index, for each city. From this table, we will need three columns:
1. the city ID;
2. the city population in 2010;
3. the health portion of the HDI in 2010.

The ID will be used as a key to merge all the different data into a single table. The population will be used to compute the population density, and the Health HDI will be used as an indication of how well the city is capable of handle medical emergencies.

In [10]:
#loading the file, bringing the sheet with the information for the cities
df_atlas = pd.read_excel('Atlas 2013_municipal, estadual e Brasil.xlsx', sheet_name='MUN 91-00-10')
#filtering for 2010
df_atlas = df_atlas[df_atlas['ANO']==2010]
#keeping only the needed features
df_atlas_clean = df_atlas[['Codmun6','pesotot','IDHM_L']]
#the ID will be turned into a string
df_atlas_clean['Codmun6'] = df_atlas_clean['Codmun6'].astype(str)
#renaming columns
df_atlas_clean.rename(columns = {'Codmun6':'CityID', 'pesotot':'Population',
                              'IDHM_L':'Health_Index'}, inplace = True)
df_atlas_clean = df_atlas_clean.reset_index(drop=True)
df_atlas_clean.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_atlas_clean['Codmun6'] = df_atlas_clean['Codmun6'].astype(str)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().rename(


Unnamed: 0,CityID,Population,Health_Index
0,110001,24392,0.763
1,110002,90353,0.806
2,110003,6313,0.757
3,110004,78574,0.821
4,110005,17029,0.799


In [12]:
len(df_atlas_clean)

5565

### 2.2. Territorial Data from IBGE - Brazilian Institute of Geography and Statistics

There are a lot of information available at the IBGE website, and at this time we will focus on the area of each of the cities, available at <a href="https://www.ibge.gov.br/geociencias/organizacao-do-territorio/estrutura-territorial/15761-areas-dos-municipios.html">Áreas Territoriais</a>. This will allow us to compute the population density. We will also keep the name and the state of each city, because we will need to use that for data sets that don't have the cityID.

In [42]:
#loading the file, bringing the sheet with the information for the cities
df_area = pd.read_excel('AR_BR_RG_UF_RGINT_RGIM_MES_MIC_MUN_2020.xls', sheet_name='AR_BR_MUN_2020')
#keeping only the needed features
df_area_clean = df_area[['CD_GCMUN','AR_MUN_2020','NM_MUN_2020','NM_UF_SIGLA']]
#the ID will be turned into a string and removing last digit
df_area_clean['CD_GCMUN'] = df_area_clean['CD_GCMUN'].astype(str).str[:6]
#turning the name and state into a single column, inlowercase
df_area_clean['citystate'] = (df_area_clean['NM_MUN_2020']+'-'+df_area_clean['NM_UF_SIGLA']).str.lower()
#keeping only the needed features
df_area_clean = df_area_clean[['CD_GCMUN','AR_MUN_2020','citystate']]
#renaming columns
df_area_clean.rename(columns = {'CD_GCMUN':'CityID', 'AR_MUN_2020':'Area'}, 
                     inplace = True)
df_area_clean = df_area_clean.reset_index(drop=True)

df_area_clean.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_area_clean['CD_GCMUN'] = df_area_clean['CD_GCMUN'].astype(str).str[:6]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_area_clean['citystate'] = (df_area_clean['NM_MUN_2020']+'-'+df_area_clean['NM_UF_SIGLA']).str.lower()


Unnamed: 0,CityID,Area,citystate
0,110001,7067.127,alta floresta d'oeste-ro
1,110002,4426.571,ariquemes-ro
2,110003,1314.352,cabixi-ro
3,110004,3793.0,cacoal-ro
4,110005,2783.3,cerejeiras-ro


In [22]:
len(df_area_clean)

5572

Here we see a problem. The number of cities is different in our data sets. It seems there are new cities since 2010.

### 2.3. Tabnet - DataSUS

There is a national database with information from the Ministry of Health, at <a href="http://tabnet.datasus.gov.br/cgi/deftohtm.exe?cnes/cnv/leiintbr.def">Tabnet</a>. From there, we will get the number of hospital beds available for each city, another indicator that will be taken into account for our medical needs.

In [24]:
#loading the file
df_hospital = pd.read_csv('leitoshospitalares.csv',sep=';')
#getting the city ID from ID+Name
df_hospital['Município'] = df_hospital['Município'].str[:6]
#renaming columns
df_hospital.rename(columns = {'Município':'CityID', 'Quantidade existente':'Hospital_beds'},
                              inplace = True)
df_hospital = df_hospital.reset_index(drop=True)
df_hospital.head()

Unnamed: 0,CityID,Hospital_beds
0,110001,49
1,110037,24
2,110040,16
3,110034,35
4,110002,240


In [25]:
len(df_hospital)

3549

Here we have another thing to be aware of. Not all cities have hospitals. We will have other features like this, and we will tackle this problem after loading all of them

### 2.4. National Energy Agency

From the National Energy Agency, ANEEL, we can get the data with the location and energy production of all the power plants <a href="https://www.aneel.gov.br/dados/geracao">here</a>. We will pivot this data to get the cities as rows and the kind of power plant as columns, with how much is produced as our values.

Here we will keep only four types of energy production: solar, wind, biomass and nuclear power. Solar power is easy to mantain, Biomass is a nice alternative, Wind turbines should work for a long time without maintenance, and Nuclear power should be as far away as possible, because it fill fail without the proper care. Hydreletric power plants are too complex to mantain, and thermoeletric are dependent of fuel.

In [33]:
#loading file
df_energy = pd.read_csv('EmpreendimentoOperacao.csv')
#keeping only the needed columns
df_energy = df_energy[['dscMunicipio','dscOrigemCombustivel','mdaPotenciaOutorgadakW']]
#removing power plants that don't have a location
df_energy = df_energy.dropna(axis = 0, how ='any')
#finding cityID in the name+ID
df_energy['pos'] = df_energy['dscMunicipio'].str.find('CodIbge:').astype(int)
#getting cityID
df_energy['CodMunic'] = df_energy.apply(lambda x: x['dscMunicipio'][x['pos']+9:x['pos']+16],axis=1)
#keeping only the needed columns
df_energy_clean = df_energy[['CodMunic','dscOrigemCombustivel','mdaPotenciaOutorgadakW']]
#making sure the power production is integer
df_energy_clean['mdaPotenciaOutorgadakW'] = df_energy_clean['mdaPotenciaOutorgadakW'].astype(int)
#grouping production by city and type
df_energy_final =df_energy_clean.groupby(['CodMunic','dscOrigemCombustivel'], as_index=False).agg({'mdaPotenciaOutorgadakW':sum})
#pivoting to get types as columns
df_energy_pivot = df_energy_final.pivot(index='CodMunic', columns='dscOrigemCombustivel', values='mdaPotenciaOutorgadakW')
#keeping only the needed columns
df_energy_pivot = df_energy_pivot.drop(columns=['Fóssil','Hídrica','Undi-Elétrica'])
#resetting index
df_energy_pivot.reset_index(inplace=True)

#renaming columns
df_energy_pivot.rename(columns = {'CodMunic':'CityID','Biomassa':'BiomassPower','Eólica': 'WindPower',
                                  'Solar':'SolarPower','Nuclear':'NuclearPower'},
                              inplace = True)

#filling empty values with zero
df_energy_pivot = df_energy_pivot.fillna(0)
df_energy_pivot.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_energy_clean['mdaPotenciaOutorgadakW'] = df_energy_clean['mdaPotenciaOutorgadakW'].astype(int)


dscOrigemCombustivel,CityID,BiomassPower,WindPower,NuclearPower,SolarPower
0,1100023,0.0,0.0,0.0,0.0
1,1100064,0.0,0.0,0.0,0.0
2,1100072,0.0,0.0,0.0,0.0
3,1100080,0.0,0.0,0.0,0.0
4,1100106,0.0,0.0,0.0,0.0


In [34]:
len(df_energy_pivot)

1625

Again, not all cities are present in this data set.

### 2.5. INPE - National Institute for Space Research

One of the labs that are a part of INPE is LABREN - Modeling and Study of Renewable Energy Resources Lab, that released the <a href="http://labren.ccst.inpe.br/atlas_2017.html">2017 Brazilian Solar Power Atlas</a>. As we said before, Solar Power is one of the best alternatives for our post-apocalyptic new life.

From this data set, we will get the average solar incidence for each city. One thing that will come in handy are the coordinates for the center of each city. This will be useful when we'll need the distance between the cities.

In [46]:
#loading the file
df_solar = pd.read_csv('direct_normal_means_sedes-munic.csv',sep=';')
#keeping only the needed features
df_solar_clean = df_solar[['LON','LAT','NAME','STATE','ANNUAL']]
#replacing the state name by the stateID
df_solar_clean = df_solar_clean.replace(['ACRE', 'ALAGOAS', 'AMAPÁ', 'AMAZONAS', 'BAHIA', 'CEARÁ',
       'DISTRITO FEDERAL', 'ESPÍRITO SANTO', 'GOIÁS', 'MARANHÃO',
       'MATO GROSSO', 'MATO GROSSO DO SUL', 'MINAS GERAIS', 'PARÁ',
       'PARAÍBA', 'PARANÁ', 'PERNAMBUCO', 'PIAUÍ', 'RIO DE JANEIRO',
       'RIO GRANDE DO NORTE', 'RIO GRANDE DO SUL', 'RONDÔNIA', 'RORAIMA',
       'SANTA CATARINA', 'SÃO PAULO', 'SERGIPE', 'TOCANTINS'],['AC', 'AL', 'AP', 'AM', 'BA', 'CE',
       'DF', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 
        'RO', 'RR', 'SC', 'SP', 'SE', 'TO'])

#turning the name and state into a single column, inlowercase
df_solar_clean['citystate'] = (df_solar_clean['NAME']+'-'+df_solar_clean['STATE']).str.lower()
#getting the cityID from the area dataset
df_areasolar = pd.merge(df_solar_clean, df_area_clean, on='citystate', how='inner')
#keeping only the needed features
df_solar_final = df_areasolar[['CityID','LON','LAT','ANNUAL']]



#renaming columns
df_solar_final.rename(columns = {'LON':'Longitude','LAT':'Latitude','ANNUAL': 'SolarIncidence'},
                              inplace = True)

df_solar_final.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().rename(


Unnamed: 0,CityID,Longitude,Latitude,SolarIncidence
0,120010,-68.7463,-11.0109,3438
1,120025,-68.7442,-11.0293,3438
2,120005,-69.5658,-10.9394,3453
3,120070,-68.4973,-10.652,3413
4,120017,-67.6768,-10.5749,3269


In [None]:
import folium
from bs4 import BeautifulSoup 
import requests 
import os

## Altitude, Temperature and Rainfall

In [None]:
df_climate = pd.read_excel('KoppenBrazilianmunicipalities.xls', sheet_name='Data')

In [None]:
df_climate.head()

In [None]:
df_climate['mean_temp'] = df_climate[df_climate.columns[6:18]].mean(axis=1)
df_climate['mean_rain'] = df_climate[df_climate.columns[18:]].mean(axis=1)

df_climate_clean = df_climate[['Municipality', 'IBGE-Code','State', 'Altitude', 'mean_temp', 'mean_rain']]
df_climate_clean.head()

In [None]:
df_climate_clean.loc[:,'IBGE-Code'] = df_climate_clean.loc[:,'IBGE-Code'].astype(str) 

In [None]:
df_climate_clean.describe().T

## Loading maps

In [None]:
city_json = 'geojs-100-mun.json'

In [None]:
map_brazil = folium.Map(location = [-15.793889, -47.882778],
                       zoom_start=4)
map_brazil

In [None]:
folium.Choropleth(
    geo_data=city_json,
    name='temperature',
    data=df_climate_clean,
    columns=['IBGE-Code', 'mean_temp'],
    key_on='properties.id',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Altitude'
).add_to(map_brazil)

In [None]:
map_brazil

 ## Population and Health Human Development Index XXXXXX

In [None]:
df_atlas = pd.read_excel('Atlas 2013_municipal, estadual e Brasil.xlsx', sheet_name='MUN 91-00-10')
df_atlas = df_atlas[df_atlas['ANO']==2010]
df_atlas_clean = df_atlas[['Codmun7','Município','pesotot','IDHM_L']]
df_atlas_clean.loc[:,'Codmun7'] = df_atlas_clean.loc[:,'Codmun7'].astype(str)
df_atlas_clean.head()

In [None]:
map_brazil = folium.Map(location = [-15.793889, -47.882778],
                       zoom_start=4)
folium.Choropleth(
    geo_data=city_json,
    name='idhm',
    data=df_atlas_clean,
    columns=['Codmun7', 'IDHM_L'],
    key_on='properties.id',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='IDHM_L'
).add_to(map_brazil)

In [None]:
map_brazil

## Area XXXXXXXX

In [None]:
df_area = pd.read_excel('AR_BR_RG_UF_RGINT_RGIM_MES_MIC_MUN_2020.xls', sheet_name='AR_BR_MUN_2020')
df_area_clean = df_area[['CD_GCMUN','NM_MUN_2020','AR_MUN_2020']]
df_area_clean.loc[:,'CD_GCMUN'] = df_area_clean.loc[:,'CD_GCMUN'].astype(int).astype(str)
df_area_clean.head()

In [None]:
map_brazil = folium.Map(location = [-15.793889, -47.882778],
                       zoom_start=4)
folium.Choropleth(
    geo_data=city_json,
    name='area',
    data=df_area_clean,
    columns=['CD_GCMUN', 'AR_MUN_2020'],
    key_on='properties.id',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='area'
).add_to(map_brazil)
map_brazil

## Solar incidence XXXXXXXXXXX

In [None]:
df_solar = pd.read_csv('direct_normal_means_sedes-munic.csv',sep=';')
df_solar_clean = df_solar[['LON','LAT','NAME','STATE','ANNUAL']]
df_solar_clean.head()

In [None]:
df_solar_clean = df_solar_clean.replace(['ACRE', 'ALAGOAS', 'AMAPÁ', 'AMAZONAS', 'BAHIA', 'CEARÁ',
       'DISTRITO FEDERAL', 'ESPÍRITO SANTO', 'GOIÁS', 'MARANHÃO',
       'MATO GROSSO', 'MATO GROSSO DO SUL', 'MINAS GERAIS', 'PARÁ',
       'PARAÍBA', 'PARANÁ', 'PERNAMBUCO', 'PIAUÍ', 'RIO DE JANEIRO',
       'RIO GRANDE DO NORTE', 'RIO GRANDE DO SUL', 'RONDÔNIA', 'RORAIMA',
       'SANTA CATARINA', 'SÃO PAULO', 'SERGIPE', 'TOCANTINS'],['AC', 'AL', 'AP', 'AM', 'BA', 'CE',
       'DF', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 
        'RO', 'RR', 'SC', 'SP', 'SE', 'TO'])
df_solar_clean.head()

In [None]:
df_solar_clean['citystate'] = df_solar_clean['NAME']+df_solar_clean['STATE']
df_solar_clean.head()

In [None]:
df_climate_clean['citystate'] = df_climate_clean['Municipality']+df_climate_clean['State']
df_climate_clean.head()

In [None]:
df_climatesolar = pd.merge(df_solar_clean, df_climate_clean, on='citystate', how='inner')
df_climatesolar_clean = df_climatesolar[['Municipality','State','IBGE-Code','LON','LAT','Altitude','mean_temp','mean_rain','ANNUAL']]
df_climatesolar_clean.head()

## Energy sources XXXXXXXXXXXX

In [None]:
df_energy = pd.read_csv('EmpreendimentoOperacao.csv')
df_energy = df_energy[['dscMunicipio','dscOrigemCombustivel','mdaPotenciaOutorgadakW']]
df_energy = df_energy.dropna(axis = 0, how ='any')
df_energy.head()

In [None]:
df_energy['pos'] = df_energy['dscMunicipio'].str.find('CodIbge:').astype(int)
df_energy.head()

In [None]:
df_energy['CodMunic'] = df_energy.apply(lambda x: x['dscMunicipio'][x['pos']+9:x['pos']+16],axis=1)
df_energy.head()

In [None]:
df_energy_clean = df_energy[['CodMunic','dscOrigemCombustivel','mdaPotenciaOutorgadakW']]
df_energy_clean.head()

In [None]:
df_energy_clean['mdaPotenciaOutorgadakW'] = df_energy_clean['mdaPotenciaOutorgadakW'].astype(int)

In [None]:
df_energy_final =df_energy_clean.groupby(['CodMunic','dscOrigemCombustivel'], as_index=False).agg({'mdaPotenciaOutorgadakW':sum})
df_energy_final.head()

In [None]:
df_energy_pivot = df_energy_final.pivot(index='CodMunic', columns='dscOrigemCombustivel', values='mdaPotenciaOutorgadakW')
df_energy_pivot.reset_index(inplace=True)
df_energy_pivot.head()

## Hospital beds XXXXXXXXXX

In [None]:
df_hospital = pd.read_csv('leitoshospitalares.csv',sep=';')
df_hospital.head()

In [None]:
df_hospital['Município'] = df_hospital.apply(lambda x: x['Município'][:6],axis=1)
df_hospital.head()

## Neighbor cities

In [None]:
df_neighbors = pd.read_csv('adjacency_matrix.csv')
df_neighbors.head()

## Agriculture production

In [None]:
df_agro1 = pd.read_excel('tabela1612.xlsx')
df_agro2 = pd.read_excel('tabela1613.xlsx')
df_agro = pd.merge(df_agro1, df_agro2, on=['Cód.','Município'], how='inner')
df_agro = df_agro.fillna(0)
df_agro['Total prod'] = (df_agro['Produto das lavouras temporárias'].astype(int) + df_agro['Produto das lavouras permanentes'].astype(int))/100.
df_agro_clean = df_agro[['Cód.','Total prod']]
df_agro_clean.head()

## Airports

In [None]:
url = 'https://en.wikipedia.org/wiki/List_of_airports_in_Brazil'
html_data = requests.get(url).text
soup = BeautifulSoup(html_data,"html5lib") 

In [None]:
#get the list
table_contents = []
table = soup.findAll('table')
for row in table[0].findAll('tr'):
    temp=[]
    for cell in row.findAll('td'):
        temp.append(cell.text)
    table_contents.append(temp)
table_contents[:5]

In [None]:
df_airports = pd.DataFrame(table_contents,columns=['Municipio','Estado','Sigla','Sigla2','Nome','outro'])
df_airports = df_airports.drop([0]).reset_index(inplace=False)
df_airports_clean = df_airports[['Municipio','Estado']]
df_airports_clean.head()

In [None]:
df_airports_clean['Estado'] = df_airports_clean['Estado'].replace(['Acre', 'Alagoas', 'Amapá', 'Amazonas', 'Bahia', 'Ceará',
       'Federal District', 'Espírito Santo', 'Goiás', 'Maranhão', 'Mato Grosso', 'Mato Grosso do Sul', 
        'Minas Gerais', 'Pará', 'Paraíba', 'Paraná', 'Pernambuco', 'Piauí', 'Rio de Janeiro',
       'Rio Grande do Norte', 'Rio Grande do Sul', 'Rondônia', 'Roraima', 'Santa Catarina', 'São Paulo', 
        'Sergipe', 'Tocantins'],['AC', 'AL', 'AP', 'AM', 'BA', 'CE',
       'DF', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 
        'RO', 'RR', 'SC', 'SP', 'SE', 'TO'])
df_airports_clean.head()

In [None]:
df_airports_clean['citystate'] = df_airports_clean['Municipio']+df_airports_clean['Estado']
df_climateairport = pd.merge(df_airports_clean, df_climate_clean, on='citystate', how='inner')
df_climateairport.head()

In [None]:
df_airports_final = pd.DataFrame(df_climateairport[['IBGE-Code']].value_counts(),columns=['Airports'])
df_airports_final.reset_index(inplace=True)
df_airports_final.head()

## Military Bases

In [None]:
url = 'https://en.wikipedia.org/wiki/List_of_Brazilian_military_bases'
html_data = requests.get(url).text
soup = BeautifulSoup(html_data,"html5lib") 

In [None]:
#get the list
table_contents = []
tables = soup.findAll('table')
for table in tables:
    for row in table.findAll('tr'):
        temp=[]
        for cell in row.findAll('td'):
            temp.append(cell.text)
        table_contents.append(temp)
table_contents[:5]

In [None]:
df_military = pd.DataFrame(table_contents,columns=['Location', 'State', 'ICAO', 'Code', 'Basename','outro1','outro2'])
df_military = df_military.drop([0]).reset_index(inplace=False)
df_military_clean = df_military[['Location','State']]
df_military_clean.head()

In [None]:
df_military_clean['State'] = df_military_clean['State'].replace(['Acre', 'Alagoas', 'Amapá', 'Amazonas', 'Bahia', 'Ceará',
       'Federal District', 'Espírito Santo', 'Goiás', 'Maranhão', 'Mato Grosso', 'Mato Grosso do Sul', 
        'Minas Gerais', 'Pará', 'Paraíba', 'Paraná', 'Pernambuco', 'Piauí', 'Rio de Janeiro',
       'Rio Grande do Norte', 'Rio Grande do Sul', 'Rondônia', 'Roraima', 'Santa Catarina', 'São Paulo', 
        'Sergipe', 'Tocantins'],['AC', 'AL', 'AP', 'AM', 'BA', 'CE',
       'DF', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 
        'RO', 'RR', 'SC', 'SP', 'SE', 'TO'])
df_military_clean['citystate'] = df_military_clean['Location']+df_military_clean['State']
df_climatemilitary = pd.merge(df_military_clean, df_climate_clean, on='citystate', how='inner')
df_climatemilitary.head()

In [None]:
df_military_final = pd.DataFrame(df_climatemilitary[['IBGE-Code']].value_counts(),columns=['Military'])
df_military_final.reset_index(inplace=True)
df_military_final.head()

## Libraries

In [None]:
path='libraries/'
df_libraries = pd.DataFrame(columns=['MUNICÍPIO','NOME DA BIBLIOTECA','VÍNCULO','ENDEREÇO', 'BAIRRO','TELEFONE','E-MAIL','outro','estado'])
for filename in os.listdir(path):
        df_temp = pd.read_csv(path+filename,skiprows=3)
        df_temp['estado'] = filename[-6:-4]
        df_libraries = df_libraries.append(df_temp,ignore_index=True)
df_libraries.head()

In [None]:
df_libraries['MUNICÍPIO'] = df_libraries['MUNICÍPIO'].fillna(method='ffill')
df_libraries_clean = df_libraries[['MUNICÍPIO','estado']]
df_libraries_clean['citystate'] = df_libraries_clean['MUNICÍPIO']+df_libraries_clean['estado']
df_climatelibrary = pd.merge(df_libraries_clean, df_climate_clean, on='citystate', how='inner')
df_libraries_final = pd.DataFrame(df_climatelibrary[['IBGE-Code']].value_counts(),columns=['Libraries'])
df_libraries_final.reset_index(inplace=True)
df_libraries_final.head()

## Universities

In [None]:
df_universities = pd.read_csv('SUP_IES_2019.CSV',encoding='iso-8859-1' ,sep='|')
df_universities_final = pd.DataFrame(df_universities[['CO_MUNICIPIO']].value_counts(),columns=['Universities'])
df_universities_final.head()