
# 1) Importing packages

In [8]:
import numpy as np
import pandas as pd
from google.colab import files  # only available in Google Colab; used to download the output file

In [3]:
#############################################################################
######################## --- Reading the Datasets --- #######################
#############################################################################

cbi = pd.read_excel('https://github.com/lucasestrela/Dissertacao/blob/main/Dados/CBI_data_2019.xlsx?raw=true',
                    sheet_name = 'Sheet1'
                    )



fr = pd.read_stata('https://github.com/lucasestrela/Dissertacao/blob/main/Dados/fiscal_rule_database.dta?raw=true'
                    )



ipd = pd.read_excel('https://github.com/lucasestrela/Dissertacao/blob/main/Dados/IPD_2016.xlsx?raw=true',
                    sheet_name = 'Variables',
                    header = 0
                    )


pwt = pd.read_excel('https://github.com/lucasestrela/Dissertacao/blob/main/Dados/pwt100.xlsx?raw=true',
                    sheet_name = 'Data'
                    )

# 2) Building a country-code dictionary

In this section, I build a dictionary mapping **country names** (which are sometimes spelled differently across datasets) to **country codes**. This lets me use a single consistent key, the country code, across the different datasets.
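A minimal sketch of the lookup idea, using a few invented rows rather than the real files (the country spellings and the deliberately wrong code `XXX` are hypothetical). When several name-to-code tables are stacked, `drop_duplicates(keep='first')` lets the earlier table win, so listing manual corrections first allows them to override the automatic ones and leaves exactly one code per name.

```python
import pandas as pd

# Hypothetical name-to-code tables: manual fixes plus two automatic sources.
manual = pd.DataFrame({"country": ["Ivory Coast", "Zaire"],
                       "countrycode": ["CIV", "COD"]})
source_a = pd.DataFrame({"country": ["Brazil", "Ivory Coast"],
                         "countrycode": ["BRA", "XXX"]})  # deliberately wrong code
source_b = pd.DataFrame({"country": ["Brazil", "Chile"],
                         "countrycode": ["BRA", "CHL"]})

# Stack in priority order; keep='first' makes earlier tables override later ones
# and guarantees a single row per country name.
lookup = (pd.concat([manual, source_a, source_b], ignore_index=True)
            .drop_duplicates("country", keep="first")
            .reset_index(drop=True))

# Translate names from another dataset into codes; unknown names become NaN.
names = pd.Series(["Brazil", "Ivory Coast", "Zaire", "Atlantis"])
codes = names.map(lookup.set_index("country")["countrycode"])
print(codes.tolist())
```

Unmatched names (here the made-up `"Atlantis"`) come out as `NaN`, which is exactly what the manual list of missing countries is meant to fix.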

In [4]:
####################################################################
######################## --- Dictionary --- ########################
####################################################################

# Dictionary 1: extract the country codes from PWT
countries_code = pwt[['country', 'countrycode']].drop_duplicates('country', keep = 'first')



# Dictionary 2: extract the country codes from IPD
countries_code2 = ipd[['Country', 'Code']].drop_duplicates('Country', keep = 'first')
countries_code2.columns = ['country', 'countrycode']



# Stack both dictionaries
countries_code3 = pd.concat([countries_code, countries_code2], ignore_index=True)


####################################################################


# Check the countries for which the merge failed, i.e., the country exists
# in cbi2 (which covers more countries) but no matching code was found
# paises_sem_codigo = cbi2[cbi2['countrycode'].isnull()]['country']


paises_faltantes = np.array([
                            ['Antigua & Barbuda', 'ATG'],
                            ['Bosnia-Herzegovina', 'BIH'],
                            ['Cape Verde', 'CPV'],
                            ['Congo, Democratic Republic of / Za', 'COD'], # DR Congo is COD; COG is the Republic of Congo
                            ['Congo, Republic of', 'COG'],
                            ['Ethiopia (incl. Eritrea)', 'ETH'],
                            ['Iran', 'IRN'],
                            ['Ivory Coast', 'CIV'],
                            ['Korea, Republic of', 'KOR'],
                            ['Laos', 'LAO'],
                            ['Myanmar (Burma)', 'MMR'],
                            ['Papua New Guinea', 'PNG'],
                            ['Samoa', 'WSM'],
                            ['San Marino', 'SMR'],
                            ['Serbia and Montenegro','SRB'], #2003, 2004, 2005 (data from CBI)
                            ['Solomon Islands', 'SLB'],
                            ['St. Kitts and Nevis', 'KNA'],
                            ['Swaziland', 'SWZ'], # Renamed Eswatini (from CBI)
                            ['Syria', 'SYR'],
                            ['Timor-Leste', 'TLS'],
                            ['Tonga', 'TON'],
                            ['United States of America', 'USA'],
                            ['Vanuatu', 'VUT'],
                            ['Venezuela', 'VEN'],
                            ['Yemen, North', 'YEM'],
                            ['Yemen, North/Yemen Arab Rep.', 'YEM'],
                            ['Yugoslavia (Serbia-Montenegro)','SRB'], #92-2002 (serbia montenegro)
                            ['Yugoslavia (Socialist Rep)', 'SRB'], # 70-91 (socialist republic)
                            ['Zaire', 'COD'] # Name of DR Congo between 1971-97
                            ])

# PWT: the whole 1950-2019 period is recorded as Serbia (SRB)


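# Convert to a pandas DataFrame
paises_faltantes = pd.DataFrame(paises_faltantes, columns = ['country', 'countrycode'])


# Stack both dictionaries, manual entries first, and keep a single code per
# country name so that the merges below do not duplicate rows
countries_code3 = pd.concat([paises_faltantes, countries_code3], ignore_index=True)
countries_code3 = countries_code3.drop_duplicates('country', keep='first')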
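The commented-out check for countries without a code (`paises_sem_codigo`) can also be done in one step with the `indicator=True` option of `merge`, which adds a `_merge` column recording where each row was found. A minimal sketch on toy tables (the names `cbi_countries`, `code_dict` and `"Narnia"` are hypothetical stand-ins for `cbi2` and `countries_code3`):

```python
import pandas as pd

# Toy stand-ins for the aggregated CBI table and the code dictionary.
cbi_countries = pd.DataFrame({"country": ["Brazil", "Laos", "Narnia"]})
code_dict = pd.DataFrame({"country": ["Brazil", "Laos"],
                          "countrycode": ["BRA", "LAO"]})

# indicator=True adds '_merge' with values 'both', 'left_only' or 'right_only'.
checked = cbi_countries.merge(code_dict, on="country", how="left",
                              indicator=True)

# Names with no code: candidates for the manual list of missing countries.
unmatched = checked.loc[checked["_merge"] == "left_only", "country"].tolist()
print(unmatched)
```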

# 3) Processing the data

In this section, for the dataframes that contain time series, I compute the overall mean across all years as a way to aggregate the observations (*).

For each dataframe I merge in the code dictionary, replacing each country's name with its code (ISO 3166-1 alpha-3).

Finally, I merge all the dataframes into one.


---
(*)   Other aggregation metrics, such as the variance, could be tested instead.
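A minimal sketch of the aggregation step on an invented country-year panel (all values are hypothetical). It shows the per-country mean used here and, following the footnote, how `agg` can compute several statistics such as the variance at once. `numeric_only=True` is included because pandas 2.x raises an error when `mean` meets text columns.

```python
import pandas as pd

# Hypothetical country-year panel (values invented for illustration).
panel = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "year":    [2000, 2001, 2002, 2000, 2001],
    "score":   [0.2, 0.4, 0.6, 0.5, 0.5],
    "label":   ["x", "y", "z", "x", "y"],  # non-numeric column
})

# Mean over all years per country; text columns are skipped.
means = panel.groupby("country").mean(numeric_only=True)

# Alternative aggregations: several statistics of one variable at once.
stats = panel.groupby("country")["score"].agg(["mean", "var"])
print(stats)
```

Note that `var` is the sample variance (denominator n - 1), so a country observed in a single year gets `NaN`.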



In [5]:
###########################################################################
######################## --- Processing the Data --- ######################
###########################################################################


# Average over the years, rename the index and attach the country code
cbi2 = cbi.groupby('cname').mean(numeric_only=True)
cbi2 = cbi2.rename_axis('country').reset_index()
cbi2 = pd.merge(cbi2, countries_code3, on='country', how='left')

# Keep only the relevant columns
cbi2 = cbi2[['countrycode', 'lvau_garriga', 'lvaw_garriga']]


################################################################

# - Note: many cells are NaN (no fiscal rule in place at the time). Should they be 0?
# Yes: the papers that build fiscal-rule indices set them to 0.
fr2  = fr.copy()
fr2[['stab_n', 'stab_s']]  = fr2[['stab_n', 'stab_s']].fillna(0)

# Average over the years, rename the index and attach the country code
fr2  = fr2.groupby('Country').mean(numeric_only=True)
fr2  = fr2.rename_axis('country').reset_index()
fr2  = pd.merge(fr2, countries_code3, on='country', how='left')



# Keep only the columns indicating whether the fiscal rule stabilizes the cycle or not
fr2[['stab_n', 'stab_s']] = fr2[['stab_n', 'stab_s']].fillna(0)
fr2  = fr2[['countrycode', 'stab_n', 'stab_s']]

################################################################


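# Build per-capita variables, then average over the years
# csh_c is the household-consumption share of GDP, not GDP itself, so GDP per
# capita uses rgdpe (expenditure-side real GDP, millions of 2017 US$) over pop (millions)
pwt['gdp_percapta'] = pwt['rgdpe']/pwt['pop']
pwt['emp_pop'] = pwt['emp']/pwt['pop']
pwt2 = pwt.groupby('countrycode').mean(numeric_only=True)
pwt2 = pwt2.rename_axis('countrycode').reset_index()

# Keep GDP per capita, employment-to-population ratio, average annual hours worked
# per person engaged, human capital index, welfare-relevant TFP and the share of
# labour compensation in GDP
pwt2 = pwt2[['countrycode', 'gdp_percapta', 'emp_pop', 'avh', 'hc', 'rwtfpna', 'labsh']]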



################################################################

# Rename the Code column (on a copy, leaving ipd untouched)
ipd2 = ipd.rename(columns={'Code':'countrycode'})

ipd2 = ipd2.drop(['Country', 'Year', 'Income level', 'Region'], axis = 1)

################################################################

# Merge all variables into a single dataframe, keyed on the country code only
dataframe = pd.merge(cbi2, fr2, on='countrycode', how='left')
dataframe = pd.merge(dataframe, pwt2, on='countrycode', how='left')
dataframe = pd.merge(dataframe, ipd2, on='countrycode', how='left')

# Countries without a fiscal rule do not appear in the FR database, so set them to 0
dataframe[['stab_n', 'stab_s']] = dataframe[['stab_n', 'stab_s']].fillna(0)
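Recent pandas versions raise a `MergeError` when `on=` is combined with `left_index=True`/`right_index=True`, so a key-based merge should pass only `on`. A further safeguard, sketched below on invented tables, is `validate="many_to_one"`: it raises if the right-hand key is duplicated, instead of silently multiplying rows. The table names and values here are hypothetical.

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({"countrycode": ["BRA", "CHL"], "cbi": [0.6, 0.8]})
right_ok = pd.DataFrame({"countrycode": ["BRA"], "stab_s": [0.5]})
right_dup = pd.DataFrame({"countrycode": ["BRA", "BRA"], "stab_s": [0.5, 0.7]})

# Merge on the key column only; validate checks the key is unique on the right.
merged = left.merge(right_ok, on="countrycode", how="left",
                    validate="many_to_one")

# Countries absent from the fiscal-rule table get 0, as in the notebook.
merged["stab_s"] = merged["stab_s"].fillna(0)

# With a duplicated key, BRA would appear twice; validate raises instead.
try:
    left.merge(right_dup, on="countrycode", how="left", validate="many_to_one")
    duplicate_caught = False
except MergeError:
    duplicate_caught = True
print(merged)
print(duplicate_caught)
```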

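Filling the fiscal-rule `NaN`s with 0 before averaging changes what the country mean measures. `mean()` skips `NaN` by default, so without the fill a country that had a rule in only some years would look as if it always had one. A minimal sketch on an invented four-year series for a single hypothetical country `"A"`:

```python
import numpy as np
import pandas as pd

# Hypothetical yearly fiscal-rule dummy: NaN means no rule in that year.
fr_toy = pd.DataFrame({"Country": ["A"] * 4,
                       "stab_s": [1.0, 1.0, np.nan, np.nan]})

# Without filling, NaN years are skipped: share among years with a rule only.
mean_skipna = fr_toy.groupby("Country")["stab_s"].mean().loc["A"]  # 1.0

# After fillna(0), the mean is the share of all observed years with the rule.
mean_filled = (fr_toy.fillna({"stab_s": 0})
                     .groupby("Country")["stab_s"].mean().loc["A"])  # 0.5
print(mean_skipna, mean_filled)
```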


# 4) Exporting

In [9]:
# Exporting the data

dataframe.to_csv('countries_data.csv')
files.download('countries_data.csv')
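`files.download` only works inside Colab; elsewhere, `to_csv` alone writes the file. By default `to_csv` also writes the row index as an unnamed first column. The sketch below, on a small hypothetical table written to a temporary directory, uses `index=False` and reads the file back to check that the round trip preserves the data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical final table.
df = pd.DataFrame({"countrycode": ["BRA", "CHL"], "stab_s": [0.5, 0.0]})

# index=False avoids writing the default 0..n-1 index as an extra column.
path = os.path.join(tempfile.mkdtemp(), "countries_data.csv")
df.to_csv(path, index=False)

# Reading the file back should reproduce the same table.
roundtrip = pd.read_csv(path)
print(roundtrip.equals(df))
```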
