**SYPA: Fundamental Analysis of Foreign Direct Investment** <br>
*1_API* <br>
Harvard SYPA <br>
User: Jake Schneider <br>
Date Created: February 7, 2020 <br>
Date Updated: February 8, 2020

----

**Load Packages**

In [1]:
#Import libraries
import sys
import pandas as pd
from datetime import date, datetime, time, timedelta
import pendulum
import json
import requests
import numpy as np
import math

import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
sns.set(style='ticks', context='talk')

from matplotlib.offsetbox import AnchoredText
from sklearn import linear_model
from sklearn.metrics import r2_score
import statsmodels.api as sm
#import fbprophet

import warnings
import itertools

from flask import jsonify, make_response

from varname import varname

from matplotlib.backends.backend_pdf import PdfPages
from fpdf import FPDF
from PIL import Image, ImageDraw, ImageFont

In [2]:
# Create function 'jprint'

def jprint(obj):
    # create a formatted string of the Python JSON object
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)

----

**Pull Economic Data from APIs: World Bank, Systemic Peace Project (Polity IV), World Wide Governance Indicators** <br>
*Bloomberg API for Python?*

In [3]:
# Configure World Bank

import config_world_bank

#print(config_world_bank)

#api_key_wb = config_world_bank.api_key #API key no longer required
user_agent = 'Jake_Schneider'

In [4]:
#Test

test = requests.get('http://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL?date=2000:2001&format=json')
jprint(test.json())

[
    {
        "lastupdated": "2019-12-20",
        "page": 1,
        "pages": 11,
        "per_page": 50,
        "sourceid": "2",
        "total": 528
    },
    [
        {
            "country": {
                "id": "1A",
                "value": "Arab World"
            },
            "countryiso3code": "ARB",
            "date": "2001",
            "decimal": 0,
            "indicator": {
                "id": "SP.POP.TOTL",
                "value": "Population, total"
            },
            "obs_status": "",
            "unit": "",
            "value": 288432163
        },
        {
            "country": {
                "id": "1A",
                "value": "Arab World"
            },
            "countryiso3code": "ARB",
            "date": "2000",
            "decimal": 0,
            "indicator": {
                "id": "SP.POP.TOTL",
                "value": "Population, total"
            },
            "obs_status": "",
            "unit": "",
            "value

In [5]:
# Create Function For Pulling Economic Data from World Bank

def parse_wb(country, series, start_date = '1970', end_date = '2020'):
    values = requests.get('http://api.worldbank.org/v2/country/{}/indicator/{}?date={}:{}&format=json'.format(country, series, start_date, end_date))
    length = len(values.json())
    #print(length)
    jprint(values.json()) #["observations"][length-1])
    return values.json() #["observations"][length-1]

In [6]:
# Run Function 'Parse_wb'
# 'SP.POP.TOTL' = Total Population

wb_indicators = ['SP.POP.TOTL']

wb_master_data = []
for indicator in wb_indicators:
    wb_data = parse_wb(country = 'all', series = indicator)
    wb_master_data.append(wb_data)

[
    {
        "lastupdated": "2019-12-20",
        "page": 1,
        "pages": 264,
        "per_page": 50,
        "sourceid": "2",
        "total": 13200
    },
    [
        {
            "country": {
                "id": "1A",
                "value": "Arab World"
            },
            "countryiso3code": "ARB",
            "date": "2019",
            "decimal": 0,
            "indicator": {
                "id": "SP.POP.TOTL",
                "value": "Population, total"
            },
            "obs_status": "",
            "unit": "",
            "value": null
        },
        {
            "country": {
                "id": "1A",
                "value": "Arab World"
            },
            "countryiso3code": "ARB",
            "date": "2018",
            "decimal": 0,
            "indicator": {
                "id": "SP.POP.TOTL",
                "value": "Population, total"
            },
            "obs_status": "",
            "unit": "",
            "value":

---

**Import Data through Pandas Data Reader** <br>
Docs: https://pandas-datareader.readthedocs.io/en/latest/remote_data.html#world-bank

*Import PDR*

In [7]:
# Import pandas_datareader

from pandas_datareader import data, wb
import pandas_datareader as pdr

*Test PDR*

In [8]:
## TEST: Use PDR for FRED
#
#pdr.get_data_fred('GS10')

*Find Indicators: Macro and Governance*

In [9]:
# Find Indicators for FDI Project: GDP

matches_gdp = wb.search('gdp.*.*')
print(len(matches_gdp))
matches_gdp.head()

494


Unnamed: 0,id,name,unit,source,sourceNote,sourceOrganization,topics
641,5.51.01.10.gdp,Per capita GDP growth,,Statistical Capacity Indicators,GDP per capita is the sum of gross value added...,b'World Development Indicator (WDI) databank. ...,
643,6.0.GDP_current,GDP (current $),,LAC Equity Lab,GDP is the sum of gross value added by all res...,b'World Development Indicators (World Bank)',Economy & Growth
644,6.0.GDP_growth,GDP growth (annual %),,LAC Equity Lab,Annual percentage growth rate of GDP at market...,b'World Development Indicators (World Bank)',Economy & Growth
645,6.0.GDP_usd,GDP (constant 2005 $),,LAC Equity Lab,GDP is the sum of gross value added by all res...,b'World Development Indicators (World Bank)',Economy & Growth
646,6.0.GDPpc_constant,"GDP per capita, PPP (constant 2011 internation...",,LAC Equity Lab,GDP per capita based on purchasing power parit...,b'World Development Indicators (World Bank)',Economy & Growth


In [10]:
# Store Indicators As a Variable: GDP

list_gdp = list(matches_gdp['id'])
print("The number of GDP variables are: ", len(list_gdp))
#print(list_gdp)

The number of GDP variables are:  494


*Download Indicators*

In [11]:
#for num, indicator in zip(range(1, len(list_gdp)+1), list_gdp):
#    if num % 10 == 0:
#        print(num, "/", len(list_gdp), ":", indicator)

data_wb = wb.download(indicator='5.51.01.10.gdp', country='all', start=1960, end=2018)

In [12]:
data_wb

Unnamed: 0_level_0,Unnamed: 1_level_0,5.51.01.10.gdp
country,year,Unnamed: 2_level_1
East Asia & Pacific (excluding high income),2018,
East Asia & Pacific (excluding high income),2017,
East Asia & Pacific (excluding high income),2016,
East Asia & Pacific (excluding high income),2015,
East Asia & Pacific (excluding high income),2014,
...,...,...
Zimbabwe,2008,1.0
Zimbabwe,2007,1.0
Zimbabwe,2006,1.0
Zimbabwe,2005,1.0


----

**Import Data through wbdata** <br>
Docs: https://wbdata.readthedocs.io/en/stable/ <br>
Example: https://mcmayer.net/getting-worldbank-data-with-python-easily/

*Import wbdata*

In [13]:
# Import wbdata

import wbdata

*Download and Explore*

In [14]:
# Download wbdata sources

sources = wbdata.get_source(display = False)
sources

[{'id': '1', 'name': 'Doing Business', 'description': '', 'url': ''},
 {'id': '2',
  'name': 'World Development Indicators',
  'description': '',
  'url': ''},
 {'id': '3',
  'name': 'Worldwide Governance Indicators',
  'description': '',
  'url': ''},
 {'id': '5',
  'name': 'Subnational Malnutrition Database',
  'description': '',
  'url': ''},
 {'id': '6',
  'name': 'International Debt Statistics',
  'description': '',
  'url': ''},
 {'id': '11',
  'name': 'Africa Development Indicators',
  'description': '',
  'url': ''},
 {'id': '12', 'name': 'Education Statistics', 'description': '', 'url': ''},
 {'id': '13', 'name': 'Enterprise Surveys', 'description': '', 'url': ''},
 {'id': '14', 'name': 'Gender Statistics', 'description': '', 'url': ''},
 {'id': '15', 'name': 'Global Economic Monitor', 'description': '', 'url': ''},
 {'id': '16',
  'name': 'Health Nutrition and Population Statistics',
  'description': '',
  'url': ''},
 {'id': '18',
  'name': 'IDA Results Measurement System',


*Find Indicators*

In [15]:
# Search for Indicators: GDP

wbdata.search_indicators("gdp per capita")

6.0.GDPpc_constant        	GDP per capita, PPP (constant 2011 international $) 
FB.DPT.INSU.PC.ZS         	Deposit insurance coverage (% of GDP per capita)
NV.AGR.PCAP.KD.ZG         	Real agricultural GDP per capita growth rate (%)
NY.GDP.PCAP.CD            	GDP per capita (current US$)
NY.GDP.PCAP.CN            	GDP per capita (current LCU)
NY.GDP.PCAP.KD            	GDP per capita (constant 2010 US$)
NY.GDP.PCAP.KD.ZG         	GDP per capita growth (annual %)
NY.GDP.PCAP.KN            	GDP per capita (constant LCU)
NY.GDP.PCAP.PP.CD         	GDP per capita, PPP (current international $)
NY.GDP.PCAP.PP.KD         	GDP per capita, PPP (constant 2011 international $)
NY.GDP.PCAP.PP.KD.87      	GDP per capita, PPP (constant 1987 international $)
NY.GDP.PCAP.PP.KD.ZG      	GDP per capita, PPP annual growth (%)
SE.XPD.PRIM.PC.ZS         	Government expenditure per student, primary (% of GDP per capita)
SE.XPD.SECO.PC.ZS         	Government expenditure per student, secondary (% of GDP per c

*Cull Datasets: WDI (2), Doing Business (1), WGI (3), International Debt Statistics (6), CPIA (31), Worldwide Bureaucracy Indicators (64), ESG (75)*

In [16]:
# Download World Development Indicators (2)

wdi = wbdata.get_indicator(source=2, display = False)
wdi

[{'id': 'AG.AGR.TRAC.NO',
  'name': 'Agricultural machinery, tractors',
  'source': {'id': '2', 'value': 'World Development Indicators'},
  'sourceNote': 'Agricultural machinery refers to the number of wheel and crawler tractors (excluding garden tractors) in use in agriculture at the end of the calendar year specified or during the first quarter of the following year.',
  'sourceOrganization': 'Food and Agriculture Organization, electronic files and web site.',
  'topics': [{'id': '1', 'value': 'Agriculture & Rural Development  '}]},
 {'id': 'AG.CON.FERT.PT.ZS',
  'name': 'Fertilizer consumption (% of fertilizer production)',
  'source': {'id': '2', 'value': 'World Development Indicators'},
  'sourceNote': 'Fertilizer consumption measures the quantity of plant nutrients used per unit of arable land. Fertilizer products cover nitrogenous, potash, and phosphate fertilizers (including ground rock phosphate). Traditional nutrients--animal and plant manures--are not included. For the purpo

In [17]:
#Create list of WDI IDs

wdi_ids = []
wdi_names = []

for num in range(0, len(wdi)):
    wdi_ids.append(wdi[num]["id"])
    wdi_names.append(wdi[num]["name"])
    
print(len(wdi_ids))
print(wdi_ids[0:5])
print(wdi_names[0:5])

1429
['AG.AGR.TRAC.NO', 'AG.CON.FERT.PT.ZS', 'AG.CON.FERT.ZS', 'AG.LND.AGRI.K2', 'AG.LND.AGRI.ZS']
['Agricultural machinery, tractors', 'Fertilizer consumption (% of fertilizer production)', 'Fertilizer consumption (kilograms per hectare of arable land)', 'Agricultural land (sq. km)', 'Agricultural land (% of land area)']


In [18]:
# Download Doing Business (1)

doing_business = wbdata.get_indicator(source=1, display = False)
doing_business

[{'id': 'ENF.CONT.COEN.ATDR',
  'name': 'Enforcing contracts: Alternative dispute resolution (0-3) (DB16-20 methodology)',
  'source': {'id': '1', 'value': 'Doing Business'},
  'sourceNote': 'The alternative dispute resolution evaluates two aspects: (i) whether domestic commercial arbitration is regulated by law, all disputes can be submitted to arbitration and valid arbitration clauses are usually enforced by courts; and (ii) whether voluntary mediation and/or conciliation are a recognized way of resolving commercial disputes, they are regulated by law and there are financial incentives for parties to attempt mediation of conciliation. The index is computed based on the methodology in the DB17-20 studies.',
  'sourceOrganization': '',
  'topics': []},
 {'id': 'ENF.CONT.COEN.ATFE.PR',
  'name': 'Enforcing contracts: Attorney fees (% of claim)',
  'source': {'id': '1', 'value': 'Doing Business'},
  'sourceNote': 'The attorney fees are the fees that plaintiff must advance to a local atto

In [19]:
#Create list of Doing Business IDs

doing_business_ids = []
doing_business_names = []

for num in range(0, len(doing_business)):
    doing_business_ids.append(doing_business[num]["id"])
    doing_business_names.append(doing_business[num]["name"])
    
print(len(doing_business_ids))
print(doing_business_ids[0:5])
print(doing_business_names[0:5])

205
['ENF.CONT.COEN.ATDR', 'ENF.CONT.COEN.ATFE.PR', 'ENF.CONT.COEN.COST.ZS', 'ENF.CONT.COEN.COST.ZS.DFRN', 'ENF.CONT.COEN.CSMG']
['Enforcing contracts: Alternative dispute resolution (0-3) (DB16-20 methodology)', 'Enforcing contracts: Attorney fees (% of claim)', 'Enforcing contracts: Cost (% of claim)', 'Enforcing contracts: Cost (% of claim) - Score', 'Enforcing contracts: Case management (0-6) (DB16-20 methodology)']


In [20]:
# Download WGI (3)

wgi = wbdata.get_indicator(source=3, display = False)
wgi

[{'id': 'CC.EST',
  'name': 'Control of Corruption: Estimate',
  'source': {'id': '3', 'value': 'Worldwide Governance Indicators'},
  'sourceNote': 'Control of Corruption captures perceptions of the extent to which public power is exercised for private gain, including both petty and grand forms of corruption, as well as "capture" of the state by elites and private interests. Estimate gives the country\'s score on the aggregate indicator, in units of a standard normal distribution, i.e. ranging from approximately -2.5 to 2.5.',
  'sourceOrganization': 'Detailed documentation of the WGI, interactive tools for exploring the data, and full access to the underlying source data available at www.govindicators.org.The WGI are produced by Daniel Kaufmann (Natural Resource Governance Institute and Brookings Institution) and Aart Kraay (World Bank Development Research Group).  Please cite Kaufmann, Daniel, Aart Kraay and Massimo Mastruzzi (2010).  "The Worldwide Governance Indicators:  Methodolog

In [21]:
#Create list of WGI IDs

wgi_ids = []
wgi_names = []

for num in range(0, len(wgi)):
    wgi_ids.append(wgi[num]["id"])
    wgi_names.append(wgi[num]["name"])
    
print(len(wgi_ids))
print(wgi_ids[0:5])
print(wgi_names[0:5])

36
['CC.EST', 'CC.NO.SRC', 'CC.PER.RNK', 'CC.PER.RNK.LOWER', 'CC.PER.RNK.UPPER']
['Control of Corruption: Estimate', 'Control of Corruption: Number of Sources', 'Control of Corruption: Percentile Rank', 'Control of Corruption: Percentile Rank, Lower Bound of 90% Confidence Interval', 'Control of Corruption: Percentile Rank, Upper Bound of 90% Confidence Interval']


In [22]:
# Download International Debt Stats (6)

debt = wbdata.get_indicator(source=6, display = False)
debt

[{'id': 'BM.GSR.TOTL.CD',
  'name': 'Imports of goods, services and primary income (BoP, current US$)',
  'source': {'id': '6', 'value': 'International Debt Statistics'},
  'sourceNote': 'Imports of goods, services and primary income is the sum of goods imports, service imports and primary income payments. Data are in current U.S. dollars.',
  'sourceOrganization': 'International Monetary Fund, Balance of Payments Statistics Yearbook and data files.',
  'topics': [{'id': '3', 'value': 'Economy & Growth'},
   {'id': '20', 'value': 'External Debt'},
   {'id': '21', 'value': 'Trade'}]},
 {'id': 'BN.CAB.XOKA.CD',
  'name': 'Current account balance (BoP, current US$)',
  'source': {'id': '6', 'value': 'International Debt Statistics'},
  'sourceNote': 'Current account balance is the sum of net exports of goods and services, net primary income, and net secondary income. Data are in current U.S. dollars.',
  'sourceOrganization': 'International Monetary Fund, Balance of Payments Statistics Yea

In [23]:
#Create list of International Debt Statistics IDs

debt_ids = []
debt_names = []

for num in range(0, len(debt)):
    debt_ids.append(debt[num]["id"])
    debt_names.append(debt[num]["name"])
    
print(len(debt_ids))
print(debt_ids[0:5])
print(debt_names[0:5])

497
['BM.GSR.TOTL.CD', 'BN.CAB.XOKA.CD', 'BX.GRT.EXTA.CD.DT', 'BX.GRT.TECH.CD.DT', 'BX.GSR.TOTL.CD']
['Imports of goods, services and primary income (BoP, current US$)', 'Current account balance (BoP, current US$)', 'Grants, excluding technical cooperation (current US$)', 'Technical cooperation grants (current US$)', 'Exports of goods, services and primary income (BoP, current US$)']


In [24]:
# Download CPIA (31)

cpia = wbdata.get_indicator(source=31, display = False)
cpia

[{'id': 'IQ.CPA.BREG.XQ',
  'name': 'CPIA business regulatory environment rating (1=low to 6=high)',
  'source': {'id': '31',
   'value': 'Country Policy and Institutional Assessment'},
  'sourceNote': 'Business regulatory environment assesses the extent to which the legal, regulatory, and policy environments help or hinder private businesses in investing, creating jobs, and becoming more productive.',
  'sourceOrganization': 'World Bank Group, CPIA database (http://www.worldbank.org/ida).',
  'topics': [{'id': '13', 'value': 'Public Sector '}]},
 {'id': 'IQ.CPA.DEBT.XQ',
  'name': 'CPIA debt policy rating (1=low to 6=high)',
  'source': {'id': '31',
   'value': 'Country Policy and Institutional Assessment'},
  'sourceNote': 'Debt policy assesses whether the debt management strategy is conducive to minimizing budgetary risks and ensuring long-term debt sustainability.',
  'sourceOrganization': 'World Bank Group, CPIA database (http://www.worldbank.org/ida).',
  'topics': [{'id': '13', 

In [25]:
#Create list of International Debt Statistics IDs

cpia_ids = []
cpia_names = []

for num in range(0, len(cpia)):
    cpia_ids.append(cpia[num]["id"])
    cpia_names.append(cpia[num]["name"])
    
print(len(cpia_ids))
print(cpia_ids[0:5])
print(cpia_names[0:5])

21
['IQ.CPA.BREG.XQ', 'IQ.CPA.DEBT.XQ', 'IQ.CPA.ECON.XQ', 'IQ.CPA.ENVR.XQ', 'IQ.CPA.FINQ.XQ']
['CPIA business regulatory environment rating (1=low to 6=high)', 'CPIA debt policy rating (1=low to 6=high)', 'CPIA economic management cluster average (1=low to 6=high)', 'CPIA policy and institutions for environmental sustainability rating (1=low to 6=high)', 'CPIA quality of budgetary and financial management rating (1=low to 6=high)']


In [26]:
# Download World Bureaucracy (64)

bureaucracy = wbdata.get_indicator(source=64, display = False)
bureaucracy

[{'id': 'BI.EMP.FRML.PB.ZS',
  'name': 'Public sector employment as a share of formal employment',
  'source': {'id': '64', 'value': 'Worldwide Bureaucracy Indicators'},
  'sourceNote': '',
  'sourceOrganization': '',
  'topics': []},
 {'id': 'BI.EMP.PWRK.PB.FE.ZS',
  'name': 'Public sector employment as a share of paid employment by gender (Female)',
  'source': {'id': '64', 'value': 'Worldwide Bureaucracy Indicators'},
  'sourceNote': '',
  'sourceOrganization': '',
  'topics': []},
 {'id': 'BI.EMP.PWRK.PB.MA.ZS',
  'name': 'Public sector employment as a share of paid employment by gender (Male)',
  'source': {'id': '64', 'value': 'Worldwide Bureaucracy Indicators'},
  'sourceNote': '',
  'sourceOrganization': '',
  'topics': []},
 {'id': 'BI.EMP.PWRK.PB.RU.ZS',
  'name': 'Public sector employment as a share of paid employment by location (Rural)',
  'source': {'id': '64', 'value': 'Worldwide Bureaucracy Indicators'},
  'sourceNote': '',
  'sourceOrganization': '',
  'topics': []},
 

In [27]:
#Create list of Bureaucracy IDs

bureaucracy_ids = []
bureaucracy_names = []

for num in range(0, len(bureaucracy)):
    bureaucracy_ids.append(bureaucracy[num]["id"])
    bureaucracy_names.append(bureaucracy[num]["name"])
    
print(len(bureaucracy_ids))
print(bureaucracy_ids[0:5])
print(bureaucracy_names[0:5])

87
['BI.EMP.FRML.PB.ZS', 'BI.EMP.PWRK.PB.FE.ZS', 'BI.EMP.PWRK.PB.MA.ZS', 'BI.EMP.PWRK.PB.RU.ZS', 'BI.EMP.PWRK.PB.UR.ZS']
['Public sector employment as a share of formal employment', 'Public sector employment as a share of paid employment by gender (Female)', 'Public sector employment as a share of paid employment by gender (Male)', 'Public sector employment as a share of paid employment by location (Rural)', 'Public sector employment as a share of paid employment by location (Urban)']


In [28]:
# Download ESG (75)

esg = wbdata.get_indicator(source=64, display = False)
esg

[{'id': 'BI.EMP.FRML.PB.ZS',
  'name': 'Public sector employment as a share of formal employment',
  'source': {'id': '64', 'value': 'Worldwide Bureaucracy Indicators'},
  'sourceNote': '',
  'sourceOrganization': '',
  'topics': []},
 {'id': 'BI.EMP.PWRK.PB.FE.ZS',
  'name': 'Public sector employment as a share of paid employment by gender (Female)',
  'source': {'id': '64', 'value': 'Worldwide Bureaucracy Indicators'},
  'sourceNote': '',
  'sourceOrganization': '',
  'topics': []},
 {'id': 'BI.EMP.PWRK.PB.MA.ZS',
  'name': 'Public sector employment as a share of paid employment by gender (Male)',
  'source': {'id': '64', 'value': 'Worldwide Bureaucracy Indicators'},
  'sourceNote': '',
  'sourceOrganization': '',
  'topics': []},
 {'id': 'BI.EMP.PWRK.PB.RU.ZS',
  'name': 'Public sector employment as a share of paid employment by location (Rural)',
  'source': {'id': '64', 'value': 'Worldwide Bureaucracy Indicators'},
  'sourceNote': '',
  'sourceOrganization': '',
  'topics': []},
 

In [29]:
#Create list of ESG IDs

esg_ids = []
esg_names = []

for num in range(0, len(esg)):
    esg_ids.append(esg[num]["id"])
    esg_names.append(esg[num]["name"])
    
print(len(esg_ids))
print(esg_ids[0:5])
print(esg_names[0:5])

87
['BI.EMP.FRML.PB.ZS', 'BI.EMP.PWRK.PB.FE.ZS', 'BI.EMP.PWRK.PB.MA.ZS', 'BI.EMP.PWRK.PB.RU.ZS', 'BI.EMP.PWRK.PB.UR.ZS']
['Public sector employment as a share of formal employment', 'Public sector employment as a share of paid employment by gender (Female)', 'Public sector employment as a share of paid employment by gender (Male)', 'Public sector employment as a share of paid employment by location (Rural)', 'Public sector employment as a share of paid employment by location (Urban)']


*Create Country Lists*

In [30]:
# Download Countries

countries = wbdata.get_country(display = False)
countries[0:5]

[{'id': 'ABW',
  'iso2Code': 'AW',
  'name': 'Aruba',
  'region': {'id': 'LCN', 'value': 'Latin America & Caribbean '},
  'adminregion': {'id': '', 'value': ''},
  'incomeLevel': {'id': 'HIC', 'value': 'High income'},
  'lendingType': {'id': 'LNX', 'value': 'Not classified'},
  'capitalCity': 'Oranjestad',
  'longitude': '-70.0167',
  'latitude': '12.5167'},
 {'id': 'AFG',
  'iso2Code': 'AF',
  'name': 'Afghanistan',
  'region': {'id': 'SAS', 'value': 'South Asia'},
  'adminregion': {'id': 'SAS', 'value': 'South Asia'},
  'incomeLevel': {'id': 'LIC', 'value': 'Low income'},
  'lendingType': {'id': 'IDX', 'value': 'IDA'},
  'capitalCity': 'Kabul',
  'longitude': '69.1761',
  'latitude': '34.5228'},
 {'id': 'AFR',
  'iso2Code': 'A9',
  'name': 'Africa',
  'region': {'id': 'NA', 'value': 'Aggregates'},
  'adminregion': {'id': '', 'value': ''},
  'incomeLevel': {'id': 'NA', 'value': 'Aggregates'},
  'lendingType': {'id': '', 'value': 'Aggregates'},
  'capitalCity': '',
  'longitude': '',
 

In [31]:
#Create list of Individual and Aggregate Countries with IDs

countries_individual = []
countries_aggregates = []

countries_individual_ids = []
countries_aggregates_ids = []

for num in range(0, len(countries)):
    if countries[num]['region']['value'] != 'Aggregates':
        countries_individual.append(countries[num])
        countries_individual_ids.append(countries[num]["id"])
    else:
        countries_aggregates.append(countries[num])
        countries_aggregates_ids.append(countries[num]["id"])

In [32]:
# View Number of Countries Individual and Aggregates

print("Countries Individual: ", len(countries_individual))
print("Countries Individual IDs: ", len(countries_individual_ids))

print("Country Aggregates: ", len(countries_aggregates))
print("Country Aggregate IDs: ", len(countries_aggregates_ids))

print("Total Countries: ", len(countries))

Countries Individual:  218
Countries Individual IDs:  218
Country Aggregates:  86
Country Aggregate IDs:  86
Total Countries:  304


In [33]:
# View Countries IDs

countries_individual_ids[0:5]

['ABW', 'AFG', 'AGO', 'ALB', 'AND']

*Cull Data Using Indicators and Countries List*

In [34]:
# TEST Culling Data

test_df = wbdata.get_dataframe({'FP.CPI.TOTL.ZG': 'value'}, country=countries_individual_ids)
test_df

Unnamed: 0_level_0,Unnamed: 1_level_0,value
country,date,Unnamed: 2_level_1
Aruba,2019,
Aruba,2018,3.580394
Aruba,2017,-0.481315
Aruba,2016,-0.886758
Aruba,2015,0.475067
...,...,...
Zimbabwe,2004,
Zimbabwe,2003,
Zimbabwe,2002,
Zimbabwe,2001,


In [35]:
# TEST 2: WB Data

test_df2 = wbdata.get_dataframe({wdi_ids[100]: wdi_names[100]}, country=countries_individual_ids)
test_df2

Unnamed: 0_level_0,Unnamed: 1_level_0,"Net bilateral aid flows from DAC donors, Italy (current US$)"
country,date,Unnamed: 2_level_1
Aruba,2019,
Aruba,2018,
Aruba,2017,
Aruba,2016,
Aruba,2015,
...,...,...
Zimbabwe,2004,1440000.0
Zimbabwe,2003,1120000.0
Zimbabwe,2002,2090000.0
Zimbabwe,2001,1740000.0


In [36]:
## Pull WDI Data
#
#wdi_df = pd.DataFrame()
#
#for num, indicator, name in zip(range(1, len(wdi) + 1), wdi_ids, wdi_names):
#    if num % 10 == 0:
#        print(num, " / ", len(wdi), " : ", name)
#    df = wbdata.get_dataframe({indicator: name}, country=countries_individual_ids)
#    wdi_df = pd.concat([wdi_df, df], axis = 1)

In [37]:
## View WDI Data
#
#wdi_df

In [38]:
# Pull Doing Business Data

doing_business_df = pd.DataFrame()

for num, indicator, name in zip(range(1, len(doing_business) + 1), doing_business_ids, doing_business_names):
    if num % 10 == 0:
        print(num, " / ", len(doing_business), " : ", name)
    try:
        df = wbdata.get_dataframe({indicator: name}, country=countries_individual_ids)
    except:
        print(indicator, "failed")
        pass
    doing_business_df = pd.concat([doing_business_df, df], axis = 1)

10  /  205  :  Enforcing contracts (DB04-15 methodology) - Score
20  /  205  :  Enforcing contracts: Quality of the judicial processes index (0-18) (DB17-20 methodology)
30  /  205  :  Ease of doing business index (1=most business-friendly regulations)
40  /  205  :  Dealing with construction permits: Quality of building regulations index (0-2) (DB16-20 methodology)
50  /  205  :  Getting credit (DB15-20 methodology) - Score
60  /  205  :  Getting credit: Credit bureau coverage (% of adults)
IC.ELC.ACES.RK DB19 failed
70  /  205  :  Getting electricity: Mechanisms for monitoring outages (0-1) (DB16-20 methodology)
80  /  205  :  Getting electricity: System average interruption duration index (SAIDI) (DB16-20 methodology)
90  /  205  :  Starting a business: Time - Women (days)- Score
100  /  205  :  Registering property (DB05-15 methodology) - Score
110  /  205  :  Registering property: Procedures (number) - Score
120  /  205  :  Paying taxes: Time to comply with corporate income tax co

In [39]:
# View Doing Business Data

doing_business_df

Unnamed: 0_level_0,Unnamed: 1_level_0,Enforcing contracts: Alternative dispute resolution (0-3) (DB16-20 methodology),Enforcing contracts: Attorney fees (% of claim),Enforcing contracts: Cost (% of claim),Enforcing contracts: Cost (% of claim) - Score,Enforcing contracts: Case management (0-6) (DB16-20 methodology),Enforcing contracts: Court automation (0-4) (DB17-20 methodology),Enforcing contracts: Court fees (% of claim),Enforcing contracts: Court structure and proceedings (0-5) (DB16 methodology),Enforcing contracts: Court structure and proceedings (0-5) (DB17-20 methodology),Enforcing contracts (DB04-15 methodology) - Score,...,Trading across borders: Cost to import (US$ per container deflated)(DB06-15 methodology),Trading across borders: Cost to import (US$ per container)(DB06-15 methodology) - Score,Trading across borders: Cost to import: Documentary compliance (USD) (DB16-20 methodology),Trading across borders: Cost to import: Documentary compliance (USD) (DB16-20 methodology) - Score,Time to import: Documentary compliance (hours) (DB16-20 methodology),Time to import (days) (DB06-15 methodology),Trading across borders: Time to import: Border compliance (hours) (DB16-20 methodology) - Score,Trading across borders: Time to import: Documentary compliance (hours) (DB16-20 methodology) - Score,Trading across borders: Time to import (days) (DB06-15 methodology) - Score,Rank: Trading across borders (1=most business-friendly regulations)
country,date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
Afghanistan,1960,,,,,,,,,,,...,,,,,,,,,,
Afghanistan,1961,,,,,,,,,,,...,,,,,,,,,,
Afghanistan,1962,,,,,,,,,,,...,,,,,,,,,,
Afghanistan,1963,,,,,,,,,,,...,,,,,,,,,,
Afghanistan,1964,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Zimbabwe,2016,2.0,70.0,83.1,6.63667,1.0,0.0,8.0,,3.0,,...,,,150.0,78.57143,81.0,,78.97252,66.5272,,
Zimbabwe,2017,2.0,70.0,83.1,6.63667,1.0,0.0,8.0,,3.0,,...,,,150.0,78.57143,81.0,,18.75747,66.5272,,
Zimbabwe,2018,2.0,70.0,83.1,6.63667,1.0,0.0,8.0,,3.0,,...,,,150.0,78.57143,81.0,,18.75747,66.5272,,
Zimbabwe,2019,2.0,70.0,83.1,6.63667,1.0,0.5,8.0,,3.0,,...,,,150.0,78.57143,81.0,,18.75747,66.5272,,


In [40]:
# Save Doing Business Data

doing_business_df.to_csv('../../2_Inputs/doing_business_df.csv')

In [41]:
# Pull, Print and Save WGI Data

wgi_df = pd.DataFrame()

for num, indicator, name in zip(range(1, len(wgi) + 1), wgi_ids, wgi_names):
    if num % 10 == 0:
        print(num, " / ", len(wgi), " : ", name)
    try:
        df = wbdata.get_dataframe({indicator: name}, country=countries_individual_ids)
    except:
        print(indicator, "failed")
        pass
    wgi_df = pd.concat([wgi_df, df], axis = 1)
    
print(wgi_df)

wgi_df.to_csv('../../2_Inputs/wgi_df.csv')

10  /  36  :  Government Effectiveness: Percentile Rank, Lower Bound of 90% Confidence Interval
20  /  36  :  Rule of Law: Number of Sources
30  /  36  :  Regulatory Quality: Standard Error
               Control of Corruption: Estimate  \
country  date                                    
Aruba    2018                         1.252027   
         2017                         1.291643   
         2016                         1.285848   
         2015                         1.297111   
         2014                         1.018918   
...                                        ...   
Zimbabwe 2003                        -1.245055   
         2002                        -1.228440   
         2000                        -0.982141   
         1998                        -0.480248   
         1996                        -0.278847   

               Control of Corruption: Number of Sources  \
country  date                                             
Aruba    2018                            

In [42]:
# Pull, Print and Save Debt Data

debt_df = pd.DataFrame()

for num, indicator, name in zip(range(1, len(debt) + 1), debt_ids, debt_names):
    if num % 10 == 0:
        print(num, " / ", len(debt), " : ", name)
    try:
        df = wbdata.get_dataframe({indicator: name}, country=countries_individual_ids)
    except:
        print(indicator, "failed")
        pass
    debt_df = pd.concat([debt_df, df], axis = 1)
    
print(debt_df)

debt_df.to_csv('../../2_Inputs/debt_df.csv')

10  /  497  :  PPG, bilateral (AMT, current US$)
20  /  497  :  Principal repayments on external debt, general government sector (PPG) (AMT, current US$)
30  /  497  :  PPG, multilateral (AMT, current US$)
40  /  497  :  PPG, official creditors (AMT, current US$)
50  /  497  :  PPG, commercial banks (AMT, current US$)
60  /  497  :  PRVG, other private creditors (AMT, current US$)
70  /  497  :  Principal arrears, private creditors (current US$)
80  /  497  :  Commitments, official creditors (COM, current US$)
90  /  497  :  Currency composition of PPG debt, Pound sterling (%)
100  /  497  :  OPS, bilateral concessional (DIS, current US$)
110  /  497  :  Disbursements on external debt, public and publicly guaranteed (PPG) (DIS, current US$)
120  /  497  :  GG, multilateral concessional (DIS, current US$)
130  /  497  :  GG, bonds (DIS, current US$)
140  /  497  :  PNG, commercial banks and other creditors (DIS, current US$)
150  /  497  :  PRVG, private creditors (DIS, current US$)
160

In [43]:
# Pull, Print and Save CPIA Data

cpia_df = pd.DataFrame()

for num, indicator, name in zip(range(1, len(cpia) + 1), cpia_ids, cpia_names):
    if num % 10 == 0:
        print(num, " / ", len(cpia), " : ", name)
    try:
        df = wbdata.get_dataframe({indicator: name}, country=countries_individual_ids)
    except:
        print(indicator, "failed")
        pass
    cpia_df = pd.concat([cpia_df, df], axis = 1)
    
print(cpia_df)

cpia_df.to_csv('../../2_Inputs/cpia_df.csv')

10  /  21  :  IDA resource allocation index (1=low to 6=high)
20  /  21  :  CPIA trade rating (1=low to 6=high)
               CPIA business regulatory environment rating (1=low to 6=high)  \
country  date                                                                  
Aruba    2019                                                NaN               
         2018                                                NaN               
         2017                                                NaN               
         2016                                                NaN               
         2015                                                NaN               
...                                                          ...               
Zimbabwe 2004                                                NaN               
         2003                                                NaN               
         2002                                                NaN               
        

In [45]:
# Pull, Print and Save Bureacracy Data

bureaucracy_df = pd.DataFrame()

for num, indicator, name in zip(range(1, len(bureaucracy) + 1), bureaucracy_ids, bureaucracy_names):
    if num % 10 == 0:
        print(num, " / ", len(bureaucracy), " : ", name)
    try:
        df = wbdata.get_dataframe({indicator: name}, country=countries_individual_ids)
    except:
        print(indicator, "failed")
        pass
    bureaucracy_df = pd.concat([bureaucracy_df, df], axis = 1)
    
print(bureaucracy_df)

bureaucracy_df.to_csv('../../2_Inputs/bureaucracy_df.csv')

10  /  87  :  Public sector employment as a share of total employment by location (Rural)
20  /  87  :  Share of private paid employees with a contract
30  /  87  :  Females as a share of private paid employees by occupation (Professionals)
40  /  87  :  Share of public paid employees with a contract
50  /  87  :  Number of public paid employees
60  /  87  :  Number of paid employees
70  /  87  :  Public sector wage premium by occupation (Professionals) (compared to formal wage employees)
80  /  87  :  Relative wage of Technicians in private sector  (using clerk as reference) 
                  Public sector employment as a share of formal employment  \
country     date                                                             
Afghanistan 2016                                                NaN          
            2015                                                NaN          
            2014                                                NaN          
            2013          

In [46]:
# Pull, Print and Save ESG Data

esg_df = pd.DataFrame()

for num, indicator, name in zip(range(1, len(esg) + 1), esg_ids, esg_names):
    if num % 10 == 0:
        print(num, " / ", len(esg), " : ", name)
    try:
        df = wbdata.get_dataframe({indicator: name}, country=countries_individual_ids)
    except:
        print(indicator, "failed")
        pass
    esg_df = pd.concat([esg_df, df], axis = 1)
    
print(esg_df)

esg_df.to_csv('../../2_Inputs/esg_df.csv')

10  /  87  :  Public sector employment as a share of total employment by location (Rural)
20  /  87  :  Share of private paid employees with a contract
30  /  87  :  Females as a share of private paid employees by occupation (Professionals)
40  /  87  :  Share of public paid employees with a contract
50  /  87  :  Number of public paid employees
60  /  87  :  Number of paid employees
70  /  87  :  Public sector wage premium by occupation (Professionals) (compared to formal wage employees)
80  /  87  :  Relative wage of Technicians in private sector  (using clerk as reference) 
                  Public sector employment as a share of formal employment  \
country     date                                                             
Afghanistan 2016                                                NaN          
            2015                                                NaN          
            2014                                                NaN          
            2013          