# Economic Data Processing (WDI)
## Data Dictionary
| Code | Indicator Name |
|----------------|---------------------------------------------------------------------------------|
| DT.DOD.DIMF.CD | Use of IMF credit (DOD, current US\$) |
| DT.DOD.PVLX.CD | Present value of external debt (current US\$) |
| NY.GNP.PCAP.CD | GNI per capita, Atlas method (current US\$) |
| NY.GNP.ATLS.CD | GNI, Atlas method (current US\$) |
| DT.DIS.IDAG.CD | IDA grants (current US\$) |
| DT.ODA.ODAT.CD | Net official development assistance received (current US\$) |
| SI.POV.NAHC | Poverty headcount ratio at national poverty lines (% of population) |
| SI.POV.URHC | Urban poverty headcount ratio at national poverty lines (% of urban population) |
| SI.POV.RUHC | Rural poverty headcount ratio at national poverty lines (% of rural population) |
| SI.POV.NAGP | Poverty gap at national poverty lines (%) |
| SI.POV.GINI | GINI index (World Bank estimate) |

In [1]:
import re

import numpy as np
import pandas as pd
import pycountry

%matplotlib inline

pd.set_option('display.float_format', lambda x: '%.3f' % x)

## Load The File

In [2]:
df = pd.read_excel("../data/external/Economy/WDI/Data_Extract_From_World_Development_Indicators.xlsx")

In [3]:
df.sample(5)

| | Time | Time Code | Country Name | Country Code | Use of IMF credit (DOD, current US\$) [DT.DOD.DIMF.CD] | Present value of external debt (current US\$) [DT.DOD.PVLX.CD] | GNI per capita, Atlas method (current US\$) [NY.GNP.PCAP.CD] | GNI, Atlas method (current US\$) [NY.GNP.ATLS.CD] | IDA grants (current US\$) [DT.DIS.IDAG.CD] | Net official development assistance received (current US\$) [DT.ODA.ODAT.CD] | Poverty headcount ratio at national poverty lines (% of population) [SI.POV.NAHC] | Urban poverty headcount ratio at national poverty lines (% of urban population) [SI.POV.URHC] | Rural poverty headcount ratio at national poverty lines (% of rural population) [SI.POV.RUHC] | Poverty gap at national poverty lines (%) [SI.POV.NAGP] | GINI index (World Bank estimate) [SI.POV.GINI] |
|------|------|-----------|--------------|--------------|------|------|------|------|------|------|------|------|------|------|------|
| 272 | 1996 | YR1996 | Armenia | ARM | 116474762.400 | .. | 520.0 | 1641685571.546 | 0 | 292280000 | .. | .. | .. | .. | .. |
| 4773 | 2013 | YR2013 | Bermuda | BMU | .. | .. | 106140.0 | 6899462134.798 | .. | .. | .. | .. | .. | .. | .. |
| 4202 | 2010 | YR2010 | Low & middle income | LMY | 148920234365.600 | .. | 3284.892 | 18835842751599.348 | 2201576761.420 | 130272200000 | .. | .. | .. | .. | .. |
| 4615 | 2012 | YR2012 | Mexico | MEX | 4382059019 | .. | 9750.0 | 1178346291391.847 | 0 | 407930000 | 45.500 | 48.300 | 62.800 | 0.400 | 45.400 |
| 3954 | 2009 | YR2009 | South Asia (IDA & IBRD) | TSA | .. | .. | 1058.83 | 1702242274033.618 | .. | 14434160000 | .. | .. | .. | .. | .. |


## Standardize Country Codes

In [4]:
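# Keep only rows whose country code resolves to a country known to pycountry.
# This is intended to drop World Bank aggregates (regions, income groups).
country_locations = []
for country in df['Country Code']:
    try:
        pycountry.countries.lookup(country)
        country_locations.append(True)
    except LookupError:
        country_locations.append(False)
# Copy so that the later in-place edits act on an independent frame, not a slice.
df = df[country_locations].copy()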
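
The filtering step above can also be written as a boolean mask. The following is only a minimal sketch on a toy frame built from rows seen in the sample output. `KNOWN_ALPHA3` is a hypothetical, hand-written stand-in for the `pycountry` lookup used in the notebook, so it does not reproduce pycountry's matching rules.

```python
import pandas as pd

# Hypothetical stand-in for the pycountry lookup: a tiny hand-written code set.
KNOWN_ALPHA3 = {"ARM", "BMU", "MEX"}

# Toy rows taken from the sample output above (two countries, two aggregates).
toy = pd.DataFrame({
    "Country Name": ["Armenia", "Low & middle income", "Mexico", "South Asia (IDA & IBRD)"],
    "Country Code": ["ARM", "LMY", "MEX", "TSA"],
})

# Boolean mask: True where the code belongs to the known set.
is_country = toy["Country Code"].isin(KNOWN_ALPHA3)

# .copy() gives an independent frame, so later in-place edits do not touch a slice.
countries_only = toy[is_country].copy()
kept_codes = countries_only["Country Code"].tolist()
print(kept_codes)
```

With a mask, the aggregate rows ("LMY", "TSA") fall out in one vectorized step instead of a Python loop with exception handling.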

## Standardize Indexes

In [5]:
df.rename(
    {
        "Time": "Year"
    },
    axis='columns',
    inplace=True)

In [6]:
df.set_index(["Country Code", "Year"], inplace=True)
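
To illustrate what the `(Country Code, Year)` index allows, here is a small sketch on a toy frame; the values are copied from the sample output above. The short column name `NY.GNP.PCAP.CD` is used here for brevity and only matches the real frame once the headers have been renamed in the Header step.

```python
import pandas as pd

# Toy frame with the same two index levels as the notebook's DataFrame.
toy = pd.DataFrame({
    "Country Code": ["ARM", "MEX"],
    "Year": [1996, 2012],
    "NY.GNP.PCAP.CD": [520.0, 9750.0],
}).set_index(["Country Code", "Year"])

# Look up a single observation by (country, year).
mex_2012 = toy.loc[("MEX", 2012), "NY.GNP.PCAP.CD"]

# Take every year available for one country; the result is indexed by Year only.
mex_all_years = toy.xs("MEX", level="Country Code")

print(mex_2012)
print(mex_all_years)
```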

## Clean Data

### Header

In [7]:
df.drop(["Time Code", "Country Name"],
        axis='columns',
        inplace=True)

In [14]:
# Pull the bracketed indicator code (e.g. "SI.POV.GINI") out of each column header.
c = [re.search(r"\[((?:\w+\.)+\w+)\]", d).group(1) for d in df.columns]

In [16]:
# Map each original header to its indicator code.
c_names = dict(zip(df.columns, c))

In [18]:
df.rename(c_names,axis='columns',inplace=True)
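
The header clean-up above relies on each column name ending in a bracketed indicator code. A minimal standalone sketch of that extraction follows. The helper name `extract_indicator_code` is hypothetical and not part of the notebook, and returning `None` for headers without a code is an assumption about how one might want to handle them.

```python
import re

# Matches a bracketed, dot-separated code such as "[DT.DOD.DIMF.CD]".
CODE_PATTERN = re.compile(r"\[((?:\w+\.)+\w+)\]")


def extract_indicator_code(header):
    """Return the indicator code inside square brackets, or None if there is none."""
    match = CODE_PATTERN.search(header)
    return match.group(1) if match else None


headers = [
    "Use of IMF credit (DOD, current US$) [DT.DOD.DIMF.CD]",
    "GINI index (World Bank estimate) [SI.POV.GINI]",
    "Country Name",  # no bracketed code
]
codes = [extract_indicator_code(h) for h in headers]
print(codes)
```

Checking for a missing match avoids the `TypeError` that indexing a failed `re.search` result would raise if a column without a code were still present.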

### Data Types

In [23]:
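# Replace the '..' missing-value marker with np.nan so the columns can be cast to float.
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0.)
df = df.replace('..', np.nan)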

In [26]:
df = df.astype(float)
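
As an alternative to replacing `'..'` after loading, pandas readers accept an `na_values` argument that treats a marker as missing at parse time. The sketch below uses `pd.read_csv` on an in-memory string as a stand-in for the Excel extract, so it runs without the data file. Whether the real workbook behaves identically under `pd.read_excel(..., na_values=[".."])` is an assumption that has not been checked here.

```python
import io

import pandas as pd

# Two rows in the same shape as the extract, with '..' marking a missing value.
csv_text = (
    "Country Code,Time,NY.GNP.PCAP.CD,SI.POV.GINI\n"
    "ARM,1996,520.0,..\n"
    "MEX,2012,9750.0,45.4\n"
)

# na_values tells the parser to read '..' as NaN, so the column is numeric from the start.
sample = pd.read_csv(io.StringIO(csv_text), na_values=[".."])

print(sample.dtypes)
print(sample)
```

Parsing the marker up front means no object-typed columns are created, so a separate `replace` and `astype(float)` pass may become unnecessary.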

## Save Data

In [29]:
df.to_pickle("../data/processed/Economic_WDI.pickle")