# Economic Data Processing (MADDISON)

> Maddison Project Database, version 2018. Bolt, Jutta, Robert Inklaar, Herman de Jong and Jan Luiten van Zanden (2018), “Rebasing ‘Maddison’: new income comparisons and the shape of long-run economic development”, Maddison Project Working paper 10.

## Data Dictionary

| Full data | Data in single table |
|-------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| countrycode | 3-letter ISO country code |
| country | Country name |
| year | Year |
| cgdppc | Real GDP per capita in 2011US\$, multiple benchmarks (suitable for cross-country income comparisons) |
| rgdpnapc | Real GDP per capita in 2011US\$, 2011 benchmark (suitable for cross-country growth comparisons) |
| pop | Population, mid-year (thousands) |
| i_cig | 0/1/2: observation is extrapolated (0), benchmark (1), or interpolated (2) |
| i_bm | For benchmark observations: 1: ICP PPP estimates, 2: Historical income benchmarks, 3: Real wages and urbanization, 4: Multiple of subsistence, 5: Braithwaite (1968) PPPs |
| Partial countries | Data for selected sub-national units with long time series |
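
As a quick sanity check against the dictionary above, a loaded frame can be compared with the listed column names. This is a minimal sketch, assuming the `Full data` sheet carries exactly the eight columns in the table; `EXPECTED_COLUMNS` and `missing_columns` are illustrative names that do not appear in the notebook, and the toy frame stands in for the real sheet.

```python
import pandas as pd

# Column names as listed in the data dictionary above.
EXPECTED_COLUMNS = [
    "countrycode", "country", "year", "cgdppc",
    "rgdpnapc", "pop", "i_cig", "i_bm",
]


def missing_columns(frame, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `frame`."""
    return [name for name in expected if name not in frame.columns]


# Toy frame standing in for the real sheet (deliberately incomplete).
toy = pd.DataFrame({"countrycode": ["ITA"], "country": ["Italy"], "year": [1939]})
print(missing_columns(toy))
```

An empty list would suggest the sheet matches the dictionary; any names returned point to columns that were renamed or dropped upstream.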

In [1]:
import pandas as pd
import pycountry

%matplotlib inline

pd.set_option('display.float_format', lambda x: '%.3f' % x)

## Load The File

In [2]:
df = pd.read_excel("../data/external/Economy/MADDISON/mpd2018.xlsx",
                   sheet_name='Full data')

In [3]:
df.sample(5)

Unnamed: 0,countrycode,country,year,cgdppc,rgdpnapc,pop,i_cig,i_bm
15194,PRY,Paraguay,1910,,,554.0,,
11609,MMR,Myanmar,2016,6139.0,5284.0,55174.0,Extrapolated,
19456,YUG,Former Yugoslavia,2014,15524.0,14627.0,21946.0,Benchmark,ICP PPP estimates
11706,MNG,Mongolia,1976,1393.0,3809.0,1487.0,Extrapolated,
9006,ITA,Italy,1939,3196.0,6076.0,43865.0,Interpolated,


## Standardize Country Codes

In [4]:
# Keep only rows whose country code can be resolved by pycountry.
country_locations = []
for country in df['countrycode']:
    try:
        pycountry.countries.lookup(country)
        country_locations.append(True)
    except LookupError:
        country_locations.append(False)
df = df[country_locations]
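
The loop above builds a boolean mask one row at a time, so the lookup is repeated for every occurrence of a code. One possible variation, sketched here under assumptions, checks each distinct code once and maps the verdict back onto the column. `valid_code_mask` and its `is_valid` predicate are hypothetical names, and the toy predicate uses a small hard-coded set in place of `pycountry.countries.lookup` so the sketch runs without that library.

```python
import pandas as pd


def valid_code_mask(codes, is_valid):
    """Return a boolean Series that is True where `is_valid(code)` holds.

    Each distinct non-missing code is checked once; missing codes are
    treated as invalid.
    """
    verdicts = {code: bool(is_valid(code)) for code in codes.dropna().unique()}
    return codes.map(lambda code: verdicts.get(code, False)).astype(bool)


# Toy stand-in for the pycountry lookup (illustrative only).
KNOWN_CODES = {"ITA", "PRY", "MMR", "MNG"}

toy = pd.DataFrame({"countrycode": ["ITA", "YUG", "PRY", None]})
mask = valid_code_mask(toy["countrycode"], lambda code: code in KNOWN_CODES)
print(toy[mask])
```

With pycountry, `is_valid` would wrap the lookup in the same `try`/`except LookupError` used in the loop and return `True` or `False` accordingly.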

## Standardize Indexes

### Years (1995 ≤ x ≤ 2017)

In [5]:
df = df[df['year'] >= 1995]
df = df[df['year'] <= 2017]
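
The two comparisons above keep the years 1995 through 2017, with both ends included. As a sketch of an equivalent single-step form, `Series.between` is inclusive on both bounds by default; the toy frame and the names `START_YEAR` and `END_YEAR` are illustrative and not part of the notebook.

```python
import pandas as pd

START_YEAR, END_YEAR = 1995, 2017  # bounds used in the cell above

toy = pd.DataFrame({"year": [1994, 1995, 2006, 2017, 2018]})

# between() includes both endpoints by default.
in_window = toy[toy["year"].between(START_YEAR, END_YEAR)]
print(in_window["year"].tolist())  # [1995, 2006, 2017]
```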

### Reindex & Rename

In [6]:
df.rename(
    {
        "year": "Year",
        "countrycode": "Country Code",
        "cgdppc": "Maddison GDPPC"
    },
    axis='columns',
    inplace=True)

In [7]:
df.set_index(["Country Code", "Year"], inplace=True)

## Clean Data

### Remove unneeded variables

In [8]:
df.drop(["country", "i_cig", "i_bm", "rgdpnapc", "pop"],
        axis='columns',
        inplace=True)

### Data Types

In [9]:
df.dtypes

Maddison GDPPC    float64
dtype: object

## Save Data

In [10]:
df.to_pickle("../data/processed/Economic_MADDISON.pickle")
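
To confirm that a saved pickle preserves the two-level index and the float column, the file can be read back and compared with the original. This is a minimal sketch, assuming a toy frame shaped like the processed data and a temporary directory instead of the project path; the country codes and values are placeholders, not real figures.

```python
import os
import tempfile

import pandas as pd

# Toy frame shaped like the processed data (placeholder values, not real figures).
toy = pd.DataFrame(
    {
        "Country Code": ["AAA", "AAA", "BBB"],
        "Year": [1995, 1996, 1995],
        "Maddison GDPPC": [1.0, 2.0, None],
    }
).set_index(["Country Code", "Year"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy.pickle")
    toy.to_pickle(path)
    restored = pd.read_pickle(path)

# equals() treats NaNs in the same positions as equal.
print(restored.equals(toy))
print(list(restored.index.names))
```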