Term paper for the course in Microeconometrics. Summer 2021, M.Sc. Economics, University of Bonn. Melih Damar

---
# Replication of Acemoglu et al. (2014): Institutions, Human Capital, and Development
---


This paper contains my replication of the following paper:
  * [Acemoglu, D., Gallego, F. A., & Robinson, J. A. (2014). Institutions, Human Capital, and Development. *Annual Review of Economics, 6*, 875-912.](https://doi.org/10.1146/annurev-economics-080213-041119)
  
#### Information about the organization of the paper
* Throughout the paper, I follow the structure of the original paper so that readers can compare my results with the original ones. I excluded the section called "Comments on the Previous Literature". All tables and figures are labeled in accordance with the original paper.

* All sections containing my independent contributions to the original paper are labeled as *extensions*.

## Table of Contents

* [1. Introduction](#Introduction)<br>
    * [1.1 Background](#background)<br>
    * [1.2 This Article](#this_article)<br>
* [2. Colonization and Human Capital](#colonization_and_human_capital)<br>
* [3. Data and Descriptive Statistics](#data_and_descriptive_statistics)<br>
    * [3.1 Cross-Country Data](#cross_country_data)<br>
    * [3.2 Sources of Variation in Human Capital](#sources_of_variation_in_human_capital)<br>
    * [3.3 Regional Data](#regional_data)<br>
* [4. Cross-Country Evidence](#cross_country_evidence)<br>
    * [4.1 Ordinary Least Squares Regressions](#ordinary_least_squares_regressions1)<br>  
    * [4.2 Semistructural Models](#semistructural_models)<br>  
    * [4.3 Full Two-Stage Least Squares Models](#full_two-stage_least_squares_models)   
    * [4.4 Does Human Capital Cause Institutions?](#does_human_capital_cause_institutions)
* [5. Cross-Regional Evidence](#cross_regional_evidence)<br>
    * [5.1 Ordinary Least Squares Regressions](#ordinary_least_squares_regressions2)<br>
    * [5.2 Two-Stage Least Squares Models](#two_stage_least_squares_models)<br>
* [6. Conclusion](#conclusion)<br>

In [8]:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
from stargazer.stargazer import Stargazer
from IPython.display import HTML, display
from statsmodels.iolib import summary2
from statsmodels.iolib import summary
from linearmodels import IV2SLS
from linearmodels import IVLIML

from auxiliary.project_auxiliary_table import *

---

## 1. Introduction <a class="anchor" id="Introduction"></a>

---

### 1.1. Background <a class="anchor" id="background"></a>

In this paper, the authors revisit the relationship between institutions, human capital, and development. They argue that treating human capital and institutions as exogenous variables in empirical research produces estimated returns to human capital that are very large compared with Mincerian estimates, because of omitted variable bias and differential measurement error in the variables. Once they take historically determined differences in human capital and the effect of institutions into account, they find, in both cross-country and cross-regional regressions, estimates of the returns to human capital that are consistent with Mincerian estimates.

North and Thomas (1973) list innovation (TFP), education, and capital accumulation as proximate determinants of economic growth when explaining why some countries are richer than others. The diagrams below show the intuition behind this argument.
<center>$ fundamental \ determinants \Longrightarrow\ proximate\ determinants\Longrightarrow\ economic\ development$</center>
More specifically:
<center>$Institutions \Longrightarrow \left. \begin{array}{l}
TFP\\
human \ capital\\
physical \ capital
\end{array}
\right\} \Longrightarrow\ Economic \ Development $</center>

The same logic also applies when the fundamental determinant is culture or geography instead of institutions.

An interesting question arising from this argument is why some countries have more innovation, human capital, and capital accumulation than others.

Even though institutions are always present in economic theory, they are generally included only implicitly. Most early economic models assume a set of institutions, such as property rights over factors of production or ownership of certain assets. However, the effect of institutions on economic growth received little attention. Moreover, it is difficult to build a model that yields convincing results about the relationship between institutions and other economic variables, because institutions are generally treated as exogenous even though they are closely related to other factors that affect the economic performance of a nation.

Recently, there have been attempts to find methods that isolate the pure effect of institutions, which had been treated as an exogenous variable. Acemoglu et al. (2001) used historically determined factors to obtain an exogenous source of variation in institutions. They suggest that during the colonization of the New World, Europeans set up different types of institutions depending on certain characteristics of the colonized regions. At one extreme, they formed extractive institutions designed to transfer resources to Europe, which led to rules enabling slavery, discrimination, monopolies, and insecure property rights. Most African countries are examples of this type of institution. At the other extreme, European colonizers settled in the colonized region and brought European institutions, which contribute more to sustainable economic development. The decision whether to settle was based on the mortality rate of European settlers. Where settler mortality was high, they established more extractive institutions; where mortality was low enough to settle, they formed better institutions that were similar to, and sometimes even better than, those in Europe. These institutions persisted after the former colonies became independent. Therefore, Acemoglu et al. (2001) argued that the mortality rate of early European colonizers can be used as an instrumental variable for the current institutions of these countries, because the mortality rate was determined exogenously and has no direct effect on a country's current level of development. The following diagram shows these relations:

<center>$ Potential \ mortality \ rate \ of \ European\ settlers\Longrightarrow\ Settlements \Longrightarrow\ Past \ institutions \Longrightarrow\ Current \ institutions $</center> 
    
   
Acemoglu et al. (2001) used a two-stage least squares (2SLS) regression model in which log GDP per capita in 2005 was the dependent variable and protection against the risk of expropriation, as a measure of current institutions, was the main explanatory variable, instrumented by the logarithm of the settler mortality rate. In addition, Acemoglu et al. (2012) used the capped potential settler mortality rate as an alternative instrumental variable. The results demonstrate that institutions have a large effect on the long-run economic development of a country under both formulations of the instrument, accounting for 75% of the difference between countries with high- and low-quality institutions. Moreover, these results were robust to controlling for various geographical characteristics of a country that could be correlated with economic development. The authors did not control for the proximate determinants of economic development in the framework of North & Thomas (1973), because these are the channels through which institutions affect economic growth, and controlling for them would make them "bad controls" in the sense of Angrist & Pischke (2008).
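
The two-stage logic described above can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the authors' code or data: the variable names, coefficients, and the unobserved confounder `u` are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (illustrative only; not the paper's data).
log_mortality = rng.normal(size=n)   # instrument
u = rng.normal(size=n)               # unobserved confounder
institutions = -0.6 * log_mortality + 0.8 * u + rng.normal(size=n)
true_effect = 0.9
log_gdp = true_effect * institutions + 1.0 * u + rng.normal(size=n)


def add_const(x):
    """Prepend a column of ones to a 1-D regressor."""
    return np.column_stack([np.ones(len(x)), x])


def ols(y, X):
    """Return OLS coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Naive OLS is biased upward here because u raises both institutions and income.
beta_ols = ols(log_gdp, add_const(institutions))[1]

# First stage: project the endogenous regressor on the instrument.
Z = add_const(log_mortality)
fitted_institutions = Z @ ols(institutions, Z)

# Second stage: regress the outcome on the first-stage fitted values.
beta_2sls = ols(log_gdp, add_const(fitted_institutions))[1]

print(f"OLS: {beta_ols:.3f}, 2SLS: {beta_2sls:.3f}, truth: {true_effect}")
```

The manual second stage recovers the point estimate only; its naive standard errors would be wrong. A dedicated estimator such as `IV2SLS` from `linearmodels`, which this notebook imports, computes the correct ones.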

The framework suggested by Acemoglu et al. (2001) was challenged by Glaeser et al. (2004) for putting institutions before human capital. They suggested that, contrary to Acemoglu et al. (2001), European settlers brought human capital with them, and the places to which they brought more human capital formed better-organized societies, which enabled them to experience economic development.

### 1.2. This Article <a class="anchor" id="this_article"></a>      

In this article, Acemoglu et al. (2014) make three main contributions to evaluating the impact of human capital and institutions on the economic performance of a country. The first is a brief historical account of the human capital that European colonizers brought to their colonies, showing that, contrary to what Glaeser et al. (2004) suppose, more human capital was brought to the extractive colonies than to the inclusive ones. This means that the main reason for the differences in economic development among early colonized countries is not the human capital the colonizers brought but the institutions that supported mass schooling. 

The second main contribution is that when they treat human capital as exogenous, or instrument it with early Protestant missionary activity, the estimated return is 25-35%, similar to the results of Glaeser et al. (2004). When they instead control for the historical determinants of institutions and human capital, or treat both simultaneously as endogenous, the estimated impact of human capital on economic development is 6-10%, which is close to the Mincerian evidence (the contribution of one more year of schooling to individual earnings) and lower than what Glaeser et al. (2004) estimated. The authors suggest that the two numbers differ because of an omitted variable whose effect is captured by human capital, and they argue that this variable is institutions. These results suggest that institutions affect long-run development through the human capital channel.

The third contribution is that they investigate the effect of human capital on long-run development using cross-regional data, and they conclude that the large inequalities between regions are correlated with the educational attainment of the regions' inhabitants. The results are similar to those of the second contribution, both when human capital is treated as exogenous and when it is treated as endogenous and instrumented with Protestant missionary activity. 

To conclude, Acemoglu et al. (2014) suggest that human capital is one of the channels through which institutions affect economic development and that, once historically determined differences are controlled for, its estimated return is of the plausible magnitude implied by Mincerian evidence.

### 2. Colonization and Human Capital <a class="anchor" id="colonization_and_human_capital"></a>   

The colonization of the New World by Europeans created not only different types of institutions in the colonized countries but also variation in the human capital brought by early colonizers. Glaeser et al. (2004) argued that the variation in the economic development of former European colonies was created by differences in the human capital the colonizers brought, not by institutions. However, historical evidence suggests that the conquistadors who colonized South America were more educated than those who colonized North America, which is the opposite of what Glaeser et al. (2004) suggested. Based on the work of Avellaneda (1995), average literacy among the conquistadors in five different expeditions to South America was 78.7%. The main reasons were that the early colonizers of South America came from urban areas in Spain and were mostly second or third sons of nobles, who could not inherit any land under Spanish law.

Even though there were highly educated people among the early colonizers of North America, they constituted a small portion of those who migrated there. Grubb (1990) exploited jury lists to estimate a literacy rate, and the figures suggest that the literacy rate of colonizers in Virginia was 54% in the 1600s.

When we look at literacy and education levels in the nineteenth century, North America was ahead of South America. However, this has nothing to do with the level of human capital the colonizers brought when they first settled. Instead, it has everything to do with the institutions they created, which increased investment in human capital and school construction.
To sum up, the historical evidence does not support the idea that differences in countries' levels of economic development stem from variation in the human capital the colonizers brought when they first settled rather than from the institutions they formed. 

### 3. Data and Descriptive Statistics<a class="anchor" id="data_and_descriptive_statistics"></a>   

The authors use two data sets. The first contains information on 62 countries that were once colonized. The second contains information on 684 regions in 48 former colonies. **[Table 1](#table_1)** shows descriptive statistics for each data set.

##### 3.1 Cross-Country Data<a class="anchor" id="cross_country_data"></a>   

The main dependent variable is log GDP per capita (on a purchasing power parity basis) in 2005. The log GDP per capita of an average country in the cross-country sample is 8.29. The same variable for an average region is very similar, 8.35. Average years of schooling of the population above age 15 in 2005 is used as the main indicator of current educational attainment in the cross-country analysis; its average value is about 6 years. The main measure of a country's current institutions is the rule of law index for 2005, which measures the extent to which countries comply with the rule of law in practice. It has 8 categories: limited government powers, absence of corruption, order and security, fundamental rights, open government, regulatory enforcement, civil justice, and criminal justice. The index ranges from -2.5 (weak adherence to the rule of law) to 2.5 (strong adherence to the rule of law). The average value for the cross-country sample is -0.33. As instrumental variables, the authors use the log of potential settler mortality (capped at a maximum level of 250) and the log of population density in 1500.

##### Table 1: Summary Statistics<a class="anchor" id="table_1"></a>   

In [4]:
country_data=pd.read_stata("data/xcountry_data.dta")
region_data = pd.read_stata("data/xregion_data.dta")
table1=get_summary_statistics(country_data,region_data)
display(table1)

| Sample | Variable | Observations | Mean | SD |
|---|---|---|---|---|
| Cross-country sample | Log GDP per capita | 62 | 8.291009 | 1.213168 |
| Cross-country sample | Years of schooling | 62 | 6.179061 | 2.878306 |
| Cross-country sample | Rule of law | 62 | -0.326935 | 0.89511 |
| Cross-country sample | Primary school enrollment 1900 | 62 | 16.664515 | 23.046547 |
| Cross-country sample | Protestant missionaries in the early twentieth century | 62 | 0.457945 | 0.547192 |
| Cross-country sample | Log capped potential settler mortality | 62 | 4.444997 | 0.960907 |
| Cross-country sample | Log population density 1500 | 62 | 0.545168 | 1.727288 |
| Cross-country sample | Dummy for different source of Protestant missions | 62 | 0.096774 | 0.298063 |
| Cross-country sample | Latitude | 62 | 0.180646 | 0.134148 |
| Cross-country sample | British colony | 62 | 0.387097 | 0.491062 |


##### 3.2. Sources of Variation in Human Capital<a class="anchor" id="sources_of_variation_in_human_capital"></a>   

The authors' main source of potentially exogenous variation in human capital is Protestant missionary activity per 10,000 people in the early twentieth century. They draw on two different data sources, so throughout the analysis a dummy variable is added to indicate that a different source of information on Protestant missionary activity is used. One can question whether missionary activity is excludable from regressions of economic development, for the following reasons. First, the location of missionary activity was clearly chosen rather than random. Second, missionary activity differed between French and British colonies as well as across continents. Third, missionary activity might have affected the evolution of some institutions, such as the emergence of democracy or the schooling system. Fourth, missionary activity might have influenced long-run development by affecting the current religious composition of the population. However, after controlling for continent dummies, the identity of the colonial power, and institutions, the allocation of missionary activity across and within countries may be a candidate instrumental variable for human capital. The average value of Protestant missionary activity per 10,000 people is 0.46.
Another source of variation in human capital is the primary school enrollment rate in 1900. Only one data set is used, and for countries with missing values in that data set, the authors impute an enrollment rate of 0.6%. 
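
The imputation step can be illustrated with a small pandas sketch. The toy frame and the column names `enroll_1900` and `enroll_imputed` are hypothetical, and flagging the imputed rows is my own addition for transparency; the text above only states that 0.6% is imputed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names are assumptions, not the paper's variable names.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "enroll_1900": [12.5, np.nan, 40.0, np.nan],
})

IMPUTED_RATE = 0.6  # percent, the value assigned to countries with missing data

# Flag imputed rows before overwriting them (an added convenience, not from the paper).
df["enroll_imputed"] = df["enroll_1900"].isna().astype(int)
df["enroll_1900"] = df["enroll_1900"].fillna(IMPUTED_RATE)

print(df)
```

Keeping the flag makes it easy to check later whether results are sensitive to the imputed observations.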

##### 3.3. Regional Data<a class="anchor" id="regional_data"></a>   

For the regional data, the main dependent variable is again log GDP per capita, whose average value across regions is 8.35. The main indicator of current educational attainment is again average years of schooling of the population above age 15 in 2005, with an average of 5.7 years. The exogenous variable used to explain the variation in average years of schooling today is again historical variation in Protestant missionary activity. This time, however, the authors use the location of mission stations rather than the total number of missionaries normalized by population. A dummy variable distinguishes regions with and without Protestant missionaries.

Four main factors determined the location of mission stations within countries: climate and geography, path dependence from previous missionary work, the different strategies Protestant missionaries applied when facing groups with competing religions, and a possible interest in places with a large native population. Accordingly, the authors created variables for whether the region was landlocked and its distance to the sea (proxies for transportation costs), for climate conditions, and for whether it contained the capital of the country around 1920.

Because there are no reliable measures of institutions within countries, the authors focus on the returns to human capital using variation in the presence of Protestant missionaries. In robustness checks, a proxy for population density before colonization is used because it might have affected the regional path of institutional development.

### 4. Cross-Country Evidence<a class="anchor" id="cross_country_evidence"></a>   

In this section, the authors first show the correlations of human capital and institutions with GDP per capita. These results suffer from omitted variable bias. Then, they present semistructural models in which one of institutions and human capital is instrumented and various historical determinants are controlled for, in order to reduce omitted variable bias. In these models, the estimated effect of human capital on current economic development decreases. Finally, the authors present 2SLS and limited information maximum likelihood (LIML) models in which both institutions and human capital are treated as endogenous. 

##### 4.1 Ordinary Least Squares Regressions <a class="anchor" id="ordinary_least_squares_regressions1"></a>   

In **[Table 2](#table_2)**, various OLS regressions present the correlation between economic development today (measured by log GDP per capita in 2005) and measures of human capital and institutions. The sample consists of the 62 former colonies for which data are available. Heteroscedasticity-robust standard errors are shown in parentheses. The first column reports the regression of log GDP per capita in 2005 on years of schooling. The relationship is significant, with a coefficient of 0.352, which is very large. With an elastic supply of capital, no externalities, and no omitted variable bias, the coefficient on years of schooling should match the coefficient estimated from a Mincer equation (generally between 0.06 and 0.10). Because the existing literature does not support large human capital externalities, the huge gap between the two suggests omitted variable bias. Column 2 shows the relationship between the rule of law and log GDP per capita: there is a strong correlation, with a coefficient of 0.930. Column 3 reports the regression with both years of schooling and the rule of law as explanatory variables. The coefficient on the rule of law falls substantially, to 0.315, whereas the coefficient on years of schooling falls only slightly, to 0.287, and remains well above the Mincerian estimates.
The other columns show regressions with different combinations of control variables. The controls are latitude (the absolute value of the distance from the country to the equator), dummy variables for Africa, America, and Asia, with Australasia as the omitted group, and dummies for British and French colonies, with colonies of other European countries as the omitted group. The motivation for the colonizer dummies is that the colonial powers may have had different institutions, human capital policies, and types of missionary activity. The control variables have a small impact on the results. Even in column 12, which includes all the control variables, the coefficient on years of schooling falls only modestly, to 0.248.
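
To illustrate what heteroscedasticity-robust standard errors change, here is a minimal sketch on simulated data using `statsmodels`. The data, the column names, and the choice of the `HC1` estimator are assumptions for the example; the text does not say which robust variant the tables use.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
schooling = rng.uniform(0, 12, size=n)

# Error variance grows with schooling, so classical standard errors are unreliable.
noise = rng.normal(scale=0.2 + 0.1 * schooling, size=n)
toy = pd.DataFrame({
    "schooling": schooling,
    "log_gdp": 6.0 + 0.3 * schooling + noise,
})

model = smf.ols("log_gdp ~ schooling", data=toy)
classical = model.fit()            # assumes homoscedastic errors
robust = model.fit(cov_type="HC1")  # heteroscedasticity-robust covariance

print("coefficient:", classical.params["schooling"])
print("classical SE:", classical.bse["schooling"])
print("robust SE:", robust.bse["schooling"])
```

The point estimate is identical in both fits; only the estimated standard error, and therefore the significance stars, can differ.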

However, there are some potential problems with these results. The first is possible differential measurement error in human capital and institutions. Because human capital is partly determined by institutions and correlated with them, some of the effect of institutions on log GDP per capita will be loaded onto human capital, which implies an upward bias in the estimated effect of human capital and a downward bias in the estimated effect of institutions. This problem can be addressed by using instrumental variables. Second, there might be reverse causality between human capital and log GDP per capita: higher income might lead to higher schooling.

To sum up, even though human capital and institutions are correlated with the economic development of a country, these correlations cannot be interpreted as causal relationships because of possible omitted variable bias.
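
As a stylized illustration of how the effect of institutions can load onto schooling, consider the textbook omitted variable bias formula for the extreme case in which institutions are left out entirely. The notation is my own shorthand, not the paper's: $y$ is log GDP per capita, $h$ is years of schooling, $I$ is institutions, and $\varepsilon$ is assumed uncorrelated with both regressors.

<center>$ y = \alpha + \beta_h h + \beta_I I + \varepsilon \quad \Longrightarrow \quad \mathrm{plim}\ \hat{\beta}_h^{\,short} = \beta_h + \beta_I \, \dfrac{Cov(h, I)}{Var(h)} $</center>

Here $\hat{\beta}_h^{\,short}$ is the OLS coefficient from regressing $y$ on $h$ alone. If institutions raise income ($\beta_I > 0$) and are positively correlated with schooling, the second term is positive and the schooling coefficient is biased upward. Noisy measurement of $I$ works in the same direction, because only part of the effect of institutions is then absorbed by the institutions variable.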

##### Table 2 Ordinary least squares (OLS) cross-country regressions<a class="anchor" id="table_2"></a>  

In [7]:
get_table2(country_data)

0,1,2,3,4,5,6,7,8,9,10,11,12
,,,,,,,,,,,,
,Dependent Variable: log GDP per capita,Dependent Variable: log GDP per capita,Dependent Variable: log GDP per capita,Dependent Variable: log GDP per capita,Dependent Variable: log GDP per capita,Dependent Variable: log GDP per capita,Dependent Variable: log GDP per capita,Dependent Variable: log GDP per capita,Dependent Variable: log GDP per capita,Dependent Variable: log GDP per capita,Dependent Variable: log GDP per capita,Dependent Variable: log GDP per capita
,,,,,,,,,,,,
,(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)
,,,,,,,,,,,,
Years of schooling,0.352***,,0.287***,0.332***,,0.286***,0.304***,,0.229***,0.322***,,0.248***
,(0.028),,(0.037),(0.036),,(0.038),(0.055),,(0.058),(0.058),,(0.061)
Rule of law,,0.930***,0.315**,,0.865***,0.280,,0.818***,0.411*,,0.821***,0.428**
,,(0.101),(0.139),,(0.144),(0.196),,(0.177),(0.216),,(0.188),(0.218)
Latitude,,,,1.072,0.801,0.460,1.110,0.067,0.288,1.132,0.053,0.301
