In [1]:
from __future__ import division
%pylab --no-import-all
%matplotlib inline
import pandas as pd
import numpy as np
import urllib
import os
import matplotlib as mpl
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col
from IPython.display import Latex
from sklearn.linear_model import LinearRegression

Using matplotlib backend: MacOSX
Populating the interactive namespace from numpy and matplotlib


In [2]:
pathout = './data1/'
if not os.path.exists(pathout):
    os.mkdir(pathout)  # create the data directory if it does not exist
    
pathgraphs = './graphs1/'
if not os.path.exists(pathgraphs):
    os.mkdir(pathgraphs)

To reproduce these results, please use the datasets included in the PR.

# The European Origins of economic development, happiness and R&D

Juan Pablo Rodríguez - Summer School of Economics

**Abstract** This replication paper is based on William Easterly and Ross Levine's paper *The European origins of economic development*. Its objective is to add robustness to the original results and to complement the analysis. It supports the original authors' finding that larger European settlements during the colonial epoch were growth enhancing. It also raises further questions on this topic by showing that, among countries that were colonized by Europe, there may be a strong relation between the European share of a country's population and both its happiness score and its R&D.

## 1. Introduction


People around the world, including those who study the social sciences rigorously, have asked themselves: "Why are some countries more successful than others?" Academics have attempted to answer the question by pointing to the historical events that played a role in determining the huge divergence between economies that we observe today. From the time before our species even existed to events of the past century, researchers have found evidence that different episodes of history have had an impact and contribute to the answer. The paper that this document attempts to replicate provides more evidence of how history has shaped today's economies. William Easterly and Ross Levine wrote *The European origins of economic development*, in which they examine the relation between the size of European settlements in colonial times and current income per capita.

To explain divergent paths, many authors have studied the mechanisms through which the European share of the population affects economic growth. Engerman and Sokoloff (1997) and Acemoglu et al. (2001, 2002) explain how Europeans had an important effect on political institutions. Following Acemoglu and Robinson's (2012) interpretation of the different types of institutions, the European share played a determining role in whether the colonies (during their period as colonies and after independence) adopted inclusive institutions or not. These authors argue that Europeans decided whether to establish a large settlement depending on land, climate, and disease environment. Some of the relevant variables, which are also included in the robustness checks of Easterly and Levine (2016) (EL), are latitude, indigenous mortality from diseases brought by Europeans, and population density before colonization. If the land was suitable they opted for a large settlement; if not, they went for a rather small one. Because practices such as slavery and heavy taxation were more feasible in small settlements, these colonies tended to adopt extractive institutions, both at the time and later on. Large settlements went the other way.

On the other hand, Glaeser et al. (2004) argue that Europeans brought human capital to the colonies, and that European share is strongly correlated with the human capital held today. This is due both to knowledge brought from the Old World and to human-capital-enhancing institutions, which resulted in long-run economic growth (Galor, 2011). Taking these factors into account, we can expect European share during the colonial epoch to have an impact on today's income per capita, which would indicate that the factors described above affect economic growth, positively or negatively. Easterly and Levine (2016), in the paper replicated in this document, attempt to show exactly that.

In this replication paper, I first use the replication files provided by the authors to rebuild the results of the original paper on which this document builds. Using the already processed data (EL explain in more depth how it was obtained and why readers can trust it), I re-run some of the linear regressions and summarize the original paper's results in the next section. EL show that, under different controls, the relation between European share and current income per capita holds. Then, using the same data, I run an additional robustness check to test whether the results change when a constraint on European share is imposed. After that, this document poses two additional questions: (1) Given the well-known positive correlation between income and happiness measurements, do European share and happiness measurements have a significant and positive correlation, as expected? and (2) Given the results presented by Glaeser et al. (2004) and the impact of European share on human capital, does European share affect rates of research and development? Finally, I offer additional ideas for related research topics and give concluding remarks.


## 2. Original Paper Summary

After the introduction, the paper discusses where the data for the statistical analysis come from. All data come from primary and secondary sources gathered through historical checks done by the authors. They compile data from as many dates as possible, create dummy variables for different sets of data, and average variables in order to run the pertinent regressions. The authors also construct data of their own, including the European share of the population at a date that suits the analysis and the indigenous mortality following the arrival of Europeans in the New World. **Table 0** shows the descriptive statistics of the variables contained in the dataset (for the full description, see the original paper).

In [3]:
data = pd.read_stata(pathout + "DataEurope.dta")
data

Unnamed: 0,country,country_code,educ,legalorig,ethnic,latitude,london,pop_den1500,govtquality,gold_silver,...,unams,average15001800,average18011900,average15001900,soilsuit,distcr,dumeshare0,indigmort,indy,biogeography
0,,,,,,,,,,,...,,,,,,,,,,
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
227,"Venezuela, RB",VEN,66.112823,0.0,0.052500,0.088889,4254.057129,-0.820980,-1.668789,1.0,...,1.0,0.09775,0.264339,0.139397,0.499980,0.127570,0.0,1.0,0.828431,
228,Vietnam,VNM,55.678280,0.0,0.117647,0.177778,8992.250000,1.815571,-1.328232,0.0,...,0.0,,,,0.506609,0.079995,1.0,0.0,0.171569,
229,"Yemen, Rep.",YEM,44.515240,0.0,0.012200,0.166667,4506.000000,,-2.081896,0.0,...,0.0,,,,0.193277,0.140977,1.0,0.0,,
230,Zambia,ZMB,25.889288,1.0,0.829398,0.166667,4912.000000,-0.235722,-1.043453,1.0,...,0.0,,,,0.579539,0.992196,0.0,0.0,0.078431,-0.670678


In [4]:
print('Table 0: Descriptive Statistics')
data.describe()

Table 0: Descriptive Statistics


Unnamed: 0,educ,legalorig,ethnic,latitude,london,pop_den1500,govtquality,gold_silver,logy,eshare,...,unams,average15001800,average18011900,average15001900,soilsuit,distcr,dumeshare0,indigmort,indy,biogeography
count,123.0,130.0,116.0,130.0,122.0,95.0,129.0,130.0,124.0,129.0,...,130.0,35.0,33.0,42.0,108.0,108.0,130.0,130.0,120.0,84.0
mean,57.822605,0.4,0.379158,0.201885,5410.364746,0.476286,-0.457015,0.276923,8.183579,0.071848,...,0.284615,0.215731,0.227009,0.217485,0.52765,0.340537,0.476923,0.338462,0.257761,-0.012123
std,30.753176,0.491793,0.315732,0.123015,2342.119385,1.541625,1.932904,0.449209,1.252599,0.166132,...,0.452977,0.220815,0.257872,0.230388,0.186639,0.375817,0.501399,0.475017,0.319301,1.304578
min,5.598784,0.0,0.0,0.011111,1381.38501,-3.830918,-4.914392,0.0,5.479712,0.0,...,0.0,0.016952,0.00113,0.00113,0.161921,0.007952,0.0,0.0,0.0,-1.018302
25%,28.761887,0.0,0.066101,0.111111,3873.0625,-0.203956,-1.783444,0.0,7.159359,0.0,...,0.0,0.07736,0.050706,0.072854,0.417782,0.082924,0.0,0.0,0.04902,-0.670678
50%,60.418678,0.0,0.325552,0.183333,4802.0,0.405465,-0.609464,0.0,8.155199,0.002,...,0.0,0.124407,0.088138,0.132767,0.51395,0.210849,0.0,0.0,0.098039,-0.649714
75%,82.171204,1.0,0.709935,0.280278,6328.75,1.442202,0.771718,1.0,9.094742,0.059,...,1.0,0.251345,0.324039,0.227308,0.636835,0.474144,1.0,1.0,0.362745,-0.065386
max,152.842865,1.0,1.0,0.666667,11769.290039,4.609731,4.615444,1.0,11.04429,0.905,...,1.0,0.904798,0.971909,0.927168,0.948877,1.71954,1.0,1.0,1.0,3.791256


Given a complete and trustworthy dataset to work with, the authors test the arguments of Engerman and Sokoloff (1997) and Acemoglu et al. (2001, 2002) using variables that describe latitude, population density in 1500, indigenous mortality, precious metals in the territory, distance between the colony and London, prehistoric availability of crops, malaria ecology, and settler mortality. That regression is not reproduced here; it is enough to note that it supports the theories of Engerman and Sokoloff (1997) and Acemoglu et al. (2001, 2002), so it is fair to expect a significant correlation between European share and current income per capita. Economic intuition therefore suggests that the relation the original paper seeks to establish can hold.

**Table 1** presents the first and most important regression, which contains the main results: European share is positively correlated with current income in almost every case. In column (1), an increase of 0.1 in European share is associated with an increase of about 0.36 in the log of current income per capita. To give more concrete answers, the subsequent columns add controls to check whether European share remains significant. In columns (2) and (4), adding legal origin and the years since independence changes neither the significance nor the size of the effect of European share on current income, and the controls themselves do not appear to be relevant. Column (6) behaves similarly, although this time ethnic diversity does have an impact on current income. Columns (3) and (5) may seem odd, but they are consistent with the literature reviewed above. In column (3), education absorbs the significance of European share, which is what we would expect if the two are highly collinear; this supports the argument of Glaeser et al. (2004) that European share did have an impact on human capital formation. The same applies to column (5) and the arguments of Engerman and Sokoloff (1997) and Acemoglu et al. (2001, 2002): government quality appears to be highly collinear with European share because of the impact of large settlements on institutions.
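
Because the dependent variable `logy` is the logarithm of current income per capita, the coefficients in Table 1 are measured in log points. As a small worked example (the helper names below are illustrative and not part of the original notebook), the column (1) coefficient can be converted into the change implied by a 0.1 increase in European share:

```python
import math


def log_point_change(coef, delta):
    """Change in log income implied by a change `delta` in the regressor."""
    return coef * delta


def percent_change(coef, delta):
    """Percentage change in the income level implied by the same shift."""
    return (math.exp(coef * delta) - 1.0) * 100.0


# Coefficient on eshare taken from column (1) of Table 1.
coef_eshare = 3.6234

print(round(log_point_change(coef_eshare, 0.1), 3))
print(round(percent_change(coef_eshare, 0.1), 1))
```

For the column (1) estimate this gives roughly 0.36 log points, or about 44% higher income per capita, for a 0.1 increase in European share. This is only a reading of the point estimate and says nothing about causality.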


In [5]:
mod1 = smf.ols(formula='logy ~ eshare', data=data, missing='drop').fit()
mod1_r = mod1.get_robustcov_results()
mod2 = smf.ols(formula='logy ~ eshare+legalorig', data=data, missing='drop').fit()
mod2_r = mod2.get_robustcov_results()
mod3 = smf.ols(formula='logy ~ eshare+educ', data=data, missing='drop').fit()
mod3_r = mod3.get_robustcov_results()
mod4 = smf.ols(formula='logy ~ eshare+indy', data=data, missing='drop').fit()
mod4_r = mod4.get_robustcov_results()
mod5 = smf.ols(formula='logy ~ eshare+govtquality', data=data, missing='drop').fit()
mod5_r = mod5.get_robustcov_results()
mod6 = smf.ols(formula='logy ~ eshare+ethnic', data=data, missing='drop').fit()
mod6_r = mod6.get_robustcov_results()
mod7 = smf.ols(formula='logy ~ eshare+indy+legalorig+ethnic', data=data, missing='drop').fit()
mod7_r = mod7.get_robustcov_results()
print("Table 1: Current Income vs Euroshare")
print(summary_col([mod1_r,mod2_r,mod3_r,mod4_r, mod5_r, mod6_r, 
                   mod7_r],model_names=('(1)', '(2)', '(3)', '(4)','(5)', '(6)', '(7)'), 
                  stars=True, regressor_order=['eshare','legalorig','educ','indy','govtquality','ethnic']))

Table 1: Current Income vs Euroshare

               (1)       (2)       (3)       (4)       (5)       (6)        (7)    
-----------------------------------------------------------------------------------
eshare      3.6234*** 3.6255*** 0.6207    3.4531*** 0.5106    3.4367***  3.0889*** 
            (0.4562)  (0.5061)  (0.5117)  (0.5024)  (0.5024)  (0.4120)   (0.5028)  
legalorig             -0.0024                                            0.1442    
                      (0.2273)                                           (0.2245)  
educ                            0.0309***                                          
                                (0.0026)                                           
indy                                      0.3797                         0.3440    
                                          (0.3537)                       (0.3546)  
govtquality                                         0.4290***                      
                                      

Still, the authors question the robustness of the first regression, because highly developed economies may drive the result: these countries coincide with those that had the largest European shares of the population in colonial times. For that reason the authors restrict the sample to countries with a European share below 12.5% in colonial times. **Table 2** shows that the estimates gain magnitude under this restriction and that the significance of European share behaves as before under every control. An increase of 0.1 in European share is now associated with an increase of about 0.84 in the log of current income per capita.
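
The restriction amounts to keeping only the rows whose `eshare` is below the cutoff before fitting. A minimal sketch with plain Python records follows; the toy values and the helper name `restrict_sample` are made up for illustration and are not taken from the dataset. It also shows that a missing share (NaN) never satisfies the comparison, so such rows drop out, which matches how the boolean mask `data['eshare'] < 0.125` behaves in pandas.

```python
import math


def restrict_sample(rows, cutoff=0.125):
    """Keep rows whose colonial European share is strictly below `cutoff`.

    Comparisons with NaN are always False, so rows with a missing share
    are excluded as well.
    """
    return [row for row in rows if row["eshare"] < cutoff]


# Toy records (hypothetical values, for illustration only).
toy_rows = [
    {"country": "A", "eshare": 0.00},
    {"country": "B", "eshare": 0.059},
    {"country": "C", "eshare": 0.125},     # exactly at the cutoff: excluded
    {"country": "D", "eshare": 0.905},
    {"country": "E", "eshare": math.nan},  # missing share: excluded
]

kept = restrict_sample(toy_rows)
print([row["country"] for row in kept])
```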

In [6]:
mod1 = smf.ols(formula='logy ~ eshare', data=data[data['eshare']<0.125], missing='drop').fit()
mod1_r = mod1.get_robustcov_results()
mod2 = smf.ols(formula='logy ~ eshare+legalorig', data=data[data['eshare']<0.125], missing='drop').fit()
mod2_r = mod2.get_robustcov_results()
mod3 = smf.ols(formula='logy ~ eshare+educ', data=data[data['eshare']<0.125], missing='drop').fit()
mod3_r = mod3.get_robustcov_results()
mod4 = smf.ols(formula='logy ~ eshare+indy', data=data[data['eshare']<0.125], missing='drop').fit()
mod4_r = mod4.get_robustcov_results()
mod5 = smf.ols(formula='logy ~ eshare+govtquality', data=data[data['eshare']<0.125], missing='drop').fit()
mod5_r = mod5.get_robustcov_results()
mod6 = smf.ols(formula='logy ~ eshare+ethnic', data=data[data['eshare']<0.125], missing='drop').fit()
mod6_r = mod6.get_robustcov_results()
mod7 = smf.ols(formula='logy ~ eshare+indy+legalorig+ethnic', data=data[data['eshare']<0.125], missing='drop').fit()
mod7_r = mod7.get_robustcov_results()
print("Table 2: Current Income vs Euroshare (Euroshare<0.125)")
print(summary_col([mod1_r,mod2_r,mod3_r,mod4_r, mod5_r, mod6_r, 
                   mod7_r],model_names=('(1)', '(2)', '(3)', '(4)','(5)', '(6)', '(7)'), 
                  stars=True, regressor_order=['eshare','legalorig','educ','indy','govtquality','ethnic']))

Table 2: Current Income vs Euroshare (Euroshare<0.125)

               (1)       (2)       (3)       (4)       (5)       (6)        (7)    
-----------------------------------------------------------------------------------
eshare      8.3784*** 8.4009*** -0.9037   8.0929*** 3.6118    9.8456***  8.9866*** 
            (2.3488)  (2.3617)  (2.2944)  (2.5707)  (2.4528)  (2.4132)   (2.6148)  
legalorig             -0.0365                                            0.0831    
                      (0.2442)                                           (0.2290)  
educ                            0.0326***                                          
                                (0.0030)                                           
indy                                      0.4271                         0.4188    
                                          (0.4028)                       (0.3845)  
govtquality                                         0.4267***                      
                    

Although the results look robust so far, the authors run another test to check that it is really the size of the settlement during the colonial epoch that has a significant impact. They note that European share may be acting as a proxy for the share of European descendants today, which would also include the migration that occurred in the 19th and 20th centuries. EL take a variable from Putterman and Weil (2010) that measures precisely the share of European descendants today and use it as a control in the original regression. **Table 3** shows that European share during colonial times remains significant but loses magnitude. This hints that later migration waves also played a big role in determining the gap, not only because an increase of 0.1 in European share is now associated with an increase of only about 0.19 in log current income (compared with 0.36 before), but also because the share of European descendants is itself significant in most cases.
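
Written out, the specification behind Table 3 adds the current share of European descendants to the baseline regression. The notation below is shorthand for the formulas passed to `smf.ols`, not the notation of the original paper:

$$
\mathrm{logy}_i = \alpha + \beta\,\mathrm{eshare}_i + \gamma\,\mathrm{euro2000pw}_i + \delta' X_i + \varepsilon_i
$$

Here $X_i$ collects the control or controls of the column in question and is empty in column (1). The comparison made in the text is between the estimate of $\beta$ in this specification and the estimate from the same column of Table 1, where $\gamma$ is implicitly set to zero.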

In [7]:
mod1 = smf.ols(formula='logy ~ eshare+euro2000pw', data=data, missing='drop').fit()
mod1_r = mod1.get_robustcov_results()
mod2 = smf.ols(formula='logy ~ eshare+euro2000pw+legalorig', data=data, missing='drop').fit()
mod2_r = mod2.get_robustcov_results()
mod3 = smf.ols(formula='logy ~ eshare+euro2000pw+educ', data=data, missing='drop').fit()
mod3_r = mod3.get_robustcov_results()
mod4 = smf.ols(formula='logy ~ eshare+euro2000pw+indy', data=data, missing='drop').fit()
mod4_r = mod4.get_robustcov_results()
mod5 = smf.ols(formula='logy ~ eshare+euro2000pw+govtquality', data=data, missing='drop').fit()
mod5_r = mod5.get_robustcov_results()
mod6 = smf.ols(formula='logy ~ eshare+euro2000pw+ethnic', data=data, missing='drop').fit()
mod6_r = mod6.get_robustcov_results()
mod7 = smf.ols(formula='logy ~ eshare+euro2000pw+indy+legalorig+ethnic', data=data, missing='drop').fit()
mod7_r = mod7.get_robustcov_results()
print("Table 3: Current Income vs Euroshare (Colonization or Today)")
print(summary_col([mod1_r,mod2_r,mod3_r,mod4_r, mod5_r, mod6_r, 
                   mod7_r],model_names=('(1)', '(2)', '(3)', '(4)','(5)', '(6)', '(7)'), 
                  stars=True, regressor_order=['eshare','euro2000pw','legalorig','educ','indy','govtquality','ethnic']))

Table 3: Current Income vs Euroshare (Colonization or Today)

               (1)       (2)       (3)       (4)       (5)       (6)        (7)    
-----------------------------------------------------------------------------------
eshare      1.9051*** 1.6766*** 0.4119    2.0642*** -0.3802   2.1036***  1.9535*** 
            (0.3829)  (0.5737)  (0.5828)  (0.4559)  (0.6294)  (0.3516)   (0.5449)  
euro2000pw  1.3577*** 1.4762*** 0.1621    1.0996*   0.9037**  1.1152***  0.9009    
            (0.3534)  (0.4042)  (0.3711)  (0.6210)  (0.3823)  (0.3900)   (0.5848)  
legalorig             0.1293                                             0.2031    
                      (0.2619)                                           (0.2414)  
educ                            0.0316***                                          
                                (0.0032)                                           
indy                                      0.3190                         0.3550    
              

As with **Table 2**, the authors restrict the sample to countries with a European share below 12.5% as a robustness check for the regression shown in **Table 3**. **Table 4** shows that, as before, magnitudes increase (an increase of about 0.65 in log current income for every increase of 0.1 in European share). The significance of European share is weaker than in **Table 2**, but at the 5% level the conclusions are similar.

In [8]:
mod1 = smf.ols(formula='logy ~ eshare+euro2000pw', data=data[data['eshare']<0.125], missing='drop').fit()
mod1_r = mod1.get_robustcov_results()
mod2 = smf.ols(formula='logy ~ eshare+euro2000pw+legalorig', data=data[data['eshare']<0.125], missing='drop').fit()
mod2_r = mod2.get_robustcov_results()
mod3 = smf.ols(formula='logy ~ eshare+euro2000pw+educ', data=data[data['eshare']<0.125], missing='drop').fit()
mod3_r = mod3.get_robustcov_results()
mod4 = smf.ols(formula='logy ~ eshare+euro2000pw+indy', data=data[data['eshare']<0.125], missing='drop').fit()
mod4_r = mod4.get_robustcov_results()
mod5 = smf.ols(formula='logy ~ eshare+euro2000pw+govtquality', data=data[data['eshare']<0.125], missing='drop').fit()
mod5_r = mod5.get_robustcov_results()
mod6 = smf.ols(formula='logy ~ eshare+euro2000pw+ethnic', data=data[data['eshare']<0.125], missing='drop').fit()
mod6_r = mod6.get_robustcov_results()
mod7 = smf.ols(formula='logy ~ eshare+euro2000pw+indy+legalorig+ethnic', data=data[data['eshare']<0.125], missing='drop').fit()
mod7_r = mod7.get_robustcov_results()
print("Table 4: Current Income vs Euroshare (Colonization or Today)(Euroshare<0.125)")
print(summary_col([mod1_r,mod2_r,mod3_r,mod4_r, mod5_r, mod6_r, 
                   mod7_r],model_names=('(1)', '(2)', '(3)', '(4)','(5)', '(6)', '(7)'), 
                  stars=True, regressor_order=['eshare','euro2000pw','legalorig','educ','indy','govtquality','ethnic']))

Table 4: Current Income vs Euroshare (Colonization or Today)(Euroshare<0.125)

               (1)       (2)       (3)       (4)       (5)       (6)        (7)    
-----------------------------------------------------------------------------------
eshare      6.4554**  6.0872**  -1.7091   7.4714**  2.3294    8.7021***  8.6491*** 
            (2.7146)  (2.8657)  (2.5660)  (2.8854)  (3.1451)  (2.6522)   (3.0440)  
euro2000pw  1.0259**  1.1361**  0.2871    0.6208    0.7571    0.7008     0.3504    
            (0.4472)  (0.5005)  (0.4613)  (0.7790)  (0.5140)  (0.4553)   (0.7403)  
legalorig             0.1018                                             0.1620    
                      (0.2683)                                           (0.2429)  
educ                            0.0333***                                          
                                (0.0035)                                           
indy                                      0.3858                         0.4428  

These are some examples of how the original authors arrive at their conclusion, running different controls to show a strong relation between economic growth and the size of the settlements established by Europeans during the colonial epoch. This paper shows only some of the robustness checks that the authors ran; to see all of them, I recommend consulting the original paper. The authors also run regressions that exclude countries that were not colonized by Europeans, or not colonized at all, using the approaches of all the regressions presented in this paper up to this point. As expected, the results hold in terms of a significant positive relation.
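
The seven columns of Tables 1 to 4 follow a fixed pattern: a baseline set of regressors, then one control at a time, and a final column with several controls together. As a sketch of how those formula strings could be generated rather than typed out (the function `build_formulas` is hypothetical and not part of the original notebook), consider:

```python
def build_formulas(outcome, base, single_controls, joint_controls):
    """Return formula strings for one table of regressions.

    The first formula uses only the baseline regressors, the following ones
    add one control each, and the last one adds all `joint_controls` together.
    """
    rhs_base = "+".join(base)
    formulas = ["{} ~ {}".format(outcome, rhs_base)]
    for control in single_controls:
        formulas.append("{} ~ {}+{}".format(outcome, rhs_base, control))
    formulas.append(
        "{} ~ {}+{}".format(outcome, rhs_base, "+".join(joint_controls))
    )
    return formulas


# Column order as in Table 1: (1) baseline, (2)-(6) single controls, (7) joint.
table1 = build_formulas(
    outcome="logy",
    base=["eshare"],
    single_controls=["legalorig", "educ", "indy", "govtquality", "ethnic"],
    joint_controls=["indy", "legalorig", "ethnic"],
)

for formula in table1:
    print(formula)
```

Each string can then be passed to `smf.ols(formula=..., data=..., missing='drop')` in the same way as in the cells above; for Tables 3 and 4 the baseline would be `['eshare', 'euro2000pw']`.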

## 3. Additional Robustness Checks

The original paper does a thorough job with its robustness checks, and it is safe to conclude that there is a correlation between the size of settlements in colonial times and current income per capita. This replication paper adds one more check to see whether the results hold among countries that had large settlements when they were colonies. This helps not only to assess whether the results of the original paper are robust, but also to see whether there is a critical level of European share above which countries seem to be equally rich and developed.

In **Table 5** the sample is restricted to countries with a European share of more than 5.5% of the population during the colonial epoch (this cutoff was chosen using the rule of thumb of having at least 30 observations for relatively robust results). Although the results hold (compare with **Table 2**), columns (3) and (5) no longer behave as before: this time European share remains significant in both. These results raise one of two issues that should be resolved: either the sample is too small to give trustworthy results, or there is a deeper reason why European share during colonial times is growth enhancing and creates differences even among the large-settlement countries.
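
One way to make the rule of thumb concrete is to look for the largest cutoff that still leaves at least 30 usable observations above it. The sketch below illustrates that idea with made-up shares; it is an assumption about how such a cutoff could be chosen, not a record of how the 5.5% value was actually obtained, and `largest_cutoff` is a hypothetical helper.

```python
import math


def largest_cutoff(shares, min_obs=30):
    """Largest observed share c such that at least `min_obs` values exceed c.

    Missing values (NaN) are ignored. Returns None when no cutoff leaves
    enough observations.
    """
    valid = [s for s in shares if not math.isnan(s)]
    best = None
    for candidate in sorted(set(valid)):
        n_above = sum(1 for s in valid if s > candidate)
        if n_above >= min_obs:
            best = candidate  # candidates ascend, so keep the latest success
        else:
            break  # the count can only fall further for larger candidates
    return best


# Toy shares 0.00, 0.01, ..., 0.49 plus one missing value (illustration only).
toy_shares = [i / 100.0 for i in range(50)] + [math.nan]

print(largest_cutoff(toy_shares, min_obs=30))
```

With real data the same count can be read directly from the filtered frame, for example with `data[data['eshare'] > 0.055]['logy'].count()`, before deciding whether the subsample is large enough.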


In [9]:
mod1 = smf.ols(formula='logy ~ eshare', data=data[data['eshare']>0.055], missing='drop').fit()
mod1_r = mod1.get_robustcov_results()
mod2 = smf.ols(formula='logy ~ eshare+legalorig', data=data[data['eshare']>0.055], missing='drop').fit()
mod2_r = mod2.get_robustcov_results()
mod3 = smf.ols(formula='logy ~ eshare+educ', data=data[data['eshare']>0.055], missing='drop').fit()
mod3_r = mod3.get_robustcov_results()
mod4 = smf.ols(formula='logy ~ eshare+indy', data=data[data['eshare']>0.055], missing='drop').fit()
mod4_r = mod4.get_robustcov_results()
mod5 = smf.ols(formula='logy ~ eshare+govtquality', data=data[data['eshare']>0.055], missing='drop').fit()
mod5_r = mod5.get_robustcov_results()
mod6 = smf.ols(formula='logy ~ eshare+ethnic', data=data[data['eshare']>0.055], missing='drop').fit()
mod6_r = mod6.get_robustcov_results()
mod7 = smf.ols(formula='logy ~ eshare+indy+legalorig+ethnic', data=data[data['eshare']>0.055], missing='drop').fit()
mod7_r = mod7.get_robustcov_results()
print("Table 5: Current Income vs Euroshare (Euroshare>0.055)")
print(summary_col([mod1_r,mod2_r,mod3_r,mod4_r, mod5_r, mod6_r, 
                   mod7_r],model_names=('(1)', '(2)', '(3)', '(4)','(5)', '(6)', '(7)'), 
                  stars=True, regressor_order=['eshare','legalorig','educ','indy','govtquality','ethnic']))

Table 5: Current Income vs Euroshare (Euroshare>0.055)

               (1)       (2)       (3)       (4)       (5)       (6)       (7)   
---------------------------------------------------------------------------------
eshare      2.5948*** 2.4634*** 1.9974*** 2.6056*** 1.6421*** 2.5296*** 2.3060***
            (0.2817)  (0.3056)  (0.3043)  (0.2969)  (0.4179)  (0.2354)  (0.4317) 
legalorig             0.1399                                            0.2082   
                      (0.1659)                                          (0.2829) 
educ                            0.0091**                                         
                                (0.0036)                                         
indy                                      -0.0351                       0.0892   
                                          (0.2218)                      (0.3328) 
govtquality                                         0.1467**                     
                                          

I also apply the same restriction to the regression that includes today's share of European descendants, to be more confident that it is settlement size that really affects current income per capita. This issue may be more relevant for developed countries, given the massive migration from Europe to these countries in the 19th and 20th centuries. In **Table 6**, adding this variable does not change the regression much. Compared with **Table 3**, the biggest change is that the share of descendants is no longer significant in most columns. This again leaves two possibilities: problems due to the sample size, or the effect of descendants being absorbed by the European share of the population during the colonial epoch. The latter may reflect the success that large settlements brought to the countries where they were located, which could be the real reason why the massive migrations of the 19th and 20th centuries took place.

In [10]:
mod1 = smf.ols(formula='logy ~ eshare+euro2000pw', data=data[data['eshare']>0.055], missing='drop').fit()
mod1_r = mod1.get_robustcov_results()
mod2 = smf.ols(formula='logy ~ eshare+euro2000pw+legalorig', data=data[data['eshare']>0.055], missing='drop').fit()
mod2_r = mod2.get_robustcov_results()
mod3 = smf.ols(formula='logy ~ eshare+euro2000pw+educ', data=data[data['eshare']>0.055], missing='drop').fit()
mod3_r = mod3.get_robustcov_results()
mod4 = smf.ols(formula='logy ~ eshare+euro2000pw+indy', data=data[data['eshare']>0.055], missing='drop').fit()
mod4_r = mod4.get_robustcov_results()
mod5 = smf.ols(formula='logy ~ eshare+euro2000pw+govtquality', data=data[data['eshare']>0.055], missing='drop').fit()
mod5_r = mod5.get_robustcov_results()
mod6 = smf.ols(formula='logy ~ eshare+euro2000pw+ethnic', data=data[data['eshare']>0.055], missing='drop').fit()
mod6_r = mod6.get_robustcov_results()
mod7 = smf.ols(formula='logy ~ eshare+euro2000pw+indy+legalorig+ethnic', data=data[data['eshare']>0.055], missing='drop').fit()
mod7_r = mod7.get_robustcov_results()
print("Table 6: Current Income vs Euroshare (Colonization or Today)(Euroshare>0.055)")
print(summary_col([mod1_r,mod2_r,mod3_r,mod4_r, mod5_r, mod6_r, 
                   mod7_r],model_names=('(1)', '(2)', '(3)', '(4)','(5)', '(6)', '(7)'), 
                  stars=True, regressor_order=['eshare','euro2000pw','legalorig','educ','indy','govtquality','ethnic']))

Table 6: Current Income vs Euroshare (Colonization or Today)(Euroshare>0.055)

               (1)       (2)       (3)       (4)       (5)       (6)       (7)   
---------------------------------------------------------------------------------
eshare      2.1416*** 2.1264*** 1.7901*** 2.0118*** 1.6510*** 2.1432*** 2.1233***
            (0.3064)  (0.3870)  (0.2885)  (0.3516)  (0.4642)  (0.3132)  (0.4331) 
euro2000pw  0.4364    0.4448    0.4905*   0.6912**  0.4796    0.4331    0.6818*  
            (0.2817)  (0.2786)  (0.2736)  (0.3273)  (0.2888)  (0.2916)  (0.3432) 
legalorig             0.0092                                            -0.0906  
                      (0.1772)                                          (0.2294) 
educ                            0.0045                                           
                                (0.0032)                                         
indy                                      -0.2492                       -0.3246  
                   

## 4. Complementary Analysis

The conclusions of the original paper and of the complementary robustness check in this replication paper hint at questions that researchers may answer in coming years. The rest of this replication paper attempts to give preliminary answers to two such questions, taking the previous analysis as a basis: (1) How does settlement size determine the happiness of a country? and (2) How does settlement size affect research and development (R&D)? Answering these questions required more data. For the first question, the data were obtained from the World Happiness Report (2016); for the second, R&D information was obtained from the World Bank Open Databases (2020). These data were merged with the database from the original paper using country name and country code. The merge appears to be well constructed, given the perfect correlation between the income per capita information in both databases. The important variables added were freedom, R&D, and happiness score (**Table 0a** shows descriptive statistics).
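
The merge check described above can be sketched as follows: join the two sources on the country code, then correlate the income variable that appears in both. The records, the field names (`income_a`, `income_b`, `score`) and the helper functions below are hypothetical stand-ins, not the actual columns of `DataF.dta`; in the notebook itself the same idea would normally be expressed with pandas.

```python
import math


def merge_on_code(left, right):
    """Inner-join two lists of dicts on their 'country_code' key."""
    right_by_code = {row["country_code"]: row for row in right}
    merged = []
    for row in left:
        match = right_by_code.get(row["country_code"])
        if match is not None:
            combined = dict(match)
            combined.update(row)
            merged.append(combined)
    return merged


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy inputs (hypothetical values, for illustration only).
original = [
    {"country_code": "AAA", "income_a": 8.1},
    {"country_code": "BBB", "income_a": 9.4},
    {"country_code": "CCC", "income_a": 7.2},
]
extra = [
    {"country_code": "AAA", "income_b": 16.2, "score": 5.0},
    {"country_code": "CCC", "income_b": 14.4, "score": 4.1},
    {"country_code": "BBB", "income_b": 18.8, "score": 6.3},
    {"country_code": "DDD", "income_b": 10.0, "score": 3.9},  # no match: dropped
]

merged = merge_on_code(original, extra)
r = pearson([m["income_a"] for m in merged], [m["income_b"] for m in merged])
print(len(merged), round(r, 6))
```

A correlation close to one between the two income measures suggests that rows were matched to the right countries; it does not reveal countries that were silently dropped, so the number of merged rows is worth checking as well.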

In [11]:
data = pd.read_stata(pathout + "DataF.dta")
data

Unnamed: 0,CountryName,country_code,RD,educ,legalorig,ethnic,latitude,london,pop_den1500,govtquality,...,happinessrank,happinessscore,standarderror,economygdppercapita,family,healthlifeexpectancy,freedom,trustgovernmentcorruption,generosity,dystopiaresidual
0,Colombia,COL,0.28975,73.094482,0,0.055796,0.044444,4622.706055,-0.037970,-1.260229,...,33,6.477,0.05051,0.91861,1.24018,0.69077,0.53466,0.05120,0.18401,2.85737
1,Syrian Arab Republic,SYR,0.02053,47.857891,0,0.094792,0.388889,3333.399902,,-1.978333,...,156,3.006,0.05015,0.66320,0.47489,0.72193,0.15684,0.18906,0.47179,0.32858
2,Bolivia,BOL,,81.162735,0,0.599412,0.188889,6082.000000,-0.186369,-0.569951,...,51,5.890,0.05642,0.68133,0.97841,0.53920,0.57414,0.08800,0.20536,2.82334
3,Iraq,IRQ,0.03775,39.695057,0,,0.366667,6491.625000,,-4.914392,...,112,4.677,0.05232,0.98549,0.81889,0.60237,0.00000,0.13788,0.17922,1.95335
4,Zimbabwe,ZWE,,41.849960,1,0.598636,0.222222,5131.000000,-0.235722,-2.387497,...,115,4.610,0.04290,0.27100,1.03276,0.33475,0.25861,0.08079,0.18987,2.44191
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97,Saudi Arabia,SAU,,67.600723,1,,0.277778,5035.909180,,-0.338354,...,35,6.411,0.04633,1.39541,1.08393,0.72025,0.31048,0.32524,0.13706,2.43872
98,Kuwait,KWT,0.09660,92.714874,0,,0.325556,6445.200195,,0.958377,...,39,6.295,0.04456,1.55422,1.16594,0.72492,0.55499,0.25609,0.16228,1.87634
99,Tanzania,TZA,,5.598784,1,0.890247,0.066667,6323.200195,0.683378,-1.122524,...,146,3.781,0.05061,0.28520,1.00268,0.38215,0.32878,0.05747,0.34377,1.38079
100,Brazil,BRA,1.34264,104.677261,0,0.055781,0.111111,4904.435059,-2.134937,0.086519,...,16,6.983,0.04076,0.98124,1.23287,0.69702,0.49049,0.17521,0.14574,3.26001


In [12]:
print('Table 0a: Descriptive Statistics 2')
data.describe()

Table 0a: Descriptive Statistics 2


Unnamed: 0,RD,educ,legalorig,ethnic,latitude,london,pop_den1500,govtquality,gold_silver,logy,...,happinessrank,happinessscore,standarderror,economygdppercapita,family,healthlifeexpectancy,freedom,trustgovernmentcorruption,generosity,dystopiaresidual
count,49.0,99.0,102.0,94.0,102.0,96.0,79.0,102.0,102.0,101.0,...,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0
mean,0.722839,54.441555,0.343137,0.395645,0.209601,5432.698242,0.546428,-0.654201,0.313725,8.141905,...,89.254902,5.128676,0.050982,0.735495,0.924067,0.552001,0.427906,0.143282,0.243791,2.102164
std,0.895763,30.465824,0.477101,0.309839,0.128643,2174.058594,1.632908,1.953529,0.466298,1.293482,...,46.095829,1.142712,0.019316,0.406559,0.277217,0.250758,0.14498,0.111451,0.115524,0.601794
min,0.01497,5.598784,0.0,0.0,0.011111,1381.38501,-3.830918,-4.914392,0.0,5.479712,...,5.0,2.839,0.02043,0.0153,0.0,0.0,0.0,0.0,0.05841,0.32858
25%,0.12033,27.950544,0.0,0.095434,0.111111,4014.749756,-0.203956,-1.875347,0.0,7.013269,...,47.25,4.302,0.0378,0.367598,0.772913,0.359783,0.33385,0.073305,0.170837,1.755563
50%,0.41763,56.989048,0.0,0.336859,0.188889,5055.668945,0.431782,-0.838343,0.0,8.019561,...,95.5,4.9345,0.045985,0.758815,0.98181,0.60887,0.43879,0.10877,0.21613,2.10049
75%,0.79846,77.651855,1.0,0.712835,0.3025,6453.725098,1.44785,0.130659,1.0,9.090884,...,130.25,5.98175,0.05636,1.017605,1.115125,0.731332,0.541323,0.17331,0.325037,2.483792
max,4.21702,152.842865,1.0,1.0,0.666667,11358.719727,4.609731,4.615444,1.0,11.04429,...,158.0,7.427,0.13693,1.69042,1.32261,1.02525,0.66246,0.55191,0.5763,3.60214


**Table 7** shows the first linear regressions relating happiness to European population share during the colonial epoch. Column (1) gives the first intuition: there is a positive, statistically significant correlation between the size of the settlements and happiness today. An increase of 0.1 in European share is associated with an increase of 0.358 in the happiness score. This seems logical given the known effect of European share on current income per capita, which is why column (2) controls for current income per capita. Although this control reduces the coefficient on European share by more than half, part of European share still explains happiness. Column (3) is rather peculiar: here the share of European descendants does cancel the effect of European share, which suggests that migrations after the colonization epoch do have an impact on happiness. Columns (4) and (5) show that controlling for freedom and for the multiethnicity rate does not greatly change either the magnitude or the significance of the effect of European share. Column (6) brings back the Glaeser et al. (2004) interpretation: controlling for education reduces the effect of European share on happiness, but the coefficient remains significant. In column (7), the controls from columns (2) and (6) again reduce the magnitude of the coefficient on European share, but it still has a significant effect on the happiness score.
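The arithmetic behind this reading of the coefficients can be checked with a short sketch. The two coefficients are copied from columns (1) and (2) of **Table 7**; the variable names are illustrative only.

```python
# Coefficients on eshare copied from Table 7.
beta_col1 = 3.5848   # column (1): no controls
beta_col2 = 1.5574   # column (2): controlling for logy

delta_eshare = 0.1   # an increase of 0.1 in European share

# In a linear model the predicted change is coefficient times change in the regressor.
effect_col1 = beta_col1 * delta_eshare
effect_col2 = beta_col2 * delta_eshare

# Fraction of the column (1) coefficient that survives the income control.
share_remaining = beta_col2 / beta_col1

print(f"column (1): {effect_col1:.3f} happiness points")
print(f"column (2): {effect_col2:.3f} happiness points")
print(f"coefficient remaining after controlling for income: {share_remaining:.0%}")
```

The same multiplication applies to any other column of the tables in this section.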

In [13]:
mod1 = smf.ols(formula='happinessscore ~ eshare', data=data, missing='drop').fit()
mod1_r = mod1.get_robustcov_results()
mod2 = smf.ols(formula='happinessscore ~ eshare+logy', data=data, missing='drop').fit()
mod2_r = mod2.get_robustcov_results()
mod3 = smf.ols(formula='happinessscore ~ eshare+euro2000pw', data=data, missing='drop').fit()
mod3_r = mod3.get_robustcov_results()
mod4 = smf.ols(formula='happinessscore ~ eshare+freedom', data=data, missing='drop').fit()
mod4_r = mod4.get_robustcov_results()
mod5 = smf.ols(formula='happinessscore ~ eshare+ethnic', data=data, missing='drop').fit()
mod5_r = mod5.get_robustcov_results()
mod6 = smf.ols(formula='happinessscore ~ eshare+educ', data=data, missing='drop').fit()
mod6_r = mod6.get_robustcov_results()
mod7 = smf.ols(formula='happinessscore ~ eshare+logy+freedom+ethnic+educ', data=data, missing='drop').fit()
mod7_r = mod7.get_robustcov_results()
print("Table 7: Happiness vs Euroshare")
print(summary_col([mod1_r,mod2_r,mod3_r,mod4_r, mod5_r, mod6_r, 
                   mod7_r],model_names=('(1)', '(2)', '(3)', '(4)','(5)', '(6)', '(7)'), 
                  stars=True, regressor_order=['eshare','logy','euro2000pw','freedom','ethnic','educ']))

Table 7: Happiness vs Euroshare

              (1)       (2)       (3)       (4)       (5)        (6)       (7)   
---------------------------------------------------------------------------------
eshare     3.5848*** 1.5574*** 0.2934    2.5925*** 3.4111***  1.3042*** 1.0482***
           (0.4435)  (0.3598)  (0.3580)  (0.4074)  (0.4083)   (0.3628)  (0.3554) 
logy                 0.6005***                                          0.2737** 
                     (0.0577)                                           (0.1312) 
euro2000pw                     2.8339***                                         
                               (0.3852)                                          
freedom                                  3.2722***                      1.7839** 
                                         (0.6844)                       (0.6773) 
ethnic                                             -1.0916***           -0.2253  
                                                   (0.3082)      

As in the original paper, I restrict the sample by removing all countries that had a high European population share in colonial times (12.5% or more), given the same concerns that the original authors held. As **Table 8** makes clear, the results hold. The magnitude of the coefficients also grows in most of the regressions, which suggests that the effect is larger for countries that hosted smaller settlements (for example, in column (1) an increase of 0.1 in European share is associated with an increase of 1.3 in the happiness score). Column (7), however, is estimated on the full sample in the code below, so its coefficient is identical to that in column (7) of **Table 7** and says nothing about the restricted sample.
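One way to keep every column of a restricted-sample table on the same subsample is to filter once and loop over the formulas. The sketch below uses synthetic data and a hypothetical helper `fit_all`; it is not the code that produced the tables in this paper, and the coefficients it yields are meaningless.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data standing in for the real dataset (all values are made up).
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "eshare": rng.uniform(0, 0.5, n),
    "logy": rng.normal(8, 1, n),
    "freedom": rng.uniform(0, 0.7, n),
})
toy["happinessscore"] = (
    4 + 2 * toy["eshare"] + 0.3 * toy["logy"] + rng.normal(0, 0.5, n)
)

def fit_all(formulas, df):
    """Fit each formula on the same frame and return robust-covariance results."""
    return [
        smf.ols(formula=f, data=df, missing="drop").fit().get_robustcov_results()
        for f in formulas
    ]

# Apply the restriction once, then reuse the same subset for every specification.
subset = toy[toy["eshare"] < 0.125]
formulas = [
    "happinessscore ~ eshare",
    "happinessscore ~ eshare + logy",
    "happinessscore ~ eshare + logy + freedom",
]
results = fit_all(formulas, subset)

for f, r in zip(formulas, results):
    print(f, "| observations:", int(r.nobs))
```

Printing the number of observations per column makes it easy to spot a specification that was accidentally fitted on a different sample.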

In [14]:
mod1 = smf.ols(formula='happinessscore ~ eshare', data=data[data['eshare']<0.125], missing='drop').fit()
mod1_r = mod1.get_robustcov_results()
mod2 = smf.ols(formula='happinessscore ~ eshare+logy', data=data[data['eshare']<0.125], missing='drop').fit()
mod2_r = mod2.get_robustcov_results()
mod3 = smf.ols(formula='happinessscore ~ eshare+euro2000pw', data=data[data['eshare']<0.125], missing='drop').fit()
mod3_r = mod3.get_robustcov_results()
mod4 = smf.ols(formula='happinessscore ~ eshare+freedom', data=data[data['eshare']<0.125], missing='drop').fit()
mod4_r = mod4.get_robustcov_results()
mod5 = smf.ols(formula='happinessscore ~ eshare+ethnic', data=data[data['eshare']<0.125], missing='drop').fit()
mod5_r = mod5.get_robustcov_results()
mod6 = smf.ols(formula='happinessscore ~ eshare+educ', data=data[data['eshare']<0.125], missing='drop').fit()
mod6_r = mod6.get_robustcov_results()
# Note: unlike mod1-mod6, this fit uses the full sample (no eshare<0.125 filter)
mod7 = smf.ols(formula='happinessscore ~ eshare+logy+freedom+ethnic+educ', data=data, missing='drop').fit()
mod7_r = mod7.get_robustcov_results()
print("Table 8: Happiness vs Euroshare (EuroShare<0.125)")
print(summary_col([mod1_r,mod2_r,mod3_r,mod4_r, mod5_r, mod6_r, 
                   mod7_r],model_names=('(1)', '(2)', '(3)', '(4)','(5)', '(6)', '(7)'), 
                  stars=True, regressor_order=['eshare','logy','euro2000pw','freedom','ethnic','educ']))

Table 8: Happiness vs Euroshare (EuroShare<0.125)

              (1)        (2)       (3)       (4)        (5)        (6)       (7)   
-----------------------------------------------------------------------------------
eshare     13.1597*** 7.9143*** 4.0271    11.3507*** 14.3662*** 6.4303**  1.0482***
           (3.2639)   (2.7733)  (2.8188)  (3.0102)   (2.9789)   (3.0939)  (0.3554) 
logy                  0.5653***                                           0.2737** 
                      (0.0557)                                            (0.1312) 
euro2000pw                      2.6691***                                          
                                (0.4069)                                           
freedom                                   3.0486***                       1.7839** 
                                          (0.7064)                        (0.6773) 
ethnic                                               -0.8863***           -0.2253  
                         

In **Table 9** I add a robustness check similar to the one in the original analysis, dropping from the sample the countries that were not a European colony; the results hold partially. The most significant change appears in column (2), where current income per capita seems to completely absorb the effect previously attributed to European population share during the colonial epoch. Column (7), where more controls are added, shows European share as significant again. Note, however, that in the code below this column is estimated on the full sample, so it reproduces column (7) of **Table 7** rather than testing the restricted sample.

In [15]:
mod1 = smf.ols(formula='happinessscore ~ eshare', data=data[data['dumeshare0']==0], missing='drop').fit()
mod1_r = mod1.get_robustcov_results()
mod2 = smf.ols(formula='happinessscore ~ eshare+logy', data=data[data['dumeshare0']==0], missing='drop').fit()
mod2_r = mod2.get_robustcov_results()
mod3 = smf.ols(formula='happinessscore ~ eshare+euro2000pw', data=data[data['dumeshare0']==0], missing='drop').fit()
mod3_r = mod3.get_robustcov_results()
mod4 = smf.ols(formula='happinessscore ~ eshare+freedom', data=data[data['dumeshare0']==0], missing='drop').fit()
mod4_r = mod4.get_robustcov_results()
mod5 = smf.ols(formula='happinessscore ~ eshare+ethnic', data=data[data['dumeshare0']==0], missing='drop').fit()
mod5_r = mod5.get_robustcov_results()
mod6 = smf.ols(formula='happinessscore ~ eshare+educ', data=data[data['dumeshare0']==0], missing='drop').fit()
mod6_r = mod6.get_robustcov_results()
# Note: unlike mod1-mod6, this fit uses the full sample (no dumeshare0==0 filter)
mod7 = smf.ols(formula='happinessscore ~ eshare+logy+freedom+ethnic+educ', data=data, missing='drop').fit()
mod7_r = mod7.get_robustcov_results()
print("Table 9: Happiness vs Euroshare (EuroShare different from 0)")
print(summary_col([mod1_r,mod2_r,mod3_r,mod4_r, mod5_r, mod6_r, 
                   mod7_r],model_names=('(1)', '(2)', '(3)', '(4)','(5)', '(6)', '(7)'), 
                  stars=True, regressor_order=['eshare','logy','euro2000pw','freedom','ethnic','educ']))

Table 9: Happiness vs Euroshare (EuroShare different from 0)

              (1)       (2)       (3)       (4)       (5)       (6)       (7)   
--------------------------------------------------------------------------------
eshare     3.4118*** 1.1224    0.2619    2.1290*** 3.1086*** 1.1270**  1.0482***
           (0.4571)  (0.6808)  (0.3912)  (0.4708)  (0.4603)  (0.4662)  (0.3554) 
logy                 0.6292***                                         0.2737** 
                     (0.1516)                                          (0.1312) 
euro2000pw                     3.1877***                                        
                               (0.4230)                                         
freedom                                  4.1193***                     1.7839** 
                                         (0.8920)                      (0.6773) 
ethnic                                             -0.9438*            -0.2253  
                                               

***Note:*** *This will probably be going to the appendix*

As an additional robustness check, **Table 10** reports a regression that tests whether countries that were not colonized by Europe are actually less happy than countries that were.
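When a regression has a single 0/1 regressor and an intercept, the slope equals the difference in mean outcomes between the two groups. The sketch below illustrates this with made-up numbers. It assumes that `dumeshare0` equals 1 for countries with zero European share, as the restriction `dumeshare0==0` used for **Table 9** suggests; the helper `ols_slope` and the toy values are hypothetical.

```python
def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with an intercept: cov(x, y) / var(x)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x

# Made-up toy data: dummy = 1 for "not colonized", 0 otherwise.
dummy = [1, 1, 0, 0, 0]
happiness = [4.0, 5.0, 5.5, 6.0, 6.5]

# Group means computed directly from the data.
group1 = [h for d, h in zip(dummy, happiness) if d == 1]
group0 = [h for d, h in zip(dummy, happiness) if d == 0]
mean_diff = sum(group1) / len(group1) - sum(group0) / len(group0)

slope = ols_slope(dummy, happiness)
print(f"OLS slope on the dummy: {slope:.2f}")
print(f"difference in group means: {mean_diff:.2f}")
```

Under that coding assumption, a negative coefficient in a dummy-only column can be read as a lower average happiness score in the group where the dummy equals 1.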


In [16]:
mod1 = smf.ols(formula='happinessscore ~ dumeshare0', data=data, missing='drop').fit()
mod1_r = mod1.get_robustcov_results()
mod2 = smf.ols(formula='happinessscore ~ dumeshare0+logy', data=data, missing='drop').fit()
mod2_r = mod2.get_robustcov_results()
mod3 = smf.ols(formula='happinessscore ~ dumeshare0+euro2000pw', data=data, missing='drop').fit()
mod3_r = mod3.get_robustcov_results()
mod4 = smf.ols(formula='happinessscore ~ dumeshare0+freedom', data=data, missing='drop').fit()
mod4_r = mod4.get_robustcov_results()
mod5 = smf.ols(formula='happinessscore ~ dumeshare0+ethnic', data=data, missing='drop').fit()
mod5_r = mod5.get_robustcov_results()
mod6 = smf.ols(formula='happinessscore ~ dumeshare0+educ', data=data, missing='drop').fit()
mod6_r = mod6.get_robustcov_results()
mod7 = smf.ols(formula='happinessscore ~ dumeshare0+logy+family+freedom+ethnic+educ', data=data, missing='drop').fit()
mod7_r = mod7.get_robustcov_results()
print("Table 10: Happiness vs Dummy Euroshare")
print(summary_col([mod1_r,mod2_r,mod3_r,mod4_r, mod5_r, mod6_r, 
                   mod7_r],model_names=('(1)', '(2)', '(3)', '(4)','(5)', '(6)', '(7)'), 
                  stars=True, regressor_order=['dumeshare0','logy','euro2000pw','freedom','ethnic','educ']))

Table 10: Happiness vs Dummy Euroshare

              (1)       (2)        (3)       (4)       (5)        (6)       (7)   
----------------------------------------------------------------------------------
dumeshare0 -0.4802** -0.3799*** 0.2367    -0.3863** -0.5940*** -0.3309** -0.2311* 
           (0.2229)  (0.1435)   (0.1979)  (0.1882)  (0.1995)   (0.1496)  (0.1256) 
logy                 0.6673***                                           0.2119*  
                     (0.0531)                                            (0.1217) 
euro2000pw                      3.0708***                                         
                                (0.3140)                                          
freedom                                   4.0686***                      1.5316** 
                                          (0.7039)                       (0.6065) 
ethnic                                              -1.3379***           -0.2209  
                                               

As for R&D, **Table 11** shows the initial regressions that check whether European population share during the colonial epoch has an effect on current R&D (as a percentage of GDP). In column (1), European share seems to have a positive, significant effect on R&D (an increase of 0.1 in European share is associated with an increase of 0.15 percentage points in R&D as a percentage of GDP). In column (2), current income per capita completely absorbs the effect of European share. The intuition is that countries with greater income per capita simply tend to invest more in research and development, and that column (1) shows significance only because of the proven impact of European share on income per capita. Column (3) shows that controlling for European descendants does not affect the significance of European share. This may be due to what the original paper established: European share does not act as a proxy for European descendants and has a direct impact on income per capita (which, as mentioned before, seems to have a direct impact on R&D). Column (4) reflects the channel in which human capital was brought and enhanced by Europeans: education fully absorbs the effect of European share. In column (5), happiness score and European share are both only marginally significant, which seriously questions the robustness of the result from column (1). Finally, column (6) adds all controls and shows no significant effect of any of the explanatory variables, which strongly suggests that European share does not have a significant impact on R&D.
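The significance pattern described here can be read off the ratio of each coefficient to its standard error. The sketch below copies the `eshare` rows of **Table 11** and compares the ratios with a rough cutoff of 1.96, which assumes a large-sample normal approximation; the exact p-values behind the stars depend on the degrees of freedom of each regression.

```python
# (coefficient, robust standard error) for eshare, copied from Table 11.
eshare_estimates = {
    "(1)": (1.5217, 0.4450),
    "(2)": (0.7873, 0.5983),
    "(3)": (2.3624, 0.5757),
    "(4)": (0.7010, 0.7874),
    "(5)": (1.0785, 0.6018),
    "(6)": (0.8622, 0.7395),
}

ROUGH_5PCT_CUTOFF = 1.96  # assumption: large-sample normal approximation

# t-ratio = coefficient divided by its standard error.
t_ratios = {col: coef / se for col, (coef, se) in eshare_estimates.items()}
significant = [col for col, t in t_ratios.items() if abs(t) > ROUGH_5PCT_CUTOFF]

for col, t in t_ratios.items():
    print(f"column {col}: t = {t:.2f}")
print("above the rough 5% cutoff:", significant)
```

Only columns (1) and (3) clear the cutoff, which matches the columns where European share carries three stars in the table.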

In [17]:
mod1 = smf.ols(formula='RD ~ eshare', data=data, missing='drop').fit()
mod1_r = mod1.get_robustcov_results()
mod2 = smf.ols(formula='RD ~ eshare+logy', data=data, missing='drop').fit()
mod2_r = mod2.get_robustcov_results()
mod3 = smf.ols(formula='RD ~ eshare+euro2000pw', data=data, missing='drop').fit()
mod3_r = mod3.get_robustcov_results()
mod4 = smf.ols(formula='RD ~ eshare+educ', data=data, missing='drop').fit()
mod4_r = mod4.get_robustcov_results()
mod5 = smf.ols(formula='RD ~ eshare+happinessscore', data=data, missing='drop').fit()
mod5_r = mod5.get_robustcov_results()
mod6 = smf.ols(formula='RD ~ eshare+logy+educ+happinessscore', data=data, missing='drop').fit()
mod6_r = mod6.get_robustcov_results()
print("Table 11: R&D vs Euroshare")
print(summary_col([mod1_r,mod2_r,mod3_r,mod4_r, mod5_r, mod6_r],model_names=('(1)', '(2)', '(3)', '(4)','(5)', '(6)'), 
                  stars=True, regressor_order=['eshare','logy','euro2000pw','educ','happinessscore']))

Table 11: R&D vs Euroshare

                  (1)       (2)       (3)      (4)      (5)      (6)   
-----------------------------------------------------------------------
eshare         1.5217*** 0.7873    2.3624*** 0.7010   1.0785*  0.8622  
               (0.4450)  (0.5983)  (0.5757)  (0.7874) (0.6018) (0.7395)
logy                     0.3251**                              0.1834  
                         (0.1355)                              (0.1554)
euro2000pw                         -0.8042*                            
                                   (0.4717)                            
educ                                         0.0132**          0.0096  
                                             (0.0061)          (0.0061)
happinessscore                                        0.1669*  -0.1294 
                                                      (0.0928) (0.1019)
Intercept      0.5938*** -2.1950** 0.6897*** -0.2641  -0.3163  -0.9035 
               (0.1423)  (1.0722)  (

In **Table 12** we run another robustness check in which countries with a European share of 12.5% or more are omitted. European share is not significant in any of the regressions shown. Once again, R&D is not significantly affected by European share.

In [18]:
mod1 = smf.ols(formula='RD ~ eshare', data=data[data['eshare']<0.125], missing='drop').fit()
mod1_r = mod1.get_robustcov_results()
mod2 = smf.ols(formula='RD ~ eshare+logy', data=data[data['eshare']<0.125], missing='drop').fit()
mod2_r = mod2.get_robustcov_results()
mod3 = smf.ols(formula='RD ~ eshare+euro2000pw', data=data[data['eshare']<0.125], missing='drop').fit()
mod3_r = mod3.get_robustcov_results()
mod4 = smf.ols(formula='RD ~ eshare+educ', data=data[data['eshare']<0.125], missing='drop').fit()
mod4_r = mod4.get_robustcov_results()
mod5 = smf.ols(formula='RD ~ eshare+happinessscore', data=data[data['eshare']<0.125], missing='drop').fit()
mod5_r = mod5.get_robustcov_results()
mod6 = smf.ols(formula='RD ~ eshare+logy+educ+happinessscore', data=data[data['eshare']<0.125], missing='drop').fit()
mod6_r = mod6.get_robustcov_results()
print("Table 12: R&D vs Euroshare (Euroshare<0.125)")
print(summary_col([mod1_r,mod2_r,mod3_r,mod4_r, mod5_r, mod6_r],model_names=('(1)', '(2)', '(3)', '(4)','(5)', '(6)'), 
                  stars=True, regressor_order=['eshare','logy','euro2000pw','educ','happinessscore']))

Table 12: R&D vs Euroshare (Euroshare<0.125)

                  (1)      (2)       (3)      (4)      (5)      (6)   
----------------------------------------------------------------------
eshare         -5.5499   -5.7352  -3.9809   -7.3708  -7.4830  -7.3968 
               (4.1227)  (3.8243) (3.9774)  (4.4437) (4.9489) (4.7815)
logy                     0.3141**                             -0.0041 
                         (0.1336)                             (0.1745)
euro2000pw                        -0.2926                             
                                  (0.5564)                            
educ                                        0.0162**          0.0164  
                                            (0.0072)          (0.0098)
happinessscore                                       0.2548** 0.0010  
                                                     (0.1168) (0.0998)
Intercept      0.7491*** -1.9527* 0.7542*** -0.2868  -0.6017  -0.2650 
               (0.1951)  (1.019

However, when we instead omit all countries that were not colonized by European countries, the results change significantly. **Table 13** shows that European share remains significant in every single regression. Even after adding controls that absorbed the effect of European share in previous regressions, European share keeps its significance and magnitude. These results suggest that, among countries that were colonized by Europe, those that had bigger settlements tend to invest more in research and development today. It may seem odd that these results did not show up in **Table 11**, but countries that were not colonized by European countries may have had other incentives or causes that enhanced their investment in R&D.

In [19]:
mod1 = smf.ols(formula='RD ~ eshare', data=data[data['dumeshare0']==0], missing='drop').fit()
mod1_r = mod1.get_robustcov_results()
mod2 = smf.ols(formula='RD ~ eshare+logy', data=data[data['dumeshare0']==0], missing='drop').fit()
mod2_r = mod2.get_robustcov_results()
mod3 = smf.ols(formula='RD ~ eshare+euro2000pw', data=data[data['dumeshare0']==0], missing='drop').fit()
mod3_r = mod3.get_robustcov_results()
mod4 = smf.ols(formula='RD ~ eshare+educ', data=data[data['dumeshare0']==0], missing='drop').fit()
mod4_r = mod4.get_robustcov_results()
mod5 = smf.ols(formula='RD ~ eshare+happinessscore', data=data[data['dumeshare0']==0], missing='drop').fit()
mod5_r = mod5.get_robustcov_results()
mod6 = smf.ols(formula='RD ~ eshare+logy+educ+happinessscore', data=data[data['dumeshare0']==0], missing='drop').fit()
mod6_r = mod6.get_robustcov_results()
print("Table 13: R&D vs Euroshare (Euroshare different from 0)")
print(summary_col([mod1_r,mod2_r,mod3_r,mod4_r, mod5_r, mod6_r],model_names=('(1)', '(2)', '(3)', '(4)','(5)', '(6)'), 
                  stars=True, regressor_order=['eshare','logy','euro2000pw','educ','happinessscore']))

Table 13: R&D vs Euroshare (Euroshare different from 0)

                  (1)       (2)       (3)       (4)       (5)       (6)   
--------------------------------------------------------------------------
eshare         2.1541*** 1.9215*** 2.1551*** 1.8252*** 2.2862*** 1.9878***
               (0.4073)  (0.5018)  (0.5720)  (0.5822)  (0.5202)  (0.5622) 
logy                     0.0807                                  0.0396   
                         (0.0872)                                (0.2021) 
euro2000pw                         -0.0012                                
                                   (0.3672)                               
educ                                         0.0042              0.0053   
                                             (0.0037)            (0.0060) 
happinessscore                                         -0.0534   -0.1453  
                                                       (0.0954)  (0.1029) 
Intercept      0.2370*** -0.4274   0.2373**

## 5. Concluding Remarks

First, given the original paper and the additional robustness checks, it is safe to say that bigger European settlements during the colonial epoch were growth enhancing. As argued before, this may follow from the arguments of Engerman and Sokoloff (1997) and Acemoglu et al. (2001, 2002): in countries that had bigger settlements, Europeans opted to implement inclusive institutions that are growth enhancing. Also, as Glaeser et al. (2004) argue, Europeans brought a lot of human capital themselves and created many institutions that promoted human capital building. Thus, more Europeans meant more human capital and economic growth.

As for the newly addressed questions, happiness seems to be strongly correlated with bigger settlements in the colonial epoch, but this variable seems to act as a proxy for the share of European descendants. This paper gives the first hints that happiness may be higher in countries that have had more European intervention. On the other hand, when analyzing exclusively countries that were colonized by European countries, the size of the settlement that Europeans built in a country is directly related to the R&D investment that the country makes today.

Many questions follow from this replication paper. Although it gives good hints for research in the coming years, it does not give conclusive proof beyond that already given by EL. Possible research ideas that come from this paper are: (1) Why is happiness so strongly related to the share of European descendants in a country? and (2) Why do bigger settlements positively affect R&D investment in countries that were colonized by Europe?


## 6. References

Acemoglu, D., Johnson, S., & Robinson, J. (2002). Reversal of fortune: Geography and institutions in the
making of the modern world income distribution. Quarterly Journal of Economics, 117(4), 1231–1294.

Acemoglu, D., & Robinson, J. (2012). Why nations fail: The origins of power, prosperity, and poverty. New
York: Crown Publishers.

Easterly, W., & Levine, R. (2016). The European origins of economic development. Journal of Economic Growth, 21, 225–257.

Engerman, S., & Sokoloff, K. (1997). Factor endowments, institutions, and differential paths of growth among
new world economies: A view from economic historians of the United States. In Haber (Ed.), How Latin
America fell behind (pp. 260–304). Stanford: Stanford University Press.

Galor, O. (2011). Unified growth theory. Princeton: Princeton University Press.

Glaeser, E., La Porta, R., Lopez-de-Silanes, F., & Shleifer, A. (2004). Do institutions cause growth? Journal
of Economic Growth, 9(3), 271–303.

Putterman, L., & Weil, D. (2010). Post-1500 population flows and the long run determinants of economic
growth and inequality. Quarterly Journal of Economics, 125(4), 1627–1682.