--- 
Project for the course in Microeconometrics | Summer 2020, M.Sc. Economics, Bonn University | Julia Wilhelm

# Replication of F. Barrera-Osorio, M. Bertrand, L. L. Linden, F. Perez-Calle  (2011) <a class="tocSkip">   
---

This notebook contains my replication of the results from the following paper:

> Barrera-Osorio, Felipe, Marianne Bertrand, Leigh L. Linden, and Francisco Perez-Calle (2011). "Improving the Design of Conditional Transfer Programs: Evidence from a Randomized Education Experiment in Colombia." American Economic Journal: Applied Economics, 3 (2): 167-95. 

The original paper, as well as the data and code provided by the authors, can be accessed [here](https://www.aeaweb.org/articles?id=10.1257/app.3.2.167).

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#1.-Introduction" data-toc-modified-id="1.-Introduction-1">1. Introduction</a></span></li><li><span><a href="#2.-Identification" data-toc-modified-id="2.-Identification-2">2. Identification</a></span></li><li><span><a href="#3.-Empirical-Strategy" data-toc-modified-id="3.-Empirical-Strategy-3">3. Empirical Strategy</a></span></li><li><span><a href="#4.-Replication-of-Barrera-Osorio-et-al.-(2011)" data-toc-modified-id="4.-Replication-of-Barrera-Osorio-et-al.-(2011)-4">4. Replication of Barrera-Osorio et al. (2011)</a></span><ul class="toc-item"><li><span><a href="#4.1.-Data-&amp;-Descriptive-Statistics" data-toc-modified-id="4.1.-Data-&amp;-Descriptive-Statistics-4.1">4.1. Data &amp; Descriptive Statistics</a></span></li><li><span><a href="#4.2.-Baseline-Comparison" data-toc-modified-id="4.2.-Baseline-Comparison-4.2">4.2. Baseline Comparison</a></span></li><li><span><a href="#4.3.-Results" data-toc-modified-id="4.3.-Results-4.3">4.3. Results</a></span><ul class="toc-item"><li><span><a href="#4.3.1.-Attendence" data-toc-modified-id="4.3.1.-Attendence-4.3.1">4.3.1. Attendence</a></span></li><li><span><a href="#4.3.2.-Re-enrollment" data-toc-modified-id="4.3.2.-Re-enrollment-4.3.4">4.3.2. Re-enrollment</a></span></li><li><span><a href="#4.3.3.-Heterogeneity" data-toc-modified-id="4.3.3.-Heterogeneity-4.3.3">4.3.3. Heterogeneity</a></span></li><li><span><a href="#4.3.4.-Survey-Based-Outcomes-Graduation-and-Tertiary-Enrollment" data-toc-modified-id="4.3.4.-Survey-Based-Outcomes-Graduation-and-Tertiary-Enrollment-4.3.6">4.3.4. Survey-Based Outcomes - Graduation and Tertiary Enrollment</a></span></li><li><span><a href="#4.3.5.-Siblings-Effects" data-toc-modified-id="4.3.5.-Siblings-Effects-4.3.5">4.3.5. Siblings Effects</a></span></li></ul></li></ul></li><li><span><a href="#5.-Critical-Assessment" data-toc-modified-id="5.-Critical-Assessment-5">5. 
Critical Assessment</a></span><li><span><a href="#6.-Extensions" data-toc-modified-id="6.-Extensions-7">6. Extensions</a></span><ul class="toc-item"> <li><span><a href="#6.1.-Check-for-Balanced-Groups-across-Experiments" data-toc-modified-id="6.1.-Check-for-Balanced-Groups-across-Experiments-6.1">6.1. Check for Balanced Groups across Experiments</a></span></li><li><span><a href="#6.2.-Eliminate-Back-door-Paths-Controlling-only-for-Locality" data-toc-modified-id="6.2.-Eliminate-Back-door-Paths-Controlling-only-for-Locality-6.2">6.2. Eliminate Back-door Paths Controlling only for Locality</a></span></ul><li><span><a href="#7.-Conclusion" data-toc-modified-id="7.-Conclusion-7">7. Conclusion</a></span></li><li><span><a href="#8.-References" data-toc-modified-id="8.-References-8">8. References</a></span></li></ul></div>

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels as sm
import scipy as sc
import scipy.stats as ss
import pandas.io.formats.style
import statsmodels.formula.api as smf
import statsmodels.api as sm_api

In [2]:
from auxiliary.auxiliary_tables import *

---
# 1. Introduction 
---

Barrera-Osorio et al. (2011) compare three education-based conditional cash transfer designs aimed at incentivizing academic participation. Using data from a pilot study in Bogota, Colombia, they examine the effects of a bi-monthly transfer (basic treatment), a bi-monthly transfer combined with a lump-sum payment at the time students are supposed to re-enroll in school (savings treatment), and a bi-monthly transfer combined with a large payment upon graduation (tertiary treatment). The payments are conditional on the child's school attendance and are designed to prevent dropout from secondary schools and to encourage matriculation at tertiary institutions. On the one hand, the savings and tertiary treatments impose more binding short-term liquidity constraints on participating families than the basic treatment, and the authors examine whether this decreases monthly school attendance. On the other hand, these two treatments might provide stronger incentives for families to re-enroll their children in school or to have them graduate.

To estimate and compare the causal impact of the three treatments, Barrera-Osorio et al. (2011) apply a difference model to data from Bogota. The city's Secretary of Education implemented a one-year pilot study in which treatments were randomly allocated to children in two localities. Randomization took place at the child level, generating variation within schools and families. This allows the authors to assess the comparability of the different groups. Barrera-Osorio et al. (2011) find that all designs significantly increase attendance and that the savings and tertiary treatments increase enrollment rates more strongly than the basic treatment. They conclude that the structure of the intervention can help target resources.

In this notebook, I replicate the results presented in the paper by Barrera-Osorio et al. (2011). Additionally, I critically discuss the quality of the strategy and the results. My analysis supports the findings of Barrera-Osorio et al. (2011).

This notebook is structured as follows. In the next section, I present the identification strategy Barrera-Osorio et al. (2011) use to unravel the causal effects of the conditional cash transfers (Section 2). Section 3 briefly discusses the empirical strategy the authors use for estimation. Section 4 shows my replication of the results of the paper, and Section 5 is a critical discussion thereof. In Section 6 I check the identification assumption across the two experiments and conduct regressions conditioning on one variable that blocks all back-door paths from the causal variable to the outcome variable to identify causal effects. Section 7 presents some conclusions.

---
# 2. Identification
--- 


Barrera-Osorio et al. (2011) ask how three different education-based conditional cash transfer designs perform in preventing dropout from secondary schools and encouraging matriculation at tertiary institutions.
The treatments were implemented in two localities in Bogota, San Cristobal and Suba. Eligible children in San Cristobal were randomly assigned to a control group, the basic treatment (bi-monthly transfer), or the savings treatment (bi-monthly transfer combined with a lump-sum payment at the time students are supposed to re-enroll in school). In Suba, eligible children were randomly assigned to the tertiary treatment (bi-monthly transfer combined with a large payment upon graduation) or a control group. For each individual $i$ we can imagine a potential outcome where they are treated, $Y_i(1)$, and one where they are not, $Y_i(0)$ (written $Y^1$ and $Y^0$ below), but we can never observe both outcomes for the same individual. Since it is therefore impossible to observe treatment effects at the individual level, researchers estimate average effects using treatment and control groups. The random treatment assignment allows the authors to estimate the causal effects of the three treatments experimentally. While they can directly compare the effects of the basic and savings treatments, they cannot rely on purely random variation when comparing those two with the tertiary treatment. This is because the tertiary treatment was implemented in another locality, so any comparison with the tertiary treatment occurs across experiments.

Since treatments were assigned randomly within each of the two localities, potential outcomes are independent of the treatment indicator $D$ and the selection bias is eliminated. The naive estimate, which simply compares the observed average outcomes of the treatment and control groups, then equals the true average treatment effect:

\begin{align*}
  E[Y\mid D = 1] - E[Y\mid D = 0] & = E[Y^1\mid D = 1] - E[Y^0\mid D = 0] \\
                                  & =E[Y^1\mid D = 1]  - E[Y^0\mid D = 1] + E[Y^0\mid D = 1] - E[Y^0\mid D = 0]  \\
                                  & = \underbrace{E[Y^1 - Y^0\mid D = 1]}_{ATT} + \underbrace{E[Y^0\mid D= 1]- E[Y^0 \mid D = 0]}_{\text{Selection bias}} \\
                                  & =E[Y^1 - Y^0\mid D = 1] \\
                                  & =E[Y^1 - Y^0\mid D = 0] \\
                                  & =E[Y^1 - Y^0]
\end{align*}

The authors here rely on the following two assumptions:
\begin{align*}
E[Y^1\mid D = 1] = E[Y^1\mid D = 0] \\
E[Y^0\mid D = 1] = E[Y^0\mid D = 0] \\
\end{align*}
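
The decomposition above can be illustrated with a small simulation. This is only a sketch on synthetic data: the outcome equation, the unobserved trait `u` and the 3 percentage point effect are assumptions chosen for illustration and are not taken from the paper. Because both potential outcomes are simulated, the selection bias term can be computed directly, which is never possible with real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical potential outcomes: an unobserved trait u raises Y^0 and Y^1.
u = rng.normal(size=n)
y0 = 0.70 + 0.10 * u + rng.normal(scale=0.05, size=n)
y1 = y0 + 0.03  # assumed constant treatment effect
true_ate = np.mean(y1 - y0)


def naive_difference(d):
    """Observed mean outcome of the treated minus that of the untreated."""
    y = np.where(d == 1, y1, y0)
    return y[d == 1].mean() - y[d == 0].mean()


# (a) Randomized assignment: D is independent of (Y^0, Y^1).
d_random = rng.integers(0, 2, size=n)

# (b) Self-selection: units with high u are more likely to be treated.
d_selected = (u + rng.normal(size=n) > 0).astype(int)

naive_random = naive_difference(d_random)
naive_selected = naive_difference(d_selected)

# Selection bias term E[Y^0 | D = 1] - E[Y^0 | D = 0] under self-selection.
selection_bias = y0[d_selected == 1].mean() - y0[d_selected == 0].mean()

print(f"true ATE:                  {true_ate:.4f}")
print(f"naive, randomized:         {naive_random:.4f}")
print(f"naive, self-selected:      {naive_selected:.4f}")
print(f"selection bias (case b):   {selection_bias:.4f}")
```

Under randomized assignment the naive difference is close to the true effect, while under self-selection it equals the effect on the treated plus the selection bias term.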

The causal graphs below illustrate the relationship between the treatments $D_B$, $D_S$, $D_T$ and outcome $Y$ in the two localities. Additionally there may be observables $W$ and unobservables $U$ also affecting $Y$. Due to random treatment assignment within the two localities, treatment is independent of $W$ and $U$ and there is no back-door path which has to be eliminated.

**San Cristobal:**
![ERROR:Here should be causal graph 1](files/CausalGraph_1.png)  
$D_B$: Basic treatment  
$D_S$: Savings treatment  
$Y$: Student outcome  
$U$: Unobservables  
$W$: Observables

**Suba:**
![ERROR:Here should be causal graph 2](files/CausalGraph_2.png)  
$D_T$: Tertiary treatment  
$Y$: Student outcome  
$U$: Unobservables  
$W$: Observables

The identifying assumption for estimating causal effects here is that randomization within the localities was successful. Barrera-Osorio et al. (2011) verify this by checking whether treatment assignment created balanced treatment and control groups in terms of household- and individual-level characteristics. This information was collected prior to the randomization, so students in each group should, on average, have similar characteristics. The authors make 60 comparisons and find 7 differences that are statistically significant at the 10 percent level, 5 at the 5 percent level and 2 at the 1 percent level. They conclude that randomization of the treatment assignment was successful.
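
A single balance comparison of this kind can be sketched as follows. The helper `mean_difference` and the synthetic age data are hypothetical, and the sketch simplifies in two ways that are assumptions on my part: it uses a normal approximation for the p-value and ignores any clustering of students within schools.

```python
import math

import numpy as np


def mean_difference(x_treat, x_control):
    """Difference in means, its standard error and a two-sided p-value.

    The p-value uses a standard normal approximation and treats all
    observations as independent (no clustering), which is a simplification.
    """
    diff = x_treat.mean() - x_control.mean()
    se = math.sqrt(
        x_treat.var(ddof=1) / len(x_treat)
        + x_control.var(ddof=1) / len(x_control)
    )
    p_value = math.erfc(abs(diff / se) / math.sqrt(2))
    return diff, se, p_value


rng = np.random.default_rng(1)

# Synthetic baseline covariate drawn from the same distribution in both
# groups, which is what successful randomization implies.
age_control = rng.normal(14.4, 2.0, size=2000)
age_basic = rng.normal(14.4, 2.0, size=2000)

diff, se, p_value = mean_difference(age_basic, age_control)
print(f"difference = {diff:.3f}, SE = {se:.3f}, p = {p_value:.3f}")

# Number of rejections expected by chance among 60 independent comparisons.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: expected false rejections = {60 * alpha:.1f}")
```

Even under perfect randomization some comparisons reject by chance, so a handful of significant differences among 60 comparisons is not by itself evidence of imbalance. The comparisons in the paper share covariates and groups, so the independence assumed in the last loop holds only approximately.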

However, when the experiments in the two localities are considered together in order to compare the effects of all three treatments, the causal graph has three back-door paths that have to be blocked. Treatment assignment is then not completely random, since it is not random whether a person lives in Suba or San Cristobal. Observable or unobservable factors may affect treatment assignment and the outcome at the same time. The causal graph the authors use then looks as follows:

![ERROR:Here should be causal graph 3](files/CausalGraph_3.png)  
$D_B$: Basic treatment  
$D_S$: Savings treatment  
$D_T$: Tertiary treatment  
$L$: Dummy for locality of household (Suba or San Cristobal)  
$Y$: Student outcome  
$U$: Unobservables  
$W$: Observables

In order to block the back-door paths, Barrera-Osorio et al. (2011) control for the locality of the households and a large set of observable demographic characteristics. Nevertheless, the authors mention that differences between the tertiary treatment and the other treatments could be due to unobserved heterogeneity in treatment effects. I return to this problem in Sections 5 and 6.
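
The role of the locality dummy can be illustrated with a simulation. The sketch below uses synthetic data in which locality shifts the baseline outcome; all effect sizes and the helper `ols` are assumptions made for illustration. It covers only the case in which treatment effects are homogeneous, so it does not address the unobserved heterogeneity the authors mention.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60_000
suba = rng.integers(0, 2, size=n)  # locality dummy L (1 = Suba)

# Randomize within locality: control/basic/savings in San Cristobal,
# control/tertiary in Suba. Arm codes: 0 control, 1 basic, 2 savings, 3 tertiary.
arm = np.where(
    suba == 1,
    3 * rng.integers(0, 2, size=n),
    rng.integers(0, 3, size=n),
)
basic = (arm == 1).astype(float)
savings = (arm == 2).astype(float)
tertiary = (arm == 3).astype(float)

# Assumed data-generating process: outcomes are higher in Suba regardless
# of treatment, and the true tertiary-basic gap is 0.05 - 0.03 = 0.02.
y = (
    0.75
    + 0.08 * suba
    + 0.03 * basic
    + 0.03 * savings
    + 0.05 * tertiary
    + rng.normal(scale=0.1, size=n)
)


def ols(columns, outcome):
    """OLS coefficients with an intercept as the first element."""
    X = np.column_stack([np.ones(len(outcome)), *columns])
    return np.linalg.lstsq(X, outcome, rcond=None)[0]


# Naive comparison of group means across experiments.
naive_gap = y[tertiary == 1].mean() - y[basic == 1].mean()

# Pooled regressions without and with the locality dummy.
b_without = ols([basic, savings, tertiary], y)
b_with = ols([basic, savings, tertiary, suba], y)
adjusted_gap = b_with[3] - b_with[1]

print(f"naive tertiary-basic gap:     {naive_gap:.3f}")
print(f"gap without locality control: {b_without[3] - b_without[1]:.3f}")
print(f"gap with locality control:    {adjusted_gap:.3f}")
```

In this setup the naive gap absorbs the locality difference in baseline outcomes, while conditioning on locality recovers the assumed gap.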

---
# 3. Empirical Strategy
---
Barrera-Osorio et al. (2011) examine the impact of the basic, the savings and the tertiary treatment on student outcomes. They use a simple difference model that compares different subsets of the sample without controlling for covariates. 

For the basic-savings experiment in San Cristobal the specification takes the following form:

\begin{equation}
y_{ij} = \beta_0 + \beta_B Basic_i + \beta_S Savings_i + \epsilon_{ij} 
\end{equation}

For the tertiary experiment in Suba the specification takes the following form:

\begin{equation}
y_{ij} = \beta_0 + \beta_T Tertiary_i + \epsilon_{ij} 
\end{equation}
* $y_{ij}$ denotes a particular outcome for child $i$ in school $j$,
* $Basic_i$, $Savings_i$ and $Tertiary_i$ are indicator variables for whether or not the child is in the respective treatment group,
* $\epsilon_{ij}$ is the error term, which is allowed to be correlated among children within the same school (standard errors are clustered at the school level).

The authors additionally use a difference estimator that controls for socio-demographic and school characteristics.
For the basic-savings experiment the model is specified as follows:

\begin{equation}
y_{ij} = \beta_0 + \beta_B Basic_i + \beta_S Savings_i + \delta X_{ijk} + \phi_{j} + \epsilon_{ij} 
\end{equation}

For the tertiary treatment the model is specified as follows:

\begin{equation}
y_{ij} = \beta_0 + \beta_T Tertiary_i + \delta X_{ijk} + \phi_{j} + \epsilon_{ij} 
\end{equation}

The variables are defined as before. Additionally,
* $X_{ijk}$ is a vector of socio-demographic controls for child $i$ in school $j$ and family $k$,
* $\phi_{j}$ are school fixed effects.
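
The two basic-savings specifications can be sketched with `statsmodels` on synthetic data. The column names (`attendance`, `age`, `school`), the sample sizes and the effect sizes are hypothetical and do not correspond to the replication data; the helper functions in `auxiliary` may be implemented differently. School fixed effects enter through `C(school)`, and standard errors are clustered at the school level.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_schools, n_per_school = 40, 60
school = np.repeat(np.arange(n_schools), n_per_school)
n = school.size

# Child-level randomization into control (0), basic (1) and savings (2).
arm = rng.integers(0, 3, size=n)
school_effect = rng.normal(scale=0.05, size=n_schools)[school]

df = pd.DataFrame({
    "school": school,
    "basic": (arm == 1).astype(int),
    "savings": (arm == 2).astype(int),
    "age": rng.integers(11, 18, size=n),
})
# Assumed outcome equation with a 3 percentage point effect in both arms.
df["attendance"] = (
    0.80
    + 0.03 * df["basic"]
    + 0.03 * df["savings"]
    - 0.005 * (df["age"] - 14)
    + school_effect
    + rng.normal(scale=0.1, size=n)
)

cluster = {"groups": df["school"]}

# Simple difference model without covariates.
simple = smf.ols("attendance ~ basic + savings", data=df).fit(
    cov_type="cluster", cov_kwds=cluster
)

# Difference model with a demographic control and school fixed effects.
full = smf.ols("attendance ~ basic + savings + age + C(school)", data=df).fit(
    cov_type="cluster", cov_kwds=cluster
)

# Test of H0: beta_B = beta_S.
equal_effects = full.f_test("basic = savings")

print(simple.params[["basic", "savings"]].round(3))
print(full.params[["basic", "savings"]].round(3))
print("F =", float(np.squeeze(equal_effects.fvalue)),
      "p =", float(np.squeeze(equal_effects.pvalue)))
```

Because treatment is randomized at the child level, adding controls and fixed effects should mainly affect precision rather than the point estimates, which is the pattern one would look for across the columns of the results tables.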

---
# 4. Replication of Barrera-Osorio et al. (2011)
---

## 4.1. Data & Descriptive Statistics
Barrera-Osorio et al. (2011) restrict their sample of students, who are spread across 251 schools, to the 68 schools with the largest number of registered children. In addition, they keep only those students who completed the baseline survey they conducted. For the tertiary experiment, they drop students in grades 6-8, since those were not eligible for the program.

In [3]:
# Load the public replication data provided by the authors.
data = pd.read_stata('data/Public_Data_AEJApp_2010-0132.dta')
data.index.name = "individual"
# Classify students into grade groups.
data['grade_group'] = 'Grades 6-8'
data.loc[data['grade'] > 8, 'grade_group'] = 'Grades 9-10'
data.loc[data['grade'] > 10, 'grade_group'] = 'Grade 11'
# Label the research group of each student.
data['group'] = 'Control'
data.loc[data['T1_treat'] == 1, 'group'] = 'Basic'
data.loc[data['T2_treat'] == 1, 'group'] = 'Savings'
data.loc[data['T3_treat'] == 1, 'group'] = 'Tertiary'
# Drop Suba students below grade 9, who were not eligible for the tertiary treatment.
sample = data.drop(data[(data.suba == 1) & (data.grade < 9)].index)
# Integer codes for categorical variables.
sample['s_teneviv_int'] = sample['s_teneviv'].cat.codes + 1
sample['s_sexo_int'] = sample['s_sexo'].cat.codes
sample['s_estcivil_int'] = sample['s_estcivil'].cat.codes + 1
# Dummy variables for the categorical controls used in the regressions.
sample = sample.join(pd.get_dummies(sample['school_code']))
sample = sample.join(pd.get_dummies(sample['s_teneviv']))
sample = sample.join(pd.get_dummies(sample['s_estcivil']))
sample = sample.join(pd.get_dummies(sample['grade'], prefix='grade'))
sample = sample.join(pd.get_dummies(sample['s_estrato'],  prefix='estrato'))
# Keep only students observed in the baseline survey.
sample_baselinesurvey =  sample.drop(sample[sample.bl_observed == 0].index)

Table 1 summarizes the distribution of children by grade, gender and experimental group. The resulting sample contains 7158 children.

#### Table 1 - Distribution of Subjects by Research Groups

In [4]:
create_table1(sample_baselinesurvey)

| | Basic-Savings: Control | Basic-Savings: Basic | Basic-Savings: Savings | Tertiary: Control | Tertiary: Tertiary | Total |
|---|---|---|---|---|---|---|
| Grades 6-8 | 1189 | 1215 | 1166 | 0 | 0 | 3570 |
| Grades 9-10 | 643 | 633 | 586 | 449 | 425 | 2736 |
| Grade 11 | 179 | 188 | 177 | 160 | 148 | 852 |
| Female | 1047 | 1022 | 1000 | 361 | 336 | 3766 |
| Male | 964 | 1014 | 929 | 248 | 237 | 3392 |
| Total | 2011 | 2036 | 1929 | 609 | 573 | 7158 |


## 4.2. Baseline Comparison
Barrera-Osorio et al. (2011) check whether the randomization was successful and created balanced research groups. To this end, they compare characteristics of students between research groups. Table 2 shows the control group averages of 15 different variables in the basic-savings experiment (B-S) and the tertiary experiment (T). The four other columns show the 60 comparisons. The standard errors are in the row below each difference, labeled "SE". Seven differences are statistically significant at the 10 percent level (blue), 5 at the 5 percent level (red), and 2 at the 1 percent level (green). One can conclude that treatment was assigned randomly, which supports the identification assumption of their strategy.

#### Table 2 - Comparison of Students between Research Groups

In [10]:
sancristobal = sample.drop(sample[sample.suba == 1].index)
suba = sample.drop(sample[sample.suba == 0].index)
table2 = create_table2(sancristobal, suba)
#table2 = pd.io.formats.style.Styler(table2, precision=2)
#table2 = table2.style.apply(style_specific_cell, axis=None)
table2.round(2).style

| | Control average B-S | Basic-Control | Savings-Control | Basic-Savings | Control average T | Tertiary-Control |
|---|---|---|---|---|---|---|
| Possessions | 1.9 | 0.07 | 0.04 | 0.03 | 1.94 | -0.05 |
| Possessions SE | 1.1 | 0.02 | 0.02 | 0.02 | 1.02 | 0.04 |
| Utilities | 4.65 | -0.02 | 0.06 | -0.08 | 4.85 | 0.04 |
| Utilities SE | 1.42 | 0.03 | 0.03 | 0.03 | 1.32 | 0.04 |
| Durable Goods | 1.37 | -0.02 | 0.01 | -0.03 | 1.63 | 0.02 |
| Durable Goods SE | 0.89 | 0.02 | 0.02 | 0.02 | 0.86 | 0.03 |
| Physical Infrastructure | 11.65 | -0.05 | 0.04 | -0.09 | 12.14 | -0.05 |
| Physical Infrastructure SE | 1.75 | 0.03 | 0.03 | 0.04 | 1.49 | 0.06 |
| Age | 14.38 | 0.09 | -0.06 | 0.16 | 15.67 | -0.06 |
| Age SE | 5.3 | 0.1 | 0.14 | 0.17 | 4.23 | 0.19 |


## 4.3. Results

### 4.3.1 Attendence
First, the authors analyze the effect of the conditional cash transfers on the school attendance rate. They include only individuals who are enrolled in one of the 68 schools selected for surveying. In addition, they exclude students in grade 11 from the enrollment estimations, since these students should graduate rather than re-enroll. To be consistent, they also restrict the attendance estimates to students in grades 6-10. In order to replicate their results, I run simple regressions of the school attendance rate on the treatment variables without control variables, with demographic controls, and with demographic controls and school fixed effects. The first three columns of Table 3 show the results for the basic-savings experiment in San Cristobal, while columns 4 to 6 show the results for the tertiary experiment in Suba. The last column shows the results of a regression containing all three treatments, demographics and school fixed effects. The estimated treatment effects and their standard errors ("SE") are provided in rows 1-6, and the test statistics from comparisons of the relative treatment effects and their p-values are in rows 7-10.

#### Table 3 - Effects on monitored school attendence rates

In [4]:
sancristobal = sample.drop(sample[sample.suba == 1].index)
suba = sample.drop(sample[sample.suba == 0].index)
sancristobal = sancristobal.drop(sancristobal[(sancristobal.survey_selected == 0) | (sancristobal.grade == 11)].index)
suba = suba.drop(suba[(suba.survey_selected == 0) | (suba.grade == 11) | (suba.grade < 9)].index)
sample_survey = sample.drop(sample[(sample.survey_selected == 0) | (sample.grade == 11) | (sample.grade < 9)].index)
create_table34(sancristobal, suba, sample_survey, 'at_msamean')

| | Basic-Savings | Basic-Savings with demographics | Basic-Savings with demographics and school fixed effects | Tertiary | Tertiary with demographics | Tertiary with demographics and school fixed effects | Both |
|---|---|---|---|---|---|---|---|
| Basic treatment | 0.033 | 0.032 | 0.032 | | | | 0.025 |
| Basic treatment SE | 0.007 | 0.008 | 0.007 | | | | 0.01 |
| Savings treatment | 0.029 | 0.027 | 0.027 | | | | 0.028 |
| Savings treatment SE | 0.008 | 0.008 | 0.007 | | | | 0.012 |
| Tertiary treatment | | | | 0.052 | 0.054 | 0.056 | 0.055 |
| Tertiary treatment SE | | | | 0.018 | 0.016 | 0.02 | 0.02 |
| H0: Basic-Savings F-Stat | 0.312 | 0.404 | 0.481 | | | | 0.053 |
| p-value | 0.581 | 0.53 | 0.494 | | | | 0.819 |
| H0: Tertiary-Basic F-Stat | | | | | | | 1.863 |
| p-value | | | | | | | 0.18 |


The table shows:
- Basic treatment increases attendance by 3.3 percentage points (significant at the one percent level)
- Savings treatment increases attendance by 2.9 percentage points (significant at the one percent level)
- Tertiary treatment increases attendance by 5.2 percentage points (significant at the one percent level)
- no evidence that the treatments have different effects
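
The stated significance levels can be cross-checked from the ratio of each estimate to its standard error. The sketch below uses the values in columns 1 and 4 of Table 3 and a standard normal approximation; this is an assumption on my part, since the regressions behind the table may base inference on a t-distribution, so the p-values are only approximate.

```python
import math

# Point estimates and standard errors from columns 1 and 4 of Table 3.
estimates = {
    "Basic": (0.033, 0.007),
    "Savings": (0.029, 0.008),
    "Tertiary": (0.052, 0.018),
}


def two_sided_p(coef, se):
    """Two-sided p-value based on a standard normal approximation."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2))


p_values = {name: two_sided_p(b, se) for name, (b, se) in estimates.items()}

for name, (b, se) in estimates.items():
    print(f"{name}: z = {b / se:.2f}, approximate p = {p_values[name]:.4f}")
```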

My results from the regressions and difference tests are the same as those Barrera-Osorio et al. (2011) report in their paper. One can conclude that, although the savings and tertiary treatments impose more binding short-term liquidity constraints on families than the basic treatment, there is no evidence that this hurts monthly attendance. 

### 4.3.2 Re-enrollment
Second, the authors analyze the effect of the conditional cash transfers on re-enrollment. Table 4 has the same layout as Table 3, with the observed re-enrollment rate as the dependent variable.

#### Table 4 - Effects on administrative enrollment in following year

In [5]:
sancristobal = sample.drop(sample[(sample.suba == 1) | (sample.grade == 11)].index)
suba = sample.drop(sample[(sample.suba == 0) | (sample.grade == 11) | (sample.grade < 9)].index)
sancristobal = sancristobal[sancristobal['m_enrolled'].notna()]
suba = suba[suba['m_enrolled'].notna()]
sample_grade = sample.drop(sample[(sample.grade == 11) | (sample.grade < 9)].index)
sample_grade = sample_grade[sample_grade['m_enrolled'].notna()]
create_table34(sancristobal, suba, sample_grade, 'm_enrolled')

| | Basic-Savings | Basic-Savings with demographics | Basic-Savings with demographics and school fixed effects | Tertiary | Tertiary with demographics | Tertiary with demographics and school fixed effects | Both |
|---|---|---|---|---|---|---|---|
| Basic treatment | 0.017 | 0.016 | 0.011 | | | | 0.0 |
| Basic treatment SE | 0.009 | 0.008 | 0.01 | | | | 0.016 |
| Savings treatment | 0.045 | 0.046 | 0.04 | | | | 0.03 |
| Savings treatment SE | 0.016 | 0.015 | 0.011 | | | | 0.017 |
| Tertiary treatment | | | | 0.042 | 0.039 | 0.037 | 0.042 |
| Tertiary treatment SE | | | | 0.022 | 0.021 | 0.02 | 0.019 |
| H0: Basic-Savings F-Stat | 3.99 | 3.941 | 5.519 | | | | 2.271 |
| p-value | 0.048 | 0.049 | 0.02 | | | | 0.133 |
| H0: Tertiary-Basic F-Stat | | | | | | | 2.228 |
| p-value | | | | | | | 0.137 |


The table shows:
- Basic treatment increases re-enrollment by 1.7 percentage points (significant at the 10 percent level)
- Savings treatment increases re-enrollment by 4.5 percentage points (significant at the one percent level)
- Tertiary treatment increases re-enrollment by 4.2 percentage points (significant at the 10 percent level)
- difference in magnitude of the basic and savings treatment effects is statistically significant at the 5 percent level
- no evidence that the tertiary and the basic treatment effects are different

Again, my results from the regressions are the same as those Barrera-Osorio et al. (2011) report in their paper. One can conclude that the savings treatment, which consists of a bi-monthly transfer combined with a lump-sum payment at the time students are supposed to re-enroll in school, is more effective in increasing re-enrollment than the basic treatment. 

### 4.3.3 Heterogeneity

In [67]:
#h = 0.075
#xmin = 0.2
#xmax = 0.95
#st1 = (xmax-xmin)/(gsize-1)
#sample['xgrid1'] = xmin + ((sample.index-1)*st1)
#sample.loc[sample.index > gsize, 'xgrid1'] = 0
#ic = 1
#while ic <= gsize:
#sample['z'] = abs((sample['en_baseline']-sample['xgrid1'])/h)
#sample = sample.drop(sample[sample.z > 1].index)
#sample['kz'] = (3/4)*(1-sample['z']**2)/h
#sample['x_mod'] = (sample['en_baseline'] - sample['xgrid1']) * np.sqrt(sample['kz'])
#sample['const_mod'] = sample['kz']**0.5
#sample['y_mod'] = sample['m_enrolled']*(sample['kz']**0.5)
#sample.drop(sample[sample.kz == 0].index)
#sample.drop(sample[sample.const_mod == 0].index)
#reg = sm_api.OLS(sample['y_mod'], sm_api.add_constant(sample[['const_mod','x_mod']])).fit()
#sample['den_control'] = reg.params[2]
#sample['en_control'] = reg.params[1]
#reg.params
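
The commented-out draft above appears to set up a kernel-weighted local linear regression with Epanechnikov weights, bandwidth `h = 0.075` and a grid from 0.2 to 0.95. A self-contained sketch of that technique is given below. It is my reading of the draft, not the authors' procedure: the function `local_linear` is hypothetical, `gsize = 50` is an assumed value because the draft never defines it, and the synthetic `x` only stands in for `en_baseline`.

```python
import numpy as np


def local_linear(x, y, grid, h):
    """Local linear regression with an Epanechnikov kernel.

    For each grid point x0, fits a kernel-weighted least-squares line of y
    on (x - x0) and returns the intercept, i.e. the estimate of E[y | x = x0].
    """
    fitted = np.full(len(grid), np.nan)
    for k, x0 in enumerate(grid):
        z = np.abs(x - x0) / h
        inside = z < 1  # observations that receive positive weight
        if inside.sum() < 2:
            continue  # not enough data to fit a line at this grid point
        weights = 0.75 * (1 - z[inside] ** 2) / h
        root_w = np.sqrt(weights)
        X = np.column_stack([np.ones(inside.sum()), x[inside] - x0])
        # Weighted least squares via OLS on variables scaled by sqrt(weight).
        beta = np.linalg.lstsq(X * root_w[:, None], y[inside] * root_w, rcond=None)[0]
        fitted[k] = beta[0]
    return fitted


h = 0.075
gsize = 50  # assumed; not defined in the draft
grid = np.linspace(0.2, 0.95, gsize)

rng = np.random.default_rng(4)
x = rng.uniform(0.1, 1.0, size=3000)  # stand-in for a baseline enrollment probability

# Sanity check on a noiseless linear relationship.
y_linear = 0.5 + 0.4 * x
fit_linear = local_linear(x, y_linear, grid, h)
print(np.max(np.abs(fit_linear - (0.5 + 0.4 * grid))))
```

A local linear fit reproduces a noiseless linear relationship up to numerical error, which makes the last lines a convenient check before applying the function to actual outcomes such as `m_enrolled`.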

### 4.3.4 Survey-Based Outcomes - Graduation and Tertiary Enrollment
Barrera-Osorio et al. (2011) use data from a follow-up survey, conducted after the treatments were implemented, to analyze the effects of each treatment on self-reported graduation and tertiary enrollment for students who were in grade 11. The sample is restricted to those students for whom follow-up survey data are available. In order to replicate the results from Barrera-Osorio et al. (2011), I run regressions of the binary variable for graduation and of the binary variable for tertiary enrollment on the treatment variables with demographic controls and school fixed effects. The first three columns of Table 5 show the effects on graduation in the San Cristobal sample, the Suba sample and both together. Columns 4 to 6 show the effects on tertiary enrollment, again in the San Cristobal sample, the Suba sample and both together. The estimated treatment effects and their standard errors ("SE") are provided in rows 1-6, and the test statistics from comparisons of the relative treatment effects and their p-values are in rows 7-10.

#### Table 5 - Effects on graduation and tertiary enrollment for students in grade 11

In [6]:
create_table5(sample)

| | Graduation Basic-Savings | Graduation Tertiary | Graduation Both | Tertiary enrollment Basic-Savings | Tertiary enrollment Tertiary | Tertiary enrollment Both |
|---|---|---|---|---|---|---|
| Basic treatment | 0.039 | | 0.036 | 0.043 | | 0.048 |
| Basic treatment SE | 0.042 | | 0.042 | 0.036 | | 0.033 |
| Savings treatment | 0.04 | | 0.039 | 0.094 | | 0.094 |
| Savings treatment SE | 0.033 | | 0.03 | 0.034 | | 0.033 |
| Tertiary treatment | | 0.047 | 0.044 | | 0.489 | 0.487 |
| Tertiary treatment SE | | 0.037 | 0.031 | | 0.04 | 0.041 |
| H0: Basic-Savings F-Stat | 0.0 | | 0.006 | 1.769 | | 1.542 |
| p-value | 0.991 | | 0.94 | 0.199 | | 0.223 |
| H0: Tertiary-Basic F-Stat | | | 0.022 | | | 75.558 |
| p-value | | | 0.882 | | | 0.0 |


The table shows:
- None of the estimated effects on graduation are statistically significant
- Savings treatment increases tertiary enrollment by 9.4 percentage points (statistically significant at the 5 percent level)
- Tertiary treatment increases tertiary enrollment by 48.9 percentage points (statistically significant at the 1 percent level)
- The effect for the tertiary treatment is statistically significantly different from the effect of the basic treatment (p-value < 0.0001)

These estimates are of the same size as those reported by Barrera-Osorio et al. (2011). They mention that the tertiary enrollment findings might not be credible due to the extremely large estimate for the tertiary treatment group. One explanation for this result could be that students in the tertiary treatment group misreport being enrolled in tertiary institutions in the follow-up survey. However, self-reported graduation rates seem to match the estimates based on the administrative enrollment data and verified attendance data.

### 4.3.5 Siblings Effects
Barrera-Osorio et al. (2011) analyze the effect of the treatments on school attendance and re-enrollment for siblings. They use the intra-family variation in treatment assignment to provide a reduced-form test of whether receiving the transfer changes the allocation of opportunities within the household. In order to extract causal effects, they restrict their sample to the subset of siblings who were also registered, since otherwise systematic differences between families who registered one child and those who registered two children might bias the results. They focus on families who registered two children and only keep those for which administrative enrollment data are available. They pool the treatments due to the small sample size. Columns 1 and 2 of Table 6 contain comparisons of untreated children with and without treated siblings. Columns 3 and 4 show comparisons only for girls and columns 5 and 6 only for boys.

#### Table 6 - Effects of treatment on siblings using monitored and administrative participation, households with two registered children

In [4]:
create_table6(data)

| | Attendance | Enrollment | Attendance Female | Enrollment Female | Attendance Male | Enrollment Male |
|---|---|---|---|---|---|---|
| Sibling is treated? | -0.03 | -0.071 | -0.053 | -0.114 | -0.029 | -0.04 |
| Sibling is treated? SE | 0.015 | 0.026 | 0.021 | 0.053 | 0.032 | 0.04 |
| Observations | 690 | 668 | 352 | 340 | 338 | 328 |
| R squared | 0.278 | 0.137 | 0.332 | 0.23 | 0.383 | 0.234 |


The table shows:
- Attendance of untreated children with treated siblings is 3 percentage points lower compared to untreated children whose siblings are also untreated (statistically significant at the 10 percent level)
- Enrollment of untreated children with treated siblings is 7.1 percentage points lower compared to untreated children whose siblings are also untreated (statistically significant at the 1 percent level)
- The effect is qualitatively similar for both genders
- For girls, the effect is stronger and statistically significant at the 5 and 10 percent level, respectively

The results in columns 1, 2, 3 and 5 are the same as those reported by Barrera-Osorio et al. (2011). The estimates of the effect on enrollment for girls and boys differ slightly from those in the paper; qualitatively, however, the effects are the same. The table suggests that the additional household resources generated by the program are not used to invest in the education of the untreated children. Instead, families with a treated child seem to take educational input away from the untreated child. The authors conclude that eligibility rules that cut across children may increase inequality in educational attainment within the household.

---
# 5. Critical Assessment
---

In the following, I discuss the quality of the strategy and estimations provided by Barrera-Osorio et al. (2011).  
One strength of their analysis is that they use administrative data for attendance and enrollment. Such data are of high quality, since they are not systematically affected by individuals lying or misreporting, as survey data might be.  
Second, they check whether treatments were assigned randomly within the localities. Comparing baseline characteristics, they conclude that control and treatment groups are balanced. In addition, the fact that the regression estimates do not change much when demographics and school fixed effects are included suggests that randomization was successful.  
Third, in order to address spillover effects of the treatment between children through peer networks, Barrera-Osorio et al. (2011) check that children in the treatment and control groups have similar networks. Therefore, any indirect treatment effect would be equally distributed across the groups.  
Fourth, the authors make sure that there is no self-selection into treatment. Only children from families who lived in the localities prior to 2004 were eligible to register for the program. This rules out that families moved to take advantage of the treatments.  
Fifth, they mention that the SISBEN data, which they use for background characteristics, may underestimate assets and income, since the surveyed families knew that they were surveyed for the purpose of scoring them on a poverty index. However, given the timing and purpose of the survey, the bias due to this Hawthorne effect is not correlated with the differences investigated in the paper.  
Sixth, Barrera-Osorio et al. (2011) check whether their baseline and follow-up surveys induce a bias due to attrition. They conclude that attrition occurs similarly across treatment and control groups, which implies that the results should not be biased.  

One problem with their analysis is that the tertiary treatment ends up being more generous than the basic and savings treatments. This makes it difficult to compare the treatments, and the effect of the tertiary treatment might be biased upwards relative to the effects of the other two treatments.  
Next, they cannot rely on random treatment assignment for comparisons between the treatments in the two experiments. This might bias the results if there are systematic differences in characteristics that affect both treatment assignment and the outcome variable. One indication of this is that the difference in magnitude between the basic and savings treatment effects on re-enrollment is statistically significant in the regression that only considers the experiment in San Cristobal, but not in the regression that includes both localities.  
Another problem is that the administrative enrollment data could not be matched to 9.3% of the students in the experiment in San Cristobal and to 8.5% of the students in Suba. If the children whose data could not be matched have characteristics that systematically affect their enrollment rate, this creates a bias.  
The results for the effects on graduation and tertiary enrollment rely on survey-based data. Since individuals might lie or misreport information, the results could be biased.  
Lastly, the sample size for the tertiary treatment is not very large, which is another weakness of the paper.

---
# 6. Extensions
---

## 6.1. Check for Balanced Groups across Experiments
As mentioned above, when the experiments in the two localities are considered together in order to compare the effects of all three treatments, the causal graph has three back-door paths that have to be eliminated. Treatment assignment is then not completely random, since it is not random whether a person lives in Suba or San Cristobal. Observable or unobservable factors may affect where a family lives and therefore affect treatment assignment and the outcome at the same time. This can be seen clearly when comparing the characteristics of all three treatment groups to each other and to the group of untreated individuals in both localities together. Table 7 shows these comparisons. The first column shows the mean for all untreated children in the sample. The following columns present the differences between the corresponding groups. For each mean and difference, the standard error is provided in the row below, denoted by 'SE'.

#### Table 7 - Comparisons of Students across Experiments

In [6]:
create_table7(sample)

| | Control average | Basic-Control | Savings-Control | Tertiary-Control | Basic-Savings | Basic-Tertiary | Savings-Tertiary |
|---|---|---|---|---|---|---|---|
| Possessions | 1.91 | 0.06 | 0.02 | -0.01 | 0.03 | 0.07 | 0.04 |
| Possessions SE | 1.08 | 0.02 | 0.02 | 0.04 | 0.02 | 0.05 | 0.05 |
| Utilities | 4.7 | -0.07 | 0.01 | 0.19 | -0.08 | -0.25 | -0.18 |
| Utilities SE | 1.39 | 0.03 | 0.03 | 0.06 | 0.03 | 0.08 | 0.07 |
| Durable Goods | 1.44 | -0.09 | -0.06 | 0.21 | -0.03 | -0.3 | -0.27 |
| Durable Goods SE | 0.89 | 0.02 | 0.02 | 0.03 | 0.02 | 0.04 | 0.04 |
| Physical Infrastructure | 11.78 | -0.18 | -0.09 | 0.31 | -0.09 | -0.48 | -0.4 |
| Physical Infrastructure SE | 1.7 | 0.04 | 0.03 | 0.06 | 0.04 | 0.07 | 0.06 |
| Age | 14.71 | -0.24 | -0.39 | 0.9 | 0.16 | -1.13 | -1.29 |
| Age SE | 5.08 | 0.13 | 0.14 | 0.2 | 0.17 | 0.25 | 0.21 |


The table provides 90 differences. Of these, 45 are statistically significant at the 1 percent level, 57 at the 5 percent level and 59 at the 10 percent level. Between the basic and the savings treatment groups only 2 differences are statistically significant (1 at the 1 percent level and 2 at the 5 percent level), as we have already seen in Table 2. Of the remaining 75 differences, 57 are statistically significant at the 10 percent level, 55 at the 5 percent level and 44 at the 1 percent level. One can conclude that the treatment and control groups are not similar with regard to these characteristics. This suggests that simple estimates from regressing the outcome variables on the treatment indicators might be biased and that the back-door paths have to be eliminated.
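
The logic behind such a balance table can be sketched as follows. This is a simplified illustration and not the implementation of `create_table7`: I assume unequal-variance (Welch) standard errors and a normal approximation for the p-values, whereas the actual table may rely on a different variance estimator. The helper names `diff_in_means` and `count_significant` and the toy ages are hypothetical.

```python
from math import erfc, sqrt
from statistics import mean, variance


def diff_in_means(a, b):
    """Difference in means with a Welch standard error.

    The two-sided p-value uses the normal approximation, which is crude
    for small groups and only meant as an illustration.
    """
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    p_value = erfc(abs(diff / se) / sqrt(2))
    return diff, se, p_value


def count_significant(p_values, levels=(0.01, 0.05, 0.10)):
    """Cumulative count of p-values below each significance level."""
    return {level: sum(p < level for p in p_values) for level in levels}


# Toy ages for two groups, NOT taken from the experimental data.
tertiary_age = [15, 16, 17, 16, 15, 17, 16, 16]
control_age = [14, 15, 14, 13, 15, 14, 14, 15]

diff, se, p = diff_in_means(tertiary_age, control_age)
print(f"difference = {diff:.2f}, SE = {se:.2f}, p = {p:.4f}")
print(count_significant([0.001, 0.03, 0.07, 0.5]))
```

The counts are cumulative, in the same way as the numbers reported above: a difference that is significant at the 1 percent level is also counted at the 5 and 10 percent levels.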

## 6.2. Eliminate Back-door Paths Controlling only for Locality
In order to eliminate the back-door paths, Barrera-Osorio et al. (2011) control for the locality of the households and a large set of observable demographic characteristics. However, all three back-door paths should be eliminated by simply controlling for the locality in which the family lives, since treatment assignment is random within each locality. In the following, I estimate the effects of all three treatments together on attendance and enrollment rates, controlling only for the locality. Table 8 shows the results of this regression. I compare my results with those presented in the last columns of Table 3 and Table 4, respectively.
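
The argument can be illustrated with simulated data. The sketch below is not the regression behind `create_table8`. It assumes a stylized setting in which the locality shifts the outcome level and determines which treatments are available, while assignment is random within each locality. All effect sizes, the coding of the `suba` dummy and the helper `ols` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical coding: 1 = Suba, 0 = San Cristobal.
suba = rng.integers(0, 2, n)
u = rng.random(n)

# Random assignment within each locality (stylized shares).
basic = ((suba == 0) & (u < 1 / 3)).astype(float)
savings = ((suba == 0) & (u >= 1 / 3) & (u < 2 / 3)).astype(float)
tertiary = ((suba == 1) & (u < 0.5)).astype(float)

# Made-up outcome: the locality shifts the level, the treatments add to it.
y = (0.75 + 0.08 * suba + 0.03 * basic + 0.03 * savings
     + 0.05 * tertiary + rng.normal(0, 0.1, n))


def ols(y, *columns):
    """OLS coefficients of y on a constant and the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


naive = ols(y, basic, savings, tertiary)           # ignores the locality
adjusted = ols(y, basic, savings, tertiary, suba)  # controls for the locality

print(f"tertiary effect, naive:    {naive[3]:.3f}")
print(f"tertiary effect, adjusted: {adjusted[3]:.3f}")
```

In this simulation the naive estimate of the tertiary effect absorbs part of the locality difference, while the estimate from the regression with the locality dummy is close to the simulated effect of 0.05.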

#### Table 8 - Effects on Attendance and Re-enrollment only Controlling for the Locality

In [5]:
create_table8(sample)

| | Attendance | Enrollment |
|---|---|---|
| Basic treatment | 0.023 | 0.004 |
| Basic treatment SE | 0.011 | 0.014 |
| Savings treatment | 0.028 | 0.024 |
| Savings treatment SE | 0.011 | 0.019 |
| Tertiary treatment | 0.052 | 0.042 |
| Tertiary treatment SE | 0.017 | 0.022 |
| H0: Basic-Savings F-Stat | 0.089 | 0.932 |
| p-value | 0.767 | 0.335 |
| H0: Tertiary-Basic F-Stat | 1.965 | 1.978 |
| p-value | 0.168 | 0.161 |


These estimates are very similar to those presented in the last columns of Table 3 and Table 4, respectively. For example, the tertiary treatment is estimated here to increase the attendance rate by 5.2 percentage points, while Table 3 shows an effect of 5.5 percentage points. The largest difference occurs for the effect of the savings treatment on re-enrollment, which is an increase of 2.4 percentage points here compared to 3 percentage points when we control for demographics and include school fixed effects. Moreover, the results are statistically significant at the same significance levels as in Tables 3 and 4. The tests for differences between the treatment effects differ more strongly, but in none of the cases are the differences statistically significant. One can conclude that including demographic characteristics and school fixed effects in addition to controlling for the locality in which the family lives does not materially change the results of the regression with all three treatments. This can be explained by the fact that the back-door paths are already blocked by adding the locality to the regression.
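
The tests for equal treatment effects reported in the lower part of the table are tests of a single linear restriction on the regression coefficients. The following sketch shows how such an F-statistic can be computed by hand. It is an illustration under simplifying assumptions and not the code used for Table 8: I assume a homoskedastic covariance matrix (no clustering) and approximate the F(1, n-k) distribution by a chi-squared distribution with one degree of freedom, which is reasonable only in large samples. The function `equality_test` and the simulated data are hypothetical.

```python
from math import erfc, sqrt

import numpy as np


def equality_test(y, X, i, j):
    """Test H0: beta_i == beta_j in an OLS regression of y on X.

    X must already contain the constant. Returns the F-statistic and a
    large-sample p-value based on the chi-squared(1) approximation.
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * xtx_inv          # homoskedastic covariance (assumption)
    r = np.zeros(k)
    r[i], r[j] = 1.0, -1.0          # restriction: beta_i - beta_j = 0
    f_stat = (r @ beta) ** 2 / (r @ cov @ r)
    p_value = erfc(sqrt(f_stat / 2))
    return f_stat, p_value


# Simulated data with made-up effects, NOT the experimental sample.
rng = np.random.default_rng(1)
n = 5_000
group = rng.integers(0, 3, n)       # 0 = control, 1 = basic, 2 = savings
basic = (group == 1).astype(float)
savings = (group == 2).astype(float)
noise = rng.normal(0, 0.1, n)
X = np.column_stack([np.ones(n), basic, savings])

y_same = 0.8 + 0.03 * basic + 0.03 * savings + noise  # equal effects
y_diff = 0.8 + 0.00 * basic + 0.10 * savings + noise  # different effects

f_same, p_same = equality_test(y_same, X, 1, 2)
f_diff, p_diff = equality_test(y_diff, X, 1, 2)
print(f"equal effects:     F = {f_same:.3f}, p = {p_same:.3f}")
print(f"different effects: F = {f_diff:.3f}, p = {p_diff:.3g}")
```

With clustered standard errors, as is common in this type of experiment, the covariance matrix and the degrees of freedom would differ, so the p-values from this sketch should not be expected to match the reported ones exactly.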

---
# 7. Conclusion
---

The results in this notebook support the findings reported in the paper by Barrera-Osorio et al. (2011). I reproduced the results precisely for almost all tables, except for the third and sixth columns of Table 6, for which I could only produce similar results. In addition to replicating the main results, I critically discuss the quality of their strategy and estimations and evaluate the robustness of their results. Comparing mean characteristics between the research groups across the two experiments, I find that the groups are different. This indicates that their identification assumption, namely that treatments are assigned randomly, is violated when the two experiments are pooled, and that estimates from the regressions in which all three treatments are included might be biased. Barrera-Osorio et al. (2011) include demographic controls and school fixed effects in the regression in order to rule out potential biases. However, evaluating the causal graph I provide in section 2, one can conclude that including the locality in the regression equation eliminates all three back-door paths. Running this regression, I find results similar to those of Barrera-Osorio et al. (2011). Nevertheless, the fact that the estimated effects of the basic and savings treatments are very different depending on whether the tertiary treatment is included in the regression indicates a bias. Further analysis of the differences between the basic and savings treatments and the tertiary treatment might help to provide more precise policy implications.  

Additionally, the study focuses only on the effects of the conditional cash transfers on the children included in the experiment in the two localities. It might be the case that the children who registered and were eligible for the experiment have better outcomes than those who did not register or were not eligible. This raises the question of whether the results are externally valid. Further research may address this issue.  

Overall, the findings from Barrera-Osorio et al. (2011) offer credible results and policy implications regarding the design of education-based conditional cash transfers.

---
# 8. References
---

* **Barrera-Osorio, Felipe, Marianne Bertrand, Leigh L. Linden, and Francisco Perez-Calle (2011)**. "Improving the Design of Conditional Transfer Programs: Evidence from a Randomized Education Experiment in Colombia." *American Economic Journal: Applied Economics*, 3 (2): 167-95. 