<center>OSE data science, Prof. Dr. Philipp Eisenhauer | Summer 2021, M.Sc. Economics, Bonn University | Sven Jacobs</center> 

***
    
# <center>Replication of Angrist and Lavy (1999)</center>

This notebook replicates the core results of the paper
> Angrist, J. and V. Lavy (1999). "[Using Maimonides' rule to estimate the effect of class size on scholastic achievement](https://economics.mit.edu/files/8273)". *The Quarterly Journal of Economics* 114 (2), pp. 533–575.

Introductory remarks:
* To access the data provided by the authors, the outsourced $\mathsf{R}$ code and further materials, please consult the [README](https://github.com/OpenSourceEconomics/ose-data-science-course-project-svjaco/blob/master/README.md) on GitHub.
* All replicated tables and figures are named and labeled as they appear in the original paper. 
* Individual contributions that extend the paper are clearly indicated as such. In particular, figures and tables that constitute an independent contribution have the tag "EXT" in their name. For instance: "Figure EXT_1". Moreover, in Section [6](#Maimonides_rule_redux) we replicate relevant parts of a recent follow-up paper by the authors, "Maimonides' rule redux" (Angrist et al., [2019](#Angrist_2019)). For transparency, these parts are tagged with "MRR", as in "Figure MRR_1".
* For the replication of the regression results we removed one observation from the sample of 5th graders. One class took only the reading test and thus has a missing value for the math score, which caused problems in the code for the cluster correction. Estimations involving the average reading score therefore use 2018 instead of 2019 classes. This has practically no influence on the results: estimated coefficients occasionally differ slightly at the third decimal place, and standard errors are unchanged.

Preparation:

In [2]:
# Load packages
library(tidyverse) # Collection of packages for data science
library(ggpubr)    # Customizing ggplots
library(stargazer) # Well-formatted regression and summary statistics tables
library(IRdisplay) # Front-end package for Jupyter (to display HTML code)
library(testthat)  # Provides the capture_output function
library(estimatr)  # Cluster-robust OLS and 2SLS estimation
library(qpcR)      # Provides the RMSE function
library(AER)       # Provides the ivreg function for 2SLS estimation (supported by stargazer)
# library(rddensity) # Manipulation testing based on density discontinuity

# Load data
grade4 <- read_csv("data/final4_cleaned.csv", col_types = cols())
grade5 <- read_csv("data/final5_cleaned.csv", col_types = cols())
grade5_reg <- grade5[-1501, ] # Drop class with missing value for math test (see introductory remark above)

# Import functions
source("R_code/general_functions.R")
source("R_code/replication_tables.R")
source("R_code/replication_plots.R")
source("R_code/extension_plots.R")
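The row removed above is selected by position. A less position-dependent alternative is to filter on the missing math score itself. The following is only a sketch on made-up toy data, assuming the column name `avgmath` from the variable table in Section 2.1. It is not the code used for the results in this notebook.

```r
# Toy stand-in for the 5th-grade data (values are invented for illustration);
# column names follow the variable table in Section 2.1.
toy_grade5 <- data.frame(
  classsize = c(30, 28, 35, 22),
  avgread   = c(74.1, 70.3, 78.9, 69.5),
  avgmath   = c(67.2, NA, 71.0, 64.8)  # second class took only the reading test
)

# Keep only classes with an observed average math score
toy_grade5_reg <- toy_grade5[!is.na(toy_grade5$avgmath), ]

nrow(toy_grade5)      # 4 classes before filtering
nrow(toy_grade5_reg)  # 3 classes after filtering
```

Filtering on the condition keeps working if the row order of the data file changes, whereas a hard-coded row index would silently drop a different class.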

<p style="padding: 10px; border: 1px solid black;">
<strong>Note:</strong>
The package "rddensity" is currently not available on any conda channel. Hence, it is not installed in the conda environment. To run the code producing Figure <a href="#figure4_EXT" class="link">EXT_4</a>, the package has to be installed manually from CRAN. I am working on building the package with <code>conda skeleton cran</code> so that it can be installed with conda.
</p>

## Contents

[1 Introduction](#Introduction) <br>
[2 Data and descriptive statistics](#Data_and_descriptive_statistics) <br>
$\enspace$ [2.1 Data and key variables](#Data_and_key_variables) <br>
$\enspace$ [2.2 Descriptive statistics](#Descriptive_statistics) <br>
[3 Identification](#Identification) <br>
[4 Empirical strategy](#Empirical_strategy) <br>
[5 Replication of core results](#Replication_of_core_results) <br>
$\enspace$ [5.1 Graphical analysis](#Graphical_analysis) <br>
$\enspace$ [5.2 Estimation results](#Estimation_results) <br>
[6 Maimonides' rule redux](#Maimonides_rule_redux) <br>
[7 Critical assessment and conclusion](#Critical_assessment_and_conclusion) <br>
[Appendix](#Appendix) <br>
[References](#References)

<a id="Introduction"></a>
***
## 1 Introduction
***

Class size reduction (CSR) is very popular among teachers and parents alike.
The conventional belief is that smaller classes benefit children because teachers can devote more time to each student.
Also, there may be fewer disruptions during lessons and teachers can more easily engage students in academic activities (Hattie, [2005](#Hattie_2005)).
However, reducing class sizes is very expensive.
Additional classrooms have to be provided, but more importantly the newly hired teachers increase total educational expenditures substantially.
The CSR program enacted in California in 1996 had annual costs exceeding one billion dollars, and 25,000 new teaching positions had to be created in the first two years (Jepsen and Rivkin, [2009](#Jepsen_2009), p. 224).
These teachers then also have to be equally qualified.
Otherwise potential gains of smaller classes might be offset, as revealed in the case of California by Jepsen and Rivkin ([2009](#Jepsen_2009)).

Given the high costs and limited funding, empirical evidence on the consequences of changing class size is needed.
The class size question is one of the central research questions in economics of education with several studies published over the years.
Since class size is in general not randomly assigned but endogenous (correlated with unobserved characteristics of students and schools), causal effects of class size on achievement have proved very difficult to identify.

One approach to address the issue of endogeneity/selection bias is to run a randomized experiment where children are randomly assigned to classes of different sizes.
The most prominent example is the Tennessee Student/Teacher Achievement Ratio (STAR) experiment from the late eighties, for which post-analyses have found substantial positive effects of CSR (see Krueger ([1999](#Krueger_1999)), and Chetty et al. ([2011](#Chetty_2011)) for an evaluation of long-term impacts).
Researcher-designed randomized experiments of class size, however, are rare.
As mentioned above, they involve large costs and may also face political and ethical barriers.
Therefore, research relies on natural experiments that create some kind of variation in class sizes.
Over time, the corresponding studies have reported mixed results.
Some find positive effects of CSR as in the STAR experiment (e. g., Gary-Bobo and Mahjoub, [2013](#Gary-Bobo_2013)), while others find either no association (e. g., Hoxby, [2000](#Hoxby_2000)) or even that larger classes are beneficial (Dobbelsteen et al., [2002](#Dobbelsteen_2002)).

In 1999 Angrist and Lavy published a seminal paper in which they estimate the effect of class size on scholastic achievement, measured by standardized math and reading tests, in Israeli elementary schools.
The authors were the first to recognize that mandatory class size caps might serve as a source of exogenous variation in class size.
In Israel, classes are capped at 40 students according to the so-called Maimonides' rule, named after the medieval rabbinic scholar Maimonides.
This rule induces a nonlinear and nonmonotonic relationship between grade enrollment and class size.
That is, class size increases one-for-one with enrollment up to 40, but when 41 students are enrolled there are supposed to be two classes of sizes 20 and 21.
The same applies at each multiple of 40 (e. g., 81 pupils are split into three classes of size 27).
Angrist and Lavy ([1999](#Angrist_1999)) exploit these discontinuities in the relationship between enrollment and class size to construct instrumental variables (IV) estimates in a fuzzy regression discontinuity design (RDD).
The RDD is fuzzy since the schools do not follow Maimonides' rule strictly.
The estimates suggest that CSR induces a significant increase in test scores with an effect size at the lower range of the strong STAR findings.

In a recently published paper ("Maimonides' rule redux", [2019](#Angrist_2019)) Angrist et al. revisit the original results and redo the analysis for more recent data with a larger sample size.
The newer estimates show no evidence of class size effects.
Additionally, the data reveal enrollment manipulation.
Both findings cast doubt on the earlier inference.

The structure of the notebook is as follows.
The next section describes the Israeli test score data and its key variables.
In Section [3](#Identification) we illustrate the authors' strategy to identify the causal effect of class size using the causal graph approach.
Then, the empirical strategy is presented.
In Section [5](#Replication_of_core_results) we replicate selected core results of the paper and examine their robustness in Section [6](#Maimonides_rule_redux).
Finally, we critically assess the quality of Angrist and Lavy ([1999](#Angrist_1999)) and conclude in Section [7](#Critical_assessment_and_conclusion).

<a id="Data_and_descriptive_statistics"></a>
***
## 2 Data and descriptive statistics
***

<a id="Data_and_key_variables"></a>
### 2.1 Data and key variables 

The original data used by Angrist and Lavy ([1999](#Angrist_1999)) contain information on Israeli classes in the third, fourth and fifth grade (about 2000 classes per grade).
Micro data were only available for third graders.
The main variables and their definition are presented in the following table.

| Variable   |                        | Definition                                                         |
|:-----------|:-----------------------|:-------------------------------------------------------------------|
| classsize  | Class size             | Number of students in class in the spring                          |
| enrollment | Enrollment             | September grade enrollment                                         |
| pct_disadv | Percent disadvantaged  | Percent of students in the school from "disadvantaged backgrounds" |
| avgread    | Average reading/verbal | Average composite reading score in the class                       |
| avgmath    | Average math           | Average composite math score in the class                          |
| readsize   | Reading size           | Number of students who took the reading test                       |
| mathsize   | Math size              | Number of students who took the math test                          |

The test scores come from a national testing program conducted in 1991 and 1992.
In June 1991, all fourth and fifth graders were given standardized tests to evaluate mathematics and (Hebrew) reading skills.
The scores are calculated as a composite (selected basic and all advanced questions) with a scale from 1 to 100.
Similar tests were taken by third graders in 1992.
The data sets contain a PD (percent disadvantaged) index for each school.
It is the average of the students' socioeconomic status, which is measured by fathers' education and continent of birth as well as family size.
Schools with more disadvantaged students receive more funding per student.

Angrist and Lavy ([1999](#Angrist_1999)) restrict their study to students in Jewish public schools (including both secular and religious school types).
This excludes Arab schools (no PD index available) and independent religious schools (curriculum differs considerably).

The final data link information from different sources.
Except for grade enrollment all information is from an administrative source (mainly the Ministry of Education).
Enrollment at the beginning of the school year in September, however, is reported by school officials to the Central Bureau of Statistics.
This knowledge is crucial for a later investigation of manipulative behavior.
Further details are provided in the Data Appendix of Angrist and Lavy ([1999](#Angrist_1999)).

Most of the analysis excludes the third graders.
Also, the corresponding data set is not provided by the authors.
We explain the reason for this in Appendix [1](#Appendix_1).
Briefly, scores for the third grade are much higher and there exists strong evidence for a compromise of the test program.

<a id="Descriptive_statistics"></a>
### 2.2 Descriptive statistics

As reported in Panel A of Table [I](#table1) the sample consists of 2019 5th grade and 2049 4th grade classes in approximately 1000 schools.
The average class size is about 30, and average grade enrollment is about 78 pupils.
Ten percent of classes have more than 37 pupils.
On average 14% of the students are from a disadvantaged background as defined by the PD index, but the variation is large.
In both grades, average math scores are lower and more dispersed than average reading scores.
Overall, the mean score distributions are similar, which is also visible in Figure [EXT_1](#figure1_EXT).

Panel B shows the same statistics, but for a specific subsample of the original data.
Only schools with enrollment close to the discontinuities (40, 80, 120) are considered.
Less than one quarter of the classes fall into this +/- 5 discontinuity sample, where the average class size is slightly larger.
Otherwise, it does not appear that these schools are in some way special (almost no difference in the score distribution).
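The statistics reported in Table I (mean, standard deviation and the 0.1, 0.25, 0.5, 0.75 and 0.9 quantiles) can be computed per variable along the following lines. This is a minimal sketch: `describe_var` is a hypothetical helper written for illustration and not the `table1` function from the outsourced code, and the input values are made up.

```r
# Hypothetical helper: mean, standard deviation and selected quantiles of one variable
describe_var <- function(x, probs = c(0.10, 0.25, 0.50, 0.75, 0.90)) {
  stats <- c(
    mean(x, na.rm = TRUE),
    sd(x, na.rm = TRUE),
    quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  )
  names(stats) <- c("Mean", "S.D.", as.character(probs))
  round(stats, 1)
}

# Made-up class sizes, including one missing value
toy_classsize <- c(21, 26, 31, 35, 38, 30, 29, NA)
describe_var(toy_classsize)
```

Missing values are dropped inside the helper, so a variable such as the math score can be summarized even if one class did not take the test.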

<a id="table1"></a>
<center>TABLE I <br> Unweighted Descriptive Statistics</center>

A. Full sample

In [3]:
table1(data = grade5)
table1(data = grade4)

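5th grade (the last five columns are quantiles)

| Variable              | Mean | S.D. | 0.10 | 0.25 | 0.50 | 0.75  | 0.90  |
|:----------------------|-----:|-----:|-----:|-----:|-----:|------:|------:|
| Class Size            | 29.9 | 6.5  | 21.0 | 26.0 | 31.0 | 35.0  | 38.0  |
| Enrollment            | 77.7 | 38.8 | 31.0 | 50.0 | 72.0 | 100.0 | 128.0 |
| Percent disadvantaged | 14.1 | 13.5 | 2.0  | 4.0  | 10.0 | 19.5  | 35.0  |
| Reading size          | 27.3 | 6.6  | 19.0 | 23.0 | 28.0 | 32.0  | 36.0  |
| Math size             | 27.7 | 6.6  | 19.0 | 23.0 | 28.0 | 33.0  | 36.0  |
| Average verbal        | 74.4 | 7.7  | 64.2 | 69.9 | 75.4 | 79.8  | 83.3  |
| Average math          | 67.3 | 9.6  | 54.9 | 61.1 | 67.8 | 74.1  | 79.4  |

4th grade (the last five columns are quantiles)

| Variable              | Mean | S.D. | 0.10 | 0.25 | 0.50 | 0.75  | 0.90  |
|:----------------------|-----:|-----:|-----:|-----:|-----:|------:|------:|
| Class Size            | 30.3 | 6.3  | 22.0 | 26.0 | 31.0 | 35.0  | 38.0  |
| Enrollment            | 78.3 | 37.7 | 30.8 | 51.0 | 74.0 | 101.0 | 127.0 |
| Percent disadvantaged | 13.8 | 13.4 | 2.0  | 4.0  | 9.0  | 19.0  | 35.0  |
| Reading size          | 27.7 | 6.5  | 19.0 | 24.0 | 28.0 | 32.0  | 36.0  |
| Math size             | 28.1 | 6.5  | 19.0 | 24.0 | 29.0 | 33.0  | 36.0  |
| Average verbal        | 72.5 | 8.0  | 62.2 | 67.7 | 73.3 | 78.2  | 82.0  |
| Average math          | 68.9 | 8.8  | 57.5 | 63.6 | 69.3 | 75.0  | 79.4  |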


B. +/- 5 Discontinuity sample (enrollment 36–45, 76–85, 116–125), **Extended**

In [4]:
table1(data = grade5_sub_sample_5)
table1(data = grade4_sub_sample_5)

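5th grade (the last five columns are quantiles)

| Variable              | Mean | S.D. | 0.10 | 0.25 | 0.50 | 0.75  | 0.90  |
|:----------------------|-----:|-----:|-----:|-----:|-----:|------:|------:|
| Class Size            | 30.8 | 7.4  | 21.0 | 24.0 | 31.0 | 38.0  | 40.0  |
| Enrollment            | 76.4 | 29.5 | 41.0 | 43.0 | 79.0 | 85.0  | 120.0 |
| Percent disadvantaged | 13.6 | 13.2 | 2.0  | 4.0  | 10.0 | 17.0  | 36.0  |
| Reading size          | 28.1 | 7.3  | 18.0 | 22.0 | 28.0 | 35.0  | 38.0  |
| Math size             | 28.5 | 7.4  | 18.0 | 22.0 | 28.0 | 35.0  | 38.0  |
| Average verbal        | 74.5 | 8.2  | 63.8 | 69.7 | 75.6 | 80.5  | 83.6  |
| Average math          | 67.0 | 10.2 | 54.6 | 60.9 | 67.4 | 73.7  | 80.0  |

4th grade (the last five columns are quantiles)

| Variable              | Mean | S.D. | 0.10 | 0.25 | 0.50 | 0.75  | 0.90  |
|:----------------------|-----:|-----:|-----:|-----:|-----:|------:|------:|
| Class Size            | 31.1 | 7.2  | 21.0 | 25.0 | 32.0 | 38.0  | 40.0  |
| Enrollment            | 78.5 | 30.0 | 41.0 | 43.0 | 80.0 | 116.0 | 120.0 |
| Percent disadvantaged | 12.9 | 12.3 | 1.0  | 4.0  | 9.0  | 17.5  | 32.0  |
| Reading size          | 28.3 | 7.7  | 18.0 | 22.0 | 28.0 | 35.0  | 38.0  |
| Math size             | 28.7 | 7.7  | 18.0 | 23.0 | 29.0 | 35.0  | 38.0  |
| Average verbal        | 72.5 | 7.8  | 62.0 | 67.0 | 73.3 | 78.3  | 81.7  |
| Average math          | 68.7 | 9.1  | 56.9 | 62.7 | 69.3 | 75.4  | 79.7  |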


In [5]:
# figure1_EXT()

<figure>
<center>
    <img src="materials/figures/figure1_EXT.png" width="600" />
    <figcaption>FIGURE EXT_1 <br>
        Average Test Scores in Math and Reading Compared between 4th and 5th Graders
    </figcaption>
<a id="figure1_EXT"></a>
</center>
</figure>

<a id="Identification"></a>
***
## 3 Identification
***

Even though the question "What is the effect of class size on student performance?" may at first glance appear simple to answer, past research has shown that capturing the net (causal) effect raises several identification issues.

The fundamental problem is that class size (the treatment) is not randomly assigned.
Thus, classes of different sizes can be expected to differ considerably in their composition and a naïve comparison of achievement would not be valid for inference due to the selection bias.
Selection bias is likely to arise in two ways.
On the one hand, certain parents (e. g., more educated, ambitious and/or wealthier) may actively seek to place their children in schools offering smaller classes.
On the other hand, principals tend to group the least able students into smaller classes (Cohen-Zada et al., [2013](#Cohen-Zada_2013)).
In the first case we would observe spuriously positive effects of smaller classes, in the second spuriously lower achievement in smaller classes.
To credibly account for the selection bias, exogenous variation in class size is needed, i. e., variation that is beyond the control of any involved stakeholder (students, teachers and school administrators).
Angrist and Lavy ([1999](#Angrist_1999)) rely on the Maimonides rule as the source of variation.

Figure [EXT_2a](#figure2_EXTa) summarizes the underlying identification strategy in the form of a causal graph.
We can see that the effect of interest, class size on test score (the outcome), is confounded by two observables, total grade enrollment and socioeconomic status (measured as the share of disadvantaged students at school).
The linkage between socioeconomic status and academic performance is well known in the literature (e. g., Saifi and Mehmood, [2011](#Saifi_2011)).
Enrollment correlates with the outcome independently of its effect on class size as larger schools in Israel are more likely to be located in relatively prosperous urban centers with a better intake of students (Angrist and Lavy, [1999](#Angrist_1999), p. 544).
In addition, the relationship between class size and achievement is affected by further confounding factors that are unobservable (e. g., innate ability or parental preferences for child education).
Because conditioning identifies causal effects only when all confounders are observed, the backdoor criterion is not satisfied (open path: Class size $\leftarrow$ Unobservables $\rightarrow$ Test score) and the causal effect is not identifiable by conditioning alone.

However, if Maimonides' rule constitutes a valid IV as argued by the authors, we can nevertheless identify the effect in the given setting. Note that, as apparent in Figure [EXT_2a](#figure2_EXTa), the instrument is conditional on enrollment.
A valid IV has to fulfill two assumptions (see, e. g., Angrist and Pischke, [2009](#Angrist_2009), Chapter 4.1): relevance and the exclusion restriction.

$\enspace$ (i) Relevance: Maimonides' rule is correlated with class size, Corr( Maimonides' rule, Class size | Enrollment ) $\neq$ 0 

From the nature of the rule it is intuitive that actual class size is determined to some degree by the instrument.
We will see later, graphically as well as quantitatively, that Maimonides' rule indeed creates a strong first stage.

$\enspace$ (ii) Exclusion restriction: Maimonides' rule affects the score only through class size, in particular, Corr( Maimonides' rule, Unobservables | Enrollment ) $=$ 0

The exclusion restriction cannot be tested with the data (e. g., Morgan and Winship, [2014](#Morgan_2014), Chapter 9).
Therefore, it has to be argued verbally that Maimonides' rule does not cause achievement through another channel.
Figure [EXT_2b](#figure2_EXTb) demonstrates a scenario where the instrument is correlated with unobservables.
This can happen if enrollment (the running variable in the RDD) is manipulated and children are successfully placed in schools with grade enrollment just above the cutoff.
As a consequence, the identification strategy described above would break down.
Angrist and Lavy ([1999](#Angrist_1999)) argue that in practice parents cannot predict, when registering their child, what enrollment will be by the time school starts (p. 550).
And even if they could, Israeli pupils must attend a school in their local registration area, which typically includes only one religious and one secular school (p. 542).
Still, the exclusion restriction remains an untestable identifying assumption.
We will investigate potential indications of a violation of this assumption in Section [6](#Maimonides_rule_redux).

In summary, Angrist and Lavy ([1999](#Angrist_1999)) present a credible identification strategy to determine the causal effect of CSR.

<figure>
<center>
    <img src="causal_graph/causal_graph.png" width="700" />
    <figcaption>FIGURE EXT_2a <br>
        Causal Graph Illustrating the Identification Strategy in Angrist and Lavy (<a href="#Angrist_1999" class="link">1999</a>)
    </figcaption>
<a id="figure2_EXTa"></a>
</center>
</figure>

<figure>
<center>
    <img src="causal_graph/causal_graph_alternative.png" width="700" />
    <figcaption>FIGURE EXT_2b <br>
        Alternative Causal Graph
    </figcaption>
<a id="figure2_EXTb"></a>
</center>
</figure>

<a id="Empirical_strategy"></a>
***
## 4 Empirical strategy
***

Although the available data for the fourth and fifth graders are at the class level, a model for individual test scores is used as a starting point to describe the causal relationships to be estimated:

<a id="eq_1"></a>
\begin{align}
    y_{isc} = X^\top_s \beta + \alpha n_{sc} + \mu_c + \eta_s + \epsilon_{isc} \, , \tag{1}
\end{align}

where $y_{isc}$ is the score (math or reading) for student $i$ in class $c$ and school $s$.
The vector $X_s$ consists of school characteristics (including functions of enrollment and most often the PD index), $n_{sc}$ is the class size and $\mu_c$, $\eta_s$ and $\epsilon_{isc}$ are random error components.
The i.i.d. errors $\mu_c$ and $\eta_s$ reflect possible within-class and within-school correlations of test scores, respectively.
Lastly, the remaining error component $\epsilon_{isc}$ is pupil-specific.
The coefficient of interest is $\alpha$.

Angrist and Lavy ([1999](#Angrist_1999)) interpret Equation [(1)](#eq_1) as a description of the average potential outcomes of students under alternative assignments of $n_{sc}$ (class size), controlling for any effects of $X_s$ (school characteristics).
However, in practice only one potential outcome is ever observed ("fundamental problem of causal inference").
Equation [(1)](#eq_1) assumes a linear causal response function with a constant class size coefficient.
Such a response function, i. e., linear and homogeneous, is highly restrictive and almost certainly not an accurate description of the true (complex) structure.
Angrist and Lavy ([1999](#Angrist_1999)) discuss this in Section V.

Since the micro data are not available, Equation [(1)](#eq_1) needs to be aggregated:

<a id="eq_2"></a>
\begin{align}
    \bar{y}_{sc} = X^\top_s \beta + \alpha n_{sc} + \eta_s + [ \mu_c + \bar{\epsilon}_{sc} ] \, . \tag{2}
\end{align}

The outcome $\bar{y}_{sc}$ is now the average test score of class $c$ in school $s$ with $[ \mu_c + \bar{\epsilon}_{sc} ]$ as the class-level error term.
Equation [(2)](#eq_2) is then used for OLS estimates and as the second stage for the IV estimation.
Regarding the error components, there are two aspects.
First, due to the random-effects error structure weighted least squares with class sizes as weights does not yield the efficient generalized least squares (GLS) estimator. Thus, Angrist and Lavy ([1999](#Angrist_1999)) treat the grouped errors as homoskedastic and report conventional/unweighted estimates.
Second, to adjust the standard errors for clustering (correlation between classes within schools) the authors rely on the Moulton factor (Moulton, [1986](#Moulton_1986)).
Rather than reporting these standard errors, in our replication we make use of a more modern and efficient cluster adjustment approach (for the exact formula consult the $\mathsf{R}$ code and the [mathematical appendix](https://declaredesign.org/r/estimatr/articles/mathematical-notes.html) of the "estimatr" package).
The resulting standard errors are in general slightly larger than the Moulton ones.

As mentioned above, Maimonides' rule creates a fuzzy regression discontinuity design with discontinuities at integer multiples of 40 in enrollment.
This leads to a two-stage least squares (2SLS) estimation strategy where IV estimates of Equation [(2)](#eq_2) exploit the sharp drops in Maimonides' rule.
To identify the causal effect of class size any other effects of enrollment (the running variable) on test scores have to be controlled for.
This is why Angrist and Lavy ([1999](#Angrist_1999)) include different smooth functions of enrollment in the estimation procedure (linear, quadratic and piecewise linear).
The first stage of [(2)](#eq_2) is given by 

<a id="eq_3"></a>
\begin{align}
    n_{sc} = X^\top_s \pi_0 + \pi_1 \text{f}_{sc} + \xi_{sc} \tag{3}
\end{align}

with $\pi_0$ and $\pi_1$ being parameters and $\xi_{sc}$ as the regression error term.
The instrument $\text{f}_{sc}$ is the class size function (or predicted class size) induced by Maimonides' rule.
The function can be stated formally as

\begin{align*}
    \text{f}_{sc} = \frac{e_s}{\text{int} \left( \frac{e_s - 1}{40} \right) + 1}
\end{align*}

with $e_s$ as the beginning-of-the-year enrollment in school $s$ in a given grade.
The function is depicted in Figure [I](#figure1) below.
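To make the class size function concrete, it can be written in a few lines of $\mathsf{R}$. This is a minimal sketch: the name `maimonides_rule` and the `cap` argument are chosen here for illustration and are not taken from the outsourced code. Integer division `%/%` plays the role of $\text{int}(\cdot)$ for positive enrollment.

```r
# Predicted class size under Maimonides' rule:
# f = e / (int((e - 1) / 40) + 1), where e is grade enrollment
maimonides_rule <- function(enrollment, cap = 40) {
  n_classes <- (enrollment - 1) %/% cap + 1  # number of classes implied by the cap
  enrollment / n_classes                     # average size if classes are split evenly
}

# Enrollment just below and just above the first three cutoffs
maimonides_rule(c(40, 41, 80, 81, 120, 121))
# 40.00 20.50 40.00 27.00 40.00 30.25
```

The jump from 40 to 20.5 between enrollments of 40 and 41 is the first discontinuity. The drops become smaller at higher multiples of 40 because the additional class is spread over more pupils.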

Since the discontinuities are the source of identifying information, Angrist and Lavy ([1999](#Angrist_1999)) create a +/- 5 discontinuity sample (schools with enrollment in [36, 45], [76, 85] or [116, 125]) and conduct some estimations for this specific subsample.
As part of our own contributions we narrow the full sample further to a +/- 3 subsample.
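Membership in a discontinuity sample can be expressed as a simple indicator on enrollment. The sketch below uses a hypothetical helper `in_discontinuity_sample`, which is not part of the outsourced code. The windows for a bandwidth of 5 reproduce the intervals stated above. Applying the same convention to a bandwidth of 3 (giving 38 to 43 around the first cutoff) is an assumption made for illustration.

```r
# Hypothetical helper: TRUE if enrollment lies in a window around a multiple of the cap.
# With bandwidth 5 the windows are [36, 45], [76, 85] and [116, 125].
in_discontinuity_sample <- function(enrollment, bandwidth = 5, cap = 40, cutoffs = 1:3) {
  lower <- cap * cutoffs - bandwidth + 1
  upper <- cap * cutoffs + bandwidth
  vapply(enrollment, function(e) any(e >= lower & e <= upper), logical(1))
}

toy_enrollment <- c(35, 36, 45, 46, 80, 126)
in_discontinuity_sample(toy_enrollment)
# FALSE  TRUE  TRUE FALSE  TRUE FALSE

# A data frame could then be subset with, e.g.,
# df[in_discontinuity_sample(df$enrollment), ]
```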

<a id="Replication_of_core_results"></a>
***
## 5 Replication of core results
***

<a id="Graphical_analysis"></a>
### 5.1 Graphical analysis 

The graphical analysis begins with a plot of class size by enrollment.
Figure [I](#figure1) shows class size as predicted by Maimonides' rule and as observed during the school year for fifth (Panel a) and fourth (Panel b) graders.
Overall, actual average class size follows the class size function.
That is, class size increases approximately linearly with enrollment up to integer multiples of 40 and then drops sharply.
The link is weaker for enrollment levels above 160, though.
It is also clearly visible that average class size is generally smaller than predicted under strict compliance with the rule, since some schools open additional classes earlier.
For example, schools with a high PD index receive extra funding from the Ministry of Education which can be used to set up a new class (Angrist and Lavy, [1999](#Angrist_1999), p. 542).
Even though Maimonides' rule is not the only source of variation in class size, the figure provides evidence for a strong first stage.

In [6]:
# figure1()

<figure>
<center>
    <img src="materials/figures/figure1.png" width="800" />
    <figcaption>FIGURE I <br>
        Class Size in 1991 by Initial Enrollment Count, Actual Average Size and as Predicted by Maimonides' Rule
    </figcaption>
<a id="figure1"></a>
</center>
</figure>

<div class="alert alert-info">
<strong>Remark:</strong>
In the original Figure I (Angrist and Lavy, <a href="#Angrist_1999" class="link">1999</a>, p. 541), the function induced by Maimonides' rule is depicted incorrectly for enrollment counts of 160 and above. In particular, in Panel b the last kink lies below the class size of 40.
</div>

Furthermore, Maimonides' rule is correlated with the average test scores.
This is illustrated in Figure [II](#figure2), which plots the average reading score and the average predicted class size for enrollment intervals of ten (enrollment ticks show interval midpoints; the last interval is [161, 190)).
First, we notice the positive trend between enrollment and scores.
Test scores tend to be higher in schools with larger enrollment.
Thus, in general test scores are higher for larger predicted classes.
Better schools might have higher enrollments because they attract more students.
Also, Angrist and Lavy ([1999](#Angrist_1999), p. 544) point out that larger schools in Israel are more likely to be located in relatively prosperous cities where children exhibit a higher socioeconomic status.
Indeed, Figure [EXT_3](#figure3_EXT) reveals that for both grades higher enrollment is on average accompanied by a lower percent disadvantaged. The correlation coefficients are $\rho = -0.32$ (5th grade) and $\rho = -0.30$ (4th grade).
Second, ignoring the trend, there appears to be a distinctive connection between the two curves, namely a mirroring up-and-down pattern.
When predicted class size increases, average reading scores decrease (and vice versa).
Both observations indicate the importance of enrollment (and also the PD index) as a control.

In [7]:
# figure2()

<figure>
<center>
    <img src="materials/figures/figure2.png" width="800" />
    <figcaption>FIGURE II <br>
        Average Reading Scores by Enrollment Count, and the Corresponding Average Class Size Predicted by Maimonides' Rule
    </figcaption>
<a id="figure2"></a>
</center>
</figure>

<p style="padding: 10px; border: 1px solid black;">
<strong>Note:</strong>
The first enrollment interval [1, 11) does not contain any observations for the fourth grade.
</p>

In [8]:
# figure3_EXT()

<figure>
<center>
    <img src="materials/figures/figure3_EXT.png" width="800" />
    <figcaption>FIGURE EXT_3 <br>
        Enrollment and the Share of Disadvantaged Students for Schools with Fourth and Fifth Grades, respectively. <br>
        Also Shown is the Line of Best Fit and the Correlation Coefficient $\rho$.
    </figcaption>
<a id="figure3_EXT"></a>
</center>
</figure>

In fact, after "detrending" average test scores and predicted class sizes, a negative association emerges.
Figure [III](#figure3) plots the resulting residuals from regressions on average enrollment and average percent disadvantaged for each interval of ten.
The mirror-image relationship is present for the reading scores of both grades (Panel a and b) and the math scores in the fifth grade (Panel c).
In contrast, the pattern does not show up for the math scores of fourth graders (Appendix [2](#figureA2_EXT)).
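The detrending step can be sketched as follows. The data are simulated under assumptions made purely for illustration (a built-in negative effect of predicted class size on reading scores), so the numbers are not the replication results, and the code is not the `figure3` function from the outsourced code.

```r
set.seed(1)
n <- 2000

# Simulated class-level data (all parameter values are assumptions for illustration)
toy <- data.frame(enrollment = sample(1:160, n, replace = TRUE))
toy$pct_disadv <- pmax(0, 30 - 0.1 * toy$enrollment + rnorm(n, sd = 8))
toy$pred_size  <- toy$enrollment / ((toy$enrollment - 1) %/% 40 + 1)  # Maimonides' rule
toy$avgread    <- 75 - 0.3 * toy$pct_disadv - 0.2 * toy$pred_size + rnorm(n, sd = 5)

# Enrollment intervals of ten: 1-10, 11-20, ...
toy$bin <- (toy$enrollment - 1) %/% 10

# Averages within each interval
binned <- aggregate(cbind(avgread, pred_size, enrollment, pct_disadv) ~ bin,
                    data = toy, FUN = mean)

# Residuals from regressions on average enrollment and average percent disadvantaged
binned$read_resid <- resid(lm(avgread   ~ enrollment + pct_disadv, data = binned))
binned$size_resid <- resid(lm(pred_size ~ enrollment + pct_disadv, data = binned))

# Association between the two residual series
cor(binned$read_resid, binned$size_resid)
```

Plotting `read_resid` and `size_resid` against the interval midpoints would give a figure of the same type as Figure III. In this simulation the residual series should move in opposite directions because the negative effect is built in.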

In [9]:
# figure3()

<figure>
<center>
    <img src="materials/figures/figure3.png" width="800" />
    <figcaption>FIGURE III <br>
        Average Test (Reading/Math) Scores and Predicted Class Size by Enrollment, <br>
        Residuals from Regressions on Percent Disadvantaged and Enrollment
    </figcaption>
<a id="figure3"></a>
</center>
</figure>

This first graphical analysis suggests a causal impact of class size on test scores, which is now formalized as outlined in Section [4](#Empirical_strategy).

<a id="Estimation_results"></a>
### 5.2 Estimation results

Table [II](#table2) reports OLS estimates for both grades and tests in accordance with Equation [(2)](#eq_2).
Without any controls the estimates suggest a strong positive relationship between class size and test scores.
For example, the class size coefficient for the reading scores in the fifth grade is a positive 0.223 (standard error = 0.034).
The estimated coefficient is larger for the math test in both grades.
As described in the graphical analysis, enrollment and the PD index are important controls.
Including both in the model specification (columns (3) and (6)) leads to insignificant coefficients close to zero.
However, the OLS estimates can be expected to have pronounced selection bias as class size is endogenous and likely correlated with the error components in Equation [(2)](#eq_2).
As a consequence, the estimates cannot be used for causal inference.
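Why the class size coefficient can change so markedly once the PD index is controlled for is illustrated by the following simulation. It is a sketch under assumed parameter values, not the replication code: the true class size effect is set to zero, and disadvantaged schools are assumed to have both smaller classes and lower scores.

```r
set.seed(123)
n <- 2000

# Simulated data; all parameter values are assumptions for illustration
pct_disadv <- runif(n, min = 0, max = 50)
classsize  <- 35 - 0.2 * pct_disadv + rnorm(n, sd = 5)   # smaller classes in disadvantaged schools
avgread    <- 80 - 0.35 * pct_disadv + rnorm(n, sd = 7)  # true class size effect is zero

ols_naive   <- lm(avgread ~ classsize)               # no controls
ols_control <- lm(avgread ~ classsize + pct_disadv)  # controlling for percent disadvantaged

coef(ols_naive)[["classsize"]]    # spuriously positive
coef(ols_control)[["classsize"]]  # close to zero
```

The control removes the bias here only because the confounder is observed. With unobserved confounders, as discussed above, OLS remains biased even after conditioning.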

<a id="table2"></a>
<center>TABLE II <br> OLS Estimates for 1991</center>

In [10]:
table2(data = grade5_reg)
table2(data = grade4)

| 5th Grade             | Reading comprehension (1) | Reading comprehension (2) | Reading comprehension (3) | Math (4) | Math (5) | Math (6) |
|:----------------------|--------:|--------:|--------:|--------:|--------:|--------:|
| Class size            | .223    | -.031   | -.025   | .322    | .076    | .019    |
|                       | (.034)  | (.026)  | (.033)  | (.040)  | (.036)  | (.042)  |
| Percent disadvantaged |         | -.350   | -.350   |         | -.340   | -.332   |
|                       |         | (.014)  | (.015)  |         | (.018)  | (.019)  |

| 4th Grade             | Reading comprehension (1) | Reading comprehension (2) | Reading comprehension (3) | Math (4) | Math (5) | Math (6) |
|:----------------------|--------:|--------:|--------:|--------:|--------:|--------:|
| Class size            | .141    | -.053   | -.040   | .221    | .055    | .009    |
|                       | (.035)  | (.028)  | (.032)  | (.039)  | (.036)  | (.040)  |
| Percent disadvantaged |         | -.339   | -.341   |         | -.289   | -.281   |
|                       |         | (.015)  | (.016)  |         | (.017)  | (.017)  |


<div class="alert alert-info">
<strong>Remark:</strong>
In the original Table II (Angrist and Lavy, <a href="#Angrist_1999" class="link">1999</a>, p. 551), some of the listed mean scores and standard deviations (not shown in our replication tables) are wrong. The correct values are given in the above Table <a href="#table1" class="link">I</a>. 
</div>

In Table [III](#table3) estimates of the first stage and the reduced form are presented.
Equation [(3)](#eq_3) states the formula for the first stage and Figure [I](#figure1) is its graphical counterpart.
The plot corresponding to the reduced form results is Figure [III](#figure3).
Panel A refers to the full sample, whereas Panel B reports results for the +/- 5 discontinuity sample.

The first two columns for the full sample show a strong effect of $\text{f}_{sc}$ on class size (the coefficient ranges from 0.54 to 0.77).
More than half of the variation in actual class size is explained.
Feir et al. ([2016](#Feir_2016), Section 4) likewise conclude that weak identification is not a concern in this setting.
For the reduced form, a negative association is precisely estimated for the fifth graders' reading scores and, if the enrollment control is included, for their math scores.
For example, the estimate in column (4) for the 5th grade implies that a reduction in predicted class size of ten students is associated with a 1.5 point boost in average reading test scores.
Concerning the fourth grade, the estimated negative relationship is significant (although smaller) for reading comprehension but not for mathematics (as already seen in Figure [EXT_A2](#figureA2_EXT)).

Figure [I](#figure1) has shown that the connection between predicted and actual class size is less strong near the discontinuities.
Mathematically, this shows up as a smaller estimated first stage coefficient and a smaller R² in the lower panel, in particular for the fifth grade.
Still, even for the +/- 5 discontinuity sample there is no real issue with a weak instrument (as supported by the F statistics in the 2SLS procedure).
For the fifth graders the reduced form estimates are now larger in magnitude but also less precisely estimated (due to shrunken sample size).
For the fourth graders, however, estimates for the reading scores are no longer significant.
Moreover, both math coefficients, in columns (5) and (6), are positive.

<a id="table3"></a>
<center>TABLE III <br> Reduced-form Estimates for 1991</center>

A. Full sample

In [11]:
table3(data = grade5_reg)
table3(data = grade4)

**5th Graders**

| | Class size (1) | Class size (2) | Reading comprehension (3) | Reading comprehension (4) | Math (5) | Math (6) |
|:---|---:|---:|---:|---:|---:|---:|
| fsc | .703 | .541 | -.111 | -.150 | -.009 | -.125 |
| | (.025) | (.037) | (.029) | (.039) | (.040) | (.051) |
| Percent disadvantaged | -.077 | -.054 | -.359 | -.354 | -.354 | -.337 |
| | (.011) | (.010) | (.014) | (.015) | (.018) | (.019) |


**4th Graders**

| | Class size (1) | Class size (2) | Reading comprehension (3) | Reading comprehension (4) | Math (5) | Math (6) |
|:---|---:|---:|---:|---:|---:|---:|
| fsc | .772 | .670 | -.085 | -.089 | .038 | -.033 |
| | (.022) | (.033) | (.031) | (.040) | (.040) | (.050) |
| Percent disadvantaged | -.054 | -.039 | -.340 | -.340 | -.292 | -.282 |
| | (.009) | (.009) | (.015) | (.016) | (.017) | (.017) |


B. Discontinuity sample

In [12]:
table3(data = grade5_sub_sample_5)
table3(data = grade4_sub_sample_5)

**5th Graders**

| | Class size (1) | Class size (2) | Reading comprehension (3) | Reading comprehension (4) | Math (5) | Math (6) |
|:---|---:|---:|---:|---:|---:|---:|
| fsc | .481 | .346 | -.197 | -.202 | -.089 | -.154 |
| | (.057) | (.062) | (.050) | (.059) | (.072) | (.079) |
| Percent disadvantaged | -.130 | -.067 | -.424 | -.422 | -.435 | -.405 |
| | (.033) | (.028) | (.035) | (.036) | (.040) | (.041) |


**4th Graders**

| | Class size (1) | Class size (2) | Reading comprehension (3) | Reading comprehension (4) | Math (5) | Math (6) |
|:---|---:|---:|---:|---:|---:|---:|
| fsc | .625 | .503 | -.061 | -.075 | .059 | .012 |
| | (.048) | (.061) | (.059) | (.063) | (.079) | (.080) |
| Percent disadvantaged | -.068 | -.029 | -.348 | -.343 | -.306 | -.291 |
| | (.029) | (.027) | (.035) | (.038) | (.040) | (.042) |


The next two tables form the main part of the study by Angrist and Lavy ([1999](#Angrist_1999)).
Reported are IV estimates for the effect of class size (instrumented by Maimonides' rule) on test scores, as discussed in Section [4](#Empirical_strategy).
Table [IV](#table4) shows the estimates for the 5th grade and Table [V](#table5) for the 4th grade.
Originally, the estimation is conducted on the full and the +/- 5 discontinuity sample.
To further check the sensitivity of the findings we extend the tables to also include results for a +/- 3 sample.
That is, we reduce the bandwidth such that only schools with grade enrollment in the set {[38–43], [78–83], [118–123]} are considered.
This sample includes about 15% of the original schools for the fifth grade and about 13% for the fourth grade.
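The sample restriction can be sketched in a few lines of base R. The helper name `in_discontinuity_sample` and its arguments are hypothetical and not part of the replication code; the sketch assumes that the window around each cutoff $c \in \{40, 80, 120\}$ is $[c - h + 1, c + h]$, which reproduces the intervals stated above for $h = 3$.

```r
# Hypothetical helper (not from the replication code):
# TRUE if an enrollment value lies in the +/- h window around a Maimonides cutoff.
# Assumed window around cutoff c: [c - h + 1, c + h], e.g. [38, 43] for c = 40, h = 3.
in_discontinuity_sample <- function(enrollment, h, cutoffs = c(40, 80, 120)) {
  vapply(
    enrollment,
    function(e) any(e >= cutoffs - h + 1 & e <= cutoffs + h),
    logical(1)
  )
}

# Toy enrollment values for illustration only
enrollment <- c(37, 38, 43, 44, 78, 83, 100, 118, 123, 124)
enrollment[in_discontinuity_sample(enrollment, h = 3)]
```

With a data frame of schools, the returned logical vector could be used directly for row subsetting.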

To capture the causal effect of class size in the RDD, controls for the running variable (enrollment) need to be adequate in order to eliminate any other effects of enrollment on test scores.
Therefore, Angrist and Lavy ([1999](#Angrist_1999)) include different smooth functions of enrollment.
In column (4) of the tables below the model specification only includes a continuous piecewise linear trend that mirrors the slope of the class size function $\text{f}_{sc}$ on the linear segments.
This trend is defined as:

\begin{align*}
    &e_s \, , &e_s &\in [0, 40] \\
    &20 + \frac{e_s}{2} \, , &e_s &\in [41, 80] \\
    &\frac{100}{3} + \frac{e_s}{3} \, , &e_s &\in [81, 120] \\
    &\frac{130}{3} + \frac{e_s}{4} \, , &e_s &\in [121, 160] \, .
\end{align*}

The idea is that no additional controls have to be included once the trend effects of enrollment are fully controlled (Angrist and Lavy, [1999](#Angrist_1999), p. 555).
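To make the relation between the trend and the class size function concrete, the following base R sketch implements both. The function names are hypothetical; `maimonides_rule` assumes the usual form of the rule from Angrist and Lavy (1999), $\text{f}_{sc} = e_s / (\text{int}((e_s - 1)/40) + 1)$, and `piecewise_trend` transcribes the definition above.

```r
# Predicted class size under Maimonides' rule (assumed form, see lead-in)
maimonides_rule <- function(e) {
  e / (floor((e - 1) / 40) + 1)
}

# Continuous piecewise linear enrollment trend as defined in the text
piecewise_trend <- function(e) {
  ifelse(e <= 40, e,
    ifelse(e <= 80, 20 + e / 2,
      ifelse(e <= 120, 100 / 3 + e / 3,
        130 / 3 + e / 4)))
}

# Compare both functions around the cutoffs
e <- c(40, 41, 80, 81, 120, 121)
data.frame(
  enrollment = e,
  f_sc = maimonides_rule(e),
  trend = piecewise_trend(e)
)
```

On each linear segment both functions have the same slope (1, 1/2, 1/3, 1/4), but only the class size function jumps at the cutoffs, so the trend absorbs the smooth enrollment effect while the discontinuities remain as identifying variation.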

We first discuss results for the fifth grade in Table [IV](#table4).
The IV estimates for the class size effect on the reading scores with parametric controls range from -0.158 (no enrollment control) to -0.276 (linear enrollment control) and are precisely estimated.
The estimate for the model including only the piecewise linear trend also shows a negative association of the same magnitude but is less precise.
Results for the math scores are similar except for the specification without enrollment control (column (1)).
The corresponding estimate is essentially zero.
The estimates for the discontinuity samples are in general substantially larger with larger standard errors.
For reading comprehension there is not much difference between the two subsamples: all estimates are statistically different from zero (despite the smaller sample size).
The same, however, does not hold for mathematics.
All coefficients are insignificant and the magnitudes are lower for the smallest bandwidth.
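Because each of these specifications uses a single instrument for a single endogenous regressor, the 2SLS coefficient equals the ratio of the reduced form coefficient to the first stage coefficient (indirect least squares). As a rough consistency check that is not part of the original replication code, the following sketch recomputes this ratio from the rounded fifth grade coefficients on f$_{sc}$ in Table III, Panel A. Deviations at the third decimal place are to be expected because the inputs are rounded.

```r
# Coefficients on f_sc copied from Table III, Panel A (5th grade, full sample).
# "no_enroll": columns without enrollment control, "enroll": with linear enrollment control.
first_stage     <- c(no_enroll = 0.703, enroll = 0.541)
reduced_reading <- c(no_enroll = -0.111, enroll = -0.150)
reduced_math    <- c(no_enroll = -0.009, enroll = -0.125)

# Indirect least squares: reduced form divided by first stage
ils_reading <- reduced_reading / first_stage
ils_math    <- reduced_math / first_stage

# Compare with columns (1) and (2) of Table IV (full sample)
rbind(
  ils_reading   = round(ils_reading, 3),
  table4_reading = c(-0.158, -0.276),
  ils_math      = round(ils_math, 3),
  table4_math    = c(-0.013, -0.231)
)
```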

In sum, the estimates for the 5th grade strongly suggest that smaller classes increase test scores.
The effect size for an eight pupil reduction (as in the STAR experiment) is ca. 0.29$\sigma$ (2.2 points) using the coefficient -0.276 from column (2).
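
The arithmetic behind this effect size can be retraced as follows. The value $\sigma \approx 7.7$ for the standard deviation of the fifth graders' class average reading score is taken from Table I and is not restated in this section, so it should be read as an assumption of this sketch.

\begin{align*}
    8 \times 0.276 &= 2.208 \approx 2.2 \text{ points} \\
    \frac{2.208}{7.7} &\approx 0.29
\end{align*}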

<a id="table4"></a>
<center>TABLE IV, <b>Extended</b> <br> 2SLS Estimates for 1991 (Fifth Graders)</center>

In [13]:
table4and5(data = grade5_reg, test = "reading")
table4and5(data = grade5_reg, test = "math")

**Reading comprehension**

| | Full sample (1) | Full sample (2) | Full sample (3) | Full sample (4) | +/- 5 Discontinuity sample (5) | +/- 5 Discontinuity sample (6) | +/- 3 Discontinuity sample (7) | +/- 3 Discontinuity sample (8) |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| Class size | -.158 | -.276 | -.263 | -.188 | -.410 | -.582 | -.375 | -.521 |
| | (.042) | (.076) | (.094) | (.122) | (.118) | (.205) | (.139) | (.227) |
| Percent disadvantaged | -.371 | -.369 | -.369 | | -.477 | -.461 | -.467 | -.448 |
| | (.016) | (.016) | (.016) | | (.048) | (.046) | (.061) | (.057) |


**Math**

| | Full sample (1) | Full sample (2) | Full sample (3) | Full sample (4) | +/- 5 Discontinuity sample (5) | +/- 5 Discontinuity sample (6) | +/- 3 Discontinuity sample (7) | +/- 3 Discontinuity sample (8) |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| Class size | -.013 | -.231 | -.264 | -.205 | -.185 | -.443 | -.071 | -.293 |
| | (.058) | (.098) | (.123) | (.145) | (.155) | (.250) | (.175) | (.267) |
| Percent disadvantaged | -.355 | -.350 | -.350 | | -.459 | -.435 | -.451 | -.421 |
| | (.020) | (.020) | (.020) | | (.052) | (.050) | (.059) | (.056) |


Looking at the first part of Table [V](#table5), effects are smaller.
The IV estimates range from -0.074 (quadratic enrollment control) to -0.147 (piecewise linear trend).
In contrast to Table [IV](#table4), the findings do not seem to be sensitive to the sample choice (though estimates for the discontinuity samples are all imprecise).
The reported results for the math test are much weaker.
In fact, all estimated coefficients are markedly insignificant.
Additionally, half of the estimates are even positive.

In sum, the estimates for the 4th grade reading scores suggest, to some degree, that reducing class size raises achievement.
However, results for mathematics show no relationship.
The effect size for an eight pupil reduction is ca. 0.13$\sigma$ (1.1 points) using the coefficient -0.133 from column (2). 

<a id="table5"></a>
<center>TABLE V, <b>Extended</b> <br> 2SLS Estimates for 1991 (Fourth Graders)</center>

In [14]:
table4and5(data = grade4, test = "reading")
table4and5(data = grade4, test = "math")

**Reading comprehension**

| | Full sample (1) | Full sample (2) | Full sample (3) | Full sample (4) | +/- 5 Discontinuity sample (5) | +/- 5 Discontinuity sample (6) | +/- 3 Discontinuity sample (7) | +/- 3 Discontinuity sample (8) |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| Class size | -.110 | -.133 | -.074 | -.147 | -.098 | -.150 | -.099 | -.179 |
| | (.040) | (.061) | (.068) | (.089) | (.095) | (.131) | (.112) | (.170) |
| Percent disadvantaged | -.346 | -.345 | -.346 | | -.354 | -.347 | -.375 | -.369 |
| | (.016) | (.016) | (.016) | | (.036) | (.038) | (.044) | (.045) |


**Math**

| | Full sample (1) | Full sample (2) | Full sample (3) | Full sample (4) | +/- 5 Discontinuity sample (5) | +/- 5 Discontinuity sample (6) | +/- 3 Discontinuity sample (7) | +/- 3 Discontinuity sample (8) |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| Class size | .049 | -.050 | -.033 | -.098 | .095 | .023 | .046 | -.078 |
| | (.052) | (.075) | (.084) | (.099) | (.126) | (.158) | (.154) | (.207) |
| Percent disadvantaged | -.290 | -.284 | -.284 | | -.299 | -.290 | -.330 | -.321 |
| | (.018) | (.017) | (.017) | | (.042) | (.042) | (.053) | (.053) |


<div class="alert alert-info">
<strong>Remark:</strong>
In the original Table V (Angrist and Lavy, <a href="#Angrist_1999" class="link">1999</a>, p. 556), the mean score and standard deviation of the math test for the full sample are those of the 5th graders. Also, in column (8) the enrollment coefficient should not have a minus in front. 
</div>

An interesting question is whether the benefits of smaller classes are more associated with certain types of students.
One existing theory says that class size reductions are more effective for disadvantaged students.
This question is also of particular practical relevance.
Given the high costs of CSR programs, it is desirable to identify and target the student groups or class compositions that have the highest treatment effect.

Table [VII](#table7), therefore, lists 2SLS estimates with percent disadvantaged interaction terms where f$_{sc}$<span>&#42;</span>PD serves as a second instrument.
Besides grade-specific estimates, Angrist and Lavy ([1999](#Angrist_1999)) compute pooled estimates for increased precision.
The interaction coefficient for the single grades in columns (1) to (4) is always negative (although significant for 5th graders only).
The interaction effect is stronger for the fifth grade.
Pooled estimates without interaction term yield a relatively strong main effect that is also statistically different from zero for math.
If the interaction class size<span>&#42;</span>PD is included, the main coefficient for the math test shrinks to near zero while the interaction term comes out as significant.
Note that, in contrast to Angrist and Lavy ([1999](#Angrist_1999)), our cluster adjustment makes the estimated interaction coefficient for reading (marginally) insignificant (column (6)).

Overall, the estimates suggest that the gains from small classes are largest for students from disadvantaged backgrounds.
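
The role of the interaction coefficient can be made explicit with a stylized version of the interacted model. The symbols $\alpha$ (main class size coefficient), $\gamma$ (interaction coefficient), $n_{sc}$ (class size) and $PD_s$ (percent disadvantaged) are illustrative notation assumed here and need not match the notation of Equation (2).

\begin{align*}
    y_{sc} &= \dots + \alpha \, n_{sc} + \gamma \, (n_{sc} \times PD_s) + \dots \\
    \frac{\partial y_{sc}}{\partial n_{sc}} &= \alpha + \gamma \, PD_s
\end{align*}

With $\gamma < 0$, the marginal effect of an additional pupil becomes more negative as the share of disadvantaged students rises, which is the pattern the estimates point to.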

<a id="table7"></a>
<center>TABLE VII <br> Pooled Estimates and Models with Percent Disadvantaged Interaction Terms</center>

In [15]:
table7(data_4th = grade4, data_5th = grade5_reg)

| | 5th grade Reading (1) | 5th grade Math (2) | 4th grade Reading (3) | 4th grade Math (4) | Pooled Reading (5) | Pooled Reading (6) | Pooled Math (7) | Pooled Math (8) |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| Class size | -.157 | -.082 | -.101 | .019 | -.197 | -.121 | -.128 | -.020 |
| | (.072) | (.104) | (.065) | (.082) | (.051) | (.053) | (.065) | (.070) |
| Percent disadvantaged | -.162 | -.092 | -.288 | -.162 | -.355 | -.222 | -.315 | -.126 |
| | (.096) | (.113) | (.084) | (.099) | (.014) | (.072) | (.016) | (.088) |
| Grade 4 | | | | | -1.931 | -1.896 | 1.512 | 1.561 |


<a id="Maimonides_rule_redux"></a>
***
## 6 Maimonides' rule redux
***

This section is concerned with the validity of the identification strategy and hence the reliability of the estimated results.
As sketched in the alternative causal graph (Figure [EXT_2b](#figure2_EXTb)), Maimonides' rule might be correlated with unobservables that are themselves related to test scores.
In that case, predicted class size no longer constitutes a valid IV and a causal interpretation of the class size estimates is not possible (since it is not clear which proportion of the achievement variation is due to class size changes).
Such correlations can originate from manipulative behavior of involved stakeholders who exploit the statutory class size caps resulting in manipulated enrollment ("manipulation of the running variable").

Angrist and Lavy ([1999](#Angrist_1999)) assume that enrollment manipulation is not a real concern.
They argue that parents cannot predict enrollment at the time school starts (e.g., an enrollment of 41 may drop to 39).
And even if they could, Israeli pupils in general must attend a neighborhood school (Angrist and Lavy, [1999](#Angrist_1999), p. 542).
Thus, parents would have to move to another school district or opt for private schooling.
However, Otsu et al. ([2013](#Otsu_2013), Section 5) provide evidence for sorting around the first Maimonides cutoff.
There appear to be too many schools with enrollment just above 40, producing two classes.
Angrist and Lavy ([1999](#Angrist_1999)) focus on parents' behavior as a source of manipulation and do not take teachers and school officials into account.

In a recently published paper ("Maimonides' rule redux", [2019](#Angrist_2019)) Angrist et al. revisit the original results to address the manipulation concern raised by Otsu et al. ([2013](#Otsu_2013)).
Moreover, they conduct a similar analysis for a more recent (2002–2011) and larger sample of Israeli fifth graders.
The analysis yields two main findings.
First, the new data reveal enrollment manipulation near cutoffs.
Second, 2SLS estimates show no evidence of class size effects (precisely estimated zeros), regardless of whether originally reported enrollment or corrected enrollment is used.
In the latter case, enrollment is imputed based on information on grade-eligible birthdates.
Both findings challenge the substantial negative class size effects from 1991.

Angrist et al. ([2019](#Angrist_2019), p. 310) mention an intuitive explanation for the enrollment observation, namely financially-motivated manipulation by school officials:

> A memo from Israeli Ministry of Education (MOE) officials to school leaders cautions headmasters against attempts to increase staffing ratios through enrollment manipulation. In particular, schools are warned not to move students between grades or to enroll those abroad in order to produce an additional class. [...] MOE rules that determine school budgets as an increasing function of the number of classes also reward this sort of manipulation.

Figure [MRR_A6](#figureA6_panelA_MRR) displays the distribution of grade enrollment (the running variable) as reported by school principals at the beginning of the school year.
Indeed, for both grades we see evidence of a jump in the distribution at the first cutoff.
The gap below the Maimonides threshold is more pronounced for the fourth graders.
Furthermore, only the first cutoff seems to be affected.

In [16]:
# figureA6_panelA_MRR()

<figure>
<center>
    <img src="materials/figures/figureA6_panelA_MRR.png" width="1000" />
    <figcaption>FIGURE MRR_A6 <br>
        Distribution of 4th and 5th Grade Enrollment as Reported by School Headmasters at the Beginning of the 1990–91 School Year
    </figcaption>
<a id="figureA6_panelA_MRR"></a>

<p style="padding: 10px; border: 1px solid black;">
<strong>Note:</strong>
In contrast to Angrist et al. (<a href="#Angrist_2019" class="link">2019</a>), our histograms rely on the cleaned data.
</p>

Figure [EXT_4](#figure4_EXT) plots density estimates allowing for a discontinuity at 41.
The underlying local polynomial density estimator was proposed in Cattaneo et al. ([2020](#Cattaneo_2020)).
For both grades a clear difference in the estimated densities on the two sides of the cutoff is visible.
However, the corresponding density discontinuity test of Cattaneo et al. does not classify the discontinuity as significant because of the large estimated standard error.
The test is similar to the McCrary ([2008](#McCrary_2008)) test applied by Angrist et al. ([2019](#Angrist_2019)) but has some better statistical properties.

In [17]:
# figure4_EXT()

<figure>
<center>
    <img src="materials/figures/figure4_EXT.png" width="1000" />
    <figcaption>FIGURE EXT_4 <br>
        Densities Generating Cattaneo et al. (<a href="#Cattaneo_2020" class="link">2020</a>) Tests for Discontinuities at 41 
    </figcaption>
<a id="figure4_EXT"></a>

<p style="padding: 10px; border: 1px solid black;">
<strong>Note:</strong>
The confidence intervals are not centered at the point estimates because they have been bias-corrected. For small enrollment the density point estimates lie outside the confidence intervals as the enrollment distribution exhibits high curvature there. Details are provided in Cattaneo et al. (<a href="#Cattaneo_2021" class="link">2021</a>).    
</p>

The revealed enrollment manipulation does not automatically translate into problems with the Maimonides instrument.
According to Gerard et al. ([2020](#Gerard_2020)), this sorting is innocuous in our application if the students affected by the headmasters' manipulation are similar to those unaffected.
This would be violated if, for example, more sophisticated school leaders, with on average more able children enrolled in their school, engage more often in manipulation (Angrist et al., [2019](#Angrist_2019), p. 316).
To investigate systematic enrollment sorting, Table [MRR_3](#table3_MRR) presents OLS regression estimates of the PD index on the class size function induced by Maimonides' rule.
Overall, the estimates do not suggest a clear relationship.
Estimates for the fifth grade are negative, whereas estimates for the fourth grade are positive.
All estimates are far away from being significant.
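The statement about insignificance can be checked directly from the reported coefficients and standard errors on f$_{sc}$ in Table MRR_3. The sketch below computes the ratio of coefficient to standard error and compares it with 1.96; using this normal approximation is an assumption of the sketch and not necessarily the reference distribution used by the estimation code.

```r
# Coefficients and clustered standard errors on f_sc, copied from Table MRR_3
fsc_coef <- c(-0.0592, -0.0686, -0.0709, 0.0532, 0.0636, 0.0718)
fsc_se   <- c(0.0784, 0.0848, 0.0865, 0.0810, 0.0878, 0.0894)

# Ratio of estimate to standard error for columns (1) to (6)
t_ratio <- fsc_coef / fsc_se
round(t_ratio, 2)

# TRUE if no estimate is significant at the 5% level (normal approximation)
all(abs(t_ratio) < 1.96)
```

All six ratios are below one in absolute value, so none of the estimates comes close to conventional significance thresholds.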

<a id="table3_MRR"></a>
<center>TABLE MRR_3 <br> Maimonides' Rule Effects on Socioeconomic Status</center>

In [18]:
table3_MRR(data_4th = grade4, data_5th = grade5)

**Percent disadvantaged**

| | Fifth grade (1) | Fifth grade (2) | Fifth grade (3) | Fourth grade (4) | Fourth grade (5) | Fourth grade (6) |
|:---|---:|---:|---:|---:|---:|---:|
| fsc | -0.0592 | -0.0686 | -0.0709 | 0.0532 | 0.0636 | 0.0718 |
| | (0.0784) | (0.0848) | (0.0865) | (0.0810) | (0.0878) | (0.0894) |
| Enrollment | -0.0598 | -0.0475 | | -0.0691 | -0.0824 | |
| | (0.0152) | (0.0448) | | (0.0155) | (0.0461) | |


<p style="padding: 10px; border: 1px solid black;">
<strong>Note:</strong>
All regressions in Table <a href="#table3_MRR" class="link">MRR_3</a> also include the school type (secular or religious) as an explanatory variable. The share of students from a disadvantaged background is considerably higher for religious schools.    
</p>

We can also approach the question graphically.
Figure [EXT_5](#figure5_EXT) plots residuals from regressions of the PD index and predicted class sizes on enrollment and school type (secular or religious) for intervals of ten, as was done in Figure [III](#figure3).
In Panel a for the fifth grade, the two series resemble mirror images of each other for enrollment between 45 and 125.
This is in line with the estimated negative coefficient from Table [MRR_3](#table3_MRR).
Panel b does not show any relationship.

In [19]:
# figure5_EXT()

<figure>
<center>
    <img src="materials/figures/figure5_EXT.png" width="800" />
    <figcaption>FIGURE EXT_5 <br>
        Percent Disadvantaged and Predicted Class Size by Enrollment, <br>                                                             Residuals from Regressions on Enrollment and School Type
    </figcaption>
<a id="figure5_EXT"></a>

Even though the last results indicate that pupils are rather randomly affected by the enrollment manipulation, ideally, we would like to exclude the manipulation from the estimations.
However, an enrollment imputation as performed in Angrist et al. ([2019](#Angrist_2019)) is not feasible because individual data are not available.
Instead, as an alternative robustness check, the authors apply a donut estimation procedure.
That is, data in a certain range around the first Maimonides cutoff are omitted from the estimation.
Compared to the estimates from Table [IV](#table4) and Table [V](#table5), the donut estimates in Table [MRR_A6](#tableA6_MRR) are overall slightly smaller in magnitude and less precise.
But essentially the original results are unchanged.
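
The sample restriction behind a donut estimate is simple to express in base R. The helper `drop_donut` and the column name `enrollment` are hypothetical and only illustrate the idea; the 2SLS estimation itself would then be run on the returned data.

```r
# Hypothetical helper (not from the replication code):
# drop all observations whose enrollment lies inside the donut [lower, upper].
drop_donut <- function(data, lower, upper, enrollment_col = "enrollment") {
  e <- data[[enrollment_col]]
  data[e < lower | e > upper, , drop = FALSE]
}

# Toy data for illustration only
toy <- data.frame(enrollment = c(36, 38, 39, 40, 41, 42, 44, 80))

# Donut [39, 41]: schools with enrollment 39, 40 or 41 are removed
drop_donut(toy, lower = 39, upper = 41)$enrollment
```

Widening the donut to [38, 42] or [37, 43] removes more schools near the cutoff, which is why the standard errors in the table tend to grow from row to row.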

<a id="tableA6_MRR"></a>
<center>TABLE MRR_A6 <br> 2SLS Donuts</center>

<table style="text-align:center"><tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="2">Language</td><td colspan="2">Math</td></tr>
<tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td><td>(4)</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><b>Panel A. 5th Grade</b></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">Donut: [39, 41]</td><td>-0.2341</td><td>-0.2010</td><td>-0.1947</td><td>-0.2144</td></tr>
<tr><td style="text-align:left"></td><td>(0.0762)</td><td>(0.0954)</td><td>(0.1018)</td><td>(0.1306)</td></tr>
<tr><td style="text-align:left">Donut: [38, 42]</td><td>-0.2406</td><td>-0.2072</td><td>-0.2000</td><td>-0.2213</td></tr>
<tr><td style="text-align:left"></td><td>(0.0776)</td><td>(0.0987)</td><td>(0.1044)</td><td>(0.1368)</td></tr>
<tr><td style="text-align:left">Donut: [37, 43]</td><td>-0.2152</td><td>-0.1696</td><td>-0.1930</td><td>-0.2024</td></tr>
<tr><td style="text-align:left"></td><td>(0.0777)</td><td>(0.0991)</td><td>(0.1054)</td><td>(0.1388)</td></tr>
<tr><td style="text-align:left"><b>Panel B. 4th Grade</b></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">Donut: [39, 41]</td><td>-0.1267</td><td>-0.0581</td><td>-0.0544</td><td>-0.0353</td></tr>
<tr><td style="text-align:left"></td><td>(0.0612)</td><td>(0.0690)</td><td>(0.0749)</td><td>(0.0858)</td></tr>
<tr><td style="text-align:left">Donut: [38, 42]</td><td>-0.1187</td><td>-0.0431</td><td>-0.0438</td><td>-0.0208</td></tr>
<tr><td style="text-align:left"></td><td>(0.0632)</td><td>(0.0719)</td><td>(0.0775)</td><td>(0.0899)</td></tr>
<tr><td style="text-align:left">Donut: [37, 43]</td><td>-0.1166</td><td>-0.0390</td><td>-0.0467</td><td>-0.0227</td></tr>
<tr><td style="text-align:left"></td><td>(0.0649)</td><td>(0.0743)</td><td>(0.0794)</td><td>(0.0927)</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Percent Disadvantaged</td><td>X</td><td>X</td><td>X</td><td>X</td></tr>
<tr><td style="text-align:left">Enrollment</td><td>X</td><td>X</td><td>X</td><td>X</td></tr>
<tr><td style="text-align:left">Enrollment Squared/100</td><td></td><td>X</td><td></td><td>X</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td colspan="5" style="text-align:left">This table reports 2SLS estimates of class size effects omitting data in the intervals indicated. <br> Standard errors are clustered by school.</td></tr>
</table>

In [20]:
# The following function calls return the estimated coefficients and standard errors.
# The above table, however, is created manually (for formatting reasons).

# tableA6_MRR(data = grade5)
# tableA6_MRR(data = grade4)

Taking all parts of this section together, the enrollment manipulation around the first cutoff does not seem to be a source of bias and hence does not threaten the identification strategy that uses Maimonides' rule as an instrument for class size.
The same conclusion is drawn by Arai et al. ([2021](#Arai_2021), Section 5) based on a formal test to assess the validity of a fuzzy RDD.
Angrist et al. ([2019](#Angrist_2019)) hypothesize that the absence of class size effects in the new data stems from a change in the education production function.

<a id="Critical_assessment_and_conclusion"></a>
***
## 7 Critical assessment and conclusion
***

In this project we successfully replicated core results of the paper by Angrist and Lavy ([1999](#Angrist_1999)).
The authors analyze the effect of class size on reading and math test scores for Israeli elementary students.
To identify the causal effect, variation in the enrollment-class-size relationship, created by an institutional rule (Maimonides' rule) that caps classes at size 40, is exploited.
2SLS instrumental variable estimates find positive effects of class size reductions on test scores.
Effects are largest for 5th graders (on both tests) and more modest for the reading scores of 4th graders.
The math scores of 4th graders provide little evidence of an association, and the estimates are not significant.
However, pooled estimates turn out to be significant on both tests.
Compared to the effect sizes found in the famous Tennessee STAR experiment the estimates are at the lower end.
In a follow-up paper the authors revisit the original results to check their credibility, since newer, precisely estimated results suggest no association and researchers have revealed enrollment manipulation in the old data.
The re-analysis supports the earlier causal interpretations.

We extended the paper mainly in terms of supportive visualization and robustness checks.
The visualization of the authors' identification strategy and a major potential concern using causal graphs might be particularly helpful for interested readers.
Relevant parts of the "Maimonides' rule redux" ([2019](#Angrist_2019)) paper are replicated and connected to the original paper.
In addition, we replaced original standard error estimates with estimates from a more modern cluster adjustment approach.
The new standard errors are generally close but slightly larger.
On balance, our contributions support the findings and the validity of the Maimonides identification strategy.

The paper by Angrist and Lavy ([1999](#Angrist_1999)) has been highly influential in the economics of education literature.
Many countries set their class sizes to conform to some version of Maimonides' rule.
Accordingly, a Maimonides research design has been applied by different scholars over the last two decades.
For example, in France by Gary-Bobo and Mahjoub ([2013](#Gary-Bobo_2013)), in Poland by Jakubowski and Sakowski ([2006](#Jakubowski_2006)) or in Bolivia by Urquiola ([2006](#Urquiola_2006)).
The identification strategy used by Angrist and Lavy ([1999](#Angrist_1999)) is unusual but at the same time clear and simple, leading naturally to 2SLS estimation.
Methodologically, the study provides an example of how a fuzzy regression discontinuity can be analyzed in an IV framework.
It is also worth acknowledging that the authors revisit the results 20 years later to check their robustness thoroughly.

One limitation of the original paper is that manipulation of the running variable (grade enrollment) is only considered verbally and from the parents' perspective.
Consequently, the authors fail to detect the manipulation at the first cutoff.
Besides, as also acknowledged (Angrist and Lavy, [1999](#Angrist_1999), Section VI), the estimated effects are likely not one-to-one transferable to other countries because Israeli classes are large.
The mean class size in the data is 30 with ten percent of classes having more than 37 pupils.
Meta-results for Europe are weak and insignificant (e.g., Wößmann ([2005](#Wößmann_2005)) and Shen and Konstantopoulos ([2017](#Shen_2017))).
Lastly, it appears very reasonable to exclude the third graders from most of the analysis and interpretation.
A formal investigation of a possible structural difference in the 3rd graders' test scores, unfortunately, cannot be undertaken as the test program only existed for two years (1991–1992) and was then abandoned for political reasons.
In hindsight, equipped with the new estimates from Angrist et al. ([2019](#Angrist_2019)), the zero-effect estimates for the third grade are noteworthy.

Still, Angrist and Lavy ([1999](#Angrist_1999)) constitutes a seminal paper that contributed substantially to the CSR literature.
The authors pioneered the use of legislative class size ceilings as a potential source of exogenous variation to credibly determine the causal effect of class size changes.
This led to plenty of subsequent research.

A possibly interesting direction for future research is the effect of class size on academic achievement when teaching is conducted remotely, in particular how a change in class size affects performance compared to classroom teaching.

<a id="Appendix"></a>
***
## Appendix
***

<a id="Appendix_1"></a>
### Appendix 1: 3rd grade

Angrist and Lavy ([1999](#Angrist_1999)) exclude the third graders from the main part of their study.
They hypothesize that specific test preparation reduced the information about pupils' abilities in the scores and therefore class size effects are absent.

A comparison of the descriptive statistics between the grades shows that for both tests mean scores are substantially higher (see Figure [EXT_A1](#figureA1_EXT)) and standard deviations are lower for the third grade classes.
For example, the overall mean math score is more than 15 points higher.
Particularly striking is that 25% of all classes have a reading score of at least 90 (89 points for math).

<center>TABLE I <br> Unweighted Descriptive Statistics</center>

<table style="text-align:center"><caption><strong>3rd grade: 2111 classes, 1011 schools, tested in 1992</strong></caption>
<tr><td colspan="8" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Variable</td><td>Mean</td><td>S.D.</td><td>0.10</td><td>0.25</td><td>0.50</td><td>0.75</td><td>0.90</td></tr>
<tr><td colspan="8" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Class Size</td><td>30.5</td><td>6.2</td><td>22</td><td>26</td><td>31</td><td>35</td><td>38</td></tr>
<tr><td style="text-align:left">Enrollment</td><td>79.6</td><td>37.3</td><td>34</td><td>52</td><td>74</td><td>104</td><td>129</td></tr>
<tr><td style="text-align:left">Percent disadvantaged</td><td>13.8</td><td>13.4</td><td>2</td><td>4</td><td>9</td><td>19</td><td>35</td></tr>
<tr><td style="text-align:left">Reading size</td><td>24.5</td><td>5.4</td><td>17</td><td>21</td><td>25</td><td>29</td><td>31</td></tr>
<tr><td style="text-align:left">Math size</td><td>24.7</td><td>5.4</td><td>18</td><td>21</td><td>25</td><td>29</td><td>31</td></tr>
<tr><td style="text-align:left">Average verbal</td><td>86.3</td><td>6.1</td><td>78.4</td><td>83.0</td><td>87.2</td><td>90.7</td><td>93.1</td></tr>
<tr><td style="text-align:left">Average math</td><td>84.1</td><td>6.8</td><td>75.0</td><td>80.2</td><td>84.7</td><td>89.0</td><td>91.9</td></tr>
<tr><td colspan="8" style="border-bottom: 1px solid black"></td></tr>
<tr><td colspan="8" style="text-align:left">Note: All values are copied from Angrist and Lavy (<a href="#Angrist_1999" class="link">1999</a>), no replication.</td></tr>
</table>

In [21]:
# figureA1_EXT()

<figure>
<center>
    <img src="materials/figures/figureA1_EXT.png" width="600" />
    <figcaption>FIGURE EXT_A1 <br>
        Average of the Classes' Test Scores in Math and Reading Compared between 3rd, 4th and 5th Graders
    </figcaption>
<a id="figureA1_EXT"></a>
</center>
</figure>

Table VIII reports estimates for the effect of class size for third graders.
The estimates provide essentially no evidence of any association between class size and achievement.
IV estimates are negative but small and statistically insignificant.
Yet column (1) shows that Maimonides' rule yields a strong first stage.

<img src="materials/tables/table8.png" width="600" />
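
For reference, the instrument behind the first stage in column (1) is the class size predicted by Maimonides' rule, which splits an enrollment cohort into equally sized classes as soon as a class would exceed 40 pupils (Angrist and Lavy, [1999](#Angrist_1999)). The following is a minimal sketch of that rule. The function name `maimonides_rule` and its argument names are illustrative assumptions and are not taken from the outsourced code of this notebook.

```r
# Predicted class size under Maimonides' rule.
# `enrollment`: grade enrollment of a school; `cap`: maximum class size (40 in Israel).
# Names are hypothetical and chosen for illustration only.
maimonides_rule <- function(enrollment, cap = 40) {
  n_classes <- floor((enrollment - 1) / cap) + 1  # number of classes required
  enrollment / n_classes                          # equal split across classes
}

# Predicted class size just below and just above the first three cutoffs
maimonides_rule(c(40, 41, 80, 81, 120, 121))
# 40.00 20.50 40.00 27.00 40.00 30.25
```

The output shows the sawtooth pattern: predicted class size drops sharply whenever enrollment crosses a multiple of 40, and the drops become smaller at higher enrollment levels.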

A plausible explanation for these findings is that test preparation and testing conditions were quite different for third graders.
For instance, on test days regular class teachers and an external exam proctor were present (Angrist and Lavy, [1999](#Angrist_1999), p. 563).
The systematic test preparation effort ("intense remedial effort") is also documented in a report of the National Center for Education Feedback from 1993 (Angrist and Lavy, [1999](#Angrist_1999), footnote 20).

<a id="Appendix_2"></a>
### Appendix 2

In [22]:
# figureA2_EXT()

<figure>
<center>
    <img src="materials/figures/figureA2_EXT.png" width="800" />
    <figcaption>FIGURE EXT_A2 <br>
        Average Math Test Score and Predicted Class Size by Enrollment for the Fourth Grade, <br> Residuals from Regressions on Percent Disadvantaged and Enrollment
    </figcaption>
<a id="figureA2_EXT"></a>
</center>
</figure>
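
The residualization named in the caption of Figure EXT_A2 can be sketched as follows. This is a minimal illustration on simulated toy data, not the implementation of `figureA2_EXT()`. All variable names and the coefficients used to generate the data are assumptions made for this example only.

```r
set.seed(1)

# Simulated toy data (hypothetical, NOT the Angrist and Lavy sample)
n <- 200
toy <- data.frame(
  enrollment        = sample(20:160, n, replace = TRUE),
  pct_disadvantaged = runif(n, min = 0, max = 60)
)
# Class size predicted by Maimonides' rule with a ceiling of 40
toy$predicted_size <- toy$enrollment / (floor((toy$enrollment - 1) / 40) + 1)
# Arbitrary data-generating process for the average math score
toy$avg_math <- 70 - 0.3 * toy$pct_disadvantaged + 0.02 * toy$enrollment -
  0.1 * toy$predicted_size + rnorm(n, sd = 5)

# Step 1: partial out the controls from both series
toy$res_score <- resid(lm(avg_math ~ pct_disadvantaged + enrollment, data = toy))
toy$res_size  <- resid(lm(predicted_size ~ pct_disadvantaged + enrollment, data = toy))

# Step 2: average the residuals by enrollment to obtain series that can be plotted
res_by_enrollment <- aggregate(cbind(res_score, res_size) ~ enrollment,
                               data = toy, FUN = mean)

# Check (Frisch-Waugh-Lovell): the slope between the residuals equals the
# coefficient on predicted class size in the regression with controls
slope_resid <- coef(lm(res_score ~ res_size, data = toy))[["res_size"]]
slope_full  <- coef(lm(avg_math ~ predicted_size + pct_disadvantaged + enrollment,
                       data = toy))[["predicted_size"]]
stopifnot(isTRUE(all.equal(slope_resid, slope_full)))
```

Working with residuals removes the part of both series that is explained by percent disadvantaged and the linear enrollment trend, so the remaining co-movement reflects the variation in predicted class size around the cutoffs.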

<a id="References"></a>
***
## References
***

<a id="Angrist_1999"></a>
Angrist, J. and V. Lavy (1999). "Using Maimonides' rule to estimate the effect of class size on scholastic achievement". *The Quarterly Journal of Economics* 114 (2), pp. 533–575.

<a id="Angrist_2019"></a>
Angrist, J., V. Lavy, J. Leder-Luis and A. Shany (2019). "Maimonides' rule redux". *American Economic Review: Insights* 1 (3), pp. 309–324.

<a id="Angrist_2018"></a>
Angrist, J., V. Lavy, J. Leder-Luis and A. Shany (2018). "Maimonides' rule redux: Online appendix". [doi.org/10.1257/aeri.20180120](https://doi.org/10.1257/aeri.20180120)

<a id="Angrist_2009"></a>
Angrist, J. and J.-S. Pischke (2009). *Mostly Harmless Econometrics: An Empiricist's Companion*. Princeton: Princeton University Press.

<a id="Arai_2021"></a>
Arai, Y., Y.-C. Hsu, T. Kitagawa, I. Mourifié and Y. Wan (2021). "Testing identifying assumptions in fuzzy regression discontinuity designs". Working Paper (CWP16/21). [DOI: 10.47004/wp.cem.2021.1621](https://www.cemmap.ac.uk/wp-content/uploads/2021/03/CWP1621-Testing-identifying-assumptions-in-fuzzy-regression-discontinuity-designs.pdf) 

<a id="Cattaneo_2020"></a>
Cattaneo, M., M. Jansson and X. Ma (2020). "Simple local polynomial density estimators". *Journal of the American Statistical Association* 115 (531), pp. 1449–1455.

<a id="Cattaneo_2021"></a>
Cattaneo, M., M. Jansson and X. Ma (2021). "lpdensity: Local polynomial density estimation and inference". *Journal of Statistical Software*, forthcoming.

<a id="Chetty_2011"></a>
Chetty, R., J. Friedman, N. Hilger, E. Saez, D. Schanzenbach and D. Yagan (2011). "How does your kindergarten classroom affect your earnings? Evidence from project STAR". *The Quarterly Journal of Economics* 126 (4), pp. 1593–1660.

<a id="Cohen-Zada_2013"></a>
Cohen-Zada, D., M. Gradstein and E. Reuven (2013). "Allocation of students in public schools: Theory and new evidence". *Economics of Education Review* 34 (3), pp. 96–106.

<a id="Dobbelsteen_2002"></a>
Dobbelsteen, S., J. Levin and H. Oosterbeek (2002). "The causal effect of class size on scholastic achievement: Distinguishing the pure class size effect from the effect of changes in class composition". *Oxford Bulletin of Economics and Statistics* 64 (1), pp. 17–38.

<a id="Feir_2016"></a>
Feir, D., T. Lemieux and V. Marmer (2016). "Weak identification in fuzzy regression discontinuity designs". *Journal of Business & Economic Statistics* 34 (2), pp. 185–196.

<a id="Gary-Bobo_2013"></a>
Gary-Bobo, R. and M.-B. Mahjoub (2013). "Estimation of class-size effects, using Maimonides' rule and other instruments: The case of French junior high schools". *Annals of Economics and Statistics* 111/112, pp. 193–225.  

<a id="Gerard_2020"></a>
Gerard, F., M. Rokkanen and C. Rothe (2020). "Bounds on treatment effects in regression discontinuity designs with a manipulated running variable". *Quantitative Economics* 11 (3), pp. 839–870.

<a id="Hattie_2005"></a>
Hattie, J. (2005). "The paradox of reducing class size and improving learning outcomes". *International Journal of Educational Research* 43 (6), pp. 387–425.

<a id="Hoxby_2000"></a>
Hoxby, C. (2000). "The effects of class size on student achievement: New evidence from population variation". *The Quarterly Journal of Economics* 115 (4), pp. 1239–1285.

<a id="Jakubowski_2006"></a>
Jakubowski, M. and P. Sakowski (2006). "Quasi-experimental estimates of class size effect in primary schools in Poland". *International Journal of Educational Research* 45 (3), pp. 202–215. 

<a id="Jepsen_2009"></a>
Jepsen, C. and S. Rivkin (2009). "Class size reduction and student achievement: The potential tradeoff between teacher quality and class size". *The Journal of Human Resources* 44 (1), pp. 223–250.

<a id="Krueger_1999"></a>
Krueger, A. (1999). "Experimental estimates of education production functions". *The Quarterly Journal of Economics* 114 (2), pp. 497–532.

<a id="McCrary_2008"></a>
McCrary, J. (2008). "Manipulation of the running variable in the regression discontinuity design: A density test". *Journal of Econometrics* 142 (2), pp. 698–714.

<a id="Morgan_2014"></a>
Morgan, S. and C. Winship (2014). *Counterfactuals and Causal Inference: Methods and Principles for Social Research*. New York: Cambridge University Press.

<a id="Moulton_1986"></a>
Moulton, B. (1986). "Random group effects and the precision of regression estimates". *Journal of Econometrics* 32 (3), pp. 385–397.

<a id="Otsu_2013"></a>
Otsu, T., K.-L. Xu and Y. Matsushita (2013). "Estimation and inference of discontinuity in density". *Journal of Business & Economic Statistics* 31 (4), pp. 507–524.

<a id="Saifi_2011"></a>
Saifi, S. and T. Mehmood (2011). "Effects of socioeconomic status on students achievement". *International Journal of Social Sciences and Education* 1 (2), pp. 119–128.

<a id="Shen_2017"></a>
Shen, T. and S. Konstantopoulos (2017). "Class size effects on reading achievement in Europe: Evidence from PIRLS". *Studies in Educational Evaluation* 53, pp. 98–114.

<a id="Urquiola_2006"></a>
Urquiola, M. (2006). "Identifying class size effects in developing countries: Evidence from rural Bolivia". *Review of Economics and Statistics* 88 (1), pp. 171–177.

<a id="Wößmann_2005"></a>
Wößmann, L. (2005). "Educational production in Europe". *Economic Policy* 20 (43), pp. 446–504.

***
Notebook by Sven Jacobs | <a href="mailto:s.jacobs@uni-bonn.de">s.jacobs@uni-bonn.de</a> | <i class="fa fa-github"></i> [svjaco](https://github.com/svjaco)
***