# (Imbens, 2014) Instrumental Variables: An Econometrician's Perspective

[Link to paper](https://docs.iza.org/dp8048.pdf)

# 1. Introduction

IV methods were initially developed in econometrics in the 1920s. Recent work in the statistics literature has the following features:
1. a focus on binary treatments
2. allowance for treatment effect heterogeneity
3. explicit use of the potential outcome framework
4. applications that include randomized experiments with non-compliance, where the intention-to-treat or reduced-form estimates are often of greater interest than they are in the traditional econometric simultaneous equations applications.



# 2. Choice versus Chance in Treatment Assignment

## 2.1 The Statistics Literature: The Focus on Chance

The statistics literature goes back to the work of Fisher (1925) and Neyman (1923) on the randomized experiment, both motivated by agricultural applications where the units of analysis are plots of land.

In modern notation (originating from Rubin (1974)), the unit (plot) level causal effect is a comparison between the two potential outcomes, $Y_i(A)$ and $Y_i(B)$ (e.g., the difference $\tau_i=Y_i(B)-Y_i(A)$).

In a completely randomized experiment with $N$ plots we select $M$ (with $M\in\{1,...,N-1\}$) plots at random to receive fertilizer $B$, with the remaining $N-M$ plots assigned to fertilizer $A$.

Thus, the treatment assignment ($W_i\in\{A,B\}$ for plot $i$) is by design independent of the potential outcomes, allowing for the drawing of exact causal inferences.

Fisher focused on calculating exact p-values for sharp null hypotheses, typically the null hyp. of no effect whatsoever, $Y_i(A)=Y_i(B)$ for all plots.

Neyman focused on developing unbiased estimators for the ATE

$$\frac{1}{N}\sum_i\left(Y_i(B)-Y_i(A)\right)$$

and the variance of these estimators.
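
The two approaches can be sketched on a small completely randomized experiment. This is a minimal illustration on hypothetical yields; the function names `neyman_estimate` and `fisher_exact_p` and the data are assumptions made for this example, not taken from the paper. The sign convention follows $\tau_i=Y_i(B)-Y_i(A)$.

```python
import itertools
import statistics

def neyman_estimate(y, w):
    """Difference in means (B minus A) and Neyman's conservative variance estimate."""
    treated = [yi for yi, wi in zip(y, w) if wi == "B"]
    control = [yi for yi, wi in zip(y, w) if wi == "A"]
    tau_hat = statistics.mean(treated) - statistics.mean(control)
    var_hat = (statistics.variance(treated) / len(treated)
               + statistics.variance(control) / len(control))
    return tau_hat, var_hat

def fisher_exact_p(y, w):
    """Exact p-value for the sharp null Y_i(A) = Y_i(B) for all plots.

    Under the sharp null the observed outcomes are fixed, so the statistic
    can be recomputed for every possible way of choosing M plots out of N.
    """
    n, m = len(y), sum(wi == "B" for wi in w)
    observed = abs(neyman_estimate(y, w)[0])
    at_least_as_extreme = total = 0
    for idx in itertools.combinations(range(n), m):
        chosen = set(idx)
        w_alt = ["B" if i in chosen else "A" for i in range(n)]
        stat = abs(neyman_estimate(y, w_alt)[0])
        at_least_as_extreme += stat >= observed - 1e-12
        total += 1
    return at_least_as_extreme / total

# Hypothetical yields for N = 8 plots, M = 4 of which received fertilizer B.
yields = [4.1, 5.3, 3.8, 6.0, 4.4, 5.9, 3.5, 6.2]
assignment = ["A", "B", "A", "B", "A", "B", "A", "B"]

tau_hat, var_hat = neyman_estimate(yields, assignment)
p_value = fisher_exact_p(yields, assignment)
print(tau_hat, var_hat, p_value)
```

In this toy data set the four largest yields all come from plots assigned to $B$, so only 2 of the 70 possible assignments give a difference at least as large in absolute value, and the exact p-value is 2/70.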

The subsequent statistics literature focuses on extending and generalizing the Fisher and Neyman results, which were derived explicitly for randomized experiments, to the more general setting of observational studies, using additional information in the form of pretreatment variables (covariates) not affected by the treatment.

Let $X_i$ denote these covariates. A key assumption is that, conditional on these pretreatment variables, the assignment to treatment is independent of the potential outcomes:

$$W_i\perp (Y_i(A), Y_i(B)) \mid X_i$$

This is known as *unconfoundedness given $X_i$*, also known as *no unmeasured confounders*.

This assumption, in combination with the auxiliary assumption that for all values of the covariates the probability of being assigned to each level of the treatment is strictly positive (i.e., the positivity assumption), is called *strong ignorability* (Rosenbaum and Rubin, 1983).

In the econometrics literature closely related assumptions are referred to as *selection-on-observables* (Barnow, Cain and Goldberger (1980)) or *exogeneity*.

Under weak ignorability, where the independence holds for each potential outcome separately rather than jointly (and thus also under strong ignorability), it is possible to estimate the ATE consistently in large samples (i.e., the ATE is *identified*). Various methods have been proposed, including matching, subclassification, and regression.
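
As a rough illustration of identification under unconfoundedness, the sketch below simulates data in which assignment depends only on a binary covariate and compares the raw difference in means with a subclassification estimator. The data-generating process, the true effect of 2, and the function names are assumptions made for this example; treatment $B$ is coded as `w = 1` and $A$ as `w = 0`.

```python
import random
import statistics

def simulate(n, seed=0):
    """Simulate (x, w, y_obs) with assignment depending only on the covariate x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)                     # binary pretreatment covariate
        w = int(rng.random() < (0.8 if x else 0.2))     # positivity holds: 0 < Pr(W=1|x) < 1
        y_a = 1.0 + 3.0 * x + rng.gauss(0.0, 1.0)       # potential outcome under A
        y_b = y_a + 2.0                                 # potential outcome under B (ATE = 2)
        rows.append((x, w, y_b if w else y_a))
    return rows

def diff_in_means(rows):
    treated = [y for _, w, y in rows if w == 1]
    control = [y for _, w, y in rows if w == 0]
    return statistics.fmean(treated) - statistics.fmean(control)

def subclassification(rows):
    """Average the within-stratum differences, weighting by stratum size."""
    estimate = 0.0
    for x in (0, 1):
        stratum = [r for r in rows if r[0] == x]
        estimate += len(stratum) / len(rows) * diff_in_means(stratum)
    return estimate

data = simulate(20000)
naive = diff_in_means(data)          # confounded by x
adjusted = subclassification(data)   # close to the true ATE of 2
print(naive, adjusted)
```

The raw comparison mixes the treatment effect with the effect of the covariate, while averaging within-stratum comparisons removes that imbalance.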

Robins and coauthors (Robins, 1986; Gill and Robins, 2001; Richardson and Robins, 2013; Van der Laan and Robins, 2003) have extended this approach to settings with sequential treatments.

## 2.2 The Econometrics Literature: The Focus on Choice

The starting point in the econometrics literature for studying causal effects emphasizes the choices that led to the treatment received.

Economic science starts by modeling the agents who make these choices as behaving optimally. More specifically, economists think of each of these agents as choosing the level of the treatment to most efficiently pursue their objectives given the constraints they face.

## 2.3 Some Examples

What is important is that the starting point differs between the two disciplines, and this has led to the development of substantially different methods for causal inference. For example, in the Fisher-Neyman-Rubin tradition, with a treatment $W_i\in\{h,f\}$ (hunting or fishing in the occupational choice setting of Roy (1951)), we model:

$$
Y_i^{\text{obs}}=Y_i(W_i)=\begin{cases}
      Y_i(h) & \text{if }W_i=h \\
      Y_i(f) & \text{if }W_i=f
\end{cases}$$

where we adjust for observed individual characteristics to make $W_i$ effectively random.

However, Roy (1951) assumes that each individual chooses their treatment optimally, that is,

$$
W_i=\begin{cases}
      f & \text{if }Y_i(f)\geq Y_i(h) \\
      h & \text{otherwise}
\end{cases}$$

This raises the question of whether the selection-on-observables assumption can hold at all when the treatment is chosen on the basis of the potential outcomes themselves.
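
A small simulation can make the concern concrete. The sketch below draws hypothetical potential earnings for the two choices, lets each individual pick the larger one as in the Roy-type rule above, and compares the naive observed difference with the true average effect. The distributions and variable names are assumptions chosen purely for illustration.

```python
import random
import statistics

def simulate_roy(n, seed=0):
    """Draw potential outcomes and let each individual choose the better option."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y_h = rng.gauss(0.0, 1.0)    # hypothetical outcome under h
        y_f = rng.gauss(0.5, 2.0)    # hypothetical outcome under f
        w = "f" if y_f >= y_h else "h"
        rows.append((y_h, y_f, w))
    return rows

rows = simulate_roy(50000)

# True average effect of f versus h, available only because this is a simulation.
true_ate = statistics.fmean([y_f - y_h for y_h, y_f, _ in rows])

# What an analyst would see: Y(f) for those who chose f, Y(h) for those who chose h.
mean_obs_f = statistics.fmean([y_f for _, y_f, w in rows if w == "f"])
mean_obs_h = statistics.fmean([y_h for y_h, _, w in rows if w == "h"])
naive_diff = mean_obs_f - mean_obs_h

# Counterfactual quantity: Y(f) among those who chose h (never observed in practice).
mean_yf_if_chose_h = statistics.fmean([y_f for _, y_f, w in rows if w == "h"])

print(true_ate, naive_diff, mean_obs_f, mean_yf_if_chose_h)
```

Because the choice is a function of the potential outcomes, $Y_i(f)$ is distributed very differently among those who chose $f$ and those who chose $h$, and the naive comparison overstates the average effect in this setup.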


## 2.4 Instrumental Variables

Instrumental variables methods address the type of selection issues the Roy model raises.



# 3. The Classic Example: Supply and Demand

The classic application of instrumental variables methods in econometrics is to simultaneous equations models.

Simultaneous equation models are both at the core of the econometrics canon and at the core of the confusion concerning instrumental variable methods in the statistics literature.

This section looks at the supply and demand models that motivated the original research into instrumental variables.

Here the *endogeneity*, that is, the violation of unconfoundedness, arises from an equilibrium condition.

## 3.1 Discussions in the Statistics Literature

## 3.2 The Market for Fish

The example is the market for whiting (a white fish often used in fish sticks).

Graddy collected data on quantities and prices of whiting sold by a particular trader at the Fulton fish market (NYC) on 111 days (1991-1992).

Each day during the period covered in this dataset, indexed by $t=1,...,111$, a number of pounds of whiting are sold by this particular trader, denoted by $Q_t^{\text{obs}}$. The average price per pound on day $t$ is denoted by $P_t^{\text{obs}}$.

As one might expect, days with higher quantities tend to have lower prices (e.g., on day 1, 8,058 pounds were sold at an average of 65 cents per pound, and on the next day 2,224 pounds were sold at an average of 100 cents).

Suppose we are interested in predicting the effect of a tax in this market. To be specific, suppose the government is considering imposing a $100\times r\%$ tax (e.g., a 10% tax) on all whiting sold, but before doing so it wishes to predict the average percent change in the quantity sold as a result of the tax.

We may formalize that by looking at the average effect on the logarithm of the quantity $\tau = E[\ln Q_t(r)-\ln Q_t(0)]$, where $Q_t(r)$ is the quantity traded on day $t$ if the tax rate were set at $r$.

> We look at the difference in natural logs because it approximates the relative (percentage) change in quantity.
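
The approximation can be written out as a short worked step (standard first-order expansion of the logarithm, valid for small relative changes):

$$\ln Q_t(r)-\ln Q_t(0)=\ln\left(1+\frac{Q_t(r)-Q_t(0)}{Q_t(0)}\right)\approx\frac{Q_t(r)-Q_t(0)}{Q_t(0)}$$

For instance, a 5% drop in quantity corresponds to a log difference of $\ln(0.95)\approx-0.051$.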

The problem is that in all 111 markets we only observe $Q_t^{\text{obs}}=Q_t(0)$, so we can only directly estimate $E[\ln Q_t(0)]$ from the data.

A naive approach is to assume that a tax of 10% would simply raise prices by 10%, and to combine this with an unconfoundedness assumption: that prices can be viewed as set independently of market conditions on a particular day.

Formally,

$$E[\ln Q_t(r)|P_t^{\text{obs}}=p]=E[\ln Q_t(0)|P_t^{\text{obs}}=(1+r)\times p]$$

This can then be estimated using a regression.

$$\ln Q_t^{\text{obs}}=\alpha^{\text{ls}}+\beta^{\text{ls}}\times\ln P_t^{\text{obs}}+\epsilon_t$$

This is problematic from an economist's perspective because of the unconfoundedness assumption that prices are independent of the potential outcomes for quantity. In reality, prices differed across days *because* market conditions differed.
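
To see why the regression is problematic, the sketch below simulates prices and quantities that are jointly determined by the intersection of a demand curve and a supply curve, then runs the least squares regression of log quantity on log price. The log-linear supply curve, the intercepts, the elasticities ($-1$ for demand, $+1$ for supply), and the shock variances are all assumptions made for this illustration.

```python
import random

def simulate_equilibrium(n_days, beta_d=-1.0, beta_s=1.0, seed=0):
    """Generate equilibrium (log price, log quantity) pairs, one per day."""
    rng = random.Random(seed)
    alpha_d, alpha_s = 8.0, 6.0                 # hypothetical intercepts
    log_p, log_q = [], []
    for _ in range(n_days):
        eps_d = rng.gauss(0.0, 0.5)             # demand shock (market conditions)
        eps_s = rng.gauss(0.0, 0.5)             # supply shock
        # Price at which alpha_d + beta_d*lp + eps_d equals alpha_s + beta_s*lp + eps_s.
        lp = (alpha_s - alpha_d + eps_s - eps_d) / (beta_d - beta_s)
        log_p.append(lp)
        log_q.append(alpha_d + beta_d * lp + eps_d)
    return log_p, log_q

def ols_slope(x, y):
    """Slope of the least squares regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

log_p, log_q = simulate_equilibrium(5000)
beta_ls = ols_slope(log_p, log_q)
print(beta_ls)   # far from the demand elasticity of -1 used in the simulation
```

In this setup the equilibrium price responds to the demand shock, so price is correlated with the unobserved component of demand and the regression slope is a mixture of the two elasticities rather than an estimate of either.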

## 3.3 The Supply of and Demand for Fish

So how do economists go about analyzing questions such as this one if not by regressing quantities on prices?

Traditionally the demand function is specified parametrically, for example linear in logarithms:

$$\ln Q_t^d(p)=\alpha^d+\beta^d\times\ln p+\epsilon_t^d$$

where $\beta^d$ is the price elasticity of demand. This equation is *not* a regression function. It is a *structural equation* and is a model for the potential outcomes.

We can normalize the unobserved component $\epsilon_t^d$ to have expectation zero:

$$E[\ln Q_t^d(p)]=\alpha^d+\beta^d\times\ln p$$

## 3.4 Market Equilibrium

## 3.5 The Statistical Demand Curve

## 3.6 The Effect of a Tax Increase

## 3.7 Identification with Instrumental Variables

Instrumental variables are needed to identify the demand and supply functions.

Graddy (1995) assumes that weather conditions at sea on days prior to $t$, denoted $Z_t$, affect supply but not demand (e.g., high waves and strong winds make it harder to catch fish, but should not affect whether buyers want fish).

Formally, the key assumptions are that

$$Q_t^d(p)\perp Z_t\text{ and }Q_t^s(p)\not\perp Z_t$$

possibly conditional on covariates.
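
The following sketch illustrates how such an instrument can recover the demand elasticity. It simulates equilibrium data in which a binary weather indicator shifts supply only, then compares the least squares slope with the simple ratio-of-covariances IV estimator $\widehat{\text{Cov}}(\ln Q, Z)/\widehat{\text{Cov}}(\ln P, Z)$. The log-linear supply curve, all parameter values, and the binary coding of $Z_t$ are assumptions for this example, not Graddy's specification.

```python
import random

def simulate_with_weather(n_days, beta_d=-1.0, beta_s=1.0, gamma=-1.0, seed=0):
    """Equilibrium data where stormy weather (z = 1) shifts supply but not demand."""
    rng = random.Random(seed)
    alpha_d, alpha_s = 8.0, 6.0                 # hypothetical intercepts
    z, log_p, log_q = [], [], []
    for _ in range(n_days):
        zt = int(rng.random() < 0.5)            # hypothetical stormy-weather indicator
        eps_d = rng.gauss(0.0, 0.5)             # demand shock, independent of weather
        eps_s = rng.gauss(0.0, 0.5)             # supply shock
        # Market-clearing log price given demand and the weather-shifted supply.
        lp = (alpha_s - alpha_d + gamma * zt + eps_s - eps_d) / (beta_d - beta_s)
        z.append(zt)
        log_p.append(lp)
        log_q.append(alpha_d + beta_d * lp + eps_d)
    return z, log_p, log_q

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

z, log_p, log_q = simulate_with_weather(20000)
beta_ls = cov(log_p, log_q) / cov(log_p, log_p)   # least squares slope
beta_iv = cov(log_q, z) / cov(log_p, z)           # IV estimate of the demand elasticity
print(beta_ls, beta_iv)
```

Because weather moves price through supply while being independent of the demand shock, the variation in price induced by $Z_t$ traces out the demand curve, and the IV ratio is close to the elasticity of $-1$ used in the simulation.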

## 3.8 Recent Research on Simultaneous Equation Models










# 4. A Modern Example: Randomized Experiments with Noncompliance and Heterogenous Treatment Effects

The modern literature on instrumental variables methods evolved simultaneously in the statistics and econometrics literatures.

On the econometrics side, there were difficulties in establishing point identification (Heckman, 1990), leading to the bounds approach developed by Manski (1995).

At the same time, statisticians analyzed the complications arising from noncompliance in randomized experiments (Robins, 1989) and the merits of encouragement designs (Zelen, 1979; 1990).

By adopting a common framework and notation, these literatures have become closely connected and influenced each other substantially.

## 4.1 The McDonald and Tierney (1992) Data

The canonical example in this literature is a randomized experiment with non-compliance, the McDonald and Tierney (1992) study.

The authors carried out a randomized experiment to evaluate the effect of influenza vaccination on flu-related hospital visits.

Instead of randomly assigning the vaccination itself, the researchers randomly assigned physicians to receive letters reminding them of the upcoming flu season and encouraging them to vaccinate their patients.

This is what Zelen (1979) refers to as an *encouragement design*.

Let
- $Z_i\in\{0,1\}$ be the indicator for receipt of the letter
- $X_i\in\{0,1\}$ be the indicator for receipt of the vaccination.

There are 4 potential outcomes, $Y_i(z,x)$, one for each combination of letter assignment $z$ and vaccination status $x$.

There are also two potential outcomes for the receipt of the vaccination, $X_i(z)$, one for each value of the letter assignment.

The treatment actually received is $X_i^{\text{obs}}=X_i(Z_i)$,

and the observed outcome is the potential outcome corresponding to the assignment and treatment received, $Y_i^{\text{obs}}=Y_i(Z_i, X_i(Z_i))$.

Because all three observed variables are binary, there are 8 possible values of the triple $(Z_i, X_i^{\text{obs}}, Y_i^{\text{obs}})$.

## 4.2 Instrumental Variables Assumptions

There are 4 key assumptions underlying IV methods beyond SUTVA (with diff. versions for some of them).

**Assumption 1: Random Assignment**

The instrument is as good as randomly assigned:

$$Z_i\perp (Y_i(0,0), Y_i(0,1), Y_i(1,0), Y_i(1,1), X_i(0), X_i(1))$$

This assumption is often satisfied by design in the encouragement designs of the statistics literature, but with observational data, as is common in the econometrics literature, it is more controversial. It can be relaxed by requiring it to hold only within subpopulations defined by covariates, that is, by assuming the assignment of the instrument is unconfounded. Writing $V_i$ for the covariates (since $X_i$ here denotes the treatment received), unconfounded assignment given $V_i$ is:

$$Z_i\perp (Y_i(0,0), Y_i(0,1), Y_i(1,0), Y_i(1,1), X_i(0), X_i(1))\mid V_i$$

Either version of this assumption justifies the causal interpretation of *Intention-to-Treat* (ITT) effects.

In many cases, ITT effects are of only limited interest.

**To draw inferences beyond ITT effects, about the causal effect of the treatment itself, additional assumptions are needed.**

The second class of assumptions limits or rules out completely direct effects of the assignment on the outcome, other than through the effect of the assignment on the receipt of the treatment of interest.

This is the most critical and most controversial assumption underlying IV methods, sometimes viewed as the defining characteristic of instruments.

**Assumption 2: Exclusion-Restriction**

$$Y_i(0,x)=Y_i(1,x)\text{ for }x=0,1,\text{ for all }i$$

Several weaker versions of this assumption exist.

Imbens and Angrist (1994) combine the above two assumptions by postulating the existence of a pair of potential outcomes $Y_i(x)$ for $x=0,1$ and directly assuming that

$$Z_i\perp (Y_i(0), Y_i(1))$$

**Assumption 3: Monotonicity**

Monotonicity, introduced by Imbens and Angrist (1994), is an often-used assumption that requires

$$X_i(1)\geq X_i(0)\text{ for all }i$$

This rules out the presence of units who always do the opposite of their assignment (units with $X_i(0)=1$ and $X_i(1)=0$), and is therefore also referred to as the *no-defiance* assumption (Balke and Pearl, 1995).

In randomized experiments, this is often a plausible assumption.

Finally, we need

**Assumption 4: Relevance**

The instrument needs to be correlated with the treatment received:

$$X_i\not\perp Z_i$$

In practice, we need the correlation to be substantial in order to draw precise inferences.

The modern literature first focused on the inability under these 4 assumptions to identify an ATE. Manski (1990), Balke and Pearl (1995), and Robins (1989) showed that there was some information to derive bounds, but not a point estimate.

Another strand of literature starting with Imbens and Angrist (1994), Angrist, Imbens and Rubin (1996) abandoned the effort to do inference for the overall average effect, and focused on subpopulations for which the average effect could be identified (compliers).

## 4.3 Point Identification versus Bounds

The primary estimand is usually the ATE or the ATT (the average effect for the treated):

$$\tau=E[Y_i(1)-Y_i(0)]$$
$$\tau_t=E[Y_i(1)-Y_i(0)|X_i=1]$$

With only the 4 assumptions:
- random assignment
- exclusion restriction
- monotonicity
- instrument relevance

Robins (1989), Manski (1990), and Balke and Pearl (1995) established that the ATE cannot be consistently estimated even in large samples; in other words, it is often *not point-identified*.

As an alternative to point identification, bounds (called *natural bounds*) were developed by Manski (1995-2008), Robins (1989) and Hernan and Robins (2006).
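
To give a sense of how bounds arise, the sketch below computes bounds on the ATE for a binary outcome using only random assignment of the instrument and the exclusion restriction. For each arm $z$, the unobserved part of $E[Y_i(1)]$ is bounded between 0 and $\Pr(X_i=0|Z_i=z)$, and symmetrically for $E[Y_i(0)]$; the bounds are then intersected across the two arms. This is an illustrative construction in the spirit of the natural bounds, not necessarily the exact expressions in the cited papers, and the simulated compliance-type shares and outcome probabilities are hypothetical.

```python
import random

def simulate_noncompliance(n, seed=0):
    """Simulate (z, x, y) with binary outcome and no defiers."""
    rng = random.Random(seed)
    # type: (population share, Pr(Y(0)=1), Pr(Y(1)=1)); all values hypothetical
    types = {"c": (0.5, 0.5, 0.3), "a": (0.2, 0.3, 0.2), "n": (0.3, 0.6, 0.4)}
    names = list(types)
    weights = [types[t][0] for t in names]
    data = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        t = rng.choices(names, weights=weights)[0]
        x = {"c": z, "a": 1, "n": 0}[t]
        y = int(rng.random() < types[t][1 + x])   # depends on x only, not on z
        data.append((z, x, y))
    return data

def ate_bounds_binary(data):
    """Lower and upper bound on E[Y(1) - Y(0)] for a binary outcome."""
    def pr(event, z):
        arm = [r for r in data if r[0] == z]
        return sum(1 for r in arm if event(r)) / len(arm)

    y1_x1 = lambda r: r[1] == 1 and r[2] == 1
    y1_x0 = lambda r: r[1] == 0 and r[2] == 1
    lo1 = max(pr(y1_x1, z) for z in (0, 1))
    hi1 = min(pr(y1_x1, z) + pr(lambda r: r[1] == 0, z) for z in (0, 1))
    lo0 = max(pr(y1_x0, z) for z in (0, 1))
    hi0 = min(pr(y1_x0, z) + pr(lambda r: r[1] == 1, z) for z in (0, 1))
    return lo1 - hi0, hi1 - lo0

lower, upper = ate_bounds_binary(simulate_noncompliance(40000))
true_ate = 0.5 * (0.3 - 0.5) + 0.2 * (0.2 - 0.3) + 0.3 * (0.4 - 0.6)
print(lower, true_ate, upper)
```

In this simulation the interval contains the true ATE but also includes zero, which illustrates why the data alone are informative about the ATE without pinning it down.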

## 4.4 Compliance Types

Imbens and Angrist (1994), Angrist, Imbens and Rubin (1996) take a diff. approach.

They focus on a different average causal effect that *can* be identified.

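Compliance types are defined by the pair of potential treatment values $(X_i(0), X_i(1))$, with $T_i$ denoting the type of unit $i$: never-takers ($n$) with $(0,0)$, compliers ($c$) with $(0,1)$, defiers ($d$) with $(1,0)$, and always-takers ($a$) with $(1,1)$. Since only one of $X_i(0)$ and $X_i(1)$ is observed for each unit, compliance types are not directly observed, and without additional restrictions the *proportion* of individuals of each compliance type cannot be identified either. The monotonicity assumption implies that there are no defiers, so with random assignment we can identify the population shares of the remaining 3 compliance types.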

The proportions of always-takers and never-takers are:

$$\tau_a=\Pr(T_i=a)=\Pr (X_i=1|Z_i=0)$$

$$\tau_n=\Pr(T_i=n)=\Pr (X_i=0|Z_i=1)$$

And thus the proportion of compliers is the remainder:

$$\tau_c=\Pr(T_i=c)=1-\tau_a-\tau_n$$
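
These shares can be estimated directly from the assignment and receipt indicators. The sketch below assumes no defiers and a randomly assigned instrument: always-takers are the treated among those with $Z_i=0$, never-takers are the untreated among those with $Z_i=1$, and compliers are the remainder. The simulated shares and function names are hypothetical.

```python
import random

def simulate_receipt(n, shares, seed=0):
    """Simulate (z, x) pairs for a population of always-takers, never-takers, compliers."""
    rng = random.Random(seed)
    kinds = list(shares)
    weights = [shares[k] for k in kinds]
    data = []
    for _ in range(n):
        z = int(rng.random() < 0.5)                    # randomized encouragement
        t = rng.choices(kinds, weights=weights)[0]
        x = {"a": 1, "n": 0, "c": z}[t]                # compliers follow their assignment
        data.append((z, x))
    return data

def compliance_shares(data):
    x_z0 = [x for z, x in data if z == 0]
    x_z1 = [x for z, x in data if z == 1]
    share_a = sum(x_z0) / len(x_z0)                    # Pr(X = 1 | Z = 0)
    share_n = 1 - sum(x_z1) / len(x_z1)                # Pr(X = 0 | Z = 1)
    return {"a": share_a, "n": share_n, "c": 1 - share_a - share_n}

true_shares = {"a": 0.2, "n": 0.3, "c": 0.5}
estimated = compliance_shares(simulate_receipt(20000, true_shares))
print(estimated)
```

The estimated complier share equals the difference in treatment rates between the two assignment arms, which is also the denominator of the IV ratio.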

## 4.5 Local Average Treatment Effects

So far we have random assignment and monotonicity.

By adding the exclusion restriction, Imbens and Angrist (1994) and Angrist, Imbens and Rubin (1996) show that the LATE, or complier ATE, is *identified*:

$$\tau_\text{late}=E[Y_i(1)-Y_i(0)|T_i=c]=\frac{E[Y_i|Z_i=1]-E[Y_i|Z_i=0]}{E[X_i|Z_i=1]-E[X_i|Z_i=0]}$$
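
The ratio can be checked on simulated data with heterogeneous effects. In the sketch below, compliers have an effect of 2 while always-takers have a different effect; the ratio of the ITT effect on the outcome to the ITT effect on treatment receipt should be close to the complier effect, whereas the as-treated comparison is not. All type shares, baselines, and effects are hypothetical values chosen for illustration.

```python
import random
import statistics

def simulate_experiment(n, seed=0):
    """Simulate (z, x, y) with no defiers and type-specific treatment effects."""
    rng = random.Random(seed)
    baseline = {"c": 0.0, "a": 1.0, "n": -1.0}   # hypothetical E[Y(0)] by type
    effect = {"c": 2.0, "a": 3.0, "n": -1.0}     # hypothetical effects; "n" is never realized
    data = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        t = rng.choices(["c", "a", "n"], weights=[0.5, 0.2, 0.3])[0]
        x = {"c": z, "a": 1, "n": 0}[t]
        y = baseline[t] + effect[t] * x + rng.gauss(0.0, 1.0)   # z affects y only through x
        data.append((z, x, y))
    return data

def wald_late(data):
    """Ratio of the ITT effect on Y to the ITT effect on X."""
    y1 = statistics.fmean([y for z, _, y in data if z == 1])
    y0 = statistics.fmean([y for z, _, y in data if z == 0])
    x1 = statistics.fmean([x for z, x, _ in data if z == 1])
    x0 = statistics.fmean([x for z, x, _ in data if z == 0])
    return (y1 - y0) / (x1 - x0)

def as_treated(data):
    """Naive comparison of outcomes by treatment actually received."""
    treated = statistics.fmean([y for _, x, y in data if x == 1])
    untreated = statistics.fmean([y for _, x, y in data if x == 0])
    return treated - untreated

data = simulate_experiment(40000)
late_hat = wald_late(data)
naive = as_treated(data)
print(late_hat, naive)
```

The numerator averages the complier effect over everyone (non-compliers contribute zero under the exclusion restriction), and dividing by the complier share rescales it to the complier subpopulation.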

## 4.6 Do We Care About the Local Average Treatment Effect?

The LATE is an unusual estimand. It is the ATE for a subpopulation whose members cannot be individually identified (i.e., there are no units whom we know to belong to this subpopulation).

It is therefore controversial:

"I find it hard to make any sense of the LATE" (Deaton, 2010)

"Most authors in this category do not state whether their focus on a specific stratum is motivated by mathematical convenience, mathematical necessity or a genuine interest in the stratum under analysis" (Pearl, 2011)

However, Imbens states, "this limitation should be acknowledged, but one should not drop the analysis simply because the original estimand cannot be identified".

Some say to just focus on ITT.













# 5. The Substantive Content of the Instrumental Variables Assumptions

Discuss the substantive content of the 3 key assumptions:
- random assignment
- exclusion restriction
- monotonicity assumption

## 5.1 Unconfoundedness of the Instrument

i.e., random assignment

In many applications, satisfied by design (because the instrument is physically randomized).

## 5.2 The Exclusion Restriction

Most critical and typically most controversial assumption underlying instrumental variables methods.

## 5.3 Monotonicity

i.e., no-defiers assumption. Least controversial.

"Often, but not always reasonable" (Robins, 1989).

In one-sided noncompliance, those assigned to the control are effectively embargoed from receiving treatment, thus monotonicity is automatically satisfied.





# 7. Extensions and Generalizations

## 7.1 Model-based Approaches to Estimation and Inference

Traditionally instrumental variable analyses relied on linear regression methods.

Additional explanatory variables are incorporated linearly in the regression function.

The recent work in the statistics literature has explored more flexible approaches to including covariates.

This often involves modeling the conditional distribution of the endogenous regressor given the instruments and the exogenous variables.

## 7.2 Principal Stratification

## 7.3 Randomization Inference with IVs

## 7.5 Weak Instruments

The weak instrument literature is concerned with the construction of confidence intervals when the instruments are only weakly correlated with the endogenous regressor, and gained prominence after a study by Angrist and Krueger (1991).

## 7.6 Many Instruments

## 7.7 Proxies for Instruments



