---
title: "Asymptotics and consistency"
subtitle: "EC 421, Set 6"
author: "Edward Rubin"
date: "`r format(Sys.time(), '%d %B %Y')`"
output:
  xaringan::moon_reader:
    css: ['default', 'metropolis', 'metropolis-fonts', 'my-css.css']
    # self_contained: true
    nature:
      highlightStyle: github
      highlightLines: true
      countIncrementalSlides: false
---

class: inverse, middle

```{R, setup, include = F}
options(htmltools.dir.version = FALSE)
library(pacman)
p_load(
  broom, latex2exp, ggplot2, ggthemes, viridis, extrafont, gridExtra,
  kableExtra, dplyr, magrittr, knitr, parallel
)
# Define pink color
red_pink <- "#e64173"
turquoise <- "#20B2AA"
grey_light <- "grey70"
grey_mid <- "grey50"
grey_dark <- "grey20"
# Dark slate grey: #314f4f
# Notes directory
dir_slides <- "~/Dropbox/UO/Teaching/EC421W19/LectureNotes/05Heteroskedasticity/"
# Knitr options
opts_chunk$set(
  comment = "#>",
  fig.align = "center",
  fig.height = 7,
  fig.width = 10.5,
  warning = F,
  message = F
)
opts_chunk$set(dev = "svg")
options(device = function(file, width, height) {
  svg(tempfile(), width = width, height = height)
})
# A blank theme for ggplot
theme_empty <- theme_bw() + theme(
  line = element_blank(),
  rect = element_blank(),
  strip.text = element_blank(),
  axis.text = element_blank(),
  plot.title = element_blank(),
  axis.title = element_blank(),
  plot.margin = structure(c(0, 0, -0.5, -1), unit = "lines", valid.unit = 3L, class = "unit"),
  legend.position = "none"
)
theme_simple <- theme_bw() + theme(
  line = element_blank(),
  panel.grid = element_blank(),
  rect = element_blank(),
  strip.text = element_blank(),
  axis.text.x = element_text(size = 18, family = "STIXGeneral"),
  axis.text.y = element_blank(),
  axis.ticks = element_blank(),
  plot.title = element_blank(),
  axis.title = element_blank(),
  # plot.margin = structure(c(0, 0, -1, -1), unit = "lines", valid.unit = 3L, class = "unit"),
  legend.position = "none"
)
theme_axes_math <- theme_void() + theme(
  text = element_text(family = "MathJax_Math"),
  axis.title = element_text(size = 22),
  axis.title.x = element_text(hjust = .95, margin = margin(0.15, 0, 0, 0, unit = "lines")),
  axis.title.y = element_text(vjust = .95, margin = margin(0, 0.15, 0, 0, unit = "lines")),
  axis.line = element_line(
    color = "grey70",
    size = 0.25,
    arrow = arrow(angle = 30, length = unit(0.15, "inches"))
  ),
  plot.margin = structure(c(1, 0, 1, 0), unit = "lines", valid.unit = 3L, class = "unit"),
  legend.position = "none"
)
theme_axes_serif <- theme_void() + theme(
  text = element_text(family = "MathJax_Main"),
  axis.title = element_text(size = 22),
  axis.title.x = element_text(hjust = .95, margin = margin(0.15, 0, 0, 0, unit = "lines")),
  axis.title.y = element_text(vjust = .95, margin = margin(0, 0.15, 0, 0, unit = "lines")),
  axis.line = element_line(
    color = "grey70",
    size = 0.25,
    arrow = arrow(angle = 30, length = unit(0.15, "inches"))
  ),
  plot.margin = structure(c(1, 0, 1, 0), unit = "lines", valid.unit = 3L, class = "unit"),
  legend.position = "none"
)
theme_axes <- theme_void() + theme(
  text = element_text(family = "Fira Sans Book"),
  axis.title = element_text(size = 18),
  axis.title.x = element_text(hjust = .95, margin = margin(0.15, 0, 0, 0, unit = "lines")),
  axis.title.y = element_text(vjust = .95, margin = margin(0, 0.15, 0, 0, unit = "lines")),
  axis.line = element_line(
    color = grey_light,
    size = 0.25,
    arrow = arrow(angle = 30, length = unit(0.15, "inches"))
  ),
  plot.margin = structure(c(1, 0, 1, 0), unit = "lines", valid.unit = 3L, class = "unit"),
  legend.position = "none"
)
```

# Prologue

---
name: schedule
# Schedule

## Last Time

Living with heteroskedasticity
## Today

Asymptotics and consistency

## This week

- Our second assignment
- Survey (for credit)

## Near-ish future

Midterm on Feb. 12.super[th]

---
# .mono[R] showcase

Need speed? .mono[R] allows essentially infinite parallelization. Three popular packages:

- [future](https://github.com/HenrikBengtsson/future)
- parallel
- foreach

And here's a nice [tutorial](https://nceas.github.io/oss-lessons/parallel-computing-in-r/parallel-computing-in-r.html). A minimal example follows on the next slide.
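---
# .mono[R] showcase

*A minimal sketch* of the parallel package in action (the toy task and the four cores are arbitrary choices; `mclapply()` forks processes, so on Windows you would use `parLapply()` with a cluster instead). The chunk is shown but not evaluated:

```{R, ex parallel, eval = F}
library(parallel)
# A toy task: draw 100,000 observations and compute the sample mean
one_run <- function(i) mean(rnorm(1e5))
# Serial: run the task 100 times, one after another
serial_runs <- lapply(1:100, one_run)
# Parallel: spread the same 100 runs across 4 cores
parallel_runs <- mclapply(1:100, one_run, mc.cores = 4)
```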
---
layout: true
# Consistency

---
class: inverse, middle

---
layout: true
# Consistency
## Welcome to asymptopia

---

*Previously:* We examined estimators (_e.g._, $\hat{\beta}_j$) and their properties using

--

1. The **mean** of the estimator's distribution: $\mathop{\boldsymbol{E}}\left[ \hat{\beta}_j \right] = ?$

--

2. The **variance** of the estimator's distribution: $\mathop{\text{Var}} \left( \hat{\beta}_j \right) = ?$

--

which tell us about the .hi[tendency of the estimator] if we took .hi[∞ samples], each with .hi[sample size] $\color{#e64173}{n}$.

--

This approach misses something.

---

.hi[New question:] How does our estimator behave as our sample gets larger (as $n\rightarrow\infty$)?

--

This *new question* forms a new way to think about the properties of estimators: .hi[asymptotic properties] (or large-sample properties).

--

A "good" estimator will become indistinguishable from the parameter it estimates when $n$ is very large (close to $\infty$).

---
layout: true
# Consistency
## Probability limits

---

Just as the *expected value* helped us characterize **the finite-sample distribution of an estimator** with sample size $n$,

--

the .pink[*probability limit*] helps us analyze .hi[the asymptotic distribution of an estimator] (the distribution of the estimator as $n$ gets "big".pink[†]).

.footnote[
.pink[†]: Here, "big" $n$ means $n\rightarrow\infty$. That's *really* big data.
]

---

Let $B_n$ be our estimator with sample size $n$. Then the .hi[probability limit] of $B_n$ is $\alpha$ if

$$\lim_{n\rightarrow\infty} \mathop{P}\left( \left| B_n - \alpha \right| > \varepsilon \right) = 0 \tag{1}$$

for any $\varepsilon > 0$.

--

The definition in $(1)$ *essentially* says that as the .pink[sample size] approaches infinity, the probability that $B_n$ differs from $\alpha$ by more than a very small number $(\varepsilon)$ is zero.

--

*Practically:* $B_n$'s distribution collapses to a spike at $\alpha$ as $n$ approaches $\infty$.

---

Equivalent statements:

- The probability limit of $B_n$ is $\alpha$.
- $\text{plim}\: B_n = \alpha$
- $B_n$ converges in probability to $\alpha$.

---

Probability limits have some nice/important properties:

- $\mathop{\text{plim}}\left( X \times Y \right) = \mathop{\text{plim}}\left( X \right) \times \mathop{\text{plim}}\left( Y \right)$
- $\mathop{\text{plim}}\left( X + Y \right) = \mathop{\text{plim}}\left( X \right) + \mathop{\text{plim}}\left( Y \right)$
- $\mathop{\text{plim}}\left( c \right) = c$, where $c$ is a constant
- $\mathop{\text{plim}}\left( \dfrac{X}{Y} \right) = \dfrac{\mathop{\text{plim}}\left( X \right)}{ \mathop{\text{plim}}\left( Y \right)}$, provided $\mathop{\text{plim}}\left( Y \right) \neq 0$
- $\mathop{\text{plim}}\!\big( f(X) \big) = \mathop{f}\!\big(\mathop{\text{plim}}\left( X \right)\big)$ for continuous $f$
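---

*A minimal sketch* (the $\mu = 10$ and $\varepsilon = 0.1$ are arbitrary choices): we can watch definition $(1)$ at work by simulating $\mathop{P}\left( \left| \overline{X}_n - \mu \right| > \varepsilon \right)$ for growing $n$.

```{R, ex plim sim, echo = T}
set.seed(421)
mu <- 10; eps <- 0.1
# For each n, estimate P(|sample mean - mu| > eps) from 2,000 simulated samples
sapply(c(10, 100, 1000, 10000), function(n) {
  mean(replicate(2e3, abs(mean(rnorm(n, mean = mu)) - mu) > eps))
})
```

The simulated probabilities fall toward zero as $n$ grows, which is exactly what $\text{plim}\: \overline{X}_n = \mu$ requires.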
---
layout: true
# Consistency
## Consistent estimators

---

We say that .hi[an estimator is consistent] if

1. The estimator .hi[has a prob. limit] (its distribution collapses to a spike).
2. This spike is .hi[located at the parameter] the estimator estimates.

--

*In other words...* An estimator is consistent if its asymptotic distribution collapses to a spike located at the estimated parameter.

--

*In math:* The estimator $B$ is consistent for $\alpha$ if $\mathop{\text{plim}} B = \alpha$.

--

The estimator is *inconsistent* if $\mathop{\text{plim}} B \neq \alpha$.

---

*Example:* We want to estimate the population mean $\mu_x$ (where $X\sim\text{Normal}$). Let's compare the asymptotic distributions of three competing estimators:

1. The first observation: $X_{1}$
2. The sample mean: $\overline{X} = \dfrac{1}{n} \sum_{i=1}^n x_i$
3. Some other estimator: $\widetilde{X} = \dfrac{1}{n+1} \sum_{i=1}^n x_i$

Note that (1) and (2) are unbiased, but (3) is biased.

---

To see which are unbiased/biased:

--

$\mathop{\boldsymbol{E}}\left[ X_1 \right] = \mu_x$

--

$\mathop{\boldsymbol{E}}\left[ \overline{X} \right]$
--
$= \mathop{\boldsymbol{E}}\left[ \dfrac{1}{n} \sum_{i=1}^n x_i \right]$
--
$= \dfrac{1}{n} \sum_{i=1}^n \mathop{\boldsymbol{E}}\left[ x_i \right]$
--
$= \dfrac{1}{n} \sum_{i=1}^n \mu_x$
--
$= \mu_x$

--

$\mathop{\boldsymbol{E}}\left[ \widetilde{X} \right]$
--
$= \mathop{\boldsymbol{E}}\left[ \dfrac{1}{n+1} \sum_{i=1}^n x_i \right]$
--
$= \dfrac{1}{n+1} \sum_{i=1}^n \mathop{\boldsymbol{E}}\left[ x_i \right]$
--
$= \dfrac{n}{n+1}\mu_x$

---
layout: true
# Consistency

Distributions of $\color{#FFA500}{X_1}$, $\color{#e64173}{\overline{X}}$, and $\color{#314f4f}{\widetilde{X}}$

---
$n = `r (n <- 2)`$

```{R, ex2, echo = F, fig.height = 5.75}
mu <- 10; v <- 1
ex1 <- mu
ex2 <- mu
ex3 <- n/(n+1) * mu
se1 <- sqrt(v)
se2 <- sqrt(v/n)
se3 <- sqrt(v/(n+1))
lb <- min(c(ex1, ex2, ex3)) - max(c(se1, se2, se3)) * 3.5
ub <- max(c(ex1, ex2, ex3)) + max(c(se1, se2, se3)) * 3.5
ggplot(data = tibble(x = c(lb, ub)), aes(x)) +
  stat_function(
    fun = dnorm, args = list(mean = ex1, sd = se1),
    n = 5e3, geom = "area", color = NA, fill = "orange", alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex2, sd = se2),
    n = 5e3, geom = "area", color = NA, fill = red_pink, alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex3, sd = se3),
    n = 5e3, geom = "area", color = NA, fill = "darkslategrey", alpha = 0.7
  ) +
  geom_vline(xintercept = mu, linetype = 5, size = 0.6, alpha = 0.5) +
  geom_hline(yintercept = 0) +
  theme_empty
```

---
$n = `r (n <- 5)`$

```{R, ex5, echo = F, fig.height = 5.75}
mu <- 10; v <- 1
ex1 <- mu
ex2 <- mu
ex3 <- n/(n+1) * mu
se1 <- sqrt(v)
se2 <- sqrt(v/n)
se3 <- sqrt(v/(n+1))
ggplot(data = tibble(x = c(lb, ub)), aes(x)) +
  stat_function(
    fun = dnorm, args = list(mean = ex1, sd = se1),
    n = 5e3, geom = "area", color = NA, fill = "orange", alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex2, sd = se2),
    n = 5e3, geom = "area", color = NA, fill = red_pink, alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex3, sd = se3),
    n = 5e3, geom = "area", color = NA, fill = "darkslategrey", alpha = 0.7
  ) +
  geom_vline(xintercept = mu, linetype = 5, size = 0.6, alpha = 0.5) +
  geom_hline(yintercept = 0) +
  theme_empty
```

---
$n = `r (n <- 10)`$

```{R, ex10, echo = F, fig.height = 5.75}
mu <- 10; v <- 1
ex1 <- mu
ex2 <- mu
ex3 <- n/(n+1) * mu
se1 <- sqrt(v)
se2 <- sqrt(v/n)
se3 <- sqrt(v/(n+1))
ggplot(data = tibble(x = seq(lb, ub, 0.001)), aes(x)) +
  stat_function(
    fun = dnorm, args = list(mean = ex1, sd = se1),
    n = 5e3, geom = "area", color = NA, fill = "orange", alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex2, sd = se2),
    n = 5e3, geom = "area", color = NA, fill = red_pink, alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex3, sd = se3),
    n = 5e3, geom = "area", color = NA, fill = "darkslategrey", alpha = 0.7
  ) +
  geom_vline(xintercept = mu, linetype = 5, size = 0.6, alpha = 0.5) +
  geom_hline(yintercept = 0) +
  theme_empty
```

---
$n = `r (n <- 30)`$

```{R, ex30, echo = F, fig.height = 5.75}
mu <- 10; v <- 1
ex1 <- mu
ex2 <- mu
ex3 <- n/(n+1) * mu
se1 <- sqrt(v)
se2 <- sqrt(v/n)
se3 <- sqrt(v/(n+1))
ggplot(data = tibble(x = c(lb, ub)), aes(x)) +
  stat_function(
    fun = dnorm, args = list(mean = ex1, sd = se1),
    n = 5e3, geom = "area", color = NA, fill = "orange", alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex2, sd = se2),
    n = 5e3, geom = "area", color = NA, fill = red_pink, alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex3, sd = se3),
    n = 5e3, geom = "area", color = NA, fill = "darkslategrey", alpha = 0.7
  ) +
  geom_vline(xintercept = mu, linetype = 5, size = 0.6, alpha = 0.5) +
  geom_hline(yintercept = 0) +
  theme_empty
```

---
$n = `r (n <- 50)`$

```{R, ex50, echo = F, fig.height = 5.75}
mu <- 10; v <- 1
ex1 <- mu
ex2 <- mu
ex3 <- n/(n+1) * mu
se1 <- sqrt(v)
se2 <- sqrt(v/n)
se3 <- sqrt(v/(n+1))
ggplot(data = tibble(x = c(lb, ub)), aes(x)) +
  stat_function(
    fun = dnorm, args = list(mean = ex1, sd = se1),
    n = 5e3, geom = "area", color = NA, fill = "orange", alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex2, sd = se2),
    n = 5e3, geom = "area", color = NA, fill = red_pink, alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex3, sd = se3),
    n = 5e3, geom = "area", color = NA, fill = "darkslategrey", alpha = 0.7
  ) +
  geom_vline(xintercept = mu, linetype = 5, size = 0.6, alpha = 0.5) +
  geom_hline(yintercept = 0) +
  theme_empty
```

---
$n = `r (n <- 100)`$

```{R, ex100, echo = F, fig.height = 5.75}
mu <- 10; v <- 1
ex1 <- mu
ex2 <- mu
ex3 <- n/(n+1) * mu
se1 <- sqrt(v)
se2 <- sqrt(v/n)
se3 <- sqrt(v/(n+1))
ggplot(data = tibble(x = c(lb, ub)), aes(x)) +
  stat_function(
    fun = dnorm, args = list(mean = ex1, sd = se1),
    n = 5e3, geom = "area", color = NA, fill = "orange", alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex2, sd = se2),
    n = 5e3, geom = "area", color = NA, fill = red_pink, alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex3, sd = se3),
    n = 5e3, geom = "area", color = NA, fill = "darkslategrey", alpha = 0.7
  ) +
  geom_vline(xintercept = mu, linetype = 5, size = 0.6, alpha = 0.5) +
  geom_hline(yintercept = 0) +
  theme_empty
```

---
$n = `r (n <- 500)`$

```{R, ex500, echo = F, fig.height = 5.75}
mu <- 10; v <- 1
ex1 <- mu
ex2 <- mu
ex3 <- n/(n+1) * mu
se1 <- sqrt(v)
se2 <- sqrt(v/n)
se3 <- sqrt(v/(n+1))
ggplot(data = tibble(x = c(lb, ub)), aes(x)) +
  stat_function(
    fun = dnorm, args = list(mean = ex1, sd = se1),
    n = 5e3, geom = "area", color = NA, fill = "orange", alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex2, sd = se2),
    n = 5e3, geom = "area", color = NA, fill = red_pink, alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex3, sd = se3),
    n = 5e3, geom = "area", color = NA, fill = "darkslategrey", alpha = 0.7
  ) +
  geom_vline(xintercept = mu, linetype = 5, size = 0.6, alpha = 0.5) +
  geom_hline(yintercept = 0) +
  theme_empty
```

---
$n = `r (n <- 1000)`$

```{R, ex1000, echo = F, fig.height = 5.75}
mu <- 10; v <- 1
ex1 <- mu
ex2 <- mu
ex3 <- n/(n+1) * mu
se1 <- sqrt(v)
se2 <- sqrt(v/n)
se3 <- sqrt(v/(n+1))
ggplot(data = tibble(x = c(lb, ub)), aes(x)) +
  stat_function(
    fun = dnorm, args = list(mean = ex1, sd = se1),
    n = 5e3, geom = "area", color = NA, fill = "orange", alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex2, sd = se2),
    n = 5e3, geom = "area", color = NA, fill = red_pink, alpha = 0.7
  ) +
  stat_function(
    fun = dnorm, args = list(mean = ex3, sd = se3),
    n = 5e3, geom = "area", color = NA, fill = "darkslategrey", alpha = 0.7
  ) +
  geom_vline(xintercept = mu, linetype = 5, size = 0.6, alpha = 0.5) +
  geom_hline(yintercept = 0) +
  theme_empty
```

---
layout: true
# Consistency

---

The distributions of $\color{#314f4f}{\widetilde{X}}$
For $n$ in $\{\color{#FCCE25}{2},\, \color{#F89441}{5},\, \color{#E16462}{10},\, \color{#BF3984}{50},\, \color{#900DA4}{100},\, \color{#5601A4}{500},\, \color{#0D0887}{1000}\}$

```{R, ex biased consistent, echo = F, fig.height = 5.75}
mu <- 10; v <- 1
ggplot(data = tibble(x = c(6.67-2, 10)), aes(x)) +
  stat_function(
    fun = dnorm, args = list(mean = mu * 2/(2+1), sd = sqrt(v/(2+1))),
    n = 5e3, geom = "area", fill = plasma(7, end = 0.9)[7], color = NA, alpha = 0.7, size = 0.3
  ) +
  stat_function(
    fun = dnorm, args = list(mean = mu * 5/(5+1), sd = sqrt(v/(5+1))),
    n = 5e3, geom = "area", fill = plasma(7, end = 0.9)[6], color = NA, alpha = 0.7, size = 0.3
  ) +
  stat_function(
    fun = dnorm, args = list(mean = mu * 10/(10+1), sd = sqrt(v/(10+1))),
    n = 5e3, geom = "area", fill = plasma(7, end = 0.9)[5], color = NA, alpha = 0.7, size = 0.3
  ) +
  stat_function(
    fun = dnorm, args = list(mean = mu * 50/(50+1), sd = sqrt(v/(50+1))),
    n = 5e3, geom = "area", fill = plasma(7, end = 0.9)[4], color = NA, alpha = 0.7, size = 0.3
  ) +
  stat_function(
    fun = dnorm, args = list(mean = mu * 100/(100+1), sd = sqrt(v/(100+1))),
    n = 5e3, geom = "area", fill = plasma(7, end = 0.9)[3], color = NA, alpha = 0.7, size = 0.3
  ) +
  stat_function(
    fun = dnorm, args = list(mean = mu * 500/(500+1), sd = sqrt(v/(500+1))),
    n = 5e3, geom = "area", fill = plasma(7, end = 0.9)[2], color = NA, alpha = 0.7, size = 0.3
  ) +
  stat_function(
    fun = dnorm, args = list(mean = mu * 1000/(1000+1), sd = sqrt(v/(1000+1))),
    n = 5e3, geom = "area", fill = plasma(7, end = 0.9)[1], color = NA, alpha = 0.7, size = 0.3
  ) +
  geom_vline(xintercept = mu, linetype = 5, size = 0.6, alpha = 0.5) +
  geom_hline(yintercept = 0) +
  theme_empty
```

---

## The takeaway?

--

- An estimator can be unbiased without being consistent (e.g., $\color{#FFA500}{X_1}$).

--

- An estimator can be unbiased and consistent (e.g., $\color{#e64173}{\overline{X}}$).

--

- An estimator can be biased but consistent (e.g., $\color{#314f4f}{\widetilde{X}}$).

--

- An estimator can be biased and inconsistent (e.g., $\overline{X} - 50$).

--

**Best-case scenario:** The estimator is unbiased and consistent.

---

## Why consistency (asymptotics)?

1. We cannot always find an unbiased estimator. In these situations, we generally (at least) want consistency.
2. Expected values can be hard to compute or undefined. Probability limits are less constrained, _e.g._,
$$\mathop{\boldsymbol{E}}\left[ g(X)h(Y) \right] \text{ vs. } \mathop{\text{plim}}\left( g(X)h(Y) \right)$$
3. Asymptotics help us move away from assuming the distribution of $u_i$.

--

.hi[Caution:] As we saw, consistent estimators can be biased in small samples.
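---

## Checking the takeaway by simulation

*A minimal sketch* (the $\mu_x = 10$, $\sigma = 1$, $n = 1000$, and 5,000 replications are arbitrary choices): across repeated samples, $\color{#e64173}{\overline{X}}$ and $\color{#314f4f}{\widetilde{X}}$ both concentrate near $\mu_x$, while $\color{#FFA500}{X_1}$ stays as noisy as ever.

```{R, ex takeaway sim, echo = T}
set.seed(421)
n <- 1e3
# 5,000 samples; compute all three estimators in each
sims <- replicate(5e3, {
  x <- rnorm(n, mean = 10, sd = 1)
  c(x1 = x[1], xbar = mean(x), xtilde = sum(x) / (n + 1))
})
# Mean and standard deviation of each estimator across the samples
apply(sims, 1, function(est) c(mean = mean(est), sd = sd(est)))
```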
---
layout: true
# OLS in asymptopia

---
class: inverse, middle

---

OLS has two very nice asymptotic properties:

1. Consistency
2. Asymptotic normality

--

Let's prove \#1 for OLS with simple, linear regression, _i.e._,

$$y_i = \beta_0 + \beta_1 x_i + u_i$$

---
layout: true
# OLS in asymptopia
## Proof of consistency

---

First, recall our previous derivation of $\hat{\beta}_1$,

$$\hat{\beta}_1 = \beta_1 + \dfrac{\sum_i \left( x_i - \overline{x} \right) u_i}{\sum_i \left( x_i - \overline{x} \right)^2}$$

--

Now multiply both the numerator and the denominator by $\frac{1}{n}$

$$\hat{\beta}_1 = \beta_1 + \dfrac{\frac{1}{n} \sum_i \left( x_i - \overline{x} \right) u_i}{\frac{1}{n}\sum_i \left( x_i - \overline{x} \right)^2}$$

---

We actually want to know the probability limit of $\hat{\beta}_1$, so

--

$$\mathop{\text{plim}} \hat{\beta}_1 = \mathop{\text{plim}}\left(\beta_1 + \dfrac{\frac{1}{n} \sum_i \left( x_i - \overline{x} \right) u_i}{\frac{1}{n}\sum_i \left( x_i - \overline{x} \right)^2} \right)$$

--

which, by the properties of probability limits, gives us

--

$$= \beta_1 + \dfrac{\mathop{\text{plim}}\left(\frac{1}{n} \sum_i \left( x_i - \overline{x} \right) u_i \right)}{\mathop{\text{plim}}\left(\frac{1}{n}\sum_i \left( x_i - \overline{x} \right)^2 \right)}$$

--

By the law of large numbers, the sample covariance in the numerator and the sample variance in the denominator converge in probability to their population counterparts

--

$$= \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x,\, u \right)}{\mathop{\text{Var}} \left( x \right)}$$

---

So we have

$$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x,\, u \right)}{\mathop{\text{Var}} \left( x \right)}$$

--

By our assumption of exogeneity (plus the law of total expectation)

--

$$\mathop{\text{Cov}} \left( x,\,u \right) = 0$$

--

Combining these two equations yields

--

$$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{0}{\mathop{\text{Var}} \left( x \right)} = \beta_1 \quad\text{🤓}$$

so long as $\mathop{\text{Var}} \left( x \right) \neq 0$ (which we've assumed).

---
layout: true
# OLS in asymptopia
## Asymptotic normality

---

Up to this point, we made a very specific assumption about the distribution of $u_i$: the $u_i$ came from a normal distribution.

We can relax this assumption, allowing the $u_i$ to come from any distribution (we still assume exogeneity, independence, and homoskedasticity).

We will focus on the .hi[asymptotic distribution] of our estimators (how they are distributed as $n$ gets large), rather than their finite-sample distribution.

--

As $n$ approaches $\infty$, the distribution of the OLS estimator converges to a normal distribution.

---
layout: false
# OLS in asymptopia
## Recap

With a more limited set of assumptions, OLS is .hi[consistent] and .hi[asymptotically normally distributed].

### Current assumptions

1. Our data were **randomly sampled** from the population.
1. $y_i$ is a **linear function** of its parameters and disturbance.
1. There is **no perfect collinearity** in our data.
1. The $u_i$ have conditional mean of zero (**exogeneity**), $\mathop{\boldsymbol{E}}\left[ u_i \middle| X_i \right] = 0$.
1. The $u_i$ are **homoskedastic** with **zero correlation** between $u_i$ and $u_j$.
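---
# OLS in asymptopia
## A simulation check

*A minimal sketch* (the model $y_i = 1 + 2 x_i + u_i$ and the sample sizes are arbitrary choices): even with decidedly non-normal disturbances, $\hat{\beta}_1$ closes in on $\beta_1 = 2$ as $n$ grows.

```{R, ex ols consistent, echo = T}
set.seed(421)
# For each n: simulate non-normal (uniform) disturbances and estimate beta_1
sapply(c(10, 100, 1000, 10000), function(n) {
  x <- rnorm(n)
  u <- runif(n, min = -2, max = 2)  # mean-zero but decidedly non-normal
  y <- 1 + 2 * x + u
  coef(lm(y ~ x))[2]
})
```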
---
layout: false
class: inverse, middle

# Omitted-variable bias, redux

---
layout: true
# Omitted-variable bias, redux
## Inconsistency?

---

Imagine we have a population whose true model is

\begin{align}
  y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i \tag{2}
\end{align}

--

*Recall.sub[1]:* .hi[Omitted-variable bias] occurs when we omit a variable from our linear regression model (_e.g._, leaving out $x_2$) such that

--

1. $x_{2}$ affects $y$, _i.e._, $\beta_2 \neq 0$.

--

2. $x_{2}$ correlates with an included explanatory variable, _i.e._, $\mathop{\text{Cov}} \left( x_1,\, x_2 \right) \neq 0$.

---

*Recall.sub[2]:* We defined the .hi[bias] of an estimator $W$ for parameter $\theta$ as

--

$$\mathop{\text{Bias}}\_\theta \left( W \right) = \mathop{\boldsymbol{E}}\left[ W \right] - \theta$$

---

We know that omitted-variable bias causes .pink[biased estimates].

*Question:* Do *omitted variables* also cause .pink[inconsistent estimates]?

--

*Answer:* Find $\mathop{\text{plim}} \hat{\beta}_1$ in a regression that omits $x_2$.

---

The true model is $(2)$, but we instead specify the model as

\begin{align}
  y_i = \beta_0 + \beta_1 x_{1i} + w_i \tag{3}
\end{align}

where $w_i = \beta_2 x_{2i} + u_i$.

--

We estimate $(3)$ via OLS

\begin{align}
  y_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{w}_i \tag{4}
\end{align}

--

*Our question:* Is $\hat{\beta}_1$ consistent for $\beta_1$ when we omit $x_2$?

--

$$\mathop{\text{plim}}\left( \hat{\beta}_1 \right) \overset{?}{=} \beta_1$$

---

.pull-left[
.hi[Truth:] $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i$
]
.pull-right[
.hi-purple[Specified:] $y_i = \beta_0 + \beta_1 x_{1i} + w_i$
]

We already showed $\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x_1,\, w \right)}{\mathop{\text{Var}} \left( x_1 \right)}$, where $w$ is the disturbance.

--

Here, we know $w = \beta_2 x_2 + u$. Thus,

--

$$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x_1,\, \beta_2 x_2 + u \right)}{\mathop{\text{Var}} \left( x_1 \right)}$$

--

Now, we make use of the fact that $\mathop{\text{Cov}} \left( X,\, Y + Z \right) = \mathop{\text{Cov}} \left( X,\, Y \right) + \mathop{\text{Cov}} \left( X,\, Z \right)$

--

$$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x_1,\, \beta_2 x_2 \right) + \mathop{\text{Cov}} \left( x_1,\, u \right)}{\mathop{\text{Var}} \left( x_1 \right)}$$

---

$$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x_1,\, \beta_2 x_2 \right) + \mathop{\text{Cov}} \left( x_1,\, u \right)}{\mathop{\text{Var}} \left( x_1 \right)}$$

Now we use the fact that $\mathop{\text{Cov}} \left( X,\, cY \right) = c\mathop{\text{Cov}} \left( X,\,Y \right)$ for a constant $c$.

--

$$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\beta_2 \mathop{\text{Cov}} \left( x_1,\, x_2 \right) + \mathop{\text{Cov}} \left( x_1,\, u \right)}{\mathop{\text{Var}} \left( x_1 \right)}$$

--

As before, our exogeneity (conditional mean zero) assumption implies $\mathop{\text{Cov}} \left( x_1,\, u \right) = 0$, which gives us

--

$$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\beta_2 \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)}$$

---

Thus, we find that

$$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)}$$

In other words, an .pink[omitted variable will cause OLS to be inconsistent if **both** of the following statements are true]:

1. The omitted variable .hi[affects our outcome], _i.e._, $\beta_2 \neq 0$.
2. The omitted variable correlates with included explanatory variables, _i.e._, $\mathop{\text{Cov}} \left( x_1,\,x_2 \right) \neq 0$.

If both of these statements are true, then the OLS estimate $\hat{\beta}_1$ will not converge to $\beta_1$, even as $n$ approaches $\infty$.

---
layout: true
# Omitted-variable bias, redux
## Signing the bias

---

Sometimes we're stuck with omitted-variable bias..pink[†]

$$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)}$$

.footnote[.pink[†] You will often hear the term "omitted-variable bias" when we're actually talking about inconsistency (rather than bias).]

When this happens, we can often at least know the direction of the inconsistency.

---

Begin with

$$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)}$$

We know $\color{#20B2AA}{\mathop{\text{Var}} \left( x_1 \right) > 0}$. Suppose $\color{#e64173}{\beta_2 > 0}$ and $\color{#FFA500}{\mathop{\text{Cov}} \left( x_1,\,x_2 \right) > 0}$. Then

--

\begin{align}
  \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \color{#e64173}{(+)} \dfrac{\color{#FFA500}{(+)}}{\color{#20B2AA}{(+)}} \implies \mathop{\text{plim}} \hat{\beta}_1 > \beta_1
\end{align}

∴ In this case, OLS is **biased upward** (estimates are too large).

--

$$\begin{matrix} \enspace & \color{#FFA500}{\text{Cov}(x_1,\,x_2)> 0} & \color{#FFA500}{\text{Cov}(x_1,\,x_2)< 0} \\ \color{#e64173}{\beta_2 > 0} & \text{Upward} & \\ \color{#e64173}{\beta_2 < 0} & & \end{matrix}$$

---

Begin with

$$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)}$$

We know $\color{#20B2AA}{\mathop{\text{Var}} \left( x_1 \right) > 0}$. Suppose $\color{#e64173}{\beta_2 < 0}$ and $\color{#FFA500}{\mathop{\text{Cov}} \left( x_1,\,x_2 \right) > 0}$. Then

--

\begin{align}
  \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \color{#e64173}{(-)} \dfrac{\color{#FFA500}{(+)}}{\color{#20B2AA}{(+)}} \implies \mathop{\text{plim}} \hat{\beta}_1 < \beta_1
\end{align}

∴ In this case, OLS is **biased downward** (estimates are too small).

$$\begin{matrix} \enspace & \color{#FFA500}{\text{Cov}(x_1,\,x_2)> 0} & \color{#FFA500}{\text{Cov}(x_1,\,x_2)< 0} \\ \color{#e64173}{\beta_2 > 0} & \text{Upward} & \\ \color{#e64173}{\beta_2 < 0} & \text{Downward} & \end{matrix}$$

---

Begin with

$$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)}$$

We know $\color{#20B2AA}{\mathop{\text{Var}} \left( x_1 \right) > 0}$. Suppose $\color{#e64173}{\beta_2 > 0}$ and $\color{#FFA500}{\mathop{\text{Cov}} \left( x_1,\,x_2 \right) < 0}$. Then

\begin{align}
  \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \color{#e64173}{(+)} \dfrac{\color{#FFA500}{(-)}}{\color{#20B2AA}{(+)}} \implies \mathop{\text{plim}} \hat{\beta}_1 < \beta_1
\end{align}

∴ In this case, OLS is **biased downward** (estimates are too small).
$$\begin{matrix} \enspace & \color{#FFA500}{\text{Cov}(x_1,\,x_2)> 0} & \color{#FFA500}{\text{Cov}(x_1,\,x_2)< 0} \\ \color{#e64173}{\beta_2 > 0} & \text{Upward} & \text{Downward} \\ \color{#e64173}{\beta_2 < 0} & \text{Downward} & \end{matrix}$$

---

Begin with

$$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)}$$

We know $\color{#20B2AA}{\mathop{\text{Var}} \left( x_1 \right) > 0}$. Suppose $\color{#e64173}{\beta_2 < 0}$ and $\color{#FFA500}{\mathop{\text{Cov}} \left( x_1,\,x_2 \right) < 0}$. Then

\begin{align}
  \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \color{#e64173}{(-)} \dfrac{\color{#FFA500}{(-)}}{\color{#20B2AA}{(+)}} \implies \mathop{\text{plim}} \hat{\beta}_1 > \beta_1
\end{align}

∴ In this case, OLS is **biased upward** (estimates are too large).

$$\begin{matrix} \enspace & \color{#FFA500}{\text{Cov}(x_1,\,x_2)> 0} & \color{#FFA500}{\text{Cov}(x_1,\,x_2)< 0} \\ \color{#e64173}{\beta_2 > 0} & \text{Upward} & \text{Downward} \\ \color{#e64173}{\beta_2 < 0} & \text{Downward} & \text{Upward} \end{matrix}$$

---

Thus, in cases where we have a sense of

1. the sign of $\mathop{\text{Cov}} \left( x_1,\,x_2 \right)$
2. the sign of $\beta_2$

we know in which direction inconsistency pushes our estimates.

.center[ **Direction of bias** ]

$$\begin{matrix} \enspace & \color{#FFA500}{\text{Cov}(x_1,\,x_2)> 0} & \color{#FFA500}{\text{Cov}(x_1,\,x_2)< 0} \\ \color{#e64173}{\beta_2 > 0} & \text{Upward} & \text{Downward} \\ \color{#e64173}{\beta_2 < 0} & \text{Downward} & \text{Upward} \end{matrix}$$
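---
layout: false
# Omitted-variable bias, redux
## A simulation check

*A minimal sketch* (all parameter values are arbitrary choices): with $\beta_1 = 2$, $\beta_2 = 3$, $\mathop{\text{Cov}} \left( x_1,\,x_2 \right) = 0.5$, and $\mathop{\text{Var}} \left( x_1 \right) = 1$, omitting $x_2$ should push $\hat{\beta}_1$ toward $2 + 3 \times 0.5 / 1 = 3.5$, an upward bias, just as the table predicts.

```{R, ex ovb sim, echo = T}
set.seed(421)
n <- 1e5
x1 <- rnorm(n)                       # Var(x1) = 1
x2 <- 0.5 * x1 + rnorm(n)            # Cov(x1, x2) = 0.5
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n) # true beta_1 = 2, beta_2 = 3
# Omitting x2: beta_1-hat settles near 2 + 3 * 0.5 / 1 = 3.5, not 2
coef(lm(y ~ x1))[2]
```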
---
layout: true
# Measurement error

---
name: measurement_error

.hi[Measurement error] in our explanatory variables presents another case in which OLS is inconsistent.

Consider the population model: $y_i = \beta_0 + \beta_1 z_i + u_i$

- We want to observe $z_i$ but cannot.
- Instead, we *measure* the variable $x_i$, which is $z_i$ plus some error (noise): $$x_i = z_i + \omega_i$$
- Assume $\mathop{\boldsymbol{E}}\left[ \omega_i \right] = 0$, $\mathop{\text{Var}} \left( \omega_i \right) = \sigma^2_\omega$, and that $\omega$ is independent of $z$ and $u$.

--

OLS regression of $y$ on $x$ will produce inconsistent estimates for $\beta_1$.

---
layout: true
# Measurement error
## Proof

---

$y_i = \beta_0 + \beta_1 z_i + u_i$

--
$\quad= \beta_0 + \beta_1 \left( x_i - \omega_i \right) + u_i$

--

$\quad= \beta_0 + \beta_1 x_i + \left( u_i - \beta_1 \omega_i \right)$

--

$\quad= \beta_0 + \beta_1 x_i + \varepsilon_i$, where $\varepsilon_i = u_i - \beta_1 \omega_i$

--

What happens when we estimate $y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i$?

$\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x,\,\varepsilon \right)}{\mathop{\text{Var}} \left( x \right)}$

We will derive the numerator and denominator separately...

---

The covariance of our noisy variable $x$ and the disturbance $\varepsilon$:

$\mathop{\text{Cov}} \left( x,\, \varepsilon \right)$
--
$= \mathop{\text{Cov}} \left( \left[ z + \omega \right],\, \left[ u - \beta_1 \omega \right] \right)$

--
$\quad\quad\quad\quad\enspace= \mathop{\text{Cov}} \left( z,\,u \right) - \beta_1 \mathop{\text{Cov}} \left( z,\,\omega \right) + \mathop{\text{Cov}} \left( \omega,\, u \right) - \beta_1 \mathop{\text{Var}} \left( \omega \right)$

--

$\quad\quad\quad\quad\enspace= 0 - 0 + 0 - \beta_1 \sigma_\omega^2$

--

$\quad\quad\quad\quad\enspace= - \beta_1 \sigma_\omega^2$

---

Now for the denominator, $\mathop{\text{Var}} \left( x \right)$.

$\mathop{\text{Var}} \left( x \right)$
--
$= \mathop{\text{Var}} \left( z + \omega \right)$

--
$\quad\quad\quad= \mathop{\text{Var}} \left( z \right) + \mathop{\text{Var}} \left( \omega \right) + 2\mathop{\text{Cov}} \left( z,\,\omega \right)$

--

$\quad\quad\quad= \sigma_z^2 + \sigma_\omega^2$

---

Putting the numerator and denominator back together,

\begin{align}
  \mathop{\text{plim}} \hat{\beta}_1 &= \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x,\,\varepsilon \right)}{\mathop{\text{Var}} \left( x \right)} \\
  &= \beta_1 + \dfrac{-\beta_1 \sigma_\omega^2}{\sigma_z^2 + \sigma_\omega^2} \\
  &= \beta_1 - \beta_1 \dfrac{\sigma_\omega^2}{\sigma_z^2 + \sigma_\omega^2} \\
  &= \beta_1 \dfrac{\sigma_z^2 + \sigma_\omega^2}{\sigma_z^2 + \sigma_\omega^2} - \beta_1 \dfrac{\sigma_\omega^2}{\sigma_z^2 + \sigma_\omega^2} \\
  &= \beta_1 \dfrac{\sigma_z^2}{\sigma_z^2 + \sigma_\omega^2}
\end{align}

---
layout: true
# Measurement error
## Summary

---

∴ $\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 \dfrac{\sigma_z^2}{\sigma_z^2 + \sigma_\omega^2}$. What does this equation tell us?

--

.hi[Measurement error in our explanatory variables] biases the coefficient estimates toward zero.

- This type of bias/inconsistency is often called .hi[attenuation bias].
- If .hi[the measurement error correlates with the explanatory variables], we have bigger problems with inconsistency/bias.

---

What about **measurement error in the outcome variable**? It matters much less: it simply increases our standard errors (the extra noise is absorbed into the disturbance, leaving the coefficient estimates unbiased).
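---
layout: false
# Measurement error
## A simulation check

*A minimal sketch* (the parameter values are arbitrary choices): with $\sigma_z^2 = \sigma_\omega^2 = 1$ and $\beta_1 = 2$, the attenuation factor is $\sigma_z^2 / \left( \sigma_z^2 + \sigma_\omega^2 \right) = 0.5$, so regressing $y$ on the noisy $x$ should push $\hat{\beta}_1$ toward $0.5 \times 2 = 1$.

```{R, ex attenuation, echo = T}
set.seed(421)
n <- 1e5
z <- rnorm(n)              # true regressor, variance 1
x <- z + rnorm(n)          # mismeasured regressor: z plus noise with variance 1
y <- 1 + 2 * z + rnorm(n)  # true beta_1 = 2
# Attenuation: beta_1-hat settles near 2 * 1 / (1 + 1) = 1, not 2
coef(lm(y ~ x))[2]
```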
---
layout: false
# Measurement error
## It's everywhere

**General cases**

1. We cannot perfectly observe a variable.
1. We use one variable as a *proxy* for another.

**Specific examples**

- GDP
- Population
- Crime/police statistics
- Air quality
- Health data
- Proxy *ability* with test scores

---
exclude: true

```{R, generate pdfs, include = F}
system("decktape remark 06_consistency.html 06_consistency.pdf --chrome-arg=--allow-file-access-from-files")
```