This is my attempt to recreate the model from [chredlin_model_truth.ipynb](chredlin_model_truth.ipynb).

```python
import numpy as np
import pandas as pd

np.set_printoptions(suppress=True)
```

# Ordinary Least Squares

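Suppose our model is that $y \sim \mathcal{N}\left(X\beta, \sigma^2 I_n\right)$, where $X$ is the $n \times p$ design matrix, assumed to have full column rank, and $\beta \in \mathbb{R}^p$.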

## Parameter Estimates

We want to find $\beta$ minimizing $L\left(\beta\right) = \left\lVert X\beta - y \right\rVert_2^2$. That is, we want $X\hat{\beta}$ to be the orthogonal projection of $y$ onto the subspace spanned by the columns of $X$. The residuals of this projection are orthogonal to every column of $X$, so $X^\intercal\left(y - X\hat{\beta}\right) = 0$. Solving these normal equations for $\hat{\beta}$, we have that

\begin{equation}
\hat{\beta} = \left(X^\intercal X\right)^{-1} X^\intercal y.
\end{equation}
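As a quick numerical check of the normal equations, here is a minimal NumPy sketch on simulated data (the sizes, coefficients, and seed are arbitrary assumptions, unrelated to the chredlin data). It solves $\left(X^\intercal X\right)\hat{\beta} = X^\intercal y$ with `np.linalg.solve` rather than forming the inverse, and confirms that the residuals are orthogonal to the columns of $X$.

```python
import numpy as np


def ols_estimate(X, y):
    """Solve the normal equations (X^T X) beta = X^T y for beta."""
    # Solving the linear system avoids forming (X^T X)^{-1} explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Simulated data (arbitrary): n = 50 rows, an intercept and two covariates.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + 0.3 * rng.normal(size=n)

beta_hat = ols_estimate(X, y)
residuals = y - X @ beta_hat

# Agrees with NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
# The residuals are orthogonal to every column of X.
print(np.allclose(X.T @ residuals, 0))    # True
```

Solving the system is generally preferred over `np.linalg.inv` for numerical stability; `np.linalg.lstsq` (SVD based) is more robust still when $X$ is ill-conditioned.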

## Residual Standard Error

Let us derive an unbiased estimator of the residual variance $\sigma^2$; the residual standard error is its square root. Consider the residual random vector

\begin{equation}
R = y - X\hat{\beta}
\end{equation}

As stated earlier, the residuals are orthogonal to the subspace spanned by the columns of $X$, so they lie in its orthogonal complement, which has dimension $n - p$, where $p = \dim(\beta)$. Thus, the residuals are $y$ projected onto this complement.

Let $w_1,\ldots,w_{n-p}$ be an orthonormal basis of this complement, and let $W$ be the $n \times (n-p)$ matrix with these basis vectors as its columns, so that $W^\intercal W = I_{n-p}$.

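Since $WW^\intercal$ is the orthogonal projection onto this complement, writing $y = X\beta + \sigma\epsilon$ with $\epsilon \sim \mathcal{N}\left(0, I_n\right)$, and using $W^\intercal X = 0$ in the last step, we have that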

\begin{align}
R &= y - X\hat{\beta} \\
&= W\left(W^\intercal y\right) \\
&= W\left(W^\intercal\left(X\beta + \sigma\epsilon\right)\right) \\
&= W\left(W^\intercal X\right)\beta + \sigma W\left(W^\intercal\epsilon\right) \\
&= \sigma W\left(W^\intercal\epsilon\right).
\end{align}
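One concrete way to obtain $W$, sketched below under the assumption that $X$ has full column rank, is a complete QR decomposition: the last $n - p$ columns of $Q$ form an orthonormal basis of the orthogonal complement of the column space. The simulated data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

# Complete QR: the first p columns of Q span col(X); the remaining n - p
# columns are an orthonormal basis W of its orthogonal complement.
Q, _ = np.linalg.qr(X, mode='complete')
W = Q[:, p:]

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
R = y - X @ beta_hat

print(W.shape)                        # (30, 27)
print(np.allclose(W.T @ X, 0))        # True: W^T X = 0
print(np.allclose(R, W @ (W.T @ y)))  # True: R = W (W^T y)
```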

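Now, $W^\intercal\epsilon \sim \mathcal{N}\left(0, I_{n-p}\right)$. To see this, let $w_{ij}$ denote the $j$th entry of $w_i$. The $i$th entry of $W^\intercal\epsilon$ is $\sum_{j=1}^n w_{ij}\epsilon_j \sim \mathcal{N}\left(0, \left\lVert w_i \right\rVert_2^2\right) = \mathcal{N}\left(0, 1\right)$, and for $i \neq i^\prime$,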

\begin{align}
\operatorname{Cov}\left(\left(W^\intercal\epsilon\right)_i, \left(W^\intercal\epsilon\right)_{i^\prime}\right) &=
\mathbb{E}\left[
\left(\sum_{j=1}^n w_{ij}\epsilon_j\right)\left(\sum_{k=1}^n w_{i^\prime k}\epsilon_k\right)
\right] \\
&= \sum_{j=1}^n w_{ij}w_{i^\prime j} \mathbb{E}\left[\epsilon_j^2\right] +
\sum_{j=1}^n\sum_{k \neq j} w_{ij}w_{i^\prime k} \mathbb{E}\left[\epsilon_j\epsilon_k\right] \\
&= w_i^\intercal w_{i^\prime} + \sum_{j=1}^n\sum_{k \neq j} w_{ij}w_{i^\prime k} \mathbb{E}\left[\epsilon_j\right]\mathbb{E}\left[\epsilon_k\right] \\
&= 0,
\end{align}
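where the first term vanishes because $w_i$ and $w_{i^\prime}$ are orthogonal, and the second vanishes because the errors are independent with mean zero. Since $W^\intercal\epsilon$ is a linear transformation of a Gaussian vector, it is jointly Gaussian, so its uncorrelated entries are independent.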
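In matrix form the same fact is $\operatorname{Cov}\left(W^\intercal\epsilon\right) = W^\intercal I_n W = I_{n-p}$. The sketch below verifies that identity for a $W$ built by complete QR and compares it with a Monte Carlo estimate; the design and simulation sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Q, _ = np.linalg.qr(X, mode='complete')
W = Q[:, p:]  # orthonormal basis of the complement of col(X)

# Exact: Cov(W^T eps) = W^T Cov(eps) W = W^T W = I_{n-p}.
print(np.allclose(W.T @ W, np.eye(n - p)))  # True

# Monte Carlo: draw many eps ~ N(0, I_n) and estimate Cov(W^T eps).
eps = rng.normal(size=(100_000, n))
z = eps @ W  # row s is (W^T eps_s)^T
emp_cov = np.cov(z, rowvar=False)
# Largest deviation from the identity; small, on the order of 0.01.
print(np.abs(emp_cov - np.eye(n - p)).max())
```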

Thus, we have that

\begin{equation}
R^\intercal R = \sigma^2 \left(W^\intercal\epsilon\right)^\intercal W^\intercal W \left(W^\intercal\epsilon\right) =\sigma^2 \left(W^\intercal\epsilon\right)^\intercal\left(W^\intercal\epsilon\right)
\sim \sigma^2 \chi^2_{n-p}.
\end{equation}

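Finally, since $\mathbb{E}\left[\chi^2_{n-p}\right] = n - p$, writing $x_i^\intercal$ for the $i$th row of $X$, we have that

\begin{equation}
\mathbb{E}\left[R^\intercal R\right] = \sigma^2\left(n - p\right)
\Rightarrow
\mathbb{E}\left[\frac{\sum_{i=1}^n \left(y_i - x_i^\intercal\hat{\beta}\right)^2}{n-p}\right] = \sigma^2.
\end{equation}

Our unbiased estimator is

\begin{equation}
\boxed{\hat{\sigma}^2 = \frac{\sum_{i=1}^n \left(y_i - x_i^\intercal\hat{\beta}\right)^2}{n-p}.}
\end{equation}

The residual standard error is $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$, which is not itself unbiased for $\sigma$.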
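A small simulation, assuming arbitrary values for $n$, $p$, $\sigma$, and $\beta$, illustrates the unbiasedness: averaging the residual sum of squares divided by $n - p$ recovers $\sigma^2$, while dividing by $n$ underestimates it by the factor $(n - p)/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 15, 5, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = rng.normal(size=p)

# Hat matrix H = X (X^T X)^{-1} X^T projects onto col(X); it is symmetric.
H = X @ np.linalg.solve(X.T @ X, X.T)

n_sims = 20_000
# Each row of Y is one simulated response vector y = X beta + sigma eps.
Y = X @ beta + sigma * rng.normal(size=(n_sims, n))
residuals = Y - Y @ H  # row-wise y - H y, using H = H^T
rss = (residuals ** 2).sum(axis=1)

print((rss / (n - p)).mean())  # approximately sigma**2 = 4.0
print((rss / n).mean())        # approximately 4.0 * 10 / 15, about 2.67
```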

## Hypothesis Testing

We can rewrite $y$ as $y = X\beta + \sigma \epsilon$, where the elements of $\epsilon$ are drawn independently from $\mathcal{N}\left(0, 1\right)$. Substituting, we have that

\begin{align}
\hat{\beta} &= \left(X^\intercal X\right)^{-1}X^\intercal\left(X\beta + \sigma\epsilon\right) \\
&= \beta + \sigma\left(X^\intercal X\right)^{-1}X^\intercal \epsilon.
\end{align}

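Since $\operatorname{Cov}\left(\hat{\beta}\right) = \sigma^2\left(X^\intercal X\right)^{-1}X^\intercal X\left(X^\intercal X\right)^{-1} = \sigma^2\left(X^\intercal X\right)^{-1}$, we have $\hat{\beta}_j \sim \mathcal{N}\left(\beta_j, \sigma^2\left(X^\intercal X\right)^{-1}_{jj}\right)$.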
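The sketch below checks $\mathbb{E}\left[\hat{\beta}\right] = \beta$ and $\operatorname{Cov}\left(\hat{\beta}\right) = \sigma^2\left(X^\intercal X\right)^{-1}$ by simulating many response vectors for one fixed, randomly generated design (all values are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 40, 3, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, -1.0, 0.5])
XtX_inv = np.linalg.inv(X.T @ X)

# Each row of Y is one simulated response vector y = X beta + sigma eps.
Y = X @ beta + sigma * rng.normal(size=(50_000, n))
# beta_hat for every simulated y at once: row s is ((X^T X)^{-1} X^T y_s)^T.
beta_hats = Y @ (XtX_inv @ X.T).T

emp_cov = np.cov(beta_hats, rowvar=False)
print(beta_hats.mean(axis=0))  # close to beta
print(emp_cov)                 # close to the theoretical covariance below
print(sigma ** 2 * XtX_inv)
```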

This gives us that

\begin{equation}
\frac{\hat{\beta}_j - \beta_j}{\sqrt{\sigma^2\left(X^\intercal X\right)^{-1}_{jj}}} \sim
\mathcal{N}\left(0, 1\right).
\end{equation}

From the previous part,

\begin{equation}
(n - p)\frac{\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-p}.
\end{equation}

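$\hat{\beta}$ and $\hat{\sigma}^2$ are independent by Basu's theorem: treating $\sigma^2$ as fixed, $\hat{\beta}$ is a complete sufficient statistic for $\beta$, and $\hat{\sigma}^2$ is ancillary because its distribution, $\sigma^2\chi^2_{n-p}/(n-p)$, does not depend on $\beta$. (More directly, $\hat{\beta}$ depends on $\epsilon$ only through $X^\intercal\epsilon$ and $\hat{\sigma}^2$ only through $W^\intercal\epsilon$; these are jointly normal and uncorrelated because $W^\intercal X = 0$.) Thus, we have that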

\begin{equation}
\left.
\frac{\hat{\beta}_j - \beta_j}{\sqrt{\sigma^2\left(X^\intercal X\right)^{-1}_{jj}}}
\middle/
\sqrt{\frac{(n - p)\frac{\hat{\sigma}^2}{\sigma^2}}{n-p}}
\right. 
= \frac{\hat{\beta}_j - \beta_j}{\sqrt{\hat{\sigma}^2\left(X^\intercal X\right)^{-1}_{jj}}}
\sim t_{n-p}.
\end{equation}

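That is, the statistic has a $t$ distribution with $n - p$ degrees of freedom. Setting $\beta_j = 0$ gives the $t$-statistic for testing whether the $j$th coefficient is zero.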
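Putting the pieces together, the following hypothetical helper `coefficient_table` computes the estimate, standard error, and $t$-statistic for $H_0\colon \beta_j = 0$ for each coefficient. It is a sketch of the formulas above, not the `LinearRegression` class used in this notebook, and it runs on simulated data.

```python
import numpy as np


def coefficient_table(X, y):
    """Hypothetical helper: OLS estimates, standard errors, t-statistics.

    A sketch of the formulas derived above; not the LinearRegression class.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    residuals = y - X @ beta_hat
    sigma2_hat = residuals @ residuals / (n - p)  # unbiased variance estimate
    std_error = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    t_stat = beta_hat / std_error                 # tests H0: beta_j = 0
    return beta_hat, std_error, t_stat, n - p


# Simulated data (arbitrary): intercept plus two covariates.
rng = np.random.default_rng(5)
n = 47
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.0, 1.0, 0.0]) + rng.normal(size=n)

est, se, t, dof = coefficient_table(X, y)
print(dof)  # 44
print(np.column_stack([est, se, t]))
```

Two-sided $p$-values follow from the $t_{n-p}$ survival function, for example `2 * scipy.stats.t.sf(np.abs(t), dof)` if SciPy is available.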

## Estimating the Model Parameters

The code for `LinearRegression` can be found in [linear_regression.py](https://github.com/ppham27/stat570/tree/master/stat570/linear_model/linear_regression.py).

```python
from chredlin import load_data
from stat570.linear_model.linear_regression import LinearRegression

chredlin = load_data()

COVARIATES = ['race', 'fire', 'theft', 'age', 'log_income']
RESPONSE = 'involact'

# Fit OLS of involact on all five covariates plus an intercept.
linear_model = LinearRegression.from_data_frame(chredlin, COVARIATES, RESPONSE)
residual_standard_error = np.sqrt(linear_model.residual_variance_)

# Save the results for the write-up.
with open('p1_residual_standard_error.txt', 'w') as f:
    f.write(str(residual_standard_error))

with open('p1_model_parameters.tex', 'w') as f:
    f.write(linear_model.coefficients_.to_latex())

print('Residual standard error is {}.'.format(residual_standard_error))
linear_model.coefficients_
```

Residual standard error is 0.3345267301243203.

| | estimate | std_error | t-statistic | p-value |
|---|---|---|---|---|
| (intercept) | -1.18554 | 1.100255 | -1.077514 | 0.28755 |
| race | 0.009502 | 0.00249 | 3.816831 | 0.000449 |
| fire | 0.039856 | 0.008766 | 4.546588 | 4.8e-05 |
| theft | -0.010295 | 0.002818 | -3.653264 | 0.000728 |
| age | 0.008336 | 0.002744 | 3.037749 | 0.004134 |
| log_income | 0.345762 | 0.400123 | 0.864137 | 0.39254 |


## Response Confidence Interval

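Let us $0$-index the columns of $X$ so that the $0$th column is all $1$s, and $0$-index $\beta$ so that the $0$th entry is the intercept; the remaining entries $\beta_1, \ldots, \beta_{p-1}$ are the slopes. Since $\hat{\beta}$ satisfies $\left(X^\intercal X\right)\hat{\beta} = X^\intercal y$, the equation corresponding to the intercept column reads $\mathbf{1}^\intercal X\hat{\beta} = \mathbf{1}^\intercal y$. Dividing by $n$, the intercept estimate is

\begin{equation}
\hat{\beta}_0 = \bar{y} - \sum_{j=1}^{p-1} \hat{\beta}_j \bar{X}_{:,j}.
\end{equation}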
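A quick numerical check of this identity on random data (an illustrative assumption, not the chredlin covariates):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 25
# Column 0 is the intercept; the covariates have nonzero means on purpose.
X = np.column_stack([np.ones(n), rng.normal(loc=3.0, size=(n, 3))])
y = rng.normal(size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

X_bar = X.mean(axis=0)  # X_bar[0] == 1 because column 0 is all ones
intercept = y.mean() - beta_hat[1:] @ X_bar[1:]
print(np.isclose(intercept, beta_hat[0]))  # True
```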

Consider estimating the mean response at a covariate vector $x$ with $x_0 = 1$ by $\hat{y} = x^\intercal\hat{\beta}$. Substituting the expression for $\hat{\beta}_0$, we have that

\begin{equation}
\hat{y} = \bar{y} + \sum_{j=1}^{p-1} \left(x_j - \bar{X}_{:,j}\right)\hat{\beta}_j,
\end{equation}

so the variance of the prediction grows as $x$ moves away from the column means of the data.

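Let $\bar{X}$ be the vector of column-wise means of $X$, so $\bar{X}_0 = 1$. Because the centered covariate columns sum to zero, $\bar{y}$ is independent of the slope estimates, and the slope block of $\left(X^\intercal X\right)^{-1}$ is the inverse of the cross-product matrix of the centered covariates. Since the $0$th entry of $x - \bar{X}$ is zero, this gives the following, which agrees with the direct formula $\operatorname{Var}\left(x^\intercal\hat{\beta}\right) = \sigma^2 x^\intercal\left(X^\intercal X\right)^{-1}x$: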

\begin{equation}
\hat{y} \mid x \sim \mathcal{N}\left(
x^\intercal \beta, \sigma^2 \left(\frac{1}{n} + \left(x - \bar{X}\right)^\intercal\left(X^\intercal X\right)^{-1}\left(x - \bar{X}\right)\right)
\right).
\end{equation}
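The sketch below, on an arbitrary simulated design, checks that the centered form of the variance multiplier matches $x^\intercal\left(X^\intercal X\right)^{-1}x$ and that it grows as $x$ moves away from $\bar{X}$. The helper name `prediction_variance_factor` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 30, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
XtX_inv = np.linalg.inv(X.T @ X)
X_bar = X.mean(axis=0)


def prediction_variance_factor(x):
    """Hypothetical helper: multiplier of sigma^2 in Var(x^T beta_hat)."""
    d = x - X_bar  # intercept entry is 1 - 1 = 0
    return 1 / n + d @ XtX_inv @ d


x_near = X_bar.copy()                               # at the column means
x_far = np.concatenate([[1.0], X_bar[1:] + 5.0])    # far from the data
for x in (x_near, x_far):
    # The centered form matches the direct formula x^T (X^T X)^{-1} x.
    print(np.isclose(prediction_variance_factor(x), x @ XtX_inv @ x))
print(prediction_variance_factor(x_near))  # exactly 1 / n at the means
print(prediction_variance_factor(x_far))   # larger far from the data
```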

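Following the same method as before, replacing $\sigma^2$ with $\hat{\sigma}^2$ gives

\begin{equation}
\frac{x^\intercal\hat{\beta} - x^\intercal\beta}{
\sqrt{\hat{\sigma}^2\left(\frac{1}{n} + \left(x - \bar{X}\right)^\intercal\left(X^\intercal X\right)^{-1}\left(x - \bar{X}\right)\right)}
}
\sim t_{n-p},
\end{equation}

so a $100(1 - \alpha)\%$ confidence interval for the mean response $x^\intercal\beta$ is

\begin{equation}
x^\intercal\hat{\beta} \pm t_{n-p,\,1-\alpha/2}\sqrt{\hat{\sigma}^2\left(\frac{1}{n} + \left(x - \bar{X}\right)^\intercal\left(X^\intercal X\right)^{-1}\left(x - \bar{X}\right)\right)},
\end{equation}

where $t_{n-p,\,1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the $t_{n-p}$ distribution.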
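A minimal sketch of this interval using NumPy only: the hypothetical helper `mean_response_interval` takes the $t_{n-p}$ quantile as an argument instead of computing it (with SciPy it would be `scipy.stats.t.ppf(1 - alpha / 2, n - p)`). The data are simulated, with $n$ and $p$ chosen so that $n - p = 41$.

```python
import numpy as np


def mean_response_interval(X, y, x, t_quantile):
    """Hypothetical helper: confidence interval for the mean response x^T beta.

    t_quantile is the upper quantile of t_{n-p}; it is passed in so that
    this sketch needs only NumPy.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    residuals = y - X @ beta_hat
    sigma2_hat = residuals @ residuals / (n - p)
    d = x - X.mean(axis=0)  # the intercept entry is zero
    se = np.sqrt(sigma2_hat * (1 / n + d @ XtX_inv @ d))
    center = x @ beta_hat
    return center - t_quantile * se, center + t_quantile * se


rng = np.random.default_rng(8)
n, p = 47, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ rng.normal(size=p) + 0.3 * rng.normal(size=n)
x_new = np.concatenate([[1.0], np.zeros(p - 1)])
# 2.02 is approximately the 0.975 quantile of t with 41 degrees of freedom.
lo, hi = mean_response_interval(X, y, x_new, t_quantile=2.02)
print(lo, hi)
```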

## Modified Model

`log_income` adds little to the model: its estimate is small relative to its standard error ($p \approx 0.39$). From [chredlin_explore.ipynb](chredlin_explore.ipynb), we see this is because income correlates strongly with `race` and `fire`. Let's remove it.

```python
# Refit without log_income.
linear_model_custom = LinearRegression.from_data_frame(
    chredlin,
    [covariate for covariate in COVARIATES if covariate != 'log_income'],
    RESPONSE)
residual_standard_error_custom = np.sqrt(linear_model_custom.residual_variance_)

with open('p1_residual_standard_error_custom.txt', 'w') as f:
    f.write(str(residual_standard_error_custom))

with open('p1_model_parameters_custom.tex', 'w') as f:
    f.write(linear_model_custom.coefficients_.to_latex())

print('Residual standard error is {}.'.format(residual_standard_error_custom))
linear_model_custom.coefficients_
```

Residual standard error is 0.3335165792871865.

| | estimate | std_error | t-statistic | p-value |
|---|---|---|---|---|
| (intercept) | -0.243118 | 0.145054 | -1.676054 | 0.101158 |
| race | 0.008104 | 0.001886 | 4.296913 | 0.0001 |
| fire | 0.036646 | 0.007916 | 4.629173 | 3.5e-05 |
| theft | -0.009592 | 0.00269 | -3.565847 | 0.000921 |
| age | 0.00721 | 0.002408 | 2.994369 | 0.004595 |
