This is my attempt to recreate the model from [chredlin_model_truth.ipynb](chredlin_model_truth.ipynb).

In [1]:
import numpy as np
import pandas as pd
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects import r

np.set_printoptions(suppress=True)

importr('faraway')
r.data('chredlin');
chredlin = pandas2ri.ri2py(r.chredlin)
chredlin = chredlin.set_index(pandas2ri.ri2py(r.chredlin.rownames))
chredlin['log_income'] = np.log(chredlin['income'])
chredlin

Unnamed: 0,race,fire,theft,age,involact,income,side,log_income
60626,10.0,6.2,29.0,60.4,0.0,11.744,n,2.463342
60640,22.2,9.5,44.0,76.5,0.1,9.323,n,2.232484
60613,19.6,10.5,36.0,73.5,1.2,9.948,n,2.297372
60657,17.3,7.7,37.0,66.9,0.5,10.656,n,2.366123
60614,24.5,8.6,53.0,81.4,0.7,9.73,n,2.275214
60610,54.0,34.1,68.0,52.6,0.3,8.231,n,2.107908
60611,4.9,11.0,75.0,42.6,0.0,21.48,n,3.067122
60625,7.1,6.9,18.0,78.5,0.0,11.104,n,2.407305
60618,5.3,7.3,31.0,90.1,0.4,10.694,n,2.369683
60647,21.5,15.1,25.0,89.8,1.1,9.631,n,2.264987


Suppose our model is that $Y \sim \mathcal{N}\left(X^\intercal \beta, \sigma^2\right)$.

In [2]:
covariates = ['race', 'fire', 'theft', 'age', 'log_income']
# .values replaces DataFrame.as_matrix(), which was removed in pandas 1.0.
X = np.hstack((np.ones((len(chredlin),1)), chredlin[covariates].values))
y = chredlin['involact'].values

We want to find $\beta$ such that $L\left(\beta\right) = \left\lVert X\beta - y \right\rVert_2^2$ is minimized. That is, we want to find the projection of $y$ onto the subspace spanned by the columns of $X$. Thus, we must have $X^\intercal\left(y - X\hat{\beta}\right) = 0$, since the residuals are orthogonal to the columns of $X$ when $X\hat{\beta}$ is the projection that minimizes the squared error. Assuming $X$ has full column rank, so that $X^\intercal X$ is invertible, solving for $\hat{\beta}$ gives

\begin{equation}
\hat{\beta} = \left(X^\intercal X\right)^{-1} X^\intercal y.
\end{equation}


In [3]:
from scipy import linalg

gram_matrix = np.matmul(X.T, X)
beta = linalg.solve(gram_matrix, np.matmul(X.T, y))
beta

array([-1.18553957,  0.00950222,  0.03985604, -0.01029451,  0.0083356 ,
        0.34576152])
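
As a sanity check on the normal equations, here is a minimal sketch on synthetic data (not `chredlin`). The names `X_demo`, `y_demo`, and `beta_true` are made up for illustration. It compares the normal-equations solution with `np.linalg.lstsq` and checks that the residuals are orthogonal to the columns of the design matrix.

```python
import numpy as np
from scipy import linalg

# Synthetic design matrix with an intercept column (illustrative, not chredlin).
rng = np.random.default_rng(0)
n_demo, p_demo = 50, 4
X_demo = np.hstack((np.ones((n_demo, 1)), rng.normal(size=(n_demo, p_demo - 1))))
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
y_demo = X_demo @ beta_true + 0.3 * rng.normal(size=n_demo)

# Solve the normal equations (X^T X) beta = X^T y.
beta_normal = linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# Solve the same least-squares problem with a general-purpose routine.
beta_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

# X^T (y - X beta_hat) should be zero up to floating-point error.
orthogonality = X_demo.T @ (y_demo - X_demo @ beta_normal)

print(np.allclose(beta_normal, beta_lstsq))
print(np.max(np.abs(orthogonality)))
```

For ill-conditioned design matrices, `lstsq` (or a QR factorization) is generally the numerically safer choice, since forming $X^\intercal X$ squares the condition number.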

Let us derive an unbiased estimator for the residual variance $\sigma^2$, whose square root is the residual standard error. Consider the residual random vector.

\begin{equation}
R = y - X\hat{\beta}
\end{equation}

As stated earlier, the residuals are orthogonal to the subspace spanned by the columns of $X$, so they must lie in its orthogonal complement, a subspace of dimension $n - p$, where $p = \dim(\beta)$ and $X$ is assumed to have full column rank. Thus, the residuals are $y$ projected onto this space.

Let $w_1,\ldots,w_{n-p}$ be an orthonormal basis of this space, and let $W$ be the $n \times (n-p)$ matrix with these basis vectors as its columns.

We have that

\begin{align}
R &= y - X\hat{\beta} \\
&= W\left(W^\intercal y\right) \\
&= W\left(W^\intercal\left(X\beta + \sigma\epsilon\right)\right) \\
&= W\left(W^\intercal X\right)\beta + \sigma W\left(W^\intercal\epsilon\right) \\
&= \sigma W\left(W^\intercal\epsilon\right).
\end{align}
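
The projection identity $y - X\hat{\beta} = W\left(W^\intercal y\right)$ can be checked numerically. The sketch below uses synthetic data and hypothetical names (`X_demo`, `W_demo`), and assumes `scipy.linalg.null_space` is available (SciPy 1.1 or later). It builds an orthonormal basis of the orthogonal complement of the column space as the null space of $X^\intercal$.

```python
import numpy as np
from scipy import linalg

# Synthetic full-column-rank design matrix (illustrative, not chredlin).
rng = np.random.default_rng(1)
n_demo, p_demo = 12, 3
X_demo = np.hstack((np.ones((n_demo, 1)), rng.normal(size=(n_demo, p_demo - 1))))
y_demo = rng.normal(size=n_demo)

# Columns of W_demo form an orthonormal basis of the null space of X^T,
# which is the orthogonal complement of the column space of X.
W_demo = linalg.null_space(X_demo.T)

# Residuals computed directly from the least-squares fit.
beta_hat_demo = linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)
residual_direct = y_demo - X_demo @ beta_hat_demo

# Residuals computed by projecting y onto the span of the columns of W.
residual_projected = W_demo @ (W_demo.T @ y_demo)

print(W_demo.shape)
print(np.allclose(residual_direct, residual_projected))
```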

We have that $W^\intercal\epsilon \sim \mathcal{N}\left(0, I_{n-p}\right)$. To see this, note that $W^\intercal\epsilon$ is a linear transformation of a Gaussian vector, so its entries are jointly Gaussian. The $i$th entry is $\sum_{j=1}^n w_{ij}\epsilon_j \sim \mathcal{N}\left(0, 1\right)$, where $w_{ij}$ denotes the $j$th component of $w_i$ and the variance is $\left\lVert w_i \right\rVert_2^2 = 1$, and for $i \neq i^\prime$,

\begin{align}
\operatorname{Cov}\left(\left(W^\intercal\epsilon\right)_i, \left(W^\intercal\epsilon\right)_{i^\prime}\right) &=
\mathbb{E}\left[
\left(\sum_{j=1}^n w_{ij}\epsilon_j\right)\left(\sum_{k=1}^n w_{i^\prime k}\epsilon_k\right)
\right] \\
&= \sum_{j=1}^n\mathbb{E}\left[w_{ij}w_{i^\prime j} \epsilon_j^2\right] +
\sum_{j \neq k} \mathbb{E}\left[w_{ij}w_{i^\prime k} \epsilon_j\epsilon_k\right] \\
&= w_i^\intercal w_{i^\prime} + \sum_{j \neq k} w_{ij}w_{i^\prime k} \mathbb{E}\left[\epsilon_j\epsilon_k\right] \\
&= 0,
\end{align}
where the first term vanishes because $w_i$ and $w_{i^\prime}$ are orthogonal, and the second term vanishes because the errors are independent with mean zero. Since the entries are jointly Gaussian and uncorrelated, they are independent.

Thus, we have that

\begin{equation}
R^\intercal R = \sigma^2 \left(W^\intercal\epsilon\right)^\intercal W^\intercal W \left(W^\intercal\epsilon\right) =\sigma^2 \left(W^\intercal\epsilon\right)^\intercal\left(W^\intercal\epsilon\right)
\sim \sigma^2 \chi^2_{n-p}.
\end{equation}

Finally, writing $x_i^\intercal$ for the $i$th row of $X$, we have that

\begin{equation}
\mathbb{E}\left[R^\intercal R\right] = \sigma^2\left(n - p\right)
\Rightarrow
\mathbb{E}\left[\frac{\sum_{i=1}^n \left(y_i - x_i^\intercal\hat{\beta}\right)^2}{n-p}\right] = \sigma^2.
\end{equation}

Our unbiased estimator is

\begin{equation}
\boxed{\hat{\sigma}^2 = \frac{\sum_{i=1}^n \left(y_i - x_i^\intercal\hat{\beta}\right)^2}{n-p}.}
\end{equation}
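
To illustrate why the divisor is $n - p$ rather than $n$, here is a small Monte Carlo sketch under the model assumptions. All data are simulated, and the names (`X_demo`, `sigma_demo`, `n_draws`) and the chosen sizes are arbitrary. Averaged over many simulated response vectors, the residual sum of squares divided by $n - p$ should be close to $\sigma^2$, while dividing by $n$ should be biased downward by the factor $(n - p)/n$.

```python
import numpy as np
from scipy import linalg

# Simulation settings (arbitrary illustrative values).
rng = np.random.default_rng(2)
n_demo, p_demo, sigma_demo, n_draws = 20, 5, 2.0, 20000
X_demo = np.hstack((np.ones((n_demo, 1)), rng.normal(size=(n_demo, p_demo - 1))))
beta_true = rng.normal(size=p_demo)

# Each column of Y_demo is one simulated response vector y = X beta + sigma * eps.
eps = rng.normal(size=(n_demo, n_draws))
Y_demo = (X_demo @ beta_true)[:, None] + sigma_demo * eps

# Fit every simulated data set at once and compute its residual sum of squares.
beta_hats = linalg.solve(X_demo.T @ X_demo, X_demo.T @ Y_demo)
rss = np.sum(np.square(Y_demo - X_demo @ beta_hats), axis=0)

# Average of the two candidate variance estimators across simulations.
mean_unbiased = np.mean(rss / (n_demo - p_demo))  # expected near sigma^2 = 4
mean_naive = np.mean(rss / n_demo)                # expected near 4 * 15 / 20 = 3

print(mean_unbiased, mean_naive)
```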

In [4]:
residuals = y - np.matmul(X, beta)
residual_variance = np.sum(np.square(residuals))/(len(y) - len(beta))
np.sqrt(residual_variance)

0.3345267301243203

We can rewrite $y$ as $y = X\beta + \sigma \epsilon$, where the elements of $\epsilon$ are drawn independently from $\mathcal{N}\left(0, 1\right)$. Substituting, we have that

\begin{align}
\hat{\beta} &= \left(X^\intercal X\right)^{-1}X^\intercal\left(X\beta + \sigma\epsilon\right) \\
&= \beta + \sigma\left(X^\intercal X\right)^{-1}X^\intercal \epsilon.
\end{align}

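Since $\hat{\beta}$ is an affine function of the Gaussian vector $\epsilon$, it is Gaussian with mean $\beta$ and covariance $\sigma^2\left(X^\intercal X\right)^{-1}X^\intercal X\left(X^\intercal X\right)^{-1} = \sigma^2\left(X^\intercal X\right)^{-1}$. Thus, $\hat{\beta}_j \sim \mathcal{N}\left(\beta_j, \sigma^2\left(X^\intercal X\right)^{-1}_{jj}\right)$.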

This gives us that

\begin{equation}
\frac{\hat{\beta}_j - \beta_j}{\sqrt{\sigma^2\left(X^\intercal X\right)^{-1}_{jj}}} \sim
\mathcal{N}\left(0, 1\right).
\end{equation}
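
The sampling distribution of $\hat{\beta}$ can also be checked by simulation. The following sketch uses simulated data only, with hypothetical names and arbitrary sizes. It compares the empirical covariance of $\hat{\beta}$ across many draws with the theoretical covariance $\sigma^2\left(X^\intercal X\right)^{-1}$.

```python
import numpy as np
from scipy import linalg

# Simulation settings (arbitrary illustrative values).
rng = np.random.default_rng(3)
n_demo, p_demo, sigma_demo, n_draws = 30, 3, 1.5, 20000
X_demo = np.hstack((np.ones((n_demo, 1)), rng.normal(size=(n_demo, p_demo - 1))))
beta_true = np.array([0.5, -1.0, 2.0])
gram_demo = X_demo.T @ X_demo

# Simulate responses column by column and refit the model for each draw.
eps = rng.normal(size=(n_demo, n_draws))
Y_demo = (X_demo @ beta_true)[:, None] + sigma_demo * eps
beta_hats = linalg.solve(gram_demo, X_demo.T @ Y_demo)  # shape (p, n_draws)

# Rows of beta_hats are variables, columns are observations (np.cov default).
empirical_cov = np.cov(beta_hats)
theoretical_cov = sigma_demo**2 * linalg.inv(gram_demo)
max_abs_error = np.max(np.abs(empirical_cov - theoretical_cov))

print(np.mean(beta_hats, axis=1))
print(max_abs_error)
```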

From the previous part,

\begin{equation}
(n - p)\frac{\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-p}.
\end{equation}

$\hat{\beta}$ and $\hat{\sigma}^2$ are independent by Basu's theorem: for fixed $\sigma^2$, $\hat{\beta}$ is a complete sufficient statistic for $\beta$, while $\hat{\sigma}^2$ is ancillary, that is, its distribution does not depend on $\beta$. Thus, we have that

\begin{equation}
\left.
\frac{\hat{\beta}_j - \beta_j}{\sqrt{\sigma^2\left(X^\intercal X\right)^{-1}_{jj}}}
\middle/
\sqrt{\frac{(n - p)\frac{\hat{\sigma}^2}{\sigma^2}}{n-p}}
\right. 
= \frac{\hat{\beta}_j - \beta_j}{\sqrt{\hat{\sigma}^2\left(X^\intercal X\right)^{-1}_{jj}}}
\sim t_{n-p}.
\end{equation}

That is, we have a $t$-distribution with $n - p$ degrees of freedom.
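
A quick simulation makes the role of the $t$-distribution concrete. This is a sketch on simulated data with hypothetical names, and a deliberately small $n - p = 9$ so that the difference from the normal distribution is visible. Using the $t_{n-p}$ critical value, the studentized statistic should exceed it in about 5% of draws. Using the standard normal critical value instead should reject noticeably more often.

```python
import numpy as np
from scipy import linalg, stats

# Simulation settings (arbitrary illustrative values, small n on purpose).
rng = np.random.default_rng(4)
n_demo, p_demo, sigma_demo, n_draws = 12, 3, 1.0, 20000
X_demo = np.hstack((np.ones((n_demo, 1)), rng.normal(size=(n_demo, p_demo - 1))))
beta_true = np.array([0.5, -1.0, 2.0])
gram_inv = linalg.inv(X_demo.T @ X_demo)

# Simulate responses and compute beta_hat and sigma_hat^2 for every draw.
eps = rng.normal(size=(n_demo, n_draws))
Y_demo = (X_demo @ beta_true)[:, None] + sigma_demo * eps
beta_hats = gram_inv @ X_demo.T @ Y_demo
rss = np.sum(np.square(Y_demo - X_demo @ beta_hats), axis=0)
sigma2_hats = rss / (n_demo - p_demo)

# Studentized statistic for coefficient j, centered at the true value.
j = 1
t_stats = (beta_hats[j] - beta_true[j]) / np.sqrt(sigma2_hats * gram_inv[j, j])

# Two-sided rejection rates at the nominal 5% level.
t_crit = stats.t.ppf(0.975, df=n_demo - p_demo)
z_crit = stats.norm.ppf(0.975)
reject_t = np.mean(np.abs(t_stats) > t_crit)
reject_z = np.mean(np.abs(t_stats) > z_crit)

print(reject_t, reject_z)
```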

In [5]:
from collections import OrderedDict
from scipy import stats

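# Invert the Gram matrix via its Cholesky factorization; the diagonal of the
# inverse scaled by the residual variance gives the squared standard errors.
beta_std_error = np.sqrt(np.diag(linalg.cho_solve(
    linalg.cho_factor(gram_matrix), np.eye(len(beta))))*residual_variance)
# t-statistics and two-sided p-values for the null hypothesis beta_j = 0.
beta_t_statistic = beta/beta_std_error
beta_p_value = 2*(stats.t.sf(np.abs(beta_t_statistic), df=len(y) - len(beta)))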

model_parameters = pd.DataFrame(OrderedDict([
    ('estimate', beta),
    ('std_error', beta_std_error),
    ('t-statistic', beta_t_statistic),
    ('p-value', beta_p_value),
]), index=['(intercept)'] + covariates)
with open('p1_model_parameters.tex', 'w') as f:
    f.write(model_parameters.to_latex())
model_parameters

Unnamed: 0,estimate,std_error,t-statistic,p-value
(intercept),-1.18554,1.100255,-1.077514,0.28755
race,0.009502,0.00249,3.816831,0.000449
fire,0.039856,0.008766,4.546588,4.8e-05
theft,-0.010295,0.002818,-3.653264,0.000728
age,0.008336,0.002744,3.037749,0.004134
log_income,0.345762,0.400123,0.864137,0.39254
