In [1]:
import numpy as np
import pandas as pd

# Normal Linear Regression Model

## Likelihood Function

$$ y_i = x^\intercal_i \beta + \epsilon_i, \qquad i=1,...,n$$

Assumptions:

$\epsilon_i \sim^{iid} N(0,\sigma^2)$

$x_i$ is either fixed (not random) or it is independent of $\epsilon_i$

Assumptions imply: $f(y,x|\beta,\sigma^2) = f(y|x,\beta,\sigma^2)f(x|\Lambda)$

We are interested in $(\beta,\sigma^2)$. Because the joint density factorizes as above and the marginal distribution of $x$ does not depend on $(\beta,\sigma^2)$, we can disregard it and work with the conditional likelihood of each observation:

$$y_i \mid x_i,\beta,\sigma^2 \sim N(x_i^\intercal\beta,\sigma^2)$$

Since the disturbances are independent, the likelihood of the sample is given by:

$$f(y|x,\beta,\sigma^2) = \prod^n_{i=1} f(y_i|x_i,\beta,\sigma^2)$$

Using the pdf of a normal distribution:

$$f(y|x,\beta,\sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left(-\frac{1}{2\sigma^2}(y-X\beta)^\intercal(y-X\beta)\right) $$

Writing the likelihood in terms of the OLS quantities, using $y-X\beta = (y-X\hat{\beta})-X(\beta-\hat{\beta})$ and the fact that $X^\intercal(y-X\hat{\beta}) = 0$ (so the cross term vanishes), yields:

$$f(y|x,\beta,\sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^{n-v}} \exp\left(-\frac{1}{2\sigma^2}(\beta-\hat{\beta})^\intercal X^\intercal X (\beta-\hat{\beta})\right) \frac{1}{\sigma^v}\exp\left(-\frac{s^2v}{2 \sigma^2}\right)$$

So the likelihood factorizes into a normal kernel in $\beta$ (given $\sigma^2$) times an inverse-gamma kernel in $\sigma^2$.

Where $v = n-k$ is the degrees of freedom and

$$\hat{\beta} = (X^\intercal X)^{-1} X^\intercal y$$

$$ s^2 = \frac{(y-X \hat{\beta})^\intercal (y-X \hat{\beta}) }{v} $$
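The factorization above can be checked numerically. The following is a minimal sketch, assuming synthetic data and $v = n-k$; the function names `loglik_direct` and `loglik_factorized` are illustrative and not part of the original notebook. It evaluates the log of both forms of the likelihood at an arbitrary $(\beta, \sigma^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
n, k = 50, 2
X = rng.standard_normal((n, k))
y = X @ np.array([5.0, 10.0]) + rng.standard_normal(n)

# OLS quantities that appear in the factorized form
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
v = n - k  # assumed degrees of freedom
s2 = (y - X @ beta_hat) @ (y - X @ beta_hat) / v


def loglik_direct(beta, sigma2):
    """Log of the likelihood written with (y - X beta)'(y - X beta)."""
    resid = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - resid @ resid / (2 * sigma2)


def loglik_factorized(beta, sigma2):
    """Log of the likelihood written in terms of beta_hat and s2."""
    d = beta - beta_hat
    # normal kernel in beta, carrying sigma^{-(n - v)}
    normal_part = (-0.5 * n * np.log(2 * np.pi)
                   - 0.5 * (n - v) * np.log(sigma2)
                   - d @ (X.T @ X) @ d / (2 * sigma2))
    # inverse-gamma kernel in sigma^2, carrying sigma^{-v}
    ig_part = -0.5 * v * np.log(sigma2) - v * s2 / (2 * sigma2)
    return normal_part + ig_part


beta_try = np.array([4.0, 11.0])
print(loglik_direct(beta_try, 2.0), loglik_factorized(beta_try, 2.0))
```

Both printed values should agree up to floating-point error, since the two expressions differ only by the algebraic rearrangement of the quadratic form.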

## Choosing a prior

From the theory of conjugate priors, we know that a Normal-Inverse-Gamma (NIG) prior yields another NIG as posterior. The rationale is developed in the class slides, so we skip directly to the final prior distribution and postulate:

$$(\beta,\sigma^2) \sim NIG(\underline{\beta},\underline{V},1 / \underline{\sigma^2}, \underline{v})$$

All the underlined constants are called **hyperparameters**; they are parameters of the prior distribution, not to be confused with the hyperparameters of the posterior distribution.

If a ready-made NIG sampler is not available, we can follow the steps in the slides and draw from an inverse-gamma (IG) distribution and a Normal distribution instead.

Then a discussion about the parametrization of the Gamma distribution follows.
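The two-step draw from IG and Normal distributions mentioned above might look like the sketch below. The parametrization is an assumption, since the slides are not reproduced here: the precision $1/\sigma^2$ is taken to be Gamma with mean $1/\underline{\sigma^2}$ and $\underline{v}$ degrees of freedom (shape $\underline{v}/2$, scale $2/(\underline{v}\,\underline{\sigma^2})$), and $\beta \mid \sigma^2 \sim N(\underline{\beta}, \sigma^2 \underline{V})$. The function name `sample_nig` is hypothetical.

```python
import numpy as np


def sample_nig(b0, V0, sigma2_0, v, size, rng):
    """Draw (beta, sigma2) pairs from an NIG under the assumed parametrization."""
    b0 = np.asarray(b0, dtype=float)
    V0 = np.asarray(V0, dtype=float)

    # Step 1 (assumption): precision h = 1/sigma^2 ~ Gamma(shape=v/2, scale=2/(v*sigma2_0)),
    # so that E[h] = 1/sigma2_0
    h = rng.gamma(shape=v / 2.0, scale=2.0 / (v * sigma2_0), size=size)
    sigma2 = 1.0 / h

    # Step 2: beta | sigma^2 ~ N(b0, sigma^2 * V0), via the Cholesky factor of V0
    L = np.linalg.cholesky(V0)
    z = rng.standard_normal((size, b0.size))
    beta = b0 + np.sqrt(sigma2)[:, None] * (z @ L.T)
    return beta, sigma2


rng = np.random.default_rng(0)  # seed chosen only for reproducibility
beta_draws, sigma2_draws = sample_nig([5, 10], 0.05 * np.identity(2), 1.0, 10, 1000, rng)
print(beta_draws.mean(axis=0), sigma2_draws.mean())
```

If the slides use a different Gamma parametrization (for example shape and rate), only the line that draws `h` needs to change.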

## Resultant Posterior

$$ (\beta, \sigma^2 | y,X) \sim NIG(\overline{\beta},\overline{V},1/\overline{\sigma^2},\overline{v}) $$

Where:

$$ \overline{V} = (\underline{V}^{-1} + X^\intercal X)^{-1} $$ 

$$\overline{\beta} = \overline{V} (\underline{V}^{-1} \underline{\beta} + X^\intercal X \hat{\beta}) $$

$$\overline{v} = \underline{v} + n $$

$$ \overline{\sigma}^2 = \frac{1}{\overline{v}} \left( \underline{v}\,\underline{\sigma^2} + (n - k) s^2 + ( \hat{\beta} - \underline{\beta})^\intercal  (\underline{V} + (X^\intercal X)^{-1})^{-1}  (\hat{\beta} - \underline{\beta}) \right) $$
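The posterior hyperparameters can be computed directly from these formulas. The following is a minimal sketch rather than the notebook's own implementation: the function name `nig_posterior` and its argument names are hypothetical, and the data in the usage lines are synthetic.

```python
import numpy as np


def nig_posterior(X, y, b0, V0, sigma2_0, v0):
    """Return (beta_bar, V_bar, sigma2_bar, v_bar) from the update formulas above."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    b0 = np.asarray(b0, dtype=float)
    n, k = X.shape

    # OLS quantities
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - k)

    # posterior covariance scale and mean
    V0_inv = np.linalg.inv(V0)
    V_bar = np.linalg.inv(V0_inv + XtX)
    beta_bar = V_bar @ (V0_inv @ b0 + XtX @ beta_hat)

    # posterior degrees of freedom and sigma^2 hyperparameter
    v_bar = v0 + n
    diff = beta_hat - b0
    middle = np.linalg.inv(V0 + np.linalg.inv(XtX))
    sigma2_bar = (v0 * sigma2_0 + (n - k) * s2 + diff @ middle @ diff) / v_bar
    return beta_bar, V_bar, sigma2_bar, v_bar


# usage on synthetic data (illustrative values only)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = X @ np.array([5.0, 10.0]) + rng.standard_normal(100)
beta_bar, V_bar, sigma2_bar, v_bar = nig_posterior(
    X, y, b0=[5, 10], V0=0.05 * np.identity(2), sigma2_0=1.0, v0=1)
print(beta_bar, sigma2_bar, v_bar)
```

The posterior mean `beta_bar` is a matrix-weighted average of the prior mean and the OLS estimate, so when the prior mean equals $\hat{\beta}$ the two coincide.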

# Simulation

## Sample

First we will generate a sample from the true "population", in which $y_i \mid x_i \sim N(x_i^\intercal\beta,1)$.

For the multivariate case $\beta$ is $(k \times 1)$.

In [2]:
beta = [5,10] #betas
n = 100 #sample size

In [3]:
k = len(beta)

beta = np.array(beta)

Now we have to generate $X$, which must be independent of the error term. Thus we will draw random values from a standard normal distribution. $X$ is an $(n \times k)$ matrix.

In [4]:
data = np.random.standard_normal((n,k))
data = pd.DataFrame(data)

# rename the columns to x0, x1, ... for convenience
data.columns = [f"x{i}" for i in range(data.shape[1])]


Now we have to generate the $y$'s from the $x$'s and a random disturbance that also follows a standard normal distribution.

$$y = x\beta + \epsilon$$

In [5]:
data['y'] = data.dot(beta) + np.random.standard_normal((n))
data.head()

|   | x0 | x1 | y |
|---|---|---|---|
| 0 | -0.461389 | -0.672083 | -7.938436 |
| 1 | 1.298645 | 2.010507 | 27.091955 |
| 2 | 0.873146 | -0.396031 | -0.816526 |
| 3 | 2.643229 | -0.941906 | 3.069911 |
| 4 | 1.107706 | -0.81808 | -3.592053 |


## Prior

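For our NIG prior we have to choose the hyperparameters $\underline{\beta}$ (`b0`), $\underline{V}$ (`V0`), $\underline{\sigma^2}$ (`sigma2_0`) and $\underline{v}$ (`v`). Here $m$ is the confidence we have in the prior: a larger $m$ shrinks the prior variance `V0` and raises the prior degrees of freedom `v`.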

In [6]:
betaPrior = [5,10]
m = 1

prior = {'b0': np.array(betaPrior),
         'V0': 0.05/m * np.identity(k),
         'sigma2_0':1,
         'v':m }

Remember that, with $v = n-k$ (the degrees of freedom, not the prior hyperparameter $\underline{v}$):

$$\hat{\beta} = (X^\intercal X)^{-1} X^\intercal y$$

$$ s^2 = \frac{(y-X \hat{\beta})^\intercal (y-X \hat{\beta}) }{v} $$

In [26]:
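# regressors: every column except y (.ix was removed from pandas; use .loc)
x = data.loc[:, data.columns != 'y']
y = data['y']

# OLS estimate: (X'X)^{-1} X'y
beta_hat = np.linalg.inv(x.T.dot(x)).dot(x.T).dot(y)

# residual variance with n - k degrees of freedom
resid = y - x.dot(beta_hat)
s2 = resid.dot(resid) / (n - k)

s2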

1.2103624263883663
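As a cross-check of the closed-form OLS expressions, the sketch below compares the explicit inverse with `np.linalg.lstsq`, which solves the least-squares problem without forming $(X^\intercal X)^{-1}$. The data here are synthetic and seeded for illustration, so the numbers will not match the notebook's sample.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed chosen only for reproducibility
n, k = 100, 2
X = rng.standard_normal((n, k))
y = X @ np.array([5.0, 10.0]) + rng.standard_normal(n)

# closed form, as in the formula for beta_hat
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# least-squares solver; rss holds the residual sum of squares as a length-1 array
beta_lstsq, rss, rank, _ = np.linalg.lstsq(X, y, rcond=None)
s2_lstsq = rss[0] / (n - k)

# residual variance computed by hand for comparison
resid = y - X @ beta_inv
s2_manual = resid @ resid / (n - k)

print(np.allclose(beta_inv, beta_lstsq), np.isclose(s2_lstsq, s2_manual))
```

For a small, well-conditioned design like this one the two approaches agree; the solver-based route is generally preferred when $X^\intercal X$ is close to singular.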