## Part d): Paper and pencil part 

### Expectation value of y

In the context of ordinary least squares (OLS), the vector of observations $\mathbf{y}$ is modeled as:

$\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}$


For each element $y_i$ of the vector $\mathbf{y}$, we have:

$y_i = \sum_j x_{ij} \beta_j + \epsilon_i$

where $x_{ij}$ is the element in the $i$-th row and $j$-th column of the design matrix $\mathbf{X}$, and $\epsilon_i$ is the error for the $i$-th observation. Taking the expectation of both sides gives:

$\mathbb{E}(y_i) = \mathbb{E}\left( \sum_j x_{ij} \beta_j + \epsilon_i \right)$

Using the linearity of expectation, and the fact that $x_{ij}$ and $\beta_j$ are deterministic:

$\mathbb{E}(y_i) = \sum_j x_{ij} \beta_j + \mathbb{E}(\epsilon_i)$

Since $\epsilon_i$ is normally distributed with mean 0, we have:

$\mathbb{E}(\epsilon_i) = 0$

and therefore:

$\mathbb{E}(y_i) = \sum_j x_{ij} \beta_j$


In matrix form, this can be written as:

$\mathbb{E}(y_i) = \mathbf{X}_{i,*} \boldsymbol{\beta}$

where $\mathbf{X}_{i,*}$ is the $i$-th row of the design matrix $\mathbf{X}$.

We have now shown that the expectation value of $y_i$ is:

$\mathbb{E}(y_i) = \sum_j x_{ij} \beta_j = \mathbf{X}_{i,*} \boldsymbol{\beta}$

This is what we were supposed to show.
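The result can also be checked numerically. The following is a minimal sketch, assuming NumPy and purely hypothetical values for the design matrix `X`, the coefficients `beta`, and the noise level `sigma` (none of these come from the derivation above). It draws many noise vectors for a fixed design matrix and compares the sample mean of $\mathbf{y}$ with $\mathbf{X} \boldsymbol{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed for reproducibility

# Hypothetical setup (assumption, chosen only for illustration)
n, p = 5, 3                          # number of observations and features
X = rng.normal(size=(n, p))          # design matrix, held fixed
beta = np.array([1.0, -2.0, 0.5])    # "true" coefficients
sigma = 0.3                          # standard deviation of the noise

# Draw many independent noise vectors eps ~ N(0, sigma^2 I)
n_draws = 200_000
eps = rng.normal(0.0, sigma, size=(n_draws, n))

# Each row of y is one realization of y = X beta + eps
y = X @ beta + eps

# The sample mean over realizations should approach X beta
y_mean = y.mean(axis=0)
print("sample mean of y:", y_mean)
print("X @ beta:        ", X @ beta)
print("max abs deviation:", np.max(np.abs(y_mean - X @ beta)))
```

The deviation shrinks roughly like $\sigma / \sqrt{N}$ with the number of draws $N$, so it is small but not exactly zero.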

### Variance of y

Next, we find the variance of $y_i$:

$\text{Var}(y_i) = \text{Var}\left(\sum_j x_{ij} \beta_j + \epsilon_i\right)$

Since the deterministic part $\sum_j x_{ij} \beta_j$ is a constant, it has zero variance, and adding a constant to a random variable does not change its variance. Thus, we have:

$\text{Var}(y_i) = \text{Var}(\epsilon_i)$

From the assumptions, we know that $\epsilon_i \sim N(0, \sigma^2)$, so:

$\text{Var}(\epsilon_i) = \sigma^2$

We can therefore conclude that:

$\text{Var}(y_i) = \sigma^2$
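As a numerical sanity check of this result, here is a minimal sketch under the same kind of assumptions as before: NumPy, and hypothetical values for `X`, `beta`, and `sigma` that are not taken from the text. It estimates the variance of each $y_i$ from many simulated realizations and compares it with $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)       # fixed seed for reproducibility

# Hypothetical setup (assumption, chosen only for illustration)
n, p = 5, 3
X = rng.normal(size=(n, p))          # design matrix, held fixed
beta = np.array([1.0, -2.0, 0.5])
sigma = 0.3

# Many realizations of y = X beta + eps with eps ~ N(0, sigma^2 I)
n_draws = 200_000
eps = rng.normal(0.0, sigma, size=(n_draws, n))
y = X @ beta + eps

# Unbiased sample variance of each component y_i across realizations
y_var = y.var(axis=0, ddof=1)
print("sample variances of y_i:", y_var)
print("sigma^2:                ", sigma**2)
```

Every component should give approximately the same value, since the variance does not depend on the row $\mathbf{X}_{i,*}$.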

### Expectation value of $\hat{\beta}$

The OLS estimator $\hat{\beta}$ of the regression coefficients is given by the well-known formula:

$\hat{\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$


By substituting $\mathbf{y} = \mathbf{X} \beta + \boldsymbol{\epsilon}$ into the expression for $\hat{\beta}$, we get:

$\hat{\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top (\mathbf{X} \beta + \boldsymbol{\epsilon})$


Using the distributive property of matrix multiplication:

$\hat{\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{X} \beta + (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\epsilon}$

The first term simplifies because $(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{X} = \mathbf{I}$:

$\hat{\beta} = \beta + (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\epsilon}$


Now, we take the expectation of both sides:

$\mathbb{E}[\hat{\beta}] = \mathbb{E}[\beta + (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\epsilon}]$

Since $\beta$ is a constant and the expectation operator is linear:

$\mathbb{E}[\hat{\beta}] = \beta + \mathbb{E}[(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\epsilon}]$

Since $\mathbf{X}$ is deterministic, the matrix $(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top$ can be moved outside the expectation, and since $\mathbb{E}[\boldsymbol{\epsilon}] = 0$, we have:

$\mathbb{E}[(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\epsilon}] = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \, \mathbb{E}[\boldsymbol{\epsilon}] = 0$


This means that the OLS estimator is unbiased:

$\mathbb{E}[\hat{\beta}] = \beta$
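Unbiasedness can be illustrated with a small simulation. The sketch below assumes NumPy and hypothetical values for `X`, `beta`, and `sigma` (assumptions for illustration only). It computes $\hat{\beta}$ for many independent noise realizations with the design matrix held fixed, and compares the average estimate with the true $\beta$.

```python
import numpy as np

rng = np.random.default_rng(2)       # fixed seed for reproducibility

# Hypothetical setup (assumption, chosen only for illustration)
n, p = 20, 3
X = rng.normal(size=(n, p))          # design matrix, held fixed
beta = np.array([1.0, -2.0, 0.5])
sigma = 0.3

# A = (X^T X)^{-1} X^T, obtained by solving (X^T X) A = X^T
# instead of forming the inverse explicitly
A = np.linalg.solve(X.T @ X, X.T)    # shape (p, n)

# Many realizations of y = X beta + eps, one per row
n_draws = 50_000
eps = rng.normal(0.0, sigma, size=(n_draws, n))
y = X @ beta + eps

# beta_hat = A y for every realization; rows of beta_hats are estimates
beta_hats = y @ A.T                  # shape (n_draws, p)

beta_hat_mean = beta_hats.mean(axis=0)
print("mean of beta_hat:", beta_hat_mean)
print("true beta:       ", beta)
```

Individual estimates scatter around $\beta$, but their average approaches $\beta$ as the number of realizations grows.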


### Variance of $\hat{\beta}$

We already have the expression for the OLS estimator:

$\hat{\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$

Using the model $\mathbf{y} = \mathbf{X} \beta + \boldsymbol{\epsilon}$, we substitute this into the expression for $\hat{\beta}$:

$\hat{\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top (\mathbf{X} \beta + \boldsymbol{\epsilon})$

This simplifies to:

$\hat{\beta} = \beta + (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\epsilon}$

We now want to compute the variance of $\hat{\beta}$. Since $\beta$ is deterministic, it does not contribute to the variance; only the term containing the random error $\boldsymbol{\epsilon}$ does.

Thus, the variance of $\hat{\beta}$ is:

$\text{Var}(\hat{\beta}) = \text{Var}\left((\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\epsilon}\right)$

We use the fact that, for a random vector $\mathbf{Z}$ and a constant matrix $\mathbf{A}$, the variance (covariance matrix) of $\mathbf{A} \mathbf{Z}$ is:

$\text{Var}(\mathbf{A} \mathbf{Z}) = \mathbf{A} \, \text{Var}(\mathbf{Z}) \, \mathbf{A}^\top$

Here, $\mathbf{A} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top$ and $\boldsymbol{\epsilon} \sim N(0, \sigma^2 \mathbf{I})$. The variance of $\boldsymbol{\epsilon}$ is:

$\text{Var}(\boldsymbol{\epsilon}) = \sigma^2 \mathbf{I}$

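Since $(\mathbf{X}^\top \mathbf{X})^{-1}$ is symmetric, $\mathbf{A}^\top = \mathbf{X} (\mathbf{X}^\top \mathbf{X})^{-1}$, so the variance of $\hat{\beta}$ becomes: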

$\text{Var}(\hat{\beta}) = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \, \sigma^2 \mathbf{I} \, \mathbf{X} \, (\mathbf{X}^\top \mathbf{X})^{-1}$

Because $\mathbf{X}^\top \mathbf{X} \, (\mathbf{X}^\top \mathbf{X})^{-1} = \mathbf{I}$, this simplifies to:

$\text{Var}(\hat{\beta}) = \sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}$
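The covariance formula can be compared against a simulation as well. This is a minimal sketch, assuming NumPy and hypothetical values for `X`, `beta`, and `sigma` (not taken from the derivation). It estimates the covariance matrix of $\hat{\beta}$ from many noise realizations and compares it with $\sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)       # fixed seed for reproducibility

# Hypothetical setup (assumption, chosen only for illustration)
n, p = 20, 3
X = rng.normal(size=(n, p))          # design matrix, held fixed
beta = np.array([1.0, -2.0, 0.5])
sigma = 0.3

# A = (X^T X)^{-1} X^T, so that beta_hat = A y
A = np.linalg.solve(X.T @ X, X.T)    # shape (p, n)

# OLS estimates for many independent noise realizations
n_draws = 50_000
eps = rng.normal(0.0, sigma, size=(n_draws, n))
y = X @ beta + eps
beta_hats = y @ A.T                  # shape (n_draws, p)

# Empirical covariance matrix; rowvar=False treats columns as variables
emp_cov = np.cov(beta_hats, rowvar=False)

# Theoretical covariance matrix sigma^2 (X^T X)^{-1}
theory_cov = sigma**2 * np.linalg.inv(X.T @ X)

print("empirical covariance:\n", emp_cov)
print("theoretical covariance:\n", theory_cov)
print("max abs deviation:", np.max(np.abs(emp_cov - theory_cov)))
```

The diagonal entries are the variances of the individual coefficients $\hat{\beta}_j$, and the off-diagonal entries show that the estimates are in general correlated unless $\mathbf{X}^\top \mathbf{X}$ is diagonal.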