1. **Restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart).
2. **Run all cells** (in the menubar, select Cell$\rightarrow$Run All).
3. __Use the__ `Validate` __button in the Assignments tab before submitting__.

__Include comments, derivations, explanations, graphs, etc.__ 

You __work in groups__ of 3 people. __Write the full name and S/U-number of all team members!__

---

# Assignment 3 (Statistical Machine Learning 2024)
# **Deadline: 22 November 2024**

## Instructions
* Fill in any place that says `YOUR CODE HERE` or `YOUR ANSWER HERE` __including comments, derivations, explanations, graphs, etc.__ 
Elements and/or intermediate steps required to derive the answer have to be in the report. If an exercise requires coding, explain briefly what the code does (in comments). All figures should have titles (descriptions), axis labels, and legends.
* __Please use LaTeX to write down equations/derivations/other math__! How to do that in Markdown cells can be found [here](https://www.fabriziomusacchio.com/blog/2021-08-10-How_to_use_LaTeX_in_Markdown/), a starting point for various symbols is [here](https://www.overleaf.com/learn/latex/Mathematical_expressions).
* Please do __not add new cells__ to the notebook; write your answers only in the provided cells. Before you turn the assignment in, make sure everything runs as expected.
* __Use the variable names given in the exercises__; do not assign your own variable names. 
* __Only one team member needs to upload the solutions__. This can be done under the Assignments tab, where you fetched the assignments, and where you can also validate your submissions. Please do not change the filenames of the individual Jupyter notebooks.

For any problems or questions regarding the assignments, ask during the tutorial or send an email to charlotte.cambiervannooten@ru.nl and janneke.verbeek@ru.nl.

## Introduction
Assignment 3 consists of:
1. The faulty lighthouse (30 points);
2. Gaussian processes (40 points);
3. __Bayesian polynomial regression (30 points)__.

## Libraries

Please __avoid installing new packages__, unless really necessary.

In [None]:
import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it."

import numpy as np
import matplotlib.pyplot as plt

# Set fixed random seed for reproducibility
np.random.seed(2022)

## Bayesian polynomial regression (30 points)
In this exercise, we will consider the _Bayesian_ treatment of polynomial regression. Recall polynomial regression 
\begin{equation}
t_{n} = \omega_{0} + \omega_{1}x_{n} + \omega_{2} x_{n}^{2}+\dots+\omega_{M}x_{n}^{M}+\epsilon_{n}, 
\end{equation}
where $\epsilon_{n}\sim N(0, \sigma^{2})$. In vector form we have 
\begin{equation}
t_{n} = \boldsymbol{\omega}^{T}\boldsymbol{x}_{n}+\epsilon_{n}, 
\end{equation}
where $\boldsymbol{\omega}=[\omega_{0}, \dots, \omega_{M}]^{T}$ and $\boldsymbol{x}_{n}=[1,x_{n}, x_{n}^{2}, \dots, x_{n}^{M}]^{T}$. Further, let us stack all responses in one vector $\boldsymbol{t}=[t_{1}, \dots, t_{N}]^{T}$ and all inputs in a single matrix $\boldsymbol{X}=[\boldsymbol{x}_{1}, \boldsymbol{x}_{2}, \dots, \boldsymbol{x}_{N}]^{T}$. Then we get for the whole data set
\begin{equation}
\boldsymbol{t}=\boldsymbol{X}\boldsymbol{\omega}+\boldsymbol{\epsilon},
\end{equation}
where $\boldsymbol{\epsilon}=[\epsilon_{1}, \dots, \epsilon_{N}]^{T}$. 
Assume that we know the true value of  $\sigma^{2}$. 
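
As a minimal sketch, the order-dependent matrix $\boldsymbol{X}$ defined above could be built from the raw inputs with NumPy as follows. The helper name `design_matrix` is hypothetical and not part of the assignment; it only illustrates the layout of the rows $\boldsymbol{x}_{n}^{T}=[1, x_{n}, \dots, x_{n}^{M}]$.

```python
import numpy as np

def design_matrix(x, M):
    """Return the N x (M+1) matrix whose n-th row is [1, x_n, x_n^2, ..., x_n^M].

    `design_matrix` is an illustrative helper name, not one prescribed by the assignment.
    """
    x = np.asarray(x, dtype=float).ravel()
    # increasing=True orders the columns as x^0, x^1, ..., x^M
    return np.vander(x, M + 1, increasing=True)

Phi = design_matrix([2.0, 3.0], 2)
```

Here `Phi` has the rows `[1, 2, 4]` and `[1, 3, 9]`, one row per data point and one column per power of $x$.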

1. Derive the posterior distribution of $\boldsymbol{\omega}$, i.e., $p(\boldsymbol{\omega}|\boldsymbol{t}, \boldsymbol{X}, \sigma^{2})$.  
Hint: Use the prior $p(\boldsymbol{\omega}|\boldsymbol{\mu_{0}}, \boldsymbol{\Sigma_{0}})=\mathcal{N}(\boldsymbol{\mu_{0}}, \boldsymbol{\Sigma_{0}})$ and the fact that the posterior should be Gaussian. 

YOUR ANSWER HERE

2. Take the first-order polynomial, so that the inputs are $\boldsymbol{x_{n}}=[1, x_{n}]^{T}$. Let $\boldsymbol{\mu_{0}}=[0,0,\dots,0]^{T}$. Then the posterior mean for the linear Gaussian model is
\begin{equation}
\boldsymbol{\mu_{\omega}}=(\boldsymbol{X}^{T}\boldsymbol{X}+ \sigma^2 \boldsymbol{\Sigma_{0}^{-1}})^{-1}\boldsymbol{X}^{T}\boldsymbol{t}.
\label{eq:MAP} \tag{1}
\end{equation}
Recall also the regularized least squares solution:
\begin{equation}
\hat{\boldsymbol{\omega}}=(\boldsymbol{X}^{T}\boldsymbol{X}+N\lambda\boldsymbol{I})^{-1}\boldsymbol{X}^{T}\boldsymbol{t}.
\label{eq:MLE} \tag{2}
\end{equation}
Find $\boldsymbol{\Sigma_{0}}$ that makes Equation \eqref{eq:MAP} and Equation \eqref{eq:MLE} identical. Reflect on the similarity between the MAP solution and regularized least squares. Comment on what it implies for the effect of the prior.

YOUR ANSWER HERE

3. Consider the polynomial function
\begin{equation}
f(x) = 5x^{3} - x^{2} +x
\end{equation}
which we will use to generate some training data. First define the function $f$.

In [None]:
def f(x):
    """
    Define the polynomial function f(x).
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
"""
Test the polynomial function f(x).
"""
assert f(0) == 0
assert f(1) == 5
assert f(-1) == -7

Generate 40 random input points uniformly distributed on the interval $[-5, 5]$. Apply the function $f$ to the input data and then add Gaussian noise with mean zero and variance $\sigma^2 = 150$ to obtain the target values ($\boldsymbol{t}$). Plot the generated data on top of the true function $f(x)$ over the interval $[-5,5]$.
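
One detail that is easy to get wrong when adding noise: NumPy's normal samplers take the standard deviation (`scale`), not the variance. The sketch below illustrates this on a generic array. The helper name `add_gaussian_noise`, the seed, and the sample size are assumptions made for illustration only and are not part of the assignment.

```python
import numpy as np

def add_gaussian_noise(values, variance, rng):
    """Add zero-mean Gaussian noise with the given variance to `values`.

    Hypothetical helper for illustration; not a name used by the assignment.
    """
    values = np.asarray(values, dtype=float)
    # The sampler expects the standard deviation, so take the square root of the variance.
    noise = rng.normal(loc=0.0, scale=np.sqrt(variance), size=values.shape)
    return values + noise

rng = np.random.default_rng(0)  # assumed seed, for this illustration only
noisy = add_gaussian_noise(np.zeros(200_000), variance=150.0, rng=rng)
print(noisy.var())  # should be close to 150 for a sample this large
```

The legacy function `np.random.normal` takes the same `scale` argument, so the same square root applies when working with the global seed set at the top of the notebook.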

In [None]:
"""
Generate 40 (X, t) pairs as described above and plot them (for comparison) on top of the function f(x).

Variable names
----------
X : Nx1 array
    The array containing the random data points.
N : integer
    The number of data points.
t : Nx1 array
    The array containing the target values, i.e. f(X) with added Gaussian noise.
"""
# YOUR CODE HERE
raise NotImplementedError()

4. Compute the marginal likelihood on the data generated above for the polynomial models from first to seventh order. Make a plot of the marginal likelihood for these models (polynomial order on the x-axis and marginal likelihood value on the y-axis). Use a Gaussian prior on $\boldsymbol{\omega}$ with **zero mean** ($\boldsymbol{\mu_0} = \boldsymbol{0}$) and a **diagonal covariance matrix** ($\boldsymbol{\Sigma_0} = \sigma_0^2 \boldsymbol{I}$), and set the prior covariance hyperparameter $\sigma_0^2$ (initially) to one. 

_Hint:_ 
The marginal likelihood (also known as the model evidence) for our Gaussian model is defined as $p(\boldsymbol{t}|\boldsymbol{X}, \boldsymbol{\mu_{0}}, \boldsymbol{\Sigma_{0}})$. The data matrix $\boldsymbol{X}$ is defined at the beginning of the exercise. Note that this matrix is _order dependent_ and is different from the array of data points $\{x_1, x_2, ..., x_{40}\}$. Using Gaussian distributions as before we can compute
\begin{equation}
p(\boldsymbol{t}|\boldsymbol{X}, \boldsymbol{\mu_{0}}, \boldsymbol{\Sigma_{0}})= \mathcal{N}(\boldsymbol{t}; \boldsymbol{X}\boldsymbol{\mu_{0}}, \sigma^{2}\boldsymbol{I}_{N}+\boldsymbol{X}\boldsymbol{\Sigma_{0}}\boldsymbol{X}^{T}).
\end{equation}
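
Evaluating a Gaussian density of this kind directly can underflow for $N=40$ points, so it is usually safer to work with its logarithm. The following is a minimal sketch of a generic multivariate Gaussian log density, assuming a symmetric positive definite covariance matrix. The function name `gaussian_log_density` is hypothetical, and this is one possible approach rather than the required implementation.

```python
import numpy as np

def gaussian_log_density(t, mean, cov):
    """Log of N(t; mean, cov) for a positive definite covariance matrix.

    Hypothetical helper for illustration; not a name used by the assignment.
    """
    t = np.asarray(t, dtype=float).ravel()
    mean = np.asarray(mean, dtype=float).ravel()
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    diff = t - mean
    # slogdet returns the sign and the log of |det(cov)| without overflow or underflow
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("cov must be positive definite")
    # Solve cov @ z = diff instead of forming the explicit inverse
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (t.size * np.log(2.0 * np.pi) + logdet + quad)

value = gaussian_log_density([0.0], [0.0], [[1.0]])
```

For the standard normal evaluated at zero, `value` equals $-\tfrac{1}{2}\log(2\pi)$, which gives a quick sanity check before plugging in the mean and covariance from the equation above.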

In [None]:
def log_marginal_likelihood(X, N, t, sigma_0=1):
    """
    Calculate the log marginal likelihood for the polynomial models from first to seventh order.

    Parameters
    ----------
    X : Nx1 array
        The array containing the random data points.
    N : numeric
        The number of data points.
    t : Nx1 array
        The array containing the target values, i.e. f(X) with added Gaussian noise.
    sigma_0 : numeric
        Prior covariance hyperparameter.

    Returns
    -------
    list
        The list contains seven log marginal likelihood values corresponding to the different polynomial orders 
        (from first to seventh order).
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
"""
Test for log_marginal_likelihood.
"""
assert type(log_marginal_likelihood(np.full((1, 1), 0), 1, np.full((1, 1), 0))) is list

Now plot the results with the help of `log_marginal_likelihood`. You can use `matplotlib.pyplot.bar` for making the bar plot.  

_Hint_: You might want to normalize by computing the difference between the log marginal likelihood values and their maximum, and by then taking the exponent of these differences.
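
The normalization described in this hint can be sketched as follows. The function name `normalize_log_values` and the example numbers are assumptions for illustration; they are not results from the assignment data.

```python
import numpy as np

def normalize_log_values(log_values):
    """Map log values to relative values in (0, 1], with the largest mapped to 1.

    Hypothetical helper for illustration; not a name used by the assignment.
    """
    log_values = np.asarray(log_values, dtype=float)
    # Subtracting the maximum before exponentiating avoids underflow to zero
    return np.exp(log_values - np.max(log_values))

example_logs = [-1000.0, -1001.0]  # made-up values, far too small to exponentiate directly
relative = normalize_log_values(example_logs)
```

Exponentiating `-1000.0` directly underflows to `0.0` in double precision, whereas `relative` keeps the ratio between the two values: the first entry is `1.0` and the second is $e^{-1}$.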

In [None]:
"""
Plot the marginal likelihood values for different polynomial orders.
"""
# YOUR CODE HERE
raise NotImplementedError()

Based on the plot, which model would you choose according to the marginal likelihood?  

YOUR ANSWER HERE

5. How would the prior affect the choice of polynomial using marginal likelihood? Let $\boldsymbol{\Sigma_{0}}=\sigma_{0}^{2}\boldsymbol{I}$ as before and vary $\sigma_{0}^{2}$. What happens when you increase and decrease $\sigma_{0}^{2}$? Plot the marginal likelihood for each polynomial order from 1 to 7, for the following hyperparameter values: $\sigma_{0}^{2}=0.1$, $\sigma_{0}^{2}=0.3$, $\sigma_{0}^{2}=0.4$, $\sigma_{0}^{2}=0.7$, $\sigma_{0}^{2}=1.3$, $\sigma_{0}^{2}=1.4$, $\sigma_{0}^{2}=1.7$.

In [None]:
"""
Plot the marginal likelihood values for different polynomial orders. We now consider different values for sigma_0.
"""
# YOUR CODE HERE
raise NotImplementedError()

Comment on the effect of changing $\sigma_{0}^{2}$. What does it imply, in general, for such a modeling choice (i.e. for the _Bayesian_ approach to polynomial regression)?

YOUR ANSWER HERE