# Bayesian linear regression

Demonstration of an agent performing Bayesian linear regression.

==========================================================================

* **Notebook dependencies**:
    * ...

* **Content**: Jupyter notebook accompanying Chapter 3 of the textbook "Fundamentals of Active Inference"

* **Author**: Sanjeev Namjoshi (sanjeev.namjoshi@gmail.com)

* **Version**: 0.1

We now turn to a Bayesian linear regression agent. In Bayesian linear regression, we are interested in solving the following inference problem:

$$
p_{\Sigma_{\theta \mid y}, \mu_{\theta \mid y}}(\boldsymbol{\theta} \mid \boldsymbol{y}) = \frac{p_{\sigma^2_y, X}(\boldsymbol{y} \mid \boldsymbol{\theta}) \, p_{\mu_{\theta}, \Sigma_{\theta}}(\boldsymbol{\theta})}{p(\boldsymbol{y})}
$$

This tells us that we are interested in the posterior distribution over the parameters. Rather than single deterministic values for $\beta_0$ and $\beta_1$, we collect them into a parameter vector $\boldsymbol{\theta} = \begin{bmatrix} \beta_0 & \beta_1 \end{bmatrix}^\top$ and describe our uncertainty about it with a multivariate normal distribution. This distribution is the posterior: given what we have observed of the data, what is the distribution of the parameter vector? The variables of interest are represented as follows:

| Variable | Status     | Data type     |
|----------|------------|---------------|
| $x$      | observed   | deterministic |
| $y$      | observed   | probabilistic |
| $\theta$ | unobserved | **probabilistic** |
| $\phi$   | known      | deterministic |

Note that the mean of the likelihood is the data matrix $\boldsymbol{X}$ multiplied by $\boldsymbol{\theta}$. Because $\boldsymbol{X}\boldsymbol{\theta}$ is a vector with one predicted value per observation, the likelihood is a multivariate normal distribution, $\boldsymbol{y} \sim \mathcal{N}(\boldsymbol{X}\boldsymbol{\theta}, \sigma^2_y \boldsymbol{I})$. Likewise, the prior and posterior are also multivariate normal distributions.
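To make the multivariate likelihood concrete, here is a minimal NumPy sketch that evaluates $\log \mathcal{N}(\boldsymbol{y}; \boldsymbol{X}\boldsymbol{\theta}, \sigma^2_y \boldsymbol{I})$. It assumes independent noise with a known, shared variance `sigma2_y` (the isotropic case implied by the posterior equations in this notebook); the function name `log_likelihood` and the toy data are hypothetical and not part of the notebook's code.

```python
import numpy as np

def log_likelihood(y, X, theta, sigma2_y):
    """Log density of y under the assumed likelihood N(X @ theta, sigma2_y * I)."""
    n = y.shape[0]
    resid = y - X @ theta  # observed minus predicted, one entry per observation
    # With covariance sigma2_y * I, log|Sigma| = n * log(sigma2_y) and the
    # quadratic form reduces to the sum of squared residuals over sigma2_y.
    return -0.5 * (n * np.log(2 * np.pi * sigma2_y) + resid @ resid / sigma2_y)

# Hypothetical toy design matrix (leading column of ones) and theta = [beta_0, beta_1]
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
theta = np.array([1.0, 2.0])
y = X @ theta  # noise-free observations [1.0, 3.0, 5.0]

ll_true = log_likelihood(y, X, theta, sigma2_y=1.0)                # zero residual: -1.5 * log(2 * pi)
ll_off = log_likelihood(y, X, np.array([0.0, 2.0]), sigma2_y=1.0)  # every prediction is off by 1
```

Because the covariance is diagonal, the multivariate density factorizes into a product of univariate normal densities, one per observation; parameters that predict the data poorly (`ll_off`) receive a lower log-likelihood.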

The first step is to create the environment. For convenience, the environment returns the data matrix directly: each input vector $\boldsymbol{x}$ has a $1$ inserted in front (so that it multiplies the intercept $\beta_0$), and the resulting vectors are stacked row by row into the matrix $\boldsymbol{X}$. We also include a function `get_prior_probs()` which returns the probabilities for a prior of our choice.
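The construction of $\boldsymbol{X}$ can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the notebook's environment code: the helper name `make_design_matrix` is hypothetical, and it accepts either a 1-D array of scalar inputs or a 2-D array with one input vector per row.

```python
import numpy as np

def make_design_matrix(x):
    """Prepend a 1 to each input vector and stack the results row by row.

    x: array of shape (n,) for a single input feature, or (n, d) for several.
    Returns X with shape (n, d + 1); its first column of ones multiplies beta_0.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, np.newaxis]  # treat a flat array as n observations of one feature
    ones = np.ones((x.shape[0], 1))
    return np.hstack([ones, x])

X = make_design_matrix([0.5, 1.5, 3.0])
# X is [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0]]

# With theta = [beta_0, beta_1], each row of X @ theta equals beta_0 + beta_1 * x
y_pred = X @ np.array([2.0, 0.5])  # [2.25, 2.75, 3.5]
```

Folding the intercept into $\boldsymbol{X}$ this way lets the whole model be written as the single matrix product $\boldsymbol{X}\boldsymbol{\theta}$, which is what the posterior equations operate on.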

What about the posterior? Because both the prior and the likelihood are Gaussian, the posterior is also Gaussian, and its mean and covariance matrix can be obtained directly from two equations. This spares us from working through Bayes' rule explicitly, as we did in previous chapters. As in Experiments 3A and 3B, the posterior has a closed-form solution. The equations are:

$$
\begin{align}
    \boldsymbol{\Sigma}_{\theta \mid y} &= (\sigma^{-2}_y \boldsymbol{X}^\top \boldsymbol{X} + \boldsymbol{\Sigma}^{-1}_{\theta})^{-1} \\
    \boldsymbol{\mu}_{\theta \mid y} &= \boldsymbol{\Sigma}_{\theta \mid y}(\sigma^{-2}_y \boldsymbol{X}^\top \boldsymbol{y} + \boldsymbol{\Sigma}^{-1}_{\theta} \boldsymbol{\mu}_{\theta})
\end{align}
$$

These equations are directly implemented in `get_posterior_probs()` below.
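As a standalone illustration of the two equations, here is a minimal NumPy sketch. It is not the notebook's `get_posterior_probs()`: the function name `posterior_params`, its arguments, and the simulated data (true parameters, noise variance, and a standard normal prior) are assumptions chosen for illustration. It assumes the likelihood $\mathcal{N}(\boldsymbol{X}\boldsymbol{\theta}, \sigma^2_y \boldsymbol{I})$ and the prior $\mathcal{N}(\boldsymbol{\mu}_\theta, \boldsymbol{\Sigma}_\theta)$.

```python
import numpy as np

def posterior_params(X, y, sigma2_y, mu_theta, Sigma_theta):
    """Closed-form Gaussian posterior over theta (hypothetical helper).

    Assumes y ~ N(X @ theta, sigma2_y * I) and theta ~ N(mu_theta, Sigma_theta).
    Returns the posterior mean vector and covariance matrix.
    """
    Sigma_theta_inv = np.linalg.inv(Sigma_theta)
    # Posterior precision = data precision + prior precision; invert for the covariance.
    Sigma_post = np.linalg.inv(X.T @ X / sigma2_y + Sigma_theta_inv)
    # Posterior mean combines the data term and the prior mean, weighted by precision.
    mu_post = Sigma_post @ (X.T @ y / sigma2_y + Sigma_theta_inv @ mu_theta)
    return mu_post, Sigma_post

# Simulated data under assumed "true" parameters theta = [beta_0, beta_1]
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=50)
X = np.column_stack([np.ones_like(x), x])  # leading column of ones for beta_0
theta_true = np.array([1.0, -0.5])
sigma2_y = 0.25
y = X @ theta_true + rng.normal(0.0, np.sqrt(sigma2_y), size=x.size)

# Standard normal prior over both coefficients (an assumed choice)
mu_post, Sigma_post = posterior_params(X, y, sigma2_y, np.zeros(2), np.eye(2))
```

The explicit inverses mirror the equations term by term; a more numerically careful version would typically use `np.linalg.solve` or a Cholesky factorization instead. A quick sanity check: with a very broad prior (for example `1e8 * np.eye(2)`), the posterior mean should approach the ordinary least-squares solution.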