# Bayesian Machine Learning

Objective of the exercise is to find a posterior distribution via Bayes' law:

$p(\theta|D) = \frac{p(D|\theta)p(\theta)}{p(D)}$

As the data term in the denominator is a constant, we can simplify:

$p(\theta|D) \propto p(D|\theta)p(\theta)$

Where $p(\theta)$ is the **prior distribution** and $p(D|\theta)$ is the **likelihood function**.

For a Bayesian Neural Network regression, we can further specify the likelihood function:

$p(D|\theta) = \prod_{i=1}^{N} \mathcal{N}(y_i|f_W(X_i), \sigma^2)$

This denotes a product of independent normal distributions with means defined by the outputs of a Neural Network. The variance of the Normal distribution is chosen to be a constant.

The corresponding prior distribution could look as follows:

$p(\theta) = p(W, \sigma) = \prod_{k=1}{K}\mathcal{N}(W_k|0, 1)\dot\Gamma(1, 1)$

The priors for **K** networks weights are independent standard normal distributions. For the square root of the variance (a.k.a. standard deviation), we use a standard gamma distribution.


### Steps
1. Define the likelihood function
2. Define the prior distribution
3. Train the model

In [3]:
import Pkg;
Pkg.add("Flux")

[32m[1m    Updating[22m[39m registry at `C:\Users\crowthg\.julia\registries\General.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m   Installed[22m[39m Flux ─ v0.13.7
[32m[1m    Updating[22m[39m `C:\Users\crowthg\.julia\environments\v1.8\Project.toml`
 [90m [587475ba] [39m[92m+ Flux v0.13.7[39m
[32m[1m    Updating[22m[39m `C:\Users\crowthg\.julia\environments\v1.8\Manifest.toml`
 [90m [587475ba] [39m[93m↑ Flux v0.13.6 ⇒ v0.13.7[39m
[32m[1mPrecompiling[22m[39m project...
[32m  ✓ [39mFlux
[32m  ✓ [39m[90mOptimizationFlux[39m
[32m  ✓ [39mDiffEqFlux
  3 dependencies successfully precompiled in 111 seconds. 472 already precompiled.


In [11]:
using Flux
using Distributions


struct Likelihood
    network
    sigma
end

Flux.@functor Likelihood

(p::Likelihood)(x) = Normal.(p.network(x)[:], p_sigma[1])

In [12]:
likelihood = Likelihood(Chain(Dense(1, 5, tanh), Dense(5, 1)), ones(1, 1))

params, likelihood_reconstructor = Flux.destructure(likelihood)
n_weights = length(params) - 1

likelihood_conditional(weights, sigma) = likelihood_reconstructor(vcat(weights..., sigma))

likelihood_conditional (generic function with 1 method)

In [14]:
weight_prior = MvNormal(zeros(n_weights), ones(n_weights))
sigma_prior = Gamma(1., 1.)

Gamma{Float64}(α=1.0, θ=1.0)

In [17]:
Xline = Matrix(transpose(collect(-3:0.1:3)[:, :]))
likelihood_conditional(rand(weight_prior), rand(sigma_prior))(Xline)

LoadError: UndefVarError: p_sigma not defined