# Subspace Inference for Bayesian Neural Networks Using Julia

### Introduction to uncertainty analysis in Deep Neural Networks (DNNs)

Deep learning has revolutionized artificial intelligence, producing artificial neural networks capable of tackling increasingly complex and challenging problems. Learning in these networks can be supervised or unsupervised. However, deep learning models are prone to overfitting, and a conventionally trained network returns point predictions without any measure of how uncertain they are. Bayesian neural networks use Bayesian techniques to quantify these uncertainties.

### Bayes Theorem

<img src="http://www.sciweavers.org/tex2img.php?eq=P%28A%7CB%29%20%3D%20%5Cfrac%7BP%28A%29%2AP%28B%7CA%29%7D%7BP%28B%29%7D&bc=White&fc=Black&im=jpg&fs=12&ff=arev&edit=0" align="center" border="0" alt="P(A|B) = \frac{P(A)*P(B|A)}{P(B)}" width="190" height="46" />

Bayes' theorem combines a prior probability distribution with a likelihood to compute the posterior probability. In the equation above, P(A|B) is the posterior, P(A) is the prior and P(B|A) is the likelihood; P(A|B) and P(B|A) are conditional probabilities, while P(A) and P(B) are marginal probabilities. The denominator P(B), called the evidence, is a normalizing constant: for a neural network it requires summing or integrating over every possible value of the parameters, which is intractable in practice. Approximate Bayesian inference therefore works with the posterior only up to this constant. The product of the prior and the likelihood is the joint probability P(A, B) = P(A)P(B|A), so the posterior is proportional to the joint probability. Markov chain Monte Carlo (MCMC) methods exploit this: they draw samples from the posterior using only the unnormalized joint probability.
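
As a small numeric illustration (the numbers are made up for the example), take a hypothesis A with prior P(A) = 0.01 and an observation B with P(B|A) = 0.9 and P(B|not A) = 0.05. The sketch below forms the joint probabilities, sums them to get the evidence P(B), and divides to obtain the posterior:

```julia
# Hypothetical two-hypothesis example: A versus not-A
prior      = [0.01, 0.99]   # P(A), P(not A)
likelihood = [0.9, 0.05]    # P(B | A), P(B | not A)

# Joint probabilities P(A, B) = P(A) * P(B | A) for each hypothesis
joint = prior .* likelihood

# Evidence P(B): the joint summed over all hypotheses
evidence = sum(joint)

# Bayes' theorem: P(A | B) = P(A) * P(B | A) / P(B)
posterior = joint ./ evidence

println("P(B)     = ", evidence)
println("P(A | B) = ", posterior[1])
```

The posterior P(A|B) comes out at roughly 0.154. Here the evidence is a two-term sum; for a neural network the same sum becomes an integral over all weights, which is exactly the quantity approximate inference avoids computing.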

### Bayesian Neural Network (BNN)

Simply put, a BNN is a stochastic neural network trained using Bayesian inference methods. In a BNN, the parameters are represented as probability distributions instead of single values, as in the figure below:
<img src="https://github.com/efmanu/SubspaceInference.jl/blob/master/implementation/bnn.png?raw=true" />


The distribution over the parameters represents the uncertainty of the neural network. It is obtained from Bayes' theorem with the help of MCMC methods, which generate samples of the weights from their posterior distribution. Commonly used MCMC samplers include Metropolis-Hastings, Hamiltonian Monte Carlo (HMC) and the No-U-Turn Sampler (NUTS).
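
To make the sampling step concrete, here is a minimal random-walk Metropolis-Hastings sampler, written only for illustration and not taken from any package. It targets the posterior of the mean μ of Gaussian data with an assumed standard normal prior and unit noise, and it only ever evaluates the unnormalized log joint log p(μ) + log p(x | μ), never the evidence. The synthetic data, step size and chain length are arbitrary choices.

```julia
using Random, Statistics

# Unnormalized log joint: log prior N(0, 1) + log likelihood N(x | μ, 1), constants dropped
logjoint(μ, x) = -μ^2 / 2 - sum((x .- μ) .^ 2) / 2

function metropolis_hastings(logp, θ0; n_samples = 10_000, step = 0.5, rng = Random.default_rng())
    samples = Vector{Float64}(undef, n_samples)
    θ, lp = θ0, logp(θ0)
    for i in 1:n_samples
        θ_prop = θ + step * randn(rng)      # symmetric Gaussian proposal
        lp_prop = logp(θ_prop)
        # accept with probability min(1, p(θ_prop) / p(θ)), compared in log space
        if log(rand(rng)) < lp_prop - lp
            θ, lp = θ_prop, lp_prop
        end
        samples[i] = θ                      # a rejected move repeats the current state
    end
    return samples
end

rng = MersenneTwister(42)
x = 2.0 .+ randn(rng, 50)                   # synthetic data with true mean 2
chain = metropolis_hastings(μ -> logjoint(μ, x), 0.0; step = 0.3, rng = rng)
post = chain[2_001:end]                     # discard burn-in
println("posterior mean ≈ ", mean(post), ", std ≈ ", std(post))
```

Because the proposal is symmetric, the Hastings correction cancels and only the ratio of joint probabilities is needed. For this conjugate model the exact posterior is normal with mean sum(x)/51 and variance 1/51, which the sample mean and standard deviation should approach.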

### Why Subspace Inference?

Conventional MCMC methods work well for estimating uncertainty in small neural networks. For a DNN, however, the parameter space is very high-dimensional, so sampling its posterior takes a long time. Subspace inference instead constructs a low-dimensional subspace of the DNN's parameter space and performs Bayesian inference within that subspace to estimate the uncertainty.

### Subspace Inference Algorithm

Subspace inference starts from a DNN with model M pretrained on data D and proceeds in three steps:
1. Construct a low-dimensional subspace of the network's parameter space
2. Perform Bayesian inference within this subspace
3. Transform the posterior over the subspace back to the original parameter dimension
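
Step 3 amounts to applying the affine map `W_cap = Wswa + P*z` from the inference algorithm to every posterior sample of `z`. The sketch below uses random placeholder values for `Wswa`, `P` and the samples, with hypothetical sizes (10,000 weights, a 5-dimensional subspace, 200 samples), and maps the whole batch back to the original dimension with one matrix product:

```julia
using Random

rng = MersenneTwister(0)
d, k, S = 10_000, 5, 200          # hypothetical sizes: full weights, subspace dim, samples

Wswa = randn(rng, d)              # placeholder for the mean (SWA) weights
P    = randn(rng, d, k)           # placeholder for the projection matrix
Z    = randn(rng, k, S)           # placeholder posterior samples of z, one per column

# Step 3: each column of W is a full weight vector Wswa + P*z
W = Wswa .+ P * Z

println(size(Z), " -> ", size(W))
```

The sampler only has to explore k = 5 numbers per draw, yet every draw still yields a complete weight vector for the network.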

### Algorithm for subspace construction

The inputs are `W0` (pretrained weights), `lr` (learning rate), `T` (number of SGD steps), `f` (frequency at which the deviation matrix is updated), `N` (maximum number of columns in the deviation matrix `D`) and `R` (rank of the PCA). The outputs are `P`, the projection matrix of the low-dimensional subspace, and `Wswa`, the running mean of the weights (the stochastic weight averaging, or SWA, solution).

```julia
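using LinearAlgebra

# grad(W) is assumed to return the gradient of the training cost at W on the data
function construct_subspace(W0, grad, lr, T, f, N, R)
    W = copy(W0)
    Wswa = copy(W0)                          # initialize mean weights with W0
    D = Vector{Vector{Float64}}()            # columns of the deviation matrix
    for i in 1:T
        W = W - lr * grad(W)                 # SGD update
        if mod(i, f) == 0
            n = i ÷ f
            Wswa = (n * Wswa + W) / (n + 1)  # running mean of the weights
            length(D) == N && popfirst!(D)   # drop the oldest column when full
            push!(D, W - Wswa)               # append the new deviation column
        end
    end
    F = svd(reduce(hcat, D))                 # needs at least one stored column
    P = F.U[:, 1:R] * Diagonal(F.S[1:R])     # rank-R projection P = U*S
    return P, Wswa
end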
```

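Here SVD is the singular value decomposition `D = U*S*Vt`, which returns three outputs: `U`, `S` and `Vt`. The projection matrix `P = U*S` spans the principal directions in which the SGD iterates deviate from `Wswa`.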
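
A quick way to see what the SVD step produces is to run Julia's `LinearAlgebra.svd` on a small random stand-in for the deviation matrix (the sizes are arbitrary). The sketch forms `P = U*S`, checks that the deviations are recovered as `P*Vt`, and keeps the leading `R` columns as a rank-R subspace:

```julia
using LinearAlgebra, Random

rng = MersenneTwister(1)
D = randn(rng, 1_000, 10)          # hypothetical deviation matrix: 1000 weights, 10 columns

F = svd(D)                          # thin SVD: D ≈ F.U * Diagonal(F.S) * F.Vt
P = F.U * Diagonal(F.S)             # projection matrix P = U*S

@show P * F.Vt ≈ D                  # the deviations are recovered from P and Vt

R = 3                               # keep only the top-R principal directions
P_R = P[:, 1:R]
@show size(P_R)
```

Because the singular values in `S` are sorted in decreasing order, the first `R` columns of `P` capture the directions with the largest deviation, which is what a rank-R PCA keeps.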

### Algorithm for subspace inference

1. Initialize the prior and proposal distributions over the low-dimensional subspace parameters `z`
2. Generate the forward NN model weights `W_cap = Wswa + P*z`
3. Build the likelihood by projecting each proposed `z` to the full weights and evaluating the forward NN on the data
4. Sample the posterior over `z` using the joint probability (prior times likelihood)
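
These four steps can be sketched end to end in plain Julia. Everything below is a hypothetical stand-in rather than the SubspaceInference.jl API: a hand-written one-hidden-layer network `forward`, random placeholders for `Wswa` and `P` (in practice they come from the construction step), a standard normal prior on `z`, a Gaussian likelihood with an assumed noise level σ = 0.1, and random-walk Metropolis-Hastings as the sampler.

```julia
using LinearAlgebra, Random, Statistics

const H = 10                        # hypothetical hidden width
const NPARAMS = 3H + 1              # W1, b1, W2 (H entries each) and scalar b2

# Hypothetical forward NN: 1 input -> H tanh units -> 1 output, weights packed in w
function forward(w, x)
    W1, b1, W2, b2 = w[1:H], w[H+1:2H], w[2H+1:3H], w[3H+1]
    return [dot(W2, tanh.(W1 .* xi .+ b1)) + b2 for xi in x]
end

# Steps 1-3: N(0, I) prior on z, weights W_cap = Wswa + P*z, Gaussian likelihood
function logjoint(z, Wswa, P, x, y; σ = 0.1)
    W_cap = Wswa .+ P * z
    r = y .- forward(W_cap, x)
    return -sum(abs2, z) / 2 - sum(abs2, r) / (2σ^2)   # constants dropped
end

# Step 4: random-walk Metropolis-Hastings over the k-dimensional z
function sample_subspace(logp, k; n = 5_000, step = 0.1, rng = Random.default_rng())
    z, lp = zeros(k), logp(zeros(k))
    Z = zeros(k, n)
    for i in 1:n
        zp = z .+ step .* randn(rng, k)             # Gaussian proposal around z
        lpp = logp(zp)
        if log(rand(rng)) < lpp - lp
            z, lp = zp, lpp
        end
        Z[:, i] = z
    end
    return Z
end

rng = MersenneTwister(7)
k = 2
Wswa = randn(rng, NPARAMS)                          # placeholder for the SWA weights
P = 0.3 .* randn(rng, NPARAMS, k)                   # placeholder projection matrix
x = collect(range(-2, 2; length = 40))
y = forward(Wswa .+ P * [1.0, -0.5], x) .+ 0.1 .* randn(rng, 40)   # synthetic data

Z = sample_subspace(z -> logjoint(z, Wswa, P, x, y), k; rng = rng)
Wpost = Wswa .+ P * Z[:, 1_001:end]                 # map samples back to full weights
preds = reduce(hcat, [forward(w, x) for w in eachcol(Wpost)])
println("mean z ≈ ", vec(mean(Z[:, 1_001:end]; dims = 2)))
println("max predictive std ≈ ", maximum(std(preds; dims = 2)))
```

The spread of `preds` across samples is the predictive uncertainty. Swapping Metropolis-Hastings for HMC or NUTS would change only `sample_subspace`; the projection `Wswa + P*z` stays the same.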

### Implementation of Subspace Inference in Julia
The subspace inference package can be found at [SubspaceInference.jl](https://github.com/efmanu/SubspaceInference.jl).

The ***subspace construction*** uses [Flux.jl](https://github.com/FluxML/Flux.jl) for the SGD updates:

```julia
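# Excerpt from the body of the subspace-construction function. Assumes
# `using Flux, LinearAlgebra, TSVD` and that ps (Flux.params of the model), opt,
# cost, data, T, c, M, all_len, W_swa, A and re are defined earlier.
for i in 1:T
    local training_loss                        # lets the loss escape the gradient closure
    for d in data
        gs = gradient(ps) do
            training_loss = cost(d...)
            return training_loss
        end
        Flux.update!(opt, ps, gs)
    end
    # flatten all model parameters into a single Float64 vector
    W = reduce(vcat, [vec(Float64.(p)) for p in ps])
    if mod(i, c) == 0
        n = i ÷ c
        W_swa = (n .* W_swa .+ W) ./ (n + 1)   # running SWA mean
        if length(A) >= M * all_len
            A = A[(all_len + 1):end]           # drop the oldest deviation column
        end
        W_dev = W - W_swa
        append!(A, W_dev)
    end
    println("Training loss: ", training_loss, " Epoch: ", i)
end
col_a = length(A) ÷ all_len
A = reshape(A, all_len, col_a)                 # one deviation per column
U, s, V = TSVD.tsvd(A, M)                      # rank-M truncated SVD
P = U * LinearAlgebra.Diagonal(s)              # projection matrix of the subspace
return W_swa, P, re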
```

The ***subspace inference*** is implemented as follows:

```julia
```