# Bayesian Neural Networks

by Emil Vassev
<br>
March 14th, 2023

In this notebook, you can follow an example of creating a Bayesian Neural Network with the <b>bayesian_torch</b> Python library.

## Neural Networks

A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. In this sense, neural networks refer to systems of neurons, either organic or artificial in nature.

Structure: multiple layers, loosely resembling the connections of neurons and synapses found in the human brain:
* input layer – accepts input signals
* hidden layer(s) – transform the signals through weighted connections and activation functions
* output layer – delivers the result
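
As a sketch of this layered structure, here is a small fully connected network in PyTorch. The layer sizes (4 inputs, 8 hidden units, 3 outputs) are arbitrary assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 4 input features, 8 hidden units, 3 output classes
model = nn.Sequential(
    nn.Linear(4, 8),  # input layer -> hidden layer (weighted connections)
    nn.ReLU(),        # activation applied to the hidden layer
    nn.Linear(8, 3),  # hidden layer -> output layer
)

x = torch.randn(2, 4)  # a batch of 2 examples with 4 features each
y = model(x)
print(y.shape)  # torch.Size([2, 3])
```

Each row of `y` holds one score per output class for the corresponding input example.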

Machine Learning with Neural Nets:
* set of algorithms designed to recognize patterns
* computer learns to perform tasks by analyzing training examples
* examples are labeled (each example carries additional information, a label, that the learning algorithm uses)

<div>
 <img src="attachment:image-2.png" width="250"/>
</div>

## Artificial Neurons

The fundamental processing element of a neural network:
* simulates the natural neuron
* inputs $X = \{x_1, x_2, \dots, x_n\}$
* weight factors $W = \{w_1, w_2, \dots, w_n\}$
* links connecting neurons have a weighting factor
* Summation and Transfer functions

Steps:
* every input is multiplied by its weighting factor
* the modified inputs are accepted by the Summation function:
  * in its simplest form, it sums up the input products

Activation function - enables the Summation function to operate in a time-sensitive way:
* the output of the Summation function is sent to the Transfer function:
  * it turns the summation number into a real output via some algorithm
* commonly supported: Sigmoid, Sine, Hyperbolic Tangent, Threshold, ReLU
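
The steps above can be sketched as a single artificial neuron in plain Python: a weighted sum followed by a transfer function. The `bias` argument is an addition not mentioned in the text (it defaults to 0), and the input and weight values are made up for illustration.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    return max(0.0, z)

def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Artificial neuron: Summation function followed by a Transfer function."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Summation function: multiply every input by its weight and add the products
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Transfer (activation) function: turn the sum into the neuron's output
    return activation(z)

x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
print(neuron(x, w, activation=relu))       # 0.0 (the weighted sum is -0.75)
print(neuron(x, w))                        # sigmoid(-0.75)
print(neuron(x, w, activation=math.tanh))  # tanh(-0.75)
```

Swapping the `activation` argument changes only the transfer step; the summation stays the same.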

<div>
 <img src="attachment:image-2.png" width="350"/>
</div>

## Training Neural Networks

Function fitting:
* the process of training a neural network on a set of inputs (training dataset) to produce associated target outputs
* requires an optimization algorithm that:
  * searches through the space of possible values for the network's weights
  * finds a set of weights that results in good performance on the training dataset

Optimization process:
* a search through a landscape for a candidate solution that is sufficiently satisfactory
* a point on the landscape is a specific set of weights for the model:
  * the elevation of that point is an evaluation (the loss) of that set of weights
  * valleys represent good models with small values of loss
* in this common conceptualization of optimization problems, the landscape is referred to as the error surface
* optimization algorithms:
  * iteratively walk through the landscape
  * update the weights, seeking out good, low-elevation areas
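
Gradient descent is one common algorithm for this walk (the text does not name a specific one). A minimal sketch, assuming a hypothetical two-weight error surface with a single valley at $w = (3, -1)$: at every step the weights move a small distance against the gradient, that is, downhill.

```python
def loss(w):
    # Toy error surface: a single valley at w = (3, -1)
    return (w[0] - 3.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2

def grad(w):
    # Analytic gradient of the toy loss above
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]

def gradient_descent(w, lr=0.1, steps=100):
    for _ in range(steps):
        g = grad(w)
        # Step downhill: move each weight against its gradient component
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

w_final = gradient_descent([0.0, 0.0])
print(w_final, loss(w_final))
```

The learning rate `lr` controls the step size; too large a value can overshoot the valley, too small a value needs many more steps.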
 
<div>
 <img src="attachment:image-3.png" width="200"/>
</div>

## Bayesian Neural Networks

Bayesian neural networks are able to quantify uncertainty in their predictive output:
* they learn the model weights as a distribution rather than searching for a single optimal value
* they tend to generalize better, with less overfitting

Probabilistic neural networks:
* provide outputs in the form of probability distributions
* a standard Bayesian neural network outputs a single point estimate per forward pass
* if the network is run multiple times with the same inputs, this point estimate varies, because each run uses a different sample of the weights
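
To illustrate why repeated runs differ, here is a minimal sketch in plain Python, assuming a hypothetical "network" with a single weight described by a Gaussian distribution (the values of `mu` and `sigma` are arbitrary). Each forward pass draws a new weight, so each run returns a different point estimate, while the collection of runs describes the predictive distribution.

```python
import random
import statistics

random.seed(0)

# Hypothetical one-weight "network": the weight is a distribution N(mu, sigma^2)
mu, sigma = 2.0, 0.5

def forward(x: float) -> float:
    w = random.gauss(mu, sigma)  # sample a new weight on every forward pass
    return w * x

x = 1.5
samples = [forward(x) for _ in range(10_000)]
print(samples[:3])                # each run gives a different point estimate
print(statistics.mean(samples))   # close to mu * x = 3.0
print(statistics.stdev(samples))  # close to sigma * |x| = 0.75
```

The spread of the samples (here their standard deviation) is what a BNN uses to express uncertainty.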

Bayesian Deep Learning - Bayesian inference + neural networks:
* Posterior Bayesian inference:
  * estimates the posterior probability of a hypothesis given new evidence
  * starts with a prior probability distribution (the belief before any evidence)
  * uses the evidence to update this distribution
 
$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}$$
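
A small numeric illustration of this formula, assuming a hypothetical coin example (not from the original text) with two hypotheses, "fair" and "biased", held with equal prior belief. Observing heads shifts the belief towards "biased", and the posterior then serves as the prior for the next observation.

```python
# Hypothetical hypotheses about a coin and their probability of heads
p_heads = {"fair": 0.5, "biased": 0.8}
prior = {"fair": 0.5, "biased": 0.5}  # equal belief before any evidence

def bayes_update(prior, observed_heads):
    # Likelihood: probability of the observation under each hypothesis
    likelihood = {h: (p_heads[h] if observed_heads else 1.0 - p_heads[h]) for h in prior}
    # Evidence: total probability of the observation across all hypotheses
    evidence = sum(likelihood[h] * prior[h] for h in prior)
    # Posterior = Likelihood * Prior / Evidence
    return {h: likelihood[h] * prior[h] / evidence for h in prior}

after_one = bayes_update(prior, observed_heads=True)
print(after_one)  # fair ≈ 0.385, biased ≈ 0.615
after_two = bayes_update(after_one, observed_heads=True)  # the posterior becomes the new prior
print(after_two)  # fair ≈ 0.281, biased ≈ 0.719
```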

Bayesian Neural Networks (BNN):
* neural networks that use posterior Bayesian inference to obtain a probability distribution over the network weights, given the training data

## Bayesian Learning of Network Weights

Posterior inference over the neural network’s weights: 
 * BNN runs posterior inference to find a posterior distribution over weights

$$p(w \mid D) = \frac{p(D \mid w)\, p(w)}{p(D)}$$

* $w = \{w_1, w_2, \dots, w_n\}$ – weights of the neural network
* $D$ – the observed training data (the inputs together with their target outputs)

Bayesian formalism of learning network weights:
* our belief about the weights changes from the prior $p(w)$ to the posterior $p(w \mid D)$ as a consequence of observing the data $D$; the evidence $p(D)$ normalizes the result so that the posterior integrates to one
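
The update from prior to posterior can be made concrete for a single weight. A minimal sketch, assuming a hypothetical one-weight linear model $y = w x$ with known Gaussian noise, a Gaussian prior and made-up data; the posterior is approximated on a grid of candidate weights. Grid approximation only works for a handful of weights; real networks need approximations such as variational inference.

```python
import math

# Hypothetical data D: (x, y) pairs, roughly following y = 2x plus noise
data = [(0.0, 0.1), (1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
noise_sigma = 0.5  # assumed known observation noise
prior_sigma = 1.0  # prior p(w) = Normal(0, prior_sigma^2)

def normal_pdf(v, mean, std):
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

# Candidate values of the single weight w, from -1.00 to 5.00 in steps of 0.01
grid = [i * 0.01 for i in range(-100, 501)]

prior = [normal_pdf(w, 0.0, prior_sigma) for w in grid]
# Likelihood p(D | w): product of the densities of every observed y given w
likelihood = [math.prod(normal_pdf(y, w * x, noise_sigma) for x, y in data) for w in grid]
joint = [lik * p for lik, p in zip(likelihood, prior)]  # p(D | w) * p(w)
evidence = sum(joint)                                   # p(D), up to the grid spacing factor
posterior = [j / evidence for j in joint]               # p(w | D), sums to 1 over the grid

w_map = grid[max(range(len(grid)), key=lambda i: posterior[i])]
w_mean = sum(w * p for w, p in zip(grid, posterior))
print(round(w_map, 2), round(w_mean, 2))  # 2.0 2.0
```

Because the prior and the likelihood are both Gaussian here, the exact posterior is also Gaussian, with mean 2.0 and standard deviation $1/\sqrt{57} \approx 0.13$, which the grid reproduces.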

<div>
 <img src="attachment:image-4.png" width="200"/>
</div>

## Implementing Bayesian Neural Networks

### Bayesian-Torch
In this exercise, we will use **Bayesian-Torch** to build a Bayesian Neural Network (BNN). **Bayesian-Torch** is a library of neural network layers and utilities extending the core of **PyTorch** to enable Bayesian inference in deep learning models to quantify principled uncertainty estimates in model predictions.

### TorchVision
We will use one of the pre-trained image classification models provided by the **TorchVision** package. The **TorchVision** package is part of the **PyTorch** project and consists of popular datasets, model architectures, and common image transformations for computer vision.

Pre-trained models are Neural Network models trained on large benchmark datasets like **ImageNet**. 

### Objective: Converting a ResNet18 DNN to a BNN with Bayesian-Torch

In this exercise we are going to:
1. Build the **ResNet18** deep neural network (DNN) provided by TorchVision, pre-trained on the ImageNet dataset. ImageNet has over 14 million images maintained by Stanford University and is extensively used in a large variety of image-related deep learning projects. The images belong to various classes (labels); the pre-trained TorchVision classifiers use the 1,000-class subset from the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC 2012). The aim of the pre-trained ResNet18 model is to take an image as input and predict its class.
2. Use **Bayesian-Torch** to convert the DNN to a BNN. 


### Pre-Trained Model Inference

The steps involved in using a pre-trained model for predicting the class (label) of an input are:

1. Reading the input image
2. Performing transformations on the image. For example – resize, center crop, normalization, etc.
3. Forward pass: use the pre-trained weights to compute the output vector. Each element of this vector is a score (logit) for one class: the higher the score, the more confident the model is that the input image belongs to that class. A softmax turns these scores into probabilities.
4. Based on the scores obtained (elements of the output vector mentioned in step 3), display the predictions.

## Python Implementation

### Step 1. Import Models from TorchVision
Import **models** from **torchvision** and see the different models and architectures available to us.

In [1]:
from torchvision import models
import torch
 
dir(models)

['AlexNet',
 'DenseNet',
 'EfficientNet',
 'GoogLeNet',
 'GoogLeNetOutputs',
 'Inception3',
 'InceptionOutputs',
 'MNASNet',
 'MobileNetV2',
 'MobileNetV3',
 'RegNet',
 'ResNet',
 'ShuffleNetV2',
 'SqueezeNet',
 'VGG',
 '_GoogLeNetOutputs',
 '_InceptionOutputs',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_utils',
 'alexnet',
 'densenet',
 'densenet121',
 'densenet161',
 'densenet169',
 'densenet201',
 'detection',
 'efficientnet',
 'efficientnet_b0',
 'efficientnet_b1',
 'efficientnet_b2',
 'efficientnet_b3',
 'efficientnet_b4',
 'efficientnet_b5',
 'efficientnet_b6',
 'efficientnet_b7',
 'feature_extraction',
 'googlenet',
 'inception',
 'inception_v3',
 'mnasnet',
 'mnasnet0_5',
 'mnasnet0_75',
 'mnasnet1_0',
 'mnasnet1_3',
 'mobilenet',
 'mobilenet_v2',
 'mobilenet_v3_large',
 'mobilenet_v3_small',
 'mobilenetv2',
 'mobilenetv3',
 'quantization',
 'regnet',
 'regnet_x_16gf',
 'regnet_x_1_6gf',
 're

### Step 2. Load the Pre-Trained Model ResNet18

In [2]:
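# load the model with its ImageNet weights
# (in torchvision >= 0.13, `pretrained=True` is deprecated in favor of
# `weights=models.ResNet18_Weights.IMAGENET1K_V1`)
resnet = models.resnet18(pretrained=True)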
 
# put the network in eval mode
resnet.eval()

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

### Step 3. Specify Image Transformations

Transform the input image so that it has the right shape and other characteristics, such as mean and standard deviation. These values should match those used when training the model.

We pre-process the input image with the **transforms** provided by the TorchVision module.

In [3]:
from torchvision import transforms
transform = transforms.Compose([            #[1] Define a variable transform
 transforms.Resize(256),                    #[2] Resize the image so that its shorter side is 256 pixels (aspect ratio kept).
 transforms.CenterCrop(224),                #[3] Crop the image to 224×224 pixels about the center.
 transforms.ToTensor(),                     #[4] Convert the image to a PyTorch tensor (C×H×W, values scaled to [0, 1]).
 transforms.Normalize(                      #[5] Normalize each channel: subtract the ImageNet mean, divide by the ImageNet std.
   mean=[0.485, 0.456, 0.406],
   std=[0.229, 0.224, 0.225]
 )
])

### Step 4. Load the Input Image and Pre-Process It

In [4]:
# import pillow
from PIL import Image
img = Image.open("images/dog.jpg").convert("RGB")  # ensure 3 color channels, as the model expects
img.show()

In [5]:
# pre-process the image and prepare a batch to be passed through the network
img_t = transform(img)
batch_t = torch.unsqueeze(img_t, 0)

### Step 5. Model Inference

Use the pre-trained model to predict the image class.

In [6]:
out_nn = resnet(batch_t)
print(out_nn.shape)

torch.Size([1, 1000])


In [7]:
classes = []
with open('data//imagenet_classes.txt') as f:
  classes = [line.strip() for line in f.readlines()]

Next sub-steps:
1. Find the index where the maximum score in the output vector **out_nn** occurs.
2. Use this index to find out the prediction.

In [8]:
_, index = torch.max(out_nn, 1)
 
percentage = torch.nn.functional.softmax(out_nn, dim=1)[0] * 100
 
print(classes[index[0]], percentage[index[0]].item())

208, Labrador_retriever 70.66329956054688


The five classes with the highest scores for the picture:

In [9]:
_, indices = torch.sort(out_nn, descending=True)
[(classes[idx], percentage[idx].item()) for idx in indices[0][:5]]

[('208, Labrador_retriever', 70.66329956054688),
 ('207, golden_retriever', 4.956589221954346),
 ('209, Chesapeake_Bay_retriever', 4.195649147033691),
 ('176, Saluki', 4.141535758972168),
 ('243, bull_mastiff', 2.6598098278045654)]
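
`torch.topk` is an alternative to sorting the whole output vector when only the best k classes are needed. A sketch on hypothetical logits for 6 classes:

```python
import torch

# Hypothetical logits for a batch of one image over 6 classes
logits = torch.tensor([[1.0, 3.0, 0.5, 2.0, -1.0, 0.0]])
probs = torch.softmax(logits, dim=1)[0] * 100  # percentages, as in the cells above

# topk returns the k largest values and their indices, sorted in descending order
top_p, top_idx = torch.topk(probs, k=3)
print(top_idx.tolist())  # [1, 3, 0]
print(top_p.tolist())
```

In the notebook, `[(classes[i], p.item()) for p, i in zip(*torch.topk(percentage, k=5))]` would list the same five classes as the `torch.sort` cell.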

### Step 6. Create the Bayesian Neural Network with Bayesian-Torch


We use the **dnn_to_bnn()** function to convert a Deep Neural Network model into a Bayesian Deep Neural Network (BNN).

The following code builds a Bayesian-ResNet18 from a fresh copy of the same pre-trained, deterministic torchvision ResNet18 used above. Since **dnn_to_bnn()** converts the model in place, the deterministic `resnet` stays unchanged for comparison.

In [10]:
import torch
import torchvision
from bayesian_torch.models.dnn_to_bnn import dnn_to_bnn, get_kl_loss

const_bnn_prior_parameters = {
        "prior_mu": 0.0,
        "prior_sigma": 1.0,
        "posterior_mu_init": 0.0,
        "posterior_rho_init": -3.0,
        "type": "Reparameterization",  # Flipout or Reparameterization
        "moped_enable": True,  # True to initialize mu/sigma from the pretrained dnn weights
        "moped_delta": 0.5,
}
    
resnet_bnn = torchvision.models.resnet18(pretrained=True)  # fresh copy of the pre-trained DNN
dnn_to_bnn(resnet_bnn, const_bnn_prior_parameters)  # replaces layers with Bayesian counterparts, in place

In [11]:
resnet_bnn.eval()

ResNet(
  (conv1): Conv2dReparameterization()
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2dReparameterization()
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2dReparameterization()
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2dReparameterization()
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2dReparameterization()
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): 
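
Instead of scrolling through the printed model, a small helper can count the leaf layers by type to confirm which ones the conversion replaced. A sketch: `layer_type_counts` is a hypothetical helper, and the small `demo` model stands in for the ResNet.

```python
from collections import Counter

import torch.nn as nn

def layer_type_counts(model: nn.Module) -> Counter:
    """Count the leaf modules of a model by class name."""
    return Counter(
        type(m).__name__
        for m in model.modules()
        if len(list(m.children())) == 0  # leaf modules only (no sub-modules)
    )

# Small stand-in model for illustration
demo = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3),
)
counts = layer_type_counts(demo)
print(counts)  # Counter({'Conv2d': 2, 'BatchNorm2d': 1, 'ReLU': 1})
```

Comparing `layer_type_counts(resnet)` with `layer_type_counts(resnet_bnn)` should show the convolutions as `Conv2dReparameterization` in the converted model, matching the printout above, while layers such as `BatchNorm2d` and `ReLU` keep their types.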

### Step 7. Model Inference with the BNN

Use the (not yet fine-tuned) BNN to predict the image class.

In [12]:
out_bnn = resnet_bnn(batch_t)
print(out_bnn.shape)

torch.Size([1, 1000])


In [13]:
_, index = torch.max(out_bnn, 1)
 
percentage = torch.nn.functional.softmax(out_bnn, dim=1)[0] * 100
 
print(classes[index[0]], percentage[index[0]].item())

868, tray 13.467456817626953


The BNN will return a different result every time we re-run the network, because each forward pass samples a new set of weights from their distributions.

In [14]:
out_bnn = resnet_bnn(batch_t)

_, index = torch.max(out_bnn, 1)

percentage = torch.nn.functional.softmax(out_bnn, dim=1)[0] * 100
 
print(classes[index[0]], percentage[index[0]].item())

905, window_shade 83.33670806884766
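
Because each pass gives a different answer, a common practice is to run several forward passes and average the softmax outputs: the average is the predictive distribution, and its spread or entropy is one measure of uncertainty. A sketch: `mc_predict` and `NoisyLinear` are hypothetical helpers, and `NoisyLinear` is only a toy stand-in for a Bayesian layer, not part of Bayesian-Torch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def mc_predict(model: nn.Module, x: torch.Tensor, num_samples: int = 20):
    """Average softmax outputs over several stochastic forward passes."""
    probs = torch.stack([F.softmax(model(x), dim=1) for _ in range(num_samples)])
    mean = probs.mean(dim=0)  # predictive distribution, shape [batch, classes]
    std = probs.std(dim=0)    # per-class spread across the samples
    # Entropy of the averaged distribution: higher means more uncertain
    entropy = -(mean * torch.log(mean.clamp_min(1e-12))).sum(dim=1)
    return mean, std, entropy

class NoisyLinear(nn.Module):
    """Toy stand-in for a Bayesian layer: weights ~ N(mu, sigma^2) on every pass."""

    def __init__(self, in_features: int, out_features: int, sigma: float = 0.5):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(out_features, in_features))
        self.sigma = sigma

    def forward(self, x):
        w = self.mu + self.sigma * torch.randn_like(self.mu)  # new weight sample
        return x @ w.t()

torch.manual_seed(0)
toy = NoisyLinear(4, 3)
mean, std, entropy = mc_predict(toy, torch.randn(2, 4), num_samples=50)
print(mean.argmax(dim=1), std, entropy)
```

With the notebook's model, `mean, std, entropy = mc_predict(resnet_bnn, batch_t, num_samples=20)` followed by `classes[mean.argmax(dim=1)[0]]` would give a prediction that is far more stable than any single pass.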


### Step 8. Optimize the BNN with PyTorch Optim

`torch.optim` is a package implementing various optimization algorithms. To use `torch.optim`, we construct an optimizer object that holds the current state and updates the parameters based on the computed gradients.

To construct an optimizer, we provide it with the BNN's parameters to optimize. Here the BNN is trained to reproduce the outputs (logits) of the deterministic ResNet18 for the input image, using a mean squared error loss.

Recall how a BNN works: for a classification problem, it performs multiple forward passes, each with new samples of weights and biases, and produces one output per forward pass. The spread of these outputs expresses the uncertainty, which tends to be high across all output classes when the input image is unlike anything the network has seen.

In [15]:
import torch.nn as nn

# Train the BNN to reproduce the deterministic ResNet18 logits (MSE between outputs)
loss_fn = nn.MSELoss()
target = out_nn.detach()  # constant target: no gradients flow into the deterministic model

optimizer = torch.optim.Adam(resnet_bnn.parameters(), lr=0.0001)

scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(40):
    optimizer.zero_grad()  # clear the gradients left over from the previous iteration

    out_bnn = resnet_bnn(batch_t)  # each pass samples new weights

    _, index = torch.max(out_bnn, 1)
    percentage = torch.nn.functional.softmax(out_bnn, dim=1)[0] * 100
    print(classes[index[0]], percentage[index[0]].item())

    loss = loss_fn(out_bnn, target)
    loss.backward()

    optimizer.step()
    scheduler.step()

987, corn 66.9870376586914
203, West_Highland_white_terrier 14.794273376464844
973, coral_reef 19.227766036987305
188, wire-haired_fox_terrier 54.31241989135742
775, sarong 4.79146671295166
192, cairn 8.430464744567871
208, Labrador_retriever 15.078536987304688
236, Doberman 10.309348106384277
207, golden_retriever 20.491371154785156
236, Doberman 9.802458763122559
162, beagle 2.8532395362854004
238, Greater_Swiss_Mountain_dog 30.35093879699707
215, Brittany_spaniel 5.035763740539551
168, redbone 12.431377410888672
242, boxer 23.964462280273438
208, Labrador_retriever 37.32036209106445
242, boxer 24.13730239868164
173, Ibizan_hound 21.52020263671875
180, American_Staffordshire_terrier 23.623432159423828
208, Labrador_retriever 4.6609625816345215
236, Doberman 5.629646301269531
207, golden_retriever 30.947134017944336
207, golden_retriever 8.263542175292969
207, golden_retriever 7.026632308959961
852, tennis_ball 17.989099502563477
242, boxer 19.40876007080078
163, bloodhound 31.8183002

In [16]:
out_bnn = resnet_bnn(batch_t)

_, index = torch.max(out_bnn, 1)

percentage = torch.nn.functional.softmax(out_bnn, dim=1)[0] * 100
 
print(classes[index[0]], percentage[index[0]].item())

208, Labrador_retriever 38.302528381347656


In [17]:
out_bnn = resnet_bnn(batch_t)

_, index = torch.max(out_bnn, 1)

percentage = torch.nn.functional.softmax(out_bnn, dim=1)[0] * 100
 
print(classes[index[0]], percentage[index[0]].item())

209, Chesapeake_Bay_retriever 5.968829154968262
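
The training cell above (under "Optimize the BNN with PyTorch Optim") fits the BNN to the deterministic network's logits with an MSE loss. A common objective for variational BNNs is instead the negative evidence lower bound (ELBO): a data term such as cross-entropy on labeled examples, plus the KL divergence between the weight distribution and the prior, scaled by the number of training examples. In Bayesian-Torch the KL term is provided by `get_kl_loss`, imported in the Step 6 cell but not used there. The sketch below shows this objective on a toy problem with a hand-written layer; `BayesLinear`, the toy data and all hyperparameters are hypothetical, and this is not Bayesian-Torch's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    """Hypothetical mean-field Gaussian linear layer (no bias, for brevity)."""

    def __init__(self, in_features, out_features, prior_sigma=1.0, rho_init=-3.0):
        super().__init__()
        # Weight distribution q(w) = N(mu, sigma^2), with sigma = softplus(rho) > 0
        self.mu = nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.rho = nn.Parameter(torch.full((out_features, in_features), rho_init))
        self.prior_sigma = prior_sigma  # prior p(w) = N(0, prior_sigma^2)

    def sigma(self):
        return F.softplus(self.rho)

    def forward(self, x):
        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, 1)
        w = self.mu + self.sigma() * torch.randn_like(self.mu)
        return x @ w.t()

    def kl(self):
        # Closed-form KL(q(w) || p(w)) between Gaussians, summed over all weights
        s, p = self.sigma(), self.prior_sigma
        return (torch.log(p / s) + (s ** 2 + self.mu ** 2) / (2 * p ** 2) - 0.5).sum()

torch.manual_seed(0)
toy_x = torch.randn(64, 5)
toy_y = (toy_x[:, 0] > 0).long()  # hypothetical labels: class 1 when the first feature is positive

toy_model = BayesLinear(5, 2)
toy_opt = torch.optim.Adam(toy_model.parameters(), lr=0.05)

for step in range(200):
    toy_opt.zero_grad()
    nll = F.cross_entropy(toy_model(toy_x), toy_y)  # data term, one weight sample per step
    toy_loss = nll + toy_model.kl() / len(toy_x)    # negative ELBO, scaled per example
    toy_loss.backward()
    toy_opt.step()

with torch.no_grad():
    accuracy = ((toy_x @ toy_model.mu.t()).argmax(dim=1) == toy_y).float().mean().item()
print(f"final loss {toy_loss.item():.3f}, accuracy with mean weights {accuracy:.2f}")
```

The KL term keeps the weight distributions close to the prior, so the layer retains some uncertainty instead of collapsing to point values; dividing it by the number of examples puts it on the same per-example scale as the cross-entropy.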
