# Push Bayesian Deep Learning Tutorial
## Introduction

In this notebook we introduce the concept of Bayesian deep learning and demonstrate its use in Push by training a deep ensemble.

## The Posterior Predictive Distribution
The goal of Bayesian deep learning methods is to estimate the posterior predictive distribution

$$p(y|x, D) = \int p(y|x, w) p(w|D) \, dw
$$

where $y$ is an output, $x$ is an input, $w$ are the parameters, and $D$ is the data. This integral is generally intractable and must be approximated. Nearly all approaches to estimating it when it cannot be computed in closed form involve a simple Monte Carlo approximation [1]:
$$p(y|x, D) \approx \frac{1}{J} \sum_{j=1}^{J} p(y|x, w_j), \quad w_j \sim p(w|D)$$


with each $w_j$ corresponding to a unique parameter setting. Deep ensembles train $J$ different initializations of the same neural network, producing $J$ parameter settings. Averaging $p(y|x, w_j)$ over the $J$ models, as shown above, yields a Bayesian model average that approximates the posterior predictive distribution. Deep ensembles can therefore be viewed as a Bayesian deep learning method [1], and they are the subject of this tutorial because they are among the simplest such methods to implement and understand.
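
To make the averaging step concrete, here is a minimal sketch in plain Python that does not use Push. It assumes a three-class classification setting and uses made-up logits for $J = 3$ hypothetical ensemble members. The helper names `softmax` and `bayesian_model_average` are illustrative only and are not part of any library.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability vector."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bayesian_model_average(member_probs):
    """Average J predictive distributions p(y|x, w_j) with equal weight 1/J."""
    J = len(member_probs)
    num_classes = len(member_probs[0])
    return [sum(p[k] for p in member_probs) / J for k in range(num_classes)]

# Hypothetical logits from J = 3 ensemble members for a single input x.
member_logits = [
    [2.0, 0.5, -1.0],
    [1.0, 1.2, -0.5],
    [2.5, 0.0, -2.0],
]

# Each member yields its own predictive distribution p(y|x, w_j).
member_probs = [softmax(z) for z in member_logits]

# The equal-weight average approximates p(y|x, D).
bma = bayesian_model_average(member_probs)
print([round(p, 3) for p in bma])
```

Note that the members disagree on the most likely class (the second member slightly prefers class 1), and the averaged distribution reflects that disagreement by being less peaked than the most confident member.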



In [1]:
from typing import *
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from test_basic import RandDataset, MiniNN, BiggerNN


import push.bayes.ensemble



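# =============================================================================
# Simple Dataset + Neural Network
# (commented-out definitions of the classes imported from test_basic above,
#  shown here for reference)
# =============================================================================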

# class RandDataset(Dataset):
#     def __init__(self, batch_size, N, D):
#         self.xs = torch.randn(batch_size*N, D)
#         self.ys = torch.randn(batch_size*N, 1)

#     def __len__(self):
#         return len(self.xs)

#     def __getitem__(self, idx):
#         return self.xs[idx], self.ys[idx]


# class MiniNN(nn.Module):
#     def __init__(self, D):
#         super(MiniNN, self).__init__()
#         self.fc1 = nn.Linear(D, D)
#         self.fc2 = nn.Linear(D, D)

#     def forward(self, x):
#         x = self.fc1(x)
#         x = torch.nn.ReLU()(x)
#         x = self.fc2(x)
#         return x
    

# class BiggerNN(nn.Module):
#     def __init__(self, n, D):
#         super(BiggerNN, self).__init__()
#         self.minis = []
#         self.n = n
#         for i in range(0, n):
#             self.minis += [MiniNN(D)]
#             self.add_module("mini_layer"+str(i), self.minis[-1])
#         self.fc = nn.Linear(D, 1)
            
#     def forward(self, x):
#         for i in range(0, self.n):
#             x = self.minis[i](x)
#         return self.fc(x)



In [7]:
# Dataset
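L = 5           # number of MiniNN blocks stacked inside BiggerNN
D = 10          # input feature dimension
N = 1000        # number of batches; the dataset holds batchsize * N samples
batchsize = 32  # samples per batch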
dataset = RandDataset(batchsize, N, D)
dataloader = DataLoader(dataset, batch_size=batchsize, shuffle=True)

# Dispatch
def mk_dataloader():
    return DataLoader(RandDataset(batchsize, N, D), batch_size=batchsize, shuffle=True)

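epochs = 10        # training epochs
num_ensembles = 4  # number of ensemble members (J in the formula above)
push.bayes.ensemble.train_deep_ensemble(
    dataloader, # mk_dataloader,
    torch.nn.MSELoss(),  # regression loss
    epochs,
    BiggerNN, L, D,      # model class followed by its constructor arguments
    num_devices=2,
    num_ensembles=num_ensembles
)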
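
For intuition about what a deep ensemble computes, the following is a minimal sequential sketch in plain PyTorch. It is not Push's implementation and makes no claim about how Push schedules work across devices. The toy data, the `make_model` and `train` helpers, and all hyperparameters are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data (illustrative only): y = sum of the inputs plus noise.
D = 10
xs = torch.randn(256, D)
ys = xs.sum(dim=1, keepdim=True) + 0.1 * torch.randn(256, 1)

def make_model():
    # Each call draws a fresh random initialization of the same architecture.
    return nn.Sequential(nn.Linear(D, 16), nn.ReLU(), nn.Linear(16, 1))

def train(model, epochs=20):
    # Full-batch training with Adam on the mean squared error.
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(xs), ys)
        loss.backward()
        opt.step()
    return model

# Train J independently initialized members, one after another.
J = 4
ensemble = [train(make_model()) for _ in range(J)]

# Predict on new inputs with every member, then average over members.
x_new = torch.randn(5, D)
with torch.no_grad():
    preds = torch.stack([m(x_new) for m in ensemble])  # shape (J, 5, 1)
mean_pred = preds.mean(dim=0)  # ensemble prediction
spread = preds.std(dim=0)      # disagreement between members
```

The standard deviation across members gives a rough indication of how much the ensemble disagrees on each input, which is one common way to read uncertainty from a deep ensemble.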



References:

[1] A.G. Wilson, P. Izmailov. Bayesian Deep Learning and a Probabilistic Perspective of Generalization. Advances in Neural Information Processing Systems, 2020.