# Bayesian Transformation 

This notebook is a practical introduction to the Bayesian transformation in `probly`. Bayesian Neural Networks are a more advanced topic than Dropout or DropConnect,
so this tutorial aims to provide an intuitive, hands-on understanding.

We will start by explaining the core idea behind Bayesian Neural Networks (BNNs) and then see how the `probly` transformation enables you to create them. After that, we will look at a PyTorch example to inspect the transformed model and use it to estimate uncertainty.

---
## Part A: Introduction to BNNs and the Bayesian Transformation
---


## 1. Concept: What is a Bayesian Neural Network?

To understand the Bayesian transformation, we first need to understand the difference between a standard neural network and a Bayesian one.

### 1.1 Standard Neural Networks

In a standard neural network, each weight is a single, deterministic number. After training, these weights are fixed. 
When you pass an input through the model, it follows one exact path, producing one exact output.
The model has no inherent way to express how "sure" it is about the values of its weights.

### 1.2 Bayesian Neural Networks (BNNs)

In a Bayesian Neural Network, we replace the deterministic weights with probability distributions.
Instead of a weight being a single number, it might be represented by a Gaussian (normal) distribution 
with a mean and a standard deviation.

- The mean represents the most likely value for that weight.

- The standard deviation represents the model's uncertainty about that weight. A small standard deviation means the model
 is very confident in the weight's value, while a large one means it is very unsure.

During a forward pass, we don't use the mean value directly. Instead, we sample a value for each weight from its distribution.
Because we get a slightly different set of weights every time, each forward pass on the same input will produce a slightly different
 output. This natural variation is a direct reflection of the model's parameter uncertainty.
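
The contrast between the two kinds of network can be sketched in a few lines of plain PyTorch. This is only an illustration: the names `weight_mean`, `weight_std`, and `sample_forward` are made up for this example and are not part of `probly`, and the standard deviation of 0.1 is an arbitrary choice.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Standard layer: the weights are fixed, so repeated passes give identical outputs.
linear = nn.Linear(3, 1)
x = torch.ones(1, 3)
assert torch.equal(linear(x), linear(x))

# Toy Bayesian view of the same layer (illustrative names, not probly's API):
# every weight gets a mean and a standard deviation.
weight_mean = linear.weight.detach().clone()
weight_std = torch.full_like(weight_mean, 0.1)  # arbitrary value for the demo


def sample_forward(inputs: torch.Tensor) -> torch.Tensor:
    """Draw one set of weights from N(mean, std^2) and apply the layer."""
    eps = torch.randn_like(weight_mean)
    weight = weight_mean + weight_std * eps
    return inputs @ weight.T + linear.bias.detach()


out_a = sample_forward(x)
out_b = sample_forward(x)
print(out_a.item(), out_b.item())  # two different values for the same input
```

The spread between `out_a` and `out_b` grows with `weight_std`, which is the sense in which the output variation reflects parameter uncertainty.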

### 1.3 The Bayesian Transformation `(probly)`

The Bayesian transformation in `probly` automates the process of converting a standard network into a BNN.

The transformation does the following:

- It walks through your PyTorch model and finds all compatible layers (e.g., `nn.Linear` and `nn.Conv2d`).
- It programmatically replaces each standard layer with a corresponding custom Bayesian layer (e.g., `BayesLinear`, `BayesConv2d`).
- The new layers contain weight distributions instead of single values and are inherently stochastic, even during inference.

This allows us to get a distribution of predictions by running multiple forward passes, which we can then use to quantify the model's uncertainty.
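
To make the traversal-and-replace idea concrete, here is a minimal sketch of how such a transformation could be written by hand. It is an assumption-laden toy, not `probly`'s implementation: `ToyBayesLinear` and `toy_bayesian` are hypothetical names, only `nn.Linear` is handled, and the initial standard deviation of 0.05 is arbitrary.

```python
import torch
from torch import nn


class ToyBayesLinear(nn.Module):
    """Hypothetical stand-in for a Bayesian linear layer (not probly's BayesLinear)."""

    def __init__(self, base: nn.Linear, init_std: float = 0.05) -> None:
        super().__init__()
        self.weight_mu = nn.Parameter(base.weight.detach().clone())
        # Store log(std) so the standard deviation stays positive during optimization.
        self.weight_log_std = nn.Parameter(torch.full_like(base.weight, init_std).log())
        self.bias = None if base.bias is None else nn.Parameter(base.bias.detach().clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sample fresh weights on every call, including in eval mode.
        noise = torch.randn_like(self.weight_mu)
        weight = self.weight_mu + self.weight_log_std.exp() * noise
        return nn.functional.linear(x, weight, self.bias)


def toy_bayesian(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.Linear child with a ToyBayesLinear (in place)."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, ToyBayesLinear(child))
        else:
            toy_bayesian(child)
    return module


net = toy_bayesian(nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1)))
net.eval()  # unlike dropout, the sampling here does not switch off in eval mode
x = torch.randn(2, 4)
print(net)
print("Identical outputs on repeated passes:", torch.equal(net(x), net(x)))
```

Note that this toy version modifies the model in place and keeps the bias deterministic; the real transformation may behave differently on both points.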


### 1.4 What that entails

| Aspect                     | Bayesian Transformation `(probly)`                                            |
|----------------------------|-------------------------------------------------------------------------------|
| **Main Idea**              | "Weights are distributions"                                                   |
| Stochastic Element         | Weights are sampled from probability distributions.                           |
| Architectural Change       | Replaces `nn.Linear` and `nn.Conv2d` with `BayesLinear`/`BayesConv2d` layers. |
| Uncertainty Interpretation | A principled, direct measure of the model's parameter uncertainty.            |
| Supported Layers           | `Linear` and `Conv2d`                                                         |
| Key Parameters             | `prior_mean`, `prior_std`, `posterior_std`                                    |

## 2. Quickstart (PyTorch)

Below, we build a small MLP, apply `bayesian(model)`, and inspect the modified architecture to see the layer replacement.


In [2]:
import torch
from torch import nn

from probly.transformation import bayesian


def build_mlp(in_dim: int = 10, hidden: int = 32, out_dim: int = 1) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, hidden),
        nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


model = build_mlp()
print("Original model:\n", model)

# Apply the Bayesian transformation with default parameters
model_bnn = bayesian(model)
print("\nWith Bayesian transformation:\n", model_bnn)

Original model:
 Sequential(
  (0): Linear(in_features=10, out_features=32, bias=True)
  (1): ReLU()
  (2): Linear(in_features=32, out_features=32, bias=True)
  (3): ReLU()
  (4): Linear(in_features=32, out_features=1, bias=True)
)

With Bayesian transformation:
 Sequential(
  (0): BayesLinear()
  (1): ReLU()
  (2): BayesLinear()
  (3): ReLU()
  (4): BayesLinear()
)


### Notes on the structure

- Notice that each `Linear` layer has been replaced by a `BayesLinear` layer.
- The new layers manage the distributions for the weights and biases internally.

## 3. Uncertainty via Stochastic Forward Passes

To obtain predictive uncertainty, we run multiple forward passes. In each pass, a new set of weights is sampled from the learned distributions.
We then compute the mean and variance of the resulting predictions.

In [3]:
# Toy regression data
torch.manual_seed(0)
n = 128
X = torch.randn(n, 10)
true_w = torch.randn(10, 1)
y = X @ true_w + 0.1 * torch.randn(n, 1)

# Build and transform the model
model = build_mlp(in_dim=10, hidden=64, out_dim=1)
model_bnn = bayesian(model)

# Simple training loop (for illustration only: plain MSE, without the KL term of a proper ELBO loss)
opt = torch.optim.Adam(model_bnn.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for _step in range(200):
    opt.zero_grad()
    pred = model_bnn(X)
    loss = loss_fn(pred, y)
    loss.backward()
    opt.step()


# Prediction function (stochastic passes)
@torch.no_grad()
def stochastic_predict(
    bayesian_model: nn.Module,
    inputs: torch.Tensor,
    n_samples: int = 50,
) -> tuple[torch.Tensor, torch.Tensor]:
    preds = []
    for _ in range(n_samples):
        preds.append(bayesian_model(inputs).detach())
    stacked = torch.stack(preds, dim=0)  # [n_samples, N, out_dim]
    mean = stacked.mean(dim=0)
    var = stacked.var(dim=0, unbiased=False)
    return mean, var


mean_pred, var_pred = stochastic_predict(model_bnn, X[:5], n_samples=100)
print("Predictive mean (first 5):\n", mean_pred.squeeze())
print("\nPredictive variance (first 5):\n", var_pred.squeeze())

Predictive mean (first 5):
 tensor([-1.1153,  3.2737,  1.8071, -1.3878, -0.6908])

Predictive variance (first 5):
 tensor([0.0594, 0.0913, 0.0668, 0.0288, 0.0506])
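
The variance is often easier to read as a standard deviation or an interval. The sketch below reuses the numbers printed above and builds a rough 95% band, under the assumption that the sampled predictions are approximately Gaussian. Because the spread comes only from weight sampling, it does not include the observation noise in `y`.

```python
import torch

# Values copied from the printed output above.
mean_pred = torch.tensor([-1.1153, 3.2737, 1.8071, -1.3878, -0.6908])
var_pred = torch.tensor([0.0594, 0.0913, 0.0668, 0.0288, 0.0506])

std_pred = var_pred.sqrt()

# Rough 95% band, assuming the sampled predictions are approximately Gaussian.
lower = mean_pred - 1.96 * std_pred
upper = mean_pred + 1.96 * std_pred

rows = zip(mean_pred.tolist(), std_pred.tolist(), lower.tolist(), upper.tolist())
for m, s, lo, hi in rows:
    print(f"mean={m:+.3f}  std={s:.3f}  band=[{lo:+.3f}, {hi:+.3f}]")

# The input with the largest spread is the one the model is least sure about.
print("Most uncertain index:", int(std_pred.argmax()))
```

With real data you would usually compute the band from empirical quantiles of the stacked samples instead, which avoids the Gaussian assumption.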


## 4. Part A Summary

In Part A, we introduced the core concept of Bayesian Neural Networks, where weights are represented as probability distributions rather than single numbers.
This inherently captures the model's uncertainty about its own parameters. We saw how the `probly.transformation.bayesian` transformation makes creating BNNs simple:
it traverses a standard PyTorch model and replaces `nn.Linear` and `nn.Conv2d` layers with their Bayesian counterparts. The transformed model naturally
produces a distribution of outputs for any given input, allowing us to directly quantify predictive uncertainty.

---

## Part B: Applied BNN Transformation
--- 

In Part A, we learned what the Bayesian transformation in `probly` does.
In Part B, we apply it to a model containing both linear and convolutional layers, run several stochastic predictions, and visualize the resulting uncertainty.

An in-depth tutorial shows:

- How to define a standard neural network (LeNet) and make it Bayesian using the `bayesian` transformation.

- How to set up the specialized training loop required for a BNN using the ELBO loss function.

- How to train the BNN on a real-world dataset (FashionMNIST).

- How to evaluate the final classification accuracy of the trained Bayesian model.

It can be found here: **[Training a BNN for Classification](train_bnn_classification.ipynb)**.
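
For orientation, the ELBO loss named in the list above combines a data-fit term with a KL divergence between the weight distributions and a prior. The sketch below shows the standard closed-form KL between two univariate Gaussians and one common way to combine it with a negative log-likelihood. The function names and the `kl_weight` factor are assumptions made for this example; the loss used in the linked tutorial may be structured differently.

```python
import torch
from torch.distributions import Normal, kl_divergence


def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """Closed-form KL(N(mu_q, std_q^2) || N(mu_p, std_p^2)), computed elementwise."""
    return torch.log(std_p / std_q) + (std_q**2 + (mu_q - mu_p) ** 2) / (2 * std_p**2) - 0.5


def negative_elbo(nll, kl, kl_weight=1.0):
    """Loss to minimize: data-fit term plus a (hypothetically weighted) KL term."""
    return nll + kl_weight * kl


# Two toy weights: the first matches the prior exactly, the second does not.
mu_q = torch.tensor([0.0, 0.5])
std_q = torch.tensor([1.0, 0.2])
mu_p = torch.zeros(2)
std_p = torch.ones(2)

kl_per_weight = gaussian_kl(mu_q, std_q, mu_p, std_p)
kl_reference = kl_divergence(Normal(mu_q, std_q), Normal(mu_p, std_p))
print("KL per weight:", kl_per_weight)
print("Matches torch.distributions:", torch.allclose(kl_per_weight, kl_reference))

# Placeholder data-fit value, only to show how the two terms are combined.
loss = negative_elbo(nll=torch.tensor(2.0), kl=kl_per_weight.sum(), kl_weight=0.1)
print("Toy loss:", loss.item())
```

The KL term is zero for a weight whose distribution equals the prior and grows as the mean drifts or the standard deviation shrinks, which is what keeps the learned distributions from collapsing to point estimates.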


---

## Final Summary: Bayesian Transformation Tutorial

---

This tutorial introduced the core concepts of Bayesian Neural Networks (BNNs), where weights are treated as probability distributions to capture model uncertainty. We demonstrated how `probly`'s **`bayesian` transformation** automates this by replacing standard `nn.Linear` and `nn.Conv2d` layers with their stochastic Bayesian counterparts. We also walked through a simplified example of how to run multiple forward passes to get a predictive mean and variance.

While this notebook covered the fundamentals, a proper BNN requires a specialized training procedure. For a complete, end-to-end guide that shows you how to train a Bayesian LeNet on the FashionMNIST dataset using the correct **ELBO loss**, please see the next tutorial: **[Training a BNN for Classification](train_bnn_classification.ipynb)**.

