# `probly` Tutorial — Evidential Regression Transformation

This notebook is a practical introduction to the **Evidential Regression transformation** in `probly`. This technique lets a model predict not just a single value but a full distribution over its prediction, so that it can quantify both the noise inherent in the data (**aleatoric uncertainty**) and its own lack of knowledge (**epistemic uncertainty**).

We will start by explaining the core idea behind evidential regression and then see how `probly`'s transformation automates the process of building such a model by replacing the final layer. We will then train this model on a simple 1D dataset and visualize its predictive uncertainty.

---

# Part A — Introduction to Evidential Regression

---

## 1. Concept: What is Evidential Regression?
### 1.1 The Problem: Standard Regression Predicts a Point

A standard regression network is trained to predict a single value. For a given input, it might predict `y = 3.14`. This gives us no information about the model's confidence. Is the prediction `3.14 ± 0.01` or `3.14 ± 10.0`? We have no way of knowing.


### 1.2 The Evidential Approach: Predicting a Distribution
Evidential Regression reframes the problem. Instead of predicting a single point, the model predicts the four parameters of a **Normal-Inverse-Gamma (NIG)** distribution: `gamma` (γ), `nu` (ν), `alpha` (α), and `beta` (β).

Together, these parameters define a distribution over the mean and variance of a Gaussian that explains the target. From them, we can directly calculate:
-   **The Prediction:** The mean of the distribution (given by `gamma`).
-   **Aleatoric Uncertainty (Data Noise):** The inherent noise or ambiguity in the data itself. A high value means the data points are widely scattered.
-   **Epistemic Uncertainty (Model Ignorance):** The model's own uncertainty about its predictions. A high value means the model is "out of its depth," perhaps because it's seeing data far from what it was trained on.

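For reference, all three quantities have closed forms. In the standard formulation, the NIG acts as a prior over the mean μ and variance σ² of a Gaussian likelihood for the target y, with ν > 0, α > 1, and β > 0 (α > 1 keeps both variances finite):

$$
\begin{aligned}
y \mid \mu, \sigma^2 &\sim \mathcal{N}(\mu, \sigma^2), \qquad
\mu \mid \sigma^2 \sim \mathcal{N}(\gamma, \sigma^2/\nu), \qquad
\sigma^2 \sim \Gamma^{-1}(\alpha, \beta), \\
\text{prediction: } \mathbb{E}[\mu] &= \gamma, \\
\text{aleatoric: } \mathbb{E}[\sigma^2] &= \frac{\beta}{\alpha - 1}, \\
\text{epistemic: } \operatorname{Var}[\mu] &= \mathbb{E}\!\left[\frac{\sigma^2}{\nu}\right] = \frac{\beta}{\nu(\alpha - 1)}.
\end{aligned}
$$

These are the same expressions used for `aleatoric_var` and `epistemic_var` in the inference cell of Part B.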
### 1.3 The Evidential Regression Transformation (probly)
The `probly` transformation makes it easy to create an evidential regression model.
-   You design your network as usual.
-   The `evidential_regression` transformation traverses your model *backwards* and **replaces the final `nn.Linear` layer** with a special `NormalInverseGammaLinear` layer.
-   This new final layer is responsible for outputting the four `(γ, ν, α, β)` parameters instead of a single value.

The uncertainty can then be calculated from these parameters in a **single forward pass**.

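To make the idea of an output layer that emits four parameters concrete, here is a minimal sketch of what such a layer can look like. `NIGHeadSketch` is a hypothetical illustration, not `probly`'s `NormalInverseGammaLinear`; it assumes the common choice of a softplus to keep ν and β positive and to keep α above 1.

```python
import torch
import torch.nn.functional as F
from torch import nn


class NIGHeadSketch(nn.Module):
    """Hypothetical NIG output layer: one shared projection split into (γ, ν, α, β)."""

    def __init__(self, in_features: int, out_features: int = 1) -> None:
        super().__init__()
        # A single Linear producing 4 * out_features values, later split into four parameters
        self.proj = nn.Linear(in_features, 4 * out_features)

    def forward(self, x: torch.Tensor) -> dict[str, torch.Tensor]:
        gamma, nu, alpha, beta = self.proj(x).chunk(4, dim=-1)
        return {
            "gamma": gamma,                    # predicted mean, unconstrained
            "nu": F.softplus(nu),              # nu > 0
            "alpha": F.softplus(alpha) + 1.0,  # alpha > 1 so E[sigma^2] is finite
            "beta": F.softplus(beta),          # beta > 0
        }
```

For example, `NIGHeadSketch(32)(torch.randn(5, 32))["gamma"]` has shape `(5, 1)`. Using one projection split with `chunk` keeps the layer a drop-in replacement for the `nn.Linear` it swaps out; four separate `nn.Linear` layers would be equivalent.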
### 1.4 Short side‑by‑side comparison

| Aspect | Evidential Regression | Standard (Point) Regression |
| :--- | --- | --- |
| **Model Output** | Four parameters: `(γ, ν, α, β)` | A single predicted value. |
| **Final Layer** | `NormalInverseGammaLinear` | `nn.Linear` |
| **Uncertainty Source** | Calculated directly from the four output parameters. | None. |
| **Inference Cost** | **One single forward pass.** | One single forward pass. |


## 2. Quickstart (PyTorch)
Below: build a small MLP and apply `evidential_regression(model)` to see how the *last* linear layer is replaced.

In [1]:
import torch
from torch import nn

from probly.transformation import evidential_regression


def build_mlp(in_dim: int = 10, hidden: int = 32, out_dim: int = 1) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, hidden),
        nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


model = build_mlp()
print("Original model:\n", model)

# Apply the Evidential Regression transformation
model_evidential = evidential_regression(model)
print("\nWith Evidential transformation:\n", model_evidential)

Original model:
 Sequential(
  (0): Linear(in_features=10, out_features=32, bias=True)
  (1): ReLU()
  (2): Linear(in_features=32, out_features=32, bias=True)
  (3): ReLU()
  (4): Linear(in_features=32, out_features=1, bias=True)
)

With Evidential transformation:
 Sequential(
  (0): Linear(in_features=10, out_features=32, bias=True)
  (1): ReLU()
  (2): Linear(in_features=32, out_features=32, bias=True)
  (3): ReLU()
  (4): NormalInverseGammaLinear()
)


### Notes on the structure
-   Notice that the transformation has replaced **only the final `nn.Linear` layer** with a `NormalInverseGammaLinear` layer.
-   The output of this new model is a dictionary with the keys `"gamma"`, `"nu"`, `"alpha"`, and `"beta"`, one entry per parameter.

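The "replace only the last `nn.Linear`" behavior can be sketched in a few lines of plain PyTorch. `replace_last_linear` below is a hypothetical helper for illustration, not how `probly` implements `evidential_regression`. It copies the model, walks `named_modules()` in reverse to find the last `nn.Linear`, and swaps in whatever layer `make_layer` builds from it.

```python
import copy
from typing import Callable

from torch import nn


def replace_last_linear(model: nn.Module, make_layer: Callable[[nn.Linear], nn.Module]) -> nn.Module:
    """Hypothetical helper: return a copy of `model` whose last nn.Linear is replaced."""
    model = copy.deepcopy(model)  # leave the caller's model untouched
    # named_modules() yields modules in registration order; reversing it walks the
    # network backwards from the output end, so the first Linear found is the last one.
    for name, module in reversed(list(model.named_modules())):
        if isinstance(module, nn.Linear):
            new_layer = make_layer(module)
            if name == "":  # the model itself is a bare nn.Linear
                return new_layer
            parent_name, _, child_name = name.rpartition(".")
            parent = model.get_submodule(parent_name) if parent_name else model
            setattr(parent, child_name, new_layer)
            return model
    raise ValueError("model contains no nn.Linear layer")
```

For instance, `replace_last_linear(build_mlp(), lambda old: nn.Linear(old.in_features, 4))` swaps only layer `(4)`. Note that registration order matches execution order for simple stacks like `nn.Sequential`, but not necessarily for models with a custom `forward`.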
## 3. Part A Summary
In Part A, we introduced Evidential Regression as a method for a model to predict its own uncertainty. Instead of a single point, the model learns to output the four parameters of a Normal-Inverse-Gamma distribution (`γ, ν, α, β`). We learned that the `probly` transformation automates this by replacing the final linear layer of a network. The key advantage is that both data uncertainty (aleatoric) and model uncertainty (epistemic) can be calculated from these parameters in a single, deterministic forward pass.

---

# Part B — Applied Evidential Regression

---

In this part, we will train an evidential regression model on a simple 1D dataset and visualize its predicted mean and uncertainty bounds.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import torch
from torch import nn
import torch.nn.functional as F
import math

from probly.transformation import evidential_regression

torch.manual_seed(42)

## 1. Setup and Custom Loss Function
To train an evidential regression model, we can't use a standard loss like MSE: it would only fit the mean γ and give ν, α, and β no learning signal. Instead, we minimize the Negative Log-Likelihood (NLL) of the observed `y` under the model. Integrating the mean and variance out of the Normal-Inverse-Gamma distribution yields a Student-t predictive distribution, and its NLL has a closed form.
The function below looks complex, but its job is simple: it measures how well the predicted distribution explains the true `y` value.


In [2]:
def nig_nll_loss(y_true, gamma, nu, alpha, beta, reduce=True):
    """Negative log-likelihood of y_true under the NIG evidential model.

    Integrating out the Normal-Inverse-Gamma distribution gives a Student-t
    predictive with df = 2 * alpha, loc = gamma and scale^2 = beta * (1 + nu) / (nu * alpha).
    """
    # Omega collects the beta and nu terms of the Student-t scale
    omega = 2 * beta * (1 + nu)

    nll = (
        0.5 * torch.log(math.pi / nu)
        - alpha * torch.log(omega)
        + (alpha + 0.5) * torch.log(nu * (y_true - gamma) ** 2 + omega)
        + torch.lgamma(alpha)
        - torch.lgamma(alpha + 0.5)
    )

    if reduce:
        return torch.mean(nll)
    return nll

def build_tiny_mlp(in_dim: int = 1, hidden: int = 64) -> nn.Sequential:
    """A simple MLP for our 1D regression task."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, hidden),
        nn.ReLU(),
        nn.Linear(hidden, 1),
    )

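Because the evidential NLL is the NLL of a Student-t with 2α degrees of freedom, location γ, and squared scale β(1 + ν)/(να), it can be cross-checked against `torch.distributions.StudentT`. The sketch below is an independent alternative formulation, useful for testing a hand-written closed form; `student_t_nll` is a name introduced here for illustration.

```python
import torch
from torch.distributions import StudentT


def student_t_nll(y_true, gamma, nu, alpha, beta, reduce=True):
    """NLL of y under the Student-t predictive implied by NIG(gamma, nu, alpha, beta)."""
    # Marginalizing mu and sigma^2 out of the NIG gives
    # St(y; loc=gamma, scale^2=beta * (1 + nu) / (nu * alpha), df=2 * alpha)
    scale = torch.sqrt(beta * (1 + nu) / (nu * alpha))
    nll = -StudentT(df=2 * alpha, loc=gamma, scale=scale).log_prob(y_true)
    return nll.mean() if reduce else nll
```

Comparing `student_t_nll(...)` with `nig_nll_loss(...)` on random positive parameters (in `float64`) with `torch.allclose` is a quick way to catch sign or term errors in the closed form.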
## 2. Create a Synthetic Dataset
We'll create a simple 1D dataset: y = x³ plus Gaussian noise with standard deviation 3. We intentionally leave a gap in the training data between x = -1 and x = 1, and evaluate on [-6, 6], to see how the model's uncertainty behaves in regions where it has no training data.


In [6]:
X_train1 = torch.linspace(-4, -1, 100).unsqueeze(1)
X_train2 = torch.linspace(1, 4, 100).unsqueeze(1)
X_train = torch.cat([X_train1, X_train2], dim=0)
y_train = X_train.pow(3) + 3 * torch.randn(X_train.shape)
X_test = torch.linspace(-6, 6, 200).unsqueeze(1)
y_test = X_test.pow(3) + 3 * torch.randn(X_test.shape)


## 3. Apply Transformation and Train the Model

In [None]:
# 1. Create the base model
base_model = build_tiny_mlp()

# 2. Apply the evidential regression transformation
evidential_model = evidential_regression(base_model)

# 3. Train the model
optimizer = torch.optim.Adam(evidential_model.parameters(), lr=1e-3)
n_epochs = 1000

for epoch in range(n_epochs):
    optimizer.zero_grad()

    # The model now outputs a dictionary of the four parameters
    params = evidential_model(X_train)
    gamma, nu, alpha, beta = params['gamma'], params['nu'], params['alpha'], params['beta']

    loss = nig_nll_loss(y_train, gamma, nu, alpha, beta)

    loss.backward()
    optimizer.step()

    if (epoch + 1) % 200 == 0:
        print(f"Epoch {epoch + 1}/{n_epochs}, Loss: {loss.item():.4f}")

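The loop above optimizes the NLL alone. Evidential regression implementations commonly add an evidence regularizer, |y - γ| · (2ν + α), which penalizes predictions that carry a lot of evidence (large ν, α) but miss the target. Below is a hedged sketch. The name `evidence_regularizer` and a weight such as `lam = 0.01` are assumptions to tune, not part of this tutorial or of `probly`.

```python
import torch


def evidence_regularizer(y_true, gamma, nu, alpha, reduce=True):
    """Penalty |y - gamma| * (2*nu + alpha): a large error combined with high evidence costs more."""
    reg = torch.abs(y_true - gamma) * (2 * nu + alpha)
    return reg.mean() if reduce else reg
```

To try it, change the loss line in the training loop to `loss = nig_nll_loss(y_train, gamma, nu, alpha, beta) + lam * evidence_regularizer(y_train, gamma, nu, alpha)`.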

## 4. Inference and Uncertainty Calculation

In [None]:
with torch.no_grad():
    params = evidential_model(X_test)
    gamma, nu, alpha, beta = params['gamma'], params['nu'], params['alpha'], params['beta']

# The mean prediction is gamma
mean_pred = gamma

# Aleatoric (data) uncertainty: expected noise variance E[sigma^2]
aleatoric_var = beta / (alpha - 1)

# Epistemic (model) uncertainty: variance of the predicted mean Var[mu]
epistemic_var = beta / (nu * (alpha - 1))

total_var = aleatoric_var + epistemic_var
total_std = torch.sqrt(total_var)

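A small worked example makes the split concrete. Aleatoric variance depends only on α and β, while epistemic variance equals the aleatoric variance divided by ν, so it shrinks as ν (the evidence about the mean) grows. The helper name `nig_variances` is introduced here for illustration.

```python
def nig_variances(nu: float, alpha: float, beta: float) -> tuple[float, float]:
    """Return (aleatoric, epistemic) variance for scalar NIG parameters (requires alpha > 1)."""
    aleatoric = beta / (alpha - 1)         # E[sigma^2]
    epistemic = beta / (nu * (alpha - 1))  # Var[mu] = aleatoric / nu
    return aleatoric, epistemic


# Same alpha and beta, increasing evidence nu about the mean
for nu in (0.5, 2.0, 10.0):
    ale, epi = nig_variances(nu, alpha=3.0, beta=4.0)
    print(f"nu={nu:>4}: aleatoric={ale:.2f}, epistemic={epi:.2f}")
# nu= 0.5: aleatoric=2.00, epistemic=4.00
# nu= 2.0: aleatoric=2.00, epistemic=1.00
# nu=10.0: aleatoric=2.00, epistemic=0.20
```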

## 5. Visualization
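Here is a sketch of the plot. `plot_evidential` is a helper introduced here (not part of `probly`) that draws the training data, the predicted mean γ, and ±2σ bands for the aleatoric variance and for the total (aleatoric + epistemic) variance. It accepts the tensors computed in the previous cell.

```python
import matplotlib.pyplot as plt
import numpy as np
import torch


def _to_numpy(t) -> np.ndarray:
    """Flatten an (N, 1) tensor or array into a 1D NumPy array for matplotlib."""
    return torch.as_tensor(t).detach().cpu().numpy().ravel()


def plot_evidential(x_train, y_train, x_test, mean, aleatoric_var, epistemic_var, n_std=2.0):
    """Plot the predicted mean with aleatoric and total (aleatoric + epistemic) bands."""
    xt, mu = _to_numpy(x_test), _to_numpy(mean)
    ale_std = np.sqrt(_to_numpy(aleatoric_var))
    tot_std = np.sqrt(_to_numpy(aleatoric_var) + _to_numpy(epistemic_var))

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.scatter(_to_numpy(x_train), _to_numpy(y_train), s=6, c="k", alpha=0.5, label="training data")
    ax.plot(xt, mu, c="C0", label="predicted mean (γ)")
    # Outer band: total uncertainty; inner band: aleatoric only
    ax.fill_between(xt, mu - n_std * tot_std, mu + n_std * tot_std,
                    color="C0", alpha=0.15, label=f"±{n_std:g}σ total")
    ax.fill_between(xt, mu - n_std * ale_std, mu + n_std * ale_std,
                    color="C0", alpha=0.3, label=f"±{n_std:g}σ aleatoric")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend(loc="upper left")
    return fig, ax
```

With the variables from the previous cells, call `fig, ax = plot_evidential(X_train, y_train, X_test, mean_pred, aleatoric_var, epistemic_var)` followed by `plt.show()`. Drawing the aleatoric band inside the total band makes the epistemic contribution visible as the gap between the two.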

---
 
## Final Summary — Evidential Regression Tutorial
 
---

This tutorial demonstrated how to use the **Evidential Regression Transformation** in `probly` to create models that can predict their own uncertainty.
We learned that instead of a single point, an evidential model outputs the four parameters of a distribution (`γ, ν, α, β`). The `probly` transformation automates this by replacing the final linear layer of a network. The key advantage is that both **aleatoric (data) uncertainty** and **epistemic (model) uncertainty** can be calculated from these parameters in a single forward pass.
We applied this to a model trained on a dataset with a gap. The visualization is meant to show the model's uncertainty increasing in the regions where it had no training data (the gap between x = -1 and x = 1, and beyond the training range [-4, 4]). This behavior is what makes evidential regression a useful and interpretable tool for building more reliable regression models.