## Introduction

This notebook provides a practical introduction to the core utility functions in `probly`.  
These helpers are essential building blocks for training probabilistic models and quantifying uncertainty.

We will focus on two main categories:

- **Model traversal functions**, which walk a model’s layers and aggregate per-layer quantities  
- **Uncertainty quantification functions**, which compute meaningful uncertainty scores from model predictions

## Key Utility Functions in `probly`

### 1. `collect_kl_divergence` (for BNNs)

**What it does:**  
Traverses a Bayesian neural network (BNN) and sums the KL divergence contributed by each Bayesian layer.

**Why it’s useful:**  
The **ELBO loss** combines a data-fit term (the negative log-likelihood) with this KL term, so the function supplies the model-wide KL value needed at every training step.


In [1]:
from torch import nn

from probly.train.bayesian.torch import collect_kl_divergence
from probly.transformation import bayesian


# 1. Define a standard model
def build_mlp() -> nn.Sequential:
    return nn.Sequential(nn.Linear(10, 50), nn.ReLU(), nn.Linear(50, 2))


# 2. Apply the Bayesian transformation
# This replaces nn.Linear layers with BayesLinear layers, each of which has a
# .kl_divergence property.
bnn_model = bayesian(build_mlp())

# 3. Use the utility function to sum the KL divergence across all Bayesian layers
total_kl = collect_kl_divergence(bnn_model)

print("Successfully collected the Total KL Divergence from the BNN.")
print(f"Total KL Divergence: {total_kl.item():.4f}")

# This `total_kl` value would then be passed to the ELBOLoss during a training step.

Successfully collected the Total KL Divergence from the BNN.
Total KL Divergence: 1637.1818
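
To make the role of the KL term concrete, the following is a minimal sketch of the arithmetic behind an ELBO-style loss, written in plain NumPy. It is not `probly`'s `ELBOLoss`: the helper names, the toy batch, and the `kl_weight` value are all assumptions chosen for illustration. In real training the same expression would be built from torch tensors so that gradients can flow.

```python
import numpy as np


def negative_log_likelihood(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy of integer class targets under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


def elbo_style_loss(nll: float, total_kl: float, kl_weight: float) -> float:
    """Data-fit term plus a weighted KL complexity penalty (hypothetical helper)."""
    return nll + kl_weight * total_kl


# Hypothetical batch: 4 examples, 2 classes (matching the MLP's output size).
logits = np.array([[2.0, -1.0], [0.5, 0.3], [-1.2, 1.0], [0.0, 0.0]])
targets = np.array([0, 1, 1, 0])

nll = negative_log_likelihood(logits, targets)
total_kl = 1637.1818  # example value taken from the output above
kl_weight = 1.0 / 1000  # assumption: 1 / (hypothetical number of training examples)

loss = elbo_style_loss(nll, total_kl, kl_weight)
print(f"NLL: {nll:.4f}, weighted KL: {kl_weight * total_kl:.4f}, loss: {loss:.4f}")
```

Scaling the KL term by the inverse dataset size is one common convention when the likelihood term is averaged per example; other weightings are possible.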


### 2. `total_entropy`, `conditional_entropy`, `mutual_information`

**What they do:**  
These functions take a set of class-probability predictions for each input, shaped `(n_instances, n_samples, n_classes)` (for example, one sample per ensemble member), and decompose the predictive uncertainty.

**Why they’re useful:**  
They allow you to separately measure:

- **Aleatoric uncertainty** (inherent randomness in the data), via `conditional_entropy`  
- **Epistemic uncertainty** (uncertainty due to limited model knowledge), via `mutual_information`


In [4]:
import numpy as np

from probly.quantification.classification import conditional_entropy, mutual_information, total_entropy

# 1. Create a dummy set of predictions for one input instance.
# Shape: (n_instances, n_samples, n_classes) -> (1, 5, 3)
# This simulates getting 5 different predictions from an ensemble for a 3-class problem.
# These predictions show some disagreement, indicating uncertainty.
prob_samples = np.array(
    [
        [
            [0.7, 0.2, 0.1],
            [0.6, 0.3, 0.1],
            [0.8, 0.1, 0.1],
            [0.6, 0.2, 0.2],
            [0.7, 0.1, 0.2],
        ]
    ]
)

# 2. Decompose the uncertainty
# Total uncertainty: entropy of the prediction averaged over the 5 samples
total = total_entropy(prob_samples)

# Aleatoric uncertainty: mean entropy of the individual predictions
aleatoric = conditional_entropy(prob_samples)

# Epistemic uncertainty: how much the predictions disagree with each other
epistemic = mutual_information(prob_samples)

print(f"Total Uncertainty (Entropy): {total.item():.4f}")
print(f"Aleatoric Uncertainty (Data Noise): {aleatoric.item():.4f}")
print(f"Epistemic Uncertainty (Model Ignorance): {epistemic.item():.4f}")

# 3. Verify the decomposition
# Total entropy equals conditional entropy plus mutual information;
# np.allclose only absorbs floating-point rounding.
print(f"\nSum of parts: {(aleatoric + epistemic).item():.4f}")
print(f"Decomposition is correct: {np.allclose(total, aleatoric + epistemic)}")

Total Uncertainty (Entropy): 1.2208
Aleatoric Uncertainty (Data Noise): 1.1804
Epistemic Uncertainty (Model Ignorance): 0.0404

Sum of parts: 1.2208
Decomposition is correct: True
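
For readers who want to see what sits behind these three numbers, here is a minimal NumPy sketch of the standard entropy decomposition: total uncertainty is the entropy of the mean prediction, aleatoric uncertainty is the mean of the per-sample entropies, and epistemic uncertainty is their difference. The helper `entropy_bits` is hypothetical, and the use of base-2 logarithms is an assumption; it is consistent with the values printed above but is not taken from `probly`'s source.

```python
import numpy as np


def entropy_bits(p: np.ndarray) -> np.ndarray:
    """Shannon entropy in bits along the last axis; 0 * log(0) is treated as 0."""
    p = np.asarray(p, dtype=float)
    safe_p = np.where(p > 0, p, 1.0)  # log2(1) = 0, so zero entries contribute nothing
    return -(p * np.log2(safe_p)).sum(axis=-1)


# Same dummy predictions as above: shape (n_instances, n_samples, n_classes).
prob_samples = np.array(
    [
        [
            [0.7, 0.2, 0.1],
            [0.6, 0.3, 0.1],
            [0.8, 0.1, 0.1],
            [0.6, 0.2, 0.2],
            [0.7, 0.1, 0.2],
        ]
    ]
)

mean_prediction = prob_samples.mean(axis=1)  # average over samples -> (1, 3)
total = entropy_bits(mean_prediction)  # entropy of the averaged prediction
aleatoric = entropy_bits(prob_samples).mean(axis=1)  # mean per-sample entropy
epistemic = total - aleatoric  # mutual information, by definition

print(f"Total: {total.item():.4f}")
print(f"Aleatoric: {aleatoric.item():.4f}")
print(f"Epistemic: {epistemic.item():.4f}")
```

With base-2 logarithms this reproduces the 1.2208, 1.1804, and 0.0404 reported by the library functions for the same input.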



### 3. `evidential_uncertainty` (for Evidential Models)

**What it does:**  
Computes an uncertainty score directly from the **evidence vector** produced by an evidential model.

**Why it’s useful:**  
It needs only a single forward pass, with no ensemble or repeated sampling, so it is a fast way to check whether a model is uncertain about its prediction.


In [7]:
import numpy as np

from probly.quantification.classification import evidential_uncertainty

# 1. Simulate the output of an evidential model for two different inputs.
# The output is a vector of "evidence", not probabilities.

# Case 1: The model is UNCERTAIN (it found very little evidence for any class)
low_evidence = np.array([[0.1, 0.2, 0.15]])

# Case 2: The model is CONFIDENT (it found a lot of evidence for one class)
high_evidence = np.array([[100.0, 2.0, 5.0]])

# 2. Calculate the uncertainty score for each case
uncertainty_low_confidence = evidential_uncertainty(low_evidence)
uncertainty_high_confidence = evidential_uncertainty(high_evidence)

print(f"Evidence vector (low confidence): {low_evidence}")
print(f"Resulting Uncertainty Score: {uncertainty_low_confidence.item():.4f}\n")

print(f"Evidence vector (high confidence): {high_evidence}")
print(f"Resulting Uncertainty Score: {uncertainty_high_confidence.item():.4f}")

# The uncertainty score is much higher when the total evidence is low, as expected.

Evidence vector (low confidence): [[0.1  0.2  0.15]]
Resulting Uncertainty Score: 0.8696

Evidence vector (high confidence): [[100.   2.   5.]]
Resulting Uncertainty Score: 0.0273
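
The two scores above can be reproduced by hand with the usual evidential "vacuity" formula: treat `evidence + 1` as Dirichlet concentration parameters and divide the number of classes by their sum. The sketch below is a hypothetical re-implementation for illustration only; that `probly` computes the score this way internally is an assumption, supported only by the fact that the numbers match.

```python
import numpy as np


def evidential_uncertainty_sketch(evidence: np.ndarray) -> np.ndarray:
    """Vacuity-style uncertainty K / sum(alpha), with alpha = evidence + 1."""
    evidence = np.asarray(evidence, dtype=float)
    alpha = evidence + 1.0  # Dirichlet concentration parameters
    n_classes = evidence.shape[-1]  # K
    return n_classes / alpha.sum(axis=-1)


low = evidential_uncertainty_sketch(np.array([[0.1, 0.2, 0.15]]))
high = evidential_uncertainty_sketch(np.array([[100.0, 2.0, 5.0]]))

print(f"Low evidence:  {low.item():.4f}")   # 3 / 3.45 -> 0.8696
print(f"High evidence: {high.item():.4f}")  # 3 / 110  -> 0.0273
```

Under this formula the score approaches 1 as the total evidence goes to zero and shrinks toward 0 as evidence accumulates, which matches the behaviour seen in the two cases.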
