# `probly` Tutorial — Evidential Classification Transformation
This notebook is a practical introduction to the **Evidential Classification transformation** in `probly`. Evidential Deep Learning is a powerful and computationally efficient method for uncertainty quantification that differs significantly from sampling-based approaches like MC-Dropout.


We start by explaining the core idea behind evidential learning and how `probly`'s transformation helps you build such models. We then walk through a PyTorch example that obtains an uncertainty estimate from a **single forward pass**.

---

## Part A — Introduction to Evidential Learning

 ---

## 1. Concept: What is Evidential Classification?

### 1.1 The Problem: Overconfident Softmax

A standard classification network outputs logits, which are converted to probabilities using a `softmax` function.
These probabilities are useful, but a high softmax probability (e.g., 0.99) is often misinterpreted as high model confidence. A model can be "confidently wrong," especially on out-of-distribution data.
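
To make this concrete, here is a minimal sketch in plain Python. The `softmax` helper and the example logits are illustrative and not part of `probly`. It shows that softmax always returns a distribution that sums to 1, and that shifting every logit by the same constant leaves the result unchanged, so the probabilities alone say little about how much support the network actually found.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: the first class leads by a wide margin.
confident = softmax([10.0, 5.0, 0.5])

# Shifting every logit by the same constant leaves the probabilities unchanged,
# so the overall magnitude of the logits is not reflected in the output.
shifted = softmax([110.0, 105.0, 100.5])

print([round(p, 4) for p in confident])
print([round(p, 4) for p in shifted])
```

The top probability exceeds 0.99 in both cases, regardless of whether the input resembled anything in the training data.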

### 1.2 The Evidential Approach: Learning "Evidence"
Evidential Deep Learning reframes the problem. Instead of learning a direct mapping from input to class probabilities, the model learns to collect **evidence** for each class.
Think of the model as a detective gathering clues for different suspects (the classes):
-   If the model finds **many clues** pointing to one suspect and very few for the others (e.g., evidence of `[100, 2, 5]`), it is very **confident**.
-   If the model finds **very few clues for any suspect** (e.g., evidence of `[0.1, 0.2, 0.15]`), it is very **uncertain**. This can happen if the input is ambiguous or unlike anything the model has seen before.

The model's final output is a vector of these evidence scores. The total amount of evidence collected is a direct measure of confidence.

### 1.3 The Evidential Transformation (probly)
The `probly` transformation helps you build an evidential model by ensuring the output can be interpreted as evidence.
-   You design your network as usual, but your final layer should output raw logits that represent the "evidence."
-   The `evidential_classification` transformation simply **appends a `torch.nn.Softplus()` activation function.**
-   This ensures the evidence scores are always positive, a requirement for the underlying mathematical theory (the Dirichlet distribution).

The uncertainty can then be calculated directly from these evidence scores in a **single forward pass**.
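
As a rough illustration of why `Softplus` suits this role, the sketch below implements softplus(x) = log(1 + exp(x)) in plain Python. The `softplus` helper is a stand-in written for this example, not `probly` or PyTorch code, and the raw logits are made up. Whatever the sign of the raw logit, the result is positive, and large positive logits pass through almost unchanged.

```python
import math


def softplus(x: float) -> float:
    """Stable softplus: log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


raw_logits = [-5.0, 0.0, 5.0]  # hypothetical outputs of the last linear layer
evidence = [softplus(z) for z in raw_logits]

for z, e in zip(raw_logits, evidence):
    print(f"logit {z:+.1f} -> evidence {e:.4f}")
```

A negative logit maps to a small positive evidence value instead of being clipped to exactly zero, which keeps gradients flowing during training.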

### 1.4 Short side-by-side comparison

| Aspect                 | Evidential Classification                 | Standard (Softmax) Classification                    |
|------------------------|-------------------------------------------|------------------------------------------------------|
| **Model Output**       | A vector of **evidence** for each class.  | A vector of **probabilities** for each class.        |
| **Final Activation**   | `Softplus` (to ensure positive evidence). | `Softmax` (to ensure probabilities sum to 1).        |
| **Uncertainty Source** | The **magnitude** of the total evidence.  | No direct measure; high probability is a poor proxy. |
| **Inference Cost**     | **One single forward pass.**              | One single forward pass.                             |



## 2. Quickstart (PyTorch) 
Below, we build a small MLP and apply `evidential_classification(model)` to see how the final activation is appended.

In [1]:
import torch
from torch import nn

from probly.transformation import evidential_classification


def build_mlp(in_dim: int = 10, hidden: int = 32, out_dim: int = 3) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


model = build_mlp()
print("Original model:\n", model)

# Apply the Evidential Classification transformation
model_evidential = evidential_classification(model)
print("\nWith Evidential transformation:\n", model_evidential)

Original model:
 Sequential(
  (0): Linear(in_features=10, out_features=32, bias=True)
  (1): ReLU()
  (2): Linear(in_features=32, out_features=3, bias=True)
)

With Evidential transformation:
 Sequential(
  (0): Sequential(
    (0): Linear(in_features=10, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=3, bias=True)
  )
  (1): Softplus(beta=1.0, threshold=20.0)
)


### Notes on the structure
-   Notice that the transformation has wrapped the original model in a `Sequential` module and **appended a `Softplus` layer** at the end.
-   The output of this new model will now always be positive.

## 3. Uncertainty from a Single Forward Pass
The key advantage of evidential learning is that uncertainty can be calculated directly from the output of a single prediction.
The output of the model gives us the evidence `e` for each class. Adding 1 to each evidence value yields the Dirichlet parameters `alpha = e + 1`. The Dirichlet strength `S` is the sum of all `alpha`, and the uncertainty `u` is the number of classes `K` divided by this strength: `u = K / S`. With zero evidence, `S = K` and `u = 1`; as evidence grows, `u` approaches 0.
-   **High `S`** (lots of evidence) -> **Low `u`** (low uncertainty).
-   **Low `S`** (little evidence) -> **High `u`** (high uncertainty).
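
The arithmetic can be sketched in a few lines of plain Python. This assumes the common evidential convention `alpha_k = evidence_k + 1`, `S = sum(alpha)`, `u = K / S`, with expected class probabilities `alpha_k / S`. The helper name `evidential_uncertainty_sketch` is hypothetical and is not `probly`'s implementation. Under this convention, evidence `[100, 2, 5]` gives `u = 3 / 110 ≈ 0.0273`, which agrees with the value printed in this notebook, but that agreement is inferred from the outputs and not from the library source.

```python
def evidential_uncertainty_sketch(evidence):
    """Return (u, expected_probs) for one vector of non-negative evidence.

    Assumes the common convention alpha_k = evidence_k + 1.
    """
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]  # Dirichlet parameters
    strength = sum(alpha)  # Dirichlet strength S
    u = k / strength  # uncertainty in (0, 1]
    expected_probs = [a / strength for a in alpha]  # mean of the Dirichlet
    return u, expected_probs


# The two evidence vectors from the detective analogy in Section 1.2.
for evidence in ([100.0, 2.0, 5.0], [0.1, 0.2, 0.15]):
    u, probs = evidential_uncertainty_sketch(evidence)
    print(f"evidence={evidence} -> u={u:.4f}, probs={[round(p, 3) for p in probs]}")
```

The second vector has almost no evidence, so its uncertainty is close to 1 and its expected probabilities are close to uniform.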

In [2]:
from probly.quantification.classification import evidential_uncertainty

torch.manual_seed(0)

# Create a dummy evidential model
model_evidential = evidential_classification(build_mlp())

# A dummy input
x = torch.randn(1, 10)

# Get the evidence from a single forward pass
with torch.no_grad():
    evidence = model_evidential(x)

# `probly` provides a function to calculate uncertainty directly
uncertainty = evidential_uncertainty(evidence.numpy())

print("Input data:\n", x)
print("\nOutput evidence:\n", evidence)
print(f"\nCalculated Uncertainty: {uncertainty.item():.4f}")

# Example with higher evidence (more confidence)
high_evidence = torch.tensor([[100.0, 2.0, 5.0]])
low_uncertainty = evidential_uncertainty(high_evidence.numpy())
print(f"\nUncertainty for high evidence: {low_uncertainty.item():.4f}")

Input data:
 tensor([[ 0.1167,  0.1689, -1.1233,  1.8116,  0.6322, -0.8759,  0.3580, -0.4363,
         -0.7609,  1.5249]])

Output evidence:
 tensor([[0.8682, 0.9021, 0.5639]])

Calculated Uncertainty: 0.5624

Uncertainty for high evidence: 0.0273


## 4. Part A Summary

In Part A, we introduced Evidential Deep Learning as a powerful alternative to standard softmax classification. Instead of outputting probabilities, an evidential model outputs "evidence" for each class. We learned that the `probly` transformation makes this easy by appending a `Softplus` activation to a standard network. The key advantage is that model uncertainty can be directly calculated from the magnitude of this evidence in a **single, deterministic forward pass**, making it much faster than sampling-based methods.


---

## Part B — Applied Evidential Classification

---

In **Part A**, we learned the concept of the **Evidential Classification transformation**.
In this **Part B**, we will apply it to a classification model, get a prediction, and calculate the uncertainty from a single forward pass.

An in-depth tutorial, [Training an Evidential Model for Classification](train_evidential_classification.ipynb), shows:
- How to define a standard neural network (LeNet) and make it an Evidential model using the `evidential_classification` transformation.
- How to set up the specialized training loop required for an Evidential model, using the Evidential Log Loss and a KL Divergence regularizer.
- How to train the Evidential model on a real-world dataset **(FashionMNIST)**.
- How to evaluate the final classification accuracy of the trained model.
- How to compute and visualize Evidential Uncertainty by rotating an image.

---

## Final Summary — Evidential Transformation Tutorial

---



This tutorial introduced the core concepts of **Evidential Deep Learning**, a powerful and efficient method for uncertainty quantification. We learned that instead of outputting probabilities like a standard classifier, an evidential model outputs **"evidence"** for each class.

We saw that `probly`'s `evidential_classification` transformation automates this by simply appending a `Softplus` layer to a standard network, ensuring the evidence is always positive. The key advantage of this approach is its speed: a meaningful uncertainty score can be calculated directly from the magnitude of the evidence in a **single forward pass**.

For a complete, end-to-end example that shows how to train an evidential model on the **FashionMNIST** dataset using the specialized evidential loss functions, please see the next tutorial: **[Training an Evidential Model for Classification](train_evidential_classification.ipynb)**.