# **Your Guide to: Evidential Classification** *(with probly)*

The goal of this notebook is to show what Evidential Classification means and why it matters for uncertainty-aware machine learning. Enjoy!



## 1. What even is Evidential Classification?

Have you ever asked yourself, when turning to a machine with a question: "How certain are you about this answer and how do I know that I can trust you?" 🤔
<br> Well, you can't. But that is where Evidential Classification helps us!
<br> With it, we can understand how certain our machine is about its own prediction. A normal classifier only gives us a probability vector, but not how confident the model is about those probabilities.
<br> An evidential classifier adds this uncertainty awareness in a single forward pass.

Let's look at how this works! 👇

### 1.1 Its use in uncertainty

Now you might be asking yourself: *"Why are you telling me about this and when would I ever need this?"* 🙄
I'm glad you asked!

**Evidential Classification** becomes important when we talk about **uncertainty in machine learning**.
<br> Usually, your machine will tell you **which option is probably correct**, but it won't tell you **how certain** it is about this prediction.
<br> In other words, you'll get *a probability* for a class but not *a confidence* about that probability.

### 1.2 Now: *How* is it used?

Now that we know *what* it is and why we need it, we can move on to how we use it. Before I can show you, we have to learn how it's structured. 🫠
<br> Don't worry, it's short, I promise.

> #### Softmax vs. Softplus vs. Dirichlet
These 3 building blocks are doing all the work for us, so listen up.

***Softmax***: converts each input value into a value between 0 and 1, such that all outputs together sum up to 1. $$ p_i = \frac{e^{x_i}}{\sum_j e^{x_j}} $$

***Softplus***: turns any input, including zero and negative values, into a positive output. For very negative inputs it approaches zero but never reaches it. $$ \text{Softplus}(x) = \log(1 + e^x) $$

***Dirichlet***: a distribution over probability vectors, described by its α-values. Its mean gives the expected probability of each class. $$ \mathbb{E}[p_k] = \frac{\alpha_k}{\sum_i \alpha_i} $$

See, it wasn't that bad, right? 😁
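If you'd like to see the three formulas in action, here is a minimal sketch in plain PyTorch. The input values are made up, and using the Softplus outputs directly as α-values is an assumption made only for this illustration.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 3.0])  # arbitrary example inputs

# Softmax: values between 0 and 1 that sum to 1
probs = F.softmax(x, dim=0)

# Softplus: strictly positive, even for negative or zero inputs
positive = F.softplus(x)

# Dirichlet mean: each alpha divided by the sum of all alphas
alpha = positive  # assumption for this sketch: softplus outputs act as alpha
dirichlet_mean = alpha / alpha.sum()

print("softmax:       ", probs.tolist())
print("softplus:      ", positive.tolist())
print("Dirichlet mean:", dirichlet_mean.tolist())
```

Both the softmax output and the Dirichlet mean sum to 1, but they get there in different ways: softmax exponentiates, the Dirichlet mean simply normalizes the α-values.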

## 2. Turning a Base Model into an Evidential Model

Now we really start! 🚦
<br> Like we established in the beginning, we are using probly! yeey 🥳 So let's look at it.
<br> probly provides a transformation that appends a Softplus layer after the module to ensure positive outputs. The cell below builds that structure by hand in plain PyTorch, so let's look at what the base for this looks like.

In [None]:
# install probly & torch via: pip install probly torch
import torch
from torch import nn

# base model
base = nn.Sequential(
    nn.Linear(10, 3),
)

# add softplus activation to ensure positive outputs
model = nn.Sequential(base, nn.Softplus())

print(model)

Perfect! Now we have a *base* nn.Linear layer that takes 10 inputs and outputs 3 raw values (also called **logits**), while nn.Softplus maps those logits to *positive* values. For anyone wondering: nn.Sequential simply lets us *chain* layers. Easy, right? 😁
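To check that claim, we can push some data through the model. This is only a sketch: the batch is random, its size (4 samples) is a made-up choice, and the seed is there just to make the run repeatable.

```python
import torch
from torch import nn

torch.manual_seed(0)  # only to make the random example repeatable

# same structure as above: linear base followed by Softplus
model = nn.Sequential(nn.Sequential(nn.Linear(10, 3)), nn.Softplus())

x = torch.randn(4, 10)  # hypothetical batch: 4 samples with 10 features each
with torch.no_grad():
    out = model(x)

print(out.shape)        # one row per sample, one column per class
print((out > 0).all())  # Softplus keeps every output positive
```

Note that the weights are untrained here, so the numbers themselves mean nothing yet. The point is only the shape and the sign of the outputs.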

## 3. What do the α-values tell us?

We all know some statistics, right? We also know that a higher probability is usually better - unless you're calculating the odds of getting famous.
<br> In that case... we *love* uncertainty! 😌
<br> But back to the important stuff: the positive outputs of our model serve as the α-values of a Dirichlet distribution. The larger our α-values are, the higher our confidence and certainty become - and the smaller they are, the less confident our predictions are.

In [3]:
alpha_confident = torch.tensor([10.0, 0.5, 0.5])
alpha_uncertain = torch.tensor([1.1, 1.1, 1.1])

As you can see, our first set of α-values represents a Dirichlet distribution that makes us confident about our prediction - <br> meanwhile the second one shows that we have absolutely no clue which class is correct, and every answer could be the right one. 😅 <br> And we don't like that!
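One way to put numbers on this (an addition for illustration, not something the notebook relies on) is to ask PyTorch's built-in `Dirichlet` distribution for its mean and variance. The mean tells us the expected class probabilities, the variance how much those probabilities can still wobble.

```python
import torch
from torch.distributions import Dirichlet

alpha_confident = torch.tensor([10.0, 0.5, 0.5])
alpha_uncertain = torch.tensor([1.1, 1.1, 1.1])

for name, alpha in [("confident", alpha_confident), ("uncertain", alpha_uncertain)]:
    dist = Dirichlet(alpha)
    # mean: expected class probabilities, variance: spread around that mean
    print(name, "| mean:", dist.mean.tolist(), "| variance:", dist.variance.tolist())
```

The confident set puts most of its mean on the first class with little spread, while the uncertain set has a flat mean and a noticeably larger variance.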

Okay, now let's say we have a picture of a cat and our machine tells us what it thinks it is.
<br> Let's visualize our machine's prediction! Run the codeee!

In [None]:
import matplotlib.pyplot as plt

classes = ["cat", "dog", "octopus"]

# confident
plt.bar(classes, [10, 0.5, 0.5])
plt.title("Confident Model (High alpha)")
plt.show()
print("This is a cat! I am sure!")

# uncertain
plt.bar(classes, [1.1, 1.1, 1.1])
plt.title("Uncertain Model (Low alpha)")
plt.show()
print("Hmm... I have no idea what this is. Could be anything.")

You can clearly see the difference between the two models. 🧐
<br> The α-values of the Dirichlet distribution show us the uncertainty of the answer at first glance, in contrast to plain softmax values.

## 4. Measuring Uncertainty with Entropy and Evidence

Now that we've seen this ongoing battle between probabilities and uncertainty 🥊 - let's add a small quantitative view:
<br> We can measure **how uncertain** our model is by calculating the *entropy* of its probabilities.
<br> Entropy might sound like a fancy word, but it simply tells us how spread out the probabilities are: the more evenly they are spread over the classes, the more uncertain our machine is.
<br> To understand the code a little better, you need to know 2 simple formulas:

This is the mean of our Dirichlet distribution: $$ \mathbb{E}[p_k] = \frac{\alpha_k}{\sum_i \alpha_i} $$

This is our entropy: $$ \mathbb{H}[p] = -\sum_i p_i \log(p_i) $$

In [None]:
import torch


def p_mean(alpha: torch.Tensor) -> torch.Tensor:
    return alpha / alpha.sum()


def entropy(p: torch.Tensor) -> torch.Tensor:
    return -(p * torch.log(p)).sum()


alpha_confident = torch.tensor([10.0, 0.5, 0.5])
alpha_uncertain = torch.tensor([1.1, 1.1, 1.1])

for name, a in [("Confident", alpha_confident), ("Uncertain", alpha_uncertain)]:
    p = p_mean(a)
    H = entropy(p)
    print(f"{name} | Probabilities: {p.tolist()} | Entropy: {H:.3f}")

We can see:
<br> ⦁ low entropy → more confident predictions
<br> ⦁ high entropy → model uncertainty

→ So if your entropy is high, your model is basically screaming at you:
"I don't know, don't trust me!" 🥲

On the other hand, we also have something called *evidence*. Yes, just like the evidence you'd need in court to get out of a ticket - so drive slowly! 🚗💨
<br> Anyway, it's the same concept:
<br> ⦁ **high evidence** → the model has seen similar examples before → **high confidence**
<br> ⦁ **low evidence** → the model isn't sure → **high uncertainty**
<br> So this is another cool way to tell **how uncertain our machine really is!**

<u> So all in all: </u>
<br> Entropy measures **how uncertain** a prediction is,
while evidence tells us **how strongly** the model **supports** that prediction.

In practice, both work hand in hand:
<br> low entropy and high evidence means the model is confident -
<br> high entropy and low evidence means it's confused.
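To see both numbers side by side, here is a small sketch. It assumes that the sum of the α-values can stand in for "evidence" (some formulations subtract 1 per class first), and the helper names `total_evidence` and `mean_entropy` as well as the third example are made up for this illustration.

```python
import torch


def total_evidence(alpha: torch.Tensor) -> torch.Tensor:
    # assumption for this sketch: the summed alpha-values stand in for evidence
    return alpha.sum()


def mean_entropy(alpha: torch.Tensor) -> torch.Tensor:
    # entropy of the Dirichlet mean, same formula as in the cell above
    p = alpha / alpha.sum()
    return -(p * torch.log(p)).sum()


examples = {
    "confident": torch.tensor([10.0, 0.5, 0.5]),
    "uncertain": torch.tensor([1.1, 1.1, 1.1]),
    "conflicted": torch.tensor([100.0, 100.0, 100.0]),  # hypothetical extra case
}

for name, alpha in examples.items():
    ev = total_evidence(alpha).item()
    ent = mean_entropy(alpha).item()
    print(f"{name:10s} | evidence: {ev:6.1f} | entropy: {ent:.3f}")
```

The last row is a useful reminder: a model can have plenty of evidence and still a flat mean, so entropy and evidence really are two separate readings and not just mirror images of each other.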

Let's visualize this a little!

In [None]:
import matplotlib.pyplot as plt
import numpy as np

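# Raw values: entropies from the previous cell; the largest alpha of each
# model serves as a simple stand-in for its evidence
entropy_values = np.array([0.368, 1.099])
evidence_values = np.array([10, 1.1])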

# Normalize for better visualization
entropy_norm = entropy_values / entropy_values.max()
evidence_norm = evidence_values / evidence_values.max()

labels = ["Confident Model", "Uncertain Model"]
x = np.arange(len(labels))
width = 0.35

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(x - width / 2, entropy_norm, width, label="Entropy (Uncertainty)", color="tomato")
ax.bar(x + width / 2, evidence_norm, width, label="Evidence (Confidence)", color="mediumseagreen")

ax.set_ylabel("Normalized scale (0-1)")
ax.set_title("Normalized Entropy vs Evidence")
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
plt.show()

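As a rough, idealized picture, we could also imagine this as a line:
when entropy goes up, evidence goes down. Real models don't have to follow this line exactly, because entropy and evidence are two separate quantities.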

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# idealized toy values, not model outputs
entropy = np.linspace(0, 1, 6)
evidence = 1 - entropy  # inverse relationship

plt.plot(entropy, evidence, marker="o", color="mediumseagreen")
plt.xlabel("Entropy (Uncertainty)")
plt.ylabel("Evidence (Confidence)")
plt.title("Relationship between Entropy and Evidence")
plt.grid(True)
plt.show()

## 5. Comparing Softmax vs. Evidential

Now let's really get the difference between using just softmax and using an evidential model.
Softmax gives us probabilities that often **look** confident,
even when the model has never seen similar data before.
Evidential models, on the other hand, keep track of **how much evidence** supports each prediction.
They can say:
> "I think it's a cat… but I'm not really sure." 😅

In [None]:
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 5.0, 3.0])

# softmax: normalizes the exponentiated logits
softmax_probs = F.softmax(logits, dim=0)
print("Softmax probabilities:", softmax_probs.tolist())

# evidential: softplus gives positive evidence, +1 turns it into alpha
alpha = F.softplus(logits) + 1
dirichlet_mean = alpha / alpha.sum()
print("Evidential mean (alpha):", dirichlet_mean.tolist())

Both approaches pick the same class, but the numbers differ clearly.
<br> **Softmax** converts logits into clean probabilities that always add up to 1.
<br> Because it exponentiates the logits, the largest one dominates (here around 0.84) - it looks very **confident**.

**Evidential** models use α-values that include evidence.
<br> Their probabilities look *flatter* (here roughly 0.24, 0.46 and 0.31), because they account for how much the model actually *knows*.
<br> So even if both predict the same class, Evidential is usually more *honest* about its uncertainty.
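Here is one more way to see the difference, as a sketch with made-up numbers: we shift all logits down by the same amount (the `logits_weak` tensor is hypothetical). Softmax only looks at the gaps between logits, so it cannot tell the two cases apart, while the Softplus evidence shrinks together with the logits.

```python
import torch
import torch.nn.functional as F

logits_strong = torch.tensor([2.0, 5.0, 3.0])
logits_weak = logits_strong - 10.0  # hypothetical: same gaps, much lower values

for name, logits in [("strong", logits_strong), ("weak", logits_weak)]:
    probs = F.softmax(logits, dim=0)  # depends only on differences between logits
    evidence = F.softplus(logits)     # keeps the overall magnitude
    alpha = evidence + 1
    mean = alpha / alpha.sum()
    print(f"{name} | softmax: {probs.tolist()}")
    print(f"{name} | total evidence: {evidence.sum().item():.3f} | Dirichlet mean: {mean.tolist()}")
```

Softmax reports the same probabilities in both cases, whereas the evidential view of the weak case has almost no evidence and a nearly flat mean.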

## 6. Confidence ≠ Correctness

This is very important! Never forget that a model can be very confident and still completely wrong.

This happens when it has never seen similar data before - it simply doesn't know that it doesn't know. You can't know what you haven't learned 😞

Evidential models help with this:
they express how sure the model is, and how much evidence that certainty is based on. Without that, you wouldn't know whether you can trust your machine.

In other words, they don't just say "I'm 95% sure",
they also tell us "and I actually have good reasons for that. And I can show you my evidence." 😄
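In code, that idea could look something like the sketch below. The function `trust_prediction` and both thresholds are hypothetical and chosen only for illustration, and "evidence" is again taken to be the sum of the α-values.

```python
import torch


def trust_prediction(alpha: torch.Tensor, min_prob: float = 0.8, min_evidence: float = 5.0) -> bool:
    """Accept a prediction only if it is both confident and well supported.

    The thresholds are hypothetical and only serve this illustration.
    """
    total = alpha.sum()
    top_prob = (alpha / total).max()
    return bool(top_prob >= min_prob) and bool(total >= min_evidence)


# confident and well supported
print(trust_prediction(torch.tensor([10.0, 0.5, 0.5])))
# no clear winner
print(trust_prediction(torch.tensor([1.1, 1.1, 1.1])))
# looks confident, but with hardly any evidence behind it
print(trust_prediction(torch.tensor([1.0, 0.05, 0.05])))
```

The third case is the interesting one: its top probability is just as high as in the first case, but there is so little evidence behind it that we would rather not trust it.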

## 7. Takeaway
Evidential Classification teaches our models not only *what* to predict, but also *how sure* they are about it.
<br> So next time your model says "I'm 95% sure" - you can finally ask "and how much evidence do you have for that?" 😉