# Credal Sets: Exploration Notebook

This notebook gives a very simple, beginner-friendly view of credal sets.
We start from standard probability distributions, which are vectors of nonnegative numbers that sum to 1.
An ensemble model can produce multiple probability distributions for the same input.
The **credal set** is the set of all these probability distributions produced by the ensemble.



## Minimal intuition

To summarize an ensemble of probability distributions, we can look at **lower** and **upper** probabilities per class.
For each class, the **lower** value is the minimum probability that any ensemble member gives to that class.
For each class, the **upper** value is the maximum probability that any ensemble member gives to that class.
These lower/upper values form an interval that tells us a simple range of belief for each class.
First, we create a small NumPy matrix `P` that represents the ensemble's predictions: each row is one member, each column is one class.


In [1]:
import numpy as np

In [2]:
P = np.array(
    [
        [0.6, 0.3, 0.1],
        [0.5, 0.4, 0.1],
        [0.7, 0.2, 0.1],
        [0.4, 0.4, 0.2],
        [0.55, 0.25, 0.20],
    ],
    dtype=float,
)

Next, we run basic sanity checks on `P`: every probability must be nonnegative, and each row must sum to 1.

In [None]:
nonneg_msg = "Probabilities must be nonnegative."
row_sum_msg = "Each row must sum to 1.0."

if not np.all(P >= 0.0):
    raise ValueError(nonneg_msg)

if not np.allclose(P.sum(axis=1), 1.0):
    raise ValueError(row_sum_msg)

In [4]:
# Lower and upper probability envelopes per class
lower = P.min(axis=0)
upper = P.max(axis=0)

print("Ensemble probabilities P (rows = members, columns = classes):")
print(P)
print()
print("Lower envelope per class:")
print(lower)
print()
print("Upper envelope per class:")
print(upper)

Ensemble probabilities P (rows = members, columns = classes):
[[0.6  0.3  0.1 ]
 [0.5  0.4  0.1 ]
 [0.7  0.2  0.1 ]
 [0.4  0.4  0.2 ]
 [0.55 0.25 0.2 ]]

Lower envelope per class:
[0.4 0.2 0.1]

Upper envelope per class:
[0.7 0.4 0.2]


## Interpreting lower and upper envelopes

The **lower** value for a class is the most conservative belief across all ensemble members.
It says, "even the least confident member does not go below this probability for this class."
The **upper** value for a class is the most optimistic belief across all ensemble members.
It says, "at least one member goes up to this probability for this class."
If the interval between lower and upper is **wide**, the ensemble members disagree a lot, so uncertainty is high.
If the interval is **narrow**, the ensemble members are more in agreement, so uncertainty is lower.


In [None]:
# Simple summary table: lower, upper, width per class
interval_width = upper - lower

for class_idx, (lo, up, width) in enumerate(zip(lower, upper, interval_width, strict=False)):
    print(f"Class {class_idx}: lower={lo:.3f}, upper={up:.3f}, width={width:.3f}")

## What is the credal set here?

Each row of `P` is one probability distribution over the 3 classes.
The whole matrix `P` collects several such distributions from different ensemble members.
The **credal set** in this example is simply the set of all these rows.
It is a small, discrete set of possible beliefs about the same input.
The lower and upper envelopes we computed are a basic summary of this credal set.
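
Two properties of this summary are worth checking directly. Because every row sums to 1, the lower envelope can sum to at most 1 and the upper envelope to at least 1. The envelopes also describe a box that is larger than the set of rows itself. The sketch below re-declares the example matrix so that it runs on its own; the vector `q` is a hypothetical distribution chosen only for illustration, not an ensemble output.

```python
import numpy as np

# Same example ensemble as above (rows = members, columns = classes)
P = np.array(
    [
        [0.6, 0.3, 0.1],
        [0.5, 0.4, 0.1],
        [0.7, 0.2, 0.1],
        [0.4, 0.4, 0.2],
        [0.55, 0.25, 0.20],
    ],
    dtype=float,
)
lower = P.min(axis=0)
upper = P.max(axis=0)

# The envelopes are not distributions themselves:
# the lower one sums to at most 1, the upper one to at least 1.
lower_sum = lower.sum()
upper_sum = upper.sum()
print(f"sum(lower) = {lower_sum:.2f}, sum(upper) = {upper_sum:.2f}")

# Hypothetical distribution q: valid, inside every interval,
# yet not equal to any row of P.
q = np.array([0.45, 0.35, 0.20])
q_inside_box = bool(np.all((lower <= q) & (q <= upper)))
q_is_member = bool(np.any(np.all(np.isclose(P, q), axis=1)))
print(f"q inside all intervals: {q_inside_box}")
print(f"q is one of the rows:   {q_is_member}")
```

So the intervals are a convenient summary, but they do not identify the credal set exactly: many distributions that no member produced also fit inside them.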


## Relation to Probly (high-level only)

In Probly, credal sets are organized in a small hierarchy, with classes like `CredalSet`, `DiscreteCredalSet`, and `CategoricalCredalSet`.
The idea is that a credal set is a set of probability distributions, with some structure and operations defined on it.
In the NumPy implementation, these distributions are stored in arrays with shapes that look roughly like `(..., num_members, num_classes)`.
This means we track, for each input, several members (distributions) over a fixed number of classes.
Our simple matrix `P` in this notebook mirrors that idea conceptually, but in a tiny, stand-alone form.
Here we only explain the concept and do not import or depend on Probly code.
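
To make the `(..., num_members, num_classes)` layout concrete, here is a minimal plain-NumPy sketch. It is not Probly code: the function name `batched_envelope` and the array `batch` are made up for this illustration. The only idea it shows is that the envelopes reduce over the member axis, which is the second-to-last one.

```python
import numpy as np

# Hypothetical batch: 2 inputs, 3 members per input, 3 classes
batch = np.array(
    [
        [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1]],
        [[0.2, 0.5, 0.3], [0.3, 0.3, 0.4], [0.1, 0.6, 0.3]],
    ],
    dtype=float,
)


def batched_envelope(p: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Lower/upper envelopes over the member axis (second to last)."""
    return p.min(axis=-2), p.max(axis=-2)


lower_b, upper_b = batched_envelope(batch)
print("batch shape:", batch.shape)
print("envelope shape:", lower_b.shape)
print("lower per input:")
print(lower_b)
print("upper per input:")
print(upper_b)
```

Using `axis=-2` rather than `axis=0` means the same function also works on a single matrix like `P`, where the member axis happens to be the first one.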


## Basic Experimentation

Now we run a few simple experiments to build intuition about credal sets and envelope bounds.
All experiments use small ensembles (3 classes, a few to tens of members) and focus on how disagreement affects the lower/upper intervals.


In [None]:
np.random.seed(0)


# Helper function to validate probability distributions
def validate_probs(p: np.ndarray) -> None:
    nonneg_msg = "Probabilities must be nonnegative."
    row_sum_msg = "Each row must sum to 1.0."
    if not np.all(p >= 0.0):
        raise ValueError(nonneg_msg)
    if not np.allclose(p.sum(axis=1), 1.0):
        raise ValueError(row_sum_msg)


# Helper function to compute envelopes
def envelope(p: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    lower = p.min(axis=0)
    upper = p.max(axis=0)
    return lower, upper

### Experiment 1: Low vs High disagreement

We compare two ensembles: one where members agree (low disagreement) and one where they disagree more (high disagreement).
We expect wider uncertainty intervals when disagreement is higher.


In [None]:
# Low disagreement: all members are very similar
P_low = np.array(
    [
        [0.6, 0.3, 0.1],
        [0.61, 0.29, 0.1],
        [0.59, 0.31, 0.1],
        [0.6, 0.3, 0.1],
        [0.6, 0.3, 0.1],
    ],
    dtype=float,
)
validate_probs(P_low)

# High disagreement: members differ more
P_high = np.array(
    [
        [0.8, 0.15, 0.05],
        [0.3, 0.5, 0.2],
        [0.5, 0.3, 0.2],
        [0.2, 0.6, 0.2],
        [0.4, 0.4, 0.2],
    ],
    dtype=float,
)
validate_probs(P_high)

# Compute envelopes and widths
lower_low, upper_low = envelope(P_low)
width_low = upper_low - lower_low
avg_width_low = width_low.mean()

lower_high, upper_high = envelope(P_high)
width_high = upper_high - lower_high
avg_width_high = width_high.mean()

print("Low disagreement ensemble:")
print(f"  Average interval width: {avg_width_low:.4f}")
print(f"  Lower: {lower_low}")
print(f"  Upper: {upper_low}")
print()
print("High disagreement ensemble:")
print(f"  Average interval width: {avg_width_high:.4f}")
print(f"  Lower: {lower_high}")
print(f"  Upper: {upper_high}")
print()
print(f"P_high has wider bounds: {avg_width_high > avg_width_low}")

As expected, the high-disagreement ensemble has much wider intervals: its average width is 0.4000, compared with 0.0133 for the low-disagreement ensemble.
When ensemble members disagree more, the gap between lower and upper bounds grows, reflecting higher uncertainty.


### Experiment 2: Effect of number of ensemble members

We generate ensembles with different numbers of members, all centered around the same distribution but with small random variations.
This shows how the number of members affects the bounds we observe.


In [None]:
# Center distribution
c = np.array([0.6, 0.3, 0.1])

# Generate ensembles with different member counts
member_counts = [3, 5, 20]
scale = 0.05  # Noise scale

results = []
for n in member_counts:
    # Generate noise and add to center
    noise = np.random.normal(0, scale, size=(n, 3))
    P_n = c + noise
    # Clip to nonnegative and renormalize
    P_n = np.clip(P_n, 0, None)
    P_n = P_n / P_n.sum(axis=1, keepdims=True)
    validate_probs(P_n)

    # Compute envelopes
    lower_n, upper_n = envelope(P_n)
    width_n = upper_n - lower_n
    avg_width_n = width_n.mean()

    results.append((n, avg_width_n, lower_n, upper_n))
    print(f"n={n:2d} members: average interval width = {avg_width_n:.4f}")

print()
print("Summary: More members can reveal more extreme values, affecting bounds.")

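The bounds depend on both the variability of the distributions and the sample size (number of members).
With more members, we are more likely to observe extreme values, which tends to widen the lower/upper bounds.
Because each ensemble here is sampled independently, the widths are not guaranteed to increase with `n`; the trend holds on average rather than for every draw.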
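
The effect is easiest to see when the ensembles are nested, that is, when a larger ensemble contains all members of a smaller one. In that case a minimum can only go down and a maximum can only go up, so the interval width never shrinks. The sketch below is a variation on the experiment, not a replacement for it: it draws one set of 20 members and looks at the first `k` of them. The seed and the variable names are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
center = np.array([0.6, 0.3, 0.1])

# One ensemble of 20 members: noisy copies of the center,
# clipped to nonnegative values and renormalized to sum to 1
samples = np.clip(center + rng.normal(0.0, 0.05, size=(20, 3)), 0.0, None)
samples = samples / samples.sum(axis=1, keepdims=True)

# Row k-1 holds the envelope of the first k members
running_lower = np.minimum.accumulate(samples, axis=0)
running_upper = np.maximum.accumulate(samples, axis=0)
avg_width_by_k = (running_upper - running_lower).mean(axis=1)

for k in (3, 5, 20):
    print(f"first {k:2d} members: average interval width = {avg_width_by_k[k - 1]:.4f}")

print("Width never shrinks:", bool(np.all(np.diff(avg_width_by_k) >= 0)))
```

With a single member the width is exactly zero, and every added member either leaves the envelope unchanged or pushes one of its bounds outward.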


### Experiment 3: Averaging loses information

A common approach is to average ensemble predictions into a single distribution.
However, this hides the disagreement between members.
We show that the average lies within the bounds but does not capture the uncertainty range.


In [None]:
# Use P_high from Experiment 1
# Compute the mean (averaged) distribution
p_mean = P_high.mean(axis=0)

# Verify p_mean is a valid distribution
validate_probs(p_mean.reshape(1, -1))

# Get envelopes for P_high
lower_high, upper_high = envelope(P_high)

print("Averaged distribution (p_mean):")
print(p_mean)
print()
print("Credal set bounds (lower, upper):")
for class_idx in range(3):
    lo = lower_high[class_idx]
    up = upper_high[class_idx]
    pm = p_mean[class_idx]
    inside = lo <= pm <= up
    print(f"  Class {class_idx}: [{lo:.3f}, {up:.3f}]")
    print(f"    p_mean[{class_idx}] = {pm:.3f} (inside bounds: {inside})")
print()
print("The average lies within the bounds but does not show the width of uncertainty.")
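
To make the information loss explicit, the following sketch builds two tiny ensembles that have exactly the same average but very different envelopes. Both matrices (`P_agree` and `P_split`) are hypothetical examples constructed for this purpose.

```python
import numpy as np

# Two hypothetical 2-member ensembles over 3 classes
P_agree = np.array([[0.5, 0.3, 0.2], [0.5, 0.3, 0.2]])  # identical members
P_split = np.array([[0.8, 0.1, 0.1], [0.2, 0.5, 0.3]])  # members disagree

mean_agree = P_agree.mean(axis=0)
mean_split = P_split.mean(axis=0)

width_agree = P_agree.max(axis=0) - P_agree.min(axis=0)
width_split = P_split.max(axis=0) - P_split.min(axis=0)

print("Same average:", bool(np.allclose(mean_agree, mean_split)))
print("Widths (agree):", width_agree)
print("Widths (split):", width_split)
```

Anyone who sees only the averaged distribution cannot tell these two ensembles apart, while the lower/upper intervals separate them immediately.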