# Adaptive Prediction Sets (APS)

### The Problem of Uncertainty in Machine Learning Predictions


Traditional ML models return a prediction without any information about its **uncertainty or reliability**. That information is **crucial** in safety-critical applications, which need **guaranteed coverage**.

One example is an autonomous driving application. It is not enough to predict "It is a pedestrian"; the application has **to be confident** that the object on the road is "a pedestrian, cyclist or traffic sign" in order to prevent serious consequences.


### Core Idea: Conformal Prediction

While traditional ML models provide only point predictions, APS provides prediction sets.

Traditional: Model predicts "class 3"

APS: Model predicts "{2, 3, 5}", a set that contains the true class with probability of at least 90%
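
To make the contrast concrete, here is a minimal sketch that turns one probability vector into both a point prediction and a prediction set. The probability values and the fixed 0.9 threshold are illustrative assumptions. In APS the threshold is derived from calibration data rather than set by hand.

```python
import numpy as np

# Hypothetical predicted probabilities for classes 0..5 (illustrative values only)
probs = np.array([0.02, 0.03, 0.30, 0.45, 0.02, 0.18])

# Point prediction: the single most likely class
point_prediction = int(np.argmax(probs))

# Prediction set: add classes from most to least likely until the
# cumulative probability reaches the threshold (fixed by hand here)
threshold = 0.9
order = np.argsort(probs)[::-1]
cumulative = np.cumsum(probs[order])
n_included = int(np.searchsorted(cumulative, threshold)) + 1
prediction_set = sorted(order[:n_included].tolist())

print(f"Point prediction: class {point_prediction}")
print(f"Prediction set:   {prediction_set}")
```

The point prediction keeps only class 3, while the set also keeps classes 2 and 5, which together carry a substantial share of the probability mass.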

### Marginal Coverage

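Marginal coverage is the statistical guarantee that the prediction set contains the true label with probability at least $1-\alpha$. The guarantee holds on average over all test points, not for each individual input.

$$\mathbb{P}[Y_{n+1} \in \hat{\mathcal{C}}_{n,\alpha}(X_{n+1})] \geq 1-\alpha$$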

Where:
- $\mathbb{P}[\cdot]$: Probability operator
- $Y_{n+1}$: Unknown true label we want to predict
- $\hat{\mathcal{C}}_{n,\alpha}(\cdot)$: Prediction set function that maps features to a set of candidate labels
- $X_{n+1}$: Observed features of the test point
- $1-\alpha$: Target probability of coverage

Let's look at a simple coverage test, where $\alpha=0.1$.

In [1]:
test_images = 10
alpha = 0.1
target_coverage = 1 - alpha

results = [1, 1, 1, 1, 1, 1, 1, 0, 1, 1]  # 9 out of 10 correct

coverage = sum(results) / len(results)

print(f"Number of test images: {test_images}")
print(f"Target coverage: {target_coverage} (90%)")
print(f"Actual coverage: {coverage} ({coverage * 100}%)")
print(f"Coverage >= Target coverage: {coverage >= target_coverage}")
print(f"Calculation: {sum(results)} / {len(results)} = {coverage}")

Number of test images: 10
Target coverage: 0.9 (90%)
Actual coverage: 0.9 (90.0%)
Coverage >= Target coverage: True
Calculation: 9 / 10 = 0.9
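
The results list above is written by hand. In practice each entry records whether a prediction set contains the true label. The following sketch, with made-up prediction sets and labels (both hypothetical), shows how that indicator and the empirical coverage could be computed.

```python
# Hypothetical prediction sets and true labels for five test points
prediction_sets = [{0}, {1, 2}, {2}, {0, 1}, {1}]
true_labels = [0, 2, 1, 1, 1]

# 1 if the true label falls inside the set, 0 otherwise
covered = [int(y in s) for s, y in zip(prediction_sets, true_labels)]

# Empirical estimate of P[Y in C(X)]
empirical_coverage = sum(covered) / len(covered)

print(f"Covered indicators: {covered}")
print(f"Empirical coverage: {empirical_coverage}")
```

With so few test points the estimate is noisy. The guarantee in the formula refers to the average over many test points.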


### The APS Algorithm

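APS is built on the following conformity score:
$$ E(x,y,u;\hat{\pi}) = \min\{\tau \in [0,1] : y \in \mathcal{S}(x,u;\hat{\pi},\tau)\} $$

Here $\hat{\pi}$ denotes the model's estimated class probabilities, $u$ is a uniform random variable used for randomization, and $\mathcal{S}$ is the prediction set at threshold $\tau$. The conformity score measures the **minimum probability threshold** at which the true label would be included in the prediction set, quantifying how **"surprised"** the model is by the correct answer.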

A simple implementation of the APS conformity score follows:

In [2]:
import numpy as np

def aps_conformity_score(probabilities: np.ndarray, true_labels: np.ndarray) -> np.ndarray:
    """
    E(x,y,u;π̂) = min{τ ∈ [0,1] : y ∈ S(x,u;π̂,τ)}

    Finds the minimum threshold τ at which the true class
    is included in the prediction set.
    Deterministic variant: the randomization term u is not used.
    """
    conformity_scores = []

    for true_label, probs in zip(true_labels, probabilities):
        # Sort class indices by predicted probability, descending
        sorted_indices = np.argsort(probs)[::-1]
        sorted_probs = probs[sorted_indices]

        # Running total of probability mass along the sorted order
        cumulative_probs = np.cumsum(sorted_probs)

        # Rank (0-based) of the true class in the sorted order
        true_class_pos = np.where(sorted_indices == true_label)[0][0]

        # Score is the cumulative probability up to and including the true class
        aps_score = cumulative_probs[true_class_pos]
        conformity_scores.append(aps_score)

    return np.array(conformity_scores)

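The inputs are probabilities (the model's predicted class probabilities, one row per sample) and true_labels (the correct class index of each sample).

For each sample, sort the classes by probability and find the **cumulative probability at which the true class is included**.

A high conformity score means the model is "surprised" by the true label: the true class sits far down the ranking, so a lot of probability mass must be accumulated before it is included.

Mathematical interpretation: the score is the minimum probability threshold τ at which the true class y is included in the prediction set S(x,τ).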
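
A small worked example may help to read the score. The sketch below uses a compact per-sample helper, `aps_score` (a hypothetical name, not part of the notebook), and three made-up probability vectors in which the true class is ranked first, second, and last.

```python
import numpy as np

def aps_score(probs: np.ndarray, true_label: int) -> float:
    """Cumulative probability, in descending order, up to and including the true class."""
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    position = int(np.where(order == true_label)[0][0])
    return float(cumulative[position])

# Hypothetical model outputs for three samples over three classes
probabilities = np.array([
    [0.7, 0.2, 0.1],   # true class 0 is ranked first
    [0.5, 0.3, 0.2],   # true class 1 is ranked second
    [0.6, 0.3, 0.1],   # true class 2 is ranked last
])
true_labels = [0, 1, 2]

scores = [aps_score(p, y) for p, y in zip(probabilities, true_labels)]
for y, s in zip(true_labels, scores):
    print(f"true class {y}: score = {s:.2f}")
```

The scores come out at roughly 0.7, 0.8, and 1.0: the further down the ranking the true class sits, the more probability mass has to be accumulated before it is reached.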

### APS Implemented on the Iris Dataset

1. What is the Iris Dataset?
The Iris dataset is a classic benchmark dataset that contains 150 samples of iris flowers from three species, each described by four features. Its small size makes it well suited for testing and teaching algorithms such as APS through examples.

Here is an implementation of APS on the Iris dataset:


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 
from sklearn.datasets import load_iris 
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from typing import Tuple, List, Dict, Any
import pandas as pd

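class APSIrisImplementation:
    """APS on the Iris dataset: data loading, model training, scoring, prediction sets."""

    def __init__(self, significance_level: float = 0.1):
        self.significance_level = significance_level  # alpha
        self.model = None
        self.quantile = None
        self.calibration_scores = None

    def load_prepare_data(self) -> Tuple[Any, ...]:
        """Load Iris and split it into train (60%), calibration (20%), and test (20%) sets."""
        iris = load_iris()
        X, y = iris.data, iris.target
        feature_names = iris.feature_names
        target_names = iris.target_names

        # Hold out 20% for testing, then split the rest 75/25 into train and calibration
        X_temp, X_test, y_temp, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42, stratify=y
        )
        X_train, X_calib, y_train, y_calib = train_test_split(
            X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp
        )

        return X_train, X_calib, X_test, y_train, y_calib, y_test, feature_names, target_names

    def train_model(self, X_train: np.ndarray, y_train: np.ndarray) -> RandomForestClassifier:
        """Train the base classifier."""
        self.model = RandomForestClassifier(n_estimators=100, random_state=42)
        self.model.fit(X_train, y_train)
        return self.model

    def aps_conformity_score(self, probabilities: np.ndarray, true_labels: np.ndarray) -> np.ndarray:
        """Cumulative probability, in descending order, up to and including the true class."""
        scores = []
        for true_label, probs in zip(true_labels, probabilities):
            sorted_indices = np.argsort(probs)[::-1]
            cumulative_probs = np.cumsum(probs[sorted_indices])
            true_class_pos = np.where(sorted_indices == true_label)[0][0]
            scores.append(cumulative_probs[true_class_pos])
        return np.array(scores)

    def calculate_prediction_sets(self, probabilities: np.ndarray, tau: float) -> List[List[int]]:
        """Build S(x; π̂, τ): add classes in descending probability until the cumulative mass reaches tau."""
        # The original cell breaks off at this signature. This body is a reconstruction
        # that mirrors the conformity score above (deterministic, no randomization u).
        prediction_sets = []
        for probs in probabilities:
            sorted_indices = np.argsort(probs)[::-1]
            cumulative_probs = np.cumsum(probs[sorted_indices])
            # Smallest prefix whose cumulative probability is >= tau, capped at all classes
            n_included = min(int(np.searchsorted(cumulative_probs, tau)) + 1, len(probs))
            prediction_sets.append(sorted_indices[:n_included].tolist())
        return prediction_sets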
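
The class stores a quantile and calibration scores, but the notebook ends before either is computed. The sketch below shows one way the remaining steps (calibration, set construction, coverage check) could look. It rests on several assumptions: the helper names `aps_scores`, `conformal_threshold`, and `prediction_sets` are hypothetical, the threshold uses the usual split-conformal rule of taking the k-th smallest calibration score with k = ceil((n + 1)(1 - alpha)), and synthetic probabilities stand in for the `predict_proba` output of the trained model so that the block runs with NumPy alone.

```python
import numpy as np

def aps_scores(probabilities: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Deterministic APS score: cumulative probability up to and including the true class."""
    order = np.argsort(probabilities, axis=1)[:, ::-1]
    cumulative = np.cumsum(np.take_along_axis(probabilities, order, axis=1), axis=1)
    positions = np.argmax(order == labels[:, None], axis=1)
    return cumulative[np.arange(len(labels)), positions]

def conformal_threshold(scores: np.ndarray, alpha: float) -> float:
    """k-th smallest calibration score with k = ceil((n + 1) * (1 - alpha))."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return float(np.sort(scores)[k - 1]) if k <= n else 1.0

def prediction_sets(probabilities: np.ndarray, tau: float) -> list:
    """Add classes in descending probability until the cumulative mass reaches tau."""
    sets = []
    for probs in probabilities:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        n_included = min(int(np.searchsorted(cumulative, tau)) + 1, len(probs))
        sets.append(order[:n_included].tolist())
    return sets

# Synthetic stand-in for predict_proba output (assumption): labels are drawn from
# the predicted distribution, so the "model" is well calibrated by construction.
rng = np.random.default_rng(0)
n_calib, n_test, n_classes, alpha = 1000, 2000, 3, 0.1
probs_all = rng.dirichlet(0.3 * np.ones(n_classes), size=n_calib + n_test)
labels_all = np.array([rng.choice(n_classes, p=p) for p in probs_all])

probs_calib, probs_test = probs_all[:n_calib], probs_all[n_calib:]
labels_calib, labels_test = labels_all[:n_calib], labels_all[n_calib:]

# Calibration: derive the threshold from the calibration scores
tau = conformal_threshold(aps_scores(probs_calib, labels_calib), alpha)

# Prediction and evaluation on the held-out points
sets_test = prediction_sets(probs_test, tau)
coverage = float(np.mean([y in s for s, y in zip(sets_test, labels_test)]))
average_size = float(np.mean([len(s) for s in sets_test]))

print(f"tau = {tau:.3f}, coverage = {coverage:.3f}, average set size = {average_size:.2f}")
```

To apply the same steps to Iris, the synthetic arrays would be replaced by the model's predicted probabilities on the calibration and test splits. Because this deterministic variant ignores the randomization term u, the empirical coverage tends to sit above the 1 - alpha target rather than exactly on it.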