In [None]:
from IPython.core.display import HTML

*Authors: Gerard Castro, Leonardo Bocchi* <br>
Public repo: https://github.com/gcastro-98/conformal-bayesian.git
<div align="center">
  <h1>Bayesian Statistics: Final Project</h1>
  <h2>Conformal Bayesian Computation</h2>
</div>
<br>

Conformal Bayesian Computation (CBC) refers to a framework that combines elements of both conformal prediction and Bayesian inference.

Conformal prediction is a machine learning approach that provides valid measures of confidence or credibility for predictions made by a model. It constructs prediction regions or sets that contain future observations with a certain probability. These prediction regions are constructed based on the observed training data.

On the other hand, Bayesian inference is a statistical framework that provides a way to update prior beliefs about a quantity of interest based on observed data. It involves specifying prior beliefs, likelihood functions, and using Bayes' theorem to obtain posterior distributions.

Conformal Bayesian computation combines these two approaches by using conformal prediction to provide valid measures of uncertainty within a Bayesian framework. It leverages the flexibility and interpretability of Bayesian inference while incorporating the notion of confidence or credibility from conformal prediction. By using conformal Bayesian computation, it is possible to obtain probabilistic predictions with well-calibrated uncertainty estimates. This can be useful in various applications, such as regression, classification, and anomaly detection, where it is important to have reliable measures of uncertainty associated with predictions.

In this project we follow the work of [Fong, Edwin, and Chris C. Holmes. "Conformal bayesian computation." Advances in Neural Information Processing Systems 34 (2021): 18268-18279](https://arxiv.org/abs/2106.06137), going through the proposed methods and presenting the obtained results. In their work the authors show that CBC can provide accurate and well-calibrated measures of uncertainty in a variety of settings, including Bayesian linear regression, Bayesian logistic regression, and Bayesian neural networks. They also show that CBC can be used to construct credible intervals for the parameters of interest, and that these intervals have good frequentist coverage properties.

# 1. Introduction

CBC constructs prediction intervals or regions that cover the unobserved response $Y_{n+1}$ with a specified probability level.

In general, Bayesian prediction uses the posterior predictive distribution, which has the form

$$p(y|x_{n+1},Z_{1:n}) = \int f_{\theta}(y|x_{n+1})\pi(\theta |Z_{1:n})d\theta$$

where $\pi(\theta |Z_{1:n})$ is the Bayesian posterior, and $f_{\theta}(y|x)$ is the model likelihood. Given this Bayesian predictive distribution, it is possible to construct the highest-density posterior predictive credible interval as well as the central credible interval.
However, it is well known that Bayesian intervals can be poorly calibrated in the frequentist sense if the model is misspecified. <br>
Conformal inference is proposed as a solution to this issue, as a way of "de-Bayesing" and "conformalizing" the predictive intervals. <br> In their work, Edwin Fong and Chris Holmes present a scalable Monte Carlo (MC) method for *conformal Bayes* using an 'add-one-in' importance sampling algorithm that only requires samples of the model parameters from the posterior, $\theta \sim \pi(\theta|Z_{1:n})$. The authors also demonstrate how "the Bayesian hierarchical model allows for a natural sharing of information between groups for within-group predictions with covariates". They do so by developing computationally efficient methods in these settings.
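
The posterior predictive integral above is rarely available in closed form, but it can be approximated by averaging the likelihood over posterior samples. The following is a minimal sketch of that idea under stated assumptions: the one-dimensional Gaussian model, the simulated "posterior samples" (`theta_samples`), the known noise scale `tau` and all function names are hypothetical and chosen only for illustration; they are not taken from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior samples for the toy model y | x, theta ~ N(theta * x, tau^2).
# In practice these would come from MCMC; here they are simply simulated.
B = 4000
theta_samples = rng.normal(loc=1.5, scale=0.1, size=B)
tau = 0.5  # noise scale, assumed known in this toy example


def normal_pdf(y, mean, scale):
    return np.exp(-0.5 * ((y - mean) / scale) ** 2) / (scale * np.sqrt(2 * np.pi))


def posterior_predictive_pdf(y_grid, x_new):
    # (1/B) * sum_b f_{theta_b}(y | x_new), evaluated at every grid point
    means = theta_samples * x_new
    return normal_pdf(y_grid[:, None], means[None, :], tau).mean(axis=1)


y_grid = np.linspace(-2.0, 5.0, 701)
dy = y_grid[1] - y_grid[0]
density = posterior_predictive_pdf(y_grid, x_new=1.0)

# Central (1 - alpha) credible interval read off the approximate predictive CDF
alpha = 0.2
cdf = np.cumsum(density) * dy
lower = y_grid[np.searchsorted(cdf, alpha / 2)]
upper = y_grid[np.searchsorted(cdf, 1 - alpha / 2)]
```

The interval `[lower, upper]` is a Bayesian credible interval: its frequentist coverage depends on the model being well specified, which is the gap that the conformal approach aims to close.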

## 1.1 Background

Conformal inference methods were first introduced by Gammerman et al. (1998), but full conformal prediction is computationally expensive: it requires retraining the model at each test covariate $x_{n+1}$ and for each value in a reference grid of potential outcomes. Other techniques have been proposed to reduce the computational cost, such as split conformal prediction and cross-conformal prediction; "a detailed discussion of computational costs of various conformal methods is provided in Barber et al. (2021, Section 4)".

# 2. Conformal Bayes

## 2.1 Full Conformal Prediction

Conformal prediction is a framework for constructing prediction intervals that come with guarantees of validity. The key idea is to use only the data to construct the intervals, so they are therefore "conformal" to the data.
The Full Conformal Prediction (FCP) approach proceeds as follows.

Define a conformity measure $\sigma$ (for instance, the negative squared error of a point prediction). For a candidate outcome $y$, set $Z_{n+1} = \{y, X_{n+1}\}$ and compute the conformity scores on the augmented dataset $Z_{1:n+1}$. The resulting $1-\alpha$ prediction set is

$$C_\alpha(X_{n+1}) = \{y \in \mathbb{R} : \pi(y) > \alpha\}$$

where $\pi(y)$ is the normalized rank of $\sigma_{n+1}$ among $\sigma_{1:n+1}$, defined as

$$\pi(y) = \frac{1}{n+1}\sum^{n+1}_{i=1} \mathbb{1}(\sigma_{i}\leq \sigma_{n+1})$$

and $\sigma_i := \sigma(Z_{1:n+1}; Z_i)$ is the conformity score of sample $i$.
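
The rank-based construction above can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses ordinary least squares as the point predictor and the negative squared residual as conformity measure, and the function name `full_conformal_region`, the simulated data and the grid are hypothetical. Note that the model is refitted for every candidate value on the grid, which is exactly the cost that makes full conformal prediction expensive.

```python
import numpy as np


def full_conformal_region(x, y, x_new, y_grid, alpha):
    """Boolean mask over y_grid marking the candidates kept in the 1 - alpha set."""
    n = len(y)
    # Design matrix (intercept + slope) for the augmented data set Z_{1:n+1}
    X_aug = np.column_stack([np.ones(n + 1), np.append(x, x_new)])
    keep = np.zeros(len(y_grid), dtype=bool)
    for k, y_cand in enumerate(y_grid):
        y_aug = np.append(y, y_cand)
        # Refit on the augmented data (ordinary least squares)
        coef, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
        sigma = -(y_aug - X_aug @ coef) ** 2      # conformity scores sigma_i
        rank = np.mean(sigma <= sigma[-1])        # pi(y)
        keep[k] = rank > alpha
    return keep


rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + rng.normal(scale=0.3, size=50)
y_grid = np.linspace(-4, 4, 161)
region = full_conformal_region(x, y, x_new=0.5, y_grid=y_grid, alpha=0.2)
```

Candidates close to the fitted line at `x_new` receive a high rank and are kept, while extreme candidates are ranked last and excluded.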

## 2.2 Conformal Bayes and Add-One-In Importance Sampling

Conformal Bayes (CB) adopts a natural choice of conformity score, the posterior predictive density:

$$\sigma(Z_{1:n+1};Z_i) = p(Y_i| X_i, Z_{1:n+1})$$

The authors' work highlights a crucial insight: refitting the Bayesian model on $\{Z_1, \ldots, Z_n, \{y, X_{n+1}\}\}$ can be approximated efficiently through Importance Sampling (IS), because only $\{y, X_{n+1}\}$ changes between refits. This avoids rerunning posterior inference for every candidate $y$ and makes the predictive densities cheap to compute. It leads directly to an IS-based approach to full conformal Bayes, built on "Add-One-In" (AOI) predictive densities. The term "AOI" refers to the inclusion of $\{Y_{n+1}, X_{n+1}\}$ in the training set, by analogy with "leave-one-out" (LOO) cross-validation.
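
To make the importance sampling step concrete, write $\theta^{(1)}, \ldots, \theta^{(B)}$ for samples from the posterior $\pi(\theta|Z_{1:n})$ (the superscript notation and the symbol $B$ are chosen here for illustration). Since $\pi(\theta|Z_{1:n+1}) \propto \pi(\theta|Z_{1:n})\, f_\theta(y|X_{n+1})$, the likelihood of the candidate point can serve as an importance weight. A sketch of the resulting self-normalized estimator, for a candidate outcome $y$, is

$$w^{(k)}(y) = f_{\theta^{(k)}}(y|X_{n+1}), \qquad \tilde{w}^{(k)}(y) = \frac{w^{(k)}(y)}{\sum_{k'=1}^{B} w^{(k')}(y)}$$

$$\hat{p}(Y_i|X_i, Z_{1:n+1}) = \sum_{k=1}^{B} \tilde{w}^{(k)}(y)\, f_{\theta^{(k)}}(Y_i|X_i), \qquad i = 1, \ldots, n+1$$

with $Y_{n+1} = y$. These estimates play the role of the conformity scores $\sigma_i$, so the rank $\pi(y)$ can be evaluated for every $y$ on the grid from a single set of posterior samples.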

The use of AOI importance sampling exhibits similarities with the computation of Bayesian leave-one-out cross-validation (LOOCV) predictive densities (Vehtari et al., 2017), which is also employed to account for model misspecification. However, an interesting aspect of AOI, in comparison to LOO, is its ability to generate predictive densities that are less susceptible to instability in the importance weights.
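
The following toy example sketches AOI importance sampling end to end, together with the effective sample size (ESS) of the normalized weights as a simple stability diagnostic. It is an illustration under stated assumptions rather than the authors' implementation: the conjugate normal-mean model (chosen so that posterior samples can be drawn directly instead of by MCMC), the helper names `log_lik` and `aoi_rank_and_ess`, and the grid are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, B, alpha = 40, 5000, 0.2
y = rng.normal(loc=1.0, scale=1.0, size=n)

# Conjugate posterior for theta under y_i ~ N(theta, 1), theta ~ N(0, 10^2)
post_var = 1.0 / (n + 1.0 / 100.0)
post_mean = post_var * y.sum()
theta = rng.normal(post_mean, np.sqrt(post_var), size=B)  # samples from pi(theta | Z_1:n)


def log_lik(y_vals, theta_samples):
    # log N(y | theta, 1) for every (sample, observation) pair -> shape (B, len(y_vals))
    diff = np.atleast_1d(y_vals)[None, :] - theta_samples[:, None]
    return -0.5 * diff ** 2 - 0.5 * np.log(2 * np.pi)


def aoi_rank_and_ess(y_cand):
    logw = log_lik(y_cand, theta)[:, 0]       # log importance weights
    w = np.exp(logw - logw.max())             # subtract the max for numerical stability
    w_tilde = w / w.sum()                     # self-normalized weights
    p_train = w_tilde @ np.exp(log_lik(y, theta))   # AOI predictive at each Y_i
    p_new = w_tilde @ np.exp(logw)                  # AOI predictive at the candidate
    scores = np.append(p_train, p_new)
    rank = np.mean(scores <= scores[-1])      # normalized rank pi(y)
    ess = 1.0 / np.sum(w_tilde ** 2)          # effective sample size of the weights
    return rank, ess


y_grid = np.linspace(-4, 6, 201)
ranks, ess = np.array([aoi_rank_and_ess(v) for v in y_grid]).T
region = ranks > alpha
```

An ESS close to `B` indicates well-behaved weights; a small ESS for some candidate `y` signals that the estimate at that grid point relies on only a few posterior samples.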






## 2.4 Motivation

The motivation/purpose of the authors' work is stated as follows:

Considerable literature has delved into the differing foundations and interpretations of uncertainty measures between Bayesian and frequentist approaches (Little, 2006; Shafer and Vovk, 2008; Bernardo and Smith, 2009; Wasserman, 2011). A summary of these discussions can be found in the Appendix. In this context, we present the motivation for Conformal Bayesian (CB) predictive intervals from both Bayesian and frequentist perspectives.

Pragmatic Bayesians, who acknowledge the potential for model misspecification in either the prior or likelihood, may find value in employing conformal inference as a safeguard. CB predictive intervals, which guarantee frequentist coverage, can be offered as a complement to the traditional Bayesian predictive intervals. Moreover, the disparity between the Bayesian and conformal intervals can serve as an informal tool for model evaluation (e.g., Gelman et al., 2013). Given that posterior samples obtained through Markov chain Monte Carlo (MCMC) or direct sampling are typically accessible, CB using automated AOI incurs minimal additional computational overhead.

The frequentist perspective also offers the possibility of utilizing a Bayesian model as a valuable tool for constructing predictive confidence intervals. There are several advantages to this approach. Firstly, unlike the traditional residual conformity score, the likelihood function in the Bayesian model can account for factors such as skewness and heteroscedasticity. This allows for a more comprehensive representation of the data. Secondly, incorporating features like sparsity, support, and regularization can be achieved through the use of priors in the Bayesian model. This flexibility allows for the inclusion of additional information and prior beliefs, while ensuring that Conformal Bayesian (CB) methodology maintains correct coverage of the predictive intervals.

However, a subtle issue arises in full conformal prediction: validity is compromised if hyperparameter selection is not symmetric with respect to $Z_{n+1}$. This happens, for example, if the lasso penalty $\lambda$ is estimated using only $Z_{1:n}$ before computing the full conformal intervals with $\lambda(Z_{1:n})$. In contrast, CB addresses this issue by incorporating a prior distribution on hyperparameters. This prior induces a weighting of hyperparameter values through implicit cross-validation for each refit. This approach is supported by previous research (Gneiting and Raftery, 2007; Fong and Holmes, 2020). It is worth noting that this issue does not affect the split conformal method.

## 3.1 Group Conformal Prediction

We now introduce the concept of Group Conformal Prediction (GCP), which is a modification to CBC that allows for prediction sets that cover the true posterior distribution of the parameters within each group with a specified probability level; GCP is a special case of PCP.

GCP consists of the following steps:

1. Divide the observed data into groups.
2. Use a Bayesian model to generate a set of $M$ posterior samples $\{θ^m\}_{m=1}^M$ for each group, which approximate the posterior distribution of the parameters of interest within each group.
3. For each observed data point $y$ in each group, compute the conformal p-value $P(y,θ^m)$ for each posterior sample $θ^m$ within that group, using the same formula as in CBC.
4. Construct the prediction set $C(y)$ as $C(y) = \{θ : P(y,θ) \geq \alpha\}$, where $\alpha$ is the pre-specified significance level, but only for each group separately.

Note that GCP can be combined with CBC, CB, and AOIS to handle models with group structure in a wide range of applications.

# 4. Experiments

### 4.0. Necessary imports

In [None]:
import os
from os import makedirs
from sklearn.datasets import load_diabetes, load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LassoCV, Lasso
from sklearn.linear_model import LogisticRegressionCV, LogisticRegression
import time
import numpy as np
import pandas as pd
from tqdm import tqdm
import seaborn as sns
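import matplotlib.pyplot as plt
# Probabilistic programming and JAX dependencies used by the code below
import pymc3 as pm
import jax.numpy as jnp
from jax import jit
from jax.scipy.stats import norm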

Google Colab does not provide a suitable environment to run the scripts in the cloud. However, the code is displayed below, and the results obtained locally were uploaded so that they can be presented in what follows.

## 4.1 Sparse Regression

The authors apply the CBC framework to sparse regression with the [`sklearn` diabetes dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html): a regression problem with $d=10$ predictors (covariates) and $n=442$ samples.

For reproducibility, all the predictors and the response variable (diabetes progression, continuous) were normalized to have mean $0$ and standard deviation $1$. We will compute the central $(1-\alpha)$ credible interval from the Bayesian posterior predictive CDF estimated using Monte Carlo.

The Bayesian model considered is: $$
\begin{aligned}
& f_\theta(y \mid x)=\mathcal{N}\left(y \mid \theta^{\mathrm{T}} x+\theta_0, \tau^2\right) \\
& \pi\left(\theta_j\right)=\text { Laplace }(0, b), \quad \pi\left(\theta_0\right) \propto 1, \quad \pi(b)=\operatorname{Gamma}(1,1), \quad \pi(\tau)=\mathcal{N}^{+}(0, c)
\end{aligned}
$$
for $j = 1, \ldots , d$, and where $b$ is the scale parameter and $\mathcal{N}^{+}$ is the half-normal distribution.

In the experiment we will consider two
values of $c$ for the hyperprior on $\tau$, which correspond to:
- A **well-specified** prior for this dataset, $c = 1$, according to Jansen, 2013 (Chapter 4.5)
- A **poorly-specified** prior (`misspec`), $c = 0.02$, in which the posterior on $\tau$ will be heavily weighted towards a small value.
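
To get a feel for the two settings, one can simulate from the priors directly. The sketch below assumes that $c$ is the scale (standard deviation) of the half-normal, as in the `pm.HalfNormal(sigma=...)` call of the code in this section, and that $\operatorname{Gamma}(1,1)$ has unit rate; the function name `sample_prior` and the number of draws are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)


def sample_prior(c, d=10, size=10_000):
    # b ~ Gamma(1, 1): scale of the Laplace prior on the coefficients
    b = rng.gamma(shape=1.0, scale=1.0, size=size)
    # theta_j ~ Laplace(0, b), one row of d coefficients per draw of b
    theta = rng.laplace(loc=0.0, scale=b[:, None], size=(size, d))
    # tau ~ N^+(0, c): absolute value of a zero-mean normal with scale c
    tau = np.abs(rng.normal(loc=0.0, scale=c, size=size))
    return b, theta, tau


_, _, tau_well = sample_prior(c=1.0)
_, _, tau_misspec = sample_prior(c=0.02)
print(tau_well.mean(), tau_misspec.mean())
```

Because the response is standardized to unit variance, a prior that concentrates $\tau$ around a few hundredths, as with $c = 0.02$, pushes the model towards a much smaller noise level than the data support, which is what makes this setting poorly specified.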

To check coverage, we repeatedly divide the data into a training and a test dataset over 50 repeats, with 30% of the dataset in the test split. We evaluate the conformal prediction set on a grid of size $n_{grid} = 100$ over $[y_{\min} - 2, y_{\max} + 2]$, where $y_{\min}$ and $y_{\max}$ are computed from each training dataset.
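
Since the conformal set is only evaluated on a grid, coverage and length are grid approximations. The sketch below shows how they can be read off a boolean mask over the grid, mirroring the `np.argmin` / `np.sum(...) * dy` pattern used in the code of this section; the helper name `grid_coverage_and_length` and the example region are hypothetical.

```python
import numpy as np


def grid_coverage_and_length(region, y_grid, y_true):
    dy = y_grid[1] - y_grid[0]
    # Coverage: is the grid point closest to the true response inside the region?
    covered = bool(region[np.argmin(np.abs(y_grid - y_true))])
    # Length: number of accepted grid points times the grid spacing
    length = region.sum() * dy
    return covered, length


y_grid = np.linspace(-3.0, 3.0, 100)
region = np.abs(y_grid - 0.2) < 1.0   # hypothetical region, roughly [-0.8, 1.2]
covered, length = grid_coverage_and_length(region, y_grid, y_true=0.4)
```

Both quantities are accurate only up to the grid spacing `dy`, which is why a finer grid or an exact check at the observed test response can be used to avoid grid effects.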

### Code

In [None]:
"""
Summary of the Fong & Holmes (2021) implementation on Conformal Bayesian
Computation. Specifically, this module deals with the sparse regression
(and its uncertainty quantification) assessed for the `sklearn`:
- Diabetes dataset (diabetes)

The following 4 methods are implemented:
- Bayesian inference (bayes)
- Conformal bayes (cb)
- Split conformal prediction (split)
- Full conformal prediction (full)

A brief summary of the methods can be found below:
- 'bayes': This method uses the posterior samples of the model parameters to
compute the posterior predictive CDF of the response on a grid (by Monte
Carlo). It reads the central 1 - alpha credible interval off that CDF.
- 'split': This method fits a LASSO regression model (`LassoCV`) to the first
half of the training data, then computes the absolute residuals on the second
half of the training data. It uses a quantile of the residuals as the
half-width of the prediction intervals, which are centred on the LASSO
predictions for the testing data.
- 'full': For each candidate response on the grid, this method fits a LASSO
regression model to the training data augmented with the test point, then
computes the rank of the test point's absolute residual among all residuals.
Candidates whose normalized rank exceeds alpha form the prediction region.
- 'cb': uses the posterior samples from MCMC, reweighted by importance
sampling, to define conformal prediction regions.


The execution chain is as follows:
- The data is loaded using `load_train_test_sparse_regression`
- The Bayesian inference (bayes) is performed through the function
`run_sparse_regression_mcmc` (which requires the MCMC computations
defined at `fit_mcmc_laplace`).
- Then, using the former results, the Conformal Bayesian method
(along with the 3 other methods) is applied using the function
`run_sparse_regression_conformal`:
  - The 'split' and 'full' conformal baselines are defined in the functions
`conformal_split` and `conformal_full`, respectively.
  - The 'cb' method is implemented in the function `compute_cb_region_IS`
  and it also uses the `normal_loglikelihood` helper defined inside the loop.
- Once launched the main function `run_sparse_regression_conformal`, it
loads the training and testing data and the posterior samples of the model
parameters (computed because `run_sparse_regression_mcmc` was run first).
- Then it iterates through a specified number of repetitions, applying each
of the four methods to compute prediction intervals and coverage probabilities
for each test point.
- For each repetition, the function applies the split method, the full method,
the Bayesian method, and the conformal Bayes method to compute the prediction
intervals and coverage probabilities. It also records the computation time
for each method.
- At the end of the function, these results are saved to various files,
including the coverage probabilities, the lengths of the prediction
intervals, and the computation times for each method.

"""


# #############################################################################
# LOAD DATA (sparse regression)
# #############################################################################

def load_train_test_sparse_regression(train_frac, dataset, seed):
    # Load dataset
    if dataset == "diabetes":
        x, y = load_diabetes(return_X_y=True)
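    elif dataset == "boston":
        # `load_boston` was removed in scikit-learn 1.2, so it is imported
        # lazily: this branch only works with older scikit-learn versions
        from sklearn.datasets import load_boston
        x, y = load_boston(return_X_y=True)
    else:
        raise ValueError("Invalid dataset: {}".format(dataset))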

    n = np.shape(x)[0]
    d = np.shape(x)[1]

    # Standardize beforehand (for validity)
    x = (x - np.mean(x, axis=0)) / np.std(x, axis=0)
    y = (y - np.mean(y)) / np.std(y)

    # Train test split
    ind_train, ind_test = train_test_split(np.arange(n),
                                           train_size=int(train_frac * n),
                                           random_state=seed)
    x_train = x[ind_train]
    y_train = y[ind_train]
    x_test = x[ind_test]
    y_test = y[ind_test]

    y_plot = np.linspace(np.min(y_train) - 2, np.max(y_train) + 2, 100)

    return x_train, y_train, x_test, y_test, y_plot, n, d


# #############################################################################
# BAYESIAN INFERENCE (MCMC)
# #############################################################################

# Laplace prior PyMC3 model
def fit_mcmc_laplace(y, x, B, seed=100, misspec: bool = False):
    with pm.Model() as _:
        p = np.shape(x)[1]
        # Laplace
        b = pm.Gamma('b', alpha=1, beta=1)
        beta = pm.Laplace('beta', mu=0, b=b, shape=p)
        intercept = pm.Flat('intercept')
        if misspec:
            sigma = pm.HalfNormal("sigma", sigma=0.02)  # misspec prior
        else:
            sigma = pm.HalfNormal("sigma", sigma=1)
        obs = pm.Normal('obs', mu=pm.math.dot(x, beta) + intercept,
                        sigma=sigma, observed=y)
        trace = pm.sample(B, random_seed=seed, chains=4)

    beta_post = trace['beta']
    intercept_post = trace['intercept'].reshape(-1, 1)
    sigma_post = trace['sigma'].reshape(-1, 1)
    b_post = trace['b'].reshape(-1, 1)
    print(np.mean(sigma_post))  # check misspec.

    return beta_post, intercept_post, b_post, sigma_post


# #############################################################################
# APPLICATION OF THE BAYESIAN INFERENCE (MCMC)
# #############################################################################

# Repeat 50 MCMC runs for different train/test splits
def run_sparse_regression_mcmc(dataset, misspec: bool = False):
    # Repeat over 50 reps
    rep = 50
    train_frac = 0.7
    B = 2000

    # Initialize
    x, y, x_test, y_test, y_plot, n, d = load_train_test_sparse_regression(
        train_frac, dataset, 100)

    beta_post = np.zeros((rep, 4 * B, d))
    intercept_post = np.zeros((rep, 4 * B, 1))
    b_post = np.zeros((rep, 4 * B, 1))
    sigma_post = np.zeros((rep, 4 * B, 1))
    times = np.zeros(rep)

    for j in tqdm(range(rep)):
        seed = 100 + j
        x, y, x_test, y_test, y_plot, n, d = load_train_test_sparse_regression(
            train_frac, dataset, seed)
        start = time.time()
        beta_post[j], intercept_post[j], b_post[j], sigma_post[
            j] = fit_mcmc_laplace(y, x, B, seed, misspec)
        end = time.time()
        times[j] = (end - start)

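    # Save posterior samples
    # NOTE: the suffix is inverted with respect to the flag: samples drawn
    # with the misspecified prior (misspec=True) are saved WITHOUT "_misspec".
    # `run_sparse_regression_conformal` loads files with the same convention.
    if misspec:
        suffix = dataset
    else:
        suffix = dataset + "_misspec"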

    print("{}: {} ({})".format(suffix, np.mean(times),
                               np.std(times) / np.sqrt(rep)))

    np.save("samples/beta_post_sparsereg_{}".format(suffix), beta_post)
    np.save("samples/intercept_post_sparsereg_{}".format(suffix),
            intercept_post)
    np.save("samples/b_post_sparsereg_{}".format(suffix), b_post)
    np.save("samples/sigma_post_sparsereg_{}".format(suffix), sigma_post)
    np.save("samples/times_sparsereg_{}".format(suffix), times)


# #############################################################################
# CONFORMAL PREDICTION
# #############################################################################

# Lasso split method baseline
def conformal_split(y, x, x_test, alpha, y_plot, seed=100):
    n = np.shape(y)[0]
    n_test = np.shape(x_test)[0]
    # Fit lasso to training set
    ls = LassoCV(cv=5, random_state=seed)
    n_train = int(n / 2)
    ls.fit(x[0:n_train], y[0:n_train])
    # Predict lasso on validation set
    y_pred_val = ls.predict(x[n_train:])
    resid = np.abs(y_pred_val - y[n_train:])
    k = int(np.ceil((n / 2 + 1) * (1 - alpha)))
    d = np.sort(resid)[k - 1]
    # Compute split conformal interval
    band_split = np.zeros((n_test, 2))
    y_pred_test = ls.predict(x_test)  # predict lasso on test
    band_split[:, 0] = y_pred_test - d
    band_split[:, 1] = y_pred_test + d
    return band_split


# Lasso full method baseline
def conformal_full(y, x, x_test, alpha, y_plot, C, seed=100):
    n = np.shape(y)[0]
    rank_full = np.zeros(np.shape(y_plot)[0])
    for i in range(np.shape(y_plot)[0]):
        y_new = y_plot[i]
        x_aug = np.concatenate((x, x_test), axis=0)
        y_aug = np.append(y, y_new)
        ls = Lasso(alpha=C, random_state=seed)
        ls.fit(x_aug, y_aug)
        y_pred_val = ls.predict(x_aug)
        resid = np.abs(y_pred_val - y_aug)
        rank_full[i] = np.sum(resid >= resid[-1]) / (n + 1)
    region_full = rank_full > alpha
    return region_full


# #############################################################################
# CONFORMAL BAYESIAN
# #############################################################################

# CONFORMAL FROM MCMC SAMPLES (JAX IMPLEMENTATION)

# Compute bayesian central 1-alpha credible interval from MCMC samples
@jit
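def compute_bayes_band_MCMC(alpha, y_plot, cdf_pred):
    # Average the per-sample CDFs to get the posterior predictive CDF
    cdf_pred = jnp.mean(cdf_pred, axis=1)

    # JAX arrays are immutable: use the functional `.at[].set()` update
    # (`jax.ops.index_update` has been removed from recent JAX releases)
    band_bayes = jnp.zeros(2)
    band_bayes = band_bayes.at[0].set(y_plot[
        jnp.argmin(jnp.abs(cdf_pred - alpha / 2))])
    band_bayes = band_bayes.at[1].set(y_plot[
        jnp.argmin(jnp.abs(cdf_pred - (1 - alpha / 2)))])
    return band_bayes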


# compute rank (un-normalized by n+1)
def compute_rank_IS(logp_samp_n, logwjk):
    # n = jnp.shape(logp_samp_n)[1]  # logp_samp_n is B x n
    # n_plot = jnp.shape(logwjk)[0]
    # rank_cp = jnp.zeros(n_plot)

    # compute importance sampling weights and normalizing
    wjk = jnp.exp(logwjk)
    Zjk = jnp.sum(wjk, axis=1).reshape(-1, 1)

    # compute predictives for y_i,x_i and y_new,x_n+1
    p_cp = jnp.dot(wjk / Zjk, jnp.exp(logp_samp_n))
    p_new = jnp.sum(wjk ** 2, axis=1).reshape(-1, 1) / Zjk

    # compute conformity scores and rank the new point among them
    # (jnp.sum keeps the computation traceable under `jit`)
    pred_tot = jnp.concatenate((p_cp, p_new), axis=1)
    rank_cp = jnp.sum(pred_tot <= pred_tot[:, -1].reshape(-1, 1), axis=1)
    return rank_cp


# compute region of grid which is in confidence set
@jit
def compute_cb_region_IS(alpha, logp_samp_n,
                         logwjk):  # assumes they are connected
    n = jnp.shape(logp_samp_n)[1]  # logp_samp_n is B x n
    rank_cp = compute_rank_IS(logp_samp_n, logwjk)
    region_true = rank_cp > alpha * (n + 1)
    return region_true


# #############################################################################
# MAIN PUBLIC FUNCTION (application of CONFORMAL BAYESIAN and the 3 baselines)
# #############################################################################

def run_sparse_regression_conformal(dataset, misspec: bool = False):
    # Compute intervals
    # Initialize
    train_frac = 0.7
    x, y, x_test, y_test, y_plot, n, d = load_train_test_sparse_regression(
        train_frac, dataset, 100)

    # Load posterior samples
    if misspec:
        suffix = dataset
    else:
        suffix = dataset + "_misspec"

    beta_post = jnp.load("samples/beta_post_sparsereg_{}.npy".format(suffix))
    intercept_post = jnp.load(
        "samples/intercept_post_sparsereg_{}.npy".format(suffix))
    sigma_post = jnp.load("samples/sigma_post_sparsereg_{}.npy".format(suffix))

    # Initialize
    alpha = 0.2
    rep = np.shape(beta_post)[0]
    n_test = np.shape(x_test)[0]

    coverage_cb = np.zeros((rep, n_test))
    coverage_cb_exact = np.zeros((rep, n_test))  # avoiding grid effects
    coverage_bayes = np.zeros((rep, n_test))
    coverage_split = np.zeros((rep, n_test))
    coverage_full = np.zeros((rep, n_test))

    length_cb = np.zeros((rep, n_test))
    length_bayes = np.zeros((rep, n_test))
    length_split = np.zeros((rep, n_test))
    length_full = np.zeros((rep, n_test))

    band_bayes = np.zeros((rep, n_test, 2))
    region_cb = np.zeros((rep, n_test, np.shape(y_plot)[0]))
    region_full = np.zeros((rep, n_test, np.shape(y_plot)[0]))
    band_split = np.zeros((rep, n_test, 2))

    times_bayes = np.zeros(rep)
    times_cb = np.zeros(rep)
    times_split = np.zeros(rep)
    times_full = np.zeros(rep)

    for j in tqdm(range(rep)):
        seed = 100 + j
        # load dataset
        x, y, x_test, y_test, y_plot, n, d = load_train_test_sparse_regression(
            train_frac, dataset, seed)
        dy = y_plot[1] - y_plot[0]

        # split method
        start = time.time()
        band_split[j] = conformal_split(y, x, x_test, alpha, y_plot, seed)
        coverage_split[j] = (y_test >= band_split[j, :, 0]) & (
                    y_test <= band_split[j, :, 1])
        length_split[j] = np.abs(band_split[j, :, 0] - band_split[j, :, 1])
        end = time.time()
        times_split[j] = end - start

        # full method
        start = time.time()
        C = 0.004
        for i in (range(n_test)):
            region_full[j, i] = conformal_full(y, x, x_test[i:i + 1], alpha,
                                               y_plot, C, seed)
            coverage_full[j, i] = region_full[
                j, i, np.argmin(np.abs(y_test[i] - y_plot))]
            length_full[j, i] = np.sum(region_full[j, i]) * dy
        end = time.time()
        times_full[j] = end - start

        # Bayes
        start = time.time()

        @jit  # normal cdf from posterior samples
        def normal_likelihood_cdf(_y, _x):
            return norm.cdf(
                _y, loc=jnp.dot(beta_post[j],
                                _x.transpose()) + intercept_post[j],
                scale=sigma_post[j])  # compute likelihood samples

        # Precompute cdfs
        cdf_test = normal_likelihood_cdf(y_plot.reshape(-1, 1, 1), x_test)

        for i in (range(n_test)):
            band_bayes[j, i] = compute_bayes_band_MCMC(
                alpha, y_plot, cdf_test[:, :, i])
            coverage_bayes[j, i] = (y_test[i] >= band_bayes[j, i, 0]) & (
                        y_test[i] <= band_bayes[j, i, 1])
            length_bayes[j, i] = np.abs(
                band_bayes[j, i, 1] - band_bayes[j, i, 0])
        end = time.time()
        times_bayes[j] = end - start

        # Conformal Bayes
        start = time.time()

        @jit  # normal loglik from posterior samples
        def normal_loglikelihood(_y, _x):
            return norm.logpdf(
                _y,
                loc=jnp.dot(beta_post[j], _x.transpose()) + intercept_post[j],
                scale=sigma_post[j])  # compute likelihood samples

        logp_samp_n = normal_loglikelihood(y, x)
        logwjk = normal_loglikelihood(y_plot.reshape(-1, 1, 1), x_test)
        logwjk_test = normal_loglikelihood(y_test, x_test).reshape(1, -1,
                                                                   n_test)

        for i in (range(n_test)):
            region_cb[j, i] = compute_cb_region_IS(
                alpha, logp_samp_n, logwjk[:, :, i])
            coverage_cb[j, i] = region_cb[
                j, i, np.argmin(np.abs(y_test[i] - y_plot))]  # grid coverage
            length_cb[j, i] = np.sum(region_cb[j, i]) * dy
        end = time.time()
        times_cb[j] = end - start

        # compute exact coverage to avoid grid effects
        for i in (range(n_test)):
            coverage_cb_exact[j, i] = compute_cb_region_IS(
                alpha, logp_samp_n, logwjk_test[:, :, i])  # exact coverage

    # Save regions and bands
    np.save("results/region_cb_sparsereg_{}".format(suffix), region_cb)
    np.save("results/band_bayes_sparsereg_{}".format(suffix), band_bayes)
    np.save("results/band_split_sparsereg_{}".format(suffix), band_split)
    np.save("results/region_full_sparsereg_{}".format(suffix), region_full)

    np.save("results/coverage_cb_sparsereg_{}".format(suffix), coverage_cb)
    np.save("results/coverage_cb_exact_sparsereg_{}".format(suffix),
            coverage_cb_exact)
    np.save("results/coverage_bayes_sparsereg_{}".format(suffix),
            coverage_bayes)
    np.save("results/coverage_split_sparsereg_{}".format(suffix),
            coverage_split)
    np.save("results/coverage_full_sparsereg_{}".format(suffix), coverage_full)

    np.save("results/length_cb_sparsereg_{}".format(suffix), length_cb)
    np.save("results/length_bayes_sparsereg_{}".format(suffix), length_bayes)
    np.save("results/length_split_sparsereg_{}".format(suffix), length_split)
    np.save("results/length_full_sparsereg_{}".format(suffix), length_full)

    np.save("results/times_cb_sparsereg_{}".format(suffix), times_cb)
    np.save("results/times_bayes_sparsereg_{}".format(suffix), times_bayes)
    np.save("results/times_split_sparsereg_{}".format(suffix), times_split)
    np.save("results/times_full_sparsereg_{}".format(suffix), times_full)

### Running the scripts

In [None]:
makedirs('samples', exist_ok=True)
makedirs('results', exist_ok=True)

# run MCMC
run_sparse_regression_mcmc('diabetes', misspec=False)  # well-specified prior (c = 1)
run_sparse_regression_mcmc('diabetes', misspec=True)  # poorly-specified prior (c = 0.02)

# run Conformal Bayes
run_sparse_regression_conformal('diabetes', misspec=False)  # well-specified prior (c = 1)
run_sparse_regression_conformal('diabetes', misspec=True)  # poorly-specified prior (c = 0.02)

## 4.2 Sparse Classification

The authors apply the CBC framework to sparse classification with the [`sklearn` Wisconsin breast cancer dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html) (Wolberg and Mangasarian, 1990): a classification problem with $d=30$ predictors (cell nuclei measurements as covariates) and $n=569$ samples.

For reproducibility, all the predictors were normalized to have mean $0$ and standard deviation $1$ (the response variable here is binary, corresponding to a malignant or benign tumour). We will compute the Bayesian predictive set, which is the smallest set from $\{0\},\{1\},\{0,1\}$ that contains at least $(1-\alpha)$ of the posterior predictive probability.
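
The Bayesian predictive set described above can be sketched as follows. This is a minimal illustration under stated assumptions: the posterior samples are simulated rather than obtained by MCMC, there are only two covariates, and the function name `bayes_predictive_set` is hypothetical.

```python
import numpy as np


def bayes_predictive_set(p1, alpha):
    """Smallest subset of {0, 1} whose posterior predictive mass is >= 1 - alpha."""
    p0 = 1.0 - p1
    # Try the more probable single label first; fall back to {0, 1} otherwise
    label, mass = (1, p1) if p1 >= p0 else (0, p0)
    return {label} if mass >= 1.0 - alpha else {0, 1}


# Hypothetical posterior samples of (theta, theta_0) for d = 2 covariates
rng = np.random.default_rng(4)
theta = rng.normal(loc=[2.0, -1.0], scale=0.2, size=(2000, 2))
theta0 = rng.normal(loc=0.0, scale=0.2, size=2000)

x_new = np.array([1.5, -0.5])
# Monte Carlo estimate of p(y = 1 | x_new, Z_1:n): average the sigmoid over samples
p1 = np.mean(1.0 / (1.0 + np.exp(-(theta @ x_new + theta0))))
pred_set = bayes_predictive_set(p1, alpha=0.2)
```

When neither class reaches posterior predictive probability $1-\alpha$ on its own, the set $\{0,1\}$ is returned, which signals that the model is not confident enough to commit to a single label.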

Here, we consider the logistic likelihood $f_\theta(y=1 \mid x)=\left[1+\exp \left\{-\left(\theta^{\mathrm{T}} x+\theta_0\right)\right\}\right]^{-1}$, with the same priors for $\theta, \theta_0$ as in the **sparse regression** case.
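
For importance sampling, the log-likelihood has to be evaluated for many posterior samples, and a naive `log(sigmoid(...))` can underflow for large linear predictors. A numerically stable sketch based on `np.logaddexp` is given below; the function name `logistic_loglik_sketch` and the toy inputs are hypothetical and are not the `logistic_loglikelihood` function referenced in the code of this section.

```python
import numpy as np


def logistic_loglik_sketch(y, x, theta, theta0):
    """Log-likelihood of binary labels y under the logistic model, one value per row of x."""
    eta = x @ theta + theta0
    # log f(y=1 | x) = -log(1 + exp(-eta)),  log f(y=0 | x) = -log(1 + exp(eta))
    return -np.logaddexp(0.0, np.where(y == 1, -eta, eta))


x = np.array([[0.0, 0.0], [100.0, 0.0]])
y = np.array([1, 0])
ll = logistic_loglik_sketch(y, x, theta=np.array([10.0, 0.0]), theta0=0.0)
```

Here the second observation has linear predictor $1000$ and label $0$: the stable form returns a finite log-likelihood of about $-1000$ instead of $-\infty$.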

### Code

In [None]:
"""
Summary of the Fong & Holmes (2021) implementation on Conformal Bayesian
Computation. Specifically, this module deals with the sparse classification
(and its uncertainty quantification) assessed for the:
- UCI ML Breast Cancer Wisconsin dataset (breast)

The following 4 methods are implemented:
- Bayesian inference (bayes)
- Conformal bayes (cb)
- Split conformal prediction (split)
- Full conformal prediction (full)

A brief summary of the models can be found below:
- 'bayes': This method uses the posterior samples of the model parameters to
compute the likelihood of the test point belonging to each class. It uses the
likelihoods to construct a prediction interval.
- 'split': This method fits a LASSO logistic regression model to the first
half of the training data, then computes the residuals on the second half
of the training data. It uses the residuals to define a threshold for the
prediction intervals, which are constructed using the logistic regression
model on the testing data.
- 'full': This method fits a logistic regression model to the combined
training and testing data, then computes the rank of the test point in
the combined data set. It uses the rank to define a threshold for the
prediction intervals, which are constructed using the logistic
regression model on the test point.
- 'cb': uses the posterior distribution from MCMC sampling to define
conformal prediction intervals.


The execution chain is as follows:
- The data is loaded using `load_train_test_sparse_classification`
- The Bayesian inference (bayes) is performed through the function
`run_sparse_classification_mcmc` (which requires the MCMC computations
defined at `fit_mcmc_laplace`).
- Then, using the former results, the Conformal Bayesian method
(along the 2 other baselines) is applied using the function
`run_sparse_classification_conformal`:
  - The 'split' and 'full' conformal baselines are defined in the functions
`conformal_split` and `conformal_full`, respectively.
  - The 'cb' method is implemented in the function `compute_cb_region_IS`
  and it also uses the own `logistic_loglikelihood` clause.
- Once launched the main function `run_sparse_classification_conformal`, it
loads the training and testing data and the posterior samples of the model
parameters (computed because `run_sparse_classification_mcmc` was run first).
- Then it iterates through a specified number of repetitions, applying each
of the four methods to compute prediction intervals and coverage probabilities
for each test point.
- For each repetition, the function applies the split method, the full method,
the Bayesian method, and the conformal Bayes method to compute the prediction
intervals and coverage probabilities. It also records the computation time
for each method.
- At the end of the function, these results are saved to various files,
including the coverage probabilities, the lengths of the prediction
intervals, and the computation times for each method.

"""

# Imports used by this cell (harmless if already done in an earlier cell)
import os
import time

import numpy as np
import pandas as pd
import pymc3 as pm
import jax.numpy as jnp
import jax.scipy as jsp
from jax import jit
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import train_test_split
from tqdm import tqdm


# #############################################################################
# LOAD DATA (sparse classification)
# #############################################################################

def load_train_test_sparse_classification(train_frac, dataset, seed):
    # Load dataset
    if dataset == "breast":
        x, y = load_breast_cancer(return_X_y=True)
    elif dataset == "parkinsons":
        data = pd.read_csv('data/parkinsons.data')
        data[data == '?'] = np.nan
        data.dropna(axis=0, inplace=True)
        y = data['status'].values  # convert strings to integer
        x = data.drop(columns=['name', 'status']).values
    else:
        # Fail loudly: callers unpack seven return values
        raise ValueError("Invalid dataset: {!r}".format(dataset))

    n = np.shape(x)[0]
    d = np.shape(x)[1]

    # Standardize beforehand (for validity)
    x = (x - np.mean(x, axis=0)) / np.std(x, axis=0)

    # Train test split
    ind_train, ind_test = train_test_split(np.arange(n),
                                           train_size=int(train_frac * n),
                                           random_state=seed)
    x_train = x[ind_train]
    y_train = y[ind_train]
    x_test = x[ind_test]
    y_test = y[ind_test]

    y_plot = np.array([0, 1])

    return x_train, y_train, x_test, y_test, y_plot, n, d


# #############################################################################
# BAYESIAN INFERENCE (MCMC)
# #############################################################################

# Laplace prior PyMC3 model
def fit_mcmc_laplace(y, x, B, seed=100):
    with pm.Model() as _:  # as model
        p = np.shape(x)[1]
        # Laplace
        b = pm.Gamma('b', alpha=1, beta=1)
        beta = pm.Laplace('beta', mu=0, b=b, shape=p)
        intercept = pm.Flat('intercept')
        obs = pm.Bernoulli(
            'obs', logit_p=pm.math.dot(x, beta) + intercept, observed=y)
        trace = pm.sample(B, random_seed=seed, chains=4)

    beta_post = trace['beta']
    intercept_post = trace['intercept'].reshape(-1, 1)
    b_post = trace['b'].reshape(-1, 1)

    return beta_post, intercept_post, b_post


# #############################################################################
# APPLICATION OF THE BAYESIAN INFERENCE (MCMC)

# repeat 50 mcmc runs for different train test splits
def run_sparse_classification_mcmc(dataset):
    # Repeat over 50 reps
    rep = 50
    train_frac = 0.7
    B = 2000

    # Initialize
    x, y, x_test, y_test, y_plot, n, d = load_train_test_sparse_classification(
        train_frac, dataset, 100)

    beta_post = np.zeros((rep, 4 * B, d))
    intercept_post = np.zeros((rep, 4 * B, 1))
    b_post = np.zeros((rep, 4 * B, 1))
    times = np.zeros(rep)

    for j in tqdm(range(rep)):
        seed = 100 + j
        x, y, x_test, y_test, y_plot, n, d = \
            load_train_test_sparse_classification(train_frac, dataset, seed)
        start = time.time()
        beta_post[j], intercept_post[j], b_post[j] = fit_mcmc_laplace(y, x, B,
                                                                      seed)
        end = time.time()
        times[j] = (end - start)

    print("{}: {} ({})".format(dataset, np.mean(times),
                               np.std(times) / np.sqrt(rep)))

    # Save posterior samples
    np.save("samples/beta_post_sparseclass_{}".format(dataset), beta_post)
    np.save("samples/intercept_post_sparseclass_{}".format(dataset),
            intercept_post)
    np.save("samples/b_post_sparseclass_{}".format(dataset), b_post)
    np.save("samples/times_sparseclass_{}".format(dataset), times)


# #############################################################################
# CONFORMAL PREDICTION
# #############################################################################

# Split method baseline
def conformal_split(alpha, y, x, x_test, seed=100):
    n = np.shape(y)[0]
    #  n_test = np.shape(x_test)[0]
    # Fit lasso to training set
    n_train = int(n / 2)
    ls = LogisticRegressionCV(penalty='l1', solver='liblinear', cv=5,
                              random_state=seed)
    ls.fit(x[0:n_train], y[0:n_train])
    resid = ls.predict_proba(x[n_train:])[:, 1]
    resid[y[n_train:] == 0] = 1 - resid[y[n_train:] == 0]
    resid = -np.log(
        np.clip(resid, 1e-6, 1 - 1e-6))  # clip for numerical stability
    k = int(np.ceil((n / 2 + 1) * (1 - alpha)))
    d = np.sort(resid)[k - 1]

    logp_test = -np.log(np.clip(ls.predict_proba(x_test), 1e-6, 1 - 1e-6))
    region_split = logp_test <= d

    return region_split


# Full method baseline
def conformal_full(alpha, y, x, x_test, C, seed=100):
    n = np.shape(y)[0]
    rank_cp = np.zeros(2)
    for y_new in (0, 1):
        x_aug = np.concatenate((x, x_test), axis=0)
        y_aug = np.append(y, y_new)
        ls = LogisticRegression(penalty='l1', solver='liblinear', C=C,
                                random_state=seed)
        ls.fit(x_aug, y_aug)
        resid = ls.predict_proba(x_aug)[:, 1]
        resid[y_aug == 0] = 1 - resid[y_aug == 0]
        resid = -np.log(resid)
        rank_cp[y_new] = np.sum(resid >= resid[-1]) / (n + 1)
    region_full = rank_cp > alpha
    return region_full


# #############################################################################
# CONFORMAL BAYESIAN
# #############################################################################

# CONFORMAL FROM MCMC SAMPLES (JAX IMPLEMENTATION)

# compute rank (un-normalized by n+1)

@jit
def compute_rank_IS(logp_samp_n, logwjk):
    # n = jnp.shape(logp_samp_n)[1]  # logp_samp_n is B x n
    # n_plot = jnp.shape(logwjk)[0]
    # rank_cp = jnp.zeros(n_plot)

    # compute importance sampling weights and normalizing
    wjk = jnp.exp(logwjk)
    Zjk = jnp.sum(wjk, axis=1).reshape(-1, 1)

    # compute predictives for y_i,x_i and y_new,x_n+1
    p_cp = jnp.dot(wjk / Zjk, jnp.exp(logp_samp_n))
    p_new = jnp.sum(wjk ** 2, axis=1).reshape(-1, 1) / Zjk

    # compute non-conformity score and sort
    pred_tot = jnp.concatenate((p_cp, p_new), axis=1)
    rank_cp = jnp.sum(pred_tot <= pred_tot[:, -1].reshape(-1, 1), axis=1)
    return rank_cp


# compute region of grid which is in confidence set
@jit
def compute_cb_region_IS(alpha, logp_samp_n,
                         logwjk):  # assumes they are connected
    n = jnp.shape(logp_samp_n)[1]  # logp_samp_n is B x n
    rank_cp = compute_rank_IS(logp_samp_n, logwjk)
    region_true = rank_cp > alpha * (n + 1)
    return region_true


# #############################################################################
# MAIN PUBLIC FUNCTION (application of CONFORMAL BAYESIAN and the 3 baselines)
# #############################################################################

def run_sparse_classification_conformal(dataset):
    # Compute intervals
    # Load posterior samples
    beta_post = jnp.load(
        "samples/beta_post_sparseclass_{}.npy".format(dataset))
    intercept_post = jnp.load(
        "samples/intercept_post_sparseclass_{}.npy".format(dataset))

    # Initialize
    train_frac = 0.7
    x, y, x_test, y_test, y_plot, n, d = load_train_test_sparse_classification(
        train_frac, dataset, 100)

    alpha = 0.2
    rep = np.shape(beta_post)[0]
    n_test = np.shape(x_test)[0]

    coverage_cb = np.zeros((rep, n_test))
    coverage_bayes = np.zeros((rep, n_test))
    coverage_split = np.zeros((rep, n_test))
    coverage_full = np.zeros((rep, n_test))

    length_cb = np.zeros((rep, n_test))
    length_bayes = np.zeros((rep, n_test))
    length_split = np.zeros((rep, n_test))
    length_full = np.zeros((rep, n_test))

    p_bayes = np.zeros((rep, n_test))
    region_bayes = np.zeros((rep, n_test, 2))
    region_cb = np.zeros((rep, n_test, 2))
    region_split = np.zeros((rep, n_test, 2))
    region_full = np.zeros((rep, n_test, 2))

    times_bayes = np.zeros(rep)
    times_cb = np.zeros(rep)
    times_split = np.zeros(rep)
    times_full = np.zeros(rep)

    for j in tqdm(range(rep)):
        seed = 100 + j

        # load data
        x, y, x_test, y_test, y_plot, n, d = \
            load_train_test_sparse_classification(train_frac, dataset, seed)

        # Split conformal method
        start = time.time()
        region_split[j] = conformal_split(alpha, y, x, x_test, seed)
        for i in (range(n_test)):
            coverage_split[j, i] = region_split[
                j, i, np.argmin(np.abs(y_test[i] - y_plot))]
            length_split[j, i] = np.sum(region_split[j, i])
        end = time.time()
        times_split[j] = end - start

        # Full conformal method
        start = time.time()
        C = 1.
        for i in (range(n_test)):
            region_full[j, i] = conformal_full(alpha, y, x, x_test[i:i + 1], C,
                                               seed)
            coverage_full[j, i] = region_full[
                j, i, np.argmin(np.abs(y_test[i] - y_plot))]
            length_full[j, i] = np.sum(region_full[j, i])
        end = time.time()
        times_full[j] = end - start

        # ###########################
        @jit
        def logistic_loglikelihood(_y, _x):
            eta = (jnp.dot(beta_post[j], _x.transpose()) + intercept_post[j])
            B = np.shape(eta)[0]
            _n = np.shape(eta)[1]
            eta = eta.reshape(B, _n, 1)
            temp0 = np.zeros((B, _n, 1))
            logp = -jsp.special.logsumexp(
                jnp.concatenate((temp0, -eta), axis=2),
                axis=2)  # numerically stable
            log1p = -jsp.special.logsumexp(
                jnp.concatenate((temp0, eta), axis=2), axis=2)
            return _y * logp + (1 - _y) * log1p  # compute likelihood samples

        # ###########################

        # Bayes
        start = time.time()
        for i in (range(n_test)):
            p_bayes[j, i] = jnp.mean(
                jnp.exp(logistic_loglikelihood(1, x_test[i:i + 1])))
            # Compute region from p_bayes
            if p_bayes[j, i] > (1 - alpha):  # only y = 1
                region_bayes[j, i] = np.array([0, 1])
            elif (1 - p_bayes[j, i]) > (1 - alpha):  # only y = 0
                region_bayes[j, i] = np.array([1, 0])
            else:
                region_bayes[j, i] = np.array([1, 1])
            coverage_bayes[j, i] = region_bayes[
                j, i, np.argmin(np.abs(y_test[i] - y_plot))]
            length_bayes[j, i] = np.sum(region_bayes[j, i])
        end = time.time()
        times_bayes[j] = end - start

        # Conformal Bayes
        start = time.time()
        logp_samp_n = logistic_loglikelihood(y, x)
        logwjk = logistic_loglikelihood(y_plot.reshape(-1, 1, 1), x_test)
        for i in (range(n_test)):
            region_cb[j, i] = compute_cb_region_IS(
                alpha, logp_samp_n, logwjk[:, :, i])
            coverage_cb[j, i] = region_cb[
                j, i, np.argmin(np.abs(y_test[i] - y_plot))]
            length_cb[j, i] = np.sum(region_cb[j, i])
        end = time.time()
        times_cb[j] = end - start

    # Save regions (need to update)
    np.save("results/p_bayes_sparseclass_{}".format(dataset), p_bayes)
    np.save("results/region_bayes_sparseclass_{}".format(dataset),
            region_bayes)
    np.save("results/region_cb_sparseclass_{}".format(dataset), region_cb)
    np.save("results/region_split_sparseclass_{}".format(dataset),
            region_split)
    np.save("results/region_full_sparseclass_{}".format(dataset), region_full)

    np.save("results/coverage_bayes_sparseclass_{}".format(dataset),
            coverage_bayes)
    np.save("results/coverage_cb_sparseclass_{}".format(dataset), coverage_cb)
    np.save("results/coverage_split_sparseclass_{}".format(dataset),
            coverage_split)
    np.save("results/coverage_full_sparseclass_{}".format(dataset),
            coverage_full)

    np.save("results/length_bayes_sparseclass_{}".format(dataset),
            length_bayes)
    np.save("results/length_cb_sparseclass_{}".format(dataset), length_cb)
    np.save("results/length_split_sparseclass_{}".format(dataset),
            length_split)
    np.save("results/length_full_sparseclass_{}".format(dataset), length_full)

    np.save("results/times_bayes_sparseclass_{}".format(dataset), times_bayes)
    np.save("results/times_cb_sparseclass_{}".format(dataset), times_cb)
    np.save("results/times_split_sparseclass_{}".format(dataset), times_split)
    np.save("results/times_full_sparseclass_{}".format(dataset), times_full)

### Running the scripts

In [None]:
os.makedirs('samples', exist_ok=True)
os.makedirs('results', exist_ok=True)
run_sparse_classification_mcmc('breast')
run_sparse_classification_conformal('breast')

# Results

Firstly, the results of the former scripts must be placed in the `results` folder. If the Google Colab notebook was not run (due to `conda` issues or time limits), then the files which were obtained locally can be fetched from [results.zip](https://github.com/gcastro-98/conformal-bayesian/blob/main/results.zip).

The main scripts used to retrieve the results are presented below:

In [None]:
EXAMPLES = [
    'sparsereg_diabetes', 'sparsereg_diabetes_misspec', 'sparseclass_breast'
]
METHODS = ['bayes', 'cb', 'split', 'full']


# def report_mcmc_times() -> None:
#     """
#     Reports how much time was elapsed in the
#     sampling for each posterior distribution.
#     """
#     for example in EXAMPLES:
#         suffix = example
#         times = np.load("samples/times_{}.npy".format(suffix))
#         rep = np.shape(times)[0]
#         print("{} MCMC time: {:.3f} ({:.3f})".format(
#             suffix, np.mean(times), np.std(times)/np.sqrt(rep)))
#     print()


# ############################################################################
# MAIN PUBLIC FUNCTION
# ############################################################################

def report_results(regression: bool) -> None:
    if regression:
        _EXAMPLES = EXAMPLES[:2]
    else:
        _EXAMPLES = [EXAMPLES[2]]

    for example in _EXAMPLES:
        print('EXAMPLE: {}'.format(example))
        for method in METHODS:
            suffix = method + '_' + example

            # Coverage (take mean over test values)

            coverage = np.mean(np.load(
                "results/coverage_{}.npy".format(suffix)), axis=1)
            rep = np.shape(coverage)[0]
            mean = np.mean(coverage)
            se = np.std(coverage)/np.sqrt(rep)
            print("{} coverage is {:.3f} ({:.3f})".format(
                method, mean, se))

            # Return exact coverage if cb
            if method == 'cb' and regression:
                suffix_ex = method + '_exact_' + example
                coverage = np.mean(np.load("results/coverage_{}.npy".format(
                    suffix_ex)), axis=1)  # take mean over test values
                rep = np.shape(coverage)[0]
                mean = np.mean(coverage)
                se = np.std(coverage)/np.sqrt(rep)
                print("{} exact coverage is {:.3f} ({:.3f})".format(
                    method, mean, se))
        print()

        for method in METHODS:
            suffix = method + '_' + example
            # Length
            length = np.mean(np.load(
                "results/length_{}.npy".format(suffix)), axis=1)
            rep = np.shape(length)[0]
            mean = np.mean(length)
            se = np.std(length)/np.sqrt(rep)
            print("{} length is {:.2f} ({:.2f})".format(method, mean, se))
        print()

        for method in METHODS:
            suffix = method + '_' + example
            # Times
            times = np.load("results/times_{}.npy".format(suffix))
            rep = np.shape(times)[0]
            mean = np.mean(times)
            se = np.std(times)/np.sqrt(rep)
            print("{} times is {:.3f} ({:.3f})".format(method, mean, se))
        print()


# ############################################################################
# Sparse classification (only) routines
# ############################################################################

def report_missclassification_rates() -> None:
    example = 'sparseclass_breast'
    for method in ['bayes', 'cb']:
        suffix = method + '_' + example
        coverage = np.load("results/coverage_{}.npy".format(suffix))
        length = np.load("results/length_{}.npy".format(suffix))
        rep = np.shape(coverage)[0]
        n_tot = np.sum(length == 1, axis=1)
        n_misclass = np.sum(
            np.logical_and(length == 1, coverage == 0), axis=1)
        misclass_rate = n_misclass/n_tot
        both_rate = np.mean(length == 2, axis=1)
        empty_rate = np.mean(length == 0, axis=1)

        print('{} misclassification rate is {:.3f} ({:.3f})'.format(
            method, np.mean(misclass_rate),
            np.std(misclass_rate)/np.sqrt(rep)))
        print('{} both rate is {:.3f} ({:.3f})'.format(
            method, np.mean(both_rate),
            np.std(both_rate)/np.sqrt(rep)))
        print('{} empty rate is {:.3f} ({:.3f})'.format(
            method, np.mean(empty_rate),
            np.std(empty_rate)/np.sqrt(rep)))

## 4.1 Sparse regression

As baselines, we compare to the split and full conformal methods using the non-Bayesian lasso as the predictor, with the usual residual as the nonconformity score. For the split method, we fit the lasso with cross-validation on the subset of size $n_{\mathrm{train}}/2$ to obtain the lasso penalty $\lambda$. For the full conformal method, we use the grid method for fair timing, as other estimators beyond lasso would not have the shortcut of Lei (2019). As setting a default $\lambda = 1$ gives poor average lengths, we estimate $\lambda = 0.004$ by cross-validation on one of the training sets, and use this value over the 50 repeats.
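
The split conformal baseline with the absolute residual as nonconformity score can be sketched as follows. This is a simplified illustration, not the code used for the results: a one-dimensional least-squares line stands in for the lasso, the data are synthetic, and all function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x (stand-in for the lasso)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def split_conformal(xs, ys, alpha):
    """Fit on the first half, calibrate on the second; return (a, b, q)."""
    n_fit = len(xs) // 2
    a, b = fit_line(xs[:n_fit], ys[:n_fit])
    # Nonconformity scores: absolute residuals on the held-out half
    scores = sorted(abs(y - (a + b * x))
                    for x, y in zip(xs[n_fit:], ys[n_fit:]))
    n_cal = len(scores)
    k = math.ceil((n_cal + 1) * (1 - alpha))   # conformal quantile index
    q = scores[k - 1] if k <= n_cal else math.inf
    return a, b, q


rng = random.Random(0)


def simulate(n):
    xs = [rng.uniform(-2, 2) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


x_train, y_train = simulate(200)
x_test, y_test = simulate(1000)
a, b, q = split_conformal(x_train, y_train, alpha=0.2)

# The interval for a test point x is [a + b * x - q, a + b * x + q]
coverage = sum(abs(y - (a + b * x)) <= q
               for x, y in zip(x_test, y_test)) / len(x_test)
print("half-width:", round(q, 3), "coverage:", coverage)
```

Because only half of the training data is used for fitting, the intervals tend to be somewhat wider than those of the full method, which is the trade-off for the much lower run-time.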

In [None]:
report_results(regression=True)

EXAMPLE: sparsereg_diabetes
bayes coverage is 0.806 (0.005)
cb coverage is 0.808 (0.006)
cb exact coverage is 0.810 (0.005)
split coverage is 0.809 (0.006)
full coverage is 0.808 (0.006)

bayes length is 1.84 (0.01)
cb length is 1.87 (0.01)
split length is 1.91 (0.02)
full length is 1.86 (0.01)

bayes times is 0.488 (0.107)
cb times is 0.702 (0.019)
split times is 0.065 (0.001)
full times is 11.529 (0.232)

EXAMPLE: sparsereg_diabetes_misspec
bayes coverage is 0.563 (0.006)
cb coverage is 0.809 (0.006)
cb exact coverage is 0.810 (0.006)
split coverage is 0.809 (0.006)
full coverage is 0.808 (0.006)

bayes length is 1.14 (0.00)
cb length is 1.87 (0.01)
split length is 1.91 (0.02)
full length is 1.86 (0.01)

bayes times is 0.373 (0.002)
cb times is 0.668 (0.003)
split times is 0.066 (0.001)
full times is 11.524 (0.240)



The average coverage, length and run-times, with standard errors in parentheses, are given above for $\alpha = 0.2$ (a target coverage of $80\%$). Note that:
- For $c=1$ (well-specified), the Bayesian intervals have coverage close to $1 - \alpha$ with the smallest expected length, with CB slightly wider and more conservative.
- For $c=0.02$ (when the prior is misspecified), the Bayes intervals severely undercover (coverage $0.563$), whilst the CB coverage and length remain unchanged from the $c = 1$ case.
- The split method has wider intervals than CB/full, but performs well given its low computational cost (the split and full methods do not depend on $c$, so their results are the same in both cases).
- The full conformal method performs as well as CB, but its run-time is comparable to that of MCMC + CB, even though it does not refit $\lambda$.

## 4.2 Sparse classification

The conformal baselines are as above but with $L_1$-penalized logistic regression and for the full conformal method we have $\lambda = 1$. We again have 50 repeats with 70-30 train-test split, and set $\alpha = 0.2$.

In [None]:
report_results(regression=False)

EXAMPLE: sparseclass_breast
bayes coverage is 0.990 (0.001)
cb coverage is 0.812 (0.005)
split coverage is 0.809 (0.006)
full coverage is 0.811 (0.005)

bayes length is 1.06 (0.00)
cb length is 0.81 (0.00)
split length is 0.81 (0.01)
full length is 0.81 (0.00)

bayes times is 0.364 (0.007)
cb times is 0.665 (0.012)
split times is 0.079 (0.002)
full times is 1.008 (0.016)



The results are provided above. Bayes over-covers substantially (coverage $0.990$ against a target of $0.8$), even with reasonable priors. CB corrects this (coverage $0.812$) at a small additional cost in run-time ($0.665$ s versus $0.364$ s on average per repeat).

Moreover, the misclassification rates for the Bayes and CB methods, computed over the test points whose predictive set contains a single label, are presented below; CB attains the lowest rate.

In [None]:
report_missclassification_rates()

bayes misclassification rate is 0.011 (0.001)
bayes both rate is 0.059 (0.002)
bayes empty rate is 0.000 (0.000)
cb misclassification rate is 0.002 (0.000)
cb both rate is 0.000 (0.000)
cb empty rate is 0.186 (0.005)


# Conclusions

In conclusion, the Conformal Bayesian (CB) method offers significant advantages over traditional Bayesian predictive inference. One notable advantage is its coverage. The CB method constructs prediction regions with a specified coverage probability $1-\alpha$, and in our experiments its empirical coverage stayed close to the $0.8$ target in every example ($0.808$, $0.809$ and $0.812$). The resulting uncertainty estimates are therefore reliable in the frequentist sense, which the Bayesian predictive regions did not always achieve.

Furthermore, the CB method demonstrates robustness in the presence of misspecified priors. In Bayesian inference, the choice of prior distributions can have a substantial impact on the posterior estimates. If the priors do not accurately represent the true underlying distribution, the posterior estimates may be biased or unreliable. In contrast, the CB method's construction of prediction regions is not as heavily dependent on the choice of priors. It focuses on achieving the desired coverage probability based on the observed data, making it more robust to misspecification of the prior distribution: in the misspecified sparse regression example the Bayes coverage dropped to $0.563$, while the CB coverage remained at $0.809$.

Additionally, the CB method benefits from the computational efficiency provided by the AOI (Add-One-In) importance sampling procedure. Rather than refitting the model for every candidate value of the test response, this procedure reweights the posterior samples already obtained from the training data, so the prediction regions are constructed at little extra cost once MCMC has been run. This aspect is particularly advantageous when dealing with large datasets or complex models, where refitting would be expensive.
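
The add-one-in idea can be sketched on a toy problem. The example below is only an illustration under strong simplifications: a scalar predictor with no intercept, synthetic data, and Gaussian draws standing in for real MCMC output. All names are hypothetical. It mirrors the logic of `compute_rank_IS` and `compute_cb_region_IS`: weight each posterior draw by the likelihood of the candidate pair, recompute the predictive density of every point under those weights, and keep the label if the rank of the new point exceeds $\alpha(n+1)$.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def lik(y, x, theta):
    """Logistic likelihood f_theta(y | x), scalar predictor, no intercept."""
    p1 = sigmoid(theta * x)
    return p1 if y == 1 else 1.0 - p1


def cb_region(alpha, thetas, xs, ys, x_new):
    """Conformal Bayes label set for x_new via add-one-in weights."""
    n = len(xs)
    region = []
    for y_new in (0, 1):
        # Importance weight of each draw: likelihood of the candidate pair
        w = [lik(y_new, x_new, th) for th in thetas]
        z = sum(w)
        w_norm = [wi / z for wi in w]
        # Reweighted predictive density at each training point
        p_train = [sum(wn * lik(y, x, th) for wn, th in zip(w_norm, thetas))
                   for x, y in zip(xs, ys)]
        # Reweighted predictive density at the candidate point
        p_new = sum(wn * wi for wn, wi in zip(w_norm, w))
        # Rank of the candidate among all n + 1 conformity scores
        rank = sum(p <= p_new for p in p_train) + 1
        region.append(int(rank > alpha * (n + 1)))
    return region


rng = random.Random(1)
xs = [rng.uniform(-3, 3) for _ in range(50)]
ys = [int(rng.random() < sigmoid(2.0 * x)) for x in xs]
thetas = [rng.gauss(2.0, 0.3) for _ in range(500)]  # stand-in for MCMC draws

print(cb_region(0.2, thetas, xs, ys, x_new=2.5))    # [0, 1]
print(cb_region(0.2, thetas, xs, ys, x_new=-2.5))   # [1, 0]
print(cb_region(0.2, thetas, xs, ys, x_new=0.0))
```

No model is refitted inside the loop over candidate labels; only the weights change. At `x_new=0.0` the two labels are treated symmetrically in this toy model, so the set is either empty or contains both labels, which illustrates how conformal sets, unlike the Bayesian predictive set, can be empty.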

In summary, the Conformal Bayesian method improves on traditional Bayesian prediction in terms of coverage accuracy and robustness to misspecified priors, while adding little computation on top of the MCMC run. Its ability to provide calibrated uncertainty estimates makes it a valuable tool in decision-making scenarios. The combination of conformal prediction principles with Bayesian inference, along with the AOI sampling procedure, offers a powerful framework for reliable uncertainty quantification. Therefore, the CB method deserves consideration in a wide range of machine learning and statistical applications.