# Conformal prediction

Conformal predictors are predictive models that associate each of their predictions with a measure of statistically valid confidence. Given a test object $x_i$ and a user-specified significance level $\epsilon \in (0, 1)$, a conformal predictor outputs a prediction region $\Gamma_i^{\epsilon} \subseteq Y$ that contains the true output value $y_i \in Y$ with probability at least $1-\epsilon$, provided the data are exchangeable.
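
To make the definition concrete, here is a minimal from-scratch sketch of how an inductive conformal classifier could turn calibration nonconformity scores into a prediction region. It does not use c3pO; the function names `p_value` and `prediction_region` and the toy scores are hypothetical and only illustrate the general idea.

```python
def p_value(cal_scores, test_score):
    """Share of calibration scores at least as nonconforming as the test score.

    The +1 in numerator and denominator counts the test object itself.
    """
    n_at_least = sum(1 for s in cal_scores if s >= test_score)
    return (n_at_least + 1) / (len(cal_scores) + 1)


def prediction_region(cal_scores, test_scores_per_label, significance):
    """Keep every candidate label whose p-value exceeds the significance level."""
    return [
        label
        for label, score in enumerate(test_scores_per_label)
        if p_value(cal_scores, score) > significance
    ]


# Hypothetical nonconformity scores of nine calibration examples
cal_scores = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.40, 0.55, 0.70]
# Hypothetical nonconformity scores of one test object, one per candidate label
test_scores = [0.08, 0.50, 0.90]

region = prediction_region(cal_scores, test_scores, significance=0.2)
print(region)
```

Labels with a low nonconformity score look similar to the calibration data and stay in the region; a label whose score exceeds almost all calibration scores gets a small p-value and is dropped.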


# C3PO Usage:

### Example 1: Simple ICP (classification)
In this example, we construct a simple inductive conformal predictor for classification, using a support vector classifier as the underlying model.

from sklearn.datasets import load_iris
import numpy as np
from sklearn.svm import SVC
from c3pO.icp import ICPClassifier
from c3pO.base import ModelWrapper
from c3pO.nc.classifier import MarginErrFunc, ClassifierNC
    
iris = load_iris()
idx = np.random.permutation(iris.target.size)

# Divide the data into proper training set, calibration set and test set
idx_train, idx_cal, idx_test = idx[:50], idx[50:100], idx[100:]

# Build a support vector classifier (probability=True enables class probability estimates)
model = SVC(probability=True)
# Fit the model on the proper training set
model.fit(iris.data[idx_train, :], iris.target[idx_train])

pretrained_model = ModelWrapper(model)	# Wrap the model to make it framework agnostic
nc = ClassifierNC(pretrained_model, err_func=MarginErrFunc())	# Create a default nonconformity function
icp = ICPClassifier(nc)	 # Create an inductive conformal classifier


# Calibrate the ICP using the calibration set
icp.calibrate(iris.data[idx_cal, :], iris.target[idx_cal])

# Produce predictions for the test set, with confidence 95%
prediction = icp.predict(iris.data[idx_test, :], significance=0.05)

# Print the first 5 predictions
print(prediction[:5, :])

[[ True False False]
 [ True False False]
 [False  True False]
 [False  True  True]
 [False False  True]]


The result is a boolean numpy.array with shape (n_test, n_classes), where each row is a boolean vector denoting the class labels included in the prediction region at the specified significance level.

For this particular example, we might obtain, for a given test object, a boolean vector [ True True False ], meaning that the $1-\epsilon$ confidence prediction region contains class labels 0 and 1 (i.e., with 95% probability, one of these two classes will be correct).
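
As a sketch of how such a boolean matrix could be post-processed, the helpers below convert each row into a list of class labels and compute two common summaries: the share of test objects whose true label is missing from the region, and the average region size. The helper names and the `y_true` labels are hypothetical; the boolean rows are the five printed above. The functions use plain Python indexing, so they should also accept a NumPy array.

```python
def label_sets(prediction):
    """Turn each boolean row into the list of class labels it includes."""
    return [[j for j, included in enumerate(row) if included] for row in prediction]


def error_rate(prediction, y_true):
    """Share of test objects whose true label is not in the prediction region."""
    misses = sum(1 for row, y in zip(prediction, y_true) if not row[y])
    return misses / len(y_true)


def mean_set_size(prediction):
    """Average number of labels per prediction region."""
    return sum(len(s) for s in label_sets(prediction)) / len(prediction)


prediction = [
    [True, False, False],
    [True, False, False],
    [False, True, False],
    [False, True, True],
    [False, False, True],
]
y_true = [0, 1, 1, 2, 2]  # hypothetical true labels, for illustration only

sets = label_sets(prediction)
err = error_rate(prediction, y_true)
size = mean_set_size(prediction)
print(sets, err, size)
```

On a large enough test set, the error rate of a valid conformal predictor should stay close to or below the chosen significance level, while a smaller average region size indicates more informative predictions.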

### Example 2: Simple ICP (regression)

In this example, we construct a simple inductive conformal predictor for regression, this time using a random forest regression model as the underlying model.

# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older version
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
import numpy as np
from c3pO.icp import ICPRegressor
from c3pO.base import ModelWrapper
from c3pO.nc.regressor import AbsErrorErrFunc, RegressorNC

boston = load_boston()
idx = np.random.permutation(boston.target.size)

# Divide the data into proper training set, calibration set and test set
idx_train, idx_cal, idx_test = idx[:300], idx[300:399], idx[399:]

model = RandomForestRegressor()	# Create the underlying model
model.fit(boston.data[idx_train, :], boston.target[idx_train])

# Wrap the fitted model
pretrained_model = ModelWrapper(model)
nc = RegressorNC(model=pretrained_model, err_func=AbsErrorErrFunc())  # Create a default nonconformity function
icp = ICPRegressor(nc)  # Create an inductive conformal regressor

# Calibrate the ICP using the calibration set
icp.calibrate(boston.data[idx_cal, :], boston.target[idx_cal])

# Produce predictions for the test set, with confidence 90%
prediction = icp.predict(boston.data[idx_test, :], significance=0.1)

# Print the first 5 predictions
print(prediction[:5, :])

[[14.981 23.841]
 [ 9.299 18.159]
 [ 4.027 12.887]
 [21.733 30.593]
 [11.917 20.777]]


This time the result is a numerical numpy.array with shape (n_test, 2), where each row is a vector signifying the lower and upper bounds of an interval, denoting the prediction region at the specified significance level.

For this particular example, we might obtain, for a given test object, a numerical vector [ 8.8  21.6 ], meaning that the $1-\epsilon$ confidence prediction region is the interval $[8.8, 21.6]$ (i.e., with 90% probability, the correct output value lies somewhere in this interval).
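
A short sketch of how such intervals could be evaluated: the helpers below compute the empirical coverage (share of true values inside their interval) and the mean interval width. The helper names and the `y_true` values are hypothetical; the intervals are the five printed above.

```python
def coverage(intervals, y_true):
    """Share of true values that fall inside their prediction interval."""
    hits = sum(1 for (low, high), y in zip(intervals, y_true) if low <= y <= high)
    return hits / len(y_true)


def mean_width(intervals):
    """Average length of the prediction intervals."""
    return sum(high - low for low, high in intervals) / len(intervals)


intervals = [
    (14.981, 23.841),
    (9.299, 18.159),
    (4.027, 12.887),
    (21.733, 30.593),
    (11.917, 20.777),
]
y_true = [20.1, 19.0, 8.5, 24.4, 15.0]  # hypothetical targets, for illustration only

cov = coverage(intervals, y_true)
width = mean_width(intervals)
print(cov, round(width, 3))
```

Every interval in the printed output above has the same width (8.86). That is what one would expect from an absolute-error nonconformity function without normalization, where the same calibrated margin is added on both sides of each point prediction.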

### Example 3: Normalization
Normalized nonconformity functions are nonconformity functions that leverage an additional underlying model, which attempts to predict the difficulty of predicting the output of a given test pattern. Normalization is typically used in the context of regression (in order to obtain prediction intervals whose sizes vary depending on the estimated difficulty of the test pattern), but normalized nonconformity functions for classification are also supported.

Note that the normalization model should always be a scikit-learn regression model (also for classification problems).
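
The idea can be sketched without any library. The snippet below assumes a commonly used form of normalized nonconformity, $|y - \hat{y}| / (\hat{\sigma} + \beta)$, where $\hat{\sigma}$ is the estimated difficulty and $\beta$ a smoothing constant; whether c3pO uses exactly this form is an assumption. All names and numbers are hypothetical.

```python
import math


def conformal_quantile(scores, significance):
    """The k-th smallest calibration score, with k = ceil((n + 1) * (1 - significance))."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - significance))
    if k > n:
        return math.inf  # too few calibration examples for this significance level
    return sorted(scores)[k - 1]


def normalized_intervals(y_cal, pred_cal, diff_cal, pred_test, diff_test,
                         significance, beta=0.0):
    """Intervals whose half-width scales with the estimated difficulty."""
    scores = [abs(y - p) / (d + beta) for y, p, d in zip(y_cal, pred_cal, diff_cal)]
    q = conformal_quantile(scores, significance)
    return [(p - q * (d + beta), p + q * (d + beta))
            for p, d in zip(pred_test, diff_test)]


# Hypothetical calibration targets, point predictions and difficulty estimates
y_cal = [10.0, 12.0, 15.0, 20.0]
pred_cal = [11.0, 12.5, 13.0, 21.0]
diff_cal = [2.0, 0.5, 1.0, 4.0]

# Two test objects with the same point prediction but different difficulty
intervals = normalized_intervals(y_cal, pred_cal, diff_cal,
                                 pred_test=[10.0, 10.0], diff_test=[0.5, 2.0],
                                 significance=0.25)
print(intervals)
```

The two test objects share the same point prediction, yet the one with the higher difficulty estimate receives the wider interval.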

#### Regression

# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older version
from sklearn.datasets import load_boston
import numpy as np
from c3pO.icp import ICPRegressor
from c3pO.base import ModelWrapper, Normalizer
from c3pO.nc.regressor import AbsErrorErrFunc, RegressorNC
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor


boston = load_boston()
idx = np.random.permutation(boston.target.size)

# Divide the data into proper training set, calibration set and test set
idx_train, idx_cal, idx_test = idx[:300], idx[300:399], idx[399:]

model = RandomForestRegressor()	# Create the underlying model
model.fit(boston.data[idx_train, :], boston.target[idx_train])


# Normalization model: a k-NN regressor that estimates the difficulty of each prediction
normalizer_model = Normalizer(model, err_func=AbsErrorErrFunc(), normalizer_model=KNeighborsRegressor(n_neighbors=11))
normalizer_model.fit(boston.data[idx_train, :], boston.target[idx_train])

# Wrap the fitted model
pretrained_model = ModelWrapper(model)
nc = RegressorNC(model=pretrained_model, err_func=AbsErrorErrFunc(), normalizer=normalizer_model)  # Create a normalized nonconformity function
icp = ICPRegressor(nc)  # Create an inductive conformal regressor

# Calibrate the ICP using the calibration set
icp.calibrate(boston.data[idx_cal, :], boston.target[idx_cal])

# Produce predictions for the test set, with confidence 95%
prediction = icp.predict(boston.data[idx_test, :], significance=0.05)

# Print the first 5 predictions
print(prediction[:5, :])

[[ 6.6939295  33.0820705 ]
 [10.79869199 35.86930801]
 [10.8332121  27.4487879 ]
 [15.36374998 33.13625002]
 [11.51694833 26.64905167]]


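### Example 4: Credibility and confidence prediction
In this example, we build the same inductive conformal classifier as in Example 1 and call `predict_conf`, which returns a predicted label for each test object together with its confidence and credibility.
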
from sklearn.datasets import load_iris
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from c3pO.icp import ICPClassifier
from c3pO.base import ModelWrapper
from c3pO.nc.classifier import MarginErrFunc, ClassifierNC
    
iris = load_iris()
idx = np.random.permutation(iris.target.size)

# Divide the data into proper training set, calibration set and test set
idx_train, idx_cal, idx_test = idx[:50], idx[50:100], idx[100:]

# Build a support vector classifier (probability=True enables class probability estimates)
model = SVC(probability=True)
# Fit the model on the proper training set
model.fit(iris.data[idx_train, :], iris.target[idx_train])

pretrained_model = ModelWrapper(model)	# Wrap the model to make it framework agnostic
nc = ClassifierNC(pretrained_model, err_func=MarginErrFunc())	# Create a default nonconformity function
icp = ICPClassifier(nc)	 # Create an inductive conformal classifier


# Calibrate the ICP using the calibration set
icp.calibrate(iris.data[idx_cal, :], iris.target[idx_cal])

# Predict a label for each test object, together with its confidence and credibility
pd.DataFrame(icp.predict_conf(iris.data[idx_test, :]), columns=["Label", "Confidence", "Credibility"])

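|   | Label | Confidence | Credibility |
|---|-------|------------|-------------|
| 0 | 2.0 | 0.991113 | 0.444097 |
| 1 | 2.0 | 0.987369 | 0.373574 |
| 2 | 2.0 | 0.987041 | 0.066901 |
| 3 | 2.0 | 0.984201 | 0.231499 |
| 4 | 1.0 | 0.988913 | 0.341441 |
| 5 | 2.0 | 0.996536 | 0.930356 |
| 6 | 2.0 | 0.983256 | 0.386589 |
| 7 | 0.0 | 0.99474 | 0.630185 |
| 8 | 0.0 | 0.985235 | 0.420036 |
| 9 | 1.0 | 0.980572 | 0.172459 |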
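
For readers unfamiliar with these two quantities, here is a minimal sketch based on their usual definition in the conformal prediction literature: the predicted label is the one with the largest p-value, credibility is that largest p-value, and confidence is one minus the second-largest p-value. Whether `predict_conf` computes exactly this is an assumption; the function name and the p-values below are hypothetical.

```python
def confidence_and_credibility(p_values):
    """Return (label, confidence, credibility) from one p-value per class."""
    # Rank class indices from largest to smallest p-value
    ranked = sorted(range(len(p_values)), key=lambda j: p_values[j], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    confidence = 1.0 - p_values[runner_up]  # how clearly the alternatives are rejected
    credibility = p_values[best]            # how typical the chosen label looks
    return best, confidence, credibility


# Hypothetical p-values of one test object for classes 0, 1 and 2
label, confidence, credibility = confidence_and_credibility([0.02, 0.61, 0.05])
print(label, confidence, credibility)
```

Read this way, a row with high confidence but low credibility (such as row 2 in the table above) would suggest that the alternatives are clearly rejected, but that even the chosen label fits the calibration data poorly.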
