# Comparison with baseline methods

The library also implements static ensemble selection techniques and baseline methods.

In this version, we provide the main algorithms that are used as baselines when evaluating dynamic selection (DS) techniques:

- Oracle
- Single best
- Static selection
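
To make the two trainable baselines more concrete, here is a minimal, library-free sketch of the idea behind them. It is not DESlib's implementation. It assumes the predictions of each base classifier on a validation set are already available as plain lists, and the helper names (`accuracy`, `single_best`, `static_selection`, `majority_vote`) as well as the `fraction` parameter are hypothetical, chosen only for this illustration.

```python
from collections import Counter


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


def single_best(pool_predictions, labels):
    """Index of the classifier with the highest validation accuracy."""
    scores = [accuracy(preds, labels) for preds in pool_predictions]
    return max(range(len(scores)), key=scores.__getitem__)


def static_selection(pool_predictions, labels, fraction=0.5):
    """Indices of the top `fraction` of classifiers, ranked by validation accuracy."""
    scores = [accuracy(preds, labels) for preds in pool_predictions]
    n_selected = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return ranked[:n_selected]


def majority_vote(pool_predictions, selected):
    """Combine the selected classifiers by majority voting, sample by sample."""
    n_samples = len(pool_predictions[0])
    return [
        Counter(pool_predictions[i][j] for i in selected).most_common(1)[0][0]
        for j in range(n_samples)
    ]


# Toy validation data: 4 classifiers, 6 samples (values are made up).
labels = [0, 1, 1, 0, 1, 0]
pool_predictions = [
    [0, 1, 1, 0, 0, 0],  # 5/6 correct
    [0, 1, 0, 0, 1, 1],  # 4/6 correct
    [1, 0, 1, 1, 1, 0],  # 3/6 correct
    [0, 1, 1, 0, 1, 1],  # 5/6 correct
]

best = single_best(pool_predictions, labels)
selected = static_selection(pool_predictions, labels, fraction=0.75)
ensemble_predictions = majority_vote(pool_predictions, selected)
```

Both baselines make their choice once, on validation data, and then apply it unchanged to every test sample. A dynamic selection technique instead chooses classifiers separately for each query sample.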

In this example, we compare the performance of the baseline methods with that of a DS technique (KNORA-E).

In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import BaggingClassifier

# Example of a DES technique
from deslib.des.knora_e import KNORAE

# Baseline methods
from deslib.static.oracle import Oracle
from deslib.static.single_best import SingleBest
from deslib.static.static_selection import StaticSelection

## Loading a classification dataset and preparing the data

In [2]:
data = load_breast_cancer()
X = data.data
y = data.target
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

# Scale the variables to have zero mean and unit variance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Split the training data again: one half trains the pool, the other half
# is the DSEL (dynamic selection dataset) used by the DS techniques
X_train, X_dsel, y_train, y_dsel = train_test_split(X_train, y_train, test_size=0.5)

## Training a pool of classifiers


Here we train a pool containing 50 perceptron classifiers. The CalibratedClassifierCV class from scikit-learn is used in order to obtain probabilistic outputs.

In [3]:
# Calibrating Perceptrons to estimate probabilities
model = CalibratedClassifierCV(Perceptron(max_iter=10))

# Train a pool of 50 classifiers
classifiers_pool = BaggingClassifier(model, n_estimators=50)
classifiers_pool.fit(X_train, y_train)

BaggingClassifier(base_estimator=CalibratedClassifierCV(base_estimator=Perceptron(alpha=0.0001, class_weight=None, eta0=1.0, fit_intercept=True,
      max_iter=10, n_iter=None, n_jobs=1, penalty=None, random_state=0,
      shuffle=True, tol=None, verbose=0, warm_start=False),
            cv=3, method='sigmoid'),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=50, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False)

## Initializing methods

The static methods share the same interface as the dynamic selection techniques: fit, predict, predict_proba and score. This makes comparing the methods straightforward.

In [4]:
knorae = KNORAE(classifiers_pool)
sb = SingleBest(classifiers_pool)
ss = StaticSelection(classifiers_pool)
oracle = Oracle(classifiers_pool)

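knorae.fit(X_dsel, y_dsel)
sb.fit(X_dsel, y_dsel)
ss.fit(X_dsel, y_dsel)
# Recent DESlib releases also expect the Oracle to be fitted before scoring
oracle.fit(X_dsel, y_dsel)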

## Getting the classification performance of the techniques

In [5]:
print('Classification performance KNORA-E: ', knorae.score(X_test, y_test))
print('Classification performance Single Best: ', sb.score(X_test, y_test))
print('Classification performance Static Selection: ', ss.score(X_test, y_test))
print('Classification performance Oracle: ', oracle.score(X_test, y_test))

Classification performance KNORA-E:  0.9574468085106383
Classification performance Single Best:  0.9627659574468085
Classification performance Static Selection:  0.9680851063829787
Classification performance Oracle:  0.9893617021276596


Based on the Oracle performance (about 0.989, compared with about 0.957 for KNORA-E in this run), we can see that DS techniques still have room for improvement before they reach the upper limit achievable with this pool.

It is important to mention that the Oracle is an ideal model: it uses the labels of the query samples to check whether at least one base classifier in the pool predicts the correct label. It only serves to estimate the upper limit of the performance achievable with a given pool of classifiers and cannot be used as a classification technique.
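
The Oracle definition above can be sketched in a few lines of plain Python. This is an illustration of the concept under simple assumptions, not DESlib's code: the predictions of each base classifier are given as lists, and the function name `oracle_accuracy` is hypothetical.

```python
def oracle_accuracy(pool_predictions, labels):
    """Fraction of samples that at least one classifier in the pool labels correctly."""
    n_samples = len(labels)
    hits = sum(
        any(preds[j] == labels[j] for preds in pool_predictions)
        for j in range(n_samples)
    )
    return hits / n_samples


# Toy data: 3 classifiers, 5 test samples (values are made up).
labels = [1, 0, 1, 1, 0]
pool_predictions = [
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
]

upper_bound = oracle_accuracy(pool_predictions, labels)
```

In this toy pool every classifier misses the sample at index 3, so no selection scheme could classify it correctly. That is why the Oracle score is an upper bound for the pool. Computing it requires the true test labels, which is why the Oracle cannot serve as a classifier.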