# Example using a heterogeneous pool

The library also supports a heterogeneous pool of classifiers. A pool is called heterogeneous when different classifier models (e.g., SVM, decision tree, naive Bayes) are used to generate a diverse pool of classifiers.

In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Example of DCS techniques
from deslib.dcs import OLA
from deslib.dcs import APriori
from deslib.dcs import MCB
# Example of DES techniques
from deslib.des import KNORAE
from deslib.des import DESP
from deslib.des import KNORAU

## Loading a classification dataset and preparing the data

In [2]:
data = load_breast_cancer()
X = data.data
y = data.target

# split the data into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

# Split the data into training and DSEL for DS techniques
X_train, X_dsel, y_train, y_dsel = train_test_split(X_train, y_train, test_size=0.5)
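
As a rough check on how much data each stage receives, the split arithmetic can be reproduced in plain Python. This is a sketch under one assumption: when only a fractional `test_size` is given, scikit-learn rounds the test count up and the training set receives the remainder. `split_sizes` is a hypothetical helper written for this illustration, not a library function.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the training set
    receives whatever is left.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_total = 569  # number of samples in the breast cancer dataset

# First split: training + DSEL candidates versus test data
n_rest, n_test = split_sizes(n_total, 0.33)
# Second split: training data for the pool versus DSEL
n_train, n_dsel = split_sizes(n_rest, 0.5)

print(n_train, n_dsel, n_test)  # 190 191 188
```

Under this assumption the pool is trained on 190 samples, DSEL holds 191 samples, and 188 samples are kept for testing. The accuracies printed at the end of this example are all multiples of 1/188, which is consistent with that test set size.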

## Training a heterogeneous pool of classifiers

In this example we train a pool composed of:
1. Perceptron classifier
2. Linear SVM
3. Gaussian SVM
4. Naive Bayes
5. Decision tree
6. KNN for k=1

In [3]:
from sklearn.linear_model import Perceptron
from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import VotingClassifier

# Perceptron and LinearSVC do not provide predict_proba, so they are wrapped
# in CalibratedClassifierCV to obtain probability estimates.
model_perceptron = CalibratedClassifierCV(Perceptron(max_iter=100)).fit(X_train, y_train)
model_linear_svm = CalibratedClassifierCV(LinearSVC()).fit(X_train, y_train)
# probability=True enables predict_proba for the SVC (RBF kernel by default)
model_svc = SVC(probability=True).fit(X_train, y_train)
model_bayes = GaussianNB().fit(X_train, y_train)
model_tree = DecisionTreeClassifier().fit(X_train, y_train)
model_knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Pool of fitted base classifiers used by the DS techniques
pool_classifiers = [model_perceptron, model_linear_svm,
                    model_svc, model_bayes, model_tree, model_knn]

# Baseline: majority voting over the same six models
voting_classifiers = [("perceptron", model_perceptron),
                      ("linear_svm", model_linear_svm),
                      ("svc", model_svc), ("bayes", model_bayes),
                      ("tree", model_tree), ("knn", model_knn)]
model_voting = VotingClassifier(estimators=voting_classifiers).fit(X_train, y_train)
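
The voting baseline uses the default hard voting of `VotingClassifier`, in which every base classifier casts one vote and the most frequent label wins. The sketch below illustrates that rule on plain lists. `majority_vote` is a hypothetical helper, and its tie-breaking (the first label seen wins) is a simplification that may differ from scikit-learn's behavior.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers.

    Simplification: on a tie, the label that appears first wins.
    """
    return Counter(predictions).most_common(1)[0][0]


# Toy labels predicted by six base classifiers for one query sample
votes = [1, 0, 1, 1, 0, 1]
print(majority_vote(votes))  # 1
```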

## Initializing DS techniques

Here we initialize the DS techniques. Three DCS and three DES techniques are considered, and the size of the region of competence is set to k = 5 for each of them.

In [4]:
# DES techniques
knorau = KNORAU(pool_classifiers, k=5)
kne = KNORAE(pool_classifiers, k=5)
desp = DESP(pool_classifiers, k=5)
# DCS techniques
ola = OLA(pool_classifiers, k=5)
mcb = MCB(pool_classifiers, k=5)
apriori = APriori(pool_classifiers, k=5)
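
The region of competence controlled by `k` is the set of DSEL samples closest to a query sample. A minimal sketch of that idea with Euclidean distance follows. It is only an illustration on toy 2-D points, not DESlib's implementation, and `region_of_competence` is a hypothetical name.

```python
import math


def region_of_competence(query, X_dsel, k=5):
    """Return the indices of the k DSEL samples closest to the query,
    ordered from nearest to farthest (Euclidean distance)."""
    order = sorted(range(len(X_dsel)),
                   key=lambda i: math.dist(query, X_dsel[i]))
    return order[:k]


# Toy DSEL with two well-separated groups of points
toy_dsel = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
            (5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (0.5, 0.5)]

print(region_of_competence((0.2, 0.1), toy_dsel, k=3))  # [0, 6, 1]
```

A larger `k` makes the competence estimate less noisy but less local. The value 5 used in this example is a choice, not a requirement.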

## Fitting DS techniques

The function fit(data, target) is used to fit each dynamic selection method. It prepares the algorithm that estimates the region of competence (e.g., k-NN) and pre-processes the information required to apply the DS techniques. Here each method is fitted on the DSEL data.

In [5]:
knorau.fit(X_dsel, y_dsel)
kne.fit(X_dsel, y_dsel)
desp.fit(X_dsel, y_dsel)
ola.fit(X_dsel, y_dsel)
mcb.fit(X_dsel, y_dsel)
apriori.fit(X_dsel, y_dsel)
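
One kind of information that can be pre-processed at fit time is a table recording which base classifier labels which DSEL sample correctly, so that competence can later be looked up instead of recomputed. The sketch below shows that idea with toy classifiers written as plain functions. All names are hypothetical, and it is not a description of what DESlib stores internally.

```python
def precompute_hits(classifiers, X_dsel, y_dsel):
    """Build a table where hits[i][j] is True when classifier j
    predicts the correct label for DSEL sample i."""
    return [[clf(x) == y for clf in classifiers]
            for x, y in zip(X_dsel, y_dsel)]


# Hypothetical toy pool: each "classifier" maps a sample to a label
toy_pool = [
    lambda x: int(x[0] > 0.5),  # thresholds the first feature
    lambda x: int(x[1] > 0.5),  # thresholds the second feature
    lambda x: 1,                # always predicts class 1
]
toy_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
toy_y = [0, 1, 1, 1]

hits = precompute_hits(toy_pool, toy_X, toy_y)
for row in hits:
    print(row)
```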

## Calculating the classification accuracy of each technique

In this case, the first result is the classification accuracy of the majority voting scheme, which combines the outputs of all six base classifiers in the pool.

Using DS techniques, instead of combining all base classifiers, only the ones that are most competent locally are used for classification. In the case of DCS techniques, the single base classifier that is most competent locally is used for prediction. In the case of DES techniques, an ensemble containing the most competent base classifiers is selected to predict the label of a given query sample.

In [6]:
print('Classification accuracy of Majority voting the pool: ', model_voting.score(X_test, y_test))
print('Classification accuracy of KNORA-Union: ', knorau.score(X_test, y_test))
print('Classification accuracy of KNORA-Eliminate: ', kne.score(X_test, y_test))
print('Classification accuracy of DESP: ', desp.score(X_test, y_test))
print('Classification accuracy of OLA: ', ola.score(X_test, y_test))
print('Classification accuracy of A priori: ', apriori.score(X_test, y_test))

Classification accuracy of Majority voting the pool:  0.925531914893617
Classification accuracy of KNORA-Union:  0.9468085106382979
Classification accuracy of KNORA-Eliminate:  0.9574468085106383
Classification accuracy of DESP:  0.9468085106382979
Classification accuracy of OLA:  0.9521276595744681
Classification accuracy of A priori:  0.9574468085106383
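
The difference between DCS and DES described above can be sketched on a small table of local results, where each row is one neighbor in the region of competence (nearest first) and each column is one base classifier. The three functions below are simplified illustrations of the OLA, KNORA-Eliminate and KNORA-Union selection rules, not DESlib's implementations, and all names and data are hypothetical.

```python
def ola_select(hits):
    """DCS, OLA style: index of the single classifier with the
    highest accuracy inside the region of competence."""
    n_clf = len(hits[0])
    accuracy = [sum(row[j] for row in hits) / len(hits)
                for j in range(n_clf)]
    return max(range(n_clf), key=lambda j: accuracy[j])


def knorae_select(hits):
    """DES, KNORA-Eliminate style: classifiers that are correct on
    every neighbor; drop the farthest neighbor while none qualifies."""
    n_clf = len(hits[0])
    for size in range(len(hits), 0, -1):
        region = hits[:size]  # rows are sorted nearest first
        selected = [j for j in range(n_clf)
                    if all(row[j] for row in region)]
        if selected:
            return selected
    return list(range(n_clf))  # nobody qualifies: use the whole pool


def knorau_votes(hits):
    """DES, KNORA-Union style: each classifier receives one vote per
    neighbor it classifies correctly."""
    return [sum(row[j] for row in hits) for j in range(len(hits[0]))]


# Toy region of competence: 5 neighbors (rows) and 3 classifiers (columns)
local_hits = [
    [True, True, False],
    [True, True, False],
    [True, True, True],
    [False, False, True],
    [True, False, True],
]

print(ola_select(local_hits))     # 0
print(knorae_select(local_hits))  # [0, 1]
print(knorau_votes(local_hits))   # [4, 3, 3]
```

In this toy case no classifier is correct on all five neighbors, so the KNORA-Eliminate sketch shrinks the region to the three nearest neighbors before classifiers 0 and 1 qualify.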
