# Ensemble methods. Stacking


Stacking method use several classifiers and use them to generate an input feature matrix for a stacked classifier that do the final prediction.

Let's load the data set first.

In [1]:
%store -r data_set
%store -r labels
%store -r test_data_set
%store -r test_labels
%store -r unique_labels

To simplify the notebook, we use methods that are available in scikit-learn package. We load only several, but in the exercise you gonna need to load other methods as well.

In [2]:
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

In the first part we build three different models based on three different classifiers:

In [3]:
def build_classifiers():
    
    neighbors = KNeighborsClassifier()
    neighbors.fit(data_set, labels)

    linear_regression = LinearRegression()
    linear_regression.fit(data_set, labels)

    qda = QuadraticDiscriminantAnalysis()
    qda.fit(data_set, labels)

    return neighbors, linear_regression, qda

Based on the classifiers prediction, we build a feature vector for the decision tree classifier. Finally, we train and predict with the stacked classifier.

In [4]:
def build_stacked_classifier(classifiers):
    output = []
    for classifier in classifiers:
        output.append(classifier.predict(data_set))
    decision_tree = DecisionTreeClassifier()
    output = np.array(output).reshape((130,3))
    
    # stacked classifier part:
    decision_tree.fit(output.reshape((130,3)), labels.reshape((130,)))
    test_set = []
    for classifier in classifiers:
        test_set.append(classifier.predict(test_data_set))
    test_set = np.array(test_set).reshape((len(test_set[0]),3))
    predicted = decision_tree.predict(test_set)
    return predicted

Stacked classifier accuracy can be measured as below:

In [5]:
classifiers = build_classifiers()
predicted = build_stacked_classifier(classifiers)
accuracy = accuracy_score(test_labels, predicted)
print(accuracy)

0.7


In this case, the three used classifiers does not give any value, because we get a higher value using just the decision treee classifier.

In [6]:
decision_tree = DecisionTreeClassifier()
decision_tree.fit(data_set, labels)
predicted = decision_tree.predict(test_data_set)
accuracy = accuracy_score(test_labels, predicted)
print(accuracy)

0.95
