# Stacking

Stacking is an ensemble technique that combines the predictions of multiple machine learning models. It trains a meta-learner to learn the best way to combine those predictions. Whereas bagging and boosting typically use homogeneous weak learners, stacking often uses heterogeneous ones.

The meta-learner is trained on the predictions of the different base models. It takes each model's prediction as input and learns to approximate the final prediction, so the base models' predictions become the meta-learner's input features. This final meta-learner layer is stacked on top of the other machine learning models, hence the name **Stacking**.
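To make the idea of "predictions as features" concrete, here is a minimal hand-rolled sketch of stacking. It is an illustration under assumptions, not the method used in the rest of this notebook: the two base models, the 5-fold setting, the `random_state` values, and every variable name (`X_demo`, `meta_train`, `meta_learner`, and so on) are chosen only for this example. Out-of-fold predictions are used for training so that the meta-learner never sees a prediction made on data the base model was fitted on.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Illustrative data split (names are hypothetical, separate from the notebook's X and y)
X_demo, y_demo = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# Two heterogeneous base (level 0) models
base_models = [DecisionTreeClassifier(random_state=0), GaussianNB()]

# Training meta-features: out-of-fold probability of the positive class,
# one column per base model
meta_train = np.column_stack([
    cross_val_predict(model, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for model in base_models
])

# Test meta-features: refit each base model on the full training set,
# then predict on the held-out test set
meta_test = np.column_stack([
    model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    for model in base_models
])

# The meta-learner (level 1) is trained only on the base models' predictions
meta_learner = LogisticRegression().fit(meta_train, y_tr)
manual_stack_acc = meta_learner.score(meta_test, y_te)

print("Meta-feature matrix shape:", meta_train.shape)
print("Accuracy for manual stack:", manual_stack_acc)
```

`StackingClassifier`, used below, wraps roughly this procedure in a single estimator.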

In [None]:
import pandas as pd

# Import classifiers from sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Import datasets
from sklearn.datasets import load_breast_cancer

# Import accuracy metrics
from sklearn.metrics import accuracy_score

# Import stacking ensemble
from sklearn.ensemble import StackingClassifier

# Import training and test data splitter
from sklearn.model_selection import train_test_split, cross_val_score

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Load datasets
breast_cancer_ds = load_breast_cancer()

# Build the feature dataframe
X = pd.DataFrame(breast_cancer_ds['data'], columns = breast_cancer_ds['feature_names'])

# Build a target dataframe
y = pd.DataFrame(breast_cancer_ds['target'], columns = ['cancer_type'])

# Check the shape of data
X.shape, y.shape

((569, 30), (569, 1))

In [None]:
# Perform Train test split of the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)

# Create Decision Tree Classifier
dtree_clf = DecisionTreeClassifier()

# Create Logistic Regression Classifier
logistic_clf = LogisticRegression()

# Create Support Vector Classifier
support_vector_clf = SVC()

# Create Gaussian Naive Bayes Classifier
gaussian_nb_clf = GaussianNB()

# Create K Nearest Neighbors Classifier
knn_clf = KNeighborsClassifier()

# Create the level 1 (meta) classifier for the stack
level1_clf = LogisticRegression()


# Collect the level 0 (base) classifiers as (name, estimator) pairs
level0 = [
    ('dtree', DecisionTreeClassifier()),
    ('lr', LogisticRegression()),
    ('knn', KNeighborsClassifier()),
    ('svc', SVC()),
    ('nb', GaussianNB()),
]

# Create Stacking Classifier
stack_clf = StackingClassifier(
    estimators = level0,
    final_estimator = level1_clf,
    cv = 10
)
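The sketch below is one way to inspect what a fitted stack actually feeds to its final estimator. It is a small standalone illustration, not part of the original workflow: the three base models, `cv=5`, and the names `demo_stack` and `meta_features` are assumptions made for this example. With the default `stack_method`, scikit-learn should pick `predict_proba` for these base models, and for a binary problem it should keep a single probability column per model.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative data (names are hypothetical, separate from the notebook's X and y)
X_demo, y_demo = load_breast_cancer(return_X_y=True)

# A smaller stack than the one above, to keep the example quick
demo_stack = StackingClassifier(
    estimators=[
        ("dtree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("nb", GaussianNB()),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
demo_stack.fit(X_demo, y_demo)

# transform() returns the base models' predictions, i.e. the meta-features
meta_features = demo_stack.transform(X_demo)

print("Meta-feature matrix shape:", meta_features.shape)
print("Method used per base model:", demo_stack.stack_method_)
print("Meta-learner coefficients:", demo_stack.final_estimator_.coef_)
```

The coefficients give a rough sense of how much weight the meta-learner places on each base model's prediction.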

In [None]:
# Fit the classifiers
dtree_clf.fit(X_train, y_train)
logistic_clf.fit(X_train, y_train)
support_vector_clf.fit(X_train, y_train)
gaussian_nb_clf.fit(X_train, y_train)
knn_clf.fit(X_train, y_train)

# Predict on test samples using classifiers
y_pred_dtree = dtree_clf.predict(X_test)
y_pred_logistic = logistic_clf.predict(X_test)
y_pred_svc = support_vector_clf.predict(X_test)
y_pred_nb = gaussian_nb_clf.predict(X_test)
y_pred_knn = knn_clf.predict(X_test)

# Compute accuracy for each classifier
acc_dtree = accuracy_score(y_test, y_pred_dtree)
acc_logistic = accuracy_score(y_test, y_pred_logistic)
acc_svc = accuracy_score(y_test, y_pred_svc)
acc_nb = accuracy_score(y_test, y_pred_nb)
acc_knn = accuracy_score(y_test, y_pred_knn)

In [None]:
# Print the individual accuracy
print("Accuracy for Decision Tree: ", acc_dtree)
print("Accuracy for Logistic Regression: ", acc_logistic)
print("Accuracy for Support Vector Machine: ", acc_svc)
print("Accuracy for Naive Bayes: ", acc_nb)
print("Accuracy for K-Nearest Neighbors: ", acc_knn)

Accuracy for Decision Tree:  0.9181286549707602
Accuracy for Logistic Regression:  0.9415204678362573
Accuracy for Support Vector Machine:  0.9064327485380117
Accuracy for Naive Bayes:  0.935672514619883
Accuracy for K-Nearest Neighbors:  0.935672514619883


In [None]:
# Train the stack
stack_clf.fit(X_train, y_train)

# Evaluate the stack
y_pred_stack = stack_clf.predict(X_test)

# Compute the accuracy
acc_stack = accuracy_score(y_test, y_pred_stack)

# Print the accuracy for the stack
print("Accuracy for stacking: ", acc_stack)

Accuracy for stacking:  0.9590643274853801
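A single train/test split can make small accuracy differences look more meaningful than they are. As a hedged follow-up, the sketch below compares a stack against its base models with k-fold cross-validation using `cross_val_score`. The choice of two base models, `cv=5`, and the names `base_estimators`, `candidates`, and `cv_scores` are assumptions for this example only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Illustrative data (names are hypothetical, separate from the notebook's X and y)
X_cv, y_cv = load_breast_cancer(return_X_y=True)

# Base models, plus a stack built on top of the same models
base_estimators = [
    ("dtree", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
]
candidates = dict(base_estimators)
candidates["stack"] = StackingClassifier(
    estimators=base_estimators,
    final_estimator=LogisticRegression(),
    cv=5,
)

# Score every candidate on the same 5-fold scheme
cv_scores = {
    name: cross_val_score(model, X_cv, y_cv, cv=5, scoring="accuracy")
    for name, model in candidates.items()
}

for name, scores in cv_scores.items():
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Comparing the mean and spread across folds gives a steadier picture of whether the stack improves on its base models than one split does.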
