https://towardsdatascience.com/ensemble-learning-stacking-blending-voting-b37737c4f483

There are different types of Ensemble Learning techniques which differ mainly by:

    the type of models used (homogeneous or heterogeneous models), 
    the data sampling (with or without replacement, k-fold, etc.) and 
    the decision function (voting, average, meta model, etc.).
    
Therefore, Ensemble Learning techniques can be classified as:
    
    Bagging
    Boosting
    Stacking


In addition to these three main categories, two important variations emerge: 
    
    Voting (which is a complement of Bagging) and 
    Blending (a subtype of Stacking). 

Although Voting complements Bagging and Blending is a subtype of Stacking, both techniques are often presented as types of Ensemble Learning in their own right.


**Stacking:**

Better known as Stacked Generalization, this method aims to reduce the generalization error of a set of generalizers (i.e. ML models). 

The general idea of the Stacked Generalization method is the generation of a Meta-Model. 

The Stacked Generalization method is commonly composed of 2 training stages, “level 0” and “level 1”. 

More levels can be added as necessary. However, in practice it is common to use only 2 levels. 

The aim of the first stage (level 0) is to generate the training data for the meta-model. This is done by running k-fold cross validation for each “weak learner” (base model) defined in the first stage, so that every training sample receives a prediction from a model that did not see it during fitting. 

The predictions of each of these “weak learners” are “stacked” to build the new training set for the meta-model. 

The aim of the second stage (level 1) is to train the meta-model. This training is carried out by a previously chosen “final learner”.
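
The two stages described above can be sketched by hand. This is a minimal illustration rather than a reference implementation: the synthetic dataset, the two base learners, the 5 folds and all variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

base_learners = [GaussianNB(), DecisionTreeClassifier(max_depth=3, random_state=0)]

# Level 0: out-of-fold probabilities of the positive class,
# one column per base learner, "stacked" side by side.
meta_train = np.column_stack([
    cross_val_predict(clf, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for clf in base_learners
])

# Level 1: the final learner is trained on the stacked predictions.
final_learner = LogisticRegression().fit(meta_train, y_tr)

# For new data, each base learner is refitted on the full training set
# and its predictions become the meta-model's input features.
meta_test = np.column_stack([
    clf.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for clf in base_learners
])
stack_pred = final_learner.predict(meta_test)
```

scikit-learn's `StackingClassifier`, used later in this notebook, wraps a similar procedure in a single estimator.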


**Blending:**

Blending is a technique derived from Stacked Generalization. 

The only difference is that in Blending, the k-fold cross validation technique is not used to generate the training data of the meta-model. 

Instead, Blending uses a single holdout set: a small portion of the training data (the validation set) is set aside, and the base models' predictions on it are “stacked” to form the training data of the meta-model. 

Predictions are also made on the test data to form the meta-model's test data.
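
A hand-written sketch of Blending may make the difference from Stacking concrete. The synthetic dataset, the split sizes, the helper `stack_predictions` and the choice of base learners are all assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_full, X_te, y_full, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Single holdout split of the training data instead of k-fold cross validation.
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X_full, y_full, test_size=0.2, random_state=0
)

base_learners = [GaussianNB(), DecisionTreeClassifier(max_depth=3, random_state=0)]
for clf in base_learners:
    clf.fit(X_fit, y_fit)  # base learners never see the holdout set


def stack_predictions(learners, X_part):
    """Hypothetical helper: one column of positive-class probabilities per learner."""
    return np.column_stack([clf.predict_proba(X_part)[:, 1] for clf in learners])


blend_train = stack_predictions(base_learners, X_hold)  # meta-model training data
blend_test = stack_predictions(base_learners, X_te)     # meta-model test data

blender = LogisticRegression().fit(blend_train, y_hold)
blend_pred = blender.predict(blend_test)
```

The trade-off is that the meta-model is trained only on the holdout rows, so it sees less data than it would with k-fold Stacking.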


**Voting:**

The Voting Classifier can be a homogeneous or a heterogeneous ensemble, that is, the base classifiers can be of the same type or of different types. 

This type of ensemble also works as an extension of bagging (e.g. Random Forest).

The architecture of a Voting Classifier is made up of a number “n” of ML models, whose predictions are combined in one of two ways: 

    Hard:
        In ***hard*** mode, the winning prediction is the class with the most votes. 
        
    Soft:
        ***Soft*** mode uses the class probabilities predicted by each ML model; 
        these probabilities are weighted and averaged, 
        and the winning class is the one with the highest weighted average probability.
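
The two modes can give different answers for the same sample. The small sketch below uses made-up probabilities from three hypothetical classifiers and equal weights (both are assumptions for illustration) to show the arithmetic.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for one sample
# (columns: class 0, class 1).
probas = np.array([
    [0.45, 0.55],
    [0.40, 0.60],
    [0.90, 0.10],
])
weights = np.array([1.0, 1.0, 1.0])  # equal weights (assumption)

# Hard voting: each model votes for its most probable class,
# and the class with the largest (weighted) vote count wins.
votes = probas.argmax(axis=1)
hard_winner = np.bincount(votes, weights=weights, minlength=2).argmax()

# Soft voting: weighted average of the probabilities per class,
# then the class with the highest average wins.
avg_proba = np.average(probas, axis=0, weights=weights)
soft_winner = avg_proba.argmax()

print(hard_winner, soft_winner, avg_proba.round(4))
```

Here two of the three models lean slightly towards class 1, so hard voting picks class 1, while the third model's confident prediction pulls the averaged probabilities towards class 0, so soft voting picks class 0.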



Dataset: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

In [1]:
import numpy as np
import pandas as pd

from sklearn.naive_bayes import GaussianNB 
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier, StackingClassifier
#from mlxtend.classifier import StackingClassifier

from sklearn.model_selection import train_test_split, cross_val_score, RepeatedStratifiedKFold

import warnings
warnings.filterwarnings("ignore")

In [2]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

In [3]:
diabetes_df = pd.read_csv('data/diabetes.csv')

In [4]:
diabetes_df.head()

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1


In [5]:
diabetes_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  
 1   Glucose                   768 non-null    int64  
 2   BloodPressure             768 non-null    int64  
 3   SkinThickness             768 non-null    int64  
 4   Insulin                   768 non-null    int64  
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64  
 8   Outcome                   768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB


In [6]:
diabetes_df.isna().any()

Pregnancies                 False
Glucose                     False
BloodPressure               False
SkinThickness               False
Insulin                     False
BMI                         False
DiabetesPedigreeFunction    False
Age                         False
Outcome                     False
dtype: bool

In [7]:
diabetes_df.columns

Index(['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
       'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome'],
      dtype='object')

In [8]:
X_train, X_test, y_train, y_test = train_test_split(diabetes_df.drop(columns=['Outcome']), 
                                                    diabetes_df['Outcome'], 
                                                    test_size=0.3, 
                                                    random_state=32)

In [9]:
clf1 = GaussianNB()
clf2 = LogisticRegression(random_state=32)
clf3 = DecisionTreeClassifier(random_state=32)
clf4 = RandomForestClassifier(random_state=32)
clf5 = KNeighborsClassifier(n_neighbors=1)
# SVC needs probability=True to expose predict_proba, which the soft voting classifier requires
clf6 = SVC(probability=True, random_state=32) 

In [10]:
models = [clf1, clf2, clf3, clf4, clf5, clf6]
labels = ['Naive Bayes', 'Logistic Regression', 'Decision Tree', 'Random Forest', 'K-Nearest Neighbors', 'SVC']

In [11]:
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=32)

In [12]:
for clf, label in zip(models, labels):

    scores = cross_val_score(clf, X_train, y_train, cv=cv, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))

Accuracy: 0.75 (+/- 0.05) [Naive Bayes]
Accuracy: 0.77 (+/- 0.05) [Logistic Regression]
Accuracy: 0.73 (+/- 0.06) [Decision Tree]
Accuracy: 0.76 (+/- 0.06) [Random Forest]
Accuracy: 0.69 (+/- 0.06) [K-Nearest Neighbors]
Accuracy: 0.74 (+/- 0.05) [SVC]


# Voting Classifier

In [13]:
voting_clf_hard = VotingClassifier(estimators = list(zip(labels,models)), voting = 'hard')

In [14]:
scores = cross_val_score(voting_clf_hard, X_train, y_train, cv=cv, scoring='accuracy')
print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), 'voting_clf_hard'))

Accuracy: 0.77 (+/- 0.05) [voting_clf_hard]


In [15]:
voting_clf_hard.fit(X_train, y_train)
y_pred = voting_clf_hard.predict(X_test)

# evaluating model performance using accuracy_score and classification report
score = accuracy_score(y_test, y_pred)
clf_report = classification_report(y_test, y_pred)

print(score)
print(clf_report)

0.7835497835497836
              precision    recall  f1-score   support

           0       0.78      0.93      0.84       147
           1       0.80      0.54      0.64        84

    accuracy                           0.78       231
   macro avg       0.79      0.73      0.74       231
weighted avg       0.79      0.78      0.77       231



In [16]:
voting_clf_soft = VotingClassifier(estimators = list(zip(labels,models)), voting = 'soft')

In [17]:
scores = cross_val_score(voting_clf_soft, X_train, y_train, cv=cv, scoring='accuracy')
print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), 'voting_clf_soft'))

Accuracy: 0.77 (+/- 0.05) [voting_clf_soft]


In [18]:
voting_clf_soft.fit(X_train, y_train)
y_pred = voting_clf_soft.predict(X_test)

# evaluating model performance using accuracy_score and classification report
score = accuracy_score(y_test, y_pred)
clf_report = classification_report(y_test, y_pred)

print(score)
print(clf_report)

0.7575757575757576
              precision    recall  f1-score   support

           0       0.78      0.86      0.82       147
           1       0.71      0.57      0.63        84

    accuracy                           0.76       231
   macro avg       0.74      0.72      0.73       231
weighted avg       0.75      0.76      0.75       231



# Stacking Classifier

In [19]:
clf1 = GaussianNB()
clf2 = LogisticRegression(random_state=32)
clf3 = DecisionTreeClassifier(random_state=32)
clf4 = RandomForestClassifier(random_state=32)
clf5 = KNeighborsClassifier(n_neighbors=1)
# probability=True makes SVC expose predict_proba, which StackingClassifier uses by default when available
clf6 = SVC(probability=True, random_state=32) 

meta_clf = LogisticRegression(random_state=32)

In [20]:
models = [clf1, clf2, clf3, clf4, clf5, clf6]
labels = ['Naive Bayes', 'Logistic Regression', 'Decision Tree', 'Random Forest', 'K-Nearest Neighbors', 'SVC']

In [21]:
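# The base models' predictions are generated with internal cross-validation (default cv=None, i.e. 5-fold)
stack_clf = StackingClassifier(estimators=list(zip(labels, models)), final_estimator=meta_clf)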

models.append(stack_clf)
labels.append('Stacking Classifier')

In [24]:
for clf, label in zip(models, labels):

    scores = cross_val_score(clf, X_train, y_train, cv=cv, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))

Accuracy: 0.75 (+/- 0.05) [Naive Bayes]
Accuracy: 0.77 (+/- 0.05) [Logistic Regression]
Accuracy: 0.73 (+/- 0.06) [Decision Tree]
Accuracy: 0.76 (+/- 0.06) [Random Forest]
Accuracy: 0.69 (+/- 0.06) [K-Nearest Neighbors]
Accuracy: 0.74 (+/- 0.05) [SVC]
Accuracy: 0.77 (+/- 0.06) [Stacking Classifier]


In [25]:
stack_clf.fit(X_train, y_train)
y_pred = stack_clf.predict(X_test)

# evaluating model performance using accuracy_score and classification report
score = accuracy_score(y_test, y_pred)
clf_report = classification_report(y_test, y_pred)

print(score)
print(clf_report)

0.7662337662337663
              precision    recall  f1-score   support

           0       0.79      0.87      0.83       147
           1       0.72      0.58      0.64        84

    accuracy                           0.77       231
   macro avg       0.75      0.73      0.74       231
weighted avg       0.76      0.77      0.76       231

