## <font color='darkblue'>Preface</font>
([article source](https://towardsdatascience.com/ensemble-learning-stacking-blending-voting-b37737c4f483), [github](https://github.com/FernandoLpz/Stacking-Blending-Voting-Ensembles)) <font size='3ptx'>**If you want to increase the effectiveness of your ML model, maybe you should consider Ensemble Learning**</font>
![1.jpeg](images/1.jpeg)
<br/>

We have heard the phrase “unity is strength”, whose meaning can be transferred to different areas of life. Sometimes correct answers to a specific problem are supported by several sources and not just one. **This is what Ensemble Learning tries to do, that is, to put together a group of ML models to improve solutions to specific problems.**

Throughout this blog, we will learn what Ensemble Learning is and which types of ensembles exist, and we will specifically address Stacking, Blending and Voting ensembles. The blog is divided into the following sections:
* <font size='3ptx'>[**What is Ensemble Learning?**](#sect1)</font>
* <font size='3ptx'>[**Stacking**](#sect2)</font>
* <font size='3ptx'>[**Blending**](#sect3)</font>
* <font size='3ptx'>[**Voting**](#sect4)</font>

<a id='sect1'></a>
## <font color='darkblue'>What is Ensemble Learning?</font>
[**Ensemble Learning**](https://en.wikipedia.org/wiki/Ensemble_learning) refers to the joint use of several ML algorithms, mainly to solve classification and/or regression problems. These algorithms can be of the same type (homogeneous Ensemble Learning) or of different types (heterogeneous Ensemble Learning). **Ensemble Learning performs a strategic combination of various experts or ML models in order to improve on the effectiveness obtained with a single weak model** \[[1](http://www.scholarpedia.org/article/Ensemble_learning), [2](https://tjzhifei.github.io/links/EMFA.pdf)]. Figure 1 provides a visual comparison of a model that does not implement Ensemble Learning and one that does.
![2](images/2.jpeg)
<br/>

There are different types of Ensemble Learning techniques which differ mainly by the type of models used (<font color='brown'>homogeneous or heterogeneous models</font>), the data sampling (<font color='brown'>with or without replacement, k-fold, etc.</font>) and the decision function (<font color='brown'>voting, average, meta model, etc</font>). Therefore, Ensemble Learning techniques can be classified as:
* Bagging
* Boosting
* Stacking

In addition to these three main categories, two important variations emerge: **<font color='darkblue'>Voting</font>** (<font color='brown'>which is a complement of Bagging</font>) and **<font color='darkblue'>Blending</font>** (<font color='brown'>a subtype of Stacking</font>). Even so, Voting and Blending are often presented as Ensemble Learning types in their own right.
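
To make the categories above a bit more concrete, here is a minimal sketch of Bagging and Boosting with scikit-learn (Stacking is built from scratch in its own section below). The dataset, the split and the `n_estimators=20` setting are assumptions chosen only to keep the example small; they do not come from the original article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=23)

ensembles = {
    # Bagging: homogeneous models, each fit on a bootstrap sample (with replacement)
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=20, random_state=23),
    # Boosting: homogeneous models fit sequentially, each one focusing on earlier errors
    "boosting": AdaBoostClassifier(n_estimators=20, random_state=23),
}

test_accuracy = {}
for name, model in ensembles.items():
    model.fit(x_train, y_train)
    test_accuracy[name] = model.score(x_test, y_test)
    print(f"{name} test accuracy: {test_accuracy[name]:.3f}")
```

Both ensembles here are homogeneous (decision trees only); what differs is the data sampling and the way the individual predictions are combined.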

In this blog we will specifically address the Stacking, Blending and Voting techniques. Let’s go for it!

<a id='sect2'></a>
## <font color='darkblue'>Stacking</font>
**Better known as Stacked Generalization, it is a method introduced by [David H. Wolpert in 1992](https://www.sciencedirect.com/science/article/abs/pii/S0893608005800231)** \[3] **where the key is to reduce the generalization error of different generalizers** (<font color='brown'>i.e. ML models</font>). The general idea of Stacked Generalization is to build a Meta-Model. **The training data of the Meta-Model is made up of the predictions of a set of ML base models** (<font color='brown'>i.e. weak learners</font>), obtained through the k-fold cross validation technique. Finally, the Meta-Model is trained with an additional ML model (<font color='brown'>which is commonly known as the “final estimator” or “final learner”</font>).

**The Stacked Generalization method is commonly composed of 2 training stages, better known as “level 0” and “level 1”.** More levels can be added if necessary, but in practice it is common to use only 2. The aim of the first stage (<font color='brown'>level 0</font>) is to generate the training data for the meta-model. This is done by running k-fold cross validation for each “weak learner” defined in the first stage. The predictions of each of these “weak learners” are “stacked” to build the new training set (<font color='brown'>the input of the meta-model</font>). **The aim of the second stage** (<font color='brown'>level 1</font>) **is to train the meta-model on that new training set, using a previously chosen “<font color='darkblue'>final learner</font>”.**
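
The two levels described above can be sketched in a few lines with scikit-learn's `cross_val_predict`, which returns one out-of-fold prediction per training sample. This is only an illustration under assumptions: the two base models and the split are arbitrary choices, and for classifiers `cross_val_predict` with an integer `cv` uses stratified folds, which is not the same as the contiguous folds of the from-scratch implementation shown later.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=23)

base_models = [DecisionTreeClassifier(random_state=23), GaussianNB()]

# Level 0: one column of out-of-fold predictions per base model
meta_train = np.column_stack(
    [cross_val_predict(model, x_train, y_train, cv=5) for model in base_models]
)

# For the test set, each base model is refit on the full training set
meta_test = np.column_stack(
    [model.fit(x_train, y_train).predict(x_test) for model in base_models]
)

# Level 1: the final learner is trained on the stacked predictions
final_learner = LogisticRegression().fit(meta_train, y_train)
test_score = final_learner.score(meta_test, y_test)
print(meta_train.shape, meta_test.shape)
print(f"Test accuracy: {test_score:.3f}")
```

Each row of `meta_train` corresponds to one training sample and each column to one base model, so the final learner sees only the base models' predictions, never the original features.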

In Figure 2 we see a graphical description of the architecture of a Stacked Generalization classifier composed of 3 base models (<font color='brown'>weak learners</font>) and a final estimator.
![3](images/3.jpeg)
<br/>
**Perfect, so far we already know how the <font color='darkblue'>Stacked Generalization</font> technique works**. Now let’s see a small example of how we would do this in code (<font color='brown'>it is important to mention that this technique can be implemented directly with [scikit-learn](https://scikit-learn.org/stable/auto_examples/ensemble/plot_stack_predictors.html); however, to make the explanation more demonstrative, let’s see how to do it from scratch</font>).

In [2]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

class Ensemble:
    def __init__(self):
        self.x_train = None
        self.x_test = None
        self.y_train = None
        self.y_test = None
        self.k = 5

    def load_data(self):
        x, y = load_breast_cancer(return_X_y=True)
        self.x_train, self.x_test, self.y_train, self.y_test = train_test_split(x, y, test_size=0.3, random_state=23)
    
    def StackingClassifier(self):
        # Define weak learners
        weak_learners = [
            ('dt', DecisionTreeClassifier()),
            ('knn', KNeighborsClassifier()),
            ('rf', RandomForestClassifier()),
            ('gb', GradientBoostingClassifier()),
            ('gn', GaussianNB())
        ]
        
        # Final learner or meta model
        final_learner = LogisticRegression()

        train_meta_model = None
        test_meta_model = None

        # Start stacking
        for clf_id, clf in weak_learners:
            # Predictions for each classifier based on k-fold
            predictions_clf = self.k_fold_cross_validation(clf)
            
            # Predictions for test set for each classifier based on train of level 0
            test_predictions_clf = self.train_level_0(clf)
            
            # Stack predictions which will form
            # the input data for the meta model
            if isinstance(train_meta_model, np.ndarray):
                train_meta_model = np.vstack((train_meta_model, predictions_clf))
            else:
                train_meta_model = predictions_clf

            # Stack predictions from test set
            # which will form test data for meta model
            if isinstance(test_meta_model, np.ndarray):
                test_meta_model = np.vstack((test_meta_model, test_predictions_clf))
            else:
                test_meta_model = test_predictions_clf
        
        # Transpose train_meta_model
        train_meta_model = train_meta_model.T

        # Transpose test_meta_model
        test_meta_model = test_meta_model.T
        
        # Training level 1
        self.train_level_1(final_learner, train_meta_model, test_meta_model)

    def k_fold_cross_validation(self, clf):        
        predictions_clf = None

        # Number of samples per fold
        batch_size = int(len(self.x_train) / self.k)

        # Start k-fold cross validation
        for fold in range(self.k):

            # Index range of the current fold; the last fold takes the remainder
            batch_start = batch_size * fold
            if fold == (self.k - 1):
                batch_finish = self.x_train.shape[0]
            else:
                batch_finish = batch_size * (fold + 1)
            
            # test & training samples for each fold iteration
            fold_x_test = self.x_train[batch_start:batch_finish, :]
            fold_x_train = self.x_train[[index for index in range(self.x_train.shape[0]) if index not in range(batch_start, batch_finish)], :]

            # test & training targets for each fold iteration
            fold_y_test = self.y_train[batch_start:batch_finish]
            fold_y_train = self.y_train[[index for index in range(self.x_train.shape[0]) if index not in range(batch_start, batch_finish)]]

            # Fit current classifier
            clf.fit(fold_x_train, fold_y_train)
            fold_y_pred = clf.predict(fold_x_test)

            # Store predictions for each fold_x_test
            if isinstance(predictions_clf, np.ndarray):
                predictions_clf = np.concatenate((predictions_clf, fold_y_pred))
            else:
                predictions_clf = fold_y_pred

        return predictions_clf

    def train_level_0(self, clf):
        # Train in full real training set
        clf.fit(self.x_train, self.y_train)
        # Get predictions from full real test set
        y_pred = clf.predict(self.x_test)
        
        return y_pred

    def train_level_1(self, final_learner, train_meta_model, test_meta_model):
        # Train is carried out with final learner or meta model
        final_learner.fit(train_meta_model, self.y_train)
        # Getting train and test accuracies from meta_model
        print(f"Train accuracy: {final_learner.score(train_meta_model,  self.y_train)}")
        print(f"Test accuracy: {final_learner.score(test_meta_model, self.y_test)}")

The class above uses [load_breast_cancer](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html) to load the "[**breast cancer wisconsin dataset**](https://www.kaggle.com/uciml/breast-cancer-wisconsin-data)". Let's see how to execute it:

In [4]:
es = Ensemble()
es.load_data()
es.StackingClassifier()

Train accuracy: 0.9597989949748744
Test accuracy: 0.9824561403508771


Let’s analyze the key parts. We define 5 classifiers and store them in `weak_learners`; these are the base models of our stack (<font color='brown'>which are trained at level 0</font>). We define the final classifier `final_learner` using [**LogisticRegression**](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) (<font color='brown'>which is the meta-model classifier</font>). Level 0 training begins with the for loop defined in line `for clf_id, clf in weak_learners`. In line `predictions_clf = self.k_fold_cross_validation(clf)` we receive the k-fold cross validation predictions, which are then “stacked” (<font color='brown'>they form the training data of the meta-model</font>). In line `test_predictions_clf = self.train_level_0(clf)` we receive the predictions for the test set, which are “stacked” to form the meta-model test data. Finally, in line `self.train_level_1(final_learner, train_meta_model, test_meta_model)` we carry out the level 1 training, that is, the meta-model training.
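
For comparison, a roughly equivalent setup can be written with scikit-learn's built-in `StackingClassifier`. This is a sketch, not a one-to-one port of the class above: `stack_method="predict"` is chosen here so that class labels are stacked as in the from-scratch version, and the internal cross validation uses stratified folds, so the accuracies will not necessarily match the numbers printed above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=23)

stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier()),
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier()),
        ("gb", GradientBoostingClassifier()),
        ("gn", GaussianNB()),
    ],
    final_estimator=LogisticRegression(),  # level 1: the final learner
    cv=5,                                  # level 0: 5-fold cross validation
    stack_method="predict",                # stack class labels, not probabilities
)
stack.fit(x_train, y_train)

stack_test_accuracy = stack.score(x_test, y_test)
print(f"Test accuracy: {stack_test_accuracy:.3f}")
```

The built-in class handles the fold bookkeeping and the refit of every base model on the full training set, which is the part that `k_fold_cross_validation` and `train_level_0` do by hand.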

Well, so far we already know how the Stacked Generalization technique works. **As we mentioned, one of the key parts of this method is the use of k-fold cross validation to generate the meta-model training data. However, there is a variation: we can omit k-fold cross validation and use only a single holdout set. This small but significant variation is called “<font color='darkblue'>Blending</font>”.**

<a id='sect3'></a>
## <font color='darkblue'>Blending</font>
**<font color='darkblue'>Blending</font> is a technique derived from Stacked Generalization.** The only difference is that in Blending, the k-fold cross validation technique is not used to generate the training data of the meta-model. Blending uses a single holdout set, that is, a small portion of the training data (<font color='brown'>validation</font>), to make the predictions which will be “stacked” to form the training data of the meta-model. Also, predictions are made from the test data to form the meta-model test data.

In figure 3 we can see a Blending architecture using 3 base models (<font color='brown'>weak learners</font>) and a final classifier. The blue boxes represent that portion of the training data that is used to generate predictions (<font color='brown'>yellow boxes</font>) to form the meta-model. The green boxes represent the test data which is used to generate predictions to form the meta-model test data (<font color='brown'>purple boxes</font>).
![4](images/4.jpeg)
<br/>
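
One practical consequence of replacing k-fold with a single holdout set is the number of level-0 model fits. The small sketch below counts them, assuming the scheme of the from-scratch Stacking code above (k fits per base model for the out-of-fold predictions, plus one fit on the full training set) and one fit per base model for Blending. The helper name `level0_fits` is hypothetical.

```python
from typing import Optional


def level0_fits(n_base_models: int, k: Optional[int] = None) -> int:
    """Count level-0 model fits.

    k=None models Blending: each base model is fit once on the training split.
    An integer k models k-fold Stacking: k fits for the out-of-fold predictions
    plus one fit on the full training set to predict the test set.
    """
    fits_per_model = 1 if k is None else k + 1
    return n_base_models * fits_per_model


stacking_fits = level0_fits(n_base_models=5, k=5)
blending_fits = level0_fits(n_base_models=5)
print(stacking_fits, blending_fits)  # → 30 5
```

With 5 base models and k = 5, Stacking trains six times as many level-0 models as Blending, at the price of Blending using only the holdout portion to train the meta-model.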

Great, now that you’re familiar with the Blending architecture, let’s see how we do this in code:

In [11]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

class Ensemble:
    def __init__(self):
        self.x_train = None
        self.x_test = None
        self.y_train = None
        self.y_test = None

    def load_data(self):
        x, y = load_breast_cancer(return_X_y=True)
        self.x_train, self.x_test, self.y_train, self.y_test = train_test_split(x, y, test_size=0.15, random_state=23)
        self.x_train, self.x_val, self.y_train, self.y_val = train_test_split(
            self.x_train, 
            self.y_train, 
            test_size=0.3, 
            random_state=23
        )
        
    def go(self):
        self.load_data()
        self.BlendingClassifier()
    
    def BlendingClassifier(self):

        # Define weak learners
        weak_learners = [('dt', DecisionTreeClassifier()),
                        ('knn', KNeighborsClassifier()),
                        ('rf', RandomForestClassifier()),
                        ('gb', GradientBoostingClassifier()),
                        ('gn', GaussianNB())]
        
        # Final learner or meta model
        final_learner = LogisticRegression()

        train_meta_model = None
        test_meta_model = None

        # Start stacking
        for clf_id, clf in weak_learners:
            
            # Holdout (validation) and test predictions for each classifier
            val_predictions, test_predictions = self.train_level_0(clf)
            
            # Stack predictions which will form
            # the input data for the meta model
            if isinstance(train_meta_model, np.ndarray):
                train_meta_model = np.vstack((train_meta_model, val_predictions))
            else:
                train_meta_model = val_predictions

            # Stack predictions from test set
            # which will form test data for meta model
            if isinstance(test_meta_model, np.ndarray):
                test_meta_model = np.vstack((test_meta_model, test_predictions))
            else:
                test_meta_model = test_predictions
        
        # Transpose train_meta_model
        train_meta_model = train_meta_model.T

        # Transpose test_meta_model
        test_meta_model = test_meta_model.T
        
        # Training level 1
        self.train_level_1(final_learner, train_meta_model, test_meta_model)


    def train_level_0(self, clf):
        # Train with base x_train
        clf.fit(self.x_train, self.y_train)
        
        # Generate predictions for the holdout set (validation)
        # These predictions will build the input for the meta model
        val_predictions = clf.predict(self.x_val)
        
        # Generate predictions for original test set
        # These predictions will be used to test the meta model
        test_predictions = clf.predict(self.x_test)

        return val_predictions, test_predictions

    def train_level_1(self, final_learner, train_meta_model, test_meta_model):
        # Train is carried out with final learner or meta model
        final_learner.fit(train_meta_model, self.y_val)
       
        # Getting train and test accuracies from meta_model
        print(f"Train accuracy: {final_learner.score(train_meta_model,  self.y_val)}")
        print(f"Test accuracy: {final_learner.score(test_meta_model, self.y_test)}")

Let’s execute it:

In [12]:
es = Ensemble()
es.go()

Train accuracy: 0.9448275862068966
Test accuracy: 0.9651162790697675

Let’s analyze the key parts of this model. We define the 5 base classifiers (<font color='brown'>weak learners</font>) and store them in `weak_learners`, and we define the final classifier `final_learner`; as in the previous example, we use [**LogisticRegression**](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html). Level 0 training begins on line `for clf_id, clf in weak_learners:`. On line `val_predictions, test_predictions = self.train_level_0(clf)` we receive the predictions for the validation set (<font color='brown'>which will form the training data of the meta-model</font>) and the predictions for the test data (<font color='brown'>which will form the meta-model test data</font>). We also “stack” the predictions of each base classifier. Finally, on line `self.train_level_1(final_learner, train_meta_model, test_meta_model)` we move to level 1 training, and that is it!

As we can see, the **Blending** architecture is slightly simpler and more compact than Stacked Generalization. **Omitting k-fold cross validation reduces processing time, since each base model is trained only once.**

Great, by now you already know the Stacked Generalization architecture and how it works as well as the variation that arises from it (<font color='brown'>Blending</font>). **The million dollar question remains: which technique is better? When should I apply Stacking or Blending? Well, that will depend 100% on the task you are trying to solve, the amount of data you have as well as the computing power and memory available.**

Finally, let’s talk about an Ensemble Learning technique that is simple, intuitive and sometimes a good option: Voting!

<a id='sect4'></a>
## <font color='darkblue'>Voting</font>
This type of ensemble is one of the most intuitive and easy to understand. The <font color='darkblue'>**Voting Classifier**</font> can be a homogeneous or heterogeneous type of Ensemble Learning, that is, the base classifiers can be of the same or different types. As mentioned earlier, **this type of ensemble also works as an extension of bagging** (<font color='brown'>e.g. Random Forest</font>).

**The architecture of a Voting Classifier is made up of a number “n” of ML models, whose predictions are combined in one of two ways: hard or soft.** In hard mode, the winning prediction is the one with “the most votes”. In Figure 4 we see an example of how the Voting Classifier works in hard mode.
![5](images/5.jpeg)
<br/>
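
Hard voting can be sketched with NumPy alone. The predictions below are made-up values for three hypothetical models and four samples; ties, if any, go to the lowest class label because of how `argmax` behaves.

```python
import numpy as np

# Hypothetical class predictions: one row per model, one column per sample
predictions = np.array([
    [0, 1, 1, 0],  # model 1
    [0, 1, 0, 0],  # model 2
    [1, 1, 0, 0],  # model 3
])


def hard_vote(predictions: np.ndarray) -> np.ndarray:
    """Return the most frequent class label per sample (column)."""
    n_classes = predictions.max() + 1
    # Count the votes per class for each sample, then pick the most voted class
    votes = np.apply_along_axis(np.bincount, 0, predictions, minlength=n_classes)
    return votes.argmax(axis=0)


winners = hard_vote(predictions)
print(winners)  # → [0 1 0 0]
```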

On the other hand, **the Voting Classifier in soft mode considers the probabilities produced by each ML model. These probabilities are weighted and averaged, and the winning class is the one with the highest weighted average probability**. In Figure 5 we see an example of how the Voting Classifier works in soft mode.
![6](images/6.jpeg)
<br/>
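
Soft voting can be sketched in the same way. The probabilities and weights below are made-up values for a single sample and three hypothetical models; they are chosen so that a hard vote would pick class 1 (two models out of three), while the equally weighted soft vote picks class 0.

```python
import numpy as np

# Hypothetical class probabilities for one sample: one row per model,
# columns are P(class 0) and P(class 1)
probabilities = np.array([
    [0.90, 0.10],  # model 1: very confident in class 0
    [0.45, 0.55],  # model 2: slightly prefers class 1
    [0.40, 0.60],  # model 3: slightly prefers class 1
])


def soft_vote(probabilities: np.ndarray, weights: np.ndarray):
    """Return the weighted average probability per class and the winning class."""
    averaged = np.average(probabilities, axis=0, weights=weights)
    return averaged, int(averaged.argmax())


# Equal weights: the confident model 1 outweighs the two lukewarm votes
averaged_equal, winner_equal = soft_vote(probabilities, np.array([1.0, 1.0, 1.0]))
print(winner_equal)  # → 0

# Trusting models 2 and 3 three times as much flips the decision
averaged_weighted, winner_weighted = soft_vote(probabilities, np.array([1.0, 3.0, 3.0]))
print(winner_weighted)  # → 1
```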
Ok, now that we know how the Voting Classifier works, let’s see how to do this in code. On this occasion, since it is a simple and intuitive ensemble technique (<font color='brown'>compared to Stacking or Blending</font>), let’s use the implementation provided by scikit-learn. Let’s do it!

In [23]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

from sklearn.ensemble import VotingClassifier


class Ensemble:
    def __init__(self):
        self.x_train = None
        self.x_test = None
        self.y_train = None
        self.y_test = None

    def load_data(self):
        x, y = load_breast_cancer(return_X_y=True)
        self.x_train, self.x_test, self.y_train, self.y_test = train_test_split(x, y, test_size=0.25, random_state=23)

    def go(self):
        self.load_data()
        # Getting train and test accuracies for the ensemble and each base classifier
        for name, clf in self.__VotingClassifier__():
            print(f"{name} Train accuracy: {clf.score(self.x_train,  self.y_train)}")
            print(f"{name} Test accuracy: {clf.score(self.x_test, self.y_test)}")
            print("=" * 50)
        
    @staticmethod
    def __Classifiers__(name=None):
        # Seed for reproducibility
        random_state = 23
        
        if name == 'decision_tree':
            return DecisionTreeClassifier(random_state=random_state)
        if name == 'kneighbors':
            return KNeighborsClassifier()
        if name == 'logistic_regression':
            return LogisticRegression(random_state=random_state)

    def __DecisionTreeClassifier__(self):
        
        # Decision Tree Classifier
        decision_tree = Ensemble.__Classifiers__(name='decision_tree')
        
        # Train Decision Tree
        decision_tree.fit(self.x_train, self.y_train)
        return decision_tree

    def __KNearestNeighborsClassifier__(self):
        
        # K-Nearest Neighbors Classifier
        knn = Ensemble.__Classifiers__(name='kneighbors')
        
        # Train K-Nearest Neighbors
        knn.fit(self.x_train, self.y_train)
        return knn

    def __LogisticRegression__(self):
        
        # Logistic Regression Classifier
        logistic_regression = Ensemble.__Classifiers__(name='logistic_regression')
        
        # Train Logistic Regression
        logistic_regression.fit(self.x_train, self.y_train)
        return logistic_regression
    
    def __VotingClassifier__(self):

        # Instantiate classifiers
        decision_tree = Ensemble.__Classifiers__(name='decision_tree')
        knn = Ensemble.__Classifiers__(name='kneighbors')
        logistic_regression = Ensemble.__Classifiers__(name='logistic_regression')

        # Voting Classifier initialization
        vc = VotingClassifier(
            estimators=[
                ('decision_tree', decision_tree), 
                ('knn', knn), 
                ('logistic_regression', logistic_regression)], 
            voting='soft'
        )
        
        # Train Voting Classifier
        vc.fit(self.x_train, self.y_train)
        return (
            ("voting", vc), 
            ('decision_tree', self.__DecisionTreeClassifier__()), 
            ('knn', self.__KNearestNeighborsClassifier__()), 
            ('logistic_regression', self.__LogisticRegression__())
        )

In [24]:
es = Ensemble()
es.go()

voting Train accuracy: 0.9835680751173709
voting Test accuracy: 0.965034965034965
decision_tree Train accuracy: 1.0
decision_tree Test accuracy: 0.958041958041958
knn Train accuracy: 0.9460093896713615
knn Test accuracy: 0.9300699300699301
logistic_regression Train accuracy: 0.9507042253521126
logistic_regression Test accuracy: 0.9440559440559441


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In the code above we create a class that contains several classifiers: Decision Tree, K-Nearest Neighbors, Logistic Regression and [**Voting Classifier**](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html). To compare the effectiveness of the “weak classifiers” and the Ensemble, we use the “[**breast_cancer**](https://www.kaggle.com/uciml/breast-cancer-wisconsin-data)” toy dataset. Each classifier is used with its default hyperparameters.

**As we can see, the Test accuracy of the Voting Classifier is slightly better than that of the weak classifiers**. It is very important to mention that, although Voting Classifier is a great alternative to improve the accuracy of your models, it may not always be the best option due to various factors, including processing time.
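
The class above fixes `voting='soft'`. As a small follow-up sketch, the two modes can be compared side by side on the same split. Note that `max_iter=10000` for Logistic Regression is an assumption added here to reduce the convergence warnings seen in the output above; it is not part of the original code, so the scores may differ from the ones printed earlier.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=23)


def build_voter(voting: str) -> VotingClassifier:
    """Build the same three-model ensemble in 'hard' or 'soft' mode."""
    return VotingClassifier(
        estimators=[
            ("decision_tree", DecisionTreeClassifier(random_state=23)),
            ("knn", KNeighborsClassifier()),
            # max_iter raised (assumption) to reduce convergence warnings
            ("logistic_regression", LogisticRegression(max_iter=10000, random_state=23)),
        ],
        voting=voting,
    )


scores = {}
for voting in ("hard", "soft"):
    scores[voting] = build_voter(voting).fit(x_train, y_train).score(x_test, y_test)
    print(f"{voting} voting test accuracy: {scores[voting]:.3f}")
```

Which mode works better depends on how well calibrated the base models' probabilities are, so it is worth checking both on your own data.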