<a id='sect0'></a>
## <font color='darkblue'>Preface</font>
([article source](https://machinelearningmastery.com/super-learner-ensemble-in-python/)) <font size='3ptx'>**Selecting a machine learning algorithm for a predictive modeling problem involves evaluating many different models and model configurations using k-fold cross-validation.**</font>

The super learner is an [ensemble machine learning](https://machinelearningmastery.com/bagging-and-random-forest-ensemble-algorithms-for-machine-learning/) algorithm that combines all of the models and model configurations that you might investigate for a predictive modeling problem and uses them to make a prediction as-good-as or better than any single model that you may have investigated.

**The <font color='darkblue'>super learner algorithm</font> is an application of [stacked generalization](https://machinelearningmastery.com/implementing-stacking-scratch-python/), called stacking or blending, to [k-fold cross-validation](https://machinelearningmastery.com/k-fold-cross-validation/) where all models use the same k-fold splits of the data and a meta-model is fit on the out-of-fold predictions from each model.**

In this tutorial, you will discover the super learner ensemble machine learning algorithm. After completing this tutorial, you will know:
* Super learner is the application of stacked generalization using out-of-fold predictions during k-fold cross-validation.
* The super learner ensemble algorithm is straightforward to implement in Python using scikit-learn models.
* The ML-Ensemble (mlens) library provides a convenient implementation that allows the super learner to be fit and used in just a few lines of code.

### <font color='darkgreen'>Tutorial Overview</font>
This tutorial is divided into three parts; they are:
* <font size='3ptx'>[**What Is the Super Learner?**](#sect1)</font>
* <font size='3ptx'>[**Manually Develop a Super Learner With scikit-learn**](#sect2)</font>
* <font size='3ptx'>[**Super Learner With ML-Ensemble Library**](#sect3)</font>

<a id='sect1'></a>
## <font color='darkblue'>What Is the Super Learner?</font>
<font size='3ptx'>**There are many hundreds of models to choose from for a predictive modeling problem; which one is best?**</font>

Then, after a model is chosen, how do you best configure it for your specific dataset?

These are open questions in applied machine learning. The best answer we have at the moment is to use empirical experimentation to test and discover what works best for your dataset.
> In practice, it is generally impossible to know a priori which learner will perform best for a given prediction problem and data set.<br/><br/>
> [**— Super Learner, 2007.**](https://www.degruyter.com/view/j/sagmb.2007.6.issue-1/sagmb.2007.6.1.1309/sagmb.2007.6.1.1309.xml)

This involves selecting many different algorithms that may be appropriate for your regression or classification problem and evaluating their performance on your dataset using a resampling technique, such as [k-fold cross-validation](https://machinelearningmastery.com/k-fold-cross-validation/).

The algorithm that performs the best on your dataset according to k-fold cross-validation is then selected, fit on all available data, and you can then start using it to make predictions.

There is an alternative approach.

Consider that you have already fit many different algorithms on your dataset, and some algorithms have been evaluated many times with different configurations. **You may have many tens or hundreds of different models of your problem. Why not use all those models instead of the best model from the group?**

**This is the intuition behind the so-called “super learner” ensemble algorithm.**

The super learner algorithm involves first pre-defining the k-fold split of your data, then evaluating all different algorithms and algorithm configurations on the same split of the data. All out-of-fold predictions are then kept and used to train a meta-model that learns how to best combine the predictions.
> The algorithms may differ in the subset of the covariates used, the basis functions, the loss functions, the searching algorithm, and the range of tuning parameters, among others. <br/><br/>
> [**— Super Learner In Prediction, 2010.**](https://biostats.bepress.com/ucbbiostat/paper266/)

The results of this model should be no worse than the best-performing model evaluated during k-fold cross-validation, and it may well perform better than any single model.

The super learner algorithm was proposed by [Mark van der Laan](https://en.wikipedia.org/wiki/Mark_van_der_Laan), [Eric Polley](https://www.mayo.edu/research/faculty/polley-eric-c-ph-d/bio-20316771), and [Alan Hubbard](https://ahubb40.github.io/) from Berkeley in their 2007 paper titled “[**Super Learner.**](https://www.degruyter.com/view/j/sagmb.2007.6.issue-1/sagmb.2007.6.1.1309/sagmb.2007.6.1.1309.xml)” It was published in a biological journal, which may be sheltered from the broader machine learning community.

The super learner technique is an example of the general method called “stacked generalization,” or “stacking” for short, and is known in applied machine learning as blending, as often a linear model is used as the meta-model.
> The super learner is related to the stacking algorithm introduced in neural networks context … <br/><br/>
> [**— Super Learner In Prediction, 2010.**](https://biostats.bepress.com/ucbbiostat/paper266/)

For more on the topic of stacking, see the posts:
* [How to Develop a Stacking Ensemble for Deep Learning Neural Networks in Python With Keras](https://machinelearningmastery.com/stacking-ensemble-for-deep-learning-neural-networks/)
* [How to Implement Stacked Generalization (Stacking) From Scratch With Python](https://machinelearningmastery.com/implementing-stacking-scratch-python/)

**We can think of the “super learner” as a specific configuration of stacking applied to k-fold cross-validation.**

I have sometimes seen this type of blending ensemble referred to as a cross-validation ensemble. The procedure can be summarized as follows:
1. Select a k-fold split of the training dataset.
2. Select m base-models or model configurations.
3. For each base-model:
    * Evaluate using k-fold cross-validation.
    * Store all out-of-fold predictions.
    * Fit the model on the full training dataset and store.
4. Fit a meta-model on the out-of-fold predictions.
5. Evaluate the model on a holdout dataset or use the model to make predictions.
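
The five steps above can be sketched compactly with scikit-learn. This is only an illustration under assumptions that are not part of the tutorial: the toy data sizes, the three base-models, the seeds, and the use of `cross_val_predict()` to collect the out-of-fold predictions (the tutorial builds them by hand later on).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# toy data; sizes and seeds are illustrative assumptions
X, y = make_regression(n_samples=300, n_features=10, noise=0.5, random_state=1)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.3, random_state=1)

# step 1: one fixed k-fold split, shared by every base-model
kfold = KFold(n_splits=5, shuffle=True, random_state=1)

# step 2: m base-models (here m=3)
base_models = [LinearRegression(), KNeighborsRegressor(), DecisionTreeRegressor(random_state=1)]

# step 3: out-of-fold predictions (one column per model), then refit on all training data
oof = np.column_stack([cross_val_predict(m, X_train, y_train, cv=kfold) for m in base_models])
for m in base_models:
    m.fit(X_train, y_train)

# step 4: fit the meta-model on the out-of-fold predictions
meta_model = LinearRegression().fit(oof, y_train)

# step 5: base-model predictions on the holdout set feed the meta-model
hold_inputs = np.column_stack([m.predict(X_hold) for m in base_models])
yhat = meta_model.predict(hold_inputs)
print(oof.shape, yhat.shape)
```

Because `cross_val_predict()` returns predictions in the original row order, the out-of-fold columns line up with `y_train` without any extra bookkeeping.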

The image below, taken from the original paper, summarizes this data flow:
![1.png](images/1.png)
<br/>
Let’s take a closer look at some common sticking points you may have with this procedure.

#### Q. What are the inputs and outputs for the meta-model?
The meta-model takes in predictions from base-models as input and predicts the target for the training dataset as output:
* **Input:** Predictions from base-models.
* **Output:** Prediction for training dataset.

For example, if we had 50 base-models, then one input sample would be a vector with 50 values, each value in the vector representing a prediction from one of the base-models for one sample of the training dataset.

If we had 1,000 examples (rows) in the training dataset and 50 models, then the input data for the meta-model would be 1,000 rows and 50 columns.
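
As a quick check of those dimensions, the sketch below builds a meta-model input of that size. The random numbers are only stand-ins for real out-of-fold predictions, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_models = 1000, 50

# stand-in predictions: one vector of length n_rows per base-model
columns = [rng.normal(size=n_rows) for _ in range(n_models)]

# each model contributes one column of the meta-model input
meta_X = np.column_stack(columns)
print(meta_X.shape)     # (1000, 50)
print(meta_X[0].shape)  # (50,) one input sample for the meta-model
```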

#### Q. Won’t the meta-model overfit the training data?
Probably not.

This is the trick of the super learner, and the stacked generalization procedure in general.

**The input to the meta-model is the out-of-fold** (<font color='brown'>out-of-sample</font>) **predictions. In aggregate, the out-of-fold predictions for a model represent the model’s skill or capability in making predictions on data not seen during training.**

By training a meta-model on out-of-sample predictions of other models, the meta-model learns how to both correct the out-of-sample predictions for each model and to best combine the out-of-sample predictions from multiple models; actually, it does both tasks at the same time.

**Importantly, to get an idea of the true capability of the meta-model, it must be evaluated on new out-of-sample data. That is, data not used to train the base models.**

#### Q. Can this work for regression and classification?
Yes. The papers describe it for regression (<font color='brown'>predicting a numerical value</font>).

It can work just as well for classification (<font color='brown'>predicting a class label</font>), although it is probably best to predict probabilities to give the meta-model more granularity when combining predictions.
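
To see the difference in granularity, here is a small sketch comparing the two kinds of output from a scikit-learn classifier. The dataset parameters and the choice of logistic regression are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# toy two-class problem; the parameters are illustrative assumptions
X, y = make_blobs(n_samples=200, centers=2, n_features=5, cluster_std=5, random_state=1)
model = LogisticRegression().fit(X, y)

labels = model.predict(X[:5])       # hard class labels, one value per row
probs = model.predict_proba(X[:5])  # one probability per class per row
print(labels.shape, probs.shape)
print(probs.sum(axis=1))            # each row of probabilities sums to 1
```

A label only says which class won, whereas the probabilities also say by how much, which gives the meta-model more to work with.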

#### Q. Why do we fit each base-model on the entire training dataset?
Each base-model is fit on the entire training dataset so that the model can be used later to make predictions on new examples not seen during training.

This step is not strictly required until predictions are needed by the super learner.

#### Q. How do we make a prediction?
To make a prediction on a new sample (row of data), first, the row of data is provided as input to each base model to generate a prediction from each model.

The predictions from the base-models are then concatenated into a vector and provided as input to the meta-model. The meta-model then makes a final prediction for the row of data.

We can summarize this procedure as follows:
1. Take a sample not seen by the models during training.
2. For each base-model:
  * Make a prediction given the sample.
  * Store prediction.
3. Concatenate predictions from the base-models into a single vector.
4. Provide vector as input to the meta-model to make a final prediction.
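
The prediction steps above might look like the following for a single new row. The data, the two base-models, and the unshuffled 5-fold split used to train the meta-model are all assumptions chosen to keep the sketch short.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsRegressor

# toy setup; data sizes and model choices are illustrative assumptions
X, y = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=1)
X_train, y_train = X[:-1], y[:-1]
new_row = X[-1:]  # step 1: a sample held back from training (kept 2-D)

base_models = [LinearRegression(), KNeighborsRegressor()]

# train the meta-model on out-of-fold predictions, then refit the base-models
oof = np.column_stack([cross_val_predict(m, X_train, y_train, cv=5) for m in base_models])
meta_model = LinearRegression().fit(oof, y_train)
for m in base_models:
    m.fit(X_train, y_train)

# steps 2 and 3: one prediction per base-model, concatenated into a single vector
vector = np.column_stack([m.predict(new_row) for m in base_models])

# step 4: the meta-model turns that vector into the final prediction
final = meta_model.predict(vector)
print(vector.shape, final.shape)
```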

<a id='sect2'></a>
## <font color='darkblue'>Manually Develop a Super Learner With scikit-learn</font> ([back](#sect0))
<font size='3ptx'>**The Super Learner algorithm is relatively straightforward to implement on top of the scikit-learn Python machine learning library.**</font>

In this section, we will develop an example of super learning for both regression and classification that you can adapt to your own problems.
* <font size='3ptx'>[**Super Learner for Regression**](#sect2_1)</font>
* <font size='3ptx'>[**Super Learner for Classification**](#sect2_2)</font>

In [1]:
# example of a super learner model for regression
from math import sqrt
from numpy import hstack
from numpy import vstack
from numpy import asarray
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import ExtraTreesRegressor

<a id='sect2_1'></a>
### <font color='darkgreen'>Super Learner for Regression</font> ([back](#sect2))
We will use the [make_regression()](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html) test problem and generate 1,000 examples (rows) with 100 features (columns). This is a simple regression problem with a linear relationship between input and output, with added noise.

**We will split the data so that 60 percent is used for training the model and 40 percent is held back to evaluate the final super model and base-models.**

In [2]:
# create the inputs and outputs
X, y = make_regression(n_samples=1000, n_features=100, noise=0.5)

# split
X, X_val, y, y_val = train_test_split(X, y, test_size=0.4)
print('Train', X.shape, y.shape, 'Test', X_val.shape, y_val.shape)

Train (600, 100) (600,) Test (400, 100) (400,)


Next, we will define a bunch of different regression models.

In this case, we will use nine different algorithms with modest configuration. You can use any models or model configurations you like. The <font color='blue'>get_models()</font> function below defines all of the models and returns them as a list.

In [27]:
# create a list of base-models
def get_models():
    models = list()
    models.append(LinearRegression())
    models.append(ElasticNet())
    models.append(SVR(gamma='scale'))
    models.append(DecisionTreeRegressor())
    models.append(KNeighborsRegressor())
    models.append(AdaBoostRegressor())
    models.append(BaggingRegressor(n_estimators=10))
    models.append(RandomForestRegressor(n_estimators=10))
    models.append(ExtraTreesRegressor(n_estimators=10))
    return models

Next, we will use k-fold cross-validation to make out-of-fold predictions that will be used as the dataset to train the meta-model or “super learner.”

This involves first splitting the data into k folds; we will use 10. For each fold, we will fit the model on the training part of the split and make out-of-fold predictions on the test part of the split. This is repeated for each model and all out-of-fold predictions are stored.

**Each [out-of-fold prediction](https://machinelearningmastery.com/out-of-fold-predictions-in-machine-learning) will be a column for the meta-model input.** We will collect columns from each algorithm for one fold of the data, horizontally stacking the rows. Then for all groups of columns we collect, we will vertically stack these rows into one long dataset with 600 rows and nine columns.
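
The stacking itself is easier to see on a tiny made-up example. The numbers below are invented: two models, a first fold with three held-out rows and a second fold with two.

```python
from numpy import array, hstack, vstack

# each model returns one column of predictions for the held-out rows of a fold
fold1 = [array([[1.0], [2.0], [3.0]]), array([[1.1], [2.1], [3.1]])]
fold2 = [array([[4.0], [5.0]]), array([[4.1], [5.1]])]

# hstack: put the models side by side within a fold
# vstack: pile the folds on top of each other
meta_X = vstack([hstack(fold1), hstack(fold2)])
print(meta_X.shape)  # (5, 2)
print(meta_X)
```

Note that the rows end up in fold order rather than the original row order, which is why the target values must be collected in the same fold order.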

The <font color='blue'>get_out_of_fold_predictions()</font> function below does this for a given training dataset and list of models; it will return the input and output dataset required to train the meta-model.

In [4]:
# collect out of fold predictions from k-fold cross validation
def get_out_of_fold_predictions(X, y, models):
    meta_X, meta_y = list(), list()

    # define split of data
    kfold = KFold(n_splits=10, shuffle=True)

    # enumerate splits
    for train_ix, test_ix in kfold.split(X):
        fold_yhats = list()
        # get data
        train_X, test_X = X[train_ix], X[test_ix]
        train_y, test_y = y[train_ix], y[test_ix]
        meta_y.extend(test_y)
        
        # fit and make predictions with each sub-model
        for model in models:
            model.fit(train_X, train_y)
            yhat = model.predict(test_X)
            
            # store columns
            fold_yhats.append(yhat.reshape(len(yhat),1))
        
        # store fold yhats as columns
        meta_X.append(hstack(fold_yhats))
        
    return vstack(meta_X), asarray(meta_y)

We can then call the function to get the models and the function to prepare the meta-model dataset.

In [5]:
%%time
# get models
models = get_models()

# get out of fold predictions
meta_X, meta_y = get_out_of_fold_predictions(X, y, models)

Wall time: 7.89 s


In [6]:
print(f'Meta meta_X.shape={meta_X.shape}; meta_y.shape={meta_y.shape}')

Meta meta_X.shape=(600, 9); meta_y.shape=(600,)


Next, we can fit all of the base-models on the entire training dataset.

In [7]:
# fit all base models on the training dataset
def fit_base_models(X, y, models):
    for model in models:
        model.fit(X, y)

Then, we can fit the meta-model on the prepared dataset.

In this case, we will use a linear regression model as the meta-model, as was used in the original paper.

In [8]:
# fit a meta model
def fit_meta_model(X, y):
    model = LinearRegression()
    model.fit(X, y)
    return model

Next, we can evaluate the base-models on the holdout dataset.

In [9]:
# evaluate a list of models on a dataset
def evaluate_models(X, y, models):
    for model in models:
        yhat = model.predict(X)
        mse = mean_squared_error(y, yhat)
        print('%s: RMSE %.3f' % (model.__class__.__name__, sqrt(mse)))
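
As a sanity check of the metric, the sketch below compares the `sqrt(mean_squared_error(...))` pattern used above with a by-hand calculation on three invented values.

```python
from math import sqrt

import numpy as np
from sklearn.metrics import mean_squared_error

# invented values: the errors are 1, 0 and -2
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

# RMSE as computed in evaluate_models()
rmse_sklearn = sqrt(mean_squared_error(y_true, y_pred))

# the same quantity by hand: sqrt((1 + 0 + 4) / 3)
rmse_manual = sqrt(np.mean((y_true - y_pred) ** 2))

print(round(rmse_sklearn, 3), round(rmse_manual, 3))  # 1.291 1.291
```

RMSE is in the same units as the target, which makes the scores reported for the holdout set easier to interpret than raw MSE.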

Finally, we can use the super learner (<font color='brown'>base and meta-model</font>) to make predictions on the holdout dataset and evaluate the performance of the approach. The <font color='blue'>super_learner_predictions()</font> function below will use the meta-model to make predictions for new data.

In [10]:
# make predictions with stacked model
def super_learner_predictions(X, models, meta_model):
    meta_X = list()
    for model in models:
        yhat = model.predict(X)
        meta_X.append(yhat.reshape(len(yhat),1))
    meta_X = hstack(meta_X)

    # predict
    return meta_model.predict(meta_X)

We can call this function and evaluate the results:

In [11]:
# fit base models
fit_base_models(X, y, models)

# fit the meta model
meta_model = fit_meta_model(meta_X, meta_y)

# evaluate base models
evaluate_models(X_val, y_val, models)

# evaluate meta model
yhat = super_learner_predictions(X_val, models, meta_model)
print('Super Learner: RMSE %.3f' % (sqrt(mean_squared_error(y_val, yhat))))

LinearRegression: RMSE 0.535
ElasticNet: RMSE 77.614
SVR: RMSE 207.266
DecisionTreeRegressor: RMSE 191.468
KNeighborsRegressor: RMSE 184.046
AdaBoostRegressor: RMSE 117.860
BaggingRegressor: RMSE 131.757
RandomForestRegressor: RMSE 132.749
ExtraTreesRegressor: RMSE 131.049
Super Learner: RMSE 0.536


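In this case, we can see that LinearRegression fits this linear problem almost perfectly, ElasticNet (with its default regularization) does noticeably worse, and the nonlinear algorithms perform worst of all. We can also see that the super learner's performance (RMSE 0.536) is essentially the same as that of the best base-model, LinearRegression (RMSE 0.535).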
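
One way to see why the super learner tracks the best base-model is to inspect the weights the linear meta-model assigns to each column of out-of-fold predictions. The sketch below does this on a smaller, seeded version of the problem. The three base-models, the 20 features, the seeds, and the use of `cross_val_predict()` are assumptions made to keep it short and repeatable.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# smaller, seeded variant of the tutorial problem (an assumption for speed)
X, y = make_regression(n_samples=500, n_features=20, noise=0.5, random_state=1)
kfold = KFold(n_splits=10, shuffle=True, random_state=1)
base_models = [LinearRegression(), KNeighborsRegressor(), DecisionTreeRegressor(random_state=1)]

# one column of out-of-fold predictions per base-model
oof = np.column_stack([cross_val_predict(m, X, y, cv=kfold) for m in base_models])
meta_model = LinearRegression().fit(oof, y)

# report the weight given to each base-model's predictions
for m, w in zip(base_models, meta_model.coef_):
    print('%s: %.3f' % (m.__class__.__name__, w))
```

On a problem like this, the meta-model is expected to put a weight close to 1 on the LinearRegression column and weights close to 0 on the others, so the ensemble effectively reproduces its best member.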

<a id='sect2_2'></a>
### <font color='darkgreen'>Super Learner for Classification</font> ([back](#sect2))
<font size='3ptx'>**The super learner algorithm for classification is much the same.**</font>

The inputs to the meta learner can be class labels or class probabilities, with the latter more likely to be useful given the increased granularity or uncertainty captured in the predictions.

In this problem, we will use the [make_blobs()](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html) test classification problem and use 1,000 examples with 100 input variables and two class labels.

In [13]:
from numpy import hstack
from numpy import vstack
from numpy import asarray
from sklearn.datasets import make_blobs
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier

In [14]:
# create the inputs and outputs
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)

# split
X, X_val, y, y_val = train_test_split(X, y, test_size=0.50)
print('Train', X.shape, y.shape, 'Test', X_val.shape, y_val.shape)

Train (500, 100) (500,) Test (500, 100) (500,)


Next, we can define a <font color='blue'>get_classifier_models()</font> function that returns a suite of linear and nonlinear classification algorithms.

In [15]:
# create a list of base-models
def get_classifier_models():
    models = list()
    models.append(LogisticRegression(solver='liblinear'))
    models.append(DecisionTreeClassifier())
    models.append(SVC(gamma='scale', probability=True))
    models.append(GaussianNB())
    models.append(KNeighborsClassifier())
    models.append(AdaBoostClassifier())
    models.append(BaggingClassifier(n_estimators=10))
    models.append(RandomForestClassifier(n_estimators=10))
    models.append(ExtraTreesClassifier(n_estimators=10))
    return models

Next, we can define a <font color='blue'>get_out_of_fold_classifier_predictions()</font> function that predicts probabilities by calling <font color='blue'>predict_proba()</font> instead of <font color='blue'>predict()</font>.

In [19]:
# collect out of fold predictions from k-fold cross validation
def get_out_of_fold_classifier_predictions(X, y, models):
    meta_X, meta_y = list(), list()
    # define split of data
    kfold = KFold(n_splits=10, shuffle=True)
    # enumerate splits
    for train_ix, test_ix in kfold.split(X):
        fold_yhats = list()
        # get data
        train_X, test_X = X[train_ix], X[test_ix]
        train_y, test_y = y[train_ix], y[test_ix]
        meta_y.extend(test_y)
        # fit and make predictions with each sub-model
        for model in models:
            model.fit(train_X, train_y)
            yhat = model.predict_proba(test_X)
            # store columns
            fold_yhats.append(yhat)
        # store fold yhats as columns
        meta_X.append(hstack(fold_yhats))
    return vstack(meta_X), asarray(meta_y)
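
Because <font color='blue'>predict_proba()</font> returns one column per class, each classifier contributes two columns to the meta-model input on a two-class problem. A small sketch with two hypothetical base-models (the data parameters are also assumptions) shows the resulting width:

```python
from numpy import hstack
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# toy two-class problem; the parameters are illustrative assumptions
X, y = make_blobs(n_samples=100, centers=2, n_features=4, random_state=1)
models = [GaussianNB().fit(X, y), KNeighborsClassifier().fit(X, y)]

# two models x two class probabilities = four columns
block = hstack([m.predict_proba(X) for m in models])
print(block.shape)  # (100, 4)
```

By the same arithmetic, the nine classifiers defined above yield 18 columns.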

A Logistic Regression algorithm instead of a Linear Regression algorithm will be used as the meta-algorithm in the <font color='blue'>fit_classifier_meta_model()</font> function.

In [17]:
# fit a meta model
def fit_classifier_meta_model(X, y):
    model = LogisticRegression(solver='liblinear')
    model.fit(X, y)
    return model

And classification accuracy will be used to report model performance:

In [22]:
# evaluate a list of models on a dataset
def evaluate_classifier_models(X, y, models):
    for model in models:
        yhat = model.predict(X)
        acc = accuracy_score(y, yhat)
        print('%s: %.3f' % (model.__class__.__name__, acc*100))


# make predictions with stacked model
def super_learner_classifier_predictions(X, models, meta_model):
    meta_X = list()
    for model in models:
        yhat = model.predict_proba(X)
        meta_X.append(yhat)
    meta_X = hstack(meta_X)
    # predict
    return meta_model.predict(meta_X)

        
# get models
models = get_classifier_models()

# get out of fold predictions
meta_X, meta_y = get_out_of_fold_classifier_predictions(X, y, models)
print('Meta ', meta_X.shape, meta_y.shape)

# fit base models
fit_base_models(X, y, models)

# fit the meta model
meta_model = fit_classifier_meta_model(meta_X, meta_y)

# evaluate base models
evaluate_classifier_models(X_val, y_val, models)

# evaluate meta model
yhat = super_learner_classifier_predictions(X_val, models, meta_model)
print('Super Learner: %.3f' % (accuracy_score(y_val, yhat) * 100))

Meta  (500, 18) (500,)
LogisticRegression: 93.600
DecisionTreeClassifier: 70.600
SVC: 96.200
GaussianNB: 97.000
KNeighborsClassifier: 92.800
AdaBoostClassifier: 89.800
BaggingClassifier: 85.800
RandomForestClassifier: 84.400
ExtraTreesClassifier: 81.600
Super Learner: 97.200


In this case, we can see that the super learner (97.2 percent accuracy) performs slightly better than the best base learner, GaussianNB (97.0 percent).

<a id='sect3'></a>
## <font color='darkblue'>Super Learner With ML-Ensemble Library</font> ([back](#sect0))
<font size='3ptx'>**Implementing the super learner manually is a good exercise but is not ideal.**</font>

We may introduce bugs in the implementation and the example as listed does not make use of multiple cores to speed up the execution.

Thankfully, [Sebastian Flennerhag](http://flennerhag.com/) provides an efficient and tested implementation of the Super Learner algorithm and other ensemble algorithms in his [ML-Ensemble](https://github.com/flennerhag/mlens) (<font color='brown'>mlens</font>) Python library. It is specifically designed to work with scikit-learn models.

First, the library must be installed, which can be achieved via pip, as follows:

In [24]:
#!pip install mlens

Next, a [**SuperLearner**](https://mlens.readthedocs.io/en/0.1.x/source/mlens.ensemble.super_learner/#mlens.ensemble.super_learner.SuperLearner) class can be defined, models added via a call to the <font color='blue'>add()</font> function, the meta learner added via a call to the <font color='blue'>add_meta()</font> function, then the model used like any other scikit-learn model:
```python
...
# configure model
ensemble = SuperLearner(...)
# add list of base learners
ensemble.add(...)
# add meta learner
ensemble.add_meta(...)
# use model ...
```
We can use this class on the regression and classification problems from the previous section:
* <font size='3ptx'>[**Super Learner for Regression With the ML-Ensemble Library**](#sect3_1)</font>
* <font size='3ptx'>[**Super Learner for Classification With the ML-Ensemble Library**](#sect3_2)</font>

<a id='sect3_1'></a>
### <font color='darkgreen'>Super Learner for Regression With the ML-Ensemble Library</font>
First, we can define a function to calculate RMSE for our problem that the super learner can use to evaluate base-models.

In [30]:
# example of a super learner for regression using the mlens library
from math import sqrt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import ExtraTreesRegressor
from mlens.ensemble import SuperLearner

[MLENS] backend: threading


In [25]:
# cost function for base models
def rmse(yreal, yhat):
    return sqrt(mean_squared_error(yreal, yhat))

Next, we can configure the [**SuperLearner**](https://mlens.readthedocs.io/en/0.1.x/source/mlens.ensemble.super_learner/#mlens.ensemble.super_learner.SuperLearner) with 10-fold cross-validation, our evaluation function, and the use of the entire training dataset when preparing out-of-fold predictions to use as input for the meta-model.

The <font color='blue'>get_super_learner()</font> function below implements this:

In [28]:
# create the super learner
def get_super_learner(X):
    ensemble = SuperLearner(scorer=rmse, folds=10, shuffle=True, sample_size=len(X))
    # add base models
    models = get_models()
    ensemble.add(models)
    # add the meta model
    ensemble.add_meta(LinearRegression())
    return ensemble

We can then fit the model on the training dataset.

In [33]:
# create the inputs and outputs
X, y = make_regression(n_samples=1000, n_features=100, noise=0.5)

# split
X, X_val, y, y_val = train_test_split(X, y, test_size=0.50)

print('Train', X.shape, y.shape, 'Test', X_val.shape, y_val.shape)

Train (500, 100) (500,) Test (500, 100) (500,)


In [34]:
%%time
# create the super learner
ensemble = get_super_learner(X)

# fit the super learner
ensemble.fit(X, y)

Wall time: 1.97 s


SuperLearner(array_check=None, backend=None, folds=10,
       layers=[Layer(backend='threading', dtype=<class 'numpy.float32'>, n_jobs=-1,
   name='layer-1', propagate_features=None, raise_on_exception=True,
   random_state=None, shuffle=True,
   stack=[Group(backend='threading', dtype=<class 'numpy.float32'>,
   indexer=FoldIndex(X=None, folds=10, raise_on_ex...2C51AF0>)],
   n_jobs=-1, name='group-3', raise_on_exception=True, transformers=[])],
   verbose=0)],
       model_selection=False, n_jobs=None, raise_on_exception=True,
       random_state=None, sample_size=500,
       scorer=<function rmse at 0x00000204E2C51AF0>, shuffle=True,
       verbose=False)

Once fit, we can get a nice report of the performance of each of the base-models on the training dataset using k-fold cross-validation by accessing the “<font color='violet'>data</font>” attribute on the model.

In [35]:
# summarize base learners
print(ensemble.data)

                                  score-m  score-s  ft-m  ft-s  pt-m  pt-s
layer-1  adaboostregressor          91.45    10.99  0.56  0.01  0.03  0.01
layer-1  baggingregressor          100.65     8.56  0.23  0.01  0.01  0.01
layer-1  decisiontreeregressor     153.78    14.03  0.03  0.00  0.00  0.00
layer-1  elasticnet                 65.79     7.28  0.00  0.00  0.00  0.00
layer-1  extratreesregressor        91.14     8.66  0.18  0.06  0.01  0.00
layer-1  kneighborsregressor       159.19    15.35  0.01  0.00  0.01  0.00
layer-1  linearregression            0.55     0.04  0.03  0.01  0.00  0.00
layer-1  randomforestregressor     102.41     9.67  0.22  0.03  0.00  0.00
layer-1  svr                       183.47    15.75  0.03  0.00  0.00  0.00



Note that **we cannot compare the base learner scores in the table to the super learner as the base learners were evaluated on the training dataset only, not the holdout dataset.**
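
A fair comparison scores the ensemble and its base learners on the same holdout data. The sketch below shows the idea without depending on mlens: it swaps in scikit-learn's own `StackingRegressor`, which is a different implementation of stacking and not the class used in this tutorial. The two base learners, the data sizes, and the seeds are also assumptions.

```python
from math import sqrt

from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# seeded toy problem; sizes are illustrative assumptions
X, y = make_regression(n_samples=400, n_features=20, noise=0.5, random_state=1)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.5, random_state=1)

# stacking with 10-fold out-of-fold predictions and a linear meta-model
estimators = [('lr', LinearRegression()), ('knn', KNeighborsRegressor())]
stack = StackingRegressor(estimators=estimators, final_estimator=LinearRegression(), cv=10)
stack.fit(X_train, y_train)

def holdout_rmse(model):
    # score a fitted model on the holdout set only
    return sqrt(mean_squared_error(y_hold, model.predict(X_hold)))

# base learners and ensemble, all scored on the same unseen data
scores = {name: holdout_rmse(model) for name, model in stack.named_estimators_.items()}
scores['stack'] = holdout_rmse(stack)
for name, score in scores.items():
    print('%s: RMSE %.3f' % (name, score))
```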

<a id='sect3_2'></a>
### <font color='darkgreen'>Super Learner for Classification With the ML-Ensemble Library</font>
<font size='3ptx'>**The ML-Ensemble is also very easy to use for classification problems, following the same general pattern.**</font>

**In this case, we will use our list of classifier models and a logistic regression model as the meta-model.** The complete example of fitting and evaluating a super learner model for a test classification problem with the mlens library is listed below.

In [36]:
# example of a super learner using the mlens library
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from mlens.ensemble import SuperLearner

In [37]:
# create the super learner
def get_super_classifier_learner(X):
    ensemble = SuperLearner(
        scorer=accuracy_score,
        folds=10,
        shuffle=True,
        sample_size=len(X)
    )
    
    # add base models
    models = get_classifier_models()
    ensemble.add(models)
    
    # add the meta model
    ensemble.add_meta(LogisticRegression(solver='lbfgs'))
    return ensemble

In [39]:
%%time
# create the inputs and outputs
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)

# split
X, X_val, y, y_val = train_test_split(X, y, test_size=0.50)
print('Train', X.shape, y.shape, 'Test', X_val.shape, y_val.shape)

# create the super learner
ensemble = get_super_classifier_learner(X)

# fit the super learner
ensemble.fit(X, y)

Train (500, 100) (500,) Test (500, 100) (500,)
Wall time: 1.56 s


SuperLearner(array_check=None, backend=None, folds=10,
       layers=[Layer(backend='threading', dtype=<class 'numpy.float32'>, n_jobs=-1,
   name='layer-1', propagate_features=None, raise_on_exception=True,
   random_state=None, shuffle=True,
   stack=[Group(backend='threading', dtype=<class 'numpy.float32'>,
   indexer=FoldIndex(X=None, folds=10, raise_on_ex...D746DC0>)],
   n_jobs=-1, name='group-7', raise_on_exception=True, transformers=[])],
   verbose=0)],
       model_selection=False, n_jobs=None, raise_on_exception=True,
       random_state=None, sample_size=500,
       scorer=<function accuracy_score at 0x00000204DD746DC0>,
       shuffle=True, verbose=False)

In [40]:
# summarize base learners
print(ensemble.data)

# make predictions on hold out set
yhat = ensemble.predict(X_val)
print('Super Learner: %.3f' % (accuracy_score(y_val, yhat) * 100))

                                   score-m  score-s  ft-m  ft-s  pt-m  pt-s
layer-1  adaboostclassifier           0.90     0.04  0.44  0.03  0.04  0.01
layer-1  baggingclassifier            0.82     0.04  0.20  0.01  0.01  0.01
layer-1  decisiontreeclassifier       0.73     0.04  0.03  0.00  0.00  0.00
layer-1  extratreesclassifier         0.82     0.05  0.07  0.03  0.01  0.00
layer-1  gaussiannb                   0.97     0.02  0.00  0.00  0.00  0.00
layer-1  kneighborsclassifier         0.93     0.03  0.01  0.00  0.01  0.00
layer-1  logisticregression           0.96     0.02  0.01  0.00  0.00  0.00
layer-1  randomforestclassifier       0.83     0.06  0.07  0.01  0.01  0.00
layer-1  svc                          0.97     0.02  0.10  0.00  0.00  0.00

Super Learner: 97.000


Again, we can see that the super learner performs well on this test problem, and more importantly, is fit and evaluated very quickly as compared to the manual example in the previous section.

## <font color='darkblue'>Supplement</font>
* [How to Develop a Stacking Ensemble for Deep Learning Neural Networks in Python With Keras](https://machinelearningmastery.com/stacking-ensemble-for-deep-learning-neural-networks/)
* [How to Implement Stacked Generalization (Stacking) From Scratch With Python](https://machinelearningmastery.com/implementing-stacking-scratch-python/)
* [How to Create a Bagging Ensemble of Deep Learning Models in Keras](https://machinelearningmastery.com/how-to-create-a-random-split-cross-validation-and-bagging-ensemble-for-deep-learning-in-keras/)
* [How to Use Out-of-Fold Predictions in Machine Learning](https://machinelearningmastery.com/out-of-fold-predictions-in-machine-learning)