# StackingCVRegressor

An ensemble-learning meta-regressor for stacking that uses out-of-fold predictions to prepare the inputs for the level-2 regressor, which helps prevent overfitting.

# Algorithm

Stacking is an ensemble learning technique that combines multiple regression models via a meta-regressor. The StackingCVRegressor extends the standard stacking algorithm (implemented as StackingRegressor) by using out-of-fold predictions to prepare the input data for the level-2 regressor.

In the standard stacking procedure, the first-level regressors are fit to the same training set that is used to prepare the inputs for the second-level regressor, which may lead to overfitting. The StackingCVRegressor, however, uses out-of-fold predictions: the dataset is split into k folds, and in k successive rounds, k-1 folds are used to fit the first-level regressors. In each round, the fitted first-level regressors are then applied to the one remaining fold, which was not used for model fitting in that round. The resulting predictions are stacked and provided as input data to the second-level regressor. After the StackingCVRegressor has been trained this way, the first-level regressors are refit to the entire dataset so that the final models use all available training data.
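The procedure described above can be sketched with plain scikit-learn building blocks. This is a minimal illustration and not mlxtend's actual implementation: the helper names `fit_stack_oof` and `predict_stack`, the synthetic dataset, and the choice of `Ridge`, `Lasso`, and `LinearRegression` are all assumptions made for the example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold


def fit_stack_oof(regressors, meta_regressor, X, y, n_splits=5, seed=0):
    """Hypothetical helper: fit a two-level stack on out-of-fold predictions."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # One column of out-of-fold predictions per first-level regressor.
    meta_features = np.zeros((X.shape[0], len(regressors)))

    for train_idx, holdout_idx in kfold.split(X):
        for j, reg in enumerate(regressors):
            # Fit a fresh copy on k-1 folds, then predict the held-out fold.
            fold_model = clone(reg).fit(X[train_idx], y[train_idx])
            meta_features[holdout_idx, j] = fold_model.predict(X[holdout_idx])

    # The second-level regressor only sees out-of-fold predictions.
    fitted_meta = clone(meta_regressor).fit(meta_features, y)
    # Finally, refit every first-level regressor on the entire dataset.
    fitted_regs = [clone(reg).fit(X, y) for reg in regressors]
    return fitted_regs, fitted_meta, meta_features


def predict_stack(fitted_regs, fitted_meta, X):
    """Hypothetical helper: stack first-level predictions, apply the meta-regressor."""
    level1 = np.column_stack([reg.predict(X) for reg in fitted_regs])
    return fitted_meta.predict(level1)


# Synthetic data (an assumption for this sketch) keeps the example self-contained.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
regs, meta, oof = fit_stack_oof([Ridge(), Lasso()], LinearRegression(), X, y)
predictions = predict_stack(regs, meta, X)
print(oof.shape, predictions.shape)
```

Because every row of `meta_features` comes from a model that never saw that row during fitting, the second-level regressor is trained on predictions that resemble what it will receive for unseen data.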

In [None]:
from mlxtend.regressor import StackingCVRegressor
from sklearn.datasets import load_boston
from sklearn.svm import SVR
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

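# Note: load_boston was removed in scikit-learn 1.2; with newer versions,
# swap in another regression dataset (e.g. load_diabetes).
X, y = load_boston(return_X_y=True)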

svr = SVR(kernel='linear')
lasso = Lasso()
rf = RandomForestRegressor(n_estimators=5)

# Use the imported StackingCVRegressor (StackingRegressor was never imported).
stack = StackingCVRegressor(regressors=(svr, lasso, rf),
                            meta_regressor=lasso)

print('3-fold cross validation scores:\n')

for reg, label in zip([svr, lasso, rf, stack],
                      ['SVM', 'Lasso', 'Random Forest', 'StackingCVRegressor']):
    scores = cross_val_score(reg, X, y, cv=3, scoring='r2')
    print("R2 score: %0.2f (+/- %0.2f) [%s]"
          % (scores.mean(), scores.std(), label))