# Stacking

We've made 6 models, some better than others. Can't we build one model that combines them all?

Yes we can, and it's called "stacking".

## Data import

We exported the data to a pickle file earlier, so we can simply load it again here.

In [None]:
import pickle

# Load the pickle file
with open('../exports/non_linear_data.pkl', 'rb') as file:
    data_dict = pickle.load(file)

# Unpack the train and test splits
X_train = data_dict["X_train"]
X_test = data_dict["X_test"]
y_train = data_dict["y_train"]
y_test = data_dict["y_test"]

## The base models

To create a stacking model, we follow these steps:

1) Train base models.
1) Generate predictions from base models.
1) Train the meta-model using the predictions from the base models.
1) Evaluate the performance of the stacked model.
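
The four steps above can be sketched end to end on a small scale. This is a minimal sketch, assuming synthetic data from `make_regression` in place of our pickled data and only two base models; names such as `X_tr`, `meta_tr` and `meta_te` are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for the real pickle contents (assumption for this sketch)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# 1) Train the base models
base_models = [LinearRegression(), DecisionTreeRegressor(max_depth=3, random_state=0)]
for model in base_models:
    model.fit(X_tr, y_tr)

# 2) Generate predictions from the base models (one column per model)
meta_tr = np.column_stack([m.predict(X_tr) for m in base_models])
meta_te = np.column_stack([m.predict(X_te) for m in base_models])

# 3) Train the meta-model on the base-model predictions
meta = LinearRegression().fit(meta_tr, y_tr)

# 4) Evaluate the stacked model on the test set
stacked_pred = meta.predict(meta_te)
rmse = np.sqrt(mean_squared_error(y_te, stacked_pred))
print(f"Stacked RMSE: {rmse:.3f}")
```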

We've already trained the base models and generated their predictions. These are all safely stored in pickle files, so we'll just need to import them first.

Right?

No. What we saved were the base models' predictions on the test set. To train the meta-model, we need their predictions on the training set, and we saved neither those nor the fitted models that could produce them. That means we'll have to rebuild all the models.

But repeating is part of the learning process, so let's start with the linear regressor:

In [None]:
#DELETE

from sklearn.linear_model import LinearRegression

linear_regressor = LinearRegression()
linear_regressor.fit(X_train, y_train)
y_pred_lr = linear_regressor.predict(X_train)

Next was the decision tree.

In [None]:
#DELETE
from sklearn.tree import DecisionTreeRegressor

tree_regressor = DecisionTreeRegressor(max_depth=3)
tree_regressor.fit(X_train, y_train)
y_pred_tree = tree_regressor.predict(X_train)

Then the random forest! The best parameters we found were:

```Python
{'max_depth': 5, 'min_samples_leaf': 4, 'min_samples_split': 10, 'n_estimators': 200}
```

In [None]:
#DELETE
from sklearn.ensemble import RandomForestRegressor

random_forest_regressor = RandomForestRegressor(n_estimators=200, random_state=42, max_depth=5, min_samples_leaf=4, min_samples_split=10)
random_forest_regressor.fit(X_train, y_train.ravel())

y_pred_forest = random_forest_regressor.predict(X_train)

Now the boosters, starting with gradient boosting. Use 46 estimators, not the 39 the grid search came up with.

In [None]:
#DELETE
from sklearn.ensemble import GradientBoostingRegressor

final_gbr = GradientBoostingRegressor(n_estimators=46, random_state=42)
final_gbr.fit(X_train, y_train.ravel())

y_pred_gradientbooster = final_gbr.predict(X_train)

And finally XGBoost. The best parameters there were:

```Python
{'colsample_bytree': 0.6, 'gamma': 0.2, 'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 50, 'subsample': 0.6}
```

In [None]:
#DELETE
import xgboost as xgb

xgb_regressor = xgb.XGBRegressor(
    colsample_bytree=0.6,
    gamma=0.2,
    learning_rate=0.1,
    max_depth=3,
    n_estimators=50,
    subsample=0.6,
    random_state=42
)

xgb_regressor.fit(X_train, y_train.ravel())
y_pred_xgb = xgb_regressor.predict(X_train)

## The meta-model

First, we need a dataset to train the meta-model on. For this we paste the base models' predictions together as columns: this will be our X. The target y stays the same, since it is still what we want to predict.

So paste all our `y_pred_*` variables together using `np.column_stack`.

Note: to evaluate the meta-model, we'll also need the same kind of dataset built from predictions on the test set. We haven't created it yet, but the models are still in memory, so assemble a second NumPy array with all their predictions on the test data.

Another note: if `y_train` is a column vector, `LinearRegression` returns its predictions as a 2-D `(n, 1)` array, so you may want to ravel them.
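
As a quick check of the shapes involved, here is a toy example; the arrays are made up and only stand in for real predictions. It shows that `np.column_stack` already accepts a mix of `(n, 1)` and `(n,)` arrays, so raveling is mostly about keeping every input 1-D and consistent.

```python
import numpy as np

n = 4
# A 2-D (n, 1) prediction, e.g. from LinearRegression fitted on a column-vector y
pred_2d = np.arange(n, dtype=float).reshape(n, 1)
# 1-D (n,) predictions, e.g. from models fitted on y.ravel()
pred_1d_a = np.ones(n)
pred_1d_b = np.full(n, 2.0)

# column_stack turns 1-D arrays into columns and keeps (n, 1) arrays as they are
stacked = np.column_stack((pred_2d, pred_1d_a, pred_1d_b))
print(stacked.shape)  # (4, 3)

# Raveling first gives the same result while keeping every input 1-D
stacked_raveled = np.column_stack((pred_2d.ravel(), pred_1d_a, pred_1d_b))
print(np.array_equal(stacked, stacked_raveled))  # True
```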

In [None]:
#DELETE
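import numpy as np

# Training meta-features: one column per base model's predictions on X_train
meta_X_train = np.column_stack((y_pred_lr.ravel(), y_pred_tree, y_pred_forest, y_pred_gradientbooster, y_pred_xgb))
# Test meta-features: the same models, in the same column order, predicting X_test
meta_X_test = np.column_stack((linear_regressor.predict(X_test).ravel(), tree_regressor.predict(X_test), random_forest_regressor.predict(X_test), final_gbr.predict(X_test), xgb_regressor.predict(X_test)))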
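
One caveat with this approach: `meta_X_train` holds predictions each base model made on the same rows it was trained on. A model that overfits (an unpruned decision tree, say) looks almost perfect there, so the meta-model may learn to trust it more than it deserves on new data. A common remedy, and the one scikit-learn's `StackingRegressor` uses internally, is to build the training meta-features from out-of-fold predictions with `cross_val_predict`. A minimal sketch, assuming synthetic data and two illustrative base models:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption for this sketch)
X, y = make_regression(n_samples=200, n_features=5, noise=15.0, random_state=1)

base_models = [LinearRegression(), DecisionTreeRegressor(random_state=1)]

# In-sample predictions: each model predicts the rows it was trained on
in_sample = np.column_stack([m.fit(X, y).predict(X) for m in base_models])

# Out-of-fold predictions: each row is predicted by a model that never saw it
out_of_fold = np.column_stack(
    [cross_val_predict(m, X, y, cv=5) for m in base_models]
)

# An unpruned tree reproduces y almost perfectly in-sample, but not out-of-fold
tree_in_err = np.mean((in_sample[:, 1] - y) ** 2)
tree_oof_err = np.mean((out_of_fold[:, 1] - y) ** 2)
print(f"Tree MSE in-sample: {tree_in_err:.1f}, out-of-fold: {tree_oof_err:.1f}")

# The meta-model is then trained on the out-of-fold meta-features
meta = LinearRegression().fit(out_of_fold, y)
```

For the test meta-features you would still use base models fitted on the full training set, as in the cell above.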

Next, we'll train a linear regressor as our meta-model. Why pick the model that performed worst in our tests?

* Simplicity: Linear regression is simple and interpretable. It can effectively capture linear relationships between the predictions of the base models and the target variable.
* Speed: Linear regression is computationally efficient, making it quick to train and evaluate, especially when the number of base model predictions is relatively small.
* Baseline Performance: Linear regression can serve as a good baseline model. If it performs well, it indicates that the base models are providing useful information.
* Avoiding Overfitting: A simpler model like linear regression is less likely to overfit the predictions of the base models, especially if the base models are already complex (e.g., decision trees or ensemble methods).
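
To illustrate the interpretability point: a linear meta-model learns one weight per base model, which reads as how much it trusts each one. The sketch below fakes two base-model predictions (the target plus small or large noise; purely hypothetical data), and the less noisy one should receive a much larger weight.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
y = rng.normal(size=500)

# Hypothetical base-model predictions: the target plus noise of different sizes
good_model = y + rng.normal(scale=0.1, size=500)
weak_model = y + rng.normal(scale=1.0, size=500)
meta_X = np.column_stack((good_model, weak_model))

meta = LinearRegression().fit(meta_X, y)

# One weight per base model: how much the meta-model relies on each prediction
for name, weight in zip(["good_model", "weak_model"], meta.coef_):
    print(f"{name}: {weight:.3f}")
print(f"intercept: {meta.intercept_:.3f}")
```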

When to use more complex meta-models:

* Non-Linear Relationships: If the relationship between the predictions of the base models and the target variable is non-linear, using a more complex meta-model like a decision tree or a random forest may be beneficial.
* Diversity of Base Models: If the base models are diverse and capture different aspects of the data, a more flexible meta-model can help combine their predictions more effectively.
* Performance Improvement: If initial experiments show that a simple meta-model (like linear regression) does not perform well, it may be worth trying more complex models like decision trees, random forests, or even gradient boosting models.

When choosing a meta-model for a stacking ensemble, several factors should be considered. First, the type of problem determines the kind of meta-model: a regression task needs a regressor (linear regression is a common default), while a classification task needs a classifier (logistic regression is the usual counterpart).

The complexity of the base models is also important; if the base models are already complex (e.g., ensemble methods), a simpler meta-model can help prevent overfitting, whereas simple base models may justify the use of a more complex meta-model. Additionally, the size of the dataset plays a role: smaller datasets often favor simpler models that generalize better, while larger datasets can support more complex ones.

Cross-validation should be used to empirically assess the performance of different meta-models and determine which is most effective in a given stacking configuration. Ultimately, experimentation is essential: try several meta-models and compare them with relevant evaluation metrics to find the one that works best for your data.
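
A minimal sketch of such a comparison, using `cross_val_score` with scikit-learn's built-in `"neg_root_mean_squared_error"` scorer. The meta-features here are synthetic stand-ins (the target plus noise); with real data you would pass out-of-fold base-model predictions instead, so the comparison isn't biased by in-sample fits.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
y = rng.normal(size=300)
# Hypothetical meta-features: three base-model predictions with different noise levels
meta_X = np.column_stack([y + rng.normal(scale=s, size=300) for s in (0.3, 0.6, 1.0)])

candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=3, random_state=42),
    "forest": RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42),
}

results = {}
for name, model in candidates.items():
    # scikit-learn reports errors as negative scores, so flip the sign back
    scores = -cross_val_score(model, meta_X, y, cv=5, scoring="neg_root_mean_squared_error")
    results[name] = scores.mean()
    print(f"{name}: RMSE {scores.mean():.3f} +/- {scores.std():.3f}")
```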

For now, we'll try a linear regressor and a decision tree. We'll stop there and not do all five models.

In [None]:
#DELETE
from sklearn.metrics import mean_squared_error

# Fit the meta-model
meta_model = LinearRegression()
meta_model.fit(meta_X_train, y_train)

# Make predictions with the meta-model
final_predictions = meta_model.predict(meta_X_test)

# Calculate Root Mean Squared Error on the test set
rmse = np.sqrt(mean_squared_error(y_test, final_predictions))
print(f'Root Mean Squared Error (Stacked Model): {rmse:.3f}')

And a decision tree?

In [None]:
#DELETE

# Fit the meta-model using a Decision Tree
meta_model_tree = DecisionTreeRegressor(random_state=42)
meta_model_tree.fit(meta_X_train, y_train)

# Make predictions with the meta-model
final_predictions = meta_model_tree.predict(meta_X_test)

# Calculate Root Mean Squared Error on the test set
rmse = np.sqrt(mean_squared_error(y_test, final_predictions))
print(f'Root Mean Squared Error (Stacked Model with Decision Tree): {rmse:.3f}')

2.1 vs 2.9: decision tree it is! Although a random forest might do even better, with some tuning.

 ![](../files/2025-05-10-13-28-53.png)

In [None]:
# Nope.

## Implementing stacking

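We have the base learners (linear_regressor, tree_regressor, random_forest_regressor, final_gbr, xgb_regressor) and the meta-model (meta_model). Now stack 'em up! Note that `StackingRegressor` doesn't reuse our already-fitted models: it clones and refits each base learner, and trains the final estimator on cross-validated (out-of-fold) predictions of the base learners rather than on their in-sample predictions.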

In [None]:
#DELETE
from sklearn.ensemble import StackingRegressor

# Define the base learners
base_learners = [
    ('linear_regressor', linear_regressor),
    ('tree_regressor', tree_regressor),
    ('random_forest_regressor', random_forest_regressor),
    ('gradient_boosting_regressor', final_gbr),
    ('xgb_regressor', xgb_regressor)
]

# Create the StackingRegressor
stacking_regressor = StackingRegressor(estimators=base_learners, final_estimator=meta_model)

# Fit the stacking regressor
stacking_regressor.fit(X_train, y_train.ravel())

# Make predictions
stacking_predictions = stacking_regressor.predict(X_test)

# Evaluate the model
stacking_rmse = np.sqrt(mean_squared_error(y_test, stacking_predictions))
print(f'Root Mean Squared Error (Stacking Regressor): {stacking_rmse:.3f}')

We did it! We took all our models and created a model that was better than the best of them!
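
It's worth seeing what `StackingRegressor` does under the hood. The sketch below (synthetic data, two illustrative base models) sets the `cv` and `passthrough` parameters explicitly, checks that the fitted base learners are clones rather than the objects we passed in, and uses `transform()` to view the meta-features the final estimator consumes.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption for this sketch)
X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=7)

base = [
    ("lr", LinearRegression()),
    ("tree", DecisionTreeRegressor(max_depth=3, random_state=7)),
]
stack = StackingRegressor(
    estimators=base,
    final_estimator=LinearRegression(),
    cv=KFold(n_splits=5, shuffle=True, random_state=7),  # folds for out-of-fold meta-features
    passthrough=False,  # meta-model sees only base predictions, not the raw features
)
stack.fit(X, y)

# The fitted base learners are clones; the objects we passed in stay unfitted
print(stack.named_estimators_["tree"] is base[1][1])  # False
# transform() returns the base-model predictions the final estimator consumes
print(stack.transform(X[:3]).shape)  # (3, 2)
# One learned weight per base model
print(stack.final_estimator_.coef_)
```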

In [None]:
with open('../exports/y_pred_stacking.pkl', 'wb') as f:
    pickle.dump(stacking_predictions, f)