# 30 Days of ML - Stacked Ensembles


I like to think about this solution as "**aggressive ensembling**." It turns out this approach is very popular on Kaggle because it lets you squeeze as much performance as possible out of your solution.

The `Trainer` class and the set of model classes reduced the code footprint considerably and made experimentation much easier.


## Recompiling LGBM with GPU support

Out of the box, the LightGBM build available in this notebook can't train an `LGBMRegressor` on the GPU.

For this competition, CPU training is out of the question, so let's recompile LGBM with GPU support.

In [None]:
%%capture
!git clone --recursive https://github.com/Microsoft/LightGBM
!apt-get install -y -qq libboost-all-dev

In [None]:
%%capture 
%%bash
cd LightGBM
rm -rf build
mkdir build
cd build
cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ ..
make -j$(nproc)

In [None]:
%%capture
!cd LightGBM/python-package/;python3 setup.py install --precompile

In [None]:
%%capture
!mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
!rm -r LightGBM

## Importing the libraries we need

Here is where the competition-specific code begins.

Let's import the libraries we need and define a couple of constants that we'll use throughout the notebook.

In [None]:
import glob
import pandas as pd
import numpy as np
import optuna
import random 

from sklearn import compose
from sklearn import ensemble
from sklearn import impute
from sklearn import linear_model
from sklearn import metrics
from sklearn import model_selection
from sklearn import pipeline
from sklearn import preprocessing

import lightgbm as lgbm
import xgboost as xgb
import catboost as cat


# This is a handy constant to turn the GPU on and off. When `False`,
# the notebook will ignore the GPU even when present.
GPU_ENABLED = True

if GPU_ENABLED:
    # If we ask for the GPU but the notebook doesn't have one, the code will
    # blow up. To make sure this doesn't happen, check whether TensorFlow
    # actually reports a GPU device (counting devices is not enough, because
    # some setups list more than one CPU-side device).
    from tensorflow.python.client import device_lib
    GPU_ENABLED = any(
        device.device_type == "GPU"
        for device in device_lib.list_local_devices()
    )

# All of the models in this notebook are fitted in a k-fold cross-validation 
# manner. This constant represents the value of `k`. I got the best results 
# using 20 folds, but it takes around 8 hours to run this notebook on CPU 
# using 20 folds, so I set it to 10 here.
CROSS_VALIDATION_FOLDS = 10

## Loading the data

Let's start by loading the train and test data.

We also want to extend the available features by generating dummies for all categorical columns.

At the end of this process, we should end up with a dataset with 82 columns.

In [None]:
df_train = pd.read_csv("../input/30-days-of-ml/train.csv")
df_test = pd.read_csv("../input/30-days-of-ml/test.csv")

cont_features = [f for f in df_train.columns.tolist() if f.startswith('cont')]
cat_features = [f for f in df_train.columns.tolist() if f.startswith('cat')]

dummies = pd.get_dummies(pd.concat([df_train, df_test])[cat_features])
df_train[dummies.columns] = dummies.iloc[:len(df_train), :]
df_test[dummies.columns] = dummies.iloc[len(df_train):, :]
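
To see why the dummies are generated from train and test together, here is a minimal sketch on toy data. The frames and column values below are made up for illustration; the point is that a category that only shows up in the test set (`"C"`) still gets a column in both datasets.

```python
import pandas as pd

# Toy stand-ins for the competition files; names and values are hypothetical.
toy_train = pd.DataFrame({"id": [1, 2, 3], "cat0": ["A", "B", "A"], "target": [8.1, 8.4, 8.3]})
toy_test = pd.DataFrame({"id": [4, 5], "cat0": ["B", "C"]})

# Encode train and test together so both get the same dummy columns,
# including categories that only appear in one of them ("C" here).
combined = pd.concat([toy_train, toy_test])
toy_dummies = pd.get_dummies(combined[["cat0"]])

# Split the dummies back by position: first the train rows, then the test rows.
toy_train[toy_dummies.columns] = toy_dummies.iloc[:len(toy_train), :]
toy_test[toy_dummies.columns] = toy_dummies.iloc[len(toy_train):, :]

print(toy_train.columns.tolist())
print(toy_test.columns.tolist())
```

Encoding each dataset separately would have produced a `cat0_C` column only for the test set, and the two frames would no longer line up.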

## Duplicating code sucks

It does big time.

For this competition, I'll be using several ensemble models stacked on top of each other. I didn't want to duplicate all of the boilerplate code to make that work, so I created a few classes that will help me keep the notebook as clean as possible. 

Bonus: Experimenting with this mini-framework should be way easier than having to make the same change all over the notebook.

For this idea to work, I first created a wrapper class for each of the four different models I'll be using.

In [None]:
class Model(object):
    """
    Base model class from which every specific model implementation will inherit.
    Args:
        - preprocessor: A standard sklearn pipeline component that will be used
            to transform the data.
        - model_params: A dictionary with the parameters that will be used
            to construct the model.
    """
    
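    def __init__(self, preprocessor=None, model_params=None):
        self.preprocessor = preprocessor
        # Copy the parameters so that `fit` can add GPU/CPU settings without
        # mutating the caller's dictionary or a shared default.
        self.model_params = dict(model_params or {})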
        self._model = None

    def preprocess(self, datasets):
        """
        Preprocesses the list of supplied datasets using the configured
        preprocessor and returns the transformed data.
        """
        
        if self.preprocessor is None:
            return datasets

        if datasets is None or len(datasets) == 0:
            return []

        result = [self.preprocessor.fit_transform(datasets[0])]

        for i in range(1, len(datasets)):
            result.append(self.preprocessor.transform(datasets[i]))

        return result

    def predict(self, dataset):
        """
        A pass-through function that runs predictions on a dataset.
        """
        
        if self._model is None:
            return None

        return self._model.predict(dataset)


class XGBModel(Model):
    """
    A wrapper implementation of an XGBRegressor model.
    """
    
    def __init__(self, preprocessor=None, model_params={}):
        super().__init__(preprocessor, model_params)

    def fit(self, X_train, y_train, X_valid, y_valid, gpu_enabled=True):
        """
        Fits an instance of the model on the supplied data.
        Args:
            - X_train: Train data.
            - y_train: Train target.
            - X_valid: Validation data.
            - y_valid: Validation target.
            - gpu_enabled: Whether we want to fit the model using the GPU.
        """
        
        if gpu_enabled:
            self.model_params["tree_method"] = "gpu_hist"
            self.model_params["predictor"] = "gpu_predictor"
        else:
            self.model_params["n_jobs"] = -1

        self._model = xgb.XGBRegressor(
            objective="reg:squarederror",
            random_state=0, 
            **self.model_params
        ) 

        self._model.fit(
            X_train, 
            y_train, 
            early_stopping_rounds=300, 
            eval_set=[(X_valid, y_valid)], 
            verbose=False
        )

        return self._model


class LGBMModel(Model):
    """
    A wrapper implementation of an LGBMRegressor model.
    """

    def __init__(self, preprocessor=None, model_params={}):
        super().__init__(preprocessor, model_params)

    def fit(self, X_train, y_train, X_valid, y_valid, gpu_enabled=True):
        """
        Fits an instance of the model on the supplied data.
        Args:
            - X_train: Train data.
            - y_train: Train target.
            - X_valid: Validation data.
            - y_valid: Validation target.
            - gpu_enabled: Whether we want to fit the model using the GPU.
        """
        
        if gpu_enabled:
            self.model_params["device"] = "gpu"
            self.model_params["gpu_platform_id"] = 0
            self.model_params["gpu_device_id"] = 0
        else:
            self.model_params["n_jobs"] = -1

        self._model = lgbm.LGBMRegressor(
            objective='regression', 
            metric="rmse",
            random_state=0,
            **self.model_params
        )

        self._model.fit(
            X_train, 
            y_train, 
            early_stopping_rounds=300, 
            eval_set=[(X_valid, y_valid)], 
            verbose=False
        )

        return self._model


class CatBoostModel(Model):
    """
    A wrapper implementation of a CatBoostRegressor model.
    """

    def __init__(self, preprocessor=None, model_params={}):
        super().__init__(preprocessor, model_params)

    def fit(self, X_train, y_train, X_valid, y_valid, gpu_enabled=True):
        """
        Fits an instance of the model on the supplied data.
        Args:
            - X_train: Train data.
            - y_train: Train target.
            - X_valid: Validation data.
            - y_valid: Validation target.
            - gpu_enabled: Whether we want to fit the model using the GPU.
        """
        
        if gpu_enabled:
            self.model_params["task_type"] = "GPU"
            self.model_params["bootstrap_type"] = "Poisson"

        self._model = cat.CatBoostRegressor(
            loss_function='RMSE', 
            random_state=0,
            **self.model_params
        )

        self._model.fit(
            X_train, 
            y_train, 
            early_stopping_rounds=300, 
            eval_set=[(X_valid, y_valid)], 
            verbose=False
        )

        return self._model

    
class LassoModel(Model):
    """
    A wrapper implementation of a Lasso model.
    """

    def __init__(self, preprocessor=None, model_params={}):
        super().__init__(preprocessor, model_params)

    def fit(self, X_train, y_train, **kwargs):
        """
        Fits an instance of the model on the supplied data.
        Args:
            - X_train: Train data.
            - y_train: Train target.
        """
        
        self._model = linear_model.Lasso(
            precompute=True,
            positive=True, 
            random_state=999, 
            fit_intercept=True,
            **self.model_params
        )

        self._model.fit(
            X_train, 
            y_train
        )

        return self._model
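
The `preprocess` method of the base class fits the preprocessor on the first dataset only and reuses it for the rest. Here is a small standalone sketch of that idea, using a `MinMaxScaler` and made-up numbers (the `preprocess` function below is a simplified stand-in, not the class method itself).

```python
import numpy as np
from sklearn import preprocessing


def preprocess(preprocessor, datasets):
    """Fit on the first dataset only, then reuse those statistics for the rest."""
    result = [preprocessor.fit_transform(datasets[0])]
    for dataset in datasets[1:]:
        result.append(preprocessor.transform(dataset))
    return result


# Hypothetical toy data: one feature, split into train/validation/test.
toy_train = np.array([[0.0], [5.0], [10.0]])
toy_valid = np.array([[2.5], [20.0]])
toy_test = np.array([[-10.0]])

scaled_train, scaled_valid, scaled_test = preprocess(
    preprocessing.MinMaxScaler(), [toy_train, toy_valid, toy_test]
)

# The scaler learned min=0 and max=10 from the train split, so validation
# and test values outside that range fall outside [0, 1].
print(scaled_train.ravel())
print(scaled_valid.ravel())
print(scaled_test.ravel())
```

Fitting only on the train split keeps information from the validation and test data out of the transformation.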
    

Let's now define the class where the actual exciting stuff happens.

The `Trainer` class is the one I'll use to train all of the models. It encapsulates the k-fold cross-validation plumbing, predictions, and metrics.

In [None]:
class Trainer(object):
    """
    Builds an ensemble of models using k-fold cross-validation.
    
    Args:
        - df_train: Pandas dataframe containing the train data.
        - df_test: Pandas dataframe containing the test data.
        - folds: How many folds will be used for the k-fold cross-validation
            process.
    """
    
    def __init__(self, df_train, df_test, folds=10):
        self.df_train = df_train
        self.df_test = df_test
        self.folds = folds

        self.predictions_valid = None
        self.predictions_test = None

    def fit(self, models, gpu_enabled=True):
        """
        Fits each one of the supplied models in a k-fold cross-validation manner.
        Args:
            - models: The list of models that will be fitted.
            - gpu_enabled: Whether we want to fit the models using the GPU.
        Returns:
            The list of RMSE scores (averaged across folds) corresponding to
            each one of the supplied models.
        """
        
        model_scores = dict()
        self.predictions_valid = dict()
        self.predictions_test = dict()

        for model_index in range(len(models)):
            model_scores.setdefault(model_index, [])
            self.predictions_valid.setdefault(model_index, dict())
            self.predictions_test.setdefault(model_index, [])

        self.preprocessed_data = dict()
        
        kfold = model_selection.KFold(
            n_splits=self.folds, 
            shuffle=True, 
            random_state=42
        )

        for fold, (train_idx, valid_idx) in enumerate(kfold.split(self.df_train)):
            X_train, X_valid = self.df_train.iloc[train_idx], self.df_train.iloc[valid_idx]

            y_train = X_train.target
            y_valid = X_valid.target

            X_train = X_train.drop(["target"], axis=1)
            X_valid = X_valid.drop(["target"], axis=1)

            for model_index, model in enumerate(models):
                [xtrain, xvalid, xtest] = model.preprocess([
                    X_train, 
                    X_valid, 
                    self.df_test
                ])

                model.fit(
                    xtrain, 
                    y_train, 
                    X_valid=xvalid, 
                    y_valid=y_valid,
                    gpu_enabled=gpu_enabled
                )

                yhat_valid = model.predict(xvalid)
                yhat_test = model.predict(xtest)

                valid_ids = X_valid.id.values.tolist()
                self.predictions_valid[model_index].update(
                    dict(zip(valid_ids, yhat_valid))
                )

                self.predictions_test[model_index].append(yhat_test)

                rmse = metrics.mean_squared_error(y_valid, yhat_valid, squared=False)
                model_scores[model_index].append(rmse)

                print(f"[FOLD {fold}] Model {model_index + 1} Score: {rmse}")

        print()
        scores = []
        for index in range(len(models)):
            score = np.mean(model_scores[index])
            scores.append(score)
            print(f"Model {index} Overall Score: {score}")

        return scores
    
    def get_prediction_data(self):
        """
        Returns two new Pandas datasets: A train dataset containing 
        the predictions on the validation data, and a test dataset containing 
        the predictions on the test data.
        
        For example, if we fit 3 models, the output datasets will contain a 
        column with the predictions generated with each one of these models. 
        The columns will be named consecutively, i.e., `prediction_0` (predictions
        generated by the first model), `prediction_1` (predictions generated
        by the second model), and `prediction_2` (predictions generated by the
        third model).
        
        Both the `id` and `target` columns will be included in the resultant 
        train dataset. The `id` column will also be included in the resultant 
        `test` dataset.
        """
        
        if self.predictions_valid is None:
            return None
        
        df_train_predictions = self.df_train[["id", "target"]]
        df_test_predictions = self.df_test[["id"]]

        number_of_trained_models = len(self.predictions_valid.keys())
        for index in range(number_of_trained_models):
            df_predictions_valid = pd.DataFrame.from_dict(
                self.predictions_valid[index], orient="index"
            ).reset_index()
            df_predictions_valid.columns = ["id", f"prediction_{index}"]

            df_predictions_test = pd.DataFrame({
                "id": self.df_test.id,
                f"prediction_{index}": np.mean(
                    np.column_stack(self.predictions_test[index]
                ), axis=1)
            })

            df_train_predictions = df_train_predictions.merge(
                df_predictions_valid, on="id", how="left"
            )
            df_test_predictions = df_test_predictions.merge(
                df_predictions_test, on="id", how="left"
            )

        return df_train_predictions, df_test_predictions
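
If the bookkeeping inside `Trainer` feels dense, this is the core idea reduced to a few lines: collect out-of-fold predictions for the train rows and average the per-fold predictions for the test rows. This is only a sketch on random toy data, and `LinearRegression` is a stand-in I picked to keep it fast, not one of the models used in this notebook.

```python
import numpy as np
from sklearn import linear_model, model_selection

# Hypothetical toy regression data standing in for the competition data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
X_test = rng.normal(size=(20, 3))

kfold = model_selection.KFold(n_splits=5, shuffle=True, random_state=42)

oof_predictions = np.full(len(X), np.nan)  # one out-of-fold prediction per train row
test_predictions = []                      # one array of test predictions per fold

for train_idx, valid_idx in kfold.split(X):
    model = linear_model.LinearRegression()
    model.fit(X[train_idx], y[train_idx])

    # Each row is predicted by the one model that never saw it during training.
    oof_predictions[valid_idx] = model.predict(X[valid_idx])
    test_predictions.append(model.predict(X_test))

# Average the per-fold test predictions into a single column.
test_feature = np.mean(np.column_stack(test_predictions), axis=1)

print(np.isnan(oof_predictions).any())
print(test_feature.shape)
```

Because every train row gets a prediction from a model that didn't train on it, the next level of the stack can safely use these predictions as features.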

## Preprocessing pipeline

Let's define the preprocessing transformations that I will use with the original data.

Notice that there are only three different transformations that I will be using:

1. Scaling values using a Min-Max Scaler.
2. Transforming categorical columns to ordinal values.
3. One-hot encoding categorical columns.

I combine all of this later to preprocess the data in different ways.

In [None]:
numerical_preprocessor = pipeline.Pipeline(steps=[
    ("imputer", impute.SimpleImputer(strategy="mean")),
    ("scaler", preprocessing.MinMaxScaler()),
])

categorical_preprocessor = pipeline.Pipeline(steps=[
    ("imputer", impute.SimpleImputer(strategy="most_frequent")),
    ("ordinal", preprocessing.OrdinalEncoder()),
])

onehot_preprocessor = preprocessing.OneHotEncoder(handle_unknown="ignore")

Here are the three preprocessors that I will use to transform the data for the models of the first ensemble:

1. `preprocessor1`: Scales numerical columns and converts categorical columns to ordinal values.
2. `preprocessor2`: Scales numerical columns, converts low-order categorical columns to ordinal values, and one-hot encodes high-order categorical columns.
3. `preprocessor3`: Scales numerical columns, converts some of the categorical columns to ordinal values and keeps a list of dummy-generated columns untouched.

Each one of these preprocessors will be used together with one of the models of the first ensemble.

By the way, I tried many different transformations. In the end, these were the ones that gave me the best results.

In [None]:
preprocessor1 = compose.ColumnTransformer(
    transformers=[
        ("numerical", numerical_preprocessor, cont_features),
        ("categorical", categorical_preprocessor, cat_features),
    ]
)

preprocessor2 = compose.ColumnTransformer(
    transformers=[
        ("numerical", numerical_preprocessor, cont_features),
        ("categorical", categorical_preprocessor, [
            "cat0", "cat1", "cat2", "cat6", "cat7", "cat8", "cat9"
        ]),
        ("onehot", onehot_preprocessor, ["cat3", "cat4", "cat5"])
    ]
)

preprocessor3 = compose.ColumnTransformer(
    transformers=[
        ("numerical", numerical_preprocessor, cont_features),
        ("categorical", categorical_preprocessor, ["cat1", "cat5", "cat8"]),
        ("passthrough", "passthrough", ["cat1_A", "cat3_C", "cat8_C", "cat8_E"])
    ]
)
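
Here is a quick sketch of what a `ColumnTransformer` like these produces, using a tiny made-up frame. The column names and the `toy_preprocessor` name are hypothetical, and I left out the imputers to keep it short.

```python
import pandas as pd
from sklearn import compose, preprocessing

# Hypothetical toy frame; the real data has many more columns.
toy = pd.DataFrame({
    "cont0": [0.0, 5.0, 10.0],
    "cat0": ["A", "B", "A"],
    "cat3": ["C", "D", "E"],
    "id": [1, 2, 3],
})

toy_preprocessor = compose.ColumnTransformer(
    transformers=[
        ("numerical", preprocessing.MinMaxScaler(), ["cont0"]),
        ("ordinal", preprocessing.OrdinalEncoder(), ["cat0"]),
        ("onehot", preprocessing.OneHotEncoder(handle_unknown="ignore"), ["cat3"]),
    ],
    sparse_threshold=0,  # force a dense array so the result is easy to print
)

transformed = toy_preprocessor.fit_transform(toy)

# 1 scaled column + 1 ordinal column + 3 one-hot columns.
print(transformed.shape)
print(transformed)
```

The output columns follow the order of the `transformers` list, which is worth remembering when inspecting feature importances later.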

## Building the first ensemble

I decided to build three different models:

1. An `XGBRegressor` paired with `preprocessor1`.
1. An `XGBRegressor` paired with `preprocessor2`.
1. An `XGBRegressor` paired with `preprocessor3`.

Since each model sees slightly different data, they should learn different things that we can later combine into a more robust model. Using three completely different algorithms would usually be the better approach, but in this case the differences in the input data were enough for the three `XGBRegressor` models to approach the problem differently.

I also experimented with an additional `LGBMRegressor` and a `CatBoostRegressor` for a total of five models. After a lot of back and forth, the solution with only the three `XGBRegressor` models was much better, so I got rid of the other models.

I tuned the hyperparameters of each one of these models in a separate notebook.

In [None]:
ensemble1_model1 = XGBModel(preprocessor1, model_params = {
    'n_estimators': 9609,
    'colsample_bytree': 0.1023603386994367,
    'learning_rate': 0.05058244812315823,
    'max_depth': 4,
    'reg_alpha': 52.62163397880715,
    'reg_lambda': 0.09796566981609614,
    'subsample': 0.9362560643601167
})


ensemble1_model2 = XGBModel(preprocessor2, model_params = {
    'n_estimators': 7128,
    'reg_alpha': 19.910244642239753,
    'reg_lambda': 1.8655469044829698,
    'subsample': 0.7843892199651581,
    'learning_rate': 0.0315114451657133,
    'max_depth': 5,
    'colsample_bytree': 0.103069950932443
})


ensemble1_model3 = XGBModel(preprocessor3, model_params = {
    'n_estimators': 7144,
    'reg_alpha': 3.9027631496250404e-05,
    'reg_lambda': 38.44801773583271,
    'subsample': 0.6,
    'learning_rate': 0.04150001273205032,
    'max_depth': 3,
    'colsample_bytree': 0.11939190672649105
})

Now that every model of the first ensemble is defined, we can use the `Trainer` class to train them in a cross-validated manner.

In [None]:
ensemble1 = Trainer(df_train, df_test, folds=CROSS_VALIDATION_FOLDS)
ensemble1.fit([ensemble1_model1, ensemble1_model2, ensemble1_model3], GPU_ENABLED)

[FOLD 0] Model 1 Score: 0.7170723419448163
[FOLD 0] Model 2 Score: 0.716975951972156
[FOLD 0] Model 3 Score: 0.7173449693087539
[FOLD 1] Model 1 Score: 0.7167072315356829
[FOLD 1] Model 2 Score: 0.7165407076990261
[FOLD 1] Model 3 Score: 0.7171438385121012
[FOLD 2] Model 1 Score: 0.7160510948987522
[FOLD 2] Model 2 Score: 0.7158262945427482
[FOLD 2] Model 3 Score: 0.7160312657278953
[FOLD 3] Model 1 Score: 0.7179011964034449
[FOLD 3] Model 2 Score: 0.7179614277501329
[FOLD 3] Model 3 Score: 0.718187265874558
[FOLD 4] Model 1 Score: 0.7219106390797846
[FOLD 4] Model 2 Score: 0.7218587912247788
[FOLD 4] Model 3 Score: 0.7222953960855346
[FOLD 5] Model 1 Score: 0.7152015037498958
[FOLD 5] Model 2 Score: 0.7149304408745646
[FOLD 5] Model 3 Score: 0.7153699512878776
[FOLD 6] Model 1 Score: 0.7180949884791823
[FOLD 6] Model 2 Score: 0.7178412762887201
[FOLD 6] Model 3 Score: 0.7184699004006689
[FOLD 7] Model 1 Score: 0.7188960203142062
[FOLD 7] Model 2 Score: 0.7188102517738338
[FOLD 7] Mode

[0.7175691300864897, 0.7174611010357497, 0.7177889817421428]

## Building the second ensemble

At this point, we have the results of the first ensemble of models. We are now going to stack a second ensemble on top of it. The models of this second ensemble will use the predicted results of the first set of models as the input features. 

Let's prepare this new dataset:

1. The training dataset contains the predictions of the first set of models on the validation data.
2. The test dataset contains the predictions of the first set of models on the test data.

The implementation of the `get_prediction_data()` function combines the results of the ensemble models into the format we need.

In [None]:
df_train_ensemble1, df_test_ensemble1 = ensemble1.get_prediction_data()
df_train_ensemble1

| | id | target | prediction_0 | prediction_1 | prediction_2 |
|---|---|---|---|---|---|
| 0 | 1 | 8.113634 | 8.412793 | 8.469507 | 8.464657 |
| 1 | 2 | 8.481233 | 8.359716 | 8.385630 | 8.396006 |
| 2 | 3 | 8.364351 | 8.206654 | 8.201128 | 8.187977 |
| 3 | 4 | 8.049253 | 8.412392 | 8.399385 | 8.374225 |
| 4 | 6 | 7.972260 | 8.187385 | 8.206850 | 8.247770 |
| ... | ... | ... | ... | ... | ... |
| 299995 | 499993 | 7.945605 | 8.300077 | 8.306719 | 8.319247 |
| 299996 | 499996 | 7.326118 | 7.723757 | 7.708890 | 7.705111 |
| 299997 | 499997 | 8.706755 | 8.248857 | 8.245758 | 8.250740 |
| 299998 | 499998 | 7.229569 | 7.954183 | 7.947504 | 7.987178 |
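
For reference, this is roughly how a dictionary of out-of-fold predictions keyed by `id` turns into a column like `prediction_0`. It's a toy sketch with made-up ids and values, not the actual competition data.

```python
import pandas as pd

# Hypothetical toy data: ids, targets, and out-of-fold predictions keyed by id.
toy_train = pd.DataFrame({"id": [1, 2, 3, 4], "target": [8.1, 8.5, 8.4, 8.0]})
oof_by_id = {3: 8.2, 1: 8.4, 4: 8.3, 2: 8.35}  # folds fill this in arbitrary order

df_predictions = pd.DataFrame.from_dict(oof_by_id, orient="index").reset_index()
df_predictions.columns = ["id", "prediction_0"]

# A left merge on `id` puts every prediction next to the matching target,
# regardless of the order in which the folds produced them.
stacked = toy_train.merge(df_predictions, on="id", how="left")
print(stacked)
```

Keying by `id` is what lets the shuffled folds write their predictions in any order without scrambling the rows.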


We don't need to do any data transformation for this second ensemble because we are working with numerical columns with similar characteristics. 

The `passthrough_preprocessor` transformer lets the relevant columns pass through (every column starting with `prediction_`) and drops the columns we don't need (like `id` and `target`).

In [None]:
passthrough_preprocessor = compose.ColumnTransformer(
    transformers=[(
        "passthrough", 
        "passthrough", 
        [c for c in df_train_ensemble1.columns.tolist() if c.startswith("prediction")]
    )]
)
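
A tiny sketch of the same trick on a made-up frame shows what comes out the other side. The `selector` name and the values are hypothetical.

```python
import pandas as pd
from sklearn import compose

toy = pd.DataFrame({
    "id": [1, 2],
    "target": [8.1, 8.5],
    "prediction_0": [8.4, 8.3],
    "prediction_1": [8.45, 8.35],
})

prediction_columns = [c for c in toy.columns if c.startswith("prediction")]

selector = compose.ColumnTransformer(
    # `remainder` defaults to "drop", so `id` and `target` are discarded.
    transformers=[("passthrough", "passthrough", prediction_columns)]
)

features = selector.fit_transform(toy)
print(features)
```

Dropping `target` here matters: if it leaked into the features, the second-level models would simply learn to copy it.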

For this second ensemble, I built another three models:

1. An `XGBRegressor`.
1. An `LGBMRegressor`.
1. A `CatBoostRegressor`.

In this case, these models will be looking at the exact same data, but their characteristics are different, so hopefully, they'll learn different patterns from the data.

I tuned the hyperparameters of each one of these models in a separate notebook.

Finally, we can also fit all three models using the `Trainer` class. (Here is where the `Trainer` class starts paying off because we don't need to duplicate all of that code.)

In [None]:
ensemble2_model1 = XGBModel(passthrough_preprocessor, model_params={
    'n_estimators': 10911,
    'reg_alpha': 1.1662382319696787e-08,
    'reg_lambda': 18.709461290330285,
    'subsample': 0.5,
    'learning_rate': 0.02841601836601206,
    'max_depth': 2,
    'colsample_bytree': 0.6137187091045387
})


ensemble2_model2 = LGBMModel(passthrough_preprocessor, model_params = {
    'n_estimators': 5000,
    'learning_rate': 0.021040088115256594,
    'max_depth': 1,
    'num_leaves': 136,
    'min_child_samples': 52
})


ensemble2_model3 = CatBoostModel(passthrough_preprocessor, model_params = {
    'iterations': 20969,
    'od_wait': 2248,
    'learning_rate': 0.010572354868717486,
    'reg_lambda': 27.522864565371602,
    'subsample': 0.022762417549602867,
    'random_strength': 30.635308112394423,
    'depth': 1,
    'min_data_in_leaf': 20,
    'leaf_estimation_iterations': 3
})


ensemble2 = Trainer(
    df_train_ensemble1, 
    df_test_ensemble1, 
    folds=CROSS_VALIDATION_FOLDS
)
ensemble2.fit([ensemble2_model1, ensemble2_model2, ensemble2_model3], GPU_ENABLED)

[FOLD 0] Model 1 Score: 0.7170621976081205
[FOLD 0] Model 2 Score: 0.7169762504315812
[FOLD 0] Model 3 Score: 0.7169394905208561
[FOLD 1] Model 1 Score: 0.7163572833101155
[FOLD 1] Model 2 Score: 0.7164912442891678
[FOLD 1] Model 3 Score: 0.7163359144475072
[FOLD 2] Model 1 Score: 0.7157173922945768
[FOLD 2] Model 2 Score: 0.7156887678267895
[FOLD 2] Model 3 Score: 0.7156942801086544
[FOLD 3] Model 1 Score: 0.7175521111494717
[FOLD 3] Model 2 Score: 0.7175668039182104
[FOLD 3] Model 3 Score: 0.7175460447000896
[FOLD 4] Model 1 Score: 0.7217543165494482
[FOLD 4] Model 2 Score: 0.7218103241826395
[FOLD 4] Model 3 Score: 0.7217329428078125
[FOLD 5] Model 1 Score: 0.7147483726458623
[FOLD 5] Model 2 Score: 0.7146625758590391
[FOLD 5] Model 3 Score: 0.7146350497126678
[FOLD 6] Model 1 Score: 0.7175948238586292
[FOLD 6] Model 2 Score: 0.7175494118663038
[FOLD 6] Model 3 Score: 0.717578598693998
[FOLD 7] Model 1 Score: 0.7189303275028172
[FOLD 7] Model 2 Score: 0.7188244169186816
[FOLD 7] Mod

[0.7172655735285699, 0.7172698297667756, 0.7172123198834632]

## Building the final model

We need one more model to produce the final results.

We will stack this model on top of the second ensemble, and its goal will be to combine the predictions of each model into a final prediction.

As before, we will use the data produced by the second ensemble to train and evaluate this final model.

In [None]:
df_train_ensemble2, df_test_ensemble2 = ensemble2.get_prediction_data()
df_train_ensemble2

| | id | target | prediction_0 | prediction_1 | prediction_2 |
|---|---|---|---|---|---|
| 0 | 1 | 8.113634 | 8.450144 | 8.442234 | 8.460534 |
| 1 | 2 | 8.481233 | 8.396575 | 8.401650 | 8.395044 |
| 2 | 3 | 8.364351 | 8.210033 | 8.209859 | 8.207698 |
| 3 | 4 | 8.049253 | 8.417482 | 8.418248 | 8.409137 |
| 4 | 6 | 7.972260 | 8.220647 | 8.230699 | 8.219082 |
| ... | ... | ... | ... | ... | ... |
| 299995 | 499993 | 7.945605 | 8.313033 | 8.317646 | 8.328501 |
| 299996 | 499996 | 7.326118 | 7.679708 | 7.692613 | 7.645113 |
| 299997 | 499997 | 8.706755 | 8.256454 | 8.260073 | 8.257978 |
| 299998 | 499998 | 7.229569 | 7.941735 | 7.939393 | 7.934277 |


Just like before, let's use a transformer that allows the essential columns through and use a `Lasso` model to produce the final results.

In [None]:
passthrough_preprocessor = compose.ColumnTransformer(
    transformers=[(
        "passthrough", 
        "passthrough", 
        [c for c in df_train_ensemble2.columns.tolist() if c.startswith("prediction")]
    )]
)

final_model = LassoModel(passthrough_preprocessor, model_params={
    'alpha': 0.0001, 
    'max_iter': 20000
})


trainer = Trainer(
    df_train_ensemble2, 
    df_test_ensemble2, 
    folds=CROSS_VALIDATION_FOLDS
)
trainer.fit([final_model])

[FOLD 0] Model 1 Score: 0.7169627780241739
[FOLD 1] Model 1 Score: 0.7163733060987669
[FOLD 2] Model 1 Score: 0.7156846459771987
[FOLD 3] Model 1 Score: 0.7175379180522874
[FOLD 4] Model 1 Score: 0.7217435344107982
[FOLD 5] Model 1 Score: 0.71462950096099
[FOLD 6] Model 1 Score: 0.7175654792322992
[FOLD 7] Model 1 Score: 0.7188885988496246
[FOLD 8] Model 1 Score: 0.7197768321371735
[FOLD 9] Model 1 Score: 0.713071325968279

Model 0 Overall Score: 0.7172233919711591


[0.7172233919711591]
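
To get a feel for what a `Lasso` with `positive=True` does as a blender, here is a sketch on synthetic data. The three "predictions" below are just the target plus random noise, which is my own assumption for illustration and not how the real second-ensemble predictions behave.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical toy data: three noisy "model predictions" of the same target.
rng = np.random.default_rng(0)
target = rng.normal(loc=8.0, scale=1.0, size=500)
predictions = np.column_stack([
    target + rng.normal(scale=0.3, size=500),
    target + rng.normal(scale=0.4, size=500),
    target + rng.normal(scale=0.5, size=500),
])

blender = linear_model.Lasso(alpha=0.0001, positive=True, max_iter=20000)
blender.fit(predictions, target)

# With positive=True every weight is >= 0, so the blend is a non-negative
# weighted sum of the input predictions plus an intercept.
print(blender.coef_, blender.intercept_)

blended = blender.predict(predictions)
```

Constraining the weights to be non-negative keeps the final model from betting against one of its inputs, which tends to be a sensible restriction when all inputs are decent predictions of the same target.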

## Preparing the final submission

Since we trained a single final model, we can get its results directly from `trainer.predictions_test[0]`.

This is a list with one array of test predictions per fold (10 arrays when `CROSS_VALIDATION_FOLDS = 10`). We can compute the mean of these arrays and create the submission file.

In [None]:
predictions = trainer.predictions_test[0]
target = np.mean(np.column_stack(predictions), axis=1)
output = pd.DataFrame({'id': df_test.id, 'target': target})
output.to_csv('submission.csv', index=False)
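
In case the `column_stack` plus `mean` combination isn't obvious, here is what it does on a few made-up fold predictions.

```python
import numpy as np

# Hypothetical per-fold test predictions: 3 folds, 4 test rows each.
fold_predictions = [
    np.array([8.0, 8.2, 7.9, 8.4]),
    np.array([8.1, 8.3, 7.8, 8.4]),
    np.array([8.2, 8.1, 8.0, 8.4]),
]

# column_stack turns the list into a (rows, folds) matrix; averaging over
# axis=1 collapses the folds into one prediction per test row.
stacked = np.column_stack(fold_predictions)
averaged = stacked.mean(axis=1)

print(stacked.shape)
print(averaged)
```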