### Layered LightGBM Ensemble: Time-Sliced Training on Clustered Features and Ridge Stacking

In this notebook, I reuse the hierarchical clustering of the masked features from the feature_engineering notebook and collapse each cluster into a single feature by taking the simple average of its members. I experimented with alternative aggregation methods, such as weighted averages, but they did not improve on the plain average. The order book features (bid_qty, ask_qty, buy_qty, sell_qty and volume) are then added to these 174 masked aggregate features to train the boosted trees.
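
As a rough illustration of the cluster averaging described above, here is a minimal sketch on a toy frame. The feature names, values and cluster labels are made up for the example; the real labels come from clusters.csv.

```python
import pandas as pd

# Toy frame with four masked features (illustrative values only).
toy = pd.DataFrame({
    "X1": [1.0, 2.0],
    "X2": [3.0, 4.0],
    "X3": [10.0, 20.0],
    "X4": [30.0, 40.0],
})

# Hypothetical cluster labels, indexed by feature name.
toy_clusters = pd.Series({"X1": 0, "X2": 0, "X3": 1, "X4": 1})

# Transpose so features become rows, group rows by cluster label,
# average within each cluster, then transpose back to one column per cluster.
toy_reduced = toy.T.groupby(toy_clusters).mean().T
print(toy_reduced)
```

Each output column is the plain mean of the features assigned to that cluster, so four inputs collapse to two aggregates here.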

The data is split 80/20 into training and validation sets. Because it is a time series, the dataset is first sorted by time; the earliest 80% is used for training and the latest 20% is reserved for validation. Both sets are then divided into 20 interleaved subsamples: subsample i takes rows i, i+20, i+40, and so on. Each training subsample is used to fit one boosted tree model, which is tuned on the matching validation subsample. Finally, to ensemble their outputs, I train a ridge regression model on the predictions from all trees.
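
The split and the interleaved slicing can be sketched on a small synthetic frame as follows. The frame, its size and the column name are assumptions for illustration; only the 80/20 chronological cut and the every-20th-row slicing mirror the description above.

```python
import numpy as np
import pandas as pd

# Toy time-indexed data; the integer index stands in for timestamps.
n_rows, n_slices = 100, 20
toy = pd.DataFrame({"x": np.arange(n_rows)}, index=pd.RangeIndex(n_rows, name="t"))
toy = toy.sort_index()

# Chronological split: earliest 80% for training, latest 20% for validation.
n_train = round(len(toy) * 0.8)
train, val = toy.iloc[:n_train], toy.iloc[n_train:]

# Interleaved slices: slice i holds rows i, i + 20, i + 40, ...
train_slices = [train.iloc[i::n_slices] for i in range(n_slices)]
val_slices = [val.iloc[i::n_slices] for i in range(n_slices)]

print(train_slices[0].index.tolist())
```

Every slice spans the whole time range of its parent set, so each base learner sees the full period at a coarser sampling rate.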

In [1]:
# libraries and settings
import gc
import pandas as pd
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
import lightgbm as lgb
import optuna
import warnings
warnings.filterwarnings('ignore')

In [2]:
df_train = pd.read_parquet('../data/input/train.parquet')
df_test = pd.read_parquet('../data/input/test.parquet')

In [3]:
# Clustering the features
X_cols = df_train.columns[df_train.columns.str.startswith('X')]
clusters = pd.Series(pd.read_csv('../data/intermediate/clusters.csv', index_col = 0).iloc[:,0])

# Group columns by cluster and take the mean across features inside each cluster.
# groupby(..., axis=1) is deprecated in recent pandas, so group the transposed frame instead.
df_reduced = (
    df_train[X_cols].T.groupby(clusters)
      .mean()
      .T
)

df_reduced.set_index(df_train.index, inplace = True)
df_reduced = pd.merge(
    df_train[['bid_qty', 'ask_qty', 'buy_qty', 'sell_qty', 'volume', 'label']],
    df_reduced,
    left_index = True, 
    right_index = True
)


# Apply the same cluster-wise averaging to the test features
df_reduced_test = (
    df_test[X_cols].T.groupby(clusters)
      .mean()
      .T
)

df_reduced_test.set_index(df_test.index, inplace = True)
df_reduced_test = pd.merge(
    df_test[['bid_qty', 'ask_qty', 'buy_qty', 'sell_qty', 'volume']],
    df_reduced_test,
    left_index = True, 
    right_index = True
)


#### Ensemble Prediction Model
I design a class to pipeline the tuning and training procedures. First, the training and validation datasets are each split into 20 subsamples by assigning every 20th row (after sorting the data by timestamp) to the same subsample. For each subsample, I train a boosted tree on the training rows and tune it on the paired validation subsample using the Pearson correlation between predictions and labels. The correlation metric is chosen because it was the evaluation metric used in the competition.
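
One property of the correlation metric worth keeping in mind is that it ignores the scale and offset of the predictions. The sketch below uses synthetic data to show this, along with the sign flip needed when a minimizer is asked to maximize correlation; `neg_corr` is a hypothetical helper name, not part of the class.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
y_true = rng.normal(size=200)
pred = 0.5 * y_true + rng.normal(size=200)  # noisy synthetic predictions

# Pearson correlation is unchanged by a positive rescaling and a shift.
corr, _ = pearsonr(pred, y_true)
corr_scaled, _ = pearsonr(0.01 * pred + 3.0, y_true)

def neg_corr(p, y):
    """Objective for a minimizer: lower is better, so negate the correlation."""
    return -pearsonr(p, y)[0]

print(f"corr={corr:.4f}, corr after rescaling={corr_scaled:.4f}")
```

Because only the ordering and relative spread of the predictions matter, a model can score well on this metric even when its outputs are heavily shrunk toward a constant.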

Once all 20 tuned trees are trained, I generate predictions from them on the full training and validation datasets. These stacked predictions form the inputs to a ridge regression model, which is trained on the training predictions. The ridge regularization parameter is tuned based on the correlation metric evaluated on the validation predictions.
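
The stacking step can be sketched in isolation as below. The base-model predictions are simulated as noisy copies of the target, and the sizes and alpha grid are arbitrary choices for the example; only the pattern (fit ridge on training predictions, pick alpha by validation correlation) follows the description above.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_train, n_val, n_models = 400, 100, 5

# Synthetic targets and simulated base-model predictions (one column per model).
y_tr = rng.normal(size=n_train)
y_va = rng.normal(size=n_val)
train_stack = 0.3 * y_tr[:, None] + rng.normal(size=(n_train, n_models))
val_stack = 0.3 * y_va[:, None] + rng.normal(size=(n_val, n_models))

# Fit one ridge per candidate alpha and keep the one with the best validation correlation.
best_alpha, best_corr = None, -np.inf
for alpha in np.logspace(-2, 5, 8):
    ridge = Ridge(alpha=alpha).fit(train_stack, y_tr)
    corr, _ = pearsonr(ridge.predict(val_stack), y_va)
    if corr > best_corr:
        best_alpha, best_corr = alpha, corr

print(f"best alpha: {best_alpha:.5f}, val corr: {best_corr:.5f}")
```

Note that the validation set is used both to tune the trees and to select alpha, so the reported validation correlation is likely somewhat optimistic.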

Finally, after identifying the optimal ridge parameter, I retrain all boosted trees with their tuned hyperparameters on the full dataset (including the validation sample). I then refit the ridge regression using the optimized regularization parameter on the stacked predictions from the full data. This final ensemble model is used to produce predictions on the test dataset.

In [30]:
""" **************** ENSEMBLE MODEL ****************** """
class LightGBMTimeSeriesEnsemble:
    def __init__(self, n_models, n_trials, alphas=None, metric='rmse', random_seed=42):
        """
        n_models    : number of base learners (e.g. 20); also the stride used to slice the data
        n_trials    : Optuna trials per learner
        alphas      : candidate ridge regularization strengths
        metric      : LightGBM eval metric (for early stopping)
        random_seed : seed for the Optuna sampler and LightGBM
        """
        self.n_models = n_models
        self.n_trials = n_trials
        self.metric = metric
        self.alphas = alphas if alphas is not None else np.logspace(-2, 5, 30)
        self.seed = random_seed

        self.best_params_list = []
        self.best_iterations = []
        self.models = []
        self.ridge = None
        self.best_alpha = None

    def _optuna_objective(self, trial, X_tr, y_tr, X_val, y_val):
        params = {
            'objective': 'regression',
            'metric': self.metric,  # same eval metric as the final fit (default 'rmse')
            'verbosity': -1,
            'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.1, log=True),
            'num_leaves': trial.suggest_int('num_leaves', 16, 64),
            'max_depth': trial.suggest_int('max_depth', 2, 6),
            'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 10, 100),
            'feature_fraction': trial.suggest_float('feature_fraction', 0.1, 0.5),
            'bagging_fraction': trial.suggest_float('bagging_fraction', 0.2, 0.6),
            'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
            'lambda_l1': trial.suggest_float('lambda_l1', 0.0, 5.0),
            'lambda_l2': trial.suggest_float('lambda_l2', 0.0, 5.0),
            'num_boost_round': trial.suggest_int('num_boost_round', 200,500),
            'early_stopping_rounds': 50
        }
        model = lgb.train(
            params,
            lgb.Dataset(X_tr, y_tr),
            valid_sets=[lgb.Dataset(X_val, y_val)]
        )
        pred = model.predict(X_val)
        corr, _ = pearsonr(pred, y_val)
        return -corr

    def fit(self, X_train, y_train, X_val, y_val):
        """
        1) Fit each base model on X_train[i::n] versus y_train[i::n] with Optuna tuning.
        Next steps are carried out in the fit_ridge function:
        2) Make predictions of each base model on *full* X_train => (n_samples_train x n_models)
        3) Ridge-fit on (train_preds, y_train)
        4) Select alpha by evaluating ridge on full X_val predictions
        """
        self.models = []
        self.best_params_list = []
        self.best_iterations = []

        
        for i in range(self.n_models):
            print(f'Training model {i+1}/{self.n_models}')
            Xt = X_train.iloc[i::self.n_models]
            yt = y_train.iloc[i::self.n_models]
            Xv = X_val.iloc[i::self.n_models]
            yv = y_val.iloc[i::self.n_models]

            # Tune
            sampler = optuna.samplers.TPESampler(seed=self.seed)
            study = optuna.create_study(direction='minimize', sampler=sampler)
            study.optimize(
                lambda trial: self._optuna_objective(trial, Xt, yt, Xv, yv),
                n_trials=self.n_trials,
            )
            best_params = study.best_params
            best_params['seed'] = self.seed
            

            # Train model with best params
            model = lgb.train(
                {
                **best_params, 
                'objective':'regression',
                'metric':self.metric,
                'verbosity':-1, 
                'early_stopping_rounds': 50
                },
                lgb.Dataset(Xt, yt),
                valid_sets=[lgb.Dataset(Xv, yv)]
            )
            best_params['num_boost_round'] = model.best_iteration
            self.best_params_list.append(best_params)
            self.best_iterations.append(model.best_iteration)
            self.models.append(model)

        self.fit_ridge(X_train, y_train, X_val, y_val, self.alphas)
        
        
    def fit_ridge(self, X_train, y_train, X_val, y_val, alphas):
        """
        1) Make predictions of each base model on *full* X_train => (n_samples_train x n_models)
        2) Ridge-fit on (train_preds, y_train)
        3) Select alpha by evaluating ridge on full X_val predictions
        """
        # --- predictions on full training set ---
        train_stack = np.column_stack([m.predict(X_train, num_iteration=it)
                                       for m, it in zip(self.models, self.best_iterations)])

        # --- search alpha by evaluating on validation set ---
        val_stack = np.column_stack([m.predict(X_val, num_iteration=it)
                                     for m, it in zip(self.models, self.best_iterations)])

        self.alphas = alphas
        best_corr = -np.inf
        for a in self.alphas:
            r = Ridge(alpha=a)
            r.fit(train_stack, y_train)
            corr, _ = pearsonr(r.predict(val_stack), y_val)
            print(f' ridge alpha: {a:.5f}, val corr: {corr:.5f}')
            if corr > best_corr:
                best_corr = corr
                self.best_alpha = a
                self.ridge = r

        print(f'Best ridge alpha: {self.best_alpha:.5f}, val corr: {best_corr:.5f}')
    
    
    def refit_full(self, X_full, y_full):
        """
        Retrain base models and the ridge regression model on full data using best params and alpha.
        """
        self.models = []
        for i, params in enumerate(self.best_params_list):
            print(f'Refitting model {i+1}/{self.n_models} on full data...')
            model = lgb.train(
                {
                **params,  
                'objective':'regression',
                'metric':self.metric,
                'verbosity':-1
                },
                lgb.Dataset(X_full, y_full)
            )
            self.models.append(model)
        
        full_stack = np.column_stack([
            m.predict(X_full, num_iteration=it) for m, it in zip(self.models, self.best_iterations)
        ])
        self.ridge = Ridge(alpha=self.best_alpha)
        self.ridge.fit(full_stack, y_full)

    def predict(self, X):
        """
        Predict stacked output on full dataset X
        """
        base = np.column_stack([m.predict(X, num_iteration=it)
                                for m, it in zip(self.models, self.best_iterations)])
        return self.ridge.predict(base)

In [31]:
df_reduced.sort_index(inplace = True)
n_train = round(df_reduced.shape[0]*0.8)
X_train = df_reduced.iloc[:n_train].drop(columns = ['label'])
X_val = df_reduced.iloc[n_train:].drop(columns = ['label'])
y_train = df_reduced.iloc[:n_train]['label'] 
y_val = df_reduced.iloc[n_train:]['label'] 

ensemble = LightGBMTimeSeriesEnsemble(n_models=20, n_trials=50)
ensemble.fit(X_train, y_train, X_val, y_val)

[I 2025-08-17 23:53:55,199] A new study created in memory with name: no-name-736ef66f-4529-46a0-b69c-d58d9e9e04d8


Training model 1/20


[I 2025-08-17 23:53:55,594] Trial 0 finished with value: -0.08518368256045201 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.08518368256045201.
[I 2025-08-17 23:53:55,982] Trial 1 finished with value: -0.0922343442823971 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 1 with value: -0.0922343442823971.
[I 2025-08-17 23:53:56,249] Trial 2 finished with value: -0.07961012521227545 and parameters: {'learning_rate': 0.03126143958203108, 'num

Training model 2/20


[I 2025-08-17 23:54:14,862] Trial 0 finished with value: -0.10677102700565064 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.10677102700565064.
[I 2025-08-17 23:54:15,369] Trial 1 finished with value: -0.10277257671291347 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 0 with value: -0.10677102700565064.
[I 2025-08-17 23:54:15,641] Trial 2 finished with value: -0.11384222314486506 and parameters: {'learning_rate': 0.03126143958203108, 'n

Training model 3/20


[I 2025-08-17 23:54:35,915] Trial 0 finished with value: -0.07832794650006403 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.07832794650006403.
[I 2025-08-17 23:54:36,307] Trial 1 finished with value: -0.08508523191612072 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 1 with value: -0.08508523191612072.
[I 2025-08-17 23:54:36,606] Trial 2 finished with value: -0.10165143982728986 and parameters: {'learning_rate': 0.03126143958203108, 'n

Training model 4/20


[I 2025-08-17 23:54:58,075] Trial 0 finished with value: -0.08405135641618583 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.08405135641618583.
[I 2025-08-17 23:54:58,474] Trial 1 finished with value: -0.08309718577391753 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 0 with value: -0.08405135641618583.
[I 2025-08-17 23:54:58,760] Trial 2 finished with value: -0.08801204145527987 and parameters: {'learning_rate': 0.03126143958203108, 'n

Training model 5/20


[I 2025-08-17 23:55:16,237] Trial 0 finished with value: -0.0933281434439414 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.0933281434439414.
[I 2025-08-17 23:55:16,751] Trial 1 finished with value: -0.08778085532288514 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 0 with value: -0.0933281434439414.
[I 2025-08-17 23:55:17,020] Trial 2 finished with value: -0.06572577208151804 and parameters: {'learning_rate': 0.03126143958203108, 'num_

Training model 6/20


[I 2025-08-17 23:55:43,825] Trial 0 finished with value: -0.09448498846019242 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.09448498846019242.
[I 2025-08-17 23:55:45,272] Trial 1 finished with value: -0.10401058978758815 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 1 with value: -0.10401058978758815.
[I 2025-08-17 23:55:46,198] Trial 2 finished with value: -0.10968328209601992 and parameters: {'learning_rate': 0.03126143958203108, 'n

Training model 7/20


[I 2025-08-17 23:56:15,588] Trial 0 finished with value: -0.10466506132518918 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.10466506132518918.
[I 2025-08-17 23:56:16,261] Trial 1 finished with value: -0.097895858480785 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 0 with value: -0.10466506132518918.
[I 2025-08-17 23:56:16,628] Trial 2 finished with value: -0.08686940949789079 and parameters: {'learning_rate': 0.03126143958203108, 'num

Training model 8/20


[I 2025-08-17 23:56:40,783] Trial 0 finished with value: -0.10831212830672488 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.10831212830672488.
[I 2025-08-17 23:56:41,339] Trial 1 finished with value: -0.09850568640529804 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 0 with value: -0.10831212830672488.
[I 2025-08-17 23:56:41,631] Trial 2 finished with value: -0.10682773986054586 and parameters: {'learning_rate': 0.03126143958203108, 'n

Training model 9/20


[I 2025-08-17 23:57:10,665] Trial 0 finished with value: -0.10670692052457231 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.10670692052457231.
[I 2025-08-17 23:57:11,541] Trial 1 finished with value: -0.10081460210403928 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 0 with value: -0.10670692052457231.
[I 2025-08-17 23:57:12,256] Trial 2 finished with value: -0.10785668122132117 and parameters: {'learning_rate': 0.03126143958203108, 'n

Training model 10/20


[I 2025-08-17 23:57:39,840] Trial 0 finished with value: -0.07756381903014856 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.07756381903014856.
[I 2025-08-17 23:57:40,355] Trial 1 finished with value: -0.09165308658323815 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 1 with value: -0.09165308658323815.
[I 2025-08-17 23:57:40,672] Trial 2 finished with value: -0.09144151974951942 and parameters: {'learning_rate': 0.03126143958203108, 'n

Training model 11/20


[I 2025-08-17 23:58:02,030] Trial 0 finished with value: -0.08530575541907744 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.08530575541907744.
[I 2025-08-17 23:58:02,750] Trial 1 finished with value: -0.09127934377502596 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 1 with value: -0.09127934377502596.
[I 2025-08-17 23:58:03,406] Trial 2 finished with value: -0.11800581814777752 and parameters: {'learning_rate': 0.03126143958203108, 'n

Training model 12/20


[I 2025-08-17 23:58:35,783] Trial 0 finished with value: -0.09402307167140271 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.09402307167140271.
[I 2025-08-17 23:58:37,532] Trial 1 finished with value: -0.10725339710431174 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 1 with value: -0.10725339710431174.
[I 2025-08-17 23:58:38,405] Trial 2 finished with value: -0.10874866751212012 and parameters: {'learning_rate': 0.03126143958203108, 'n

Training model 13/20


[I 2025-08-17 23:59:06,572] Trial 0 finished with value: -0.10199380826641413 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.10199380826641413.
[I 2025-08-17 23:59:07,151] Trial 1 finished with value: -0.09655924908897846 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 0 with value: -0.10199380826641413.
[I 2025-08-17 23:59:07,565] Trial 2 finished with value: -0.12962646081100826 and parameters: {'learning_rate': 0.03126143958203108, 'n

Training model 14/20


[I 2025-08-17 23:59:33,672] Trial 0 finished with value: -0.10334269984461285 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.10334269984461285.
[I 2025-08-17 23:59:34,212] Trial 1 finished with value: -0.103837525515039 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 1 with value: -0.103837525515039.
[I 2025-08-17 23:59:34,548] Trial 2 finished with value: -0.11252758739272702 and parameters: {'learning_rate': 0.03126143958203108, 'num_l

Training model 15/20


[I 2025-08-18 00:00:00,465] Trial 0 finished with value: -0.0966777585805138 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.0966777585805138.
[I 2025-08-18 00:00:01,197] Trial 1 finished with value: -0.10049490451716844 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 1 with value: -0.10049490451716844.
[I 2025-08-18 00:00:01,618] Trial 2 finished with value: -0.1025573701647316 and parameters: {'learning_rate': 0.03126143958203108, 'num_

Training model 16/20


[I 2025-08-18 00:00:37,645] Trial 0 finished with value: -0.10788415576198043 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.10788415576198043.
[I 2025-08-18 00:00:38,342] Trial 1 finished with value: -0.1007116602177099 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 0 with value: -0.10788415576198043.
[I 2025-08-18 00:00:38,763] Trial 2 finished with value: -0.11144346260207195 and parameters: {'learning_rate': 0.03126143958203108, 'nu

Training model 17/20


[I 2025-08-18 00:01:08,283] Trial 0 finished with value: -0.09862275228698056 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.09862275228698056.
[I 2025-08-18 00:01:08,773] Trial 1 finished with value: -0.1010564209098334 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 1 with value: -0.1010564209098334.
[I 2025-08-18 00:01:09,078] Trial 2 finished with value: -0.11088182080285212 and parameters: {'learning_rate': 0.03126143958203108, 'num

Training model 18/20


[I 2025-08-18 00:01:44,478] Trial 0 finished with value: -0.09000883625344258 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.09000883625344258.
[I 2025-08-18 00:01:45,051] Trial 1 finished with value: -0.08584364499180627 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 0 with value: -0.09000883625344258.
[I 2025-08-18 00:01:45,432] Trial 2 finished with value: -0.1113206389535658 and parameters: {'learning_rate': 0.03126143958203108, 'nu

Training model 19/20


[I 2025-08-18 00:02:10,163] Trial 0 finished with value: -0.09004110142263866 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.09004110142263866.
[I 2025-08-18 00:02:10,801] Trial 1 finished with value: -0.107317367142029 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 1 with value: -0.107317367142029.
[I 2025-08-18 00:02:11,179] Trial 2 finished with value: -0.10383785525240426 and parameters: {'learning_rate': 0.03126143958203108, 'num_l

Training model 20/20


[I 2025-08-18 00:02:46,152] Trial 0 finished with value: -0.0964290803536345 and parameters: {'learning_rate': 0.015355286838886862, 'num_leaves': 62, 'max_depth': 5, 'min_data_in_leaf': 64, 'feature_fraction': 0.1624074561769746, 'bagging_fraction': 0.26239780813448105, 'bagging_freq': 1, 'lambda_l1': 4.330880728874676, 'lambda_l2': 3.005575058716044, 'num_boost_round': 413}. Best is trial 0 with value: -0.0964290803536345.
[I 2025-08-18 00:02:46,808] Trial 1 finished with value: -0.08307061552441082 and parameters: {'learning_rate': 0.005318033256270142, 'num_leaves': 63, 'max_depth': 6, 'min_data_in_leaf': 29, 'feature_fraction': 0.17272998688284025, 'bagging_fraction': 0.27336180394137355, 'bagging_freq': 4, 'lambda_l1': 2.6237821581611893, 'lambda_l2': 2.1597250932105787, 'num_boost_round': 287}. Best is trial 0 with value: -0.0964290803536345.
[I 2025-08-18 00:02:47,180] Trial 2 finished with value: -0.08358536910568948 and parameters: {'learning_rate': 0.03126143958203108, 'num_

 ridge alpha: 0.01000, val corr: 0.09592
 ridge alpha: 0.01743, val corr: 0.09592
 ridge alpha: 0.03039, val corr: 0.09593
 ridge alpha: 0.05298, val corr: 0.09593
 ridge alpha: 0.09237, val corr: 0.09593
 ridge alpha: 0.16103, val corr: 0.09593
 ridge alpha: 0.28072, val corr: 0.09594
 ridge alpha: 0.48939, val corr: 0.09595
 ridge alpha: 0.85317, val corr: 0.09597
 ridge alpha: 1.48735, val corr: 0.09600
 ridge alpha: 2.59294, val corr: 0.09605
 ridge alpha: 4.52035, val corr: 0.09615
 ridge alpha: 7.88046, val corr: 0.09631
 ridge alpha: 13.73824, val corr: 0.09658
 ridge alpha: 23.95027, val corr: 0.09704
 ridge alpha: 41.75319, val corr: 0.09779
 ridge alpha: 72.78954, val corr: 0.09899
 ridge alpha: 126.89610, val corr: 0.10087
 ridge alpha: 221.22163, val corr: 0.10369
 ridge alpha: 385.66204, val corr: 0.10765
 ridge alpha: 672.33575, val corr: 0.11261
 ridge alpha: 1172.10230, val corr: 0.11776
 ridge alpha: 2043.35972, val corr: 0.12187
 ridge alpha: 3562.24789, val corr: 0.1

In [34]:
# further fine-tuning the ridge regression regularization parameter
ensemble.fit_ridge(X_train, y_train, X_val, y_val, range(6000, 15000, 500))

 ridge alpha: 6000.00000, val corr: 0.12500
 ridge alpha: 6500.00000, val corr: 0.12505
 ridge alpha: 7000.00000, val corr: 0.12508
 ridge alpha: 7500.00000, val corr: 0.12509
 ridge alpha: 8000.00000, val corr: 0.12510
 ridge alpha: 8500.00000, val corr: 0.12510
 ridge alpha: 9000.00000, val corr: 0.12510
 ridge alpha: 9500.00000, val corr: 0.12510
 ridge alpha: 10000.00000, val corr: 0.12509
 ridge alpha: 10500.00000, val corr: 0.12508
 ridge alpha: 11000.00000, val corr: 0.12507
 ridge alpha: 11500.00000, val corr: 0.12506
 ridge alpha: 12000.00000, val corr: 0.12505
 ridge alpha: 12500.00000, val corr: 0.12503
 ridge alpha: 13000.00000, val corr: 0.12502
 ridge alpha: 13500.00000, val corr: 0.12501
 ridge alpha: 14000.00000, val corr: 0.12500
 ridge alpha: 14500.00000, val corr: 0.12498
Best ridge alpha: 8500.00000, val corr: 0.12510


In [35]:
ensemble.refit_full(df_reduced.drop(columns='label'), df_reduced['label'])

y_test_pred = ensemble.predict(df_reduced_test)
y_test_pred_pd = pd.Series(y_test_pred, name = 'prediction')
y_test_pred_pd.index = df_test.index
y_test_pred_pd.to_csv('../data/output/submission4.csv')

Refitting model 1/20 on full data...
Refitting model 2/20 on full data...
Refitting model 3/20 on full data...
Refitting model 4/20 on full data...
Refitting model 5/20 on full data...
Refitting model 6/20 on full data...
Refitting model 7/20 on full data...
Refitting model 8/20 on full data...
Refitting model 9/20 on full data...
Refitting model 10/20 on full data...
Refitting model 11/20 on full data...
Refitting model 12/20 on full data...
Refitting model 13/20 on full data...
Refitting model 14/20 on full data...
Refitting model 15/20 on full data...
Refitting model 16/20 on full data...
Refitting model 17/20 on full data...
Refitting model 18/20 on full data...
Refitting model 19/20 on full data...
Refitting model 20/20 on full data...
