# A Basic Ensembling Technique with Cross Validation

This notebook demonstrates how to ensemble predictions from several models (four in this case). For the ensembling, we need the out-of-fold (OOF) predictions as well as the test (submission) predictions of each model. The OOF predictions are used as the features to train an ensemble regressor that predicts the training target values. The coefficients of the ensemble regressor are then used as the weights of each model's test predictions.
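
For readers unfamiliar with OOF predictions, here is a minimal sketch of how they are typically produced. It is an illustration only: the data is synthetic, the base model (`Ridge`) and the plain `KFold` split are assumptions, and the notebooks that produced the OOF files loaded below may have used a different model and splitting scheme.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Synthetic regression data (illustrative only, not the competition data).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Each row's OOF prediction comes from a model that never saw that row.
oof_toy = np.zeros(len(y_toy))
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, valid_idx in kfold.split(X_toy):
    model = Ridge(alpha=1.0).fit(X_toy[train_idx], y_toy[train_idx])
    oof_toy[valid_idx] = model.predict(X_toy[valid_idx])

print(f'OOF MAE: {np.mean(np.abs(y_toy - oof_toy)):.4f}')
```

Because every training row ends up with exactly one held-out prediction, the OOF array has the same length as the training targets and can be used as an input feature for the ensembler.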

## Load Packages

In [None]:
import numpy as np
import pandas as pd
from sklearn.linear_model import RidgeCV

## Competition Metric

The competition metric is the mean absolute error (MAE), computed only over the inspiration phase (`u_out == 0`).

In [None]:
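def mae(ytrue, ypred, uout=None):
    """Mean absolute error, restricted to the inspiration phase when `uout` is given."""
    abs_err = np.abs(np.asarray(ytrue) - np.asarray(ypred))
    if uout is not None:
        # Accept a Series or an array for uout, and keep only rows with u_out == 0.
        print('MAE (Inspiration Phase):')
        return np.mean(abs_err[np.asarray(uout) == 0])
    print('MAE (All Phases):')
    return np.mean(abs_err)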
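
To make the masking concrete, the following small worked example computes both variants of the metric by hand. All values are hypothetical toy numbers, and the `_toy` names are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Toy values (hypothetical): two inspiration rows, two expiration rows.
ytrue_toy = pd.Series([10.0, 20.0, 30.0, 40.0])
ypred_toy = np.array([11.0, 18.0, 35.0, 40.0])
uout_toy = pd.Series([0, 0, 1, 1])

abs_err = np.abs(ytrue_toy - ypred_toy)          # 1, 2, 5, 0
mae_all = abs_err.mean()                          # (1 + 2 + 5 + 0) / 4 = 2.0
mae_inspiration = abs_err[uout_toy == 0].mean()   # (1 + 2) / 2 = 1.5

print(mae_all, mae_inspiration)
```

The large error on the third row does not affect the inspiration-phase score, because that row has `u_out == 1`.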

## Load Data

From the training data, we need only the training targets (pressure values) and the `u_out` column, since the competition metric is evaluated only on the inspiration phase (`u_out == 0`).

In [None]:
data = pd.read_csv('../input/ventilator-pressure-prediction/train.csv', usecols=['pressure', 'u_out'])
ytrue = data.pressure
uout = data.u_out

## Postprocessing Tools

In [None]:
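# The training pressures take a finite set of values; recover that grid from the data.
pressure_sorted = np.sort(data['pressure'].unique())
PRESSURE_MIN = pressure_sorted[0]
PRESSURE_MAX = pressure_sorted[-1]
# Assumes the grid is evenly spaced, so the first gap equals the step everywhere.
PRESSURE_STEP = pressure_sorted[1] - pressure_sorted[0]

def post_process(pressure):
    # Snap each prediction to the nearest grid value.
    pressure = np.round((pressure - PRESSURE_MIN) / PRESSURE_STEP) * PRESSURE_STEP + PRESSURE_MIN
    # Keep predictions inside the range observed in training.
    pressure = np.clip(pressure, PRESSURE_MIN, PRESSURE_MAX)
    return pressure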
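
The effect of this rounding and clipping can be seen on a toy grid. The sketch below uses a hypothetical helper `snap_to_grid` that mirrors `post_process` but takes the grid parameters as arguments; the grid values (-2.0, 3.0, 0.5) are made up for illustration and are not the competition's actual values.

```python
import numpy as np

def snap_to_grid(pressure, p_min, p_max, p_step):
    """Round to the nearest grid point, then clip to [p_min, p_max]."""
    snapped = np.round((pressure - p_min) / p_step) * p_step + p_min
    return np.clip(snapped, p_min, p_max)

# Hypothetical grid: -2.0, -1.5, ..., 3.0
raw = np.array([-3.0, 0.2, 0.26, 2.9, 10.0])
snapped = snap_to_grid(raw, p_min=-2.0, p_max=3.0, p_step=0.5)
print(snapped)  # [-2.   0.   0.5  3.   3. ]
```

Values below or above the grid are pulled to its ends, and everything in between moves to the nearest step.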

## Load the OOF and the Submission Predictions

In [None]:
oof1 = np.load('../input/vpp-156-oof/oof_preds.npy')
sub1 = pd.read_csv('../input/tfbidirectional156/submission_median_round_tfbidirec.csv')
sub1.pressure = post_process(sub1.pressure)
print(mae(ytrue, oof1, uout))

oof2 = np.load('../input/gb-vpp-why-so-serious/oof_preds.npy')
sub2 = pd.read_csv('../input/gb-vpp-why-so-serious/submission_median_round.csv')
print(mae(ytrue, oof2, uout))

oof3 = np.load('../input/gb-vpp-to-infinity-and-beyond-td/oof_preds.npy')
sub3 = pd.read_csv('../input/gb-vpp-to-infinity-and-beyond-td/submission_median_round.csv')
print(mae(ytrue, oof3, uout))


oof4 = pd.read_csv('../input/ventilator-train-classification/exp080_conti_rc/oof.csv', usecols=['oof'])
oof4 = oof4.to_numpy().ravel()
sub4 = pd.read_csv('../input/ventilator-train-classification/exp080_conti_rc/submission_median_pp.csv')
print(mae(ytrue, oof4, uout))


## Input & Target Values

In [None]:
X = np.stack([oof1, oof2, oof3, oof4], 1)[uout == 0]
y = ytrue[uout == 0]

print(f'X shape: {X.shape}')
print(f'y shape: {y.shape}')

Since we ensemble four models, each timestep has only four features, as the shape of `X` above shows.

## Regression Ensembler Training
The ensemble weights are estimated with Ridge regression. `RidgeCV` selects the regularization strength `alpha` from the given candidates by cross-validation.

In [None]:
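lin_reg = RidgeCV(alphas=np.logspace(-3, 10, 20))
lin_reg.fit(X, y)
pred = lin_reg.predict(X)
# X and y contain inspiration-phase rows only, so this MAE is the inspiration-phase score
# even though mae() labels it 'All Phases' when no uout is passed.
print(mae(y, pred))
print(f'Ensemble Weights: {lin_reg.coef_}')
print(f'Sum of weights: {lin_reg.coef_.sum()}')
print(f'Intercept: {lin_reg.intercept_}')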

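The MAE printed above is the score of the combined models under the fitted ensemble weights. It's 0.015 better than the best-scoring single model! Because the inputs are out-of-fold predictions, no base model has seen the rows it is scored on, so this estimate is independent of the leaderboard and much less prone to LB overfitting. Note, however, that the ensemble weights themselves are fitted and scored on the same rows (`RidgeCV` cross-validates only `alpha`). With just four features and this many rows, the resulting optimism should be small.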

Furthermore, the weights sum to approximately 1, even though Ridge regression does not enforce this constraint, which is a useful sanity check that the method is working as intended.
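
To check how much the ensemble score depends on fitting and scoring the weights on the same rows, one option is to compare it with a held-out estimate from `cross_val_predict`. The sketch below does this on synthetic data, assuming four hypothetical models whose OOF predictions equal the target plus independent noise. It is not the notebook's actual data, and plain 5-fold splitting is an assumption.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2000
y_toy = rng.normal(loc=10.0, scale=5.0, size=n)
# Four hypothetical models' OOF predictions: target plus independent noise.
noise_scales = [0.5, 0.6, 0.8, 1.0]
X_toy = np.column_stack([y_toy + rng.normal(scale=s, size=n) for s in noise_scales])

ensembler = RidgeCV(alphas=np.logspace(-3, 10, 20))

# In-sample score: weights are fitted and evaluated on the same rows.
ensembler.fit(X_toy, y_toy)
mae_in_sample = np.mean(np.abs(y_toy - ensembler.predict(X_toy)))

# Held-out score: each row is predicted by weights fitted on the other folds.
pred_cv = cross_val_predict(ensembler, X_toy, y_toy, cv=5)
mae_cv = np.mean(np.abs(y_toy - pred_cv))

best_single = min(np.mean(np.abs(y_toy - X_toy[:, j])) for j in range(4))
print(f'best single: {best_single:.4f}, in-sample: {mae_in_sample:.4f}, cv: {mae_cv:.4f}')
print(f'weight sum: {ensembler.coef_.sum():.4f}, intercept: {ensembler.intercept_:.4f}')

# A manual blend matches ensembler.predict only when the intercept is included.
manual = X_toy @ ensembler.coef_ + ensembler.intercept_
```

If the in-sample and held-out scores are close, as expected with so few features, the in-sample MAE is a reasonable estimate of the ensemble's quality.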

## Ensembling the Predictions

In [None]:
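submission = pd.read_csv('../input/ventilator-pressure-prediction/sample_submission.csv')
# Start from the fitted intercept so the blend matches lin_reg.predict,
# instead of relying on the sample submission's initial pressure values.
submission['pressure'] = lin_reg.intercept_
for sub, weight in zip([sub1, sub2, sub3, sub4], lin_reg.coef_):
    submission['pressure'] += sub['pressure'] * weight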

In [None]:
submission.to_csv('submission.csv', index=False)

## Post Processing

In [None]:
submission.pressure = post_process(submission.pressure)
submission.to_csv('submission_pp.csv', index=False)