Ensembling of predictions #59

andwurl · 2022-08-05T10:09:55Z

Dear xgbse-team,

What would be the correct way to ensemble predictions? Let's say I have 5 StackedWeibull models and would like to ensemble their predictions on a test dataset. Should I average the interval predictions?

Thank you very much.

davivieirab · 2022-08-07T21:56:31Z

Hello @andwurl. Are those 5 models trained on the same dataset (or on bootstrap samples of the same dataset), e.g. as a bagging estimator?
If that is the case, I suggest using our abstraction called XGBSEBootstrapEstimator. It is a bagging estimator that takes a base model and the number of models to train.
Since it is already implemented, we recommend using it.
If you want to know more, predictions on a test dataset are aggregated as follows:

Point estimate: take the mean of the survival curves predicted by all n_models.
Confidence intervals (lower and upper bounds): take percentiles of the survival curves predicted by the n_models. For example, with 5 base models you have 5 survival values at each point in time, and you could take the 20th and 80th percentiles as the lower and upper bounds at each point in time.
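
For models you trained yourself, the aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the xgbse internals: the function name aggregate_survival_curves is hypothetical, the toy numbers are made up, and it assumes every model outputs survival probabilities on the same time bins.

```python
import numpy as np


def aggregate_survival_curves(curves, lower_pct=20, upper_pct=80):
    """Combine per-model survival curves (hypothetical helper, not part of xgbse).

    curves: array-like of shape (n_models, n_samples, n_time_bins),
            all models evaluated on the same time bins (assumption).
    Returns (mean, lower, upper), each of shape (n_samples, n_time_bins).
    """
    stacked = np.asarray(curves, dtype=float)
    # Point estimate: mean survival probability across models
    mean = stacked.mean(axis=0)
    # Interval bounds: percentiles across models at each point in time
    lower = np.percentile(stacked, lower_pct, axis=0)
    upper = np.percentile(stacked, upper_pct, axis=0)
    return mean, lower, upper


# Toy data: 5 models, 1 test sample, 3 time bins
curves = np.array([
    [[0.90, 0.70, 0.50]],
    [[0.92, 0.72, 0.52]],
    [[0.94, 0.74, 0.54]],
    [[0.96, 0.76, 0.56]],
    [[0.98, 0.78, 0.58]],
])

mean, lower, upper = aggregate_survival_curves(curves)
```

With only 5 models the percentile bounds are coarse, since each one is interpolated between two of the five values, so more models give smoother intervals.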

You can find examples of how to use XGBSEBootstrapEstimator in our "how_xgbse_works" notebook.

Code example (see the beginning of the notebook for the necessary import statements and the constants/parameters used below). The example uses XGBSEDebiasedBCE as the base_model, but XGBSEStackedWeibull can be used as well:

# base model as BCE
base_model = XGBSEDebiasedBCE(PARAMS_XGB_AFT, PARAMS_LR)

# bootstrap meta estimator
bootstrap_estimator = XGBSEBootstrapEstimator(base_model, n_estimators=20)

# fitting the meta estimator
bootstrap_estimator.fit(
    X_train,
    y_train,
    validation_data=(X_valid, y_valid),
    early_stopping_rounds=10,
    time_bins=TIME_BINS,
)

# predicting
mean, upper_ci, lower_ci = bootstrap_estimator.predict(X_test, return_ci=True)

andwurl added the documentation label Aug 5, 2022

davivieirab added the question label and removed the documentation label Aug 7, 2022
