Ensembling of predictions #59

Open
andwurl opened this issue Aug 5, 2022 · 1 comment
Labels
question Further information is requested

Comments


andwurl commented Aug 5, 2022

Dear xgbse-team,

What would be the correct way to ensemble predictions? Say I have 5 StackedWeibull models and would like to ensemble their predictions on a test dataset. Should I average the interval predictions?

Thank you very much.

@andwurl andwurl added the documentation Improvements or additions to documentation label Aug 5, 2022
@davivieirab davivieirab added question Further information is requested and removed documentation Improvements or additions to documentation labels Aug 7, 2022
Contributor

davivieirab commented Aug 7, 2022

Hello @andwurl. Are those 5 models trained on the same dataset (or on bootstrap samples of the same dataset), as in a bagging estimator?
If that is the case, I suggest using our XGBSEBootstrapEstimator abstraction. It is a bagging estimator that receives a base model and the number of models to train.
Since it is already implemented, we recommend using it.
If you want to know more, predictions on a test dataset are aggregated as follows:

  • Point estimate: take the mean of the survival curves predicted by all n_models base models.
  • Confidence intervals (lower and upper bounds): take percentiles of the survival curves predicted by the n_models base models. For example, with 5 base models you have 5 survival values at each point in time; you could take the 20th and 80th percentiles as the lower and upper bounds at each point in time.
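
For models that were already trained separately, the aggregation described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not xgbse's internal code: it assumes each model's predicted survival curves are available as an array of shape (n_samples, n_time_bins) on the same time bins. The function name `aggregate_survival_curves` and the toy numbers are hypothetical.

```python
import numpy as np


def aggregate_survival_curves(curves, lower_pct=20, upper_pct=80):
    """Combine survival curves predicted by several models.

    curves: sequence of arrays, one per model, each of shape
            (n_samples, n_time_bins), all on the same time bins.
    Returns (mean, lower, upper), each of shape (n_samples, n_time_bins).
    """
    # Stack into shape (n_models, n_samples, n_time_bins).
    stacked = np.stack([np.asarray(c, dtype=float) for c in curves], axis=0)
    # Point estimate: average over the model axis.
    mean = stacked.mean(axis=0)
    # Interval bounds: percentiles over the model axis.
    lower = np.percentile(stacked, lower_pct, axis=0)
    upper = np.percentile(stacked, upper_pct, axis=0)
    return mean, lower, upper


# Toy data (made up): 5 models, 1 test sample, 3 time bins.
curves = [
    np.array([[0.90, 0.70, 0.50]]),
    np.array([[0.92, 0.72, 0.52]]),
    np.array([[0.94, 0.74, 0.54]]),
    np.array([[0.96, 0.76, 0.56]]),
    np.array([[0.98, 0.78, 0.58]]),
]

mean, lower, upper = aggregate_survival_curves(curves)
```

If the predictions come back as pandas DataFrames with one column per time bin, converting each with `.to_numpy()` before stacking should be enough, provided all models were fitted with the same time bins.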

You can find examples of how to use XGBSEBootstrapEstimator in our "how_xgbse_works" notebook.

Code example (see the beginning of the notebook for the necessary import statements and the constants/parameters used below). The example uses XGBSEDebiasedBCE as the base model, but XGBSEStackedWeibull can be used as well:

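# imports (PARAMS_XGB_AFT, PARAMS_LR, TIME_BINS and the data splits are defined in the notebook)
from xgbse import XGBSEDebiasedBCE, XGBSEBootstrapEstimator

# base model as BCE
base_model = XGBSEDebiasedBCE(PARAMS_XGB_AFT, PARAMS_LR)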

# bootstrap meta estimator
bootstrap_estimator = XGBSEBootstrapEstimator(base_model, n_estimators=20)

# fitting the meta estimator
bootstrap_estimator.fit(
    X_train,
    y_train,
    validation_data=(X_valid, y_valid),
    early_stopping_rounds=10,
    time_bins=TIME_BINS,
)

# predicting
mean, upper_ci, lower_ci = bootstrap_estimator.predict(X_test, return_ci=True)
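
For intuition about what a bagging estimator does in general, here is a rough sketch of the resample, fit, and predict loop using only NumPy. It is not the xgbse implementation: `bootstrap_predictions`, `toy_fit_predict`, and `TOY_TIME_BINS` are hypothetical names, and the "model" is a toy exponential curve standing in for a real survival estimator.

```python
import numpy as np


def bootstrap_predictions(fit_predict, X_train, y_train, X_test,
                          n_estimators=5, seed=0):
    """Fit one model per bootstrap sample and stack the test predictions.

    fit_predict: hypothetical callable (X_boot, y_boot, X_test) -> array of
                 shape (n_test_samples, n_time_bins).
    Returns an array of shape (n_estimators, n_test_samples, n_time_bins).
    """
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_estimators):
        # Draw n row indices with replacement (one bootstrap sample).
        idx = rng.integers(0, n, size=n)
        preds.append(fit_predict(X_train[idx], y_train[idx], X_test))
    return np.stack(preds, axis=0)


# Toy stand-in for a survival model (not a real estimator): it ignores the
# features and predicts the same exponential survival curve for every test
# row, with the rate estimated as 1 / mean duration in the bootstrap sample.
TOY_TIME_BINS = np.array([1.0, 2.0, 3.0])


def toy_fit_predict(X_boot, y_boot, X_test):
    rate = 1.0 / y_boot.mean()
    curve = np.exp(-rate * TOY_TIME_BINS)
    return np.tile(curve, (len(X_test), 1))


# Made-up data: 50 training rows, 4 test rows, durations without censoring.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(50, 2))
y_train = rng.exponential(scale=2.0, size=50)
X_test = rng.normal(size=(4, 2))

stacked = bootstrap_predictions(
    toy_fit_predict, X_train, y_train, X_test, n_estimators=20
)
mean = stacked.mean(axis=0)                      # point estimate
lower_ci = np.percentile(stacked, 20, axis=0)    # lower bound
upper_ci = np.percentile(stacked, 80, axis=0)    # upper bound
```

The spread between `lower_ci` and `upper_ci` here reflects only the variation across bootstrap samples, which is the same idea behind the percentile intervals described above.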


2 participants