Ensemble model only using 2 models to ensemble #478

alitirmizi23 · 2021-10-20T12:27:26Z

When I look at the README.md file in the Ensemble folder, it lists only 2 of the many trained models as part of the ensemble. Is there a reason for this? Also, the README.md in Ensemble_stacked lists just 1 model, "Ensemble", as the one used for the stacked ensemble.

alitirmizi23 · 2021-10-20T12:31:20Z

Here is the screenshot:

alitirmizi23 · 2021-10-20T12:32:08Z

Shouldn't it ensemble all trained models?

pplonski · 2021-10-20T14:13:44Z

The Ensemble tries all available models. Models that don't improve performance are not added to the final ensemble. It uses this algorithm.

If only one model is selected for the ensemble, the ensemble is not selected as the best model.

Having two models in the ensemble is fine. In your example, 75% of the prediction comes from 12_CatBoost and 25% from 4_Default_CatBoost.
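The link to the algorithm did not survive in this thread. Assuming it is a greedy forward selection with replacement in the style of Caruana et al. (each round adds whichever model most improves the averaged prediction, and the same model may be added more than once), here is a minimal sketch of how weights such as 75%/25% can arise. The function name and the squared-error loss are illustrative choices, not mljar-supervised's actual code.

```python
import numpy as np


def greedy_ensemble_selection(preds, y, max_rounds=20):
    """Greedy forward selection with replacement (illustrative sketch).

    preds: dict mapping model name -> np.ndarray of validation predictions
    y: np.ndarray of true targets
    Returns a dict mapping model name -> weight (fraction of selections).
    """

    def loss(p):
        # Mean squared error, lower is better (an assumed metric for illustration)
        return float(np.mean((p - y) ** 2))

    selected = []  # models picked so far; repeats are allowed
    running_sum = np.zeros_like(y, dtype=float)
    best_loss = np.inf
    for _ in range(max_rounds):
        # Try adding each model and keep the one giving the lowest loss
        candidate, candidate_loss = None, best_loss
        for name, p in preds.items():
            trial = (running_sum + p) / (len(selected) + 1)
            trial_loss = loss(trial)
            if trial_loss < candidate_loss:
                candidate, candidate_loss = name, trial_loss
        if candidate is None:  # no model improves the ensemble any more: stop
            break
        selected.append(candidate)
        running_sum += preds[candidate]
        best_loss = candidate_loss
    # Weight = how often each model was picked, in order of first selection
    return {name: selected.count(name) / len(selected) for name in dict.fromkeys(selected)}
```

If the selection ends up as, say, 12_CatBoost picked three times and 4_Default_CatBoost once, the weights are 0.75 and 0.25. If a single model is never improved upon, the result is a one-model "ensemble", which is the case where the ensemble is not chosen as the best model.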

alitirmizi23 · 2021-10-20T14:39:51Z

Okay, that helps. Thanks. Could you also tell me which stacking (Ensemble_stacked) algorithm is used? I also see only 1 model (Ensemble) in the stacking README.md.

pplonski · 2021-10-20T15:49:58Z

For stacking, the 5 best models from each algorithm are used (except baseline, linear model, and decision tree). The predictions from these models, together with the original input data, become the new input for the stacked models.
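A rough sketch of how such a stacked input could be assembled, assuming each trained model exposes out-of-fold predictions and a score where higher is better. The dict keys, the spelling of the excluded algorithm names, and build_stacked_input are hypothetical, not the library's API.

```python
from collections import defaultdict

import numpy as np

# Algorithms left out of stacking, per the comment above (names assumed)
EXCLUDED = {"Baseline", "Linear", "Decision Tree"}


def build_stacked_input(X, models, top_k=5):
    """Append the predictions of the top_k models per algorithm to X.

    models: list of dicts with keys 'algorithm', 'score' (higher is better)
            and 'oof_pred' (1-D array of out-of-fold predictions, len == len(X)).
    """
    by_algo = defaultdict(list)
    for m in models:
        if m["algorithm"] not in EXCLUDED:
            by_algo[m["algorithm"]].append(m)

    columns = []
    for algo in sorted(by_algo):  # deterministic column order
        best = sorted(by_algo[algo], key=lambda m: m["score"], reverse=True)[:top_k]
        columns.extend(m["oof_pred"] for m in best)

    if not columns:
        return X
    # Original features first, then one column per selected model's predictions
    return np.hstack([X, np.column_stack(columns)])
```

The stacked models are then trained on this wider matrix, so they can learn both from the raw features and from where the first-level models agree or disagree.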

If the ensemble has only one model in it, it means it couldn't build an ensemble of at least 2 models with better performance.

alitirmizi23 · 2021-10-20T15:52:28Z

Thanks @pplonski for your prompt feedback. Closing the issue.

alitirmizi23 · 2021-10-20T20:50:25Z

Hi @pplonski, just one more thing: the model reported as best at the end of training isn't actually the best. As you can see below, the best model is actually a stacked CatBoost one with an F1 score of 0.73, but the output says it's the "Ensemble" model, which has an F1 score of 0.65.

alitirmizi23 · 2021-10-20T21:17:35Z

Also, the predict() function seems to output class 1 even when the probability in "prediction_1" is below 0.5 (see the screenshot below).

Is AutoML calibrating the classifier too? I didn't see this mentioned anywhere in the documentation. All I did was give sample_weights to class 1, since it's the minority class in my use case.

pplonski · 2021-10-21T06:48:23Z

In your example, the model with the highest F1 score is not selected as the best because you set a limit on the maximum prediction time for a single sample (maybe you are using Perform mode?). When a prediction-time limit is set, the selected model is the one with the highest performance among those whose prediction time is below the limit.
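The selection rule described above amounts to a filter followed by an argmax. In this sketch the dict keys and select_best_model are hypothetical; the keyword max_single_prediction_time mirrors what I believe is the name of the corresponding AutoML argument in mljar-supervised, but treat that as an assumption. The prediction times in the usage example are made up.

```python
def select_best_model(models, max_single_prediction_time=None):
    """Pick the highest-scoring model, optionally under a per-sample time limit.

    models: list of dicts with keys 'name', 'score' (higher is better, e.g. F1)
            and 'single_prediction_time' (seconds per sample).
    Returns the chosen model's name, or None if no model meets the limit.
    """
    if max_single_prediction_time is None:
        eligible = models
    else:
        # Keep only models whose prediction time is strictly below the limit
        eligible = [m for m in models if m["single_prediction_time"] < max_single_prediction_time]
    if not eligible:
        return None
    return max(eligible, key=lambda m: m["score"])["name"]


# Usage with the F1 scores from this thread and hypothetical timings
models = [
    {"name": "Ensemble", "score": 0.65, "single_prediction_time": 0.3},
    {"name": "stacked_CatBoost", "score": 0.73, "single_prediction_time": 0.9},
]
print(select_best_model(models))  # no limit: stacked_CatBoost
print(select_best_model(models, max_single_prediction_time=0.5))  # with limit: Ensemble
```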

Labels are computed using a threshold, which doesn't need to be 0.5. Please check your model's README.md for the threshold value (the threshold that maximizes accuracy).
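A minimal sketch of threshold-based labelling, assuming the threshold is chosen by scanning candidate values on validation predictions and keeping the one with the highest accuracy, and that a probability equal to the threshold maps to class 1. Both points are assumptions, and the function names are hypothetical.

```python
import numpy as np


def best_accuracy_threshold(y_true, proba, candidates=None):
    """Return (threshold, accuracy) maximizing accuracy on validation data."""
    if candidates is None:
        candidates = np.unique(proba)  # every distinct predicted probability
    best_t, best_acc = 0.5, -1.0
    for t in candidates:
        acc = np.mean((proba >= t).astype(int) == y_true)
        if acc > best_acc:  # keep the first threshold reaching the best accuracy
            best_t, best_acc = float(t), float(acc)
    return best_t, best_acc


def predict_labels(proba, threshold):
    """Map class-1 probabilities to 0/1 labels using the given threshold."""
    return (proba >= threshold).astype(int)
```

If the chosen threshold is, for example, 0.3, a sample with a "prediction_1" of 0.35 is labelled as class 1 even though it is below 0.5, which matches the behaviour described in the question.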

alitirmizi23 · 2021-10-21T08:34:31Z

Okay, thank you. That makes sense.
Could you also tell me why CatBoost models do not have SHAP plots (importance and dependence plots), whereas all the others (XGBoost, LightGBM, etc.) have them? I understand these may be separate topics, but I'd appreciate your feedback.

pplonski · 2021-10-21T08:43:18Z

I had problems running SHAP with CatBoost (it either took too long to compute or threw errors; I don't remember which now), so I had to disable it.

alitirmizi23 closed this as completed Oct 20, 2021

alitirmizi23 reopened this Oct 20, 2021

alitirmizi23 closed this as completed Oct 21, 2021
