
What is the feature importance returned by 'gain' ? #1842

Closed
AlexandraBomane opened this issue Nov 13, 2018 · 2 comments

Comments

@AlexandraBomane

Is the output of LGBMClassifier().booster_.feature_importance(importance_type='gain') equivalent to the Gini importances used by Scikit-Learn's RandomForestClassifier (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)?

@AlexandraBomane AlexandraBomane changed the title Feature importance What is the feature importance returned by 'gain' ? Nov 13, 2018
@julioasotodv
Contributor

julioasotodv commented Nov 20, 2018

Well, they are roughly equivalent. The Random Forest implementation in sklearn is based on Breiman's paper (https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf); therefore, the splitting criterion used in the model is Gini impurity, and the information gain of each split is measured in terms of that criterion.
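As a concrete illustration of what a "gain" is in this sense: each split's reduction in the tree's loss is credited to the feature it splits on, and the per-feature totals are summed over all splits and all trees. A minimal stdlib-only sketch, using Gini impurity as the loss (as sklearn does) and a hypothetical toy node:

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def split_gain(parent, left, right):
    """Weighted impurity decrease achieved by splitting parent into left/right.

    This is the quantity credited to the split feature; a 'gain' importance
    is the sum of these decreases over every split that uses the feature.
    """
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

# Toy node: 8 samples, a split that partially separates the two classes.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [0, 1, 1, 1]

print(split_gain(parent, left, right))  # → 0.125
```

LightGBM accumulates the same kind of per-split improvement, just measured against its own objective rather than Gini impurity.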

On LGBM, however, you define the loss to minimize directly. With objective='binary', for instance, it is logloss (cross-entropy), not Gini impurity. If you define a different objective in the model configuration, the loss to minimize will be a different one (the available objectives are listed at https://lightgbm.readthedocs.io/en/latest/Parameters.html#objective).

However, even on the same data, feature importance estimates from RandomForestClassifier and LGBM can differ, even if both models used the exact same loss (whether Gini impurity or anything else). Don't forget that these estimates reflect what the model "thinks" about the features and the dataset; unless your model is a perfect predictor, they will not always be right.

@StrikerRUS
Collaborator

@lock lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020