What is the feature importance returned by 'gain' ? #1842
Comments
Well, they are roughly equivalent. The Random Forest implementation in sklearn is based on Breiman's paper (https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf); therefore, the loss (objective function) used in the model is the gini impurity, and the information gain is measured in terms of that loss. In LightGBM, however, you define the loss to minimize directly; it is not gini impurity but, for instance, logloss (cross-entropy) for classification tasks. Even on the same data, though, feature importance estimates from RandomForestClassifier and LightGBM can differ, even if both models were to use the exact same loss (whether gini impurity or anything else). Don't forget that these estimates are what the model 'thinks' about the features and the dataset; unless your model is a perfect predictor, they will not always be right.
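To make the point above concrete, one can fit two models on the same synthetic data and compare their normalized importances. This is an illustrative sketch, not code from the thread: sklearn's GradientBoostingClassifier (which minimizes logloss) stands in for a gain-based booster such as LightGBM, since the argument applies to any loss-driven importance.

```python
# Sketch: gini-based vs. logloss-gain-based feature importances on the same data.
# GradientBoostingClassifier is a stand-in for a gain-based booster like LightGBM.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Synthetic classification data (hypothetical sizes, chosen only for the demo).
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
gb = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)

# Both expose normalized importances that sum to 1, but the values (and
# sometimes the rankings) differ because the split criteria differ.
for name, imp in [("RF (gini)", rf.feature_importances_),
                  ("GB (logloss gain)", gb.feature_importances_)]:
    print(name, [round(v, 3) for v in imp])
```

Running this typically shows broadly similar but not identical importance vectors, which is exactly the "roughly equivalent" claim above.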
Thank you @julioasotodv for your answer! Another article worth reading: https://medium.com/the-artificial-impostor/feature-importance-measures-for-tree-models-part-i-47f187c1a2c3
Is the output of
LGBMClassifier().booster_.feature_importance(importance_type='gain')
equivalent to the gini importances used by RandomForestClassifier in Scikit-Learn (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)?
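For reference, the two APIs being asked about can be called side by side. A minimal sketch (the lightgbm import is guarded, since the package may not be installed; dataset sizes are arbitrary):

```python
# Sketch comparing sklearn's normalized gini importances with LightGBM's
# raw gain importances; lightgbm is optional and the import is guarded.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print("sklearn gini importances:", rf.feature_importances_)  # normalized, sums to 1

try:
    from lightgbm import LGBMClassifier
    lgbm = LGBMClassifier(n_estimators=50, random_state=0).fit(X, y)
    # LightGBM reports raw total gain per feature, not normalized fractions.
    print("LGBM gain importances:",
          lgbm.booster_.feature_importance(importance_type="gain"))
except ImportError:
    print("lightgbm not installed; skipping the gain-importance call")
```

Note one practical difference: sklearn's `feature_importances_` are normalized to sum to 1, while LightGBM's `feature_importance(importance_type='gain')` returns raw gain totals, so normalize both before comparing them directly.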