New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX feature importances in random forests should sum up to 1 #13636
FIX feature importances in random forests should sum up to 1 #13636
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think we want to have a single entry in the what's new for all the ensemble
models, so I would suggest to update this PR or #13620, whichever gets merged last
n_classes=3) | ||
clf = RandomForestClassifier(min_samples_leaf=5, random_state=42, | ||
n_estimators=200).fit(X, y) | ||
assert math.isclose(1, clf.feature_importances_.sum(), abs_tol=1e-7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, why not np.is_close()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy's isclose
is older than python's math.isclose
and has some issues compared to the python's. Here's a nice conversation on the topic: numpy/numpy#10161. Since now we support python>=3.5, seems like a good choice to me to use math.isclose
if comparing two floats is what we want to do.
sure, could do. But there's a slight difference that on GB{C/R} models the importances already do sum up to 1, which is not the case for forests. |
doc/whats_new/_contributors.rst
Outdated
@@ -169,3 +169,5 @@ | |||
.. _Roman Yurchak: https://github.com/rth | |||
|
|||
.. _Hanmin Qin: https://github.com/qinhanmin2014 | |||
|
|||
.. _Adrin Jalali: https://github.com/adrinjalali |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a separate PR adding our names to this list. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, yeah, I've added mine in a few PRs, it'll be there once one of them gets merged lol
Thanks Adrin |
…-learn#13636)" This reverts commit ae20ae6.
…-learn#13636)" This reverts commit ae20ae6.
Same as the discussion in #7406 and #13620, the feature importances of random forest based mdoels should sum up to 1, and root only trees should be ignored.