[MRG] Support for Out-of-Bag cross validation: Add IdentitySplitter() and oob_score() #23391
@glemaitre, @ogrisel, @thomasjpfan (I'm following this list: https://github.com/scikit-learn/scikit-learn/wiki/Available-reviewers)
Thank you for the PR.
Before going into a more technical review, we should discuss the feature's inclusion in the original issue: #23382 (comment)
Okay, I've now added an example of this feature. It doesn't rely on the new classes I added, in case it is decided that these classes are not wanted in the repo. If this is decided I will remove them from the PR and keep only the example.
Here's the rendered version of the example: https://output.circle-artifacts.com/output/job/3166928d-f107-471f-80c5-ea49d5d16d98/artifacts/0/doc/auto_examples/ensemble/plot_oob_cross_validation.html
As mentioned here: #23382 (comment), I would be more in favor of repurposing the already existing example and making it narrative instead of adding a new example. One reason is that we already have too many examples, and we should avoid adding a new one when we can improve an existing (and older) one.
Considering the reduced scope of what you would accept, I don't have any particular interest in finishing this PR, but I will keep the issue #23382 open because I believe it's still an unresolved need in sklearn.
Reference Issues/PRs
Fixes #23382
What does this implement/fix? Explain your changes.
IdentitySplitter(), which is a dummy cross-validation splitter that just returns the training data. This can be used to implement cross validation for OOB.
oob_score(), a scorer function that delegates to the RandomForest oob score.
Any other comments?
The intended usage of these new objects together is as follows:
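A minimal sketch of how the two pieces could fit together with GridSearchCV. The bodies of IdentitySplitter and oob_score below are illustrative assumptions based only on the description above, not the code from this PR; the dataset and parameter grid are likewise made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


class IdentitySplitter:
    """Assumed implementation: one split whose train and test sets are all rows."""

    def split(self, X, y=None, groups=None):
        indices = np.arange(len(X))
        # The "test" fold is the training data itself; the scorer ignores it.
        yield indices, indices

    def get_n_splits(self, X=None, y=None, groups=None):
        return 1


def oob_score(estimator, X, y):
    """Assumed implementation: ignore X and y, return the fitted OOB score."""
    return estimator.oob_score_


# Toy data and grid, chosen only to keep the example small.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
param_grid = {"max_features": [1, 2, 4]}

search = GridSearchCV(
    # oob_score=True makes the forest compute oob_score_ during fit.
    RandomForestClassifier(n_estimators=50, oob_score=True, random_state=0),
    param_grid,
    cv=IdentitySplitter(),
    scoring=oob_score,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Each candidate is fitted once on the full dataset, and the reported score is the forest's own out-of-bag estimate instead of a held-out fold score.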