[MRG] Support for Out-of-Bag cross validation: Add IdentitySplitter() and oob_score() #23391

Closed
wants to merge 21 commits into from

@multimeric commented May 17, 2022

Reference Issues/PRs

Fixes #23382

What does this implement/fix? Explain your changes.

  • Add IdentitySplitter(), a dummy cross-validation splitter that returns the full dataset as a single split (the same indices for training and testing). Combined with an OOB scorer, this lets cross-validation tools such as GridSearchCV select hyperparameters using out-of-bag estimates.
  • Add oob_score(), a scorer function that delegates to the fitted RandomForest's OOB score (its oob_score_ attribute).
  • Add general tests for both, and for the integration of the two together in a CV context.
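A minimal sketch of how the two proposed objects could look, assuming a splitter only needs split() and get_n_splits() to be accepted by GridSearchCV, and a scorer only needs the (estimator, X, y) callable signature. These bodies are hypothetical illustrations, not the code in this PR.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


class IdentitySplitter:
    """Hypothetical splitter: one split whose train and test sets are the full dataset."""

    def split(self, X, y=None, groups=None):
        # Train on every sample and "test" on every sample; the OOB scorer
        # below never looks at the test data anyway.
        indices = np.arange(len(X))
        yield indices, indices

    def get_n_splits(self, X=None, y=None, groups=None):
        # GridSearchCV asks for the number of splits up front.
        return 1


def oob_score(estimator, X, y):
    """Hypothetical scorer: ignore X and y and return the fitted forest's OOB score."""
    return estimator.oob_score_


# Toy data so the sketch runs on its own.
X, y = make_classification(n_samples=200, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(oob_score=True, random_state=0),
    {"n_estimators": [20, 50]},
    cv=IdentitySplitter(),
    scoring=oob_score,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Because the scorer never reads the held-out rows, the splitter only has to return valid indices; each candidate is effectively scored on the samples each tree left out of its bootstrap sample.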

Any other comments?

The intended usage of these new objects together is as follows:

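from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import oob_score  # proposed in this PR
from sklearn.model_selection import GridSearchCV, IdentitySplitter  # IdentitySplitter proposed in this PR

# Toy data so the example is self-contained
X, y = make_classification(n_samples=200, random_state=0)

cv = GridSearchCV(
    RandomForestClassifier(oob_score=True, random_state=0),
    {"n_estimators": [1, 20, 100]},
    cv=IdentitySplitter(),  # one split: fit on all samples
    scoring=oob_score,  # score each candidate by the forest's own OOB estimate
)
results = cv.fit(X, y)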

@multimeric changed the title from "Support for Out-of-Bag cross validation: Add IdentitySplitter() and oob_score()" to "[ENH] Support for Out-of-Bag cross validation: Add IdentitySplitter() and oob_score()" May 17, 2022
@multimeric changed the title from "[ENH] Support for Out-of-Bag cross validation: Add IdentitySplitter() and oob_score()" to "ENH Support for Out-of-Bag cross validation: Add IdentitySplitter() and oob_score()" May 17, 2022
@multimeric changed the title from "ENH Support for Out-of-Bag cross validation: Add IdentitySplitter() and oob_score()" to "[MRG] Support for Out-of-Bag cross validation: Add IdentitySplitter() and oob_score()" Jun 5, 2022
@multimeric (Author)

@thomasjpfan (Member) left a comment

Thank you for the PR.

Before going into a more technical review, let's discuss the feature's inclusion in the original issue: #23382 (comment)

Review thread on sklearn/metrics/_scorer.py (outdated, resolved)
@multimeric (Author)

Okay, I've now added an example of this feature. It doesn't rely on the new classes I added, in case the maintainers decide these classes are not wanted in the repo. If so, I will remove them from the PR and keep only the example.
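This is not the example added in the PR; it is a hedged sketch of one way OOB-based tuning can already be expressed with existing scikit-learn API and no new classes. GridSearchCV accepts an iterable of (train, test) index pairs for cv and a callable with the (estimator, X, y) signature for scoring. The sketch assumes the default bootstrapping of RandomForestClassifier (required for oob_score=True); the helper name oob_scorer and the toy data are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy data so the sketch runs on its own.
X, y = make_classification(n_samples=200, random_state=0)

# A single (train, test) split in which both index arrays cover every sample.
all_indices = np.arange(len(X))
single_split = [(all_indices, all_indices)]


def oob_scorer(estimator, X_test, y_test):
    # Scorer signature is (estimator, X, y); the held-out data is ignored and
    # the out-of-bag accuracy computed during fit is returned instead.
    return estimator.oob_score_


search = GridSearchCV(
    RandomForestClassifier(oob_score=True, random_state=0),
    {"n_estimators": [20, 50]},
    cv=single_split,
    scoring=oob_scorer,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The trade-off versus a dedicated splitter class is mostly readability: the index list is tied to one dataset size, whereas a splitter object adapts to whatever X it receives.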

@multimeric (Author)

@glemaitre (Member)

As mentioned here: #23382 (comment), I would be more in favor of repurposing the already existing example and making it narrative instead of adding a new example. One of the reasons is that we already have too many examples, and we should avoid adding a new one when we can improve an existing (and old) one.

@multimeric (Author) commented Jun 25, 2022

Considering the reduced scope of what you would accept, I don't have any particular interest in finishing this PR, but I will keep the issue #23382 open because I believe it's still an unresolved need in sklearn.

@multimeric multimeric closed this Jun 25, 2022
Development

Successfully merging this pull request may close these issues.

CV integration for OOB-scoring
3 participants