
[WIP] Choose out of bag scoring metric. Fixes #3455 (#3723)

Closed
staple wants to merge 3 commits into scikit-learn:master from staple:3455

Conversation

@staple
Contributor

staple commented Sep 30, 2014

Enhances the oob_score parameter of forest-based predictors (e.g. RandomForestClassifier) to also accept a string naming a scoring method or a callable scorer.

The existing out-of-bag scoring implementations use custom prediction logic so that, for each estimator, only its out-of-bag input rows are predicted. As before, the out-of-bag predictions are saved in documented attributes of the predictor (oob_decision_function_, oob_prediction_). To simplify integration with the existing scorer interface, the out-of-bag predictions are passed to a _DummyPredictor, which in turn makes them available to the scorer.
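For illustration only, a wrapper in the spirit described above might look like the minimal sketch below; the constructor signature and method set are assumptions, not the PR's actual code.

# Minimal sketch of the idea described above; names and signature are
# illustrative assumptions, not the code from this PR.
class _DummyPredictor:
    """Return a precomputed (out-of-bag) prediction, ignoring the input X."""

    def __init__(self, prediction):
        self.prediction = prediction

    def predict(self, X):
        # X is ignored: the OOB predictions were already computed, one value
        # per training row, using only the trees for which that row was OOB.
        return self.prediction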

For backwards compatibility, the oob_score argument is not renamed and continues to accept True / False (default still False). When passed True, the oob_score calculation is unchanged in the single-output case (r2 for regression, accuracy for classification).

In the multi-output case, the score calculations have changed. For regression, the individual r-squared metrics were formerly computed per output field and averaged; now a single r-squared metric is calculated across all outputs using the r2_score implementation. For classification, the individual accuracy metrics were formerly computed per output field and averaged; now a single accuracy metric is calculated, counting an input row as correct only when its complete set of outputs is correct, using the accuracy_score implementation.
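To make the classification change concrete, here is a small NumPy illustration (not code from the PR) of the two aggregation strategies:

import numpy as np

# Hypothetical multi-output classification targets: 3 samples, 2 outputs each.
y_true = np.array([[0, 1], [1, 1], [0, 0]])
y_pred = np.array([[0, 1], [1, 0], [0, 0]])

# Former behaviour: accuracy per output field, then averaged.
per_output = (y_pred == y_true).mean(axis=0)      # [1.0, 0.667]
print(per_output.mean())                          # 0.833...

# New behaviour: a row counts as correct only if all of its outputs match.
print(np.all(y_pred == y_true, axis=1).mean())    # 0.667...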

This is a WIP, some known TODOS:

  • Is the proposed implementation using DummyPredictor reasonable?
  • Are the proposed scoring changes in the multi output case acceptable?
  • Ensure that the accuracy scorer handles combined multi-class + multi-output cases
  • Implement this for BaggingClassifier and BaggingRegressor as well?
  • Tests, examples, etc.

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling 265ffc3 on staple:3455 into 6ea371a on scikit-learn:master.

class _DummyPredictor:
    """Private class returning a precomputed prediction. Used to provide
    out-of-bag training predictions to a scorer.
    """
Member

IMO, the fact that you have to use this hack shows that we have a problem with our scorer API.

@mblondel
Member

You might be interested in my PR #3720, which introduces OOB-aware grid search.

As a matter of separation of concerns, I would be inclined to just deprecate oob_score_. It would be enough to have just oob_prediction_ (which is missing in the classification case but can be added). Computing scores from the OOB predictions would be the role of GridSearchOOB.

@mblondel
Member

In the classification case, the input type differs depending on the scorer: class predictions, decision function, or probabilities. To handle this, a lot of scorer code would leak into the forest code. Hence my separation-of-concerns suggestion.

@staple
Contributor Author

staple commented Sep 30, 2014

@mblondel, thanks for your comments. For the sake of completeness, I expanded the existing WIP implementation to provide prediction probabilities and the decision function to the classification scorer. Thanks for pointing out the additional data required by some classification scorers. The forest estimators now expose all the oob prediction data externally as well, via attributes. The implementation here is still using a ‘DummyPredictor’ to integrate with the scoring interface, which currently works with Predictors rather than precomputed values.

Since this is my first exposure to the scikit-learn codebase, I don’t really feel qualified to weigh in on the question of computing oob_score_ in a forest estimator vs exposing oob_prediction_ along with decision function and probabilities so that scoring can be computed externally. Is the oob score typically going to be used by an external component such as your GridSearchOOB, or do you think human users are interested in accessing this score easily as well? Would you be willing to discuss this design decision with @arjoly in the original ticket? (#3455)

@coveralls

Coverage Status

Coverage decreased (-0.01%) when pulling e179c76 on staple:3455 into 6ea371a on scikit-learn:master.

@mblondel
Member

mblondel commented Oct 1, 2014

@arjoly @glouppe What's your opinion on this?

@mblondel
Member

mblondel commented Oct 1, 2014

@staple I don't think it is difficult to do r2_score(y, clf.oob_prediction_) in user-land code.
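For context, the user-land computation referred to here would look roughly like the sketch below (oob_prediction_ is only populated when oob_score=True):

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=200, n_features=10, random_state=0)
reg = RandomForestRegressor(n_estimators=100, oob_score=True,
                            random_state=0).fit(X, y)
# oob_prediction_ holds the out-of-bag prediction for each training sample,
# so any regression metric can be applied to it outside the estimator.
print(r2_score(y, reg.oob_prediction_))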

@staple
Contributor Author

staple commented Oct 1, 2014

@mblondel, Sure, understood about r2_score. I would guess that the metrics currently implemented using _ThresholdScorer (such as roc_auc) would be slightly more work for the user to figure out, though definitely still doable.
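For a binary classifier, that extra work would presumably look something like the sketch below, relying on oob_decision_function_ holding out-of-bag class-probability estimates (a sketch, not code from this PR):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=300, random_state=0)
clf = RandomForestClassifier(n_estimators=100, oob_score=True,
                             random_state=0).fit(X, y)
# oob_decision_function_ has shape (n_samples, n_classes); feed the
# positive-class column to the threshold-based metric by hand.
print(roc_auc_score(y, clf.oob_decision_function_[:, 1]))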

@glouppe
Contributor

glouppe commented Oct 4, 2014

I like the idea of removing the scorer-specific code in forests and relying instead on our scorer API.

To make the user experience a bit easier though, we could add an OOB equivalent of cross_val_score: something like oob_score(estimator, X, y, scoring=...), provided the estimator exposes clf.oob_predictions_ or something like that. This would replicate the functionality of oob_score_ in a more flexible and integrated way.
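A rough sketch of the shape such a helper could take (the name, signature, and reliance on a plain metric callable are assumptions; it also sidesteps the scorer-API question discussed below):

from sklearn.metrics import r2_score

def oob_score(estimator, X, y, scoring=r2_score):
    """Hypothetical helper: fit an estimator and score its OOB predictions.

    Assumes the estimator is configured to compute out-of-bag predictions
    (e.g. oob_score=True) and exposes them as ``oob_prediction_``.
    """
    estimator.fit(X, y)
    return scoring(y, estimator.oob_prediction_)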

@mblondel
Member

mblondel commented Oct 4, 2014

+1 to @glouppe 's proposal.

One problem is that our current scorer API doesn't let us use pre-computed predictions (it computes the predictions itself from the estimator and X). I suggest we add a new function get_score along these lines:

def get_score(scoring, y_true, y_pred=None, y_decision=None, y_proba=None):
    if scoring == "roc_auc":
        if y_decision is None and y_proba is None:
            raise ValueError("roc_auc needs either y_decision or y_proba.")
        if y_decision is not None:
            return roc_auc_score(y_true, y_decision)
        else:
            return roc_auc_score(y_true, y_proba)
[...]

The advantage of this approach is that it raises a clear error message when the wrong type of prediction is provided.

@staple
Contributor Author

staple commented Oct 7, 2014

Hi, thanks for the suggestions. I removed oob_score from forest, added a standalone oob_score function and also get_score, and refactored the scoring code a bit to facilitate get_score. This is still at the sketch / proof-of-concept stage. Please let me know what you think.

@staple
Contributor Author

staple commented Oct 7, 2014

@mblondel The get_score I added tries to reuse the existing scoring infrastructure. If this approach looks promising, I can try to add more specific error messages per your suggestion. But let me know what you think about the overall approach, thanks.

Out of bag score of the estimator.
"""

estimator.oob_predict = True
Contributor

Use the set_params API.
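i.e., presumably something along these lines (oob_predict being the parameter this WIP branch introduces):

# Use the standard estimator parameter API instead of direct attribute assignment.
estimator.set_params(oob_predict=True)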

@mblondel
Member

mblondel commented Oct 7, 2014

I need more time to think about this design issue. One thing to keep in mind is that getting scores from pre-computed predictions is also useful for multiple-metric grid search (see #2759).

Perhaps @jnothman, @ogrisel, @larsmans and @amueller would like to pitch in.

@staple
Contributor Author

staple commented Oct 9, 2014

@mblondel

Would it make sense for me to create a separate PR specifically for looking at scorer api changes, to facilitate discussion of these changes?

To summarize the situation: the current _BaseScorer interface accepts a predictor and some input data points X. The various _BaseScorer implementations compute a score by calling predict(X), predict_proba(X), or decision_function(X) on the predictor as needed, and pass the results (with possible transformations) to a score function.
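Schematically, that call pattern looks like the following simplified sketch (illustrative only, not the actual _BaseScorer source):

from sklearn.metrics import r2_score

# Simplified illustration of the current scorer call pattern described above.
class _PredictScorerSketch:
    def __init__(self, score_func, sign=1, **kwargs):
        self._score_func = score_func
        self._sign = sign
        self._kwargs = kwargs

    def __call__(self, estimator, X, y_true):
        # The scorer computes the predictions itself from the estimator and X...
        y_pred = estimator.predict(X)
        # ...and only then applies the underlying metric.
        return self._sign * self._score_func(y_true, y_pred, **self._kwargs)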

In the development of out of bag scoring and multiple-metric grid search we’ve seen that predictions may be made via special procedures that do not strictly adhere to the predictor interface. We would like to be able to score these externally computed predictions.

As I see it, there are 5 options for implementing scoring of externally computed predictions:

  1. Add a new implementation that's completely separate from the existing _BaseScorer class.
  2. Use the existing _BaseScorer without changes. This means abusing the predictor interface by creating a DummyPredictor (or similar) that ignores the X argument and simply returns the externally computed predictions.
  3. Add a private API to _BaseScorer for scoring externally computed predictions. The private API can be called by a public helper function in scorer.py.
  4. Change the public API of _BaseScorer to make scoring of externally computed predictions a first-class operation alongside the existing functionality. Possibly also rename _BaseScorer => BaseScorer.
  5. Change the public API of _BaseScorer so that it only handles externally computed predictions. The existing functionality would be implemented by the caller (as a callback, since the required type of prediction data is not known in advance).

Let me know if I missed any options :) In the most recent patch I went with option 3. To my mind, it makes sense to first decide which of approaches 1-5 is most appropriate, and proceed with the design from there.
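For concreteness, option 3 could take roughly this shape; the class, method, and helper names below are placeholders, not the names used in the patch:

from sklearn.metrics import r2_score

class _BaseScorerSketch(object):
    # Placeholder standing in for the real _BaseScorer; names are hypothetical.
    def __init__(self, score_func, sign=1, **kwargs):
        self._score_func = score_func
        self._sign = sign
        self._kwargs = kwargs

    def _score_predictions(self, y_true, y_pred):
        # Private entry point: score externally computed predictions,
        # bypassing the estimator entirely.
        return self._sign * self._score_func(y_true, y_pred, **self._kwargs)

# Hypothetical public helper (the kind of function that would live in scorer.py).
def score_predictions(scorer, y_true, y_pred):
    return scorer._score_predictions(y_true, y_pred)

# Example: scoring precomputed (e.g. out-of-bag) predictions directly.
print(score_predictions(_BaseScorerSketch(r2_score), [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))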

@staple
Contributor Author

staple commented Oct 15, 2014

It seems like the scorer design question will need to be answered in order to implement this oob scoring feature. Is there anything else I can do to help make progress on this issue, per my previous comment?

@mblondel
Member

@staple It seems to me that option 4) would be the best solution if we want to add more flexibility while remaining backward compatible.

Perhaps bringing this discussion to the mailing-list will give it more attention. In particular, I would like to have @larsmans and @amueller's opinion, as they are the original designers of the scorer API.

@eickenberg
Contributor

I can't comment on this topic as a whole, because it is too vast and central to this project and I don't have the full overview.

What I can point out is:

In linear_model we have the analytic LOO prediction in ridge regression as another example where predictions come from "elsewhere" (in this case through a shortcut via an analytic formula). It had seemed like an exception to me, but I guess it is now another instance of the same issue.

As can be seen in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L792, it is solved via option 2), which makes option 2) look like quite an ugly hack ...

As for option 5), apart from the proposed callback structure, this is almost a reversion to the old score_func-style API. The questions would be whether adding the callback structure solves the issues the old API had, whether it really is the simpler solution, and whether such a "change of mind" would be perceived as indecisive "zig-zagging".

@jnothman
Member

I agree with #3723 (comment) that this comes down to a separate issue about scoring API with precalculated predictions. I'm not sure what the correct solution is, but I personally don't see much point in this PR and the issue it is trying to solve (#3723), and would consider their closure.

@amueller added the API label on Aug 5, 2019
@haiatn
Contributor

haiatn commented Jul 29, 2023

> I agree with #3723 (comment) that this comes down to a separate issue about scoring API with precalculated predictions. I'm not sure what the correct solution is, but I personally don't see much point in this PR and the issue it is trying to solve (#3723), and would consider their closure.

I agree

@adrinjalali
Member

Those parts of the code have changed a lot; a fresh look might be a better approach than trying to continue this one.
