
Gradient Boosting Out-of-bag Estimates #1802

Closed
pprett opened this Issue · 1 comment

1 participant

Peter Prettenhofer (pprett), Owner

This issue was brought up on the mailing list by Yanir.

oob_score_[i] includes contributions from previous trees that were trained on the instances that are out-of-bag at stage i.

Currently, y_pred is updated for every sample (in-bag and out-of-bag) at each iteration. This way the OOB estimates are overly optimistic. When using OOB, we might use a separate prediction vector (say y_oob_pred) that only accumulates the contributions of those trees for which the i-th sample was out-of-bag.
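A minimal sketch of the idea, not scikit-learn's actual implementation: a toy boosting loop that keeps two running predictions, `y_pred` (updated by every tree, the overly optimistic view) and the hypothetical `y_oob_pred` (a sample only receives updates from trees it was out-of-bag for). All names and the setup (squared-loss boosting on random data with `subsample=0.5`) are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))
y = X[:, 0] + 0.1 * rng.normal(size=200)

n_samples = X.shape[0]
n_estimators = 20
learning_rate = 0.1
subsample = 0.5

# Two running predictions, both starting from the mean (squared-loss init):
# y_pred is updated by every tree; y_oob_pred only by trees for which the
# sample was out-of-bag, as proposed in this issue.
y_pred = np.full(n_samples, y.mean())
y_oob_pred = np.full(n_samples, y.mean())

for i in range(n_estimators):
    in_bag = rng.uniform(size=n_samples) < subsample
    oob = ~in_bag

    # Fit the next tree on the residuals of the in-bag samples only.
    residual = y - y_pred
    tree = DecisionTreeRegressor(max_depth=3, random_state=i)
    tree.fit(X[in_bag], residual[in_bag])

    y_pred += learning_rate * tree.predict(X)                  # all samples
    y_oob_pred[oob] += learning_rate * tree.predict(X[oob])    # OOB only

mse_optimistic = np.mean((y - y_pred) ** 2)
mse_oob = np.mean((y - y_oob_pred) ** 2)
```

Because each sample's `y_oob_pred` only ever sees trees that were fit without it, `mse_oob` is typically larger (more honest) than the in-bag `mse_optimistic`.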

I've checked gbm; it seems they use the same implementation that we use ATM. I'll check the difference between the OOB scores on some toy examples...

yanirs referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
yanirs referenced this issue from a commit in yanirs/scikit-learn:
"Addressing issue #1802 (Gradient Boosting Out-of-bag Estimates): changed oob_score_ to be calculated based only on trees where the OOB instances weren't used for training" (c28aed7)
pprett closed this

Peter Prettenhofer (pprett), Owner

fixed by #2188
