GBT fails with RF init #2691

agramfort · 2013-12-26T18:15:10Z

here is a tiny script which reproduces the crash.

from sklearn.datasets import load_iris
from sklearn import ensemble
from sklearn.cross_validation import train_test_split

iris = load_iris()
X, y = iris.data, iris.target
X, y = X[y < 2], y[y < 2]  # make it binary

X_train, X_test, y_train, y_test = train_test_split(X, y)

# Fit GBT init with RF
rf = ensemble.RandomForestClassifier()
clf = ensemble.GradientBoostingClassifier(init=rf)

clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print("Accuracy: {:.4f}".format(acc))

It also seems that the init param in GradientBoostingClassifier is
not really tested.

@pprett @glouppe @ogrisel

pprett · 2013-12-26T19:53:20Z

@agramfort thanks - I'm aware of the issue, but I've been hesitant to fix it because handling this properly would noticeably slow down single-instance prediction at test time (it needs an isinstance check or a try-except block).
I'll solve this soon.

agramfort · 2013-12-26T21:36:30Z

ok thanks.

amueller · 2013-12-27T14:47:09Z

Hm I guess we should fix that before a release, right?

pprett · 2013-12-27T14:52:24Z

yep - let's put a milestone on it

agramfort · 2014-01-08T22:07:16Z

now that the big refactoring of GBRT is merged, what's needed here?

pprett · 2014-01-09T09:37:32Z

basically consolidating this check:

if (not hasattr(self.init, 'fit') or not hasattr(self.init, 'predict')):

so that, for classification, it checks for predict_proba instead of predict.
Then, when the init estimator has predict_proba, we need to make sure we use the log-odds for binary classification and the output of predict_proba for multi-class.
The following lines have to be changed to accommodate this:

y_pred = self.init_.predict(X)  # in fit

and:

score = self.init_.predict(X).astype(np.float64)  # in _init_decision_function
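
The conversion described above can be sketched as follows. This is only an illustration, assuming the init estimator exposes predict_proba and that the boosting stages expect a 2-D array of raw scores (one column for binary problems, one column per class otherwise). The helper name init_raw_scores and the clipping constant eps are hypothetical and not part of scikit-learn:

```python
import numpy as np


def init_raw_scores(proba, eps=1e-12):
    """Turn predict_proba output into initial raw scores (hypothetical helper)."""
    proba = np.asarray(proba, dtype=np.float64)
    if proba.ndim != 2 or proba.shape[1] < 2:
        raise ValueError("expected an (n_samples, n_classes) array, n_classes >= 2")
    if proba.shape[1] == 2:
        # Binary case: log-odds of the positive class, clipped to avoid log(0).
        p = np.clip(proba[:, 1], eps, 1.0 - eps)
        return np.log(p / (1.0 - p)).reshape(-1, 1)
    # Multi-class case: use the class probabilities directly, one column per class.
    return proba.copy()


binary = init_raw_scores([[0.5, 0.5], [0.25, 0.75]])
multi = init_raw_scores([[0.2, 0.3, 0.5]])
print(binary.shape, multi.shape)
```

Returning a 2-D array in both cases is a design choice of this sketch: it keeps column indexing such as y_pred[:, k] valid regardless of the number of classes.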

ogrisel · 2014-01-09T09:49:22Z

I think we need a better implementation of astype somewhere under sklearn/utils. The current implementation of numpy.astype always makes a copy of the data even when it already has the right type.

GaelVaroquaux · 2014-01-09T10:18:37Z

> I think we need a better implementation of astype somewhere under sklearn/utils. The current implementation of numpy.astype always makes a copy of the data even when it already has the right type.

check_arrays will do it, I believe.
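
For reference, the copy behavior discussed above can be checked with a few lines. This is a small sketch, assuming NumPy 1.7 or later, where ndarray.astype accepts a copy argument. It shows that astype copies by default, while astype(..., copy=False) and np.asarray return the original array when the dtype already matches:

```python
import numpy as np

a = np.arange(4, dtype=np.float64)

copied = a.astype(np.float64)                    # default copy=True: a new array
same_astype = a.astype(np.float64, copy=False)   # dtype already matches: returns `a`
same_asarray = np.asarray(a, dtype=np.float64)   # also returns `a` unchanged

print(copied is a, same_astype is a, same_asarray is a)

ints = np.arange(4)
converted = np.asarray(ints, dtype=np.float64)   # dtype differs: a conversion is needed
print(converted is ints, converted.dtype)
```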

kaushik94 · 2014-02-27T11:14:02Z

Is this the bug:

Traceback (most recent call last):
  File "x.py", line 15, in <module>
    clf.fit(X_train, y_train)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/gradient_boosting.py", line 1124, in fit
    return super(GradientBoostingClassifier, self).fit(X, y, monitor)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/gradient_boosting.py", line 782, in fit
    begin_at_stage, monitor)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/gradient_boosting.py", line 833, in _fit_stages
    criterion, splitter, random_state)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/gradient_boosting.py", line 575, in _fit_stage
    sample_mask, self.learning_rate, k=k)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/gradient_boosting.py", line 194, in update_terminal_regions
    y_pred[:, k])
IndexError: too many indices

pprett · 2014-02-27T11:18:32Z

@kaushik94 more context please - arguments and dataset characteristics in particular

agramfort · 2014-02-27T11:50:32Z

yes it's the bug I observed. See my gist above
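
The traceback is consistent with y_pred being a 1-D array, as a typical predict() would return, while the boosting code indexes it by column. This is an inference from the traceback and from the lines quoted earlier in the thread, not a confirmed diagnosis. A minimal NumPy-only sketch of the same failure (the variable names are illustrative):

```python
import numpy as np

k = 0
y_pred_1d = np.zeros(5)  # shape (5,), like the output of a typical predict()

try:
    y_pred_1d[:, k]  # column indexing on a 1-D array
except IndexError as exc:
    error = str(exc)  # exact wording depends on the NumPy version
    print("IndexError:", error)

# Reshaping to one column makes the same indexing valid.
y_pred_2d = y_pred_1d.reshape(-1, 1)  # shape (5, 1)
column = y_pred_2d[:, k]
print(column.shape)
```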

abhishekkrthakur · 2014-05-01T13:35:16Z

Is there any workaround available for this?

Regards

Abhishek Thakur

de.linkedin.com/in/abhisvnit/

minnich49 · 2018-02-24T03:03:35Z

@abhishekkrthakur Did you ever find a workaround for this? I saw that a solution here existed but I was wondering if you had come across anything else.

jeremiedbb · 2020-03-02T13:55:19Z

Fixed by #12983. Closing.

ogrisel removed this from the 0.15 milestone Jun 4, 2014

amueller added this to the 0.15.1 milestone Jul 18, 2014

amueller mentioned this issue Jul 18, 2014

GradientBoostingClassifier with a BaseEstimator #2130

Closed

amueller mentioned this issue May 14, 2015

GradientBoostingClassifier np.nan_to_num(np.exp(pred[:, k] IndexError: too many indices for array #4721

Closed

amueller modified the milestones: 0.17, 0.16 May 14, 2015

jmschrei mentioned this issue Sep 7, 2015

[MRG] Support arbitrary init estimators for Gradient Boosting #5221

Closed

amueller modified the milestones: 0.18, 0.17 Sep 20, 2015

amueller modified the milestones: 0.18, 0.19 Sep 22, 2016

amueller modified the milestone: 0.19 Jun 12, 2017

stoddardg mentioned this issue Oct 21, 2018

GradientBoosting fails when using init estimator parameter. #12429

Closed

jeremiedbb mentioned this issue Oct 22, 2018

[MRG] FIX gradient boosting with sklearn estimator as init #12436

Closed

jeremiedbb closed this as completed Mar 2, 2020
