GBT fails with RF init #2691

agramfort opened this Issue · 12 comments

7 participants


Here is a tiny script that reproduces the crash.

from sklearn.datasets import load_iris
from sklearn import ensemble
from sklearn.cross_validation import train_test_split

iris = load_iris()
X, y =,
X, y = X[y < 2], y[y < 2]  # make it binary

X_train, X_test, y_train, y_test = train_test_split(X, y)

# Fit GBT init with RF
rf = ensemble.RandomForestClassifier()
clf = ensemble.GradientBoostingClassifier(init=rf).fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print("Accuracy: {:.4f}".format(acc))

It also seems that the init param of GradientBoostingClassifier is
not really tested.

@pprett @glouppe @ogrisel


@agramfort thanks - I'm aware of the issue, but I was hesitant to fix it because handling this properly would incur quite a test-time performance degradation for single-instance prediction (an isinstance check or a try-except block).
I'll solve this soon.


ok thanks.


Hm I guess we should fix that before a release, right?


jep - let's put a milestone


now that the big refactoring of GBRT is merged, what's needed here?


basically consolidating this check:

if (not hasattr(self.init, 'fit') or not hasattr(self.init, 'predict'))

to check on predict_proba for classification.
Then we need to make sure that if an init estimator has predict_proba we use the log-odds for binary classification and the output of predict_proba for multi-class.
Basically, the following lines have to be changed to accommodate this:

y_pred = self.init_.predict(X)  # in fit


score = self.init_.predict(X).astype(np.float64)  # in _init_decision_function
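A minimal sketch of the mapping described above - this is a hypothetical helper (`init_decision_function` is not the actual sklearn implementation), assuming the init estimator exposes `predict_proba`: use log-odds for binary classification and per-class log-probabilities for multi-class:

```python
import numpy as np

def init_decision_function(init_estimator, X, n_classes):
    """Hypothetical sketch: map an init estimator's predict_proba
    output to the raw scores GBRT expects."""
    proba = init_estimator.predict_proba(X)
    # Clip to avoid log(0) for overconfident init estimators.
    eps = np.finfo(np.float64).eps
    proba = np.clip(proba, eps, 1 - eps)
    if n_classes == 2:
        # Binary: a single column of log-odds.
        return np.log(proba[:, 1] / proba[:, 0])[:, np.newaxis]
    # Multi-class: one column of log-probabilities per class.
    return np.log(proba)
```

The clipping matters in practice: a fully grown RandomForestClassifier often returns probabilities of exactly 0 or 1 on the training set, which would otherwise produce infinite scores.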

I think we need a better implementation of astype somewhere under sklearn/utils. The current implementation of numpy.astype always makes a copy of the data even when it already has the right type.
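For what it's worth, `np.ndarray.astype` accepts `copy=False`, which already skips the copy when the dtype (and layout) match. A utility along the lines proposed could be a thin wrapper - a hypothetical sketch (`safe_astype` is an assumed name, not an existing sklearn helper):

```python
import numpy as np

def safe_astype(arr, dtype):
    # Hypothetical utility: only convert (and therefore copy) when the
    # requested dtype actually differs; otherwise return `arr` itself.
    if arr.dtype == np.dtype(dtype):
        return arr
    return arr.astype(dtype)
```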


Is this the bug:

Traceback (most recent call last):
  File "", line 15, in <module>, y_train)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/", line 1124, in fit
    return super(GradientBoostingClassifier, self).fit(X, y, monitor)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/", line 782, in fit
    begin_at_stage, monitor)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/", line 833, in _fit_stages
    criterion, splitter, random_state)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/", line 575, in _fit_stage
    sample_mask, self.learning_rate, k=k)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/", line 194, in update_terminal_regions
    y_pred[:, k])
IndexError: too many indices

@kaushik94 more context please - arguments and dataset characteristics in particular

@ogrisel ogrisel removed this from the 0.15 milestone
@amueller amueller added this to the 0.15.1 milestone
@amueller amueller modified the milestone: 0.17, 0.16