Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GBT fails with RF init #2691

Closed
agramfort opened this issue Dec 26, 2013 · 14 comments
Closed

GBT fails with RF init #2691

agramfort opened this issue Dec 26, 2013 · 14 comments
Labels

Comments

@agramfort
Copy link
Member

here is a tiny script which reproduces the crash.

from sklearn.datasets import load_iris
from sklearn import ensemble
from sklearn.cross_validation import train_test_split

iris = load_iris()
X, y = iris.data, iris.target
X, y = X[y < 2], y[y < 2]  # make it binary

X_train, X_test, y_train, y_test = train_test_split(X, y)

# Fit GBT init with RF
rf = ensemble.RandomForestClassifier()
clf = ensemble.GradientBoostingClassifier(init=rf)

clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print("Accuracy: {:.4f}".format(acc))

It also seems that the init param in GradientBoostingClassifier is
not really tested.

@pprett @glouppe @ogrisel

@pprett
Copy link
Member

pprett commented Dec 26, 2013

@agramfort thanks - I'm aware of the issue - but I was cautious to get rid of it because handling this properly would incur quite a test-time performance degradation for single instance prediction (checking isinstance or some try-except block).
I'll solve this soon.

@agramfort
Copy link
Member Author

ok thanks.

@amueller
Copy link
Member

Hm I guess we should fix that before a release, right?

@pprett
Copy link
Member

pprett commented Dec 27, 2013

jep - lets put a milestone

@agramfort
Copy link
Member Author

now that the big refactoring of GBRT is merged, what's needed here?

@pprett
Copy link
Member

pprett commented Jan 9, 2014

basically consolidating this check:

if (not hasattr(self.init, 'fit') or not hasattr(self.init, 'predict'))

to check on predict_proba for classification.
Then we need to make sure that if an init estimator has predict_proba we use the log-odds for binary classification and the output of predict_proba for multi-class.
Basically, the following lines have to be changed to accommodate this:

y_pred = self.init_.predict(X)  # in fit

and:

score = self.init_.predict(X).astype(np.float64)  # in _init_decision_function

@ogrisel
Copy link
Member

ogrisel commented Jan 9, 2014

I think we need a better implementation of astype somewhere under sklearn/utils. The current implementation of numpy.astype always makes a copy of the data even when it already has the right type.

@GaelVaroquaux
Copy link
Member

I think we need a better implementation of astype somewhere under sklearn/
utils. The current implementation of numpy.astype always makes a copy of the
data even when it already has the right type.

check_arrays will do it, I believe.

@kaushik94
Copy link
Contributor

Is this the bug:

Traceback (most recent call last):
  File "x.py", line 15, in <module>
    clf.fit(X_train, y_train)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/gradient_boosting.py", line 1124, in fit
    return super(GradientBoostingClassifier, self).fit(X, y, monitor)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/gradient_boosting.py", line 782, in fit
    begin_at_stage, monitor)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/gradient_boosting.py", line 833, in _fit_stages
    criterion, splitter, random_state)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/gradient_boosting.py", line 575, in _fit_stage
    sample_mask, self.learning_rate, k=k)
  File "/home/kaushik/scikit-learn/sklearn/ensemble/gradient_boosting.py", line 194, in update_terminal_regions
    y_pred[:, k])
IndexError: too many indices

@pprett
Copy link
Member

pprett commented Feb 27, 2014

@kaushik94 more context please - arguments and dataset characteristics in particular

@agramfort
Copy link
Member Author

yes it's the bug I observed. See my gist above

@abhishekkrthakur
Copy link
Contributor

Is there any workaround available for this?

On Thu, Feb 27, 2014 at 12:50 PM, Alexandre Gramfort <
notifications@github.com> wrote:

yes it's the bug I observed. See my gist above


Reply to this email directly or view it on GitHubhttps://github.com//issues/2691#issuecomment-36234269
.

Regards

Abhishek Thakur

  • de.linkedin.com/in/abhisvnit/

@minnich49
Copy link

minnich49 commented Feb 24, 2018

@abhishekkrthakur Did you ever find a workaround for this? I saw that a solution here existed but I was wondering if you had come across anything else.

@jeremiedbb
Copy link
Member

Fixed by #12983. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants