AdaBoost ZeroDivisionError #7501

Closed
mfeurer opened this Issue Sep 27, 2016 · 9 comments

Contributor

mfeurer commented Sep 27, 2016

Description

AdaBoostClassifier throws a ZeroDivisionError when calling predict_proba if the classifier has only been fit on samples from a single class.

Steps/Code to Reproduce

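import sklearn.ensemble
import numpy as np
# Ten samples with ten random features each
X = np.random.random((10, 10))
# Every label is 0, so only a single class is present
y = np.zeros((10, ))
ada = sklearn.ensemble.AdaBoostClassifier().fit(X, y)
ada.predict(X)
# Raises ZeroDivisionError (see Actual Results)
ada.predict_proba(X)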

Expected Results

A ValueError when using fit.

Actual Results

In [10]: ada.predict_proba(X)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-10-492c02a5f340> in <module>()
----> 1 ada.predict_proba(X)

/home/feurerm/virtualenvs/2016_epm/lib/python3.4/site-packages/sklearn/ensemble/weight_boosting.py in predict_proba(self, X)
    765 
    766         proba /= self.estimator_weights_.sum()
--> 767         proba = np.exp((1. / (n_classes - 1)) * proba)
    768         normalizer = proba.sum(axis=1)[:, np.newaxis]
    769         normalizer[normalizer == 0.0] = 1.0

ZeroDivisionError: float division by zero

Versions

  • Linux-3.13.0-54-generic-x86_64-with-Ubuntu-14.04-trusty
  • Python 3.4.3 (default, Oct 14 2015, 20:28:29)
    [GCC 4.8.4]
  • NumPy 1.11.1
  • SciPy 0.18.0
  • Scikit-Learn 0.17.1

@amueller amueller added the Bug label Sep 29, 2016

@amueller amueller added this to the 0.19 milestone Sep 29, 2016

Member

amueller commented Sep 29, 2016

Thanks for the report.

soniampub commented Sep 29, 2016

@amueller I would like to work on this issue, but I am new to scikit-learn and to open source contribution. Could you please give me some pointers on how to fix this issue and how I can contribute more to this project? I want to be an active contributor here.

Member

amueller commented Sep 30, 2016

First, check if the error still exists on the current development version.
Then we need to think about a bugfix. If the number of classes is 1, we should probably just return ones as the probabilities.
Then add a test that the behavior is as desired and the error is not raised anymore.
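
A minimal sketch of the suggested fix, assuming the normalization step looks like the lines shown in the traceback above. The helper name normalize_boosted_proba is hypothetical and is not part of scikit-learn; it only illustrates returning ones when a single class was seen during fit, so that the division by (n_classes - 1) is never reached.

```python
import numpy as np


def normalize_boosted_proba(proba, n_classes):
    """Hypothetical helper modelled on the lines in the traceback.

    `proba` is the weighted sum of per-estimator scores, with shape
    (n_samples, n_classes).
    """
    if n_classes == 1:
        # Only one class was seen during fit: every sample belongs to it
        # with probability 1, and 1 / (n_classes - 1) is never evaluated.
        return np.ones((proba.shape[0], 1))

    proba = np.exp((1.0 / (n_classes - 1)) * proba)
    normalizer = proba.sum(axis=1)[:, np.newaxis]
    normalizer[normalizer == 0.0] = 1.0  # avoid dividing by zero
    return proba / normalizer
```

With n_classes == 1 the result is a column of ones with one row per sample; with two or more classes each row is normalized to sum to 1.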

Contributor

mfeurer commented Sep 30, 2016

Do you intend for classifiers to work on datasets with a single class? As stated in the issue, I would actually have liked to see an exception stating that I am trying to fit on a dataset with only a single class. This would prevent downstream errors.
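
For illustration, the kind of early failure requested here could be approximated on the caller's side. This is only a sketch: require_two_classes is a hypothetical helper, not a scikit-learn function, and the error message wording is an assumption.

```python
import numpy as np


def require_two_classes(y):
    """Hypothetical guard: raise before fitting if y holds fewer than two classes."""
    classes = np.unique(y)
    if classes.size < 2:
        raise ValueError(
            "Need samples of at least 2 classes, but the data contains "
            "%d class: %r" % (classes.size, classes.tolist())
        )
    return classes
```

Calling require_two_classes(y) before fit would turn the single-class case from the reproduction above into a ValueError at the start, rather than a ZeroDivisionError later in predict_proba.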

Member

amueller commented Sep 30, 2016

IIRC, the behavior is inconsistent. Some classifiers work on a single class, others don't. It's a bit unfortunate. But I don't want to break working behavior for consistency's sake. There is a test that during fit, it either works or a sensible error is thrown:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/estimator_checks.py#L917

It seems we should add decision_function and predict_proba to this test.

For some models, like tree-based ones or knn, fitting to a single class actually has a legitimate result; for others, like linear models, it does not. We could start deprecating the support for single classes, but a single-class training set is something that could happen within cross-validation. Arguably that will not give you a great result, but that's not really a reason to prohibit the user from doing it.
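
A rough sketch of what extending such a check to predict_proba and decision_function might look like. This is not the code in estimator_checks.py: check_single_class and ConstantClassifier are hypothetical names, and the toy classifier only stands in for a real estimator so the example is self-contained.

```python
import numpy as np


def check_single_class(estimator, X, y):
    """Hypothetical check: fit on one class, then probe each prediction method."""
    try:
        estimator.fit(X, y)
    except ValueError:
        # A clear error at fit time is an acceptable outcome.
        return "raised"

    # If fit succeeded, predictions must be the single class seen in training.
    assert np.array_equal(estimator.predict(X), y)

    # The optional methods must not fail and must return finite values.
    for name in ("predict_proba", "decision_function"):
        method = getattr(estimator, name, None)
        if method is not None:
            out = np.asarray(method(X))
            assert out.shape[0] == X.shape[0]
            assert np.all(np.isfinite(out))
    return "ok"


class ConstantClassifier:
    """Toy stand-in (not a scikit-learn class): always predicts the most frequent label."""

    def fit(self, X, y):
        self.classes_, counts = np.unique(y, return_counts=True)
        self.majority_ = self.classes_[np.argmax(counts)]
        self.proba_ = counts / counts.sum()
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)

    def predict_proba(self, X):
        # Same class frequencies for every sample.
        return np.tile(self.proba_, (len(X), 1))
```

The check accepts either outcome described above: an estimator that fits and then predicts sensibly, or one that raises a ValueError during fit. Any other exception, such as the ZeroDivisionError from this issue, would propagate and fail the check.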

soniampub commented Sep 30, 2016

Sure, I will start looking into it.

floondi commented Jan 23, 2017

Hey, is it okay if I take this one?

Member

jnothman commented Jan 23, 2017

Looks like it!

Contributor

dokato commented Feb 16, 2017

Seemed easy to fix, please check if that's okay.

dokato pushed a commit to dokato/scikit-learn that referenced this issue Feb 16, 2017

dokato added a commit to dokato/scikit-learn that referenced this issue Feb 16, 2017

dokato added a commit to dokato/scikit-learn that referenced this issue Feb 20, 2017

@jnothman jnothman closed this in #8371 Feb 20, 2017

jnothman added a commit that referenced this issue Feb 20, 2017

[MRG+1] FIX AdaBoost ZeroDivisionError in proba #7501 (#8371)
* FIX AdaBoost ZeroDivisionError in proba #7501

* FIX AdaBoost ZeroDivisionError in proba #7501 - tests corrected

* FIX AdaBoost ZeroDivisionError in proba #7501 - tests corrected

* FIX #7501 improvements suggested by lesteve introduced

* FIX #7501 whats_new file updated

* Tweak in rst
