
BaggingClassifier uses Class Label as Index to Array when Voting #105

Closed
hamelsmu opened this issue Apr 6, 2019 · 1 comment
Labels
bug Something isn't working


hamelsmu (Owner) commented Apr 6, 2019

Test issue copied from scikit-learn/scikit-learn#13587

Description
BaggingClassifier uses Class Label as Index to Array when Voting

Steps/Code to Reproduce
Provide a base estimator to BaggingClassifier that does not define the function predict_proba. This results in BaggingClassifier resorting to voting. It appears the code for performing voting uses class labels as array indices instead of looking up the index of the class label in the classes_ member.
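For reference, the voting fallback can be modelled in a few lines of NumPy. This is a simplified sketch, not the scikit-learn source; the function name `vote` and the toy predictions are hypothetical. It relies on the fact that BaggingClassifier encodes the training targets as positions in its sorted `classes_` array (via `np.unique(y, return_inverse=True)`) before fitting its members. A member that predicts one of those encoded targets therefore yields a valid column index, and the vote tally comes out right.

```python
import numpy as np


def vote(member_predictions, n_classes):
    """Tally votes the way the voting fallback does: each prediction is
    used directly as a column index into the per-sample vote array."""
    n_samples = member_predictions.shape[1]
    proba = np.zeros((n_samples, n_classes))
    for predictions in member_predictions:      # one row per ensemble member
        for i in range(n_samples):
            proba[i, predictions[i]] += 1       # integer position -> one column
    return proba


classes_ = np.array([False, True])              # sorted labels, as np.unique returns
encoded = np.array([[1, 1, 0],                  # member 1: positions into classes_
                    [1, 0, 0],                  # member 2
                    [1, 1, 1]])                 # member 3
votes = vote(encoded, n_classes=2)
print(votes)
print(classes_[votes.argmax(axis=1)])           # map winning column back to a label
```

With integer-encoded predictions, each vote lands in exactly one column, so the majority label wins for every sample.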

Example:

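```python
import numpy as np
from sklearn.ensemble import BaggingClassifier


class Foo:
    """Dummy base estimator with no predict_proba, which forces
    BaggingClassifier to fall back to voting."""

    def __init__(self):
        pass

    def fit(self, X, Y, W=None):
        # Ignores the training data entirely.
        return self

    def predict(self, X):
        # Always predicts True (builtin bool instead of the deprecated np.bool alias).
        return np.full(X.shape[0], True, dtype=bool)

    def score(self, X, Y):
        YH = self.predict(X)
        return (Y == YH).mean()

    def get_params(self, deep=True):
        return {}

    def set_params(self, **params):
        # Iterate over (name, value) pairs, not just the keys.
        for k, v in params.items():
            setattr(self, k, v)
        return self


# Fit the ensemble and compare its output with each member's predictions.
A = np.random.rand(10, 4)
Y = np.random.randint(2, size=10, dtype=bool)
bc = BaggingClassifier(Foo())
bc.fit(A, Y)
YH = bc.predict(A)
print('BaggingClassifier Voting Result: ')
print(YH)
print('Ensemble Member Predictions: ')
for Ei in bc.estimators_:
    print(Ei.predict(A))
```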

Expected Results
In the snippet above, BaggingClassifier should return an array of True, since True is the majority prediction of the ensemble members (every member predicts True).

Actual Results
BaggingClassifier returns an array of False. This issue only occurs when the base estimator does not define the function predict_proba.
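One plausible reading of this result, sketched below with NumPy alone (a hypothesis, not confirmed in this thread): BaggingClassifier fits its members on integer-encoded targets, but `Foo` ignores them and returns NumPy booleans. NumPy treats a boolean scalar inside an index tuple as a mask rather than as the integer 1, so `proba[i, True] += 1` increments every column of row `i`. Every sample then ends in a tie, `argmax` picks column 0, and, assuming both labels occur in `Y` so that `classes_` is `[False, True]`, column 0 maps to False. The three-member ensemble and four samples here are hypothetical.

```python
import numpy as np

n_samples, n_classes = 4, 2
classes_ = np.array([False, True])          # assumes both labels occur in Y

# Hypothetical stand-in for three ensemble members that, like Foo, always
# return boolean True instead of an integer position into classes_.
member_predictions = [np.full(n_samples, True, dtype=bool) for _ in range(3)]

proba = np.zeros((n_samples, n_classes))
for predictions in member_predictions:
    for i in range(n_samples):
        # predictions[i] is np.True_: NumPy treats it as a boolean mask,
        # so this selects the whole of row i rather than column 1.
        proba[i, predictions[i]] += 1

print(proba)                                # every row is [3. 3.]: a tie
print(classes_[proba.argmax(axis=1)])       # argmax breaks ties toward column 0
```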

The issue appears to be due to lines 137 and 140 in ensemble/bagging.py (scikit-learn/sklearn/ensemble/bagging.py at commit e14ac6d):

```python
# Line 137
predictions = estimator.predict(X[:, features])

# Line 140
proba[i, predictions[i]] += 1
```

The ensemble members' predictions are used directly as indices into the proba array. I'm guessing the predicted labels need to be converted into class indices (their positions in classes_) using estimator.classes_.
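A minimal sketch of the direction suggested here, assuming each member's predictions are labels drawn from the ensemble's sorted `classes_` array. The helper name `vote_by_label` is hypothetical, and this is not scikit-learn's actual fix. Each prediction is converted to its position in `classes_` with `np.searchsorted` before it is used as a column index, and labels that are not in `classes_` are rejected.

```python
import numpy as np


def vote_by_label(member_predictions, classes):
    """Hypothetical voting helper: look up each predicted label's position
    in the sorted `classes` array instead of using the label as an index."""
    classes = np.asarray(classes)
    n_samples = len(member_predictions[0])
    proba = np.zeros((n_samples, len(classes)))
    for predictions in member_predictions:
        predictions = np.asarray(predictions)
        # searchsorted gives an insertion point; clip it so it stays in range.
        positions = np.clip(np.searchsorted(classes, predictions), 0, len(classes) - 1)
        # Confirm every predicted label really is one of the classes.
        if not np.array_equal(classes[positions], predictions):
            raise ValueError("prediction contains a label not in classes")
        proba[np.arange(n_samples), positions] += 1
    return proba


classes_ = np.array([False, True])
members = [np.full(4, True, dtype=bool) for _ in range(3)]  # Foo-like members
votes = vote_by_label(members, classes_)
print(classes_[votes.argmax(axis=1)])                       # majority label per sample
```

Because boolean True is mapped to position 1 before indexing, each vote lands in a single column and the ensemble returns True, matching the expected result.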

Versions
System:
python: 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)]
executable: C:\Users\XXXXXX\Anaconda3\pythonw.exe
machine: Windows 2012 ServerR2

BLAS:
macros:
lib_dirs:
cblas_libs: cblas

Python deps:
pip: 18.1
setuptools: 40.6.3
sklearn: 0.20.1
numpy: 1.15.4
scipy: 1.1.0
Cython: 0.29.2
pandas: 0.23.4

testing: scikit-learn/scikit-learn#13587

@issue-label-bot

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.95. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: dashboard, app homepage and code for this bot.

issue-label-bot (bot) added the bug label (Something isn't working) on Apr 6, 2019