BaggingClassifier uses Class Label as Index to Array when Voting #105

hamelsmu · 2019-04-06T22:31:17Z

Test issue copied from scikit-learn/scikit-learn#13587

Description
BaggingClassifier uses Class Label as Index to Array when Voting

Steps/Code to Reproduce
Pass BaggingClassifier a base estimator that does not define predict_proba, so that BaggingClassifier falls back to voting. The voting code appears to use class labels directly as array indices instead of looking up each label's index in the classes_ attribute.

Example:

import numpy as np
from sklearn.ensemble import BaggingClassifier

class Foo:
    
    def __init__(self):
        pass
    
    def fit(self, X, Y, W=None):
        return self
    
    def predict(self, X):
        return np.full(X.shape[0], True, dtype=bool)
    
    def score(self, X, Y):
        YH = self.predict(X)
        return (Y == YH).mean()
    
    def get_params(self, deep=True):
        return {}
    
    def set_params(self, **params):
        for k, v in params.items():
            setattr(self, k, v)
        return self
    
# %%
A = np.random.rand(10, 4)
Y = np.random.randint(2, size=10, dtype=bool)
bc = BaggingClassifier(Foo())
bc.fit(A, Y)
YH = bc.predict(A)
print('BaggingClassifier Voting Result: ')
print(YH)
print('Ensemble Member Predictions: ')
for Ei in bc.estimators_:
    print(Ei.predict(A))

Expected Results
In the above code snippet, BaggingClassifier should return an array of True since it is the majority prediction of all ensemble members.

Actual Results
BaggingClassifier returns an array of False. This issue only occurs when the base estimator does not define the function predict_proba.
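
One plausible reading of the all-False result, assuming current NumPy indexing rules (a boolean scalar used as an index acts as a mask, not as the integer 0 or 1): each True vote increments both columns of its row, so every row ties and argmax falls back to the first class. The helper name count_votes_label_as_index is hypothetical; it only mimics the quoted voting line and is not the library's code.

```python
import numpy as np

def count_votes_label_as_index(predictions, n_classes):
    """Tally one estimator's votes, using each label directly as a column index.

    Hypothetical helper that mirrors `proba[i, predictions[i]] += 1`.
    """
    proba = np.zeros((len(predictions), n_classes))
    for i in range(len(predictions)):
        # With a boolean label, this index is a mask over the row,
        # not the integer column 0 or 1.
        proba[i, predictions[i]] += 1
    return proba

# Every ensemble member in the report predicts True for every sample.
predictions = np.full(3, True, dtype=bool)
proba = count_votes_label_as_index(predictions, n_classes=2)

classes = np.array([False, True])  # stand-in for classes_
print(proba)
print(classes[np.argmax(proba, axis=1)])
```

If this reading is right, the vote counts carry no information for boolean labels, which matches the behaviour reported above.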

The issue appears to be due to lines 137 and 140 in ensemble/bagging.py.

scikit-learn/sklearn/ensemble/bagging.py, line 137 in e14ac6d:

predictions = estimator.predict(X[:, features])

scikit-learn/sklearn/ensemble/bagging.py, line 140 in e14ac6d:

proba[i, predictions[i]] += 1

The ensemble members' predictions are used directly as indices into the proba array. I'm guessing the predicted labels need to be converted into class indices by looking them up in estimator.classes_.
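
A minimal sketch of the lookup suggested above, assuming the class array is sorted and contains every predicted label (as classes_ is when built with np.unique). The function names vote_counts and majority_vote are hypothetical, and this is not necessarily how scikit-learn resolved the issue.

```python
import numpy as np

def vote_counts(member_predictions, classes):
    """Count votes per class, mapping each label to its position in `classes`.

    `member_predictions` is a list of 1-D label arrays, one per ensemble member.
    `classes` is assumed sorted and to contain every predicted label.
    """
    classes = np.asarray(classes)
    n_samples = len(member_predictions[0])
    counts = np.zeros((n_samples, len(classes)), dtype=int)
    rows = np.arange(n_samples)
    for predictions in member_predictions:
        # Translate labels into column indices instead of indexing with them.
        cols = np.searchsorted(classes, np.asarray(predictions))
        # Each row appears once per member, so a plain += is safe here.
        counts[rows, cols] += 1
    return counts

def majority_vote(member_predictions, classes):
    """Return the label with the most votes for each sample."""
    counts = vote_counts(member_predictions, classes)
    return np.asarray(classes)[np.argmax(counts, axis=1)]

# Three members that always predict True, as in the report.
classes = np.array([False, True])
members = [np.full(4, True, dtype=bool) for _ in range(3)]
print(majority_vote(members, classes))
```

The same lookup also covers integer or string labels that do not start at 0, since the column index comes from the label's position in the class array.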

Versions
System:
python: 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)]
executable: C:\Users\XXXXXX\Anaconda3\pythonw.exe
machine: Windows 2012 ServerR2

BLAS:
macros:
lib_dirs:
cblas_libs: cblas

Python deps:
pip: 18.1
setuptools: 40.6.3
sklearn: 0.20.1
numpy: 1.15.4
scipy: 1.1.0
Cython: 0.29.2
pandas: 0.23.4

testing: scikit-learn/scikit-learn#13587


issue-label-bot · 2019-04-06T22:31:19Z

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.95. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: dashboard, app homepage and code for this bot.

issue-label-bot bot added the bug Something isn't working label Apr 6, 2019

hamelsmu closed this as completed Nov 24, 2019
