Description
BaggingClassifier uses Class Label as Index to Array when Voting
Steps/Code to Reproduce
Provide a base estimator to BaggingClassifier that does not define the function predict_proba. This results in BaggingClassifier resorting to voting. It appears the code for performing voting uses class labels as array indices instead of looking up the index of the class label in the classes_ member.
Example:
import numpy as np
from sklearn.ensemble import BaggingClassifier


class Foo:
    """Minimal estimator without predict_proba; it always predicts True."""

    def __init__(self):
        pass

    def fit(self, X, Y, W=None):
        return self

    def predict(self, X):
        # One boolean prediction per sample, always True
        return np.full(X.shape[0], True, dtype=bool)

    def score(self, X, Y):
        YH = self.predict(X)
        return (Y == YH).mean()

    def get_params(self, deep=True):
        return {}

    def set_params(self, **params):
        for k, v in params.items():
            setattr(self, k, v)
        return self


# %%
A = np.random.rand(10, 4)
Y = np.random.randint(2, size=10, dtype=bool)
bc = BaggingClassifier(Foo())
bc.fit(A, Y)
YH = bc.predict(A)
print('BaggingClassifier Voting Result: ')
print(YH)
print('Ensemble Member Predictions: ')
for Ei in bc.estimators_:
    print(Ei.predict(A))
Expected Results
In the above code snippet, BaggingClassifier should return an array of True since it is the majority prediction of all ensemble members.
Actual Results
BaggingClassifier returns an array of False. This issue only occurs when the base estimator does not define the function predict_proba.
The issue appears to be due to lines 137 and 140 in ensemble/bagging.py.
proba[i, predictions[i]] += 1
The predictions of the ensemble members are used directly as column indices into the proba array. I'm guessing the predicted labels need to be converted into column indices first, by looking up each label's position in classes_.
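To illustrate the lookup suggested above, here is a minimal sketch of hard voting that maps each predicted label to its column before counting. It is not the scikit-learn code or its eventual fix: the function name vote_proba and its arguments are hypothetical, and it assumes classes is a sorted array containing every label a member can predict.

```python
import numpy as np


def vote_proba(member_predictions, classes):
    """Return per-class vote fractions from hard predictions.

    member_predictions: list of 1-D label arrays, one per ensemble member
    classes: sorted 1-D array of the distinct labels (assumed complete)
    """
    classes = np.asarray(classes)
    n_samples = len(member_predictions[0])
    proba = np.zeros((n_samples, classes.shape[0]))
    rows = np.arange(n_samples)
    for predictions in member_predictions:
        # Look up the column of each label instead of using the label itself
        cols = np.searchsorted(classes, predictions)
        proba[rows, cols] += 1
    return proba / len(member_predictions)


# Three members that always predict True, as in the Foo example
classes = np.array([False, True])
members = [np.full(4, True, dtype=bool) for _ in range(3)]
proba = vote_proba(members, classes)
majority = classes.take(np.argmax(proba, axis=1))
print(majority)
```

With the lookup in place, every vote lands in the column that belongs to True, so the majority label matches what the members predicted.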
Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.95. Please mark this comment with 👍 or 👎 to give our bot feedback!
Test issue copied from scikit-learn/scikit-learn#13587
The issue appears to be due to lines 137 and 140 in scikit-learn/sklearn/ensemble/bagging.py (commit e14ac6d):
Line 137: predictions = estimator.predict(X[:, features])
Line 140: proba[i, predictions[i]] += 1
Versions
System:
python: 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)]
executable: C:\Users\XXXXXX\Anaconda3\pythonw.exe
machine: Windows 2012 ServerR2
BLAS:
macros:
lib_dirs:
cblas_libs: cblas
Python deps:
pip: 18.1
setuptools: 40.6.3
sklearn: 0.20.1
numpy: 1.15.4
scipy: 1.1.0
Cython: 0.29.2
pandas: 0.23.4
testing: scikit-learn/scikit-learn#13587