cross_val_predict returns bad prediction when evaluated on a dataset with very few samples #13366

Closed · gfournier opened this issue on Mar 1, 2019 · 0 comments

Description

cross_val_predict returns incorrect predictions when the dataset has very few samples in one class, so that this class is missing from the training data of some CV splits.
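To see how a class ends up "ignored", one can list which classes are absent from each fold's training set. The sketch below uses a small hand-built label vector (an illustrative assumption, not the data from this report) in which class 2 has a single member. With n_splits=2 that member lands in exactly one test fold, so the model trained for that fold never sees class 2. StratifiedKFold will also emit a UserWarning that the least populated class has fewer members than n_splits.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy labels: class 2 has a single member.
y = np.array([0] * 10 + [1] * 9 + [2])
X = np.zeros((len(y), 1))  # features do not affect how StratifiedKFold assigns folds

cv = StratifiedKFold(n_splits=2)
missing_per_fold = []
for train_idx, test_idx in cv.split(X, y):
    # Classes present in y overall but absent from this fold's training data
    missing = np.setdiff1d(np.unique(y), np.unique(y[train_idx]))
    missing_per_fold.append([int(c) for c in missing])

print(missing_per_fold)
```

One fold should report [2] and the other an empty list. The fold that reports [2] trains a two-class model, yet cross_val_predict still has to return three probability columns for that fold's test samples.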

Steps/Code to Reproduce

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, n_informative=2,
                           random_state=1, n_clusters_per_class=1)
# Relabel the first sample as a new class (2), so class 2 has a single member
y[0] = 2
clf = LogisticRegression()
# No shuffling, so random_state would have no effect here (newer versions reject it)
cv = StratifiedKFold(n_splits=2)
# Each element is a (train_indices, test_indices) pair, one per fold
folds = list(cv.split(X, y))
yhat_proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")
print(yhat_proba)
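For comparison, the expected behaviour can be written out by hand: fit a clone of the estimator on each training fold, then place each fold's predict_proba columns at the positions of the classes that fold's model actually knows (via its classes_ attribute), leaving zeros for classes missing from that fold's training data. This is a minimal sketch of that idea under those assumptions; the helper name manual_cv_predict_proba is hypothetical and is not scikit-learn's internal implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def manual_cv_predict_proba(estimator, X, y, cv):
    """Hypothetical helper: out-of-fold predict_proba aligned to all classes in y.

    Classes absent from a fold's training data get probability 0 for that fold.
    """
    all_classes = np.unique(y)
    proba = np.zeros((len(y), len(all_classes)))
    for train_idx, test_idx in cv.split(X, y):
        est = clone(estimator).fit(X[train_idx], y[train_idx])
        # Map each class known to this fold's model onto its global column index
        cols = np.searchsorted(all_classes, est.classes_)
        proba[np.ix_(test_idx, cols)] = est.predict_proba(X[test_idx])
    return proba


X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, n_informative=2,
                           random_state=1, n_clusters_per_class=1)
y[0] = 2
proba = manual_cv_predict_proba(LogisticRegression(), X, y, StratifiedKFold(n_splits=2))
print(proba[:4])
```

Every row should sum to 1, and the column for class 2 should be 0 for the test samples of the fold whose training set lacks class 2 (including sample 0 itself).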

Expected Results (first 4 of 100 rows)

[[0.06105412 0.93894588 0.        ]
 [0.92512247 0.07487753 0.        ]
 [0.93896471 0.06103529 0.        ]
 [0.04345507 0.95654493 0.        ]

Actual Results (first 4 of 100 rows)

[[0. 0. 0.        ]
 [0. 0. 0.        ]
 [0. 0. 0.        ]
 [0. 0. 0.        ]
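The symptom above (rows of all zeros) can be flagged automatically: every row returned by predict_proba should be a probability distribution, even for folds that were missing a class. A small sketch of such a check, with the hypothetical helper name bad_proba_rows:

```python
import numpy as np


def bad_proba_rows(proba, atol=1e-8):
    """Hypothetical helper: indices of rows that are not valid probability vectors.

    A row is flagged if it does not sum to 1 or contains a negative entry.
    """
    proba = np.asarray(proba, dtype=float)
    not_normalized = ~np.isclose(proba.sum(axis=1), 1.0, atol=atol)
    has_negative = (proba < 0).any(axis=1)
    return np.flatnonzero(not_normalized | has_negative)


# One row in the style of the expected output, one in the style of the actual output
print(bad_proba_rows([[0.06105412, 0.93894588, 0.0],
                      [0.0, 0.0, 0.0]]))
```

Applied to yhat_proba from the reproduction script, a non-empty result indicates the bug.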

Versions

Verified on the latest scikit-learn development version (as of March 2019).
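For a more precise report, scikit-learn (0.20 and later) provides sklearn.show_versions(), which prints the versions of scikit-learn, Python, NumPy, SciPy and related dependencies. A minimal sketch:

```python
import sklearn

# Print scikit-learn, Python and dependency versions for inclusion in the issue
sklearn.show_versions()
print(sklearn.__version__)
```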
