When using GridSearchCV with multimetric scoring and a callable as refit, the GridSearchCV.score method doesn't work, since the line score = self.scorer_[self.refit] if self.multimetric_ else self.scorer_ seems to expect refit to be a string in the multimetric case.
Steps/Code to Reproduce
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
import numpy as np

def get_best_index(cv_results):
    best_rank_mask = cv_results['rank_test_roc_auc'] == cv_results['rank_test_roc_auc'].min()
    params_best_score = np.array(cv_results['params'])[best_rank_mask]
    params_name = params_best_score[0].keys()
    classifier_params_names = [_ for _ in params_name if 'classifier' in _]
    if 'classifier__C' in classifier_params_names or 'classifier__base_estimator__C' in classifier_params_names:
        classifier_params = [_['classifier__C' if 'classifier__C' in classifier_params_names else 'classifier__base_estimator__C'] for _ in params_best_score]
        params_best_score = params_best_score[classifier_params == min(classifier_params)]
    best_params = params_best_score[0]
    best_index = int(np.where(np.array(cv_results['params']) == best_params)[0])
    return best_index

breast = load_breast_cancer()
X = breast.data
y = breast.target
cv = RepeatedStratifiedKFold(5, 2, random_state=111)
params_dic = {'C': np.arange(0.1, 1.1, 0.1)}
clf = GridSearchCV(LogisticRegression(penalty='l2', max_iter=1e5, solver='saga'), params_dic, scoring=['roc_auc', 'accuracy'], cv=cv, refit=get_best_index, n_jobs=4)
clf.fit(X, y)
clf.score(X, y)
Actual Results
File "C:\Users\Tim\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 447, in score
    score = self.scorer_[self.refit] if self.multimetric_ else self.scorer_
KeyError: <function get_best_index at 0x0000028DD66BDBF8>
Expected Results
Since refit is a callable, I don't see how GridSearchCV could know which metric to use for scoring. However, if I pass a string naming the metric I chose, i.e. 'roc_auc', to the refit argument, the best index won't be chosen the way I want. Maybe, in the case of multimetric scoring with a callable refit, refit could accept a dictionary like {score: callable} instead, so that the score key can be used in GridSearchCV.score?
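In the meantime, a possible workaround (a sketch, not an official fix): with multimetric scoring, scorer_ is a dict mapping each metric name to its scorer callable, so the metric of interest can be looked up explicitly and applied to best_estimator_, bypassing GridSearchCV.score entirely. The small grid and the trivial refit callable below are just to keep the example fast; any callable refit triggers the same KeyError in score.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# A tiny grid with a callable refit, which is the configuration that
# breaks GridSearchCV.score under multimetric scoring.
clf = GridSearchCV(
    LogisticRegression(max_iter=5000),
    {'C': [0.1, 1.0]},
    scoring=['roc_auc', 'accuracy'],
    cv=3,
    refit=lambda cv_results: 0,  # trivial callable, returns a best index
)
clf.fit(X, y)

# With multimetric scoring, clf.scorer_ is a dict keyed by metric name,
# so the metric can be chosen explicitly instead of via self.refit:
roc_auc = clf.scorer_['roc_auc'](clf.best_estimator_, X, y)
```

This sidesteps the buggy lookup, at the cost of naming the metric again at scoring time.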
You're right, I can confirm this is a bug. But it seems there's no way for score to work if refit is a callable. I suppose that was under-thought on my part.
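One way to make score usable alongside a callable refit, sketched below under the assumption that it is acceptable to name the scoring metric separately: a small subclass that stores an explicit metric name (score_metric, a hypothetical parameter, not a scikit-learn API) and uses it in score instead of self.refit. Only a few GridSearchCV parameters are forwarded here to keep the sketch short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

class ScoreMetricGridSearchCV(GridSearchCV):
    """GridSearchCV variant whose score() uses an explicitly named metric.

    `score_metric` is a hypothetical extra parameter, not part of the
    scikit-learn API.
    """

    def __init__(self, estimator, param_grid, *, score_metric,
                 scoring=None, refit=True, cv=None):
        super().__init__(estimator, param_grid, scoring=scoring,
                         refit=refit, cv=cv)
        self.score_metric = score_metric

    def score(self, X, y=None):
        # Look the scorer up by its name instead of by self.refit, which
        # may be a callable when multimetric scoring is used.
        return self.scorer_[self.score_metric](self.best_estimator_, X, y)

X, y = load_breast_cancer(return_X_y=True)
gs = ScoreMetricGridSearchCV(
    LogisticRegression(max_iter=5000), {'C': [0.1, 1.0]},
    score_metric='roc_auc',
    scoring=['roc_auc', 'accuracy'],
    refit=lambda cv_results: 0,  # callable refit, as in the report
    cv=3,
)
gs.fit(X, y)
auc = gs.score(X, y)
```

This keeps the callable refit for model selection while making the scoring metric an explicit, separate choice.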
Versions
System:
python: 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
executable: C:\Users\Tim\Anaconda3\python.exe
machine: Windows-10-10.0.18362-SP0
Python dependencies:
pip: 20.0.2
setuptools: 39.1.0
sklearn: 0.22.1
numpy: 1.18.1
scipy: 1.4.1
Cython: 0.28.2
pandas: 1.0.0
matplotlib: 2.2.2
joblib: 0.14.1
Built with OpenMP: True