
GridsearchCV.score with multimetric scoring and callable refit #17058

Open
TimZaragori opened this issue Apr 27, 2020 · 2 comments
TimZaragori commented Apr 27, 2020

Describe the bug

When using GridSearchCV with multimetric scoring and a callable as refit, GridSearchCV.score doesn't work: the line score = self.scorer_[self.refit] if self.multimetric_ else self.scorer_ expects self.refit to be a string key into the scorer dict in the multimetric case.

Steps/Code to Reproduce

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
import numpy as np

def get_best_index(cv_results):
    # Candidates: all parameter sets tied for the best roc_auc rank
    best_rank_mask = cv_results['rank_test_roc_auc'] == cv_results['rank_test_roc_auc'].min()
    params_best_score = np.array(cv_results['params'])[best_rank_mask]
    params_name = params_best_score[0].keys()
    classifier_params_names = [name for name in params_name if 'classifier' in name]
    if 'classifier__C' in classifier_params_names or 'classifier__base_estimator__C' in classifier_params_names:
        key = 'classifier__C' if 'classifier__C' in classifier_params_names else 'classifier__base_estimator__C'
        # Break ties by preferring the smallest C (strongest regularization)
        classifier_params = np.array([params[key] for params in params_best_score])
        params_best_score = params_best_score[classifier_params == classifier_params.min()]
    best_params = params_best_score[0]
    best_index = int(np.where(np.array(cv_results['params']) == best_params)[0][0])
    return best_index

breast = load_breast_cancer()
X = breast.data
y = breast.target
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=111)
params_dic = {'C': np.arange(0.1, 1.1, 0.1)}
clf = GridSearchCV(LogisticRegression(penalty='l2', max_iter=100000, solver='saga'),
                   params_dic, scoring=['roc_auc', 'accuracy'], cv=cv,
                   refit=get_best_index, n_jobs=4)
clf.fit(X, y)
clf.score(X, y)

Actual Results

File "C:\Users\Tim\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 447, in score
    score = self.scorer_[self.refit] if self.multimetric_ else self.scorer_
KeyError: <function get_best_index at 0x0000028DD66BDBF8>

Expected Results

Since refit is a callable, I don't see how score could know which metric to use for scoring. However, if I instead pass a string naming the metric I chose, i.e. 'roc_auc', to the refit argument, the best index won't be chosen the way I want. Maybe in the case of multimetric scoring with a callable refit, accept a dictionary instead, like {score: callable}, and use that score in GridSearchCV.score?
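In the meantime, a possible workaround (a sketch, not part of scikit-learn's documented API for this case): the fitted per-metric scorers are still available in clf.scorer_ after fit, so the desired metric can be applied by hand instead of calling clf.score. The refit_first_best callable below is a hypothetical stand-in for the tie-breaking logic above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
import numpy as np

X, y = load_breast_cancer(return_X_y=True)

def refit_first_best(cv_results):
    # Toy refit callable: index of the first candidate with the best roc_auc rank
    return int(np.argmin(cv_results['rank_test_roc_auc']))

clf = GridSearchCV(
    LogisticRegression(max_iter=5000),
    {'C': [0.1, 1.0]},
    scoring=['roc_auc', 'accuracy'],
    refit=refit_first_best,
    cv=3,
)
clf.fit(X, y)

# clf.score(X, y) raises KeyError here; instead, apply the chosen scorer directly:
roc_auc = clf.scorer_['roc_auc'](clf.best_estimator_, X, y)
```

This sidesteps score entirely, at the cost of naming the metric explicitly at every call site.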

Versions

System:
python: 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
executable: C:\Users\Tim\Anaconda3\python.exe
machine: Windows-10-10.0.18362-SP0
Python dependencies:
pip: 20.0.2
setuptools: 39.1.0
sklearn: 0.22.1
numpy: 1.18.1
scipy: 1.4.1
Cython: 0.28.2
pandas: 1.0.0
matplotlib: 2.2.2
joblib: 0.14.1
Built with OpenMP: True

@jnothman
Member

You're right, I can confirm this is a bug. But it seems there's no way for score to work if refit is a callable. I suppose that was under-thought on my part.
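One conceivable direction (purely a sketch, with hypothetical names NamedScoreGridSearchCV and score_metric, not an agreed design): a subclass that carries an explicitly named metric and looks it up in scorer_ when refit is a callable under multimetric scoring.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
import numpy as np

class NamedScoreGridSearchCV(GridSearchCV):
    """Sketch: score() with an explicitly named metric when refit is callable."""

    def __init__(self, estimator, param_grid, *, score_metric, **kwargs):
        super().__init__(estimator, param_grid, **kwargs)
        self.score_metric = score_metric

    def score(self, X, y=None):
        # When scoring is multimetric and refit is a callable, self.refit is
        # not a valid key into self.scorer_, so use the named metric instead.
        if self.multimetric_ and callable(self.refit):
            return self.scorer_[self.score_metric](self.best_estimator_, X, y)
        return super().score(X, y)

X, y = load_breast_cancer(return_X_y=True)
clf = NamedScoreGridSearchCV(
    LogisticRegression(max_iter=5000),
    {'C': [0.1, 1.0]},
    score_metric='roc_auc',
    scoring=['roc_auc', 'accuracy'],
    refit=lambda cv_results: int(np.argmin(cv_results['rank_test_roc_auc'])),
    cv=3,
)
clf.fit(X, y)
score = clf.score(X, y)
```

The same idea could be expressed inside GridSearchCV itself as the {score: callable} dict the reporter suggests; the subclass just makes the extra piece of state explicit.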

@TimZaragori
Author

For my personal use (an sklearn wrapper for nested cross-validation) I tried to implement it this way: https://github.com/TimZaragori/Sklearn_NestedCV/blob/master/Statistical_analysis/nested_cv.py#L283, with the scoring function (the score of the model refit on all data, as in GridSearchCV, but after the whole nested cross-validation): https://github.com/TimZaragori/Sklearn_NestedCV/blob/master/Statistical_analysis/nested_cv.py#L368,
and in the inner loops, where I use GridSearchCV directly, I retrieve the scores like this: https://github.com/TimZaragori/Sklearn_NestedCV/blob/master/Statistical_analysis/nested_cv.py#L337

However, I don't know whether this helps, or to what extent it could be implemented in GridSearchCV.
