[MRG] Adds _MultimetricScorer for Optimized Scoring #14593

Merged
merged 40 commits into from Sep 10, 2019

Member

thomasjpfan commented Aug 7, 2019

Reference Issues/PRs

Fixes #10802
Alternative to #10979

What does this implement/fix? Explain your changes.

  1. This PR creates a _MultimetricScorer that subclasses dict which is used to reduce the number of calls to predict, predict_proba, and decision_function.

  2. The public interface of objects and functions using scoring are unchanged.

  3. The cache is only used when it is beneficial to use, as defined in _MultimetricScorer._use_cache.

  4. Users cannot create a _MultimetricScorer and pass it into scoring.
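
The caching idea in point 1 can be sketched without scikit-learn. This is only an illustration under stated assumptions, not the PR's implementation: `CountingEstimator`, `cached_call`, and `MultiScorerSketch` are hypothetical names, and the scorers are plain functions that receive a `call` helper instead of calling the estimator themselves.

```python
from functools import partial


class CountingEstimator:
    """Toy estimator (hypothetical) that counts how often predict runs."""

    def __init__(self):
        self.predict_calls = 0

    def predict(self, X):
        self.predict_calls += 1
        return [x > 0 for x in X]


def cached_call(cache, estimator, method, X):
    """Call estimator.<method>(X), reusing a result stored in cache."""
    if cache is None:  # caching disabled: call straight through
        return getattr(estimator, method)(X)
    if method not in cache:
        cache[method] = getattr(estimator, method)(X)
    return cache[method]


def accuracy(call, estimator, X, y):
    pred = call(estimator, "predict", X)
    return sum(p == t for p, t in zip(pred, y)) / len(y)


def error_rate(call, estimator, X, y):
    pred = call(estimator, "predict", X)
    return sum(p != t for p, t in zip(pred, y)) / len(y)


class MultiScorerSketch:
    """Holds named scorers and shares one cache per __call__."""

    def __init__(self, **scorers):
        self._scorers = scorers

    def __call__(self, estimator, X, y):
        # A single scorer gains nothing from a cache, so skip it.
        cache = {} if len(self._scorers) > 1 else None
        call = partial(cached_call, cache)
        return {name: scorer(call, estimator, X, y)
                for name, scorer in self._scorers.items()}


est = CountingEstimator()
scores = MultiScorerSketch(acc=accuracy, err=error_rate)(
    est, [1, -2, 3, -4], [True, False, False, False])
print(scores, est.predict_calls)
```

Both scorers need `predict`, yet the toy estimator is asked only once; the result is a dictionary keyed by scorer name.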

Any other comments?

I plan to support user-supplied custom callables that return dictionaries. That is left out here to keep this PR scoped to _MultimetricScorer.

thomasjpfan added 22 commits Aug 5, 2019
WIP
WIP
…c_no_dict
…c_no_dict
Contributor

NicolasHug left a comment

Looks good, I think this could use a few more comments to describe the logic.

I'm not a huge fan of using None as _method_cacher for it to revert to the default method cacher of _BaseScorer. (Maybe with my suggestions I'd find it clearer, IDK)

Maybe add a sanity check that makes sure that passing another X gives different results even when caching is involved.

scorers = {"score": check_scoring(estimator, scoring=scoring)}
return scorers, False
return _MultimetricScorer(**scorers), False

NicolasHug Aug 13, 2019

Contributor

Why return a MultiMetricScorer when there is only one scorer?

thomasjpfan Aug 15, 2019

Author Member

_check_multimetric_scoring always returned a multimetric scorer. (On master it returned a dictionary, which was the data structure used to denote "multimetric scoring".)

NicolasHug Aug 15, 2019

Contributor

I disagree, a dict with only one key (as here) denotes a single metric scorer. That's the reason is_multimetric is False here.

Since no caching happens with a single-metric scorer, I think we should not change this part and still return the dict, to avoid the confusion.

NicolasHug Aug 15, 2019

Contributor

Or else, MultiMetricScorer should have a whole different name. It doesn't make sense to return a MultiMetricScorer instance while is_multimetric is False

thomasjpfan Aug 15, 2019

Author Member

A user can pass a dictionary to scoring with one key and is_multimetric will be true.

thomasjpfan Aug 15, 2019

Author Member

I do plan on removing is_multimetric and treating "anything that returns a dictionary" as multimetric.

'll2': 'neg_log_loss',
'ra1': 'roc_auc',
'ra2': 'roc_auc'
}, 1, 1, 1), (['roc_auc', 'accuracy'], 1, 0, 1)],

NicolasHug Aug 13, 2019

Contributor

for readability maybe separate both cases with a line break

@@ -543,3 +544,41 @@ def test_scoring_is_not_metric():
Ridge(), r2_score)
assert_raises_regexp(ValueError, 'make_scorer', check_scoring,
KMeans(), cluster_module.adjusted_rand_score)


@pytest.mark.parametrize("scorers,predicts,predict_probas,decision_funcs",

NicolasHug Aug 13, 2019

Contributor

expected_predict_count, expected_predict_proba_count, ... ?

Long names, I know :/

scorer, _ = _check_multimetric_scoring(LogisticRegression(), scorers)
scores = scorer(mock_est, X, y)

assert set(scorers) == set(scores)

NicolasHug Aug 13, 2019

Contributor

Just because I was slightly confused at first:

Suggested change
assert set(scorers) == set(scores)
assert set(scorers) == set(scores) # compare dict keys
return True

if counter[_ThresholdScorer] > 0 and (counter[_PredictScorer] or
counter[_ThresholdScorer]):

NicolasHug Aug 13, 2019

Contributor

This is equivalent to

counter[_ThresholdScorer] and (counter[_PredictScorer]

(the or isn't useful)

thomasjpfan Aug 15, 2019

Author Member

This should have been:

        if counter[_ThresholdScorer] and (counter[_PredictScorer] or
                                          counter[_ProbaScorer]):
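
The difference between the condition as first written and the corrected one can be checked with a small truth table. The three classes below are empty stand-ins for the private scorer types named in this thread (an assumption made so the snippet runs on its own); only the boolean logic is of interest.

```python
from collections import Counter
from itertools import product


# Stand-ins for the private scorer classes named in the thread.
class PredictScorer: ...
class ProbaScorer: ...
class ThresholdScorer: ...


def original_condition(counter):
    # As first written: the right-hand `or` re-tests ThresholdScorer.
    return bool(counter[ThresholdScorer] > 0 and
                (counter[PredictScorer] or counter[ThresholdScorer]))


def corrected_condition(counter):
    # As corrected in the reply: the second alternative is ProbaScorer.
    return bool(counter[ThresholdScorer] and
                (counter[PredictScorer] or counter[ProbaScorer]))


mismatches = []
for n_pred, n_proba, n_thresh in product(range(3), repeat=3):
    counter = Counter({PredictScorer: n_pred, ProbaScorer: n_proba,
                       ThresholdScorer: n_thresh})
    # The original test collapses to "at least one ThresholdScorer".
    assert original_condition(counter) == (n_thresh > 0)
    if original_condition(counter) != corrected_condition(counter):
        mismatches.append((n_pred, n_proba, n_thresh))

print(mismatches)
```

The two versions disagree only when threshold scorers are present without any predict or proba scorer, which is exactly the case the redundant `or` let through.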
return scores

def _use_cache(self):
"""Return True if using a cache is desired."""

NicolasHug Aug 13, 2019

Contributor

Short description of "desired" please ;)

return self._score(estimator, X, y_true, sample_weight=sample_weight)

def _method_cacher(self, estimator, method, *args, **kwargs):
"""Call estimator directly."""

NicolasHug Aug 13, 2019

Contributor
Suggested change
"""Call estimator directly."""
"""Call estimator's method directly, without caching."""
@@ -44,7 +45,54 @@
from ..base import is_regressor


class _BaseScorer(metaclass=ABCMeta):
class _MultimetricScorer(dict):
"""Callable dictionary for multimetric scoring."""

NicolasHug Aug 13, 2019

Contributor

Please briefly describe keys being strings and values being instances of _BaseScorer

I first thought (without looking) that this was also a _BaseScorer instance

thomasjpfan Aug 15, 2019

Author Member

That would have been nice, but the passthrough scorer and custom scorers may have unusual interfaces, so _MultimetricScorer.__call__ needed to be as generic as possible.

"""
return self._score(estimator, X, y_true, sample_weight=sample_weight)

def _method_cacher(self, estimator, method, *args, **kwargs):

NicolasHug Aug 13, 2019

Contributor

I'm confused that this is called _method_cacher. It makes me think that it overrides the MultimetricScorer's _method_cacher, but the logic is slightly different.

Call this _passthrough_method_cacher?

fit_and_score_args = [None, None, None, two_params_scorer]

scorer = _MultimetricScorer(score=two_params_scorer)
fit_and_score_args = [None, None, None, scorer]

NicolasHug Aug 13, 2019

Contributor

Shouldn't the original list [None, None, None, two_params_scorer] still be tested?

thomasjpfan Aug 15, 2019

Author Member

This is testing a private method, _score. On master, multimetric scoring was represented with a dictionary, which _score used to call the scorers independently. This PR moves that responsibility from _score to _MultimetricScorer. Now _score only needs to call _MultimetricScorer.__call__.

thomasjpfan added 4 commits Aug 15, 2019
…c_no_dict
Contributor

NicolasHug left a comment

Mostly nitpicks about comments but LGTM, thanks @thomasjpfan

The whole scoring logic is becoming quite convoluted by now... Might be worth some re-thinking one day.

`_MultimetricScorer` will return a dictionary of scores corresponding to
the scorers in the dictionary. Note `_MultimetricScorer` can be created
with a dictionary with one key.

NicolasHug Aug 16, 2019

Contributor
Suggested change
with a dictionary with one key.
with a dictionary with one key (i.e. only one actual scorer).
return scores

def _use_cache(self, estimator):
"""Return True if using a cache it is beneficial.

NicolasHug Aug 16, 2019

Contributor
Suggested change
"""Return True if using a cache it is beneficial.
"""Return True if using a cache is beneficial.
- `decision_function` and `predict_proba` is called.
"""
if len(self) == 1:

NicolasHug Aug 16, 2019

Contributor
Suggested change
if len(self) == 1:
if len(self) == 1: # Only one scorer
score : float
Score function applied to prediction of estimator on X.
"""
return self._score(partial(_method_caller, None), estimator, X, y_true,

NicolasHug Aug 16, 2019

Contributor
Suggested change
return self._score(partial(_method_caller, None), estimator, X, y_true,
return self._score(partial(_method_caller, cache=None), estimator, X, y_true,

thomasjpfan Aug 16, 2019

Author Member

Since cache is a positional argument, partial needs to accept it as a positional argument.
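
A standalone sketch of why the keyword form fails. `method_caller` and `Doubler` are hypothetical stand-ins; the only assumption carried over from the thread is that `cache` is the first positional parameter of the wrapped function.

```python
from functools import partial


def method_caller(cache, estimator, method, *args, **kwargs):
    """Hypothetical stand-in: `cache` is the first positional parameter."""
    return getattr(estimator, method)(*args, **kwargs)


class Doubler:
    def predict(self, X):
        return [2 * x for x in X]


# Binding positionally fills the `cache` slot, so the remaining
# positional arguments line up with estimator, method, and X.
positional = partial(method_caller, None)
result = positional(Doubler(), "predict", [1, 2])

# Binding by keyword leaves the first positional slot open, so the
# estimator lands in `cache` and Python sees two values for it.
by_keyword = partial(method_caller, cache=None)
try:
    by_keyword(Doubler(), "predict", [1, 2])
    keyword_failed = False
except TypeError:
    keyword_failed = True

print(result, keyword_failed)
```

With `partial(f, cache=None)`, every later call would have to pass the remaining arguments by keyword as well, which the scorers calling this helper do not do.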

"""Evaluate predicted target values for X relative to y_true.
Parameters
----------
method_caller: callable
Call estimator with method and args and kwargs.

NicolasHug Aug 16, 2019

Contributor
Suggested change
Call estimator with method and args and kwargs.
Call estimator's method with args and kwargs, potentially caching results.

Or anything else that indicates this is used for caching

if is_multimetric:
return _multimetric_score(estimator, X_test, y_test, scorer)
def _score(estimator, X_test, y_test, scorer):
"""Compute the score(s) of an estimator on a given test set."""

NicolasHug Aug 16, 2019

Contributor

Let's keep the comment about what is returned.

IIUC a dict is returned iff scorer is a dict?

sklearn/model_selection/_validation.py

scorer_dict, _ = _check_multimetric_scoring(LogisticRegression(), scorers)
scorer = _MultimetricScorer(**scorer_dict)
scores = scorer(mock_est, X, y)

NicolasHug Aug 16, 2019

Contributor

I don't think this is possible but it'd be cool to assert that the cache only exists during __call__().

thomasjpfan Aug 16, 2019

Author Member

Since it is scoped in __call__ I do not think it is possible.
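
What can be checked from the outside is the observable consequence of that scoping: a second call with a different X must recompute, and nothing cache-like is left on the scorer object. A rough sketch with hypothetical names (`RecordingEstimator`, `ScopedCacheScorer`), not the PR's classes:

```python
class RecordingEstimator:
    """Toy estimator (hypothetical) that records every predict input."""

    def __init__(self):
        self.seen = []

    def predict(self, X):
        self.seen.append(list(X))
        return [x * 10 for x in X]


class ScopedCacheScorer:
    """Runs several scorers, sharing predictions within one call only."""

    def __init__(self, **scorers):
        self._scorers = scorers

    def __call__(self, estimator, X):
        cache = {}  # local: discarded when this call returns

        def predict_once():
            if "predict" not in cache:
                cache["predict"] = estimator.predict(X)
            return cache["predict"]

        return {name: fn(predict_once()) for name, fn in self._scorers.items()}


est = RecordingEstimator()
scorer = ScopedCacheScorer(total=sum, largest=max)
first = scorer(est, [1, 2])
second = scorer(est, [3, 4])
print(first, second, est.seen)
```

Each call triggers exactly one `predict`, and the second call sees the new X rather than stale cached predictions.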

Member

jnothman left a comment

I find this design of a callable dict that generates a dict uncomfortable. That duplicate use of dicts makes the documentation confusing, apart from anything else.

Is it really justified to make _MultimetricScorer a dict? What functionality of a dict is used? I understand that this may reduce the amount of code here, but I suspect it makes it a little more obfuscated.

Member Author

thomasjpfan commented Aug 25, 2019

The dict feature was needed when _check_multimetric_scoring returned a _MultimetricScorer. Since this was removed, it is not needed anymore.

PR was updated such that _MultimetricScorer is not a dict.

Member

jnothman left a comment

Please add a whatsnew

Member

jnothman left a comment

Otherwise LGTM



def test_multimetric_scorer_sanity_check():
# scoring dictionary returned is the same as calling each scroer seperately

jnothman Aug 25, 2019

Member
Suggested change
# scoring dictionary returned is the same as calling each scroer seperately
# scoring dictionary returned is the same as calling each scorer separately
if not isinstance(score, numbers.Number):
raise ValueError(error_msg % (score, type(score), name))
scores[name] = score
else: # scaler

jnothman Aug 25, 2019

Member
Suggested change
else: # scaler
else: # scalar

error_msg = ("scoring must return a number, got %s (%s) "
"instead. (scorer=%s)")
if isinstance(scores, dict):

jnothman Aug 25, 2019

Member

This can return a number or a dict. Can we make all cases return a dict, and delete some code paths? we could just use:

if not isinstance(scores, dict):
    scores = {'score': scores}
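
One way this suggestion could look as a self-contained helper. `as_score_dict` is a hypothetical name, and the validation simply reuses the error message quoted from the diff above; it is a sketch of the idea, not code from the PR.

```python
import numbers


def as_score_dict(scores, default_name="score"):
    """Return scores as a dict, wrapping a bare number under one key."""
    if not isinstance(scores, dict):
        scores = {default_name: scores}
    # With a single return shape, one validation loop covers both cases.
    for name, score in scores.items():
        if not isinstance(score, numbers.Number):
            raise ValueError(
                "scoring must return a number, got %s (%s) instead. "
                "(scorer=%s)" % (score, type(score), name))
    return scores


print(as_score_dict(0.9))
print(as_score_dict({"acc": 0.9, "auc": 0.8}))
```

Callers then never need to branch on whether a number or a dict came back.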

jnothman Aug 26, 2019

Member

Okay, I've looked into this and it might be better to consider this as a separate clean-up change.

thomasjpfan Aug 26, 2019

Author Member

This type of change would reduce quite a few code paths. (It would most likely make it nicer to support custom callables that return dictionaries.)

@@ -257,6 +257,11 @@ Changelog
- |Enhancement| Allow computing averaged metrics in the case of no true positives.
:pr:`14595` by `Andreas Müller`_.

- |Enhancement| Improved performance of multimetric scoring in

jnothman Aug 25, 2019

Member

Can use |Efficiency|?


Member

amueller commented Aug 26, 2019

oh nice, this is even cleaner :) still lgtm from my side.

Member

amueller commented Aug 26, 2019

fixes #10823, closes #9326

Contributor

NicolasHug left a comment

@thomasjpfan please address minor typos so we can merge ;)

sklearn/metrics/scorer.py
amueller added this to PR phase in Andy's pets Sep 4, 2019
…c_no_dict
Member

amueller commented Sep 10, 2019

@thomasjpfan can you fix the merge conflicts again?
@jnothman does this still look good? I'd love to merge this.

…c_no_dict
@jnothman jnothman merged commit fbb2c7c into scikit-learn:master Sep 10, 2019
14 of 16 checks passed
ci/circleci: doc Your tests are queued behind your running builds
ci/circleci: doc-min-dependencies Your tests are queued behind your running builds
LGTM analysis: C/C++ No code changes detected
LGTM analysis: JavaScript No code changes detected
LGTM analysis: Python No new or fixed alerts
ci/circleci: lint Your tests passed on CircleCI!
codecov/patch 99.27% of diff hit (target 96.71%)
codecov/project 96.72% (+<.01%) compared to 66b0f5f
scikit-learn.scikit-learn Build #20190910.62 succeeded
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) Linux py35_ubuntu_atlas succeeded
scikit-learn.scikit-learn (Linux pylatest_pip_openblas_pandas) Linux pylatest_pip_openblas_pandas succeeded
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) Linux32 py35_ubuntu_atlas_32bit succeeded
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) Windows py35_pip_openblas_32bit succeeded
scikit-learn.scikit-learn (Windows py37_conda_mkl) Windows py37_conda_mkl succeeded
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) macOS pylatest_conda_mkl succeeded
Member

jnothman commented Sep 10, 2019

Thank you @thomasjpfan!!

I look forward to some of the things this enables along the lines of #12385

Member

amueller commented Sep 11, 2019

Awesome!
