Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Multimetric GridSearch - Memoize prediction results (and address some previous comments) #9326

Closed
wants to merge 8 commits into from

Conversation

@raghavrv
Copy link
Member

raghavrv commented Jul 11, 2017

Attempt to memoize the prediction result as currently it calls predict for each scorer.

Also address previous comments of @jnothman at #7388

cc: @amueller @jnothman @agramfort

@raghavrv

This comment has been minimized.

Copy link
Member Author

raghavrv commented Jul 16, 2017

Memoization done. Speedups are quite good when the prediction is time consuming.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, random_state=0)
gs = GridSearchCV(KNeighborsClassifier(),
                  param_grid={'n_neighbors': [4, 5, 6],
                              'p': [2, 3],
                              'algorithm': ['kd_tree', 'ball_tree']},
                  scoring=['accuracy', 'precision'], refit='accuracy')

Master

>>> %timeit gs.fit(X, y)
1 loop, best of 3: 31.5 s per loop

this branch

>>> %timeit gs.fit(X, y)
1 loop, best of 3: 15.6 s per loop
def _multimetric_score(estimator, X_test, y_test, scorers):
"""Return a dict of score for multimetric scoring"""
scores = {}

# Try wrapping the estimator in _MemoizedPredictEstimator if we don't use
# the pass-through scorer
uses_score_method = any([scorer is _passthrough_scorer

This comment has been minimized.

Copy link
@jnothman

jnothman Jul 16, 2017

Member

Why do we not do it in the passthrough case?

This comment has been minimized.

Copy link
@raghavrv

raghavrv Jul 18, 2017

Author Member

For passthrough it uses the score method, which can't be (trivially) memoized.

This comment has been minimized.

Copy link
@jnothman

jnothman Jul 19, 2017

Member

Yes, but is there any harm? Why is this any, not all?

This comment has been minimized.

Copy link
@raghavrv

raghavrv Jul 19, 2017

Author Member

Oh yea if the score function is going to be used, then it can be wrapped by this _MemoizedPredictEstimator safely.

if not hasattr(self, '_decisions'):
self._decisions = self.estimator.decision_function(X)
return self._decisions

This comment has been minimized.

Copy link
@jnothman

jnothman Jul 16, 2017

Member

No predict_proba or predict_log_proba? Are you these call decidion_function internally? Probabilistic non-linear predictors will be among those most benefiting from memoization, and I don't think they tend to implement decision_function.

Copy link
Member

jnothman left a comment

Well your memoized predictor then needs to delegate score. In fact, to be compatible with every scorer someone might conceive of, it needs to delegate everything with __getattr__ magic.

In a similar vein you will need to support kwargs like predict_std in delegating.

@amueller amueller added this to PR phase in Andy's pets Jul 21, 2017
@amueller

This comment has been minimized.

Copy link
Member

amueller commented Jul 21, 2017

If there are any unrelated things inhere as you indicate, please put them in a separate PR.

@raghavrv

This comment has been minimized.

Copy link
Member Author

raghavrv commented Jul 24, 2017

@amueller This is ready for merge as such. No unrelated stuff. Just a few comments from last PR addressed.

@raghavrv raghavrv changed the title [WIP] Multimetric GridSearch - Memoize prediction results (and address some previous comments) [MRG] Multimetric GridSearch - Memoize prediction results (and address some previous comments) Jul 24, 2017
@@ -242,14 +242,14 @@ permitted and will require a wrapper to return a single metric::
>>> # A sample toy binary classification dataset
>>> X, y = datasets.make_classification(n_classes=2, random_state=0)
>>> svm = LinearSVC(random_state=0)
>>> tp = lambda y_true, y_pred: confusion_matrix(y_true, y_pred)[0, 0]

This comment has been minimized.

Copy link
@jnothman

jnothman Jul 24, 2017

Member

I think the point was that if you want to change this for 0.19, make it a separate PR (or just commit to master).

The multi-metric stuff won't go into 0.19 as features are frozen.

This comment has been minimized.

Copy link
@raghavrv

raghavrv Jul 24, 2017

Author Member

But this need not go into 0.19. We can release this for 0.20?

This comment has been minimized.

Copy link
@jnothman

jnothman Jul 24, 2017

Member

Sure, but isn't it unrelated to the rest of the pr?

This comment has been minimized.

Copy link
@raghavrv

raghavrv Jul 24, 2017

Author Member

I'll raise as separate PR then.

@raghavrv

This comment has been minimized.

Copy link
Member Author

raghavrv commented Jul 24, 2017

In fact, to be compatible with every scorer someone might conceive of, it needs to delegate everything with getattr magic. In a similar vein you will need to support kwargs like predict_std in delegating.

this is in addition to memoizing the results for predict_proba, predict_log_proba, predict, decision_function correct?

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Jul 24, 2017

I suppose so!

@amueller amueller added the Superseded label Aug 5, 2019
@amueller

This comment has been minimized.

Copy link
Member

amueller commented Aug 5, 2019

referencing #10802 for posterity

@amueller

This comment has been minimized.

Copy link
Member

amueller commented Sep 11, 2019

fixed in #14593.

@amueller amueller closed this Sep 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Andy's pets
PR phase
3 participants
You can’t perform that action at this time.