Make it possible to pickle a fit `GridSearchCV` and `RandomizedSearchCV` #1801

wants to merge 2 commits into


None yet
4 participants

jnothman commented Mar 22, 2013

Fix for #1789:

  • don't make best_estimator_'s methods available by copying its methods. Instead link to its attributes by properties (also, added properties for decision_function and transform where predict, predict_proba and score had been available).
  • and don't declare CVScoresTuple in a closure.

jnothman added some commits Mar 22, 2013

ENH/FIX make best_estimator_'s predict functions available in paramet…
…er search

Avoids copying an unpicklable method to parameter search's __dict__.

Also adds decision_function and transform where only predict and predict_proba were available before.
FIX make *SearchCV picklable
Define CVScoreTuple in the module namespace.

amueller commented Mar 22, 2013

Great, I think this is the right solution. Not sure we need the underscore in front of the CVScoresTuple, though. Could you please make the test not only smoke tests? For example unpickle and see if the results of predict are the same.
Also you can assert that the results of transform and decision_function are the same as the ones of best_estimator_. Which estimator did you test transform with? That only works with LDA ,right?

One downside of your solution is that it is no longer possible to ducktype the presence of decision_function (for example). Not sure this would be possible, though. If an base estimator doesn't provide a given function, an AttributeError is rased, correct?


jnothman commented Mar 23, 2013

I'll add some non-smoke testing of the pickle soon.

I'm only smoke-testing transform, etc. with test_grid_search.MockClassifier. And estimators with L1 regularisation also implement transform.

To answer your second point (I hadn't been sure of the answer myself):

class O(object):
  def a(self):
    return self.b
>>> o = O()
>>> hasattr(o, 'a')
>>> o.b = 5
>>> hasattr(o, 'a')

I.e. if AttributeError is raised in property.__get__, hasattr will return False.

This too should be tested somehow, but I'm not sure if it's for this patch, or generally an issue for meta-classifiers and functions only available after fit().

This suggests that anything relying on best_estimator_ is best implemented as a property (including score)!


jnothman commented Mar 23, 2013

Actually it looks like GridSearchCV.score is broken: it uses a deprecated definition of scorer, and only when the classifier doesn't have a score function. Is that correct, or useful for any particular purpose?

And rethinking, I'm not really sure how/why to test pickling: isn't it enough to assume that if pickle works and is tested, as long as no error occurs it's fine?


larsmans commented Mar 27, 2013

LGTM, 👍 for merge.

(@GaelVaroquaux might like to give his opinion too, since this involves properties...)


GaelVaroquaux commented Mar 27, 2013

(@GaelVaroquaux might like to give his opinion too, since this involves

Valid usecase for properties: adaption pattern, which is what they are
for IMHO.


larsmans commented Mar 27, 2013

@amueller Merge?


amueller commented Mar 31, 2013

Sorry, have been offline for week as you might have noticed. LGTM.

@jnothman If the runtime of a real test is basically the same as the smoke test, I don't see why we shouldn't do it. No idea what could break if you don't test for it ;)


amueller commented Apr 2, 2013

Merged by rebase. Thanks :)

@amueller amueller closed this Apr 2, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment