grid search keeps all estimators in memory #565
+1, or maybe even having an option […]
Ideally, `Parallel` would be an iterator so that we could consume it on the […] I am wondering if returning the `clf_params`, rather than the classifier […]

Gael
I think we can fix the problem by changing the order of the loops:

```
for param in params:
    for train, test in cv:
```

instead of

```
for train, test in cv:
    for param in params:
```

This way, we can compute the score expectation with respect to a given parameter setting and throw away the estimator right away if the score expectation is worse than the best so far. We need to be able to restart the cv iterator, though.
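A rough sketch of the reordered loops in plain Python, assuming hypothetical `fit`/`score` helpers (this is not scikit-learn's actual API): each estimator is discarded as soon as its fold score is recorded, so only scores and the best parameters stay in memory.

```python
def grid_search(params_grid, cv_folds, fit, score):
    """Outer loop over parameter settings, inner loop over CV folds.

    `cv_folds` must be re-iterable (e.g. a list of (train, test) pairs),
    which is the "restartable cv iterator" requirement mentioned above.
    """
    best_score, best_params = float("-inf"), None
    for params in params_grid:          # outer loop: one parameter setting
        fold_scores = []
        for train, test in cv_folds:    # inner loop: all CV folds for it
            est = fit(params, train)    # estimator lives only in this scope
            fold_scores.append(score(est, test))
            del est                     # drop it immediately; keep the score
        mean = sum(fold_scores) / len(fold_scores)
        if mean > best_score:           # keep only the running best
            best_score, best_params = mean, params
    return best_params, best_score
```

With this ordering, at most one fitted estimator exists at any time, instead of one per (parameter, fold) pair.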
On Tue, Jan 17, 2012 at 08:44:59AM -0800, Mathieu Blondel wrote:
Actually, we are doing neither: the two loops are sent to the Parallel […]

Gaël
Returning just […]
I think it still would make sense: if we are interested in the grid_search […] I am +1 for such a modification, but I think that it should be discussed on […]

Gael
+1 for the lightweight params-only mode.
Also +1 for keeping parameters and scores. Maybe there was a reason why the score was not saved previously?
I had an old branch that kept the scores of each CV fold of the params grid, which is interesting to be able to estimate the variances. However, that old branch had other unrelated stuff and I did not maintain it.
----- Original message -----
I think that that part has been merged. The rest hasn't.
Apparently (thanks @amueller), keeping the estimators also affects pickling GridSearchCV. See the mailing list post and below:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/<ipython-input-7-687becba8c7c> in <module>()
----> 1 cPickle.dump(gs, te)

/opt/epd-7.2/lib/python2.7/copy_reg.pyc in _reduce_ex(self, proto)
     68         else:
     69             if base is self.__class__:
---> 70                 raise TypeError, "can't pickle %s objects" % base.__name__
     71             state = base(self)
     72         args = (self.__class__, base, state)

TypeError: can't pickle instancemethod objects
```

The method causing the problem:

```
> /opt/epd-7.2/lib/python2.7/copy_reg.py(70)_reduce_ex()
     69             if base is self.__class__:
---> 70                 raise TypeError, "can't pickle %s objects" % base.__name__
     71             state = base(self)

ipdb> self
<bound method LinearSVC.predict of LinearSVC(C=1, dual=True,
    fit_intercept=True, intercept_scaling=1, loss='l2',
    multi_class=False, penalty='l2', tol=0.0001)>
```

and some info about the GridSearchCV itself:

```
print GridSearchCV
GridSearchCV(cv=sklearn.cross_validation.StratifiedKFold(labels=[ 1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.], k=10),
       estimator=LinearSVC(C=1.0, dual=True, fit_intercept=True,
                           intercept_scaling=1,
                           loss='l2', multi_class=False, penalty='l2',
                           tol=0.0001),
       fit_params={}, iid=True, loss_func=None, n_jobs=4,
       param_grid={'C': [1, 10, 100, 1000, 10000, 100000]},
       pre_dispatch='2*n_jobs', refit=True, score_func=None, verbose=1)
```
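One way to sidestep both the memory and the pickling problem is to store only plain data per grid point. A minimal sketch, using a hypothetical `LeanResults` container (not scikit-learn API): tuples of parameters and scores always pickle cleanly, unlike fitted estimators, which can hold unpicklable members such as bound methods (as in the Python 2 traceback above).

```python
import pickle

class LeanResults:
    """Keeps only picklable plain data per grid point,
    instead of whole fitted estimators."""
    def __init__(self):
        self.grid_scores = []          # list of (params, mean_score) tuples
    def add(self, params, mean_score):
        # copy the params dict so later mutation cannot corrupt the record
        self.grid_scores.append((dict(params), float(mean_score)))

res = LeanResults()
res.add({"C": 1}, 0.8)
res.add({"C": 10}, 0.9)
blob = pickle.dumps(res)               # plain data round-trips cleanly
restored = pickle.loads(blob)
```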
Addressed in #770.
I mentioned this issue on the ML a while back, but since it hasn't been addressed yet, I'm creating this report to track it.

Basically, grid search uses `Parallel` to compute `out`, which is a list of triplets `(score, estimator, n_test_samples)`. Unfortunately, some estimators take a lot of space in memory (e.g., SVCs, because the support vectors are copied in each estimator). It would be better to return only the best estimator directly and delete the other ones as they are computed.
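The suggestion above can be sketched in plain Python (the helper name `best_of` is hypothetical, not scikit-learn API): keep a running best while consuming the triplets, so every non-best estimator becomes garbage-collectable as soon as it is scored, rather than all of them being held in a list.

```python
def best_of(fits):
    """Consume an iterable of (score, estimator, n_test_samples) triplets,
    keeping only the highest-scoring one in memory at any time."""
    best = None
    for score, estimator, n_test in fits:
        if best is None or score > best[0]:
            best = (score, estimator, n_test)
        # the non-best estimator goes out of scope here and can be freed
    return best
```

For this to actually bound memory, the producer (e.g. a parallel dispatcher) would need to yield results lazily instead of materializing the full list first, which is the iterator behaviour requested for `Parallel` earlier in the thread.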