[MRG+2] Deprecate n_iter in SGDClassifier and implement max_iter #5036
About your remark in #5022: you suggested that we could avoid deprecating n_iter.
I am +0 for this convenience feature. @amueller do you have an opinion in this regard?
I thought of it as a deprecation, a temporary way not to break any user code before removing it completely.
Do you mean to give a validation set with a performance goal as a stopping criterion, directly in the SGD solver?
Classic deprecation is what you did in this PR: raise a DeprecationWarning now while still behaving the same if the user is explicitly passing an n_iter value.
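For illustration, that pattern boils down to something like this (a hedged sketch: the helper name _resolve_max_iter and the default fallback are invented for the example, only the DeprecationWarning-on-n_iter behaviour comes from the discussion):

import warnings

def _resolve_max_iter(n_iter=None, max_iter=None, default_max_iter=5):
    # keep the old behaviour when the user explicitly passes n_iter,
    # but point them to max_iter with a DeprecationWarning
    if n_iter is not None:
        warnings.warn("n_iter is deprecated in favour of max_iter and will "
                      "be removed in a future release.", DeprecationWarning)
        return n_iter
    return default_max_iter if max_iter is None else max_iter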
Yes, early stopping on the lack of improvement as measured on a validation set. The validation set can be specified as a number between 0 and 1 (typically 0.1 by default) and the model extracts it internally in the fit method by randomly splitting the user-provided data into train and validation folds. But this is outside the scope of this PR.
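For context only, here is a rough sketch of how such validation-based early stopping could be emulated on top of partial_fit; the validation_fraction and n_iter_no_change names and the helper itself are illustrative, not part of this PR:

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

def fit_with_early_stopping(X, y, validation_fraction=0.1, max_iter=1000,
                            tol=1e-3, n_iter_no_change=5, random_state=0):
    # split off the validation fold internally, as described above
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=validation_fraction, random_state=random_state)
    clf = SGDClassifier(random_state=random_state)
    classes = np.unique(y)
    best_score, stale = -np.inf, 0
    for epoch in range(max_iter):
        clf.partial_fit(X_train, y_train, classes=classes)
        score = clf.score(X_val, y_val)
        if score > best_score + tol:
            best_score, stale = score, 0
        else:
            stale += 1
        if stale >= n_iter_no_change:
            break
    return clf, epoch + 1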
The handling of t_ looks suspicious to me in the following example:
>>> from sklearn.datasets import load_boston
>>> from sklearn.linear_model import SGDRegressor
>>> from sklearn.utils import gen_batches
>>> boston = load_boston()
>>> n_samples, n_features = boston.data.shape
>>> n_samples, n_features
(506, 13)
>>> all_batches = list(gen_batches(n_samples, 100))
>>> m = SGDRegressor(max_iter=2)
>>> for batch in all_batches:
...     m.fit(boston.data[batch], boston.target[batch])
...
>>> m.t_
13.0
In particular, calling:
m = SGDRegressor(max_iter=1, tol=0, shuffle=False, random_state=0)
m.fit(boston.data, boston.target)
should be equivalent (same t_ and coef_) to:
# max_iter should not impact incremental fitting at all
m = SGDRegressor(max_iter=42, shuffle=False, random_state=0)
for batch in all_batches:
    m.partial_fit(boston.data[batch], boston.target[batch])
Similarly,

m = SGDRegressor(max_iter=10, tol=0, shuffle=False, random_state=0)
m.fit(boston.data, boston.target)
should be equivalent to:
m = SGDRegressor(max_iter=42, shuffle=False, random_state=0)
for i in range(10):
    for batch in all_batches:
        m.partial_fit(boston.data[batch], boston.target[batch])
The fact that the tests do not fail means that this equivalence is probably not covered by the current tests.
Edit: solved by scaling the data (#5036 (comment))
Thanks for the review.
About your last comment: it comes from the fact that SGD converges quite slowly on the Boston dataset, as you can see in the following plot (tested on master):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
from sklearn.linear_model import SGDRegressor

boston = load_boston()
X, y = boston.data, boston.target
n_features = X.shape[1]

iter_range = np.arange(1, 11) * 1000
coefs = np.zeros((n_features, iter_range.size))
for i, n_iter in enumerate(iter_range):
    reg = SGDRegressor(n_iter=n_iter).fit(X, y)
    coefs[:, i] = reg.coef_

for i in range(n_features):
    plt.plot(iter_range, coefs[i, :])
plt.xlabel("n_iter")
plt.ylabel("coefs")
plt.show()
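For comparison, and in line with the "solved by scaling the data" edit above, the same coefficient paths settle much faster once the features are standardized. This is only an illustrative variant of the snippet above, reusing its X, y, iter_range, coefs and n_features:

from sklearn.preprocessing import scale

# same experiment as above, but on standardized features
X_scaled = scale(X)
for i, n_iter in enumerate(iter_range):
    coefs[:, i] = SGDRegressor(n_iter=n_iter).fit(X_scaled, y).coef_

for i in range(n_features):
    plt.plot(iter_range, coefs[i, :])
plt.xlabel("n_iter (scaled features)")
plt.ylabel("coefs")
plt.show()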
I changed the default
Actually I think the code is OK regarding t_: each call to fit resets the counter, and the last batch only contains 6 samples, so t_ == 2 * 6 + 1 == 13 after the two epochs:
>>> all_batches
[slice(0, 100, None), slice(100, 200, None), slice(200, 300, None),
 slice(300, 400, None), slice(400, 500, None), slice(500, 506, None)]
>>> for batch in all_batches:
...     m.fit(boston.data[batch], boston.target[batch])
...
>>> m.t_
13.0
from sklearn.datasets import load_boston
from sklearn.linear_model import SGDRegressor
from sklearn.utils import gen_batches

boston = load_boston()
n_samples, n_features = boston.data.shape
all_batches = list(gen_batches(n_samples, 100))

for max_iter in range(1, 11):
    # one full pass with fit
    m1 = SGDRegressor(max_iter=max_iter, tol=0, shuffle=False, random_state=0)
    m1.fit(boston.data, boston.target)

    # batches with partial_fit
    m2 = SGDRegressor(max_iter=42, shuffle=False, random_state=0)
    for _ in range(max_iter):
        for batch in all_batches:
            m2.partial_fit(boston.data[batch], boston.target[batch])

    print(m1.t_, m1.coef_)
    print(m2.t_, m2.coef_)
Indeed, adding a scaling step on the data fixes it.
The scaling problem is not really linked to this PR, though.
I think this PR is OK, unless I missed your point @ogrisel
Default values are now
Feel free to review :)
I don't see any validation that tol is working (large tol should yield smaller n_iter_ than small tol), that it validates correctly, or that the warnings are raised appropriately.
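For example, tests along those lines might look like this (an illustrative sketch, not the tests that ended up in the PR):

import pytest
from sklearn.datasets import load_iris
from sklearn.preprocessing import scale
from sklearn.linear_model import SGDClassifier

def test_large_tol_stops_earlier():
    iris = load_iris()
    X, y = scale(iris.data), iris.target != 0
    loose = SGDClassifier(max_iter=1000, tol=1e-1, random_state=0).fit(X, y)
    tight = SGDClassifier(max_iter=1000, tol=1e-4, random_state=0).fit(X, y)
    # a larger tol should trigger the stopping criterion sooner
    assert loose.n_iter_ <= tight.n_iter_

def test_n_iter_raises_deprecation_warning():
    iris = load_iris()
    with pytest.warns(DeprecationWarning):
        SGDClassifier(n_iter=5).fit(iris.data, iris.target)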
I have the following comments. In particular I find the use of -np.inf and negative tolerance disturbing. Do you really think negative tolerance is a useful feature? If not, I would enforce positivity and use 0 to disable convergence checks.
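Concretely, the suggested check could be as small as this (a sketch of the suggestion, with a hypothetical helper name):

def _check_tol(tol):
    # enforce a non-negative tolerance; tol == 0 disables the
    # convergence check instead of using -np.inf as a sentinel
    if tol < 0:
        raise ValueError("tol must be >= 0, got %r." % tol)
    return tol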
It seems that something is broken when I try it on this binary version of iris:
from sklearn.datasets import load_iris
from sklearn.preprocessing import scale
from sklearn.linear_model import SGDClassifier

iris = load_iris()
X, y = scale(iris.data), iris.target
y = y != 0
m = SGDClassifier(max_iter=1000, tol=1e-3, verbose=1).fit(X, y)
print('effective n_iter:', m.n_iter_)
I get the following output:
I would have expected a larger number of iterations because the loss had decreased by more than tol between the last two epochs.
I tweaked the tolerance to no avail.
Good point, the loss accumulator was not reset after each epoch, and was not scaled by the number of samples.
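In other words, the stopping check is meant to work on a per-epoch average of the loss, roughly as in this sketch (it assumes the accumulator is reset and divided by n_samples every epoch; the real loop lives in the Cython SGD code):

import numpy as np

def epochs_until_stop(epoch_avg_losses, tol, max_iter):
    # epoch_avg_losses: average training loss of each epoch, i.e. the
    # accumulator reset at the start of the epoch and scaled by n_samples
    best_loss = np.inf
    for epoch, avg_loss in enumerate(epoch_avg_losses[:max_iter]):
        if tol > 0 and avg_loss > best_loss - tol:
            return epoch + 1  # stopped by the tol criterion
        best_loss = min(best_loss, avg_loss)
    return min(len(epoch_avg_losses), max_iter)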
well done! it's been a long haul!…
On 24 Jun 2017 5:49 am, "Olivier Grisel" ***@***.***> wrote: Merged