[MRG+1] MLPRegressor quits fitting too soon due to self._no_improvement_count #9457
Conversation
In the issue you also express concern that current performance is being
compared to the best, not the most recent. Is that something you hope to
address?
In #7071 we are considering the parameter name n_iter_no_change for
detecting convergence in gradient boosting. (I'm not sure if this is better
or worse than no_improvement_count.) If the functionality is similar we
should adopt the same parameter name. It will stop optimisation early only
if the current performance is not better than any of the previous
n_iter_no_change evaluations.
Otherwise, please add a test.
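The stopping rule described above, where optimisation halts only when the current evaluation is not better than any of the previous `n_iter_no_change` ones, can be sketched in plain Python. This is a simplified illustration of the semantics, not scikit-learn's actual implementation; the function name and the default `tol` are assumptions for the sketch (higher score = better):

```python
def should_stop(scores, n_iter_no_change=10, tol=1e-4):
    """Return True when the latest score improves on none of the
    previous n_iter_no_change scores by more than tol."""
    if len(scores) <= n_iter_no_change:
        return False  # not enough history yet
    latest = scores[-1]
    previous = scores[-(n_iter_no_change + 1):-1]
    return all(latest < prev + tol for prev in previous)
```

With a flat score history of length 11 this returns True; while scores are still improving it returns False.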
Please add documentation and tests, and adhere to PEP8.
@jnothman, regarding the "MLPRegressor quits fitting too soon due to self._no_improvement_count" issue:
You should activate flake8 integration in your editor ;)
Given that 2 is quite low, should we change the default? That would change behavior, though probably not in a bad way. And I don't think we can guarantee that MLP optimization will stay unchanged between versions anyhow.
PyCharm didn't catch the doctest flake8 errors :( Otherwise, yes, 2 is quite low for a default (1% of the default `max_iter` of 200). Regardless, having the limit exposed as a parameter is an improvement.
@nnadeau if you don't want early stopping, you can also just do `n_iter_no_change=np.inf`.
Looks good but I'm not sure how to set the default. Maybe @ogrisel has an opinion?
@@ -536,15 +538,17 @@ def _fit_stochastic(self, X, y, activations, deltas, coef_grads,
             # for learning rate that needs to be updated at iteration end
             self._optimizer.iteration_ends(self.t_)

-            if self._no_improvement_count > 2:
+            if self._no_improvement_count > self.n_iter_no_change:
                 # not better than last two iterations by tol.
Tiny nitpick: this comment is out of date.
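For context, the counter logic the diff above modifies can be sketched as a stand-alone function. This is a simplified toy version, not scikit-learn's actual code; it also reflects the concern raised earlier that the current loss is compared against the best loss seen so far, not the most recent one (lower loss = better):

```python
def fit_loop(losses, n_iter_no_change=10, tol=1e-4):
    """Return the number of epochs run before early stopping triggers.

    The counter increments whenever the loss fails to improve on the
    best loss seen so far by more than tol, and resets otherwise.
    """
    best_loss = float("inf")
    no_improvement_count = 0
    for it, loss in enumerate(losses, start=1):
        if loss > best_loss - tol:
            no_improvement_count += 1
        else:
            no_improvement_count = 0
        if loss < best_loss:
            best_loss = loss
        if no_improvement_count > n_iter_no_change:
            return it  # too many epochs without a tol improvement
    return len(losses)
```

On a loss curve that decreases and then plateaus, a small `n_iter_no_change` stops training a few epochs into the plateau, while a larger value waits proportionally longer.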
@amueller, my issue arises with the default settings.
@nnadeau oh wow... right... never mind. Then I think we should definitely do 5 or 10.
@@ -1200,6 +1209,10 @@ class MLPRegressor(BaseMultilayerPerceptron, RegressorMixin):
     epsilon : float, optional, default 1e-8
         Value for numerical stability in adam. Only used when solver='adam'

+    n_iter_no_change : int, optional, default 2
The default documented is wrong (as the default has changed). Can you update this?
Almost good for me.

- Can you please add an entry in `doc/whats_new/v0.20.rst`?
- Can you add a parameter check in `_validate_hyperparameters` and a test in `test_params_errors` to make sure the check is present?
     assert_greater(max_iter, clf.n_iter_)

+def test_n_iter_no_change_inf():
Please add `@ignore_warnings(category=ConvergenceWarning)` to ignore the warning as `max_iter` is reached.
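The decorator suggested above suppresses a specific warning category inside a test. The same effect can be shown with the standard library's `warnings` module; this sketch defines a local stand-in for scikit-learn's `ConvergenceWarning` so it runs without scikit-learn installed, and the function names are illustrative only:

```python
import warnings

class ConvergenceWarning(UserWarning):
    """Local stand-in for sklearn.exceptions.ConvergenceWarning."""

def noisy_fit():
    # Emulates an estimator that hits max_iter before converging.
    warnings.warn("Maximum iterations reached", ConvergenceWarning)
    return "fitted"

def test_fit_silenced():
    # Silence only the convergence warning, as the review suggests.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)
        assert noisy_fit() == "fitted"

test_fit_silenced()
```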
LGTM, but I just want to stress that this will break users' code. So either:

- we consider it as a bug fix, or
- we keep the previous behavior (`n_iter_no_change=2`) and warn of a future change in the default value.

I am not a fan of future warnings, but I wonder if it was really a bug...
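The second option listed above (keep the old default but warn) typically uses a sentinel default so the warning only fires when the user relies on the implicit value. This is a generic sketch of that deprecation pattern, not code from this PR; the function name is hypothetical:

```python
import warnings

def make_estimator(n_iter_no_change="warn"):
    # Sentinel default lets us detect when the user did not set the
    # value explicitly.
    if n_iter_no_change == "warn":
        warnings.warn(
            "The default value of n_iter_no_change will change from "
            "2 to 10 in a future release. Set it explicitly to "
            "silence this warning.",
            FutureWarning,
        )
        n_iter_no_change = 2  # keep the old behavior for now
    return n_iter_no_change
```

Users who pass the parameter explicitly see no warning, which is why this pattern is considered less disruptive than silently changing the default.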
@TomDLT, good point. Considering we will now have an explicit `n_iter_no_change` parameter, I'd vote to consider this a bug fix.
This reverts commit 0c1f5b6.
I think it would be ok considering it a bugfix and going to 10. The heuristic doesn't really work that well with 2. And yes, we're changing results, but having the weights learned by neural networks be stable over releases is not really something I want to promise...
LGTM apart from nitpicks.
doc/whats_new/v0.20.rst
Outdated
    :class:`multilayer_perceptron.BaseMultilayerPerceptron`,
    :class:`multilayer_perceptron.MLPRegressor`, and
    :class:`multilayer_perceptron.MLPClassifier` to give control over
    maximum number of epochs to not meet `tol` improvement.
Double backticks for ``tol``.
Should be `neural_network` everywhere instead of `multilayer_perceptron`.
doc/whats_new/v0.20.rst
Outdated
@@ -96,6 +103,16 @@ Classifiers and regressors
     identical X values.
     :issue:`9432` by :user:`Dallas Card <dallascard>`

+- Fixed a bug in :class:`multilayer_perceptron.BaseMultilayerPerceptron`,
+  :class:`multilayer_perceptron.MLPRegressor`, and
+  :class:`multilayer_perceptron.MLPClassifier` with new `n_iter_no_change`
Double backticks for ``n_iter_no_change``.
-    by at least tol for two consecutive iterations, unless `learning_rate`
-    is set to 'adaptive', convergence is considered to be reached and
-    training stops.
+    by at least tol for `n_iter_no_change` consecutive iterations, unless
Double backticks or no backticks?
@@ -1201,6 +1215,12 @@ class MLPRegressor(BaseMultilayerPerceptron, RegressorMixin):
     epsilon : float, optional, default 1e-8
         Value for numerical stability in adam. Only used when solver='adam'

+    n_iter_no_change : int, optional, default 10
+        Maximum number of epochs to not meet `tol` improvement.
double backticks
Final remark: as we changed the default value to 10, we could mention this behavior change in the documentation.
Thanks @nnadeau !
@TomDLT thanks to you! Keep up the great work everyone!
Reference Issue
MLPRegressor quits fitting too soon due to `self._no_improvement_count` (#9456)

What does this implement/fix? Explain your changes.
Disables the hard limit on `self._no_improvement_count` by setting `self.no_improvement_limit` to `np.inf`. `MLPRegressor` will not quit fitting unexpectedly early due to local minima or fluctuations.

Any other comments?
`self.no_improvement_limit` was not set as an `__init__` argument since this might cause unknown feature breaks.
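The reason setting the limit to `np.inf` disables early stopping is that no finite counter value can ever exceed it. A minimal stand-alone illustration, using `float("inf")` in place of `np.inf` (the comparison behaves identically) and a toy loop in which the loss never improves:

```python
no_improvement_limit = float("inf")  # stands in for np.inf

no_improvement_count = 0
stopped_early = False
for epoch in range(1000):  # toy loop: the loss never improves
    no_improvement_count += 1
    # `count > inf` is False for every finite count, so the
    # early-stopping branch can never trigger.
    if no_improvement_count > no_improvement_limit:
        stopped_early = True
        break
```

The loop always runs to completion, which is exactly the "never quit early" behavior the PR description claims.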