
[MRG+1] MLPRegressor quits fitting too soon due to self._no_improvement_count #9457

Merged
Merged 30 commits into scikit-learn:master on Oct 29, 2017

Conversation

engnadeau
Contributor

@engnadeau engnadeau commented Jul 27, 2017

Reference Issue

What does this implement/fix? Explain your changes.

  • The ability to set/tune the limit of self._no_improvement_count.
  • The ability to ignore self._no_improvement_count by setting self.no_improvement_limit to np.inf (see the usage sketch below).
  • MLPRegressor will not quit fitting unexpectedly early due to local minima or fluctuations.
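
A minimal usage sketch of the new hyperparameter (assuming scikit-learn 0.20+, where the parameter merged here is exposed as n_iter_no_change; the toy data and values are illustrative only):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = X.sum(axis=1)

# Training now stops early only after `n_iter_no_change` consecutive epochs
# fail to improve the loss by at least `tol`, instead of a hard-coded 2.
reg = MLPRegressor(hidden_layer_sizes=(10,), max_iter=200, tol=1e-4,
                   n_iter_no_change=10, random_state=0)
reg.fit(X, y)
print(reg.n_iter_)  # number of epochs actually run
```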

Any other comments?

  • self.no_improvement_limit was not set as an __init__ argument since this might cause unknown feature breaks.
  • A magic number was removed from the code.
  • Thanks for the awesome package and hard work! :)

@engnadeau engnadeau changed the title [MRG] MLPRegressor quits fitting too soon due to self._no_improvement_count [WIP] MLPRegressor quits fitting too soon due to self._no_improvement_count Jul 27, 2017
@jnothman
Member

jnothman commented Jul 27, 2017 via email

Member

@jnothman jnothman left a comment


Please add documentation and tests, and adhere to PEP8.

@engnadeau
Contributor Author

@jnothman, regarding the recent vs. best performance comparison, I believe this should be discussed/addressed in a separate (but related) PR. For now, the small change of allowing n_iter_no_change to be modified is enough.

@engnadeau engnadeau changed the title [WIP] MLPRegressor quits fitting too soon due to self._no_improvement_count [MRG] MLPRegressor quits fitting too soon due to self._no_improvement_count Aug 2, 2017
@engnadeau engnadeau changed the title [MRG] MLPRegressor quits fitting too soon due to self._no_improvement_count [WIP] MLPRegressor quits fitting too soon due to self._no_improvement_count Aug 3, 2017
@engnadeau engnadeau changed the title [WIP] MLPRegressor quits fitting too soon due to self._no_improvement_count [MRG] MLPRegressor quits fitting too soon due to self._no_improvement_count Aug 4, 2017
@amueller
Member

amueller commented Aug 4, 2017

you should activate flake8 integration in your editor ;)

@amueller
Member

amueller commented Aug 4, 2017

Given that 2 is quite low, should we change the default? That would change behavior, though probably not in a bad way. And I don't think we can guarantee that MLP optimization will stay unchanged between versions anyhow.

@engnadeau
Contributor Author

PyCharm didn't catch the doctest flake8 errors :(

Otherwise, yes, 2 is quite low for a default (1% of the default max_iter=200). I would personally bump the default value to 5 or 10 (average PC performance should be considered).

Regardless, making the n_iter_no_change hyperparameter settable, specifically allowing np.inf, lets the MLP behave like Keras, where the KerasRegressor runs until epochs is reached (see the sketch below).
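
A short sketch of that np.inf behavior (assuming a scikit-learn version that accepts np.inf for n_iter_no_change, as exercised by the test added in this PR; data and values are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X, y = rng.rand(50, 3), rng.rand(50)

# With an infinite "patience" the no-improvement check can never trigger,
# so the solver runs for the full max_iter epochs (a ConvergenceWarning is
# expected), much like a KerasRegressor trained for a fixed number of epochs.
reg = MLPRegressor(max_iter=50, n_iter_no_change=np.inf, random_state=0)
reg.fit(X, y)
print(reg.n_iter_)  # 50: every epoch was run
```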

@engnadeau
Contributor Author

@amueller @jnothman, anything else you'd like me to look over in this PR?

@amueller
Member

@nnadeau if you don't want early stopping, you can also just do early_stopping=False, which is the default. Why would you set early_stopping=True and n_iter_no_change=np.inf? That just makes your training set smaller, right? I guess you just want to monitor validation set error?

@amueller
Member

looks good but I'm not sure how to set the default. Maybe @ogrisel has an opinion?

@@ -536,15 +538,17 @@ def _fit_stochastic(self, X, y, activations, deltas, coef_grads,
# for learning rate that needs to be updated at iteration end
self._optimizer.iteration_ends(self.t_)

-if self._no_improvement_count > 2:
+if self._no_improvement_count > self.n_iter_no_change:
# not better than last two iterations by tol.
Member


Tiny nitpick: this comment is out of date.

@engnadeau
Contributor Author

@amueller, my issue arises with the default early_stopping=False.

early_stopping=False does not change the behaviour of _update_no_improvement_count() and self._no_improvement_count. The real issue is the if self._no_improvement_count > 2 check, which completely disregards early_stopping (see the simplified illustration below).
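
A simplified, standalone illustration of the problem (this is not scikit-learn's actual code, just a sketch of the loss-based stopping path used when early_stopping=False):

```python
def fit_loop(losses, tol=1e-3, n_iter_no_change=2):
    """Return the epoch at which training stops for a given loss curve."""
    best_loss = float("inf")
    no_improvement_count = 0
    for epoch, loss in enumerate(losses, start=1):
        # count epochs that do not improve the best loss by at least tol
        if loss > best_loss - tol:
            no_improvement_count += 1
        else:
            no_improvement_count = 0
        best_loss = min(best_loss, loss)
        # this check fires regardless of early_stopping
        if no_improvement_count > n_iter_no_change:
            return epoch
    return len(losses)

# a loss curve with a brief plateau before a large later improvement
losses = [1.0, 0.9, 0.85, 0.8499, 0.8498, 0.8497, 0.5, 0.4]
print(fit_loop(losses, n_iter_no_change=2))   # 6: quits on the plateau
print(fit_loop(losses, n_iter_no_change=10))  # 8: survives the plateau
```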

@amueller
Member

@nnadeau oh wow... right... never mind. Then I think we should definitely do 5 or 10.

@engnadeau
Contributor Author

@NelleV see 4ae5e2f for touch-up :)

@engnadeau
Contributor Author

@amueller see 6a04585 for the new default value of 10

@@ -1200,6 +1209,10 @@ class MLPRegressor(BaseMultilayerPerceptron, RegressorMixin):
epsilon : float, optional, default 1e-8
Value for numerical stability in adam. Only used when solver='adam'

n_iter_no_change : int, optional, default 2
Member


The documented default is wrong (the default has changed). Can you update this?

@engnadeau
Contributor Author

@NelleV documentation fixed in 524f384

@engnadeau
Contributor Author

@NelleV @amueller, I'm back from vacation 🌴; anything else needed for this PR?

Member

@TomDLT TomDLT left a comment


Almost good for me.

  • Can you please add an entry in doc/whats_new/v0.20.rst?
  • Can you add a parameter check in _validate_hyperparameters and a test in test_params_errors to make sure the check is present? (A sketch of both follows this list.)
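
A hedged sketch of the kind of check and test being requested (the error message, the placement, and the use of sklearn.utils.testing.assert_raises match the test utilities of the time but are assumptions, not the merged diff):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.utils.testing import assert_raises

X = np.random.RandomState(0).rand(10, 2)
y = np.random.RandomState(1).rand(10)

# in _validate_hyperparameters, alongside the existing parameter checks:
#     if self.n_iter_no_change <= 0:
#         raise ValueError("n_iter_no_change must be > 0, got %s."
#                          % self.n_iter_no_change)

# and in test_params_errors, a non-positive value should raise at fit time:
assert_raises(ValueError, MLPRegressor(n_iter_no_change=-1).fit, X, y)
```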

assert_greater(max_iter, clf.n_iter_)


def test_n_iter_no_change_inf():
Member


Please add @ignore_warnings(category=ConvergenceWarning) to ignore the warning raised when max_iter is reached (a sketch follows).
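
Roughly what the decorated test might look like (a simplified sketch assuming scikit-learn's test utilities at the time of this PR, not the exact test body in the branch):

```python
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor
from sklearn.utils.testing import ignore_warnings


@ignore_warnings(category=ConvergenceWarning)
def test_n_iter_no_change_inf():
    # with an infinite patience the solver always runs for max_iter epochs,
    # so the ConvergenceWarning emitted at max_iter is expected and ignored
    rng = np.random.RandomState(0)
    X, y = rng.rand(50, 3), rng.rand(50)
    reg = MLPRegressor(max_iter=20, n_iter_no_change=np.inf, random_state=0)
    reg.fit(X, y)
    assert reg.n_iter_ == 20
```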

@engnadeau
Contributor Author

@TomDLT see the following commits for the requested changes:

Member

@TomDLT TomDLT left a comment


LGTM

but

I just want to stress that this will break users' code. So either:

  • we consider it a bug fix, or
  • we keep the previous behavior (n_iter_no_change=2) and warn about a future change in the default value.

I am not a fan of future warnings, but I wonder if it was really a bug...

@engnadeau
Contributor Author

@TomDLT, good point. Now that there is an explicit n_iter_no_change hyperparameter, users can easily see and work around the limitations of n_iter_no_change=2.

With that in mind, I'd vote to revert n_iter_no_change to the original value of 2 in order to avoid breaking code. Whether to increase the default in a future release can be treated as a separate issue.

@engnadeau
Contributor Author

@TomDLT see e60701a for revert back to n_iter_no_change=2

@TomDLT TomDLT changed the title [MRG] MLPRegressor quits fitting too soon due to self._no_improvement_count [MRG+1] MLPRegressor quits fitting too soon due to self._no_improvement_count Oct 19, 2017
@amueller
Member

I think it would be OK to consider it a bugfix and go to 10. The heuristic doesn't really work that well with 2. And yes, we're changing results, but having the weights learned by neural networks be stable across releases is not really something I want to promise...

Member

@amueller amueller left a comment


lgtm apart from nitpicks

:class:`multilayer_perceptron.BaseMultilayerPerceptron`,
:class:`multilayer_perceptron.MLPRegressor`, and
:class:`multilayer_perceptron.MLPClassifier` to give control over
maximum number of epochs to not meet `tol` improvement.
Member


double backticks for tol

Member


Should be neural_network everywhere instead of multilayer_perceptron

@@ -96,6 +103,16 @@ Classifiers and regressors
identical X values.
:issue:`9432` by :user:`Dallas Card <dallascard>`

- Fixed a bug in :class:`multilayer_perceptron.BaseMultilayerPerceptron`,
:class:`multilayer_perceptron.MLPRegressor`, and
:class:`multilayer_perceptron.MLPClassifier` with new `n_iter_no_change`
Member


double backticks for n_iter_no_change.

-by at least tol for two consecutive iterations, unless `learning_rate`
-is set to 'adaptive', convergence is considered to be reached and
-training stops.
+by at least tol for `n_iter_no_change` consecutive iterations, unless
Member


double backticks or no backticks?

@@ -1201,6 +1215,12 @@ class MLPRegressor(BaseMultilayerPerceptron, RegressorMixin):
epsilon : float, optional, default 1e-8
Value for numerical stability in adam. Only used when solver='adam'

n_iter_no_change : int, optional, default 10
Maximum number of epochs to not meet `tol` improvement.
Member


double backticks

@engnadeau
Contributor Author

@amueller & @TomDLT

@TomDLT
Member

TomDLT commented Oct 28, 2017

Final remark: since we changed the default value to 10, we should mention this behavior change in the Changed models section of whats_new.

@engnadeau
Contributor Author

@TomDLT see 443c348 for Changed Models update

@TomDLT
Member

TomDLT commented Oct 29, 2017

Thanks @nnadeau!

@TomDLT TomDLT merged commit de29f3f into scikit-learn:master Oct 29, 2017
@engnadeau
Contributor Author

@TomDLT thanks to you! Keep up the great work everyone!

Successfully merging this pull request may close these issues.

MLPRegressor quits fitting too soon due to self._no_improvement_count
6 participants