
[MRG+1] MLPRegressor quits fitting too soon due to self._no_improvement_count #9457

Merged
Merged 30 commits into scikit-learn:master on Oct 29, 2017

Conversation

engnadeau
Contributor

@engnadeau engnadeau commented Jul 27, 2017

Reference Issue

What does this implement/fix? Explain your changes.

  • The ability to set/tune the limit of self._no_improvement_count.
  • The ability to ignore self._no_improvement_count by setting self.no_improvement_limit to np.inf (see the usage sketch below).
  • MLPRegressor will not quit fitting unexpectedly early due to local minima or fluctuations.
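
A minimal usage sketch of the new hyperparameter (assuming scikit-learn 0.20+, where the parameter merged here is exposed as n_iter_no_change; the toy data and values are illustrative only):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = X.sum(axis=1)

# Training now stops early only after `n_iter_no_change` consecutive epochs
# fail to improve the loss by at least `tol`, instead of a hard-coded 2.
reg = MLPRegressor(hidden_layer_sizes=(10,), max_iter=200, tol=1e-4,
                   n_iter_no_change=10, random_state=0)
reg.fit(X, y)
print(reg.n_iter_)  # number of epochs actually run
```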

Any other comments?

  • self.no_improvement_limit was not set as an __init__ argument since this might cause unknown feature breaks.
  • A magic number was removed from the code.
  • Thanks for the awesome package and hard work! :)

@engnadeau engnadeau changed the title [MRG] MLPRegressor quits fitting too soon due to self._no_improvement_count [WIP] MLPRegressor quits fitting too soon due to self._no_improvement_count Jul 27, 2017
@jnothman
Member

jnothman commented Jul 27, 2017 via email

Member

@jnothman jnothman left a comment


Please add documentation and tests, and adhere to PEP8.

@engnadeau
Contributor Author

@jnothman, regarding the recent vs. best performance comparison, I believe this should be discussed/addressed in a separate (but related) PR. For now, the small change of allowing n_iter_no_change to be modified is enough.

@engnadeau engnadeau changed the title [WIP] MLPRegressor quits fitting too soon due to self._no_improvement_count [MRG] MLPRegressor quits fitting too soon due to self._no_improvement_count Aug 2, 2017
@engnadeau engnadeau changed the title [MRG] MLPRegressor quits fitting too soon due to self._no_improvement_count [WIP] MLPRegressor quits fitting too soon due to self._no_improvement_count Aug 3, 2017
@engnadeau engnadeau changed the title [WIP] MLPRegressor quits fitting too soon due to self._no_improvement_count [MRG] MLPRegressor quits fitting too soon due to self._no_improvement_count Aug 4, 2017
@amueller
Member

amueller commented Aug 4, 2017

you should activate flake8 integration in your editor ;)

@amueller
Member

amueller commented Aug 4, 2017

Given that 2 is quite low, should we change the default? That would change behavior, though probably not in a bad way. And I don't think we can guarantee that MLP optimization will stay unchanged between versions anyhow.

@engnadeau
Contributor Author

PyCharm didn't catch the doctest flake8 errors :(

Otherwise, yes, 2 is quite low for a default (1% of the default max_iter=200). I would personally bump the default value to 5 or 10 (average PC performance should be considered).

Regardless, making the n_iter_no_change hyperparameter settable, specifically allowing np.inf, lets the MLP behave like Keras, where the KerasRegressor runs until epochs is reached (see the sketch below).
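
A short sketch of that np.inf behavior (assuming a scikit-learn version that accepts np.inf for n_iter_no_change, as exercised by the test added in this PR; data and values are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X, y = rng.rand(50, 3), rng.rand(50)

# With an infinite "patience" the no-improvement check can never trigger,
# so the solver runs for the full max_iter epochs (a ConvergenceWarning is
# expected), much like a KerasRegressor trained for a fixed number of epochs.
reg = MLPRegressor(max_iter=50, n_iter_no_change=np.inf, random_state=0)
reg.fit(X, y)
print(reg.n_iter_)  # 50: every epoch was run
```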

@engnadeau
Contributor Author

@amueller @jnothman, anything else you'd like me to look over in this PR?

@amueller
Member

@nnadeau if you don't want early stopping, you can also just do early_stopping=False, which is the default. Why would you set early_stopping=True and n_iter_no_change=np.inf? That just makes your training set smaller, right? I guess you just want to monitor validation set error?

@amueller
Member

looks good but I'm not sure how to set the default. Maybe @ogrisel has an opinion?

@@ -536,15 +538,17 @@ def _fit_stochastic(self, X, y, activations, deltas, coef_grads,
# for learning rate that needs to be updated at iteration end
self._optimizer.iteration_ends(self.t_)

-if self._no_improvement_count > 2:
+if self._no_improvement_count > self.n_iter_no_change:
# not better than last two iterations by tol.
Member


Tiny nitpick: this comment is out of date.

@engnadeau
Contributor Author

@amueller, my issue arises with the default early_stopping=False.

early_stopping=False does not change the behaviour of _update_no_improvement_count() and self._no_improvement_count. The real issue is the if self._no_improvement_count > 2 check, which completely disregards early_stopping (see the simplified illustration below).
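
A simplified, standalone illustration of the problem (this is not scikit-learn's actual code, just a sketch of the loss-based stopping path used when early_stopping=False):

```python
def fit_loop(losses, tol=1e-3, n_iter_no_change=2):
    """Return the epoch at which training stops for a given loss curve."""
    best_loss = float("inf")
    no_improvement_count = 0
    for epoch, loss in enumerate(losses, start=1):
        # count epochs that do not improve the best loss by at least tol
        if loss > best_loss - tol:
            no_improvement_count += 1
        else:
            no_improvement_count = 0
        best_loss = min(best_loss, loss)
        # this check fires regardless of early_stopping
        if no_improvement_count > n_iter_no_change:
            return epoch
    return len(losses)

# a loss curve with a brief plateau before a large later improvement
losses = [1.0, 0.9, 0.85, 0.8499, 0.8498, 0.8497, 0.5, 0.4]
print(fit_loop(losses, n_iter_no_change=2))   # 6: quits on the plateau
print(fit_loop(losses, n_iter_no_change=10))  # 8: survives the plateau
```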

@amueller
Member

@nnadeau oh wow... right... never mind. Then I think we should definitely do 5 or 10.

@engnadeau
Contributor Author

@NelleV see 4ae5e2f for touch-up :)

@engnadeau
Contributor Author

@amueller see 6a04585 for the new default value of 10

@@ -1200,6 +1209,10 @@ class MLPRegressor(BaseMultilayerPerceptron, RegressorMixin):
epsilon : float, optional, default 1e-8
Value for numerical stability in adam. Only used when solver='adam'

n_iter_no_change : int, optional, default 2
Member


The documented default is wrong (the default has changed). Can you update this?

@engnadeau
Contributor Author

@NelleV documentation fixed in 524f384

@engnadeau
Contributor Author

@NelleV @amueller, I'm back from vacation 🌴; anything else needed for this PR?

Member

@TomDLT TomDLT left a comment


Almost good for me.

  • Can you please add an entry in doc/whats_new/v0.20.rst?
  • Can you add a parameter check in _validate_hyperparameters and a test in test_params_errors to make sure the check is present? (A sketch of both follows this list.)
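
A hedged sketch of the kind of check and test being requested (the error message, the placement, and the use of sklearn.utils.testing.assert_raises match the test utilities of the time but are assumptions, not the merged diff):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.utils.testing import assert_raises

X = np.random.RandomState(0).rand(10, 2)
y = np.random.RandomState(1).rand(10)

# in _validate_hyperparameters, alongside the existing parameter checks:
#     if self.n_iter_no_change <= 0:
#         raise ValueError("n_iter_no_change must be > 0, got %s."
#                          % self.n_iter_no_change)

# and in test_params_errors, a non-positive value should raise at fit time:
assert_raises(ValueError, MLPRegressor(n_iter_no_change=-1).fit, X, y)
```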

assert_greater(max_iter, clf.n_iter_)


def test_n_iter_no_change_inf():
Member


Please add @ignore_warnings(category=ConvergenceWarning) to ignore the warning raised when max_iter is reached (a sketch follows).
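
Roughly what the decorated test might look like (a simplified sketch assuming scikit-learn's test utilities at the time of this PR, not the exact test body in the branch):

```python
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor
from sklearn.utils.testing import ignore_warnings


@ignore_warnings(category=ConvergenceWarning)
def test_n_iter_no_change_inf():
    # with an infinite patience the solver always runs for max_iter epochs,
    # so the ConvergenceWarning emitted at max_iter is expected and ignored
    rng = np.random.RandomState(0)
    X, y = rng.rand(50, 3), rng.rand(50)
    reg = MLPRegressor(max_iter=20, n_iter_no_change=np.inf, random_state=0)
    reg.fit(X, y)
    assert reg.n_iter_ == 20
```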

@engnadeau
Contributor Author

@TomDLT see the following commits for the requested changes:

Member

@TomDLT TomDLT left a comment


LGTM

but

I just want to stress that this will break users' code. So either:

  • we consider it a bug fix, or
  • we keep the previous behavior (n_iter_no_change=2) and warn about a future change in the default value.

I am not a fan of future warnings, but I wonder if it was really a bug...

@engnadeau
Contributor Author

@TomDLT, good point. Now that there is an explicit n_iter_no_change hyperparameter, users can easily see and work around the limitations of n_iter_no_change=2.

With that in mind, I'd vote to revert n_iter_no_change to the original value of 2 in order to avoid breaking code. Whether to increase the default in a future release can be treated as a separate issue.

@engnadeau
Contributor Author

@TomDLT see e60701a for revert back to n_iter_no_change=2

@TomDLT TomDLT changed the title [MRG] MLPRegressor quits fitting too soon due to self._no_improvement_count [MRG+1] MLPRegressor quits fitting too soon due to self._no_improvement_count Oct 19, 2017
@amueller
Member

I think it would be OK to consider it a bugfix and go to 10. The heuristic doesn't really work that well with 2. And yes, we're changing results, but having the weights learned by neural networks be stable across releases is not really something I want to promise...

Member

@amueller amueller left a comment


lgtm apart from nitpicks

:class:`multilayer_perceptron.BaseMultilayerPerceptron`,
:class:`multilayer_perceptron.MLPRegressor`, and
:class:`multilayer_perceptron.MLPClassifier` to give control over
maximum number of epochs to not meet `tol` improvement.
Member


double backticks for tol

Member


Should be neural_network everywhere instead of multilayer_perceptron

@@ -96,6 +103,16 @@ Classifiers and regressors
identical X values.
:issue:`9432` by :user:`Dallas Card <dallascard>`

- Fixed a bug in :class:`multilayer_perceptron.BaseMultilayerPerceptron`,
:class:`multilayer_perceptron.MLPRegressor`, and
:class:`multilayer_perceptron.MLPClassifier` with new `n_iter_no_change`
Member


double backticks for n_iter_no_change.

-by at least tol for two consecutive iterations, unless `learning_rate`
-is set to 'adaptive', convergence is considered to be reached and
-training stops.
+by at least tol for `n_iter_no_change` consecutive iterations, unless
Member


double backticks or no backticks?

@@ -1201,6 +1215,12 @@ class MLPRegressor(BaseMultilayerPerceptron, RegressorMixin):
epsilon : float, optional, default 1e-8
Value for numerical stability in adam. Only used when solver='adam'

n_iter_no_change : int, optional, default 10
Maximum number of epochs to not meet `tol` improvement.
Member


double backticks

@engnadeau
Contributor Author

@amueller & @TomDLT

@TomDLT
Member

TomDLT commented Oct 28, 2017

Final remark: since we changed the default value to 10, we should mention this behavior change in the Changed models section of whats_new.

@engnadeau
Contributor Author

@TomDLT see 443c348 for Changed Models update

@TomDLT
Member

TomDLT commented Oct 29, 2017

Thanks @nnadeau!

@TomDLT TomDLT merged commit de29f3f into scikit-learn:master Oct 29, 2017
@engnadeau
Contributor Author

@TomDLT thanks to you! Keep up the great work everyone!

Successfully merging this pull request may close these issues.

MLPRegressor quits fitting too soon due to self._no_improvement_count
6 participants