[MRG] Adding max_fun parameter to MLP for use in lbfgs optimization #9274
Conversation
You say this, but then you seem to be setting both. I assume there's no straightforward way to write a non-regression test for this. I think the premise is correct, although more generally, what's the relationship between the number of function evaluations and the number of iterations? Is the number of function evaluations bounded from above by the number of iterations?
I can add a max_fun parameter to the MLP constructor?
But that seemed overkill since lbfgs is the only optimizer that would use it. Happy to do it if you think that'd be better. I guess there are beta1 and beta2, which are only used for Adam, so maybe that isn't overkill after all.
I had trouble creating a dataset from the available sklearn.datasets options that requires more than 1k iterations... I will see if I can package up the data I have for an example others can try...
As far as num_iter vs num_fun, num_fun >= num_iter because there is at least one call per iteration, and many more if l-bfgs uses a finite-difference approximation for the gradient.
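The point about function evaluations outpacing iterations can be illustrated with a toy backtracking line search. This is a self-contained sketch of the general pattern, not scipy's L-BFGS-B implementation: each iteration costs at least one function evaluation, and every line-search backtrack adds another, so n_fun >= n_iter always holds.

```python
# Toy gradient descent with a backtracking (Armijo) line search on
# f(x) = (x - 3)^2, counting iterations vs. function evaluations.
# Illustration only -- this is not scipy's L-BFGS-B implementation.

def minimize_1d(f, grad, x, max_iter=100, tol=1e-8):
    n_iter = 0
    n_fun = 0
    while n_iter < max_iter:
        g = grad(x)
        if abs(g) < tol:
            break
        fx = f(x)
        n_fun += 1
        step = 1.0
        # Each Armijo test evaluates f once; every rejected step
        # (halving) therefore costs an extra evaluation, so n_fun
        # grows at least as fast as n_iter.
        while f(x - step * g) > fx - 0.5 * step * g * g and step > 1e-10:
            n_fun += 1  # rejected trial point
            step *= 0.5
        n_fun += 1      # the accepted trial point
        x -= step * g
        n_iter += 1
    return x, n_iter, n_fun

x_min, n_iter, n_fun = minimize_1d(lambda x: (x - 3) ** 2,
                                   lambda x: 2 * (x - 3), x=0.0)
assert n_fun >= n_iter  # at least one evaluation per iteration
```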
So if num_fun >= num_iter, and we have already set maxfun, surely num_iter can't exceed maxfun in lbfgs. So now I'm not sure I understand the premise of the issue.
I should say, I don't know the MLP code intimately, nor scipy.optimize, so the misunderstanding is very possibly on my part.
While num_fun >= num_iter, there is a test to stop if either 'maxiter' or 'maxfun' is reached; you can see these separate tests in a few lines of the l-bfgs code. One way to address this is to set both 'maxiter' and 'maxfun' to the MLP parameter 'max_iter' (what I've done here). Another way is to provide MLP parameters for both. But what do you think, is it worth adding another parameter to MLP for this case?
I've added an example of this happening to bug #9273.
So if I understand correctly, what you've done is equivalent to setting … Another option could be to set …
I think the problem here may be subtle, so I'm going to back up and redefine things in a clear way. Let 'maxiter' and 'maxfun' be the limits passed to the l-bfgs solver, 'n_iter' the number of iterations, and 'n_fun' the number of function evaluations; the solver's stopping test is essentially: return if n_iter >= maxiter or n_fun >= maxfun.
It's written differently (separate tests) but this is the essence of the test. Note that if either condition is satisfied it will return. Now, let 'max_iter' be the MLP parameter. The current code (before this patch) does the following when calling the bfgs solver: it passes maxfun=max_iter and leaves maxiter at its default.
This is because the default value of 'maxiter' in l-bfgs is 15000, and MLP doesn't override it. This results in the following test: return if n_iter >= 15000 or n_fun >= max_iter.
Hopefully, you can see the problem now. This interface works great as long as your problem converges in under the default 15,000 iterations (admittedly this is probably fine 95% of the time). The patch I've provided changes the original code slightly to do the following: it passes both maxiter=max_iter and maxfun=max_iter.
So that with this patch, if I provide a value for 'max_iter', it bounds both the iteration count and the function-evaluation count; your comment about this being the same as setting a single limit is hopefully answered by the breakdown above. Alternatively, we can modify the MLP interface to allow for both a "mlp_maxiter" and a "mlp_maxfun" parameter. Personally, I think the current patch is sufficient, but if others disagree, it would not take me long to put together an alternative patch with the new parameter. Thanks for any more comments/discussion, hoping we can settle on the right fix for this subtle bug in MLP soon.
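The before/after behaviour of the stopping test can be sketched in a few lines. The either/or semantics and the 15000 default are taken from the discussion; this is an illustration, not scipy's actual source.

```python
# Sketch of l-bfgs's stopping test as described in the discussion:
# the solver returns as soon as EITHER budget is exhausted.

SCIPY_DEFAULT_MAXITER = 15000  # scipy default when MLP passes no maxiter

def should_stop(n_iter, n_fun, maxiter, maxfun):
    return n_iter >= maxiter or n_fun >= maxfun

max_iter = 20000  # user asks MLP for up to 20000 iterations

# Before the patch: MLP forwarded max_iter only as maxfun, so the
# iteration budget stayed pinned at scipy's 15000 default.
stops_before = should_stop(n_iter=15000, n_fun=15000,
                           maxiter=SCIPY_DEFAULT_MAXITER, maxfun=max_iter)

# After the patch: both budgets come from max_iter.
stops_after = should_stop(n_iter=15000, n_fun=15000,
                          maxiter=max_iter, maxfun=max_iter)

assert stops_before and not stops_after
```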
Thanks for the in-depth explanation. Could you please answer the last two points in my post? I.e., are you aware of situations where we want to control the maximum number of iterations independently of the maximum number of function evaluations? The last point was that, since right now we can only control one of those params (as MLP has only a 'max_iter' parameter), should it control maxiter, maxfun, or both?
Great, so here are a few answers to your questions:
Ok, so after answering your questions I now think I should add a max_fun parameter. Let me know what you think... I'll wait for a reply (or some time for a chance to reply) before I make the changes, but I'm thinking that would probably be the better way to go with this patch. Thanks for your comments and questions, etc.
To summarize, there are two options: 1. add a dedicated max_fun parameter to MLP for the lbfgs solver; 2. keep only max_iter and use it to set both of lbfgs's limits.
I'd be more for 1., since in 2. the MLP doc would not be exact when the solver is bfgs. Also, there already are specific parameters for the sgd and adam solvers.
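For concreteness, option 1 (a dedicated parameter) could look like the hypothetical sketch below. MLPConfig and its layout are illustrative, not sklearn's API, though the eventual PR did add a max_fun parameter defaulting to 15000.

```python
# Hypothetical sketch of option 1: expose both budgets on the
# estimator. MLPConfig is an illustration, not sklearn's API.
class MLPConfig:
    def __init__(self, max_iter=200, max_fun=15000):
        self.max_iter = max_iter  # iteration budget (all solvers)
        self.max_fun = max_fun    # function-evaluation budget (lbfgs only)

cfg = MLPConfig(max_iter=300)
```

This mirrors the precedent mentioned above: beta1/beta2 exist on MLP yet only apply to adam, so a solver-specific max_fun is consistent with the existing design.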
Also @daniel-perry, any idea why we don't need to distinguish maxfun from maxiter in the other solvers?
If it's clearly documented and provides sensible defaults and reasonable flexibility, I don't mind either option.
@daniel-perry if you think that controlling both maxfun and maxiter is better than controlling only maxfun for lbfgs, then let's go with option 1! PS: For the adam and sgd solvers, I think that we have one function evaluation per iteration, so a single max_iter is enough.
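Why a single max_iter suffices for sgd/adam can be seen from a minimal fixed-step loop. This is a sketch of the general pattern, not sklearn's solver code: with no line search, every update performs exactly one gradient/function evaluation.

```python
# Plain fixed-step gradient descent on f(x) = (x - 3)^2: one
# evaluation per update, so n_fun tracks n_iter exactly and a
# single iteration limit bounds both counts.
n_iter = 0
n_fun = 0
x = 0.0
for _ in range(100):
    g = 2 * (x - 3)   # gradient (one evaluation per iteration)
    n_fun += 1
    x -= 0.1 * g      # fixed step size: no extra trial evaluations
    n_iter += 1
```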
Hi, sorry for the delay, had other things come up. I've just added some commits which add a "max_fun" parameter to the MLP classifier and regressor. As this is my first time committing to sklearn, I walked through the steps on http://scikit-learn.org/stable/developers/contributing.html to make sure everything was in order, but let me know if you notice anything awry. Some notes:
@ngoix I believe the above changes address the issues we discussed. Do I need to change the status of the ticket somehow?
You have to add a test which fails before your fix and passes now.
Ok, I think the comments address your points now? I can modify further if not. As for providing a test that failed before and passes now, the only way I can get it to fail is to:
The first option requires adding the dataset somewhere so it's available; where is that typically done? Does this change merit that kind of an addition?
I'm okay with not including a regression test.
Apologies again for the slow turnaround, life got busy. I believe I've addressed the review comments; any other suggestions to change?
@daniel-perry any idea why Travis is not happy?
Issue of mismatched repr in doctests: you need to add max_fun to the existing doctests in doc/modules/neural_networks_supervised.rst.
Thanks for the tip @thomasjpfan, that fixed the failures.
@adrinjalali I believe I've addressed the concerns you brought up above; please take a look to see if everything is in order.
Otherwise LGTM, sorry for the late review.
You also need to merge master and resolve the conflict.
Co-Authored-By: Adrin Jalali <adrin.jalali@gmail.com>
Ok, changes made, merged with mainline, and resolved conflicts. @adrinjalali once the unit tests finish, let me know if you need anything else before merging the PR.
A few last comments.
@rth could you take a look after the unit tests pass and let me know if you have any concerns before merging the PR?
Thanks, merging!
I think the way we check for convergence in lbfgs can be done more consistently between different estimators; I'll look into it in a separate PR.
Yay, thanks!
Reference Issue
Fixes #9273
Closes #10724
What does this implement/fix? Explain your changes.
Running l-bfgs to optimize the MLP regressor/classifier is limited to 15000 iterations because that is the default value of the 'maxiter' parameter in l-bfgs, and MLP doesn't allow overriding it.
To address this, I've set 'maxiter' equal to the 'max_iter' MLP argument.
Any other comments?
Ideally you would want to pass in both a 'max_iter' and a 'max_fun' argument to MLP; however, 'max_fun' doesn't make sense for anything but l-bfgs, so using 'max_iter' to control both seems a reasonable compromise.
Edit: see #9274 (comment) for the motivation for adding max_fun.