Remove extra check for conclusion of Bisection. #57
Conversation
This will fix #53. PTAL @vladimir-ch
// TODO: Should iterate be updated? Maybe find a function where it needs it.
return math.Abs(l.Derivative) < b.GradConst*math.Abs(b.initGrad)
}

return StrongWolfeConditionsMet(l.F, l.Derivative, b.initF, b.initGrad, b.currStep, 0, b.GradConst)
Not related to the proposed changes, but why is the Armijo constant equal to 0?
Strictly, the strong Wolfe conditions only hold if FunConst > 0. However, it can be any number greater than zero, including very small values. In finite precision, when we get close to the minimum there are large regions where the function value is "constant" as far as representable float64 values are concerned. In such a region the line search has to fail, because there are no locations with a smaller function value, even though with infinite precision there would be many steps satisfying the condition. Setting C1 to zero avoids those problems. On a conceptual level, we care about the function not increasing and the gradient getting smaller. Since these methods assume smooth, continuous functions, driving the gradient to zero is the same as finding the location with minimum F. We're happy stopping the line search with a sufficient decrease in the gradient norm.
OK, thanks.
LGTM. You may already know this, but you can close issue #53 by modifying the commit message, as described in https://help.github.com/articles/closing-issues-via-commit-messages/. It is quite useful and leaves an explicit reference to the issue in the commit message. Basically, you would have to do something like this:
The original code for Bisection included a secondary check to the strong Wolfe conditions to see if the optimization was finished. The idea was to help mitigate floating point noise and allow for stronger convergence of the gradient. Unfortunately, all this does is add complexity. The parameters are ad hoc, and trade off floating point noise for actual function modulation. For more complicated functions (especially concurrent ones) the noise will be higher, while other functions may have modulations that are very small. It is impossible to design a tradeoff that is good for all functions. Instead, keep the code simple. This also fixes issues with Bisection and the outer OptLoc disagreeing on the optimum location. Closes #53