
[ML] Avoid zero size steps in L-BFGS #2078


Merged
merged 7 commits into elastic:main on Oct 20, 2021

Conversation

@tveasey (Contributor) commented Oct 19, 2021

Our implementation could generate zero-size steps and fail to converge. There were a couple of different scenarios in which this was possible:

  1. The function gradient was smaller than double epsilon times the norm of the argument vector, in which case x - g(x) = x to working precision (see the sketch after this list).
  2. The gradient function returned zero at the initial point.
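
For scenario 1, here is a minimal standalone C++ sketch (not taken from the ml-cpp sources, and using a scalar x purely for illustration) of how a gradient much smaller than machine epsilon times |x| produces a step that vanishes entirely in double precision:

```cpp
#include <iostream>
#include <limits>

int main() {
    double x = 1.0e8;
    // std::numeric_limits<double>::epsilon() is ~2.2e-16, so epsilon * |x|
    // is ~2.2e-8; a gradient much smaller than this cannot move x at all.
    double gradient = 1.0e-9;
    double xNew = x - gradient;
    // Prints "true": x - g(x) == x to working precision, i.e. a zero-size step.
    std::cout << std::boolalpha << (xNew == x) << std::endl;
    return 0;
}
```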

We were also checking a strict inequality for convergence, which failed to identify that we'd converged when we were taking zero-size steps.
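
The convergence test itself isn't shown in this PR, so the following is only a hedged sketch of the general pattern it describes, with illustrative names: when every step has zero size the observed decrease is exactly zero, and if the tolerance bound also evaluates to zero a strict '<' never fires, whereas '<=' does.

```cpp
#include <cmath>

// Illustrative only; the exact form of the ml-cpp convergence check is an
// assumption here. With zero-size steps fOld == fNew, so the decrease is
// exactly zero. If the bound is also zero (e.g. the minimum value of f is
// zero), then "decrease < bound" is 0 < 0 == false and we never report
// convergence; "decrease <= bound" correctly reports it.
bool converged(double fOld, double fNew, double relativeTolerance) {
    double decrease = std::fabs(fOld - fNew);
    double bound = relativeTolerance * std::fabs(fOld);
    return decrease <= bound;
}
```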

To handle both cases I've added a fallback which performs a more elaborate line search whenever we try to take a step that is too small; this ensures we test steps which are larger than epsilon * x. If the gradient function returns zero we also now try some random probes to see if we can find a direction in which the function decreases (this can be useful when the function is used to find local minima of non-convex functions).
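
The new fallback lives in the ml-cpp line search code and isn't reproduced in this PR; below is a minimal self-contained sketch of just the random-probe part of the idea. All names here (findDescentDirection, scale, numberProbes) are illustrative assumptions rather than the real API.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

using TVector = std::vector<double>;

// Hypothetical sketch, not the actual ml-cpp implementation: if the gradient
// at the starting point is exactly zero, try a few random directions of
// length "scale" and return one along which the objective decreases, or an
// empty vector if none is found.
TVector findDescentDirection(const std::function<double(const TVector&)>& f,
                             const TVector& x,
                             double scale,
                             std::size_t numberProbes = 10) {
    std::mt19937 rng{42}; // fixed seed purely for reproducibility of the sketch
    std::normal_distribution<double> normal{0.0, 1.0};
    double fx{f(x)};
    for (std::size_t probe = 0; probe < numberProbes; ++probe) {
        TVector direction(x.size());
        double norm{0.0};
        for (auto& di : direction) {
            di = normal(rng);
            norm += di * di;
        }
        norm = std::sqrt(norm);
        TVector candidate(x);
        for (std::size_t i = 0; i < x.size(); ++i) {
            direction[i] *= scale / norm;
            candidate[i] += direction[i];
        }
        if (f(candidate) < fx) {
            return direction; // found a direction in which f decreases
        }
    }
    return {}; // no decrease found: treat x as a (local) minimum
}
```

In the actual change the probes are only attempted when the gradient at the start point is zero, and the more elaborate line search separately guarantees that the steps it tests are larger than epsilon * x.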

@edsavage (Contributor) left a comment

LGTM

@tveasey merged commit b5dcc59 into elastic:main on Oct 20, 2021
@tveasey deleted the lbfgs-divide-by-zero branch on October 20, 2021 at 17:46
tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Oct 28, 2021