
Gradient descent fails to converge (to the right result). #45

Open
echatav opened this issue Jul 11, 2015 · 7 comments

Comments


echatav commented Jul 11, 2015

Consider the function,

let f [m,b] = (6020.272727*m + b - 23680.30303)^(2::Int) + (7254.196429*m + b - 28807.61607)^(2::Int) + (6738.575342*m + b - 26582.76712)^(2::Int) + (5464.658537*m + b - 23894.34756)^(2::Int)

It's a smooth convex function and gradient descent should converge to [m,b] = [2.911525576,7196.512447]. See Wolfram Alpha.
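(For reference, f is the least-squares objective for fitting a line to four points, so the quoted minimizer can be checked against the closed-form normal-equation solution. A minimal sketch; lsFit is a hypothetical helper, not part of ad:)

```haskell
-- Closed-form least-squares line fit via the 2x2 normal equations.
-- f above is sum_i (m*x_i + b - y_i)^2, so its minimizer is the
-- ordinary least-squares slope and intercept for these four points.
lsFit :: [(Double, Double)] -> (Double, Double)
lsFit pts = (m, b)
  where
    n   = fromIntegral (length pts)
    sx  = sum [x     | (x, _) <- pts]
    sy  = sum [y     | (_, y) <- pts]
    sxx = sum [x * x | (x, _) <- pts]
    sxy = sum [x * y | (x, y) <- pts]
    m   = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b   = (sy - m * sx) / n

-- lsFit [ (6020.272727, 23680.30303)
--       , (7254.196429, 28807.61607)
--       , (6738.575342, 26582.76712)
--       , (5464.658537, 23894.34756) ]
-- should agree with the [m,b] quoted above to within rounding.
```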

Instead,

(!! 10) $ gradientDescent f [0,0]

[6.503543385615585,1.0522541316691529e-3]

(!! 100) $ gradientDescent f [0,0]

[4.074543700217947,1.0251230667415617e-3]

(!! 1000) $ gradientDescent f [0,0]

[4.028579379271611,4.621971888645496e-3]

(!! 10000) $ gradientDescent f [0,0]

[4.02857388945333,4.100383031651195e-2]

barak (Collaborator) commented Jul 11, 2015

Do you have any reason to think that it isn't converging to the minimum? What you show is that in 10000 iterations it hasn't made much progress. But it does seem to be going in the right direction, albeit very slowly!

Seriously, this gradientDescent code was not intended for serious use; it was written as a placeholder, is completely untuned, and was designed to be robust but potentially very slow. The conjugateGradientDescent routine looks to be in better shape (although its line search does 20 Newton steps instead of the more appropriate single Newton step, and there is no code to check that the objective changes in the right direction, no trust regions, or anything like that) and, barring bugs, should converge in just a handful of line searches for a two-dimensional quadratic objective function like this.
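(For illustration, the "single Newton step" line search mentioned above can be sketched with ad itself by differentiating the one-dimensional restriction φ twice; this is only an illustrative sketch under that description, not the library's actual line-search code, and newtonLineStep is a hypothetical name:)

```haskell
{-# LANGUAGE RankNTypes #-}
import Numeric.AD (diff)

-- One Newton step of a 1-D line search: t' = t - phi'(t) / phi''(t).
-- phi must stay polymorphic so ad can differentiate it twice by
-- nesting forward mode.
newtonLineStep :: (forall a. Floating a => a -> a) -> Double -> Double
newtonLineStep phi t = t - diff phi t / diff (diff phi) t

-- For a quadratic phi a single step lands exactly on the minimizer,
-- e.g. newtonLineStep (\t -> (t - 3) * (t - 3)) 0 == 3.0
```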

echatav (Author) commented Jul 11, 2015

Seriously, this gradientDescent code was not intended for serious use

I didn't realize that :-/ I've been trying to make serious use of it.

echatav (Author) commented Jul 11, 2015

@barak : Do you know what the best way to calculate argmins of convex objective functions in Haskell is?

barak (Collaborator) commented Jul 12, 2015

Sorry. I blame ... not sure but I'll think of someone.

As to a better routine, the present conjugate gradient routine looks reasonable for a high-dimensional system, although the line search could use a bit of robustness-enhancing love. And it works well here, moving rapidly to a point close to the optimum:

Prelude Numeric.AD> take 4 $ conjugateGradientDescent f [0,0]
[[0.0,0.0],[4.0285774585177165,4.7048094968615557e-4],[2.910697209479748,7196.113511627576],[2.911504913045699,7196.645421944485]]

For something low-dimensional like the test problem you give, a simple Newton's method would also make sense, or, for robustness, one with trust regions; for something of intermediate dimensionality, BFGS or Levenberg-Marquardt. There is a curated collection of optimization routines in http://lmfit.github.io/lmfit-py/ and I think it would make a lot of sense to translate them from Python to Haskell, using the ad library for the necessary gradients, of course.
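(To illustrate the Newton's-method suggestion: for a two-variable objective the step is x' = x - H⁻¹∇f, and ad's grad and hessian supply both pieces. A minimal sketch with a hand-rolled 2x2 solve; newtonStep and solve2 are hypothetical names, not part of ad:)

```haskell
{-# LANGUAGE RankNTypes #-}
import Numeric.AD (grad, hessian)

-- One full Newton step for a two-variable objective: x' = x - H^-1 g,
-- with the gradient g and Hessian H computed by ad.
newtonStep :: (forall a. Floating a => [a] -> a) -> [Double] -> [Double]
newtonStep f x = zipWith (-) x (solve2 (hessian f x) (grad f x))
  where
    -- Cramer's rule for the 2x2 linear system H d = g.
    solve2 [[a, b], [c, d]] [g1, g2] =
      let det = a * d - b * c
      in [(d * g1 - b * g2) / det, (a * g2 - c * g1) / det]
    solve2 _ _ = error "newtonStep: expected a 2x2 system"
```

For a quadratic objective like the f in this issue, a single step from [0,0] should land on the minimizer up to rounding.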

echatav (Author) commented Jul 12, 2015

I've found the optimization library, which seems designed to fit with ad and linear.

TomMD (Contributor) commented Nov 5, 2015

On this topic, what properties hold for conjugate gradient descent? I'm noticing the progression is actually ascending, not descending. This is for a parabolic function with a log barrier, so I'm guessing there is an issue with infinity.

:{
let obj2 [x] = x^2 +
      let bar = (-1/1000) * log (- (5 - x))
      in if isNaN bar then 1/0 else bar
      -- obj2 asymptotically approaches the barrier at x = 5
      -- obj2 = Infinity when x <= 5
:}
conjugateGradientDescent obj2 [20]

~~> [[20.0],[0.0]]

Notice the lack of descent:

map obj2 it
[399.9972919497989,Infinity]

TomMD (Contributor) commented Nov 25, 2015

As a matter of update (and hijacking of this issue): the above objective function, obj2, retains no information about the gradient past the asymptote. Changing the numerator of the 1/0 to what is commonly named µ ((5-x)/0 in this case) yields a differentiable value (via Kahn.grad), but in general the objective function must be twice differentiable for conjugateGradient.
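(A sketch of that change, reusing the obj2 shape from the earlier comment; obj2' is a hypothetical name:)

```haskell
-- Variant of obj2 whose out-of-domain penalty keeps gradient
-- information: the numerator (5 - x) still depends on x, unlike the
-- constant 1 in 1/0, so differentiation sees the direction of the
-- violation (sketch of the change described above).
obj2' :: RealFloat a => [a] -> a
obj2' [x] = x * x + bar'
  where
    bar  = (-1 / 1000) * log (negate (5 - x))
    bar' = if isNaN bar then (5 - x) / 0 else bar
obj2' _   = error "obj2' expects a one-element list"
```

Away from the barrier this agrees with obj2 (e.g. at [20] both give the 399.997... value shown earlier), while past the asymptote it still evaluates to Infinity.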

Perhaps I should send a pull request indicating

  1. gradientDescent is not considered 'industrial strength'.
  2. conjugateGradient* objective functions must be twice differentiable.

Thoughts?
