
Gradient descent fails to converge (to the right result). #45

Open
echatav opened this issue Jul 11, 2015 · 7 comments

Comments


echatav commented Jul 11, 2015

Consider the function,

let f [m,b] = (6020.272727*m + b - 23680.30303)^(2::Int) + (7254.196429*m + b - 28807.61607)^(2::Int) + (6738.575342*m + b - 26582.76712)^(2::Int) + (5464.658537*m + b - 23894.34756)^(2::Int)

It's a smooth convex function and gradient descent should converge to [m,b] = [2.911525576,7196.512447]. See Wolfram Alpha.
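(For reference, f is the least-squares objective for fitting a line to four points, so the quoted minimizer can be checked against the closed-form normal-equation solution. A minimal sketch; lsFit is a hypothetical helper, not part of ad:)

```haskell
-- Closed-form least-squares line fit via the 2x2 normal equations.
-- f above is sum_i (m*x_i + b - y_i)^2, so its minimizer is the
-- ordinary least-squares slope and intercept for these four points.
lsFit :: [(Double, Double)] -> (Double, Double)
lsFit pts = (m, b)
  where
    n   = fromIntegral (length pts)
    sx  = sum [x     | (x, _) <- pts]
    sy  = sum [y     | (_, y) <- pts]
    sxx = sum [x * x | (x, _) <- pts]
    sxy = sum [x * y | (x, y) <- pts]
    m   = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b   = (sy - m * sx) / n

-- lsFit [ (6020.272727, 23680.30303)
--       , (7254.196429, 28807.61607)
--       , (6738.575342, 26582.76712)
--       , (5464.658537, 23894.34756) ]
-- should agree with the [m,b] quoted above to within rounding.
```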

Instead,

(!! 10) $ gradientDescent f [0,0]

[6.503543385615585,1.0522541316691529e-3]

(!! 100) $ gradientDescent f [0,0]

[4.074543700217947,1.0251230667415617e-3]

(!! 1000) $ gradientDescent f [0,0]

[4.028579379271611,4.621971888645496e-3]

(!! 10000) $ gradientDescent f [0,0]

[4.02857388945333,4.100383031651195e-2]

barak (Collaborator) commented Jul 11, 2015

Do you have any reason to think that it isn't converging to the minimum? What you show is that in 10000 iterations it hasn't made much progress. But it does seem to be going in the right direction, albeit very slowly!

Seriously, this gradientDescent code was not intended for serious use; it was written as a placeholder, is completely untuned, and was designed to be robust but potentially very slow. The conjugateGradientDescent routine looks to be in better shape (although its line search does 20 Newton steps instead of the more appropriate single Newton step, and there is no code to check that the objective changes in the right direction, no trust regions, or anything like that) and, barring bugs, should converge in just a handful of line searches for a two-dimensional quadratic objective function like this.
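(For illustration, the "single Newton step" line search mentioned above can be sketched with ad itself by differentiating the one-dimensional restriction φ twice; this is only an illustrative sketch under that description, not the library's actual line-search code, and newtonLineStep is a hypothetical name:)

```haskell
{-# LANGUAGE RankNTypes #-}
import Numeric.AD (diff)

-- One Newton step of a 1-D line search: t' = t - phi'(t) / phi''(t).
-- phi must stay polymorphic so ad can differentiate it twice by
-- nesting forward mode.
newtonLineStep :: (forall a. Floating a => a -> a) -> Double -> Double
newtonLineStep phi t = t - diff phi t / diff (diff phi) t

-- For a quadratic phi a single step lands exactly on the minimizer,
-- e.g. newtonLineStep (\t -> (t - 3) * (t - 3)) 0 == 3.0
```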

echatav (Author) commented Jul 11, 2015

Seriously, this gradientDescent code was not intended for serious use

I didn't realize that :-/ I've been trying to make serious use of it.

echatav (Author) commented Jul 11, 2015

@barak : Do you know what the best way to calculate argmins of convex objective functions in Haskell is?

barak (Collaborator) commented Jul 12, 2015

Sorry. I blame ... not sure but I'll think of someone.

As to a better routine, the present conjugate gradient routine looks reasonable for a high-dimensional system, although the line search could use a bit of robustness-enhancing love. And it works well here, moving rapidly to a point close to the optimum:

Prelude Numeric.AD> take 4 $ conjugateGradientDescent f [0,0]
[[0.0,0.0],[4.0285774585177165,4.7048094968615557e-4],[2.910697209479748,7196.113511627576],[2.911504913045699,7196.645421944485]]

For something low-dimensional like the test problem you give, a simple Newton's method would also make sense, or, for robustness, one with trust regions; for something of intermediate dimensionality, BFGS or Levenberg-Marquardt. There is a curated collection of optimization routines in http://lmfit.github.io/lmfit-py/ and I think it would make a lot of sense to translate them from Python to Haskell, using the ad library for the necessary gradients, of course.
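(To illustrate the Newton's-method suggestion: for a two-variable objective the step is x' = x - H⁻¹∇f, and ad's grad and hessian supply both pieces. A minimal sketch with a hand-rolled 2x2 solve; newtonStep and solve2 are hypothetical names, not part of ad:)

```haskell
{-# LANGUAGE RankNTypes #-}
import Numeric.AD (grad, hessian)

-- One full Newton step for a two-variable objective: x' = x - H^-1 g,
-- with the gradient g and Hessian H computed by ad.
newtonStep :: (forall a. Floating a => [a] -> a) -> [Double] -> [Double]
newtonStep f x = zipWith (-) x (solve2 (hessian f x) (grad f x))
  where
    -- Cramer's rule for the 2x2 linear system H d = g.
    solve2 [[a, b], [c, d]] [g1, g2] =
      let det = a * d - b * c
      in [(d * g1 - b * g2) / det, (a * g2 - c * g1) / det]
    solve2 _ _ = error "newtonStep: expected a 2x2 system"
```

For a quadratic objective like the f in this issue, a single step from [0,0] should land on the minimizer up to rounding.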

echatav (Author) commented Jul 12, 2015

I've found the optimization library, which seems designed to fit with ad and linear.

TomMD (Contributor) commented Nov 5, 2015

On this topic, what properties hold for conjugate gradient descent? I'm noticing the progression is actually ascending, not descending. This is for a parabolic function with a log barrier, so I'm guessing there is an issue with infinity.

:{
let obj2 [x] = x^2 +
      let bar = (-1/1000) * log (- (5 - x))
      in if isNaN bar then 1/0 else bar
      -- obj2 asymptotically approaches the barrier at x = 5
      -- obj2 = Infinity when x <= 5
:}
conjugateGradientDescent obj2 [20]

~~> [[20.0],[0.0]]

Notice the lack of descent:

map obj2 it
[399.9972919497989,Infinity]

TomMD (Contributor) commented Nov 25, 2015

As a matter of update (and hijacking of this issue): the above objective function, obj2, retains no information about the gradient past the asymptote. Changing the numerator of the 1/0 to what is commonly named µ ((5-x)/0 in this case) yields a differentiable value (via Kahn.grad), but in general the objective function must be twice differentiable for conjugateGradient.
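(A sketch of that change, reusing the obj2 shape from the earlier comment; obj2' is a hypothetical name:)

```haskell
-- Variant of obj2 whose out-of-domain penalty keeps gradient
-- information: the numerator (5 - x) still depends on x, unlike the
-- constant 1 in 1/0, so differentiation sees the direction of the
-- violation (sketch of the change described above).
obj2' :: RealFloat a => [a] -> a
obj2' [x] = x * x + bar'
  where
    bar  = (-1 / 1000) * log (negate (5 - x))
    bar' = if isNaN bar then (5 - x) / 0 else bar
obj2' _   = error "obj2' expects a one-element list"
```

Away from the barrier this agrees with obj2 (e.g. at [20] both give the 399.997... value shown earlier), while past the asymptote it still evaluates to Infinity.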

Perhaps I should send a pull request indicating

  1. gradientDescent is not considered 'industrial strength'.
  2. conjugateGradient* objective functions must be twice differentiable.

Thoughts?
