Hi!
In `BacktrackingLineSearch`, the initialisation is done in 2 steps:
- during construction of the object, we give the parameters of the line search and the "initial value" `initfval`
- when calling `minimize()`, we give it the "initial point" `init`
It seems that `initfval` is always supposed to be `f(init)`, as written in the comment on line 31. I also think this is how the algorithm is supposed to work.
However, this is not enforced anywhere in the code: the caller of `BacktrackingLineSearch` can pass arbitrary values for `initfval` and `init`.
Indeed, in OWLQN, during the line search, we pass `state.value` as the initial value, which is not the value you'd get by computing `ff(1.0)`. So it seems that the line search done in OWLQN is broken, at least in its initial state.
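To illustrate why the mismatch matters, here is a minimal, self-contained sketch in Python. It is not the Breeze code: `backtrack`, `phi`, and all parameter names are hypothetical, the search checks only the Armijo condition, and the "stale" case assumes, purely for illustration, that the caller passes the value at step 0 instead of the value at `init`.

```python
def backtrack(phi, f0, df0, init, initfval, shrink=0.5, c_armijo=1e-4, max_iter=20):
    """Armijo-only backtracking on a 1-D function phi(alpha).

    initfval is trusted to be phi(init); nothing in here checks it.
    """
    alpha, fval = init, initfval
    for _ in range(max_iter):
        # Sufficient-decrease (Armijo) test against the value and slope at alpha = 0.
        if fval <= f0 + c_armijo * alpha * df0:
            return alpha
        alpha *= shrink
        fval = phi(alpha)
    raise RuntimeError("line search did not converge")


def phi(alpha):
    # Toy 1-D objective along the search direction; its minimum is at alpha = 1.
    return (1.0 - alpha) ** 2


f0, df0 = phi(0.0), -2.0  # phi(0) = 1 and phi'(0) = -2

# Consistent call: initfval really is phi(init), so the full step is accepted.
alpha_consistent = backtrack(phi, f0, df0, init=1.0, initfval=phi(1.0))

# Inconsistent call (hypothetical): initfval is the value at alpha = 0,
# so the first candidate is rejected even though it is the exact minimiser.
alpha_stale = backtrack(phi, f0, df0, init=1.0, initfval=f0)

print(alpha_consistent, alpha_stale)
```

In this toy setting the consistent call keeps the full step, while the inconsistent one shrinks it once before accepting, so the two calls return different step sizes for the same function and starting point.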
Is there any reason why it was done like this? AFAIK this is incorrect, and the correct way to do it (which is what the comment on line 31 describes) requires the same amount of computation, since calling `calculate()` returns both the gradient (which we use) and the value of the function (which is discarded).
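The cost argument can be sketched the same way. The snippet below is a hypothetical Python stand-in, not the Breeze API: `CountingObjective` and its `calculate` method are made up here and only mimic a function that returns the value and the derivative from a single call.

```python
class CountingObjective:
    """Toy 1-D objective that counts how often it is evaluated."""

    def __init__(self):
        self.calls = 0

    def calculate(self, alpha):
        # One evaluation yields both the value and the derivative.
        self.calls += 1
        residual = 1.0 - 4.0 * alpha
        return residual ** 2, -8.0 * residual


# Variant A: keep only the derivative and throw the value away.
obj_a = CountingObjective()
_, initfderiv_a = obj_a.calculate(1.0)

# Variant B: keep both, so the initial value is f(init) by construction.
obj_b = CountingObjective()
initfval_b, initfderiv_b = obj_b.calculate(1.0)

print(obj_a.calls, obj_b.calls, initfval_b, initfderiv_b)
```

Both variants evaluate the function exactly once. Under these assumptions, keeping the value costs nothing extra and removes the need for the caller to supply it.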