Remarks regarding optimize interface #1

Closed
mpizenberg opened this issue Jan 4, 2020 · 2 comments

Comments

@mpizenberg (Member)

Hi, I just read the LM doc and function interface you put in the src/lib.rs file.

pub fn optimize<N: Scalar, R: Dim, C: Dim, VS: Storage<N, R>, MS: Storage<N, R, C>>(
    max_iterations: usize,
    initial_lambda: N,
    lambda_scale: N,
    threshold: N,
    init: Vector<N, R, VS>,
    jacobian_and_residuals: impl Fn(Vector<N, R, VS>) -> (Matrix<N, R, C, MS>, Vector<N, R, VS>),
) -> Vector<N, R, VS> { ... }

The set of parameters chosen for the interface looks great for simple usage. However, I'll give some feedback regarding the advanced control I have needed in the past. Let me know what you think. Maybe there can be two levels of abstraction: a "simple to use" one and an "advanced" one.

Nalgebra types

The types here are tied to nalgebra. This might be fine, but it also might not be necessary.

Regarding updates of the LM parameter lambda

I have needed faster backoff of the lambda parameter in the past. For example, cases where convergence would divide lambda by 10 but divergence would multiply it by 50, and cases where divergence would reset lambda to a given value (the startup value, for example).
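To make the idea concrete, here is a minimal sketch of what such a configurable update policy could look like; every name and value here is illustrative only, not an actual API proposal:

// Hypothetical policy for updating the LM damping parameter lambda.
// All names and values here are illustrative only.
struct LambdaPolicy {
    decrease_factor: f64,      // applied on convergence, e.g. 1.0 / 10.0
    increase_factor: f64,      // applied on divergence, e.g. 50.0
    reset_value: Option<f64>,  // if set, divergence resets lambda instead
}

impl LambdaPolicy {
    fn on_convergence(&self, lambda: f64) -> f64 {
        lambda * self.decrease_factor
    }

    fn on_divergence(&self, lambda: f64) -> f64 {
        // Either jump back up by a large factor or reset to a fixed value.
        self.reset_value.unwrap_or(lambda * self.increase_factor)
    }
}

fn main() {
    let policy = LambdaPolicy {
        decrease_factor: 0.1,
        increase_factor: 50.0,
        reset_value: None,
    };
    let lambda = 1e-3;
    println!("after convergence: {}", policy.on_convergence(lambda)); // 1e-4
    println!("after divergence:  {}", policy.on_divergence(lambda)); // 5e-2
}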

Regarding convergence criteria

Depending on your model, there are multiple quantities that can make sense to check for convergence, including (non-exhaustively):

  • max number of iterations, as suggested.
  • MSE of the residuals, as suggested. If your model includes noise, for example, a threshold under which differences are interpreted as noise makes sense.
  • variation of the residual MSE. This is also useful for deciding when to increase lambda.
  • model variation: how much your model parameters changed.
  • relative model variation: did the model change faster than before, or slower?

The last three, for example, are not controllable with the current interface. A sketch of such checks follows.
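For illustration, a stopping check covering these criteria could look roughly like this; all names and thresholds are hypothetical, not part of the proposed interface:

// Hypothetical stopping criteria; none of these names are an actual API.
struct StopCriteria {
    max_iterations: usize,
    mse_threshold: f64,         // stop when the residual MSE is below this
    mse_delta_threshold: f64,   // stop when the MSE improvement stalls
    param_delta_threshold: f64, // stop when the model barely changes
}

impl StopCriteria {
    fn should_stop(
        &self,
        iteration: usize,
        mse: f64,
        previous_mse: f64,
        param_delta_norm: f64, // norm of the last parameter update
    ) -> bool {
        iteration >= self.max_iterations
            || mse < self.mse_threshold
            || (previous_mse - mse).abs() < self.mse_delta_threshold
            || param_delta_norm < self.param_delta_threshold
    }
}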

Regarding full Hessian and Jacobian computation

Let p be the number of parameters and N the number of observations. The Jacobian J is then of size N x p, the (approximate) Hessian H = J^T * J is p x p, and the residual vector r is N x 1. When there are many observations, i.e. N >> p, storing the full Jacobian induces high memory usage, and precision can suffer depending on the order in which residuals are summed. The steepest-descent term J^T * r and the Hessian H = J^T * J can instead be computed incrementally, summing the contribution of one observation at a time, which reduces memory usage. This summation can also be done in a smarter way with partial accumulators, for example accumulating every 100 observations independently and then summing those accumulators together.

So asking for the full Jacobian matrix up front prevents this kind of memory optimization. A sketch of the incremental accumulation follows.
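Here is a minimal sketch of this accumulation, assuming nalgebra dynamic types and one Jacobian row per observation; grouping into chunks of 100 works the same way, just with taller blocks:

// Sketch of accumulating H = J^T * J and g = J^T * r one observation at
// a time, without ever materializing the full N x p Jacobian.
use nalgebra::{DMatrix, DVector, RowDVector};

fn accumulate_normal_equations(
    p: usize,
    // Each item: the 1 x p Jacobian row and scalar residual of one observation.
    observations: impl Iterator<Item = (RowDVector<f64>, f64)>,
) -> (DMatrix<f64>, DVector<f64>) {
    let mut hessian = DMatrix::<f64>::zeros(p, p);
    let mut gradient = DVector::<f64>::zeros(p);
    for (jacobian_row, residual) in observations {
        // H += j_i^T * j_i (outer product), g += j_i^T * r_i.
        // For better precision, these additions could instead go into
        // per-chunk accumulators that are summed at the end.
        hessian += jacobian_row.transpose() * &jacobian_row;
        gradient += jacobian_row.transpose() * residual;
    }
    (hessian, gradient)
}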

Regarding unnecessary Jacobian evaluation

Once the residuals are computed, that is usually enough to evaluate the current model. If it is better than the previous one, we will eventually need the Jacobian. But if it is not, computing the Jacobian is a waste of time, because that model will not be used further (we most likely backtrack to the previous model).
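One way to express this is to split the callback in two, so the Jacobian is only requested once a step is accepted. A skeleton, with every name hypothetical:

// Skeleton of one LM step with lazy Jacobian evaluation. The residual
// callback is cheap and always called; the Jacobian callback is only
// called when the candidate model is accepted.
fn try_step<R, J>(residuals: R, jacobian: J, candidate: &[f64], best_sse: f64) -> bool
where
    R: Fn(&[f64]) -> f64,           // sum of squared residuals for a model
    J: Fn(&[f64]) -> Vec<Vec<f64>>, // placeholder Jacobian representation
{
    let sse = residuals(candidate);
    if sse < best_sse {
        // Accepted: the Jacobian is now actually needed for the next step.
        let _jac = jacobian(candidate);
        true
    } else {
        // Rejected: backtrack to the previous model without ever
        // computing the Jacobian of the candidate.
        false
    }
}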

@vadixidav (Member)

This is some excellent feedback. I have almost finished my initial interface implementation, but I absolutely want to support the points mentioned here. I think that even for the simple API, I should probably support evaluating the Jacobian for each sample rather than evaluating the Jacobian of the whole dataset, so I will definitely make that change.

I have read all the points mentioned, and I will extend the API to support these use cases. I am going to try to make the API simple to use in the simple case and as simple as possible in the complicated case.

As for the use of nalgebra, I believe that standardizing on it will be beneficial for us. It performs several optimizations automatically, without us needing to implement them, and it also provides several conveniences. It doesn't support tensors of dimensionality above 2, but that won't be a problem for us.

I am on my phone currently, so I won't respond to all the points, but generally I will aim to support everything mentioned. Again, thank you for the writeup!

@vadixidav (Member)

I have now modified the API so that the Jacobian can be passed in pieces. It is up to the caller what size of pieces to use: per sample, or per group of samples. The residuals are kept in memory the whole time because I need them to compute the sum-of-squares that determines which lambda was better. Since I already had the residuals, I didn't want the caller to recompute them, so there didn't seem to be any way to avoid storing them. The Jacobian matrices were not an issue, since they aren't needed to determine the sum-of-squares, so the only heap allocation is now for the residuals (stored as a matrix for convenience).

I also modified it so that lambda has separate convergence and divergence multipliers. Previously, it effectively multiplied lambda by the scale factor squared when diverging: each iteration checks lambda / scale and lambda, so if you only back off to lambda * scale, the next iteration tests the old lambda again. I therefore made it back off twice as fast as it converges. This is covered in the documentation so a caller understands that the divergence multiplier must be greater than the inverse of the convergence multiplier.
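A tiny numeric check of that constraint, with arbitrary values (not the crate's defaults):

// Why the divergence multiplier must exceed the inverse of the
// convergence multiplier: each iteration evaluates lambda / scale and
// lambda, so backing off by only `scale` re-tests the old lambda.
fn main() {
    let scale = 2.0_f64;
    let lambda = 1.0_f64;
    // Back off by `scale` only: the next iteration tries lambda again.
    assert_eq!((lambda * scale) / scale, lambda);
    // Back off by scale * scale: the next iteration tries a new value.
    assert_ne!((lambda * scale * scale) / scale, lambda);
}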

As for your last point about not computing the Jacobian: it is already only computed when necessary. As mentioned above, the residuals are requested about twice as often as the Jacobian, since the algorithm needs to determine which lambda was more effective. Right now it only tries two different lambdas per iteration, but perhaps it should try as many as the user would like. I chose two because that is what Marquardt originally recommended, though I am sure that information is quite outdated by now.

More work needs to be done on termination criteria, but I want to save that for later. I am going to add unit tests now.

I will close this ticket and open a new one with your suggestions about convergence criteria.
