
Conversation

@ogrisel (Collaborator) commented Jun 30, 2022

This PR aims to better test the lbfgs_step fallback mechanism of scikit-learn#23314 (introduced in c9b1200 and subsequent commits).

This test is currently failing and highlights the fact that newton-cholesky with an inner solver that switches to 4 steps of LBFGS whenever the Hessian is found to be ill-conditioned does not work as expected: it can be very slow, because this situation can occur many times within a single fit call. Furthermore, repeatedly calling 'L-BFGS-B' with "maxiter": 4 via scipy.optimize.minimize is not equivalent to calling it once with a large maxiter, because each restart loses the memory of the previous gradients.
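To illustrate that last point, here is a minimal sketch (my own illustration, not code from this PR, using a toy ill-conditioned quadratic) comparing one long L-BFGS-B run with many restarted 4-iteration runs via scipy.optimize.minimize:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = np.diag(np.logspace(0, 6, 20))  # deliberately ill-conditioned quadratic
b = rng.normal(size=20)


def f_and_grad(x):
    # 0.5 * ||A x - b||^2 and its gradient.
    r = A @ x - b
    return 0.5 * r @ r, A.T @ r


x0 = np.zeros(20)

# One long run: L-BFGS-B keeps its memory of past gradients throughout.
res_long = minimize(f_and_grad, x0, jac=True, method="L-BFGS-B",
                    options={"maxiter": 400})

# Many short runs: each restart begins with an empty L-BFGS memory.
x = x0
for _ in range(100):
    res = minimize(f_and_grad, x, jac=True, method="L-BFGS-B",
                   options={"maxiter": 4})
    x = res.x

print("single 400-iteration run:", res_long.fun)
print("100 restarted 4-iteration runs:", f_and_grad(x)[0])
```

On such a problem the restarted runs typically reach a much worse objective value despite spending the same total iteration budget.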

I would therefore recommend to stop attempting to implement a fine-grained fallback mechanism for the inner_solve step of the Newton solvers. Instead I would suggest to let the Newton solver raise an exception when this happens and use a coarse-grained fallback to the "lbfgs" solver until convergence (possibly warm starting from the previous coef in case the Newton solver had a chance to update them successfully for a few iterations). This coarse fallback should be much simpler to implement and maintain, and should be guaranteed to converge to a solution as good as the one the user would have obtained by choosing LBFGS in the first place.

The scikit-learn-level warning message should be adapted accordingly.
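For illustration, a rough sketch of this coarse-grained fallback (hypothetical names such as newton_solve and IllConditionedHessian, not the actual scikit-learn code) could look like:

```python
import warnings

from scipy.optimize import minimize
from sklearn.exceptions import ConvergenceWarning


class IllConditionedHessian(Exception):
    """Raised by the hypothetical Newton inner solve when Cholesky fails."""


def fit_with_fallback(func_and_grad, newton_solve, coef0, max_iter=100):
    coef = coef0
    try:
        # Hypothetical Newton loop that raises instead of patching each step.
        coef = newton_solve(coef, max_iter=max_iter)
    except IllConditionedHessian as exc:
        warnings.warn(
            f"{exc} Falling back to lbfgs until convergence, warm-started "
            "from the current coefficients.",
            ConvergenceWarning,
        )
        res = minimize(
            func_and_grad, coef, jac=True, method="L-BFGS-B",
            options={"maxiter": max_iter},
        )
        coef = res.x
    return coef
```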

What do you think about this plan @lorentzenchr?

Credits: this plan was originally suggested by @GaelVaroquaux IRL.

@ogrisel (Collaborator, Author) commented Jun 30, 2022

If you agree with this plan, let me know and I can help you implement it to get the test in this PR to pass. But feel free to do it yourself if you prefer.

@lorentzenchr (Owner) commented

I would therefore recommend to stop attempting to implement a fine-grained fallback mechanism for the inner_solve step of the Newton solvers. [...] What do you think about this plan?

I'm very happy with it. I thought a lot about the best action (from a user's perspective) for a solver to take in case of "dynamically detected" convergence problems:

  1. Stop and raise error
  2. Warn and try different solver, e.g. lbfgs (your suggestion)
  3. Warn and temporarily use a different step in this iteration
    a. gradient step
    b. lbfgs step(s)
    c. step with a modified Hessian, e.g. adding a multiple of the identity (which amounts to an extra penalty), an LDL decomposition, QR, SVD, ... (see the sketch after this list)
    ...
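For option 3c, a minimal sketch (my own illustration, not code from this PR) of taking a Newton step with a multiple of the identity added to the Hessian could look like:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def damped_newton_step(hessian, gradient, lam0=1e-10, max_tries=15):
    """Newton step -(H + lam * I)^{-1} g with the smallest damping that works.

    With a numerically positive-definite Hessian, lam stays 0 and this reduces
    to the plain Cholesky-based Newton step.
    """
    lam = 0.0
    eye = np.eye(hessian.shape[0])
    for _ in range(max_tries):
        try:
            c, low = cho_factor(hessian + lam * eye)
            return -cho_solve((c, low), gradient)
        except np.linalg.LinAlgError:
            # Cholesky failed: increase the diagonal shift and retry.
            lam = lam0 if lam == 0.0 else 10.0 * lam
    raise np.linalg.LinAlgError("Hessian could not be regularized.")
```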

@lorentzenchr (Owner) commented

If you agree with this plan, let me know and I can help you implement it

I would appreciate your help very much. I added you as a collaborator on this fork to make it easier to work together.

@ogrisel (Collaborator, Author) commented Jul 1, 2022

I will try to move this forward this afternoon. My plan is to:

EDIT: @lorentzenchr was too fast. Let me update this PR.

@ogrisel (Collaborator, Author) commented Jul 1, 2022

I merged and now the updated collinear data test passes!

@ogrisel merged commit 83944aa into lorentzenchr:glm_newton_cholesky on Jul 1, 2022
@ogrisel deleted the test-glm-collinear-data branch on July 1, 2022 14:07