
ENH add verbosity to newton-cg solver #27526

Merged
merged 38 commits into from Apr 12, 2024

Conversation

lorentzenchr
Member

Reference Issues/PRs

This PR is meant to be merged after #26721.

What does this implement/fix? Explain your changes.

This PR adds verbosity to _newton_cg solver in our private sklearn.utils.optimize module. It is used, e.g., in LogisticRegression.

Any other comments?

@github-actions

github-actions bot commented Oct 3, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 1aa4996.

@lorentzenchr lorentzenchr marked this pull request as draft October 5, 2023 19:52
@adrinjalali
Member

I wonder if we should add these verbosity PRs or add callbacks instead (cc @jeremiedbb )

@jeremiedbb
Member

Verbosity through callbacks is a bit limited because it would be triggered only at the places where callbacks are called, so if we really need advanced low-level verbosity we have to implement it manually. I'm not convinced that we really need advanced low-level verbosity, though; no strong opinion here.
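For illustration, the callback-based alternative discussed here might look like the following sketch. The callback signature, its dict payload, and `toy_solver` are hypothetical illustrations, not scikit-learn API:

```python
# Hypothetical sketch of callback-based verbosity. The callback hook,
# its dict payload, and `toy_solver` are illustrative assumptions, not
# actual scikit-learn API.
def verbose_callback(state):
    print(f"iter={state['iteration']}  loss={state['loss']:.6f}")

def toy_solver(x0, n_iter=3, callback=None):
    # Stand-in solver loop: the callback fires only here, once per outer
    # iteration. This is why fine-grained verbosity (inner CG steps,
    # line search details) cannot be reported through callbacks alone.
    x = x0
    for i in range(n_iter):
        x = x / 2
        if callback is not None:
            callback({"iteration": i, "loss": x * x})
    return x

toy_solver(4.0, callback=verbose_callback)
```

This also shows the limitation mentioned above: anything that happens between two callback invocations is invisible to the callback.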

@jeremiedbb jeremiedbb modified the milestones: 1.4, 1.5 Dec 21, 2023
@lorentzenchr
Member Author

@adrinjalali @jeremiedbb @ogrisel I would like to merge this. Newton-CG might be our best solver for multiclass logistic regression and the other solvers have verbose output, too. It doesn't change the API and is just convenience for a small group of users. Review should be super easy.

@ogrisel
Member

ogrisel commented Apr 5, 2024

Let me update this branch with main to make it easier to test it with my local meson build.

@ogrisel
Member

ogrisel commented Apr 5, 2024

For other reviewers here is how a typical verbose run would look like:

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> LogisticRegression(solver="newton-cg", verbose=100).fit(*make_classification())
Newton-CG iter = 0
  Check Convergence
    1. max |gradient| 0.4887044022364469 <= 0.0001
  Inner CG solver iteration 1 stopped with
    sum(|residuals|) <= tol: 0.7554386095540608 <= 0.9662352699764765
  Line Search
    eps=16 * finfo.eps=3.552713678800501e-15
    try line search wolfe1
    wolfe1 line search was successful
Newton-CG iter = 1
  Check Convergence
    1. max |gradient| 0.14592025445011966 <= 0.0001
  Inner CG solver iteration 1 stopped with
    sum(|residuals|) <= tol: 0.2851651569422557 <= 0.34548568271404273
  Line Search
    eps=16 * finfo.eps=3.552713678800501e-15
    try line search wolfe1
    wolfe1 line search was successful
Newton-CG iter = 2
  Check Convergence
    1. max |gradient| 0.0540671431111577 <= 0.0001
  Inner CG solver iteration 2 stopped with
    sum(|residuals|) <= tol: 0.09220421429635615 <= 0.15390480184192634
  Line Search
    eps=16 * finfo.eps=3.552713678800501e-15
    try line search wolfe1
    wolfe1 line search was successful
Newton-CG iter = 3
  Check Convergence
    1. max |gradient| 0.015163843559910496 <= 0.0001
  Inner CG solver iteration 2 stopped with
    sum(|residuals|) <= tol: 0.026244178729999727 <= 0.037938424290546426
  Line Search
    eps=16 * finfo.eps=3.552713678800501e-15
    try line search wolfe1
    wolfe1 line search was successful
Newton-CG iter = 4
  Check Convergence
    1. max |gradient| 0.004579527123434936 <= 0.0001
  Inner CG solver iteration 4 stopped with
    sum(|residuals|) <= tol: 0.004167589412564355 <= 0.006848570879928111
  Line Search
    eps=16 * finfo.eps=3.552713678800501e-15
    try line search wolfe1
    wolfe1 line search was successful
Newton-CG iter = 5
  Check Convergence
    1. max |gradient| 0.0008076794452485875 <= 0.0001
  Inner CG solver iteration 5 stopped with
    sum(|residuals|) <= tol: 0.00029939950485330754 <= 0.0004458638755984877
  Line Search
    eps=16 * finfo.eps=3.552713678800501e-15
    try line search wolfe1
    wolfe1 line search was successful
Newton-CG iter = 6
  Check Convergence
    1. max |gradient| 3.631012151112875e-05 <= 0.0001
  Solver did converge at loss = 0.22370577235383718.
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   1 tasks      | elapsed:    0.0s
LogisticRegression(solver='newton-cg', verbose=100)
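Schematically, the structure behind these messages is an outer Newton loop with a convergence check on max |gradient|, an inner solve for the Newton step, and a line search. The following is a simplified sketch under those assumptions, not the actual code in `sklearn.utils.optimize` (the real `_newton_cg` uses inner conjugate gradient iterations and SciPy's Wolfe line searches):

```python
import numpy as np

def newton_sketch(grad, hess, x0, tol=1e-4, max_iter=100, verbose=False):
    # Simplified sketch mirroring the verbose output above; the real
    # _newton_cg solver uses inner conjugate gradient iterations and a
    # wolfe1/wolfe2 line search instead of a direct solve and unit step.
    x = np.asarray(x0, dtype=float)
    for it in range(max_iter):
        g = grad(x)
        if verbose:
            print(f"Newton iter = {it}")
            print(f"  max |gradient| {np.max(np.abs(g))} <= {tol}")
        if np.max(np.abs(g)) <= tol:
            break  # converged
        step = np.linalg.solve(hess(x), -g)  # inner solve for the Newton step
        x = x + step  # unit step stands in for the line search
    return x
```

On a quadratic objective this converges in a single Newton step, which is why the gradient check comes first in each iteration.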

@ogrisel
Member

ogrisel commented Apr 5, 2024

@lorentzenchr would it be possible to get more details on the line search iterations?

@lorentzenchr
Member Author

would it be possible to get more details on the line search iterations?

Not really. We use private line search functions from scipy. They do not have a verbose option. Those functions call, e.g., DCSRCH, which was only very recently ported to Python from the Fortran minpack2 code. If someone is interested, I would point them to contributing upstream in scipy.
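For reference, SciPy's public `scipy.optimize.line_search` (a strong-Wolfe line search related to the private wolfe1/wolfe2 helpers mentioned above) can be exercised directly. This is a minimal sketch on a toy quadratic, not how the solver actually calls it:

```python
import numpy as np
from scipy.optimize import line_search

def f(x):
    return float(x @ x)  # toy objective f(x) = ||x||^2

def grad(x):
    return 2.0 * x

xk = np.array([1.0, 1.0])
pk = -grad(xk)  # steepest descent direction
# line_search returns (alpha, fc, gc, new_fval, old_fval, new_slope);
# alpha is a step size satisfying the strong Wolfe conditions.
alpha = line_search(f, grad, xk, pk)[0]
```

As the comment thread notes, these SciPy routines print nothing themselves, so any verbosity around them has to be added by the caller.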

@ogrisel ogrisel left a comment (Member)

One more quick pass, but I have to go offline soon. I'll keep the Firefox tab open so I can finalize my review soon this time.

sklearn/linear_model/_glm/_newton_solver.py Show resolved Hide resolved
doc/whats_new/v1.4.rst Outdated Show resolved Hide resolved
sklearn/utils/optimize.py Outdated Show resolved Hide resolved
sklearn/utils/optimize.py Outdated Show resolved Hide resolved
sklearn/utils/optimize.py Outdated Show resolved Hide resolved
sklearn/utils/optimize.py Outdated Show resolved Hide resolved
sklearn/utils/optimize.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_optimize.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_optimize.py Outdated Show resolved Hide resolved
@adrinjalali
Member

CI failing.

@ogrisel ogrisel left a comment (Member)

Another pass, LGTM.

@@ -71,6 +93,8 @@ def _line_search_wolfe12(f, fprime, xk, pk, gfk, old_fval, old_old_fval, **kwarg
# TODO: It seems that the new check for the sum of absolute gradients above
# catches all cases that, earlier, ended up here. In fact, our tests never
# trigger this "if branch" here and we can consider to remove it.
    if is_verbose:
        print(" last resort: try line search wolfe2")
    ret = line_search_wolfe2(
        f, fprime, xk, pk, gfk, old_fval, old_old_fval, **kwargs
    )
Member

Isn't there anything interesting to print based on the result of wolfe2? Maybe we should at least print that the wolfe2 line search was successful when ret[0] is not None to be consistent with what we print for the wolfe1 line search.

Member Author

Isn't there anything interesting to print based on the result of wolfe2?

🤷

    print(
        " check sum(|gradient|) < sum(|gradient_old|): "
        f"{sum_abs_grad} < {sum_abs_grad_old} {check}"
    )
if check:
    ret = (
Member

Shall we make it explicit that we perform an update with unit step size in this case?

Member Author

The current message is 1-1 with NewtonSolver.

eps = 16 * np.finfo(np.asarray(old_fval).dtype).eps
if is_verbose:
    print(" Line Search")
    print(f" eps=16 * finfo.eps={eps}")
Member

I would defer the print of eps to when it's actually used:

Suggested change
print(f" eps=16 * finfo.eps={eps}")

otherwise one gets the impression that it's used by wolfe1, especially when it's successful.

Member Author

The current message is 1-1 with NewtonSolver.

Comment on lines +66 to +69
print(
    " check loss |improvement| <= eps * |loss_old|:"
    f" {np.abs(loss_improvement)} <= {tiny_loss} {check}"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(
    " check loss |improvement| <= eps * |loss_old|:"
    f" {np.abs(loss_improvement)} <= {tiny_loss} {check}"
)
to
print(f" eps=16 * finfo.eps={eps}")
print(
    " check loss |improvement| <= eps * |loss_old|:"
    f" {np.abs(loss_improvement)} <= {tiny_loss} {check}"
)

sklearn/utils/tests/test_optimize.py Outdated Show resolved Hide resolved
sklearn/utils/optimize.py Outdated Show resolved Hide resolved
@adrinjalali adrinjalali left a comment (Member)

There are some untested lines that look legitimate and should be tested; otherwise LGTM.

I think some of @ogrisel's comments are valid in terms of moving messages or being more explicit, but I don't mind them going into a separate PR since they'd touch multiple solvers.

@lorentzenchr
Member Author

lorentzenchr commented Apr 11, 2024

There are some untested lines which it seems they're legit and should be tested

Except for about one occurrence, the lines reported as untested by codecov are just hard-to-trigger corner cases. There are currently no tests that trigger them, to my knowledge (so it's a liability that is more than 10 years old). I would much prefer not to put that burden on this PR.

@adrinjalali adrinjalali merged commit 3ee60a7 into scikit-learn:main Apr 12, 2024
29 of 30 checks passed
@ogrisel ogrisel deleted the newton_cg_verbose branch April 12, 2024 13:45
@lorentzenchr lorentzenchr restored the newton_cg_verbose branch April 12, 2024 13:56
@lorentzenchr lorentzenchr deleted the newton_cg_verbose branch April 12, 2024 13:56