
Fix float32 cond linear regression #33249

Closed

antoinebaker wants to merge 12 commits into scikit-learn:main from antoinebaker:fix_float32_cond_linear_regression


Conversation

@antoinebaker (Contributor) commented Feb 9, 2026

Fixes #33032 but will reopen #26164.

What does this implement/fix? Explain your changes.

As explained in #33032, the cond choice made in PR #30040 to solve #26164 introduced new bugs, for instance on float32 data. This PR reverts the changes made in #30040 and adds a reproducer for the bugs reported in #33032.
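For context, here is a minimal sketch in the spirit of the reproducer (the exact test added by the PR may differ; the dataset shape and noise scale below are illustrative assumptions, following the "float32 and large n_samples" setting of the linked issue):

```python
# Hypothetical sketch of the float32 regression from #33032, not the PR's actual test.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
n_samples, n_features = 600_000, 3  # large n_samples triggers the issue on float32
X = rng.uniform(size=(n_samples, n_features))
true_coef = rng.uniform(size=n_features)
y = X @ true_coef + rng.normal(scale=1e-3, size=n_samples)

coef_64 = LinearRegression().fit(X, y).coef_
coef_32 = LinearRegression().fit(X.astype(np.float32), y.astype(np.float32)).coef_

# With the cond value introduced in #30040, the float32 fit could be badly off;
# after the revert, both fits should recover true_coef up to noise.
print(coef_64)
print(coef_32)
```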

Comments / next steps

This is a temporary "fix" following the plan outlined here. We need to reopen #26164.

In follow-up PRs we should either:

  1. find a "good" choice for cond that ideally works on any data shape and dtype, and passes the sample weight consistency checks
  2. expose cond as a parameter in LinearRegression, as was done for the sparse case in Add tol to LinearRegression #30521; we could actually re-use the tol parameter for this (which would map to cond in the dense case). See the sketch after this list.
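For reference, cond in the dense case is the cutoff passed to scipy.linalg.lstsq: singular values smaller than cond times the largest singular value are treated as zero. A small illustrative sketch (the near-collinear data is contrived on purpose; this is not code from this PR):

```python
import numpy as np
from scipy import linalg

rng = np.random.RandomState(0)
X = rng.uniform(size=(100, 5)).astype(np.float32)
X[:, 4] = X[:, 0] + 1e-4 * rng.uniform(size=100)  # nearly collinear columns
y = X @ rng.uniform(size=5).astype(np.float32)

# Singular values below cond * sigma_max are treated as zero by lstsq,
# so the effective rank (and hence the solution) depends on cond.
for cond in (None, 1e-6, 1e-2):
    coef, _, rank, sv = linalg.lstsq(X, y, cond=cond)
    print(f"cond={cond} -> effective rank={rank}")
```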

Review threads on doc/whats_new/upcoming_changes/sklearn.linear_model/33249.fix.rst (outdated)
antoinebaker and others added 2 commits February 16, 2026 14:35
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Junteng Li <JasonLiJT@users.noreply.github.com>
@antoinebaker antoinebaker marked this pull request as ready for review February 16, 2026 14:23
@antoinebaker (Contributor, Author)

Thanks for the quick review @JasonLiJT @ogrisel. I think this small PR can be quickly merged, so I've marked it ready for review.

What are your thoughts on the follow-up PR: do you prefer option 1 (find a good default for cond) or option 2 (expose cond)?

I personally prefer option 2; I think it will be easier (finding a good default cond seems to be quite finicky).

Review threads on doc/whats_new/upcoming_changes/sklearn.linear_model/33249.fix.rst (outdated) and sklearn/linear_model/tests/test_base.py
@antoinebaker (Contributor, Author) commented Feb 25, 2026

Rah, this is annoying :(

The cond=None option (the scipy default) is failing the recently added tests in #33020 (test_regularization_limits_ridge*, which check that the Ridge* estimators recover LinearRegression when alpha is near zero).
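In sketch form, the property those tests check looks something like this (a hand-written illustration on a well-conditioned dataset, not the actual test code from #33020):

```python
# Sketch of the regularization-limit property: Ridge with vanishing alpha
# should match the unpenalized least squares fit of LinearRegression.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(0)
X = rng.uniform(size=(100, 5))
y = rng.uniform(size=100)

coef_lr = LinearRegression().fit(X, y).coef_
coef_ridge = Ridge(alpha=1e-12).fit(X, y).coef_

np.testing.assert_allclose(coef_ridge, coef_lr, rtol=1e-4)
```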

I think the most probable cause is the cutoff for "zero eigenvalue" in the various solvers (for Ridge, _RidgeGCV, LinearRegression). Ideally we should come up with a consistent choice for all of them. In practice this means that, for a given dataset, the number of nonzero eigenvalues (in other words, the rank) should be the same for all solvers. This might be difficult as some solvers have the cutoff hardcoded, some use the scipy default, and some expose it as a parameter.
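To illustrate the point (a standalone numpy sketch, not sklearn internals): the effective rank of the same float32 matrix changes with the chosen cutoff, so solvers using different cutoffs can disagree.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 10)).astype(np.float32)
X[:, 9] = X[:, 0]  # exact duplicate column: the true rank is 9

s = np.linalg.svd(X, compute_uv=False)
# In float32, the "zero" singular value is computed as a small nonzero number,
# so whether it is counted depends on where the cutoff sits.
for cutoff in (1e-8, 1e-6, 1e-3):
    rank = int(np.sum(s > cutoff * s[0]))
    print(f"cutoff={cutoff:.0e} -> rank={rank}")
```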

I will try to investigate to be sure. But I have the intuition that we will probably need a more involved PR, either exposing this cutoff consistently or finding a good default.

cc @lorentzenchr @ogrisel

@antoinebaker antoinebaker marked this pull request as draft February 25, 2026 16:56
@antoinebaker (Contributor, Author)

> I think this small PR can be quickly merged, so I've marked it ready for review.

I changed my mind :)

@antoinebaker (Contributor, Author)

Closing in favor of #33565

@github-project-automation github-project-automation Bot moved this from In progress to Done in Labs Mar 18, 2026
@antoinebaker (Contributor, Author) commented Mar 18, 2026

> I think the most probable cause is the cutoff for "zero eigenvalue" in the various solvers (for Ridge, _RidgeGCV, LinearRegression). Ideally we should come up with a consistent choice for all of them. In practice this means that, for a given dataset, the number of nonzero eigenvalues (in other words, the rank) should be the same for all solvers. This might be difficult as some solvers have the cutoff hardcoded, some use the scipy default, and some expose it as a parameter.
>
> I will try to investigate to be sure. But I have the intuition that we will probably need a more involved PR, either exposing this cutoff consistently or finding a good default.

TL;DR: after investigation, finding a consistent choice for all solvers is not possible, or at least too difficult. For some solvers we do not have any control; for the others, the tol argument refers to quite different stopping criteria (see the docstring of tol in Ridge for a quick summary).
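For instance, tol is accepted by every Ridge solver but controls a different, solver-specific criterion in each. A quick sketch of what that looks like from the user side (solver names taken from the Ridge docstring; this is an illustration, not code from this PR):

```python
# tol is a single constructor parameter, but each solver interprets it as its
# own stopping/precision criterion (and some solvers may effectively ignore it).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.uniform(size=(100, 5))
y = rng.uniform(size=100)

for solver in ("svd", "cholesky", "lsqr", "sparse_cg", "sag"):
    model = Ridge(alpha=1.0, solver=solver, tol=1e-8).fit(X, y)
    print(solver, np.round(model.coef_[:2], 6))
```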



Development

Successfully merging this pull request may close these issues.

BUG: LinearRegression is wrong on float32 and large n_samples (> 500k rows) due to cond param to scipy.linalg.lstsq
