Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Ridge sparse + sample_weight + intercept #22899

Merged
merged 8 commits into from
Mar 22, 2022

Conversation

jeremiedbb
Copy link
Member

@jeremiedbb jeremiedbb commented Mar 19, 2022

Fixes #15438

Same issue as in LinearRegression: in sparse X_offset needs the sample_weight rescaling.
The issue appears with solver = 'sparse-cg' and solver = 'lbfgs'.

I noticed that there's also an issue for 'sag' solver, but I'm not familiar with the code at all. Any help would be greatly appreciated. We can also leave that for a separate PR.

ping @lorentzenchr

@jeremiedbb jeremiedbb added this to the 1.1 milestone Mar 19, 2022
Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also parametrize test_ridge_sample_weights and test_ridge_sample_weight_invariance for sparse input?

sklearn/linear_model/_ridge.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_base.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_ridge.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_ridge.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_ridge.py Outdated Show resolved Hide resolved


@pytest.mark.parametrize("solver", ["sparse_cg", "sag"])
def test_ridge_sample_weights_dense_sparse(solver, global_random_seed):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be merged with test_ridge_fit_intercept_sparse (and maybe also test_ridge_fit_intercept_sparse)? (take the best part of both).
It seems kind of duplicate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I merged them into a single test. It allowed to find out that lbfgs also had the same issue. I fixed it as well

@lorentzenchr
Copy link
Member

This is great as it will resolve a long existing, severe bug!

@@ -1362,33 +1362,41 @@ def test_n_iter():


@pytest.mark.parametrize("solver", ["sparse_cg", "lbfgs", "auto"])
def test_ridge_fit_intercept_sparse(solver):
@pytest.mark.parametrize("with_sample_weight", [True, False])
def test_ridge_fit_intercept_sparse(solver, with_sample_weight, global_random_seed):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested with "all" seeds

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

sklearn/linear_model/_ridge.py Show resolved Hide resolved
@lorentzenchr
Copy link
Member

@agramfort @rth @TomDLT may be interested.

For now only sparse_cg and lbfgs can correctly fit an intercept
with sparse X with default tol and max_iter.
'sag' is tested separately in test_ridge_fit_intercept_sparse_sag because it
requires more iterations and should raise a warning if default max_iter is used.
Copy link
Member

@ogrisel ogrisel Mar 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a new commit to make this true with sample_weight != None. I tested it with "all" global_random_seed.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a shallow look at the code changes and it looks ok-ish but I did not check the maths. I just trust the updated tests.

+1 for merge once the CI is green.

@jeremiedbb jeremiedbb merged commit d76f87c into scikit-learn:main Mar 22, 2022
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 6, 2022
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
mathijs02 pushed a commit to mathijs02/scikit-learn that referenced this pull request Dec 27, 2022
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in sparse in Ridge with sample weights
3 participants