Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST add test to check that all ridge solver give the same results #13914

Merged

Conversation

jeromedockes
Copy link
Contributor

There lacks a test checking that all the ridge solvers (including the gcv) give similar results for a few different datasets (varying shapes, float32/64, and sparsity)

this test:

def test_ridge_individual_penalties():

compares solvers but for only one dense dataset with null coefficients and no intercept, and is meant to test individual penalties on outputs, so RidgeCV cannot be included since it does not support individual penalties.

after IRL with @ogrisel

@agramfort
Copy link
Member

@jeromedockes do you time to also look at #13246

there is also there a problem with Ridge that is still different between dense and sparse

@jeromedockes
Copy link
Contributor Author

@jeromedockes do you time to also look at #13246

there is also there a problem with Ridge that is still different between dense and sparse

yes, as far as I can tell the Ridge class (or other estimators that rely on it such as RANSAC with a Ridge as base estimator) is the only remaining problem for #13246 ?

It seems that at the moment, only the sparse_cg solver in Ridge can correctly fit an intercept when the design matrix is sparse, but it doesn't get selected when the solver is auto. SAG gets selected, and the documentation says it can fit an intercept with sparse X, but in most cases when X is sparse (hence not centered in preprocessing) sag requires many more iterations than the defaut max to find the right intercept, and the tolerance needs to be lowered as well.

the test that currently should check that Ridge fits an intercept uses a dataset where X has zero mean, which is why this problem is not caught by the tests on master.

I am planning to work on this and open a PR soon.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests run pretty fast (about 1sec) so LGTM

sklearn/linear_model/tests/test_ridge.py Outdated Show resolved Hide resolved
'n_samples,dtype,proportion_nonzero',
[(20, 'float32', .1), (40, 'float32', 1.), (20, 'float64', .2)])
@pytest.mark.parametrize('sparse_X', [True, False])
@pytest.mark.parametrize('seed', np.arange(300))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

300 seeds seems a bit too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my apologies about that, I had wanted to check with many seeds locally and
forgot to reduce the number of seeds afterwards. I set it to 3 now

Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

besides maybe a too ambitious/slow test LGTM. +1 for MRG

@agramfort agramfort changed the title add test to check that all ridge solver give the same results [MRG+1] add test to check that all ridge solver give the same results Jun 10, 2019
@agramfort
Copy link
Member

Can I merge this one?

svd_ridge = Ridge(
solver='svd', normalize=True, alpha=alpha).fit(X, y)
X = X.astype(dtype)
y = y.astype(dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add copy=False

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jeromedockes !

@rth rth changed the title [MRG+1] add test to check that all ridge solver give the same results TST add test to check that all ridge solver give the same results Jun 11, 2019
@rth rth merged commit 61f6f5b into scikit-learn:master Jun 11, 2019
@jeromedockes jeromedockes deleted the add_test_ridge_solvers_consistency branch June 11, 2019 11:30
@jeromedockes
Copy link
Contributor Author

Thanks @jeromedockes !

thanks for your help!

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants