[MRG] Fix selection of solver in ridge_regression when solver=='auto' #13363

Merged · 26 commits · Apr 18, 2019

Changes from 18 commits

@@ -177,9 +177,9 @@ Support for Python 3.4 and below has been officially dropped.
- |Enhancement| Minimized the validation of X in
  :class:`ensemble.AdaBoostClassifier` and :class:`ensemble.AdaBoostRegressor`
  :issue:`13174` by :user:`Christos Aridas <chkoar>`.

- |Enhancement| :class:`ensemble.IsolationForest` now exposes ``warm_start``
  parameter, allowing iterative addition of trees to an isolation
  forest. :issue:`13496` by :user:`Peter Marko <petibear>`.

- |Efficiency| Make :class:`ensemble.IsolationForest` more memory efficient
@@ -340,6 +340,16 @@ Support for Python 3.4 and below has been officially dropped.
  deterministic when trained in a multi-class setting on several threads.
  :issue:`13422` by :user:`Clément Doumouro <ClemDoum>`.

- |Fix| Fixed bug in :func:`linear_model.ridge.ridge_regression` that

@NicolasHug (Contributor) · Apr 10, 2019:

This is also a bugfix for RidgeClassifier at least

@btel (Author, Contributor) · Apr 16, 2019:

True, I updated the entry

  caused an unhandled exception for arguments ``return_intercept=True`` and
  ``solver='auto'`` (default) or any solver other than ``sag``.
  :issue:`13363` by :user:`Bartosz Telenczuk <btel>`

- |API| :func:`linear_model.ridge.ridge_regression` will choose the ``sparse_cg``
  solver for sparse inputs when ``solver='auto'`` and ``sample_weight``
  is provided (previously the ``cholesky`` solver was selected). :issue:`13363`
  by :user:`Bartosz Telenczuk <btel>`
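
A minimal sketch of the failure mode the |Fix| entry above describes
(assumes a pre-fix scikit-learn; the exact exception text is not shown in
this thread):

import numpy as np
from sklearn.linear_model import ridge_regression

X = np.random.rand(10, 3)
y = X @ np.array([1., 2., 0.1])

# Before this fix, the default solver='auto' combined with
# return_intercept=True raised an unhandled exception; only an explicit
# solver='sag' worked. After the fix, 'auto' resolves to 'sag' here.
coef, intercept = ridge_regression(X, y, alpha=1., return_intercept=True)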

:mod:`sklearn.manifold`
............................

@@ -368,11 +368,25 @@ def _ridge_regression(X, y, alpha, sample_weight=None, solver='auto',
                      return_n_iter=False, return_intercept=False,
                      X_scale=None, X_offset=None):

    if return_intercept and sparse.issparse(X) and solver != 'sag':
        if solver != 'auto':
            warnings.warn("In Ridge, only 'sag' solver can currently fit the "
                          "intercept when X is sparse. Solver has been "
                          "automatically changed into 'sag'.")
    has_sw = sample_weight is not None
This conversation was marked as resolved by GaelVaroquaux

@GaelVaroquaux (Member) · Mar 2, 2019:

Now that this test is done only once, I think it would be more readable to inline it.

@GaelVaroquaux (Member) · Mar 2, 2019:

Sorry, stupid comment: it is used inside a closure.

@GaelVaroquaux (Member) · Mar 2, 2019:

Actually, I would prefer the test to be inlined, and the function too.


    if solver == 'auto':
        if return_intercept:
            # only sag supports fitting intercept directly
            solver = "sag"
        elif not sparse.issparse(X):
            solver = "cholesky"
        else:
            solver = "sparse_cg"

    if solver not in ('sparse_cg', 'cholesky', 'svd', 'lsqr', 'sag', 'saga'):
        raise ValueError("Known solvers are 'sparse_cg', 'cholesky', 'svd',"
                         " 'lsqr', 'sag' or 'saga'. Got %s." % solver)

    if return_intercept and solver != 'sag':
        warnings.warn("In Ridge, only 'sag' solver can currently fit the "
                      "intercept. Solver has been "
                      "automatically changed into 'sag'.")

@GaelVaroquaux (Member) · Mar 2, 2019:

Is this still true? I thought that ridge now supports fitting the intercept via other solvers.

@btel (Author, Contributor) · Mar 2, 2019:

Only the Ridge.fit method supports fitting the intercept with other solvers (i.e. cholesky/sparse_cg). This was already the case before, and it should be changed when the ridge_regression function is refactored, as discussed in #13336.
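
A minimal sketch of that distinction, based on the behavior described in
this thread (Ridge.fit handles the intercept by centering the data, while
ridge_regression can only return it via 'sag'):

import numpy as np
from sklearn.linear_model import Ridge, ridge_regression

X = np.random.rand(100, 3)
y = X @ np.array([1., 2., 0.1]) + 10.

# Ridge.fit works with any solver because the estimator fits the intercept
# via preprocessing (centering), not inside the solver:
model = Ridge(alpha=1., solver='cholesky', fit_intercept=True).fit(X, y)

# ridge_regression can only fit the intercept directly with 'sag', so any
# other solver triggers the warning added in this PR and is switched:
coef, intercept = ridge_regression(X, y, alpha=1., solver='cholesky',
                                   return_intercept=True)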

@jnothman (Member) · Mar 12, 2019:

Should we not just raise an error in this case?

@jeremiedbb (Member) · Mar 20, 2019:

I think an error is better too.
Doesn't 'saga' also support fitting the intercept?

@NicolasHug (Contributor) · Apr 10, 2019:

Since #13336 is merged, this needs to be updated.

@NicolasHug (Contributor) · Apr 17, 2019:

I agree an error should be raised instead of a warning
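
A sketch of what raising instead of warning might look like (hypothetical
wording; the diff as shown keeps the warning):

    if return_intercept and solver != 'sag':
        # reviewers' suggestion: fail loudly rather than silently switch solver
        raise ValueError("In Ridge, only 'sag' solver can currently fit the "
                         "intercept. Use solver='sag' or "
                         "return_intercept=False.")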

        solver = 'sag'

    _dtype = [np.float64, np.float32]
@@ -404,14 +418,7 @@ def _ridge_regression(X, y, alpha, sample_weight=None, solver='auto',
raise ValueError("Number of samples in X and y does not correspond:"
" %d != %d" % (n_samples, n_samples_))

    has_sw = sample_weight is not None

    if solver == 'auto':
        # cholesky if it's a dense array and cg in any other case
        if not sparse.issparse(X) or has_sw:
            solver = 'cholesky'
        else:
            solver = 'sparse_cg'

    if has_sw:
        if np.atleast_1d(sample_weight).ndim > 1:
@@ -432,8 +439,6 @@ def _ridge_regression(X, y, alpha, sample_weight=None, solver='auto',
    if alpha.size == 1 and n_targets > 1:
        alpha = np.repeat(alpha, n_targets)

    if solver not in ('sparse_cg', 'cholesky', 'svd', 'lsqr', 'sag', 'saga'):
        raise ValueError('Solver %s not understood' % solver)

    n_iter = None
    if solver == 'sparse_cg':
@@ -555,7 +560,7 @@ def fit(self, X, y, sample_weight=None):
            # add the offset which was subtracted by _preprocess_data
            self.intercept_ += y_offset
        else:
            if sparse.issparse(X):
            if sparse.issparse(X) and self.solver == 'sparse_cg':
                # required to fit intercept with sparse_cg solver
                params = {'X_offset': X_offset, 'X_scale': X_scale}
            else:
@@ -7,6 +7,7 @@

from sklearn.utils.testing import assert_almost_equal
from sklearn.utils.testing import assert_array_almost_equal
from sklearn.utils.testing import assert_allclose
from sklearn.utils.testing import assert_equal
from sklearn.utils.testing import assert_array_equal
from sklearn.utils.testing import assert_greater
@@ -778,7 +779,8 @@ def test_raises_value_error_if_solver_not_supported():
    wrong_solver = "This is not a solver (MagritteSolveCV QuantumBitcoin)"

    exception = ValueError
    message = "Solver %s not understood" % wrong_solver
    message = ("Known solvers are 'sparse_cg', 'cholesky', 'svd',"
               " 'lsqr', 'sag' or 'saga'. Got %s." % wrong_solver)

    def func():
        X = np.eye(3)
@@ -837,6 +839,62 @@ def test_ridge_fit_intercept_sparse():
    assert_array_almost_equal(dense.coef_, sparse.coef_)


@pytest.mark.parametrize('return_intercept', [False, True])
@pytest.mark.parametrize('sample_weight', [None, np.ones(1000)])
@pytest.mark.parametrize('arr_type', [np.array, sp.csr_matrix])
@pytest.mark.parametrize('solver', ['auto', 'sparse_cg', 'cholesky', 'lsqr',
                                    'sag', 'saga'])
def test_ridge_regression_check_arguments_validity(return_intercept,
                                                   sample_weight, arr_type,
                                                   solver):
    """check if all combinations of arguments give valid estimations"""

    # test excludes 'svd' solver because it raises exception for sparse inputs

    X = np.random.rand(1000, 3)

@agramfort (Member) · Apr 10, 2019:

please use a fixed random_state

@btel (Author, Contributor) · Apr 16, 2019:

done

    true_coefs = [1, 2, 0.1]
    y = np.dot(X, true_coefs)
    X_testing = arr_type(X)

    target = ridge_regression(X_testing, y, 1,
                              solver=solver,
                              sample_weight=sample_weight,
                              return_intercept=return_intercept
                              )
    if return_intercept:
        coef, intercept = target
        assert_allclose(coef, true_coefs, rtol=0, atol=0.03)
        assert_allclose(intercept, 0, rtol=0, atol=0.03)
    else:

@jeremiedbb (Member) · Mar 21, 2019:

is an absolute tol of 0.1 necessary ?

@jnothman (Member) · Mar 21, 2019:

Needs to be atol when comparing to 0, but 0.1 seems big for checking equality.

@btel (Author, Contributor) · Mar 21, 2019:

it's true, but the differences in the estimations are around 0.02, so I can change to atol=0.03.

The true coefs are: 1, 2, 0.1, intercept 0

.solver auto, return_intercept=True, dense, no sw: (array([0.98738028, 1.97544885, 0.0983598 ]), 0.019808458760372082)
.solver auto, return_intercept=False, dense, with sw: [0.99946651 1.98731769 0.10972413]
.solver auto, return_intercept=True, dense, with sw: (array([0.98793847, 1.97696595, 0.1002527 ]), 0.01906343205046055)
.solver auto, return_intercept=False, sparse, no sw: [0.99836423 1.98816093 0.10940093]
.solver auto, return_intercept=True, sparse, no sw: (array([0.98888637, 1.97631928, 0.10069256]), 0.016659883979934714)
.solver auto, return_intercept=False, sparse, with sw: [0.99966452 1.98818802 0.10897127]
.solver auto, return_intercept=True, sparse, with sw: (array([0.9893669 , 1.97592915, 0.09956311]), 0.017160723846287827)
.solver sparse_cg, return_intercept=False, dense, no sw: [0.9991974  1.98769984 0.10987237]
.solver sparse_cg, return_intercept=True, dense, no sw: (array([0.98920646, 1.97558791, 0.09788704]), 0.021340292109204684)
.solver sparse_cg, return_intercept=False, dense, with sw: [0.99024222 1.99174497 0.11463053]
.solver sparse_cg, return_intercept=True, dense, with sw: (array([0.98761024, 1.97533341, 0.09901903]), 0.01854073215870237)
.solver sparse_cg, return_intercept=False, sparse, no sw: [0.9994363  1.98744613 0.1091634 ]
.solver sparse_cg, return_intercept=True, sparse, no sw: (array([0.9880755 , 1.97727505, 0.09942038]), 0.01744125549431998)
.solver sparse_cg, return_intercept=False, sparse, with sw: [0.99054007 1.99162964 0.1142885 ]
.solver sparse_cg, return_intercept=True, sparse, with sw: (array([0.98772017, 1.9773033 , 0.10027304]), 0.017187277743091305)
.solver cholesky, return_intercept=False, dense, no sw: [0.99913301 1.98694126 0.10987065]
.solver cholesky, return_intercept=True, dense, no sw: (array([0.98777785, 1.97725736, 0.09918002]), 0.019112808226411305)
.solver cholesky, return_intercept=False, dense, with sw: [0.99911638 1.98733099 0.10993568]
.solver cholesky, return_intercept=True, dense, with sw: (array([0.98614477, 1.97491618, 0.09864355]), 0.01985489984301323)
.solver cholesky, return_intercept=False, sparse, no sw: [0.99941728 1.98712034 0.1096048 ]
.solver cholesky, return_intercept=True, sparse, no sw: (array([0.98829793, 1.97603815, 0.09797809]), 0.019415378825590024)
.solver cholesky, return_intercept=False, sparse, with sw: [0.99934764 1.98748946 0.10933262]
.solver cholesky, return_intercept=True, sparse, with sw: (array([0.98731514, 1.97603497, 0.09975062]), 0.018854865586171575)
.solver lsqr, return_intercept=False, dense, no sw: [0.99951604 1.98764362 0.10910706]
.solver lsqr, return_intercept=True, dense, no sw: (array([0.9870815 , 1.97716204, 0.0995867 ]), 0.015630561450455646)
.solver lsqr, return_intercept=False, dense, with sw: [0.99976968 1.98730972 0.10931853]
.solver lsqr, return_intercept=True, dense, with sw: (array([0.9884556 , 1.97708318, 0.0994495 ]), 0.020868291821179098)
.solver lsqr, return_intercept=False, sparse, no sw: [0.99895738 1.98781999 0.1097974 ]
.solver lsqr, return_intercept=True, sparse, no sw: (array([0.98793535, 1.97810114, 0.10162843]), 0.01745574357981361)
.solver lsqr, return_intercept=False, sparse, with sw: [0.99888735 1.98760483 0.11008256]
.solver lsqr, return_intercept=True, sparse, with sw: (array([0.98908756, 1.97644754, 0.09898548]), 0.0179269121290331)
.solver sag, return_intercept=False, dense, no sw: [0.99851148 1.98589838 0.10733857]
.solver sag, return_intercept=True, dense, no sw: (array([0.98735441, 1.97573369, 0.09955717]), 0.019699491413496736)
.solver sag, return_intercept=False, dense, with sw: [1.00098593 1.98703252 0.10881227]
.solver sag, return_intercept=True, dense, with sw: (array([0.98860356, 1.97635123, 0.09842113]), 0.01970872473553323)
.solver sag, return_intercept=False, sparse, no sw: [0.99926502 1.9880752  0.11024386]
.solver sag, return_intercept=True, sparse, no sw: (array([0.98690006, 1.97720689, 0.09983899]), 0.018324191901867053)
.solver sag, return_intercept=False, sparse, with sw: [1.00034609 1.98534772 0.10857017]
.solver sag, return_intercept=True, sparse, with sw: (array([0.98836367, 1.97905487, 0.09644623]), 0.018058154068734605)
.solver saga, return_intercept=False, dense, no sw: [0.999297   1.98728781 0.10987951]
.solver saga, return_intercept=True, dense, no sw: (array([0.98715784, 1.97528129, 0.09813245]), 0.02077689395931017)
.solver saga, return_intercept=False, dense, with sw: [0.99869864 1.98810477 0.10949984]
.solver saga, return_intercept=True, dense, with sw: (array([0.98897455, 1.97488114, 0.09956767]), 0.021863847921240975)
.solver saga, return_intercept=False, sparse, no sw: [0.99885491 1.98692853 0.11079414]
.solver saga, return_intercept=True, sparse, no sw: (array([0.98841551, 1.97610063, 0.10014244]), 0.01736938519414459)
.solver saga, return_intercept=False, sparse, with sw: [0.99622644 1.99183347 0.10405478]
.solver saga, return_intercept=True, sparse, with sw: (array([0.98847833, 1.97703492, 0.09749779]), 0.018143914964244622)

so I can change to atol=0.03

@jeremiedbb (Member) · Mar 27, 2019:

I guess the default tol of 1e-3 might be the reason for this poor comparison. Could you try with a zero tol?

@agramfort (Member) · Apr 10, 2019:

@btel I don't understand why you cannot use a lower tolerance, as you have no noise added to the data.

@btel (Author, Contributor) · Apr 17, 2019:

What do you mean? The data are randomly generated, so I don't get exactly the coefficients I put in. I can freeze the seed and test against the coefficients I get after a test run, but I might still get some small differences between the solvers.

@NicolasHug (Contributor) · Apr 17, 2019:

What @agramfort meant is that you could pass a smaller tolerance to ridge_regression (as pushed in 604f7d1). Since your data isn't noisy, the solvers should converge just fine.
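
A minimal sketch of that suggestion (the tol value is illustrative, not
necessarily what 604f7d1 used):

import numpy as np
from sklearn.linear_model import ridge_regression

rng = np.random.RandomState(0)  # fixed seed, as requested earlier
X = rng.rand(1000, 3)
y = X @ np.array([1., 2., 0.1])

# The default tol=1e-3 lets iterative solvers stop early; a tighter tol on
# noiseless data should converge much closer to the true coefficients,
# allowing a smaller atol in the test.
coef = ridge_regression(X, y, alpha=1., solver='sag', tol=1e-10)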

        assert_allclose(target, true_coefs, rtol=0, atol=0.03)


def test_ridge_regression_warns_with_return_intercept():
    # return_intercept is only supported by sag/saga

    X, y = make_regression(n_samples=1000, n_features=2, n_informative=2,
                           bias=10., random_state=42)

    for solver in ['sparse_cg', 'cholesky', 'svd', 'lsqr', 'saga']:

@NicolasHug (Contributor) · Apr 17, 2019:

this could be parametrized but OK

        with pytest.warns(UserWarning, match="In Ridge, only 'sag' solver"):
            target = ridge_regression(X, y, 1,

@jeremiedbb (Member) · Mar 20, 2019:

you can match the warning message directly

with pytest.warns(UserWarning, match='return_intercept=True is only'):
   ...

@btel (Author, Contributor) · Mar 21, 2019:

done

                                      solver=solver,
                                      return_intercept=True
                                      )
        assert len(target) == 2
        assert target[0].shape == (2,)

    with pytest.warns(None) as record:
        target = ridge_regression(X, y, 1,
                                  solver="sag",
                                  return_intercept=True
                                  )
    assert len(target) == 2
    assert target[0].shape == (2,)

    # no warning should be raised
    assert len(record) == 0

def test_errors_and_values_helper():
    ridgecv = _RidgeGCV()
    rng = check_random_state(42)