[MRG] Fix selection of solver in ridge_regression when solver=='auto' #13363

Merged (26 commits), Apr 18, 2019
20 changes: 18 additions & 2 deletions doc/whats_new/v0.21.rst
@@ -177,9 +177,9 @@ Support for Python 3.4 and below has been officially dropped.
- |Enhancement| Minimized the validation of X in
:class:`ensemble.AdaBoostClassifier` and :class:`ensemble.AdaBoostRegressor`
:issue:`13174` by :user:`Christos Aridas <chkoar>`.

- |Enhancement| :class:`ensemble.IsolationForest` now exposes ``warm_start``
parameter, allowing iterative addition of trees to an isolation
forest. :issue:`13496` by :user:`Peter Marko <petibear>`.

- |Efficiency| Make :class:`ensemble.IsolationForest` more memory efficient
@@ -340,6 +340,22 @@ Support for Python 3.4 and below has been officially dropped.
deterministic when trained in a multi-class setting on several threads.
:issue:`13422` by :user:`Clément Doumouro <ClemDoum>`.

- |Fix| Fixed a bug in :func:`linear_model.ridge.ridge_regression`,
:class:`linear_model.ridge.Ridge` and
:class:`linear_model.ridge.RidgeClassifier` that
caused an unhandled exception when ``return_intercept=True`` and
``solver='auto'`` (the default) or any solver other than ``sag`` was used.
:issue:`13363` by :user:`Bartosz Telenczuk <btel>`

- |Fix| :func:`linear_model.ridge.ridge_regression` will now raise an exception
if ``return_intercept=True`` and the solver is different from ``sag``. Previously,
only a warning was issued. :issue:`13363` by :user:`Bartosz Telenczuk <btel>`

- |API| :func:`linear_model.ridge.ridge_regression` will choose the ``sparse_cg``
solver for sparse inputs when ``solver='auto'`` and ``sample_weight``
is provided (previously the ``cholesky`` solver was selected). :issue:`13363`
by :user:`Bartosz Telenczuk <btel>`
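
For illustration, a minimal sketch of the behaviour described by the three entries above (data and parameter values are invented for the example; assumes scikit-learn 0.21 with this change applied):

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import ridge_regression

rng = np.random.RandomState(0)
X = sparse.csr_matrix(rng.rand(20, 3))   # sparse design matrix
y = rng.rand(20)
sample_weight = np.ones(20)

# With solver='auto', sparse input plus sample weights now dispatches to
# 'sparse_cg' (previously 'cholesky' was selected).
coef = ridge_regression(X, y, alpha=1.0, solver='auto',
                        sample_weight=sample_weight)

# Requesting the intercept with a solver other than 'sag' now raises a
# ValueError instead of only issuing a warning.
try:
    ridge_regression(X, y, alpha=1.0, solver='lsqr', return_intercept=True)
except ValueError as exc:
    print(exc)
```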

:mod:`sklearn.manifold`
............................

36 changes: 20 additions & 16 deletions sklearn/linear_model/ridge.py
@@ -368,12 +368,25 @@ def _ridge_regression(X, y, alpha, sample_weight=None, solver='auto',
return_n_iter=False, return_intercept=False,
X_scale=None, X_offset=None):

if return_intercept and sparse.issparse(X) and solver != 'sag':
if solver != 'auto':
warnings.warn("In Ridge, only 'sag' solver can currently fit the "
"intercept when X is sparse. Solver has been "
"automatically changed into 'sag'.")
solver = 'sag'
has_sw = sample_weight is not None

if solver == 'auto':
if return_intercept:
# only sag supports fitting intercept directly
solver = "sag"
elif not sparse.issparse(X):
solver = "cholesky"
else:
solver = "sparse_cg"

if solver not in ('sparse_cg', 'cholesky', 'svd', 'lsqr', 'sag', 'saga'):
raise ValueError("Known solvers are 'sparse_cg', 'cholesky', 'svd'"
" 'lsqr', 'sag' or 'saga'. Got %s." % solver)

if return_intercept and solver != 'sag':
raise ValueError("In Ridge, only 'sag' solver can directly fit the "
"intercept. Please change solver to 'sag' or set "
"return_intercept=False.")

_dtype = [np.float64, np.float32]

@@ -404,14 +417,7 @@ def _ridge_regression(X, y, alpha, sample_weight=None, solver='auto',
raise ValueError("Number of samples in X and y does not correspond:"
" %d != %d" % (n_samples, n_samples_))

has_sw = sample_weight is not None

if solver == 'auto':
# cholesky if it's a dense array and cg in any other case
if not sparse.issparse(X) or has_sw:
solver = 'cholesky'
else:
solver = 'sparse_cg'

if has_sw:
if np.atleast_1d(sample_weight).ndim > 1:
@@ -432,8 +438,6 @@ def _ridge_regression(X, y, alpha, sample_weight=None, solver='auto',
if alpha.size == 1 and n_targets > 1:
alpha = np.repeat(alpha, n_targets)

if solver not in ('sparse_cg', 'cholesky', 'svd', 'lsqr', 'sag', 'saga'):
raise ValueError('Solver %s not understood' % solver)

n_iter = None
if solver == 'sparse_cg':
@@ -555,7 +559,7 @@ def fit(self, X, y, sample_weight=None):
# add the offset which was subtracted by _preprocess_data
self.intercept_ += y_offset
else:
if sparse.issparse(X):
if sparse.issparse(X) and self.solver == 'sparse_cg':
# required to fit intercept with sparse_cg solver
params = {'X_offset': X_offset, 'X_scale': X_scale}
else:
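
As a reading aid, the new `solver='auto'` dispatch added in the hunk above boils down to the following standalone sketch (illustrative only, not scikit-learn API):

```python
from scipy import sparse

def _resolve_auto_solver(X, return_intercept):
    """Sketch of the 'auto' dispatch introduced in this PR."""
    if return_intercept:
        # only 'sag' supports fitting the intercept directly
        return 'sag'
    if not sparse.issparse(X):
        return 'cholesky'
    return 'sparse_cg'
```
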
58 changes: 54 additions & 4 deletions sklearn/linear_model/tests/test_ridge.py
@@ -7,6 +7,7 @@

from sklearn.utils.testing import assert_almost_equal
from sklearn.utils.testing import assert_array_almost_equal
from sklearn.utils.testing import assert_allclose
from sklearn.utils.testing import assert_equal
from sklearn.utils.testing import assert_array_equal
from sklearn.utils.testing import assert_greater
@@ -778,7 +779,8 @@ def test_raises_value_error_if_solver_not_supported():
wrong_solver = "This is not a solver (MagritteSolveCV QuantumBitcoin)"

exception = ValueError
message = "Solver %s not understood" % wrong_solver
message = ("Known solvers are 'sparse_cg', 'cholesky', 'svd'"
" 'lsqr', 'sag' or 'saga'. Got %s." % wrong_solver)

def func():
X = np.eye(3)
@@ -832,9 +834,57 @@ def test_ridge_fit_intercept_sparse():
# test the solver switch and the corresponding warning
for solver in ['saga', 'lsqr']:
sparse = Ridge(alpha=1., tol=1.e-15, solver=solver, fit_intercept=True)
assert_warns(UserWarning, sparse.fit, X_csr, y)
assert_almost_equal(dense.intercept_, sparse.intercept_)
assert_array_almost_equal(dense.coef_, sparse.coef_)
assert_raises_regex(ValueError, "In Ridge,", sparse.fit, X_csr, y)


@pytest.mark.parametrize('return_intercept', [False, True])
@pytest.mark.parametrize('sample_weight', [None, np.ones(1000)])
@pytest.mark.parametrize('arr_type', [np.array, sp.csr_matrix])
@pytest.mark.parametrize('solver', ['auto', 'sparse_cg', 'cholesky', 'lsqr',
'sag', 'saga'])
def test_ridge_regression_check_arguments_validity(return_intercept,
sample_weight, arr_type,
solver):
"""check if all combinations of arguments give valid estimations"""

# test excludes 'svd' solver because it raises exception for sparse inputs

rng = check_random_state(42)
X = rng.rand(1000, 3)
true_coefs = [1, 2, 0.1]
y = np.dot(X, true_coefs)
true_intercept = 0.
if return_intercept:
true_intercept = 10000.
y += true_intercept
X_testing = arr_type(X)

alpha, atol, tol = 1e-3, 1e-4, 1e-6

if solver not in ['sag', 'auto'] and return_intercept:
assert_raises_regex(ValueError,
"In Ridge, only 'sag' solver",
ridge_regression, X_testing, y,
alpha=alpha,
solver=solver,
sample_weight=sample_weight,
return_intercept=return_intercept,
tol=tol)
return

out = ridge_regression(X_testing, y, alpha=alpha,
solver=solver,
sample_weight=sample_weight,
return_intercept=return_intercept,
tol=tol,
)

if return_intercept:
coef, intercept = out
assert_allclose(coef, true_coefs, rtol=0, atol=atol)
assert_allclose(intercept, true_intercept, rtol=0, atol=atol)
else:
Member: is an absolute tol of 0.1 necessary?

Member: Needs to be atol when comparing to 0, but 0.1 seems big for checking equality.

Contributor (author): It's true, but the differences in the estimations are around 0.02, so I can change to atol=0.03.

The true coefs are: 1, 2, 0.1, intercept 0

.solver auto, return_intercept=True, dense, no sw: (array([0.98738028, 1.97544885, 0.0983598 ]), 0.019808458760372082)
.solver auto, return_intercept=False, dense, with sw: [0.99946651 1.98731769 0.10972413]
.solver auto, return_intercept=True, dense, with sw: (array([0.98793847, 1.97696595, 0.1002527 ]), 0.01906343205046055)
.solver auto, return_intercept=False, sparse, no sw: [0.99836423 1.98816093 0.10940093]
.solver auto, return_intercept=True, sparse, no sw: (array([0.98888637, 1.97631928, 0.10069256]), 0.016659883979934714)
.solver auto, return_intercept=False, sparse, with sw: [0.99966452 1.98818802 0.10897127]
.solver auto, return_intercept=True, sparse, with sw: (array([0.9893669 , 1.97592915, 0.09956311]), 0.017160723846287827)
.solver sparse_cg, return_intercept=False, dense, no sw: [0.9991974  1.98769984 0.10987237]
.solver sparse_cg, return_intercept=True, dense, no sw: (array([0.98920646, 1.97558791, 0.09788704]), 0.021340292109204684)
.solver sparse_cg, return_intercept=False, dense, with sw: [0.99024222 1.99174497 0.11463053]
.solver sparse_cg, return_intercept=True, dense, with sw: (array([0.98761024, 1.97533341, 0.09901903]), 0.01854073215870237)
.solver sparse_cg, return_intercept=False, sparse, no sw: [0.9994363  1.98744613 0.1091634 ]
.solver sparse_cg, return_intercept=True, sparse, no sw: (array([0.9880755 , 1.97727505, 0.09942038]), 0.01744125549431998)
.solver sparse_cg, return_intercept=False, sparse, with sw: [0.99054007 1.99162964 0.1142885 ]
.solver sparse_cg, return_intercept=True, sparse, with sw: (array([0.98772017, 1.9773033 , 0.10027304]), 0.017187277743091305)
.solver cholesky, return_intercept=False, dense, no sw: [0.99913301 1.98694126 0.10987065]
.solver cholesky, return_intercept=True, dense, no sw: (array([0.98777785, 1.97725736, 0.09918002]), 0.019112808226411305)
.solver cholesky, return_intercept=False, dense, with sw: [0.99911638 1.98733099 0.10993568]
.solver cholesky, return_intercept=True, dense, with sw: (array([0.98614477, 1.97491618, 0.09864355]), 0.01985489984301323)
.solver cholesky, return_intercept=False, sparse, no sw: [0.99941728 1.98712034 0.1096048 ]
.solver cholesky, return_intercept=True, sparse, no sw: (array([0.98829793, 1.97603815, 0.09797809]), 0.019415378825590024)
.solver cholesky, return_intercept=False, sparse, with sw: [0.99934764 1.98748946 0.10933262]
.solver cholesky, return_intercept=True, sparse, with sw: (array([0.98731514, 1.97603497, 0.09975062]), 0.018854865586171575)
.solver lsqr, return_intercept=False, dense, no sw: [0.99951604 1.98764362 0.10910706]
.solver lsqr, return_intercept=True, dense, no sw: (array([0.9870815 , 1.97716204, 0.0995867 ]), 0.015630561450455646)
.solver lsqr, return_intercept=False, dense, with sw: [0.99976968 1.98730972 0.10931853]
.solver lsqr, return_intercept=True, dense, with sw: (array([0.9884556 , 1.97708318, 0.0994495 ]), 0.020868291821179098)
.solver lsqr, return_intercept=False, sparse, no sw: [0.99895738 1.98781999 0.1097974 ]
.solver lsqr, return_intercept=True, sparse, no sw: (array([0.98793535, 1.97810114, 0.10162843]), 0.01745574357981361)
.solver lsqr, return_intercept=False, sparse, with sw: [0.99888735 1.98760483 0.11008256]
.solver lsqr, return_intercept=True, sparse, with sw: (array([0.98908756, 1.97644754, 0.09898548]), 0.0179269121290331)
.solver sag, return_intercept=False, dense, no sw: [0.99851148 1.98589838 0.10733857]
.solver sag, return_intercept=True, dense, no sw: (array([0.98735441, 1.97573369, 0.09955717]), 0.019699491413496736)
.solver sag, return_intercept=False, dense, with sw: [1.00098593 1.98703252 0.10881227]
.solver sag, return_intercept=True, dense, with sw: (array([0.98860356, 1.97635123, 0.09842113]), 0.01970872473553323)
.solver sag, return_intercept=False, sparse, no sw: [0.99926502 1.9880752  0.11024386]
.solver sag, return_intercept=True, sparse, no sw: (array([0.98690006, 1.97720689, 0.09983899]), 0.018324191901867053)
.solver sag, return_intercept=False, sparse, with sw: [1.00034609 1.98534772 0.10857017]
.solver sag, return_intercept=True, sparse, with sw: (array([0.98836367, 1.97905487, 0.09644623]), 0.018058154068734605)
.solver saga, return_intercept=False, dense, no sw: [0.999297   1.98728781 0.10987951]
.solver saga, return_intercept=True, dense, no sw: (array([0.98715784, 1.97528129, 0.09813245]), 0.02077689395931017)
.solver saga, return_intercept=False, dense, with sw: [0.99869864 1.98810477 0.10949984]
.solver saga, return_intercept=True, dense, with sw: (array([0.98897455, 1.97488114, 0.09956767]), 0.021863847921240975)
.solver saga, return_intercept=False, sparse, no sw: [0.99885491 1.98692853 0.11079414]
.solver saga, return_intercept=True, sparse, no sw: (array([0.98841551, 1.97610063, 0.10014244]), 0.01736938519414459)
.solver saga, return_intercept=False, sparse, with sw: [0.99622644 1.99183347 0.10405478]
.solver saga, return_intercept=True, sparse, with sw: (array([0.98847833, 1.97703492, 0.09749779]), 0.018143914964244622)

so I can change to atol=0.03

Member: I guess the default tol of 1e-3 might be the reason for this poor comparison. Could you try with a zero tol?

Member: @btel I don't understand why you cannot use a lower tolerance, as you have no noise added to the data.

Contributor (author): What do you mean? The data are randomly generated, so I don't get exactly the coefficients I put in. I can freeze the seed and test against the coefficients that I get after a test run, but I might still get some small differences between the solvers.

Member: What @agramfort meant is that you could pass a smaller tolerance to ridge_regression (as pushed in 604f7d1). Since your data isn't noisy, the solvers should converge just fine.

assert_allclose(out, true_coefs, rtol=0, atol=atol)


def test_errors_and_values_helper():
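
To round out the tolerance discussion above: with the values committed in the test (alpha=1e-3, tol=1e-6, atol=1e-4) the solvers converge tightly on this noiseless data. A small sketch reusing those same values (not part of the test suite):

```python
import numpy as np
from sklearn.linear_model import ridge_regression

rng = np.random.RandomState(42)
X = rng.rand(1000, 3)
true_coefs = np.array([1.0, 2.0, 0.1])
y = X @ true_coefs                      # noiseless target, no intercept

# A tight solver tolerance lets the comparison use a small absolute tolerance.
coef = ridge_regression(X, y, alpha=1e-3, solver='sag', tol=1e-6)
np.testing.assert_allclose(coef, true_coefs, rtol=0, atol=1e-4)
```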