
[MRG+1] Simplify sample_weight support in Ridge. #4116

Closed
mblondel wants to merge 4 commits into scikit-learn:master from mblondel:sample_weights

Conversation

mblondel
Member

I simplified the code by doing the rescaling upfront. This makes it possible to support sample_weight in all solvers and fixes the long-standing bug #1190. Ping @eickenberg, @fabianp, @agramfort
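For illustration, the upfront rescaling amounts to the usual square-root-of-weights trick; a minimal sketch with illustrative names (not the PR's exact code), assuming dense arrays:

import numpy as np

def rescale_data_sketch(X, y, sample_weight):
    # Multiplying each row of X and each entry of y by sqrt(w_i) turns the
    # weighted least-squares term sum_i w_i * (y_i - x_i . beta)^2 into an
    # ordinary unweighted one, so any ridge solver can be reused unchanged.
    sw = np.sqrt(np.asarray(sample_weight, dtype=float))
    X_rescaled = X * sw[:, np.newaxis]
    y_rescaled = y * sw if y.ndim == 1 else y * sw[:, np.newaxis]
    return X_rescaled, y_rescaled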

@coveralls

Coverage Status

Coverage increased (+0.0%) when pulling 7afa609 on mblondel:sample_weights into 2a2e7c6 on scikit-learn:master.

@agramfort
Member

lgtm

@mblondel mblondel changed the title [MRG] Simplify sample_weight support in Ridge. [MRG+1] Simplify sample_weight support in Ridge. Jan 18, 2015
def _rescale_data(X, y, sample_weight):
    """Rescale data so as to support sample_weight"""
    n_samples = X.shape[0]
    sample_weight = sample_weight * np.ones(n_samples)
Contributor

The default value of sample_weight coming in is None, unless I have missed a decision branch on the way. That would raise an error here.

Member Author

There is a branch at https://github.com/mblondel/scikit-learn/blob/sample_weights/sklearn/linear_model/ridge.py#L320. I was just assuming that _rescale_data is not called if sample_weight is None.

Member Author

The branch is not shown in the diff because it's already there in the current version.

Contributor

OK, I think it is good the way it is. Moving a check for sample_weight=None would distribute decision-making to auxiliary functions, which would be weird. Sorry for the noise.

@eickenberg
Contributor

I am actually surprised to see that we had the clause that sample weights forced the use of the cholesky solver. I was convinced I had implemented sw support for the other solvers, too. But I am probably confusing this with my efforts from half a year ago that I wasn't able to finalize.

I like the idea in general, but it does come with some drawbacks: as it is now, applying sample weights puts a weighted copy of X in memory, which for large design matrices can be prohibitive. On the other hand, if you multiply in place, all sorts of other (thread-related?) problems can crop up, in addition to the problem of sample weights equal to 0.

So I see it as a nice default fallback way of solving the sample weight problem, but it should probably give precedence to solver-specific implementations iff these are able to avoid full copies of X (for n_samples < n_features a copy of the kernel matrix is better than a copy of X).
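A minimal sketch of the precedence idea described above, purely hypothetical and not the code adopted in this PR: hand the weights to a solver that can use them natively, and fall back to the generic upfront rescaling otherwise.

def ridge_dispatch_sketch(X, y, sample_weight=None, solver="auto"):
    # Hypothetical dispatcher: a solver with its own sample_weight handling
    # (e.g. a kernelized Cholesky solve when n_samples < n_features) receives
    # the raw weights; every other solver gets pre-rescaled data instead,
    # using rescale_data_sketch from the sketch above.
    solvers_with_native_sw = {"cholesky"}
    if sample_weight is not None and solver not in solvers_with_native_sw:
        X, y = rescale_data_sketch(X, y, sample_weight)
        sample_weight = None
    return X, y, sample_weight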

@mblondel
Member Author

@eickenberg For the n_features < n_samples case, there is no benefit in solver-specific handling, since
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L103
allocates a copy of the design matrix as well. For the n_samples < n_features case, I agree with you that a copy of the kernel matrix could be better, but I am not a fan of adding more branching in ridge_regression. In any case, I am keeping the sample_weight support in _solve_cholesky_kernel because this is needed for pre-computed kernels.
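For reference, the kernel-side sample_weight handling mentioned here amounts to absorbing the weights into the kernel system; a minimal sketch of that trick (illustrative names, assuming a 1-d target and a precomputed kernel matrix K), not a copy of _solve_cholesky_kernel:

import numpy as np

def weighted_kernel_ridge_sketch(K, y, alpha, sample_weight):
    # Rescale the system: K' = sqrt(W) K sqrt(W), y' = sqrt(W) y, solve
    # (K' + alpha I) c' = y', then map back with c = sqrt(W) c'. Predictions
    # K @ c then solve the sample-weighted kernel ridge problem without ever
    # forming a weighted copy of X.
    sw = np.sqrt(np.asarray(sample_weight, dtype=float))
    Kw = K * np.outer(sw, sw)
    yw = y * sw
    dual_coef = np.linalg.solve(Kw + alpha * np.eye(K.shape[0]), yw)
    return sw * dual_coef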

@eickenberg
Contributor

For the n_features < n_samples case, there is no benefit in solver-specific handling since https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L103

Indeed, I had overlooked that. Makes sense. So then for me this is fine.

@mblondel
Member Author

@eickenberg I removed the else clause.

from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics.pairwise import pairwise_kernels
from sklearn.linear_model.ridge import _solve_cholesky_kernel
from sklearn.utils.validation import check_is_fitted
Contributor

Tangential issue: I am confused now about whether or not to avoid relative imports in tests. For some reason, I thought absolute imports were preferred, although I do agree that, by a stroke of bad luck, an absolute import may pull in an entirely different copy of the package.

@agramfort why did you ask for absolute imports here?

Member Author

The way I see it: tests are like user scripts, and so they use absolute imports. The library is internal code, so it uses relative imports. However, absolute imports should work there too.
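To make the distinction concrete, an illustrative pair (the relative form is what library modules such as sklearn/linear_model/ridge.py typically use; the absolute form is what tests and user scripts use):

# Inside the library, a module imports its siblings relatively:
from .base import LinearModel

# Tests and user scripts import the installed package absolutely:
from sklearn.linear_model.ridge import _solve_cholesky_kernel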

Contributor

Ah yes, that is the difference. Thanks for the clarification!!

Member

Tests should always use absolute imports in scikit-learn. The imported code might not be the same version that the tests belong to, but it will be the version that the user actually gets.

@eickenberg
Contributor

LGTM!

@mblondel
Member Author

Merged by rebase.
