New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Simplify sample_weight support in Ridge. #4116
Conversation
lgtm |
def _rescale_data(X, y, sample_weight): | ||
"""Rescale data so as to support sample_weight""" | ||
n_samples = X.shape[0] | ||
sample_weight = sample_weight * np.ones(n_samples) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value for sample_weight
coming in is None
if I haven't missed a decision branch on the way. That should raise an error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a branch at https://github.com/mblondel/scikit-learn/blob/sample_weights/sklearn/linear_model/ridge.py#L320 I was just assuming that _rescale_data
is not called if sample_weight
is None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The branch is not shown in the diff because it's already there in the current version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I think it is good the way it is. Moving a check for sample_weight=None
would distribute decision making to auxiliary functions which would be weird. Sorry for the noise.
I am actually surprised to see that we had the clause that sample weights forced the use of the cholesky solver. I was convinced I had implemented sw support for the other solvers, too. But I am probably confusing this with my efforts from half a year ago that I wasn't able to finalize. I like the idea in general, but it does come with some drawbacks: As it is now, applying sample weights causes a weighted copy of So I see it as a nice default fallback way of solving the sample weight problem, but it should probably give precedence to solver-specific implementations iff these are able to avoid full copies of |
@eickenberg For the n_features < n_samples case, there is not benefit in solver-specific handling since |
Indeed, I had overlooked that. Makes sense. So then for me this is fine. |
@eickenberg I removed the else clause. |
1d645ee
to
795b12b
Compare
from sklearn.base import BaseEstimator, RegressorMixin | ||
from sklearn.metrics.pairwise import pairwise_kernels | ||
from sklearn.linear_model.ridge import _solve_cholesky_kernel | ||
from sklearn.utils.validation import check_is_fitted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tangential issue: I am confused now about whether or not to avoid relative imports in tests. For some reason, I thought absolute imports were preferred. Although I do agree that by a stroke of bad luck, the absolute import may cause the import of an entirely different package.
@agramfort why did you ask for absolute imports here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I see it: tests are like user scripts and so use absolute imports. Library is internal code so it uses relative imports. However, absolute imports should work too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, that is the difference. Thanks for the clarification!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test should always use absolute imports in scikit-learn. It might not be the same version that the tests belong to, but it will be the version that the user actually gets.
LGTM! |
Merged by rebase. |
I simplified the code by doing the rescaling upfront. This allows to support sample_weight in all solvers and fixes the long-standing bug #1190. Ping @eickenberg, @fabianp, @agramfort