
MRG fix Normalize for linear models when used with sample_weight #19426

Merged
merged 63 commits into from Feb 22, 2021

Conversation

maikia
Copy link
Contributor

@maikia maikia commented Feb 10, 2021

_preprocess_data in _base for linear models, when invoked with normalize=True and sample_weight, does not weight the standard deviation and therefore does not behave as expected. This PR fixes it.

Currently we are working on the deprecation of normalize: #3020
As pointed out by @ogrisel in #17743 (comment), when testing that running lasso or ridge with normalize set to True gives the same result as a pipeline with StandardScaler and normalize set to False, the test fails.

After thorough investigation, @agramfort identified the issue mentioned above. Without this fix, we cannot expect a pipeline with a StandardScaler to give the same results as a linear model with normalize set to True.

This issue has also been noted previously by @jnothman:

# XXX: if normalize=True, should we expect a weighted standard deviation?

It might also be related to: #17444
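To make the distinction concrete, here is a minimal sketch (hypothetical data, NumPy only) of the difference between an unweighted and a weighted standard deviation. With weights that zero out half the samples, the weighted std should match the std computed on the remaining samples alone, while the unweighted std does not:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(100)
w = np.ones(100)
w[:50] = 0.0  # give zero weight to the first half of the samples

# Unweighted std treats all samples equally, including zero-weight ones.
std_unweighted = x.std()

# Weighted std: weighted mean of squared deviations from the weighted mean.
mean_w = np.average(x, weights=w)
std_weighted = np.sqrt(np.average((x - mean_w) ** 2, weights=w))

# With these weights, the weighted std equals the std of the kept half only.
std_trimmed = x[50:].std()
```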

cc @glemaitre

@agramfort agramfort changed the title WIP BUG fix Normalize for linear models when used with sample_weight MRG fix Normalize for linear models when used with sample_weight Feb 11, 2021
@agramfort
Copy link
Member

this one should be good to go

cc @glemaitre @ogrisel @lorentzenchr

@glemaitre glemaitre self-requested a review February 11, 2021 08:13
@agramfort
Copy link
Member

agramfort commented Feb 14, 2021

@ogrisel let's see if it helps to understand the problem. I wrote from scratch a
ridge function that does what this branch currently does. You'll see that it
matches in both cases (trimmed or not). To achieve what you want, you need
to remove the line

X_var *= X.shape[0]  # difference between normalize and standardize

This line basically means you take a scale that is given by n_samples while
ignoring the weights. Let me know if it's not clear.

import numpy as np
from numpy.testing import assert_allclose
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.linear_model._base import _preprocess_data

np.random.seed(0)

n_samples = 100
X, y = make_regression(n_samples=n_samples, n_features=3)
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Check that setting the sw to 0 for the first 10 samples
# is equivalent to removing them:
sw = np.ones(X_train.shape[0])
sw[:10] = 0.

params = dict(
    fit_intercept=True,
    normalize=True,
    alpha=1,
    solver='lsqr'
)
ridge_sw_zero = Ridge(**params).fit(X_train, y_train, sample_weight=sw)
ridge_trimmed = Ridge(**params).fit(X_train[10:], y_train[10:])


def ridge(X, y, sample_weight, alpha, X_test, y_test):
    X_mean = np.average(X, weights=sample_weight, axis=0)
    X_var = (np.average(X**2, weights=sample_weight, axis=0) -
             np.average(X, weights=sample_weight, axis=0)**2)
    X_var *= X.shape[0]  # difference between normalize and standardize
    X_scale = np.sqrt(X_var)
    X = (X - X_mean) / X_scale
    y_mean = np.average(y, weights=sample_weight, axis=0)
    y = y - y_mean

    X *= np.sqrt(sample_weight)[:, None]
    y *= np.sqrt(sample_weight)

    coef = np.linalg.solve(
        X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y
    )
    coef = coef / X_scale
    intercept = y_mean - np.dot(X_mean, coef)
    y_pred = np.dot(X_test, coef) + intercept
    print(r2_score(y_test, y_pred))
    return y_pred


assert_allclose(
    ridge_sw_zero.predict(X_test),
    ridge(X_train, y_train, sw, params['alpha'], X_test, y_test),
    rtol=1e-1
)

assert_allclose(
    ridge_trimmed.predict(X_test),
    ridge(X_train[10:], y_train[10:], sw[10:], params['alpha'], X_test, y_test),
    rtol=1e-1
)

assert_allclose(
    ridge(X_train, y_train, sw, params['alpha'], X_test, y_test),
    ridge(X_train[10:], y_train[10:], sw[10:], params['alpha'], X_test, y_test),
    rtol=1e-1
)

@agramfort
Copy link
Member

agramfort commented Feb 14, 2021

here is what the new code, after removing normalize, should do: just solve the problem with L-BFGS.

import numpy as np
from numpy.testing import assert_allclose
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from scipy.optimize import fmin_l_bfgs_b, check_grad

np.random.seed(0)

n_samples = 100
X, y = make_regression(n_samples=n_samples, n_features=3)
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Check that setting the sw to 0 for the first 10 samples
# is equivalent to removing them:
sw = 10 * np.ones(X_train.shape[0])
sw[:10] = 0.
alpha = 1


def ridge_bfgs(X, y, sample_weight, alpha, X_test):
    def f(w):
        coef, intercept = w[:-1], w[-1]
        return (
            0.5 * np.sum(sample_weight * (y - X @ coef - intercept) ** 2) +
            0.5 * alpha * np.sum(coef ** 2)
        )

    def fprime(w):
        coef, intercept = w[:-1], w[-1]
        residual = y - X @ coef - intercept
        grad = np.empty(X.shape[1] + 1)
        grad[:-1] = -X.T @ (sample_weight * residual) + alpha * coef
        grad[-1] = -np.sum(sample_weight * residual)
        return grad

    dim = X.shape[1] + 1
    print(check_grad(f, fprime, np.random.RandomState(42).randn(dim)))
    w0 = np.zeros(dim)
    w0[-1] = np.average(y, weights=sample_weight)
    w = fmin_l_bfgs_b(f, w0, fprime)[0]
    coef, intercept = w[:-1], w[-1]
    y_pred = X_test @ coef + intercept
    return y_pred


y_pred = ridge_bfgs(X_train, y_train, sw, alpha, X_test)
y_pred_trimmed = ridge_bfgs(X_train[10:], y_train[10:], sw[10:], alpha, X_test)

assert_allclose(
    y_pred,
    y_pred_trimmed,
    rtol=1e-5
)

@ogrisel
Copy link
Member

ogrisel commented Feb 19, 2021

Thanks @agramfort. I don't understand the need for the init of the intercept coef in the l-bfgs solver:

 w0[-1] = np.average(y, weights=sample_weight)

since the problem is convex, initializing the intercept at 0 should work, no? I tried and it seems to work on my machine.
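The convexity argument can be checked with a minimal sketch (hypothetical data, same penalized weighted-least-squares objective shape as the snippet above but without sample weights): since the ridge objective is strictly convex, L-BFGS reaches the same minimizer from a zero init and from a mean-based intercept init.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(50)
alpha = 1.0


def f(w):
    coef, intercept = w[:-1], w[-1]
    return (0.5 * np.sum((y - X @ coef - intercept) ** 2)
            + 0.5 * alpha * np.sum(coef ** 2))


def fprime(w):
    coef, intercept = w[:-1], w[-1]
    residual = y - X @ coef - intercept
    grad = np.empty(4)
    grad[:-1] = -X.T @ residual + alpha * coef
    grad[-1] = -residual.sum()
    return grad


# Same strictly convex problem, two different intercept initializations.
w_from_zero = fmin_l_bfgs_b(f, np.zeros(4), fprime)[0]
w0 = np.zeros(4)
w0[-1] = y.mean()
w_from_mean = fmin_l_bfgs_b(f, w0, fprime)[0]
```

Both runs should converge to the same coefficients up to solver tolerance, so the intercept init is at most a warm start, not a correctness requirement.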

@maikia I think it would be good to include Alex's last minimal snippet as a new test and compare that we can reproduce the same results with our Ridge class.

@agramfort
Copy link
Member

agramfort commented Feb 19, 2021 via email

@ogrisel
Copy link
Member

ogrisel commented Feb 19, 2021

@agramfort in the l-bfgs snippet the output of check_grad is far from machine precision:

0.037224522259787715
0.025841839727800693

but the gradient expression seems correct to me. Do you have any idea why that is? dim is 4 in this case, so I would expect this value to be much smaller.
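One likely explanation (a sketch on a toy function, not the snippet above): check_grad reports the *absolute* 2-norm of the difference between the analytical gradient and a finite-difference estimate, so its value scales with the magnitude of the objective. An exactly correct gradient on a large-valued objective (make_regression targets are in the hundreds) can still give a check_grad value around 1e-2, while the relative error stays tiny:

```python
import numpy as np
from scipy.optimize import check_grad


def f(w, scale):
    # Simple quadratic whose magnitude we can control.
    return scale * np.sum(w ** 2)


def fprime(w, scale):
    # Exact analytical gradient of f.
    return scale * 2 * w


w0 = np.random.RandomState(42).randn(4)
err_small = check_grad(f, fprime, w0, 1.0)
err_large = check_grad(f, fprime, w0, 1e4)

# Same exact gradient, much larger absolute check_grad value after scaling;
# normalizing by the gradient norm gives a scale-free measure.
rel_large = err_large / np.linalg.norm(fprime(w0, 1e4))
```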

@ogrisel
Copy link
Member

ogrisel commented Feb 19, 2021

I have extended the l-bfgs code snippet to compare with the scikit-learn Ridge(normalize=False) class and we do not get the same results:

https://gist.github.com/ogrisel/ea98add276dc9624fea8e41a5bb6c989

I have also extended (in the same gist) to add a check for the equivalence between fitting with 2 * sw and duplicating the training set and weights and it works as expected for the ridge_lbfgs function (but scikit-learn does not match its predictions).
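The doubling-vs-duplication equivalence can be stated compactly: for any weighted ridge solve, multiplying all weights by 2 must give the same coefficients as stacking every sample twice with the original weights. A minimal sketch (hypothetical data, intercept-free ridge via the weighted normal equations, not the gist's code):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(30, 3)
y = rng.randn(30)
sw = rng.rand(30)
alpha = 1.0


def ridge_coef(X, y, sw, alpha):
    # Weighted ridge without intercept: (X^T W X + alpha I) coef = X^T W y.
    Xw = X * sw[:, None]
    return np.linalg.solve(X.T @ Xw + alpha * np.eye(X.shape[1]), Xw.T @ y)


# Doubling the weights must match duplicating every sample.
coef_double = ridge_coef(X, y, 2 * sw, alpha)
coef_dup = ridge_coef(np.vstack([X, X]), np.hstack([y, y]),
                      np.hstack([sw, sw]), alpha)
```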

@ogrisel
Copy link
Member

ogrisel commented Feb 19, 2021

The remaining failures happen in test_solver_consistency. This test is not changed in this PR, so this is a regression from master. Here is a summary of the combinations that PASS or FAIL on my local machine:

sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float32-0.1-cholesky-False] FAILED                                         [  1%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float32-0.1-sag-False] FAILED                                              [  2%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float32-0.1-sparse_cg-False] FAILED                                        [  4%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float32-0.1-sparse_cg-True] FAILED                                         [  5%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float32-0.1-lsqr-False] FAILED                                             [  6%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float32-0.1-saga-False] FAILED                                             [  8%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float32-0.1-ridgecv-False] FAILED                                          [  9%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float32-0.1-ridgecv-True] FAILED                                           [ 11%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-40-float32-1.0-cholesky-False] PASSED                                         [ 12%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-40-float32-1.0-sag-False] PASSED                                              [ 13%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-40-float32-1.0-sparse_cg-False] PASSED                                        [ 15%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-40-float32-1.0-sparse_cg-True] PASSED                                         [ 16%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-40-float32-1.0-lsqr-False] PASSED                                             [ 18%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-40-float32-1.0-saga-False] PASSED                                             [ 19%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-40-float32-1.0-ridgecv-False] PASSED                                          [ 20%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-40-float32-1.0-ridgecv-True] PASSED                                           [ 22%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float64-0.2-cholesky-False] PASSED                                         [ 23%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float64-0.2-sag-False] PASSED                                              [ 25%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float64-0.2-sparse_cg-False] PASSED                                        [ 26%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float64-0.2-sparse_cg-True] PASSED                                         [ 27%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float64-0.2-lsqr-False] PASSED                                             [ 29%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float64-0.2-saga-False] PASSED                                             [ 30%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float64-0.2-ridgecv-False] PASSED                                          [ 31%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed0-20-float64-0.2-ridgecv-True] PASSED                                           [ 33%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float32-0.1-cholesky-False] FAILED                                         [ 34%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float32-0.1-sag-False] FAILED                                              [ 36%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float32-0.1-sparse_cg-False] FAILED                                        [ 37%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float32-0.1-sparse_cg-True] FAILED                                         [ 38%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float32-0.1-lsqr-False] FAILED                                             [ 40%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float32-0.1-saga-False] FAILED                                             [ 41%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float32-0.1-ridgecv-False] FAILED                                          [ 43%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float32-0.1-ridgecv-True] FAILED                                           [ 44%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-40-float32-1.0-cholesky-False] PASSED                                         [ 45%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-40-float32-1.0-sag-False] PASSED                                              [ 47%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-40-float32-1.0-sparse_cg-False] PASSED                                        [ 48%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-40-float32-1.0-sparse_cg-True] PASSED                                         [ 50%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-40-float32-1.0-lsqr-False] PASSED                                             [ 51%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-40-float32-1.0-saga-False] PASSED                                             [ 52%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-40-float32-1.0-ridgecv-False] PASSED                                          [ 54%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-40-float32-1.0-ridgecv-True] PASSED                                           [ 55%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float64-0.2-cholesky-False] PASSED                                         [ 56%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float64-0.2-sag-False] PASSED                                              [ 58%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float64-0.2-sparse_cg-False] PASSED                                        [ 59%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float64-0.2-sparse_cg-True] FAILED                                         [ 61%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float64-0.2-lsqr-False] PASSED                                             [ 62%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float64-0.2-saga-False] PASSED                                             [ 63%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float64-0.2-ridgecv-False] PASSED                                          [ 65%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed1-20-float64-0.2-ridgecv-True] PASSED                                           [ 66%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float32-0.1-cholesky-False] FAILED                                         [ 68%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float32-0.1-sag-False] FAILED                                              [ 69%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float32-0.1-sparse_cg-False] FAILED                                        [ 70%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float32-0.1-sparse_cg-True] FAILED                                         [ 72%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float32-0.1-lsqr-False] FAILED                                             [ 73%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float32-0.1-saga-False] FAILED                                             [ 75%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float32-0.1-ridgecv-False] FAILED                                          [ 76%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float32-0.1-ridgecv-True] FAILED                                           [ 77%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-40-float32-1.0-cholesky-False] PASSED                                         [ 79%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-40-float32-1.0-sag-False] PASSED                                              [ 80%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-40-float32-1.0-sparse_cg-False] PASSED                                        [ 81%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-40-float32-1.0-sparse_cg-True] PASSED                                         [ 83%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-40-float32-1.0-lsqr-False] PASSED                                             [ 84%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-40-float32-1.0-saga-False] PASSED                                             [ 86%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-40-float32-1.0-ridgecv-False] PASSED                                          [ 87%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-40-float32-1.0-ridgecv-True] PASSED                                           [ 88%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float64-0.2-cholesky-False] FAILED                                         [ 90%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float64-0.2-sag-False] FAILED                                              [ 91%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float64-0.2-sparse_cg-False] FAILED                                        [ 93%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float64-0.2-sparse_cg-True] FAILED                                         [ 94%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float64-0.2-lsqr-False] FAILED                                             [ 95%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float64-0.2-saga-False] FAILED                                             [ 97%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float64-0.2-ridgecv-False] FAILED                                          [ 98%]
sklearn/linear_model/tests/test_ridge.py::test_solver_consistency[seed2-20-float64-0.2-ridgecv-True] FAILED                                           [100%]

Note that in #19503 I extended the test to also cover the case where we pass normalize=False and scale the data manually instead. Those cases would pass both on the main branch and on this PR, so we can ignore that for now and focus on why the change in this PR (#19426) breaks this consistency check.

I fail to see a pattern in the failures: it can fail on either float32 or float64 (but not always for either); sometimes all solvers for a given task are inconsistent with the SVD solver, and sometimes it's only the sparse-cg solver...

@ogrisel
Copy link
Member

ogrisel commented Feb 19, 2021

Reading the error message more carefully, it seems that the PR causes some coefs to have nan or -np.inf values... so we probably have nan or np.inf values after the internal preprocessing step.
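A plausible source of such nan/inf values (a minimal sketch with hypothetical data, not the actual _preprocess_data code): a constant feature has zero variance, so dividing by its scale of zero poisons the scaled matrix. The usual guard is to leave zero-variance features unscaled:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [1.0, 4.0],
              [1.0, 6.0]])  # first feature is constant

X_scale = X.std(axis=0)
# Naive scaling would divide the constant column by zero -> nan/inf.
# Common guard: replace zero scales with 1 so those features pass through.
X_scale_safe = np.where(X_scale == 0, 1.0, X_scale)
X_scaled = (X - X.mean(axis=0)) / X_scale_safe
```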

@ogrisel
Copy link
Member

ogrisel commented Feb 19, 2021

Ok, I think I fixed the problem with the handling of near-constant features.

@ogrisel
Copy link
Member

ogrisel commented Feb 19, 2021

I re-enabled the test for the failing edge case (non-zero constant feature on sparse data). I think we need to understand what's going on in this case. I am still not sure but at least the error is simpler now.

Then we will need to understand the discrepancy with @agramfort's minimal ridge implementations.

@ogrisel
Copy link
Member

ogrisel commented Feb 22, 2021

Ok, I put more details in #19450 (comment). I can solve the problem here, but @maikia is right that we also have it in StandardScaler, so it's better to treat it in a separate PR. Let me comment out the edge case.

Copy link
Member

@ogrisel ogrisel left a comment


LGTM, let's merge this one. The remaining problems are properly documented in the linked issues.

@ogrisel ogrisel merged commit dbd68b2 into scikit-learn:main Feb 22, 2021
@agramfort
Copy link
Member

Thanks @ogrisel !
