
Intercept in Kernel Ridge Regression #21840

Open
Turakar opened this issue Nov 30, 2021 · 7 comments

@Turakar

Turakar commented Nov 30, 2021

Describe the workflow you want to enable

Currently, Kernel Ridge Regression as implemented in sklearn.kernel_ridge.KernelRidge does not support an intercept like e.g. sklearn.svm.SVR does. This can lead to problems if the target value is shifted.

Describe your proposed solution

Implement an intercept into KernelRidge.

Describe alternatives you've considered, if relevant

Document that there is no intercept present for KernelRidge and that, therefore, manual target value shifting is necessary. This is of course the easier path with less work and maintenance.

Additional context

Demo of problems caused by missing intercept
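
For illustration (not the original collapsed demo; the data, kernel, and alpha below are arbitrary choices), a minimal sketch of the kind of problem a missing intercept causes, and of the manual target shifting mentioned above:

import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 100.0  # target shifted far away from zero

# Without an intercept, the constant offset must be absorbed by the regularized
# kernel expansion, which biases the predictions towards zero.
model = KernelRidge(alpha=100.0, kernel="rbf").fit(X, y)
print(np.mean(model.predict(X)))  # noticeably below 100

# Manually centering the target removes most of this bias.
y_mean = y.mean()
model_centered = KernelRidge(alpha=100.0, kernel="rbf").fit(X, y - y_mean)
print(np.mean(model_centered.predict(X)) + y_mean)  # close to 100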

@glemaitre
Member

@agramfort Would this be achievable by internally preprocessing X and y as in the linear models before computing the kernel, and then applying the offset at predict time?

@TomDLT
Member

TomDLT commented Dec 11, 2021

Another option that works with precomputed kernels:

  • fit_transform the kernel with a KernelCenterer, get the attribute kernel_centerer.K_fit_rows_
  • center the target y, keep the mean y_mean
  • fit KernelRidge without intercept on the centered kernel and target, get the attribute kernel_ridge.dual_coef_
  • compute the intercept: intercept = y_mean - kernel_centerer.K_fit_rows_ @ kernel_ridge.dual_coef_
  • et voilà !

@Turakar
Author

Turakar commented Dec 11, 2021

@TomDLT can you give a source or explanation for your statement? I have no knowledge of kernel matrix centering as of now. Why is it not sufficient to set intercept = y_mean and train on y - y_mean?

@TomDLT
Member

TomDLT commented Dec 14, 2021

Test on the linear-kernel case. (edit: incorrect, see fixed version below)

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge
from sklearn.preprocessing import KernelCenterer

fit_intercept = True
need_intercept = True
n_samples = 100
n_features = 10
alpha = 0.1

###############################################################################
# create dataset
X_train = np.random.randn(n_samples, n_features)
X_test = np.random.randn(n_samples, n_features)
y_train = np.random.randn(n_samples)
if need_intercept:
    X_train += 10
    X_test += 10
    y_train += 5
else:
    X_train -= X_train.mean(axis=0)
    y_train -= y_train.mean(axis=0)

###############################################################################
# model A: ridge regression
model_a = Ridge(alpha=alpha, fit_intercept=fit_intercept).fit(X_train, y_train)
y_pred_a = model_a.predict(X_test)

###############################################################################
# model B: kernel ridge regression with a linear kernel

if fit_intercept:
    # precompute kernel
    K_train = X_train @ X_train.T
    centerer = KernelCenterer()
    K_train_centered = centerer.fit_transform(K_train)

    # center target
    y_train_mean = y_train.mean(axis=0)
    y_train_centered = y_train - y_train_mean

    # fit centered model
    model_b = KernelRidge(alpha=alpha,
                          kernel="precomputed").fit(K_train_centered,
                                                    y_train_centered)

    K_test = X_test @ X_train.T
    y_pred_b = model_b.predict(K_test)

    # add intercept
    intercept = y_train_mean - centerer.K_fit_rows_ @ model_b.dual_coef_
    y_pred_b += intercept

else:
    y_pred_b = KernelRidge(alpha=alpha,
                           kernel="linear").fit(X_train,
                                                y_train).predict(X_test)

###############################################################################
np.testing.assert_array_almost_equal(y_pred_a, y_pred_b)

Not sure how to handle the sample weights though.
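
One possible direction, sketched under the assumption that the same weighted-centering convention as the linear models applies (weighted mean of y, kernel centered around the weighted mean in feature space); the helpers fit_weighted and predict_weighted are hypothetical names, and this has not been verified against a reference implementation:

import numpy as np
from sklearn.kernel_ridge import KernelRidge

def fit_weighted(K_train, y_train, sample_weight, alpha):
    # normalized weights
    w = sample_weight / sample_weight.sum()
    n = K_train.shape[0]
    # the weighted target mean plays the role of y_train_mean above
    y_mean = w @ y_train
    # center the kernel around the weighted mean in feature space:
    # K_centered = (I - 1 w^T) K (I - w 1^T)
    C = np.eye(n) - np.outer(np.ones(n), w)
    K_centered = C @ K_train @ C.T
    model = KernelRidge(alpha=alpha, kernel="precomputed")
    model.fit(K_centered, y_train - y_mean, sample_weight=sample_weight)
    return model, y_mean, w

def predict_weighted(model, K_test, K_train, y_mean, w):
    # apply the same weighted centering to the test kernel
    n = K_train.shape[0]
    m = K_test.shape[0]
    C = np.eye(n) - np.outer(np.ones(n), w)
    K_test_centered = (K_test - np.outer(np.ones(m), w @ K_train)) @ C.T
    return model.predict(K_test_centered) + y_mean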

@shimenghuang

shimenghuang commented Apr 23, 2023

It seems the above solution may be incorrect. If one does not transform K_test, the adjustment for the intercept needs more than just the term centerer.K_fit_rows_ @ model_b.dual_coef_; two additional terms are missing (I think this section in sklearn's user guide explains it). To demonstrate, I changed the data-generating process, and the above solution no longer matches ridge regression with an intercept. Instead of adjusting the intercept this way, one can center K_test with the training-data mean in feature space (via centerer.transform) and still add back y_mean at prediction. See the code below (adapted from the above solution):

# %%
# library
# ##
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge
from sklearn.preprocessing import KernelCenterer

n_samples = 500
n_features = 10
alpha = 0.1

# %%
# generate data
# ##

X_all = np.random.randn(n_samples, n_features)
beta = np.concatenate(
    [np.random.uniform(0, 0, n_features//2), np.random.uniform(3, 10, n_features-n_features//2)])
X_all -= X_all.mean(axis=0)
X_all += np.random.normal(10, 5, n_features)  # randomly shift the mean of X
y_all = X_all @ beta + np.random.randn(n_samples)

n_train = 300
n_test = n_samples-n_train
X_train, y_train = X_all[:n_train, :], y_all[:n_train]
X_test, y_test = X_all[-n_test:, :], y_all[-n_test:]

# %%
# model A: ridge regression
# ##

model_a = Ridge(alpha=alpha, fit_intercept=True).fit(X_train, y_train)
y_pred_a = model_a.predict(X_test)

# %%
# model B: kernel ridge regression with a linear kernel (transform K_test and add back y_mean at prediction)
# ##

# precompute kernel
K_train = X_train @ X_train.T
centerer = KernelCenterer()
K_train_centered = centerer.fit_transform(K_train)

# center target
y_train_mean = y_train.mean(axis=0)
y_train_centered = y_train.copy() - y_train_mean

# fit centered model
model_b = KernelRidge(alpha=alpha,
                      kernel="precomputed").fit(K_train_centered,
                                                y_train_centered)

K_test = X_test @ X_train.T
K_test = centerer.transform(K_test)
y_pred_c = model_b.predict(K_test)

# add intercept
intercept = y_train_mean
y_pred_c += intercept
# check if the predictions are the same

np.testing.assert_array_almost_equal(y_pred_a, y_pred_c)

# %%
# model B: kernel ridge regression with a linear kernel (the above solution)
# ##

# precompute kernel
K_train = X_train @ X_train.T
centerer = KernelCenterer()
K_train_centered = centerer.fit_transform(K_train)

# center target
y_train_mean = y_train.mean(axis=0)
y_train_centered = y_train.copy() - y_train_mean

# fit centered model
model_b = KernelRidge(alpha=alpha,
                      kernel="precomputed").fit(K_train_centered,
                                                y_train_centered)

K_test = X_test @ X_train.T
y_pred_b = model_b.predict(K_test)

# add intercept
intercept = y_train_mean - centerer.K_fit_rows_ @ model_b.dual_coef_
y_pred_b += intercept

# check if the predictions are the same -- this assertion is expected to fail,
# demonstrating that the intercept formula above is missing terms
np.testing.assert_array_almost_equal(y_pred_a, y_pred_b)

# The 3rd and 4th terms of the centered kernel times dual_coef_ were very small under the
# original DGP; after increasing the magnitude of X (and hence of K_train), they are no
# longer negligible and the intercept formula above no longer works.
print((np.ones((n_test, n_train))/n_train @ K_train @ model_b.dual_coef_)[:10])
print((K_train @ np.ones((n_train, n_train)) /
      n_train @ model_b.dual_coef_)[:10])
print((np.ones((n_test, n_train))/n_train @ K_train @ np.ones((n_train, n_train)) /
       n_train @ model_b.dual_coef_)[:10])
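
For completeness, what centerer.transform(K_test) does can be spelled out with the names from the script above (assuming the uncentered K_test and the fitted centerer from the second block), which makes the extra terms explicit:

# centerer.K_fit_rows_: mean similarity of each training point to all training points
# K_test.mean(axis=1):  mean similarity of each test point to all training points
# centerer.K_fit_all_:  overall mean of K_train
K_test_centered_manual = (
    K_test
    - centerer.K_fit_rows_
    - K_test.mean(axis=1, keepdims=True)
    + centerer.K_fit_all_
)
np.testing.assert_array_almost_equal(K_test_centered_manual, centerer.transform(K_test))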

@TomDLT
Member

TomDLT commented Apr 26, 2023

I think you are right, the formula intercept = y_train_mean - centerer.K_fit_rows_ @ model_b.dual_coef_ seems to need some additional terms. I don't have time to fix the equation right now, but your fix using centerer.transform(K_test) is even simpler.

Here is the updated version of my script above using this fix.

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge
from sklearn.preprocessing import KernelCenterer

fit_intercept = True
need_intercept = True
n_samples = 100
n_features = 10
alpha = 0.1

###############################################################################
# create dataset
X_train = np.random.randn(n_samples, n_features)
X_test = np.random.randn(n_samples, n_features)
y_train = np.random.randn(n_samples)
if need_intercept:
    intercepts = np.random.randn(n_features) * 10
    X_train += intercepts
    X_test += intercepts
    y_train += 5
else:
    X_train -= X_train.mean(axis=0)
    y_train -= y_train.mean(axis=0)

###############################################################################
# model A: ridge regression
model_a = Ridge(alpha=alpha, fit_intercept=fit_intercept).fit(X_train, y_train)
y_pred_a = model_a.predict(X_test)

###############################################################################
# model B: kernel ridge regression with a linear kernel

if fit_intercept:
    # precompute kernel
    K_train = X_train @ X_train.T
    centerer = KernelCenterer()
    K_train_centered = centerer.fit_transform(K_train)

    # center target
    y_train_mean = y_train.mean(axis=0)
    y_train_centered = y_train - y_train_mean

    # fit centered model
    model_b = KernelRidge(alpha=alpha,
                          kernel="precomputed").fit(K_train_centered,
                                                    y_train_centered)

    K_test = X_test @ X_train.T
    K_test_centered = centerer.transform(K_test)
    y_pred_b = model_b.predict(K_test_centered)

    # add intercept
    intercept = y_train_mean
    y_pred_b += intercept

else:
    y_pred_b = KernelRidge(alpha=alpha,
                           kernel="linear").fit(X_train,
                                                y_train).predict(X_test)

###############################################################################
np.testing.assert_array_almost_equal(y_pred_a, y_pred_b, decimal=10)

@cheebit

cheebit commented Mar 10, 2024

Based on your comments, maybe the KernelRidge class can be modified like below?
@TomDLT @shimenghuang

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.preprocessing import KernelCenterer
from sklearn.linear_model._ridge import _solve_cholesky_kernel
from sklearn.utils.validation import check_is_fitted, _check_sample_weight


class KernelRidgeRegression(KernelRidge):

    def __init__(
        self,
        alpha=1,
        *,
        kernel="linear",
        gamma=None,
        degree=3,
        coef0=1,
        kernel_params=None,
        fit_intercept=True,
    ):
        self.fit_intercept = fit_intercept
        super().__init__(
            alpha=alpha, kernel=kernel, gamma=gamma, degree=degree,
            coef0=coef0, kernel_params=kernel_params,
        )

    def fit(self, X, y, sample_weight=None):
        self._validate_params()

        # Convert data
        X, y = self._validate_data(
            X, y, accept_sparse=("csr", "csc"), multi_output=True, y_numeric=True
        )
        if sample_weight is not None and not isinstance(sample_weight, float):
            sample_weight = _check_sample_weight(sample_weight, X)

        if self.fit_intercept:
            # center y and K
            self.intercept = y.mean(axis=0)
            self.centerer = KernelCenterer()
            K = self.centerer.fit_transform(self._get_kernel(X))
            y = y - self.intercept
        else:
            # no centering; keep a zero intercept so that predict works in both cases
            self.intercept = 0.0
            K = self._get_kernel(X)
        alpha = np.atleast_1d(self.alpha)

        ravel = False
        if len(y.shape) == 1:
            y = y.reshape(-1, 1)
            ravel = True

        copy = self.kernel == "precomputed"
        self.dual_coef_ = _solve_cholesky_kernel(K, y, alpha, sample_weight, copy)
        if ravel:
            self.dual_coef_ = self.dual_coef_.ravel()

        self.X_fit_ = X

        return self

    def predict(self, X):
        check_is_fitted(self)
        X = self._validate_data(X, accept_sparse=("csr", "csc"), reset=False)
        K = self._get_kernel(X, self.X_fit_)
        if self.fit_intercept:
            # center the test kernel with the statistics learned on the training kernel
            K = self.centerer.transform(K)
        return np.dot(K, self.dual_coef_) + self.intercept


if __name__ == "__main__":

    from sklearn.linear_model import Ridge

    n_samples = 1000
    n_features = 10000
    alpha = 1

    ###########################################################################
    # create dataset
    X_train = np.random.randn(n_samples, n_features)
    X_test = np.random.randn(n_samples, n_features)
    y_train = np.random.randn(n_samples).reshape(-1, 1)

    ###########################################################################
    # model A: ridge regression
    model_a = Ridge(alpha=alpha).fit(X_train, y_train)
    y_pred_a = model_a.predict(X_test)

    ###########################################################################
    # model B: kernel ridge regression
    model_b = KernelRidgeRegression(alpha=alpha).fit(X_train, y_train)
    y_pred_b = model_b.predict(X_test)

    # test equal
    ###########################################################################
    np.testing.assert_array_almost_equal(y_pred_a, y_pred_b, decimal=10)
