WIP: Kernel ridge #2165

Closed
wants to merge 5 commits

10 participants
@mblondel
Member

mblondel commented Jul 17, 2013

This PR adds kernel support to the 4 classes in the ridge module: Ridge, RidgeClassifier, RidgeCV and RidgeClassifierCV.

The basic functionality is there, but the class hierarchy became a bit messy and probably needs some cleanup.

I am not sure yet how to handle fit_intercept=True in the kernel case, so I introduced fit_intercept="auto", which sets the value to False when a non-linear kernel is used. Note that the "auto" mode will have to be introduced to fix #1389 anyway.

I rewrote the polynomial interpolation example using the new functionality.
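
A minimal sketch of how the "auto" mode described above could be resolved at fit time (the helper name is hypothetical, not the PR's actual code):

def resolve_fit_intercept(fit_intercept, kernel):
    # Hypothetical helper: "auto" falls back to False for non-linear kernels,
    # where an input-space intercept is not well defined, and to True for the
    # plain linear case.
    if fit_intercept == "auto":
        return kernel == "linear"
    return fit_intercept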

@amueller
Member

amueller commented Jul 17, 2013

btw I think @npinto was interested in this.

@npinto
Contributor

npinto commented Jul 17, 2013

Nice, thanks @amueller for the pointer!

+ Parameters
+ ----------
+ K : array-like
+ shape = [n_samples, n_samples]

@agramfort

agramfort Jul 17, 2013

Member

shape ... should be on line above.

+ Parameters
+ ----------
+ X : numpy array of shape [n_samples, n_features]
+

@agramfort

agramfort Jul 17, 2013

Member

Add comment line like "The data"

+
+ Parameters
+ ----------
+ X : numpy array of shape [n_samples, n_features]

@agramfort

agramfort Jul 17, 2013

Member

same here. Even if it's now obvious what X is, it's standard to add a one-line note for proper doc formatting.

@@ -400,10 +527,32 @@ class Ridge(_BaseRidge, RegressorMixin):
tol : float
Precision of the solution.
+ kernel: "linear" | "poly" | "rbf" | "sigmoid" | "cosine" | "precomputed"
+ Kernel.

@agramfort

agramfort Jul 17, 2013

Member

The kernel. Default: "linear".

on one line.

@@ -493,10 +647,32 @@ class RidgeClassifier(LinearClassifierMixin, _BaseRidge):
tol : float
Precision of the solution.
+ kernel: "linear" | "poly" | "rbf" | "sigmoid" | "cosine" | "precomputed"
+ Kernel.
+ Default: "linear"

@agramfort

agramfort Jul 17, 2013

Member

same here.

@agramfort
Member

agramfort commented Jul 17, 2013

besides my nitpicking LGTM

+
+ return dual_coef
+
+
class _BaseRidge(six.with_metaclass(ABCMeta, LinearModel)):

@jnothman

jnothman Jul 20, 2013

Member

I think it's a bad idea to inherit from LinearModel if your model isn't linear... and the inheritance would have a problem if we decided to add _LearntSelectorMixin to the base classes. There's also the problem of the package name. So maybe you need separate classes for linear and non-linear implementations as for SVM. As messy as that is. :s

@mblondel

mblondel Jul 20, 2013

Member

Creating kernel-specific classes would increase the number of classes by 4: one for Ridge, RidgeCV, RidgeClassifier and RidgeClassifierCV. This was my main motivation for reusing the same classes, together with the fact that you can grid search the kernel parameter, yet use efficient dedicated solvers for the linear kernel case.

Technically, a kernel induces a feature space in which the model is linear so I'm not too concerned about the name "linear". But we could perhaps create a KernelizableModel base class if it's a concern for others.

If I remember correctly, SVC does have both coef_ and dual_coef_ if the kernel is linear, BTW.

One concern I have is that in the linear case, the current implementation already solves the kernelized problem if it's more efficient (n_samples < n_features). So having kernel-specific classes would be redundant in the case of a linear kernel.

Is _LearntSelectorMixin useful for L2-regularized models such as Ridge or other kernelizable models? (since the primal solution is dense)
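
For context on the kernelized-solve point above, a minimal sketch (illustrative only, not the PR's helper): with a linear kernel, ridge can be solved in the dual when n_samples < n_features and the primal weights recovered afterwards.

import numpy as np

def linear_ridge_via_dual(X, y, alpha=1.0):
    # Dual (kernelized) ridge with a linear kernel: the system solved is
    # n_samples x n_samples, which is cheaper when n_samples < n_features.
    n_samples = X.shape[0]
    K = X @ X.T                                    # linear-kernel Gram matrix
    dual_coef = np.linalg.solve(K + alpha * np.eye(n_samples), y)
    coef = X.T @ dual_coef                         # recover the primal weights
    return coef, dual_coef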

@jnothman

jnothman Jul 20, 2013

Member

I don't have any nice solutions here. It's a matter of working out what inheritance means.

I don't know if it's useful as such, but _LearntSelectorMixin.transform handles the non-l1 case for linear models, setting the default threshold to mean, as it does with feature_importances_.

@ogrisel

ogrisel Jul 21, 2013

Member

I share those concerns, although I don't see a nice and easy solution either.

Also, to me LinearModel is really about being linear in the original space, that is, being unable to work well on non-linearly separable data.

Maybe Ridge* should no longer inherit from this base class since, from now on, it's no longer a linear model in general. We might want to move the ridge family into its own top-level package and keep an import alias in sklearn.linear_model for backward compat.

Maybe @larsmans and @GaelVaroquaux have an opinion on this inheritance issue too?

There is clearly a trade-off between class proliferation and encoding too many assumptions in the inheritance tree.

@larsmans

larsmans Jul 21, 2013

Member

The question is whether we want LinearModel to be public or not; I'd say no and since we never documented it, we can rename it _LinearModel and tell users to do duck typing instead of isinstance checks. When the class is private, that means it's there for code reuse, not to express a hierarchy of concepts.

@agramfort

agramfort Aug 19, 2013

Member

for me the semantic is normalize = make columns unit norm
and for regression models fit_intercept = center X and y

@ogrisel

ogrisel Aug 19, 2013

Member

for me the semantic is normalize = make columns unit norm

But this is not the case here either: here it centers by column and scales by the column standard deviation so that each non-zero column has unit variance.

Feature-wise centering + unit variance scaling is often referred to as "standardization": http://en.wikipedia.org/wiki/Standardizing .

That being said, I am ok with leaving the normalize keyword of linear regression models as it is. For the non-linear ridge classifier though, I don't know what to expect. Should the non-linear ridge model do the "normalization" in the kernel space instead?

@agramfort

agramfort Aug 20, 2013

Member

for me the semantic is normalize = make columns unit norm

But this is not the case here either: here it centers by column and scales by the column standard deviation so that each non-zero column has unit variance.

AFAIK Ridge uses center_data in linear_model.base, so it divides features by:

X_std = np.sqrt(np.sum(X ** 2, axis=0))

which is not really an std dev (no division by n_samples).

Indeed the scaling is applied before centering if required by fit_intercept.

Feature-wise centering + unit variance scaling is often referred to as "standardization": http://en.wikipedia.org/wiki/Standardizing .

we use unit norm with X_std = np.sqrt(np.sum(X ** 2, axis=0))

That being said, I am ok with leaving the normalize keyword of linear regression models as it is. For the non-linear ridge classifier though, I don't know what to expect. Should the non-linear ridge model do the "normalization" in the kernel space instead?

yes in kernel space if it's precomputed at some point.
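
A small illustrative snippet (not part of the PR) of the distinction being made here between the column norm used as "X_std" and an actual standard deviation:

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 3) + 5.0                      # non-centered data

col_norm = np.sqrt(np.sum(X ** 2, axis=0))      # what center_data calls "X_std"
true_std = X.std(axis=0)                        # an actual standard deviation

print(col_norm)   # grows with n_samples and is inflated by the column means
print(true_std)   # invariant to the means, divides by n_samples inside the sqrt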

@flxb

flxb Aug 21, 2013

Contributor

The class should have a

@property
def _pairwise(self):
    return self.kernel == 'precomputed'

to make it compatible with GridSearchCV.

@mblondel

mblondel Aug 21, 2013

Member

Indeed, I noticed that too when looking at kernel PCA.
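
For context, an illustrative snippet (not from the PR) of what the _pairwise flag changes inside GridSearchCV's data splitting: a precomputed kernel matrix must be indexed along both axes, whereas an ordinary X is indexed by rows only.

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(6, 2)
K = X @ X.T                                  # a precomputed (linear) kernel matrix
train_idx = np.array([0, 2, 3, 5])

K_train = K[np.ix_(train_idx, train_idx)]    # what _pairwise == True triggers
X_train = X[train_idx]                       # ordinary row indexing otherwise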

Merge with master.
Conflicts:
	sklearn/linear_model/ridge.py
+
+ def _get_kernel(self, X, Y=None):
+ if callable(self.kernel):
+ params = self.kernel_params or {}

@flxb

flxb Aug 20, 2013

Contributor

If self.kernel is callable here, it gets interpreted as a metric (or the kernel function k(x_i, x_j) that computes one entry in the kernel matrix). It's probably better to interpret it as the function that computes the whole kernel matrix, e.g. change the line to

return self.kernel(X, Y, **kernel_params)

Since most kernels can be computed more efficiently than by looping through all entries, this would allow more efficient and more general solutions. I'm not very familiar with the other scikit-learn methods, but they seem to have a different behavior, too.

@larsmans

larsmans Aug 20, 2013

Member

This class offloads its kernel computation to sklearn.metrics.pairwise.pairwise_kernels, just like most other kernel models. If you have a more efficient way of computing the kernel for multiple samples, then you can pass metric="precomputed" and pass the Gram matrix as X.

@flxb

flxb Aug 20, 2013

Contributor

I guess by Gram matrix you mean the kernel matrix. Using metric="precomputed" would make a grid search over kernel parameters much more complicated, though.

In the _compute_kernel method in sklearn/svm/base.py the behavior is different from this commit. I couldn't find any method that implements it like in this commit. Anyhow, the behavior "should" be consistent for all kernel methods.

@mblondel

mblondel Aug 20, 2013

Member

I would be +1 to remove the for loop here https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/pairwise.py#L861 and call the kernel function directly assuming it returns a matrix.

@mblondel

mblondel Aug 20, 2013

Member

@flxb I copy-pasted this code from kernel PCA.

@larsmans

larsmans Aug 20, 2013

Member

Hmm, indeed, SVMs do it differently from all other kernel methods. Arguably, they're doing it right.
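
An illustrative sketch (not the PR's code) of the two callable-kernel conventions being discussed: pairwise_kernels treats a callable as a per-pair function k(x_i, x_j) and loops over all pairs, while the SVM-style convention expects the callable to return the full kernel matrix at once.

import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

X = np.random.RandomState(0).randn(5, 3)

# Convention 1: callable as a per-pair metric, looped over by pairwise_kernels.
K1 = pairwise_kernels(X, metric=lambda a, b: np.dot(a, b))

# Convention 2: callable that computes the whole kernel matrix in one call.
def linear_kernel_matrix(A, B=None):
    B = A if B is None else B
    return A @ B.T

K2 = linear_kernel_matrix(X)

print(np.allclose(K1, K2))   # True: same result, very different cost profiles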

X, y, X_mean, y_mean, X_std = LinearModel._center_data(
- X, y, self.fit_intercept, self.normalize, self.copy_X,
+ X, y, fit_intercept, self.normalize, self.copy_X,
sample_weight=sample_weight)

@flxb

flxb Aug 21, 2013

Contributor

Maybe this should go into an if self.kernel != 'linear'. The n_samples, n_features = X.shape is confusing in the case of precomputed kernels, too.

@larsmans

larsmans Aug 21, 2013

Member

Not if you consider each K[:, j] to be the feature "kernel similarity to point X[j]".

@flxb

flxb Aug 21, 2013

Contributor

Ok, I agree. It would not harm here anyway. But the if self.kernel != 'linear' should still be there.

@mblondel
Member

mblondel commented Aug 21, 2013

@ogrisel @agramfort @larsmans Are you ok with creating 4 new KernelRidge* classes too? One advantage is that it will be consistent with KernelPCA / PCA and Kernel KMeans / KMeans. One disadvantage is that kernel="linear" will be handled inefficiently, which is bad if the user wants to cross-validate the kernel.

@ogrisel
Member

ogrisel commented Aug 21, 2013

@ogrisel @agramfort @larsmans Are you ok with creating 4 new KernelRidge* classes too? One advantage is that it will be consistent with KernelPCA / PCA and Kernel KMeans / KMeans.

I think this choice is probably the better one, to:

  • solve the LinearModel inheritance issue
  • keep the feature normalization logic consistent with other linear regression classes for the base Ridge class.

One disadvantage is that kernel="linear" will be handled inefficiently, which is bad if the user wants to cross-validate the kernel.

I think this should be tackled later by improving our model selection tools to support grid searching over various model classes. We have a similar issue with SVC vs LinearSVC, for instance.

In the meantime the user can just run two independent grid searches and compare the results. This is not a big deal.

@mblondel
Member

mblondel commented Aug 21, 2013

I think this should be tackled later by improving our model selection tools to support grid searching over various model classes. We have a similar issue with SVC vs LinearSVC, for instance.

Indeed, having grid search tools to, say, automatically choose between LogisticRegression and LinearSVC would be useful.

@ogrisel
Member

ogrisel commented Aug 21, 2013

Indeed, having grid search tools to, say, automatically choose between LogisticRegression and LinearSVC would be useful.

Note that hyperopt has primitives to define such a model space for complex scikit-learn pipelines. However, that comes at the expense of being more complex to set up than our simple GridSearchCV or RandomizedSearchCV API.

@agramfort
Member

agramfort commented Aug 21, 2013

Why 4? I would expect KernelRidge and KernelRidgeClassifier. I also would not support the linear kernel, and would raise an exception recommending plain Ridge.

@mblondel
Member

mblondel commented Aug 21, 2013

KernelRidgeCV and KernelRidgeCVClassifier...

@agramfort
Member

agramfort commented Aug 21, 2013

My bad

Yes it is fine with me

@larsmans
Member

larsmans commented Aug 21, 2013

Yes, fine with me too. I wouldn't disallow a linear kernel just because we have a faster implementation, though -- just explain it in the docs.

@mblondel
Member

mblondel commented Aug 30, 2013

Hum, where do we put the kernel ridge classes? In linear_model.ridge?!

@ogrisel
Member

ogrisel commented Aug 30, 2013

I would create a new sklearn.kernel_ridge package, or maybe just sklearn.ridge.

@mblondel
Member

mblondel commented Aug 30, 2013

We could create a kernel_method module and move SVC there as well. LinearSVC would go to linear_model, just like LogisticRegression. Also, we can move KernelPCA there too (IIRC, @GaelVaroquaux is not happy with it in decomposition). KernelKMeans could go there too if I get motivated to write documentation and tests. And maybe @amueller's kernel approximation stuff as well.

@larsmans
Member

larsmans commented Sep 9, 2013

Sounds attractive, but also a pretty big change. Users are going to search for SVMs in the svm module, whether we like it or not.

@flxb
Contributor

flxb commented Oct 27, 2013

Are there any updates on this?

I think a kernel methods module makes sense. It would organize the code and would help with introducing functionality that is common to all kernel methods.

Besides that, with thousands of papers being published each year that use kernel ridge regression, it is a real drawback for scikit-learn not to have an explicit implementation.

@mblondel
Member

mblondel commented Oct 29, 2013

You might also be interested in this little code snippet:
https://gist.github.com/mblondel/6230778

I implemented it by following Rasmussen's book on Gaussian Processes, then realized I had just reimplemented a kernel ridge, only with a different solver (Cholesky). If you don't fit the kernel matrix parameters, kernel ridge and GP seem to be equivalent.
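
A minimal sketch of that equivalence (illustrative only, not the gist's code, and the function names are made up): with the kernel hyperparameters held fixed, the GP posterior mean is exactly a kernel ridge fit, computed here with a Cholesky-based solve.

import numpy as np
from scipy.linalg import cho_factor, cho_solve
from sklearn.metrics.pairwise import rbf_kernel

def fit_kernel_ridge_cholesky(X, y, alpha=1.0, gamma=0.1):
    # Solve (K + alpha * I) dual_coef = y with a Cholesky factorization,
    # as in the GP regression equations (Rasmussen & Williams, Alg. 2.1).
    K = rbf_kernel(X, X, gamma=gamma)
    factor = cho_factor(K + alpha * np.eye(X.shape[0]), lower=True)
    return cho_solve(factor, y)

def predict_kernel_ridge(X_train, dual_coef, X_test, gamma=0.1):
    # GP posterior mean / kernel ridge prediction: K(X_test, X_train) @ dual_coef
    return rbf_kernel(X_test, X_train, gamma=gamma) @ dual_coef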

@jmetzen jmetzen referenced this pull request Oct 21, 2014

Closed

[WIP] Kernel regression #3780

@eickenberg
Contributor

eickenberg commented Nov 6, 2014

Any fresh insights here? :)

@mblondel mblondel closed this Aug 2, 2015
