SpectralClustering supports the RBF kernel, but it could benefit from the generic kernel support in sklearn.metrics.pairwise.pairwise_kernels.
KernelPCA and sklearn.kernel_approximation.Nystroem already do this and provide an example API (kernel_params for callables, custom attributes for the built-in kernels).
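For reference, a minimal sketch of the API pattern mentioned above, as I understand it from KernelPCA and Nystroem: built-in kernels take their parameters as constructor arguments, while a callable receives its extra keyword arguments through kernel_params. The callable my_rbf and the toy data are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.kernel_approximation import Nystroem

rng = np.random.RandomState(0)
X = rng.rand(20, 3)

# Built-in kernel: parameters are plain constructor arguments.
kpca_poly = KernelPCA(n_components=2, kernel="poly", degree=3, coef0=1.0)
X_poly = kpca_poly.fit_transform(X)


def my_rbf(x, y, gamma=1.0):
    """Hypothetical custom kernel on two 1-D samples (an RBF written by hand)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))


# Callable kernel: extra keyword arguments travel through kernel_params.
nystroem = Nystroem(kernel=my_rbf, kernel_params={"gamma": 0.5},
                    n_components=5, random_state=0)
X_feat = nystroem.fit_transform(X)

print(X_poly.shape, X_feat.shape)
```

SpectralClustering could presumably follow the same split: named parameters for the built-in kernels and a kernel_params dict for callables.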
Can I try to tackle this? I'd like to participate in GSOC, and this seems like an easy enough first issue to fix.
Be our guest; I'm not doing it myself so that it is left for potential GSOC participants :)
Okay. I'll try to fix this.
I have a question though. Both KernelPCA and sklearn.kernel_approximation.Nystroem use the constructor argument kernel to set the kernel type, but SpectralClustering uses the affinity argument to choose between rbf, nearest_neighbors or precomputed. Should I change affinity to kernel or not? Or add another parameter?
I'm not an expert in kernel methods, and Gaël is not available for scikit-learn stuff right now. Changing a parameter name also requires following a deprecation procedure. So I think the safe option is to keep the name and pass affinity to pairwise_kernels whenever it is not equal to nearest_neighbors.
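A rough sketch of that dispatch, in case it helps the discussion. compute_affinity is a hypothetical helper, not existing scikit-learn code, and the handling of the nearest_neighbors case (a symmetrized k-NN connectivity graph) is my assumption about what the estimator should do.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels
from sklearn.neighbors import kneighbors_graph


def compute_affinity(X, affinity="rbf", n_neighbors=10, **kernel_params):
    """Hypothetical helper: build an affinity matrix from the affinity argument."""
    if affinity == "precomputed":
        # X is already an (n_samples, n_samples) affinity matrix.
        return X
    if affinity == "nearest_neighbors":
        # Not a kernel: build a k-NN connectivity graph and symmetrize it.
        connectivity = kneighbors_graph(X, n_neighbors=n_neighbors,
                                        include_self=True)
        return 0.5 * (connectivity + connectivity.T)
    # Anything else (a kernel name or a callable) goes to pairwise_kernels.
    return pairwise_kernels(X, metric=affinity, filter_params=True,
                            **kernel_params)


rng = np.random.RandomState(0)
X = rng.rand(30, 2)
A_rbf = compute_affinity(X, "rbf", gamma=1.0)
A_knn = compute_affinity(X, "nearest_neighbors", n_neighbors=5)
print(A_rbf.shape, A_knn.shape)
```

This keeps the affinity name, so no deprecation cycle is needed.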
Affinity is correct in this case. SpectralClustering does not need a kernel. For example the nearest neighbor graph works fine.
The difference being that an affinity need not be symmetric, right? But then a callable needs to be handled specially (e.g. by simply disallowing it) because pairwise_kernels assumes symmetry to reduce the number of calls.
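To illustrate the symmetry point with a small experiment: in the scikit-learn versions I'm aware of, calling pairwise_kernels with a callable and no Y evaluates the callable only for the upper triangle and the diagonal, then mirrors the result. The function asymmetric_affinity and the counter calls are made up for this sketch, and the exact behaviour should be treated as an assumption to verify against the installed version.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

calls = []


def asymmetric_affinity(x, y):
    """Hypothetical asymmetric function: f(x, y) != f(y, x) in general."""
    calls.append(1)
    return float(x[0] - 2.0 * y[0])


X = np.array([[0.0], [1.0], [2.0], [3.0]])
A = pairwise_kernels(X, metric=asymmetric_affinity)

print(len(calls))           # number of evaluations of the callable
print(np.allclose(A, A.T))  # whether the output was symmetrized
```

If the output is symmetric even though the callable is not, an asymmetric affinity passed this way would be silently altered, which is why it needs special handling.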
The difference is that a kernel needs to be positive definite.
symmetry should be assumed in both cases, I think.
A kernel needs to be pos def for an SVM, but is that a requirement for calling something a kernel? E.g. an RVM can handle "non-Mercer" kernels (not sure what properties they should still have).
You're right about the symmetry, though, this is actually enforced inside the algorithm.
Mercer's condition is just positive definiteness, right?
In machine learning, I think it is understood that "kernel" means "Mercer kernel".
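A small NumPy-only sketch of the distinction, assuming the usual numerical reading of Mercer's condition (the Gram matrix has no negative eigenvalues). The helper is_positive_semidefinite and the toy data are hypothetical. An RBF Gram matrix passes the check, while a symmetrized nearest-neighbor graph without self-loops has zero trace and therefore must have a negative eigenvalue, even though it is a perfectly usable affinity.

```python
import numpy as np


def is_positive_semidefinite(K, tol=1e-10):
    """Check that a symmetric matrix has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(K).min() >= -tol)


rng = np.random.RandomState(0)
X = rng.rand(15, 2)

# RBF Gram matrix: a Mercer kernel.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_rbf = np.exp(-1.0 * sq_dists)

# Symmetrized 3-nearest-neighbor graph without self-loops: symmetric and
# non-negative, but its trace is zero, so it cannot be positive semidefinite.
n_neighbors = 3
order = np.argsort(sq_dists, axis=1)[:, 1:n_neighbors + 1]  # column 0 is self
A_knn = np.zeros_like(sq_dists)
A_knn[np.arange(len(X))[:, None], order] = 1.0
A_knn = 0.5 * (A_knn + A_knn.T)

print(is_positive_semidefinite(K_rbf))
print(is_positive_semidefinite(A_knn))
```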
There are also kernels in density estimation and image processing so the word is pretty overloaded.
Apparently so. Let's stick to "affinity".
The question then is what happens to kernel_params?
@larsmans I fixed the things you mentioned about the commit, but I have a problem with some of the tests. Most of the kernels (all except poly, rbf and linear) complained about negative values in the arrays, so I moved the cluster centers away from the negative region. I also test for equality of shapes instead of the adjusted_rand_score. Is this okay?
additive_chi2 still raises ValueError: Array contains NaN or infinity. What should I do with it?
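A short sketch that may help narrow this down. The chi-squared kernels reject negative features, and additive_chi2 returns non-positive values, unlike chi2, which exponentiates them. The last step is only a guess at where the NaN could come from: dividing by the square root of a negative degree, as a normalized graph Laplacian would. Whether that is what happens in the actual tests is an assumption.

```python
import numpy as np
from sklearn.metrics.pairwise import additive_chi2_kernel, chi2_kernel

rng = np.random.RandomState(0)
X = rng.rand(10, 3)  # non-negative features, as the chi-squared kernels require

K_add = additive_chi2_kernel(X)     # k(x, y) = -sum((x - y)**2 / (x + y))
K_exp = chi2_kernel(X, gamma=1.0)   # k(x, y) = exp(-gamma * sum(...))

# Negative features are rejected outright.
try:
    additive_chi2_kernel(X - 0.5)
    rejected = False
except ValueError:
    rejected = True

# Hypothetical illustration: with non-positive affinities the row sums
# (degrees) are negative, so 1 / sqrt(degree) is NaN.
degree = K_add.sum(axis=1)
with np.errstate(invalid="ignore"):
    inv_sqrt_degree = 1.0 / np.sqrt(degree)

print(K_add.max(), K_exp.min(), rejected)
```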
Let's discuss at the PR, not here.
ENH Added custom kernels to SpectralClustering