Add custom kernels to SpectralClustering #1791

Closed
larsmans opened this issue Mar 18, 2013 · 13 comments
Labels
Easy (well-defined and straightforward way to resolve), Enhancement

Comments

@larsmans
Member

SpectralClustering supports the RBF kernel, but it could benefit from the generic kernel support in sklearn.metrics.pairwise.pairwise_kernels.

KernelPCA and sklearn.kernel_approximation.Nystroem already do this and provide an example API (kernel_params for callables, custom attributes for the built-in kernels).
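As a rough illustration of the generic kernel support referred to above, the sketch below calls `pairwise_kernels` once with a built-in kernel name and once with a callable. The toy data and the `dot_plus_offset` callable (with its `offset` parameter) are made up for this example and are not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

# Toy data, made up for illustration.
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])

# Built-in kernel selected by name; the extra keyword arguments
# are passed on as parameters of that kernel.
K_poly = pairwise_kernels(X, metric="poly", degree=2, gamma=1.0, coef0=1.0)


# Hypothetical custom kernel: a dot product plus a constant offset.
def dot_plus_offset(x, y, offset=0.0):
    return float(np.dot(np.ravel(x), np.ravel(y))) + offset


# For a callable, the keyword arguments are forwarded to the callable itself.
K_custom = pairwise_kernels(X, metric=dot_plus_offset, offset=1.0)
```

Both calls return a square matrix with one row and one column per sample, which is the shape a precomputed affinity is expected to have.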

@rolisz
Contributor

rolisz commented Mar 19, 2013

Can I try to tackle this? I'd like to participate in GSoC, and this seems like an easy enough first issue to fix.

@larsmans
Member Author

Be our guest; I'm not doing it myself so that it is left for potential GSoC participants :)

@rolisz
Contributor

rolisz commented Mar 19, 2013

Okay. I'll try to fix this.

I have a question, though. Both KernelPCA and sklearn.kernel_approximation.Nystroem use the constructor argument kernel to set the kernel type, but SpectralClustering uses affinity to choose between rbf, nearest_neighbors and precomputed. Should I rename affinity to kernel, leave it as it is, or add another parameter?

@larsmans
Member Author

I'm not an expert in kernel methods, Gaël is not available for scikit-learn work right now, and changing a parameter name requires following a deprecation procedure. So I think the safe option is to keep the name and pass affinity to pairwise_kernels whenever it is not equal to nearest_neighbors.
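The dispatch suggested here might look roughly like the sketch below. `build_affinity` is a hypothetical helper written for illustration, not scikit-learn's code; the data, `gamma` and `n_neighbors` values are made up, and the result is fed to SpectralClustering through `affinity="precomputed"`.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import pairwise_kernels
from sklearn.neighbors import kneighbors_graph


def build_affinity(X, affinity="rbf", n_neighbors=10, **kernel_params):
    """Hypothetical helper: return a dense, symmetric affinity matrix."""
    if affinity == "nearest_neighbors":
        # The k-NN graph is not symmetric in general, so average it
        # with its transpose.
        graph = kneighbors_graph(X, n_neighbors=n_neighbors, include_self=True)
        graph = graph.toarray()
        return 0.5 * (graph + graph.T)
    # Anything else is handed to pairwise_kernels as the kernel name.
    return pairwise_kernels(X, metric=affinity, **kernel_params)


# Two well-separated toy blobs, made up for illustration.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=6.0, scale=0.3, size=(20, 2)),
])

A = build_affinity(X, affinity="rbf", gamma=0.1)
labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(A)
```

Keeping the parameter name `affinity` means existing callers are unaffected; only the set of accepted values grows.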

@amueller
Member

Affinity is correct in this case. SpectralClustering does not need a kernel. For example, the nearest-neighbor graph works fine.

@larsmans
Member Author

The difference being that an affinity need not be symmetric, right? But then a callable needs to be handled specially (e.g. by simply disallowing it) because pairwise_kernels assumes symmetry to reduce the number of calls.

@amueller
Member

The difference is that a kernel needs to be positive definite.
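To make the distinction concrete, here is a small NumPy check, offered as a sketch rather than anything SpectralClustering does internally. The helper name `is_psd`, the tolerance and the toy matrices are made up for illustration. A symmetric neighbor graph is a usable affinity yet has a negative eigenvalue, while an RBF Gram matrix does not.

```python
import numpy as np


def is_psd(K, tol=1e-10):
    """Hypothetical helper: True if symmetric K has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(K).min() >= -tol)


# Connectivity graph of two points that are each other's neighbor.
# Its eigenvalues are -1 and 1, so it is not positive semidefinite.
neighbor_graph = np.array([[0.0, 1.0], [1.0, 0.0]])

# RBF Gram matrix for three points on a line.
x = np.array([0.0, 1.0, 3.0])
rbf_gram = np.exp(-(x[:, None] - x[None, :]) ** 2)

print(is_psd(neighbor_graph))  # False
print(is_psd(rbf_gram))        # True
```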

@amueller
Member

Symmetry should be assumed in both cases, I think.

@larsmans
Member Author

A kernel needs to be positive definite for an SVM, but is that a requirement for calling something a kernel? E.g. an RVM can handle "non-Mercer" kernels (not sure what properties they should still have).

You're right about the symmetry, though; this is actually enforced inside the algorithm.
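One common way to enforce symmetry on an affinity matrix is to average it with its transpose. The sketch below shows only that generic trick on a made-up matrix; it is an assumption for illustration, not a statement of how SpectralClustering does it internally.

```python
import numpy as np

# Made-up, non-symmetric affinity matrix.
A = np.array([
    [1.0, 0.8, 0.0],
    [0.2, 1.0, 0.5],
    [0.0, 0.1, 1.0],
])

# Average each pair (i, j) and (j, i) so that both directions agree.
A_sym = 0.5 * (A + A.T)
```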

@amueller
Member

Mercer's condition is just positive definiteness, right?
In machine learning, I think it is understood that "kernel" means "Mercer kernel".

There are also kernels in density estimation and image processing so the word is pretty overloaded.

@larsmans
Member Author

Apparently so. Let's stick to "affinity".

The question then is what happens to kernel_params?

@rolisz
Contributor

rolisz commented Mar 20, 2013

@larsmans I fixed the things you mentioned about the commit, but I have a problem with some of the tests. Most of the kernels (all except poly, rbf and linear) complained about negative values in the arrays, so I moved the cluster centers away from the negative region, and I now test for equality of shapes instead of the adjusted_rand_score. Is this okay?

Also, additive_chi2 still fails with "ValueError: Array contains NaN or infinity". What should I do about it?
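For reference, the workaround of keeping the test data in the non-negative region can be sketched as follows. The centers, spread and `gamma` are made up, and the example swaps in `chi2_kernel` (the exponentiated variant) next to `additive_chi2_kernel`; it only checks that both produce finite values on strictly positive data and does not diagnose the NaN error above.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import additive_chi2_kernel, chi2_kernel

# Blobs centered far from zero so that every feature stays positive.
X, y = make_blobs(
    n_samples=40,
    centers=[[5.0, 5.0], [15.0, 15.0]],
    cluster_std=0.5,
    random_state=0,
)

# The chi-squared kernels are meant for non-negative input.
if X.min() < 0:
    raise ValueError("test data must be non-negative for chi-squared kernels")

K_additive = additive_chi2_kernel(X)  # values are <= 0
K_chi2 = chi2_kernel(X, gamma=1.0)    # values lie in (0, 1]
```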

@larsmans
Member Author

Let's discuss at the PR, not here.

3 participants