New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated K-means clustering for Nystroem #3126
Conversation
…mplement it in plot_kernel_approximation example to show difference
Hi @nateyoder. Cheers, |
doc/modules/kernel_approximation.rst
Outdated
@@ -35,9 +35,15 @@ Nystroem Method for Kernel Approximation | |||
The Nystroem method, as implemented in :class:`Nystroem` is a general method | |||
for low-rank approximations of kernels. It achieves this by essentially subsampling | |||
the data on which the kernel is evaluated. | |||
The subsampling methodology used to generate the approximate kernel is specified by | |||
the parameter ``basis_method`` which can either be ``random`` or ``clustered``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call it kmeans
instead of clustered
, to be more specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also basis_sampling
or basis_selection
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestions. They are incorporated in the new version.
Sorry I accidentally deleted the branch and I think doing this closed the issue. Sorry!! |
Have you tried it on a different dataset? This above is digits, right? Maybe try MNIST? Or is there some other dataset where RBF works well? |
I think this should help but I also think we should make sure that it actually does ;) |
You could also try on Olivetti faces with RandomizedPCA preprocessing: http://scikit-learn.org/stable/auto_examples/applications/face_recognition.html To try on a bigger dataset you can use LFW instead of Olivetti. |
Sounds great guys thanks for the suggestions. I'll give them a shot this week and post the results. Also I noticed my build failed but it failed because of errors in OrthogonalMatchingPursuitCV. Do you guys know if this an intermitant test or something I should look into? |
The travis failure is unrelated, you can ignore it. |
Sorry for the long layoff guys. Finally got a chance to run amueller's MINST example with k-means and random. As the graph shows k-means does show some minor improvement but nothing big. However, since it seems to almost always be a little better in the examples I tried it seems like it might still be worth adding it? I briefly tried on Olivetti but I think because of the limited amount of faces saw a lot of variance in the output and didn't really get anything useful other than k-means definitely isn't a silver bullet. I didn't have time to look into LFW. |
It seems consistent from the little I have seen thus far - I will try to run some tests as well. Looks pretty nice! |
Thanks for the bench on mnist. It would be great to run the same on lfw and |
At first these results seemed at odds to me with the MNIST line in Table 2 of Kumar, Mohri and Talwalkar, Sampling Methods for the Nyström Method, JMLR 2012. But actually, that table is showing the kernel reconstruction "accuracy" , where K_k is the optimal rank-k reconstruction (the truncated SVD), and \tilde{K}_k is the rank-k Nyström approximation. I guess the kernel isn't as well-approximated by the uniform reconstruction, but it's still good enough to do classification with. Might be good to make sure that's the case. Also, it might be better to use kmeans++ initialization rather than random; did you try that? |
Brief update. I ran MINST again to compare "better" clustering with k-means [KMeans++ initialization, max_iter=300, and n_init=10] vs k-means as suggested in the literature ['random' initialization, max_iter=5, n_init=1] vs random Nystroem. As shown below the much more time intensive clustering has almost no impact on the classification performance while significantly increasing the time needed to train the model. I also did the same on LFW and the results are below. In this case k-means appears to little to no consistent improvement over random selection. If you are interested I used the parameters found in http://nbviewer.ipython.org/github/jakevdp/sklearn_scipy2013/blob/master/rendered_notebooks/05.1_application_to_face_recognition.ipynb other than doing my own RBF grid search to find the optimal RBF parameters. I'll try to do the covertype test later this week if I get time and you guys think it is still needed. |
Can you please rebase your branch on master and try with MinibatchKMeans? This might be master to converge while giving good enough centroids. |
hm... this actually looks good. @mth4saurabh any chance you are still interested in working on this? |
@amueller : sure, would love to; will start on monday. |
Because I wanted to try K-means clustering as the basis for Nystroem approximation and it appeared as though pull request #2591 might be stalled I created a slightly modified version. I also tried to address @amueller comment about the effectiveness of the method by including it in the plot_kernel_approximation example and @dougalsutherland comment concerning the possible singularity of the sub-sampled kernel matrix using the same approach as scipy does in pinv2.
Since it is my first commit to the project (hopefully the first of many) any feedback or suggestions you have would be appreciated.