
Implement Nystrom approximation for approximate kernel methods #3058

Closed
karlnapf opened this issue Mar 10, 2016 · 20 comments

Comments

@karlnapf
Member

This is another important way to approximate kernel methods.

See here for a reference implementation.

This paper describes the original method.

This task is closely related to its random Fourier features counterpart, #2768,
and the incomplete Cholesky one, #3057.

@fredhallgren
Contributor

Hi, I started looking at this. I have a pretty good idea of how to proceed. I can check in on IRC if I have any questions.

@karlnapf
Member Author

Great. Let's start with kernel ridge regression and uniform sub-sampling.

We can add leverage scores for non-uniform sub-sampling later.

For now I guess we have to think a bit about how best to implement Nyström + KRR. This probably needs some changes to the kernel machine. Maybe we can add a Nyström kernel machine class that deals with the matrix inversions, which are simpler for the approximated models. It would be good to do this high up in the hierarchy... but I'm not 100% sure how to do it myself yet. Looking forward to your input :)

The same is actually true for the random Fourier features, #2768.

@karlnapf
Member Author

Both this and Nyström would start with KRR. KRR should put the code that solves the $(K + \lambda I)^{-1} y$ system into its own method. A subclass would then override this method and implement the cheaper matrix inversion for Nyström or incomplete Cholesky.
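
A minimal sketch of that refactoring pattern, assuming a standalone Eigen mock-up rather than the actual Shogun class hierarchy (KRRSolver, NystromKRRSolver and solve_system are illustrative names, not existing Shogun API):

```cpp
// Sketch only: the base solver exposes a virtual solve step which a Nystrom
// subclass overrides with a cheaper computation.
#include <Eigen/Dense>

using Eigen::MatrixXd;
using Eigen::VectorXd;

class KRRSolver
{
public:
    explicit KRRSolver(double tau) : m_tau(tau) {}
    virtual ~KRRSolver() = default;

    // Exact KRR: solve (K + tau*I) alpha = y with a dense Cholesky factorization.
    virtual VectorXd solve_system(const MatrixXd& K, const VectorXd& y) const
    {
        MatrixXd A = K + m_tau * MatrixXd::Identity(K.rows(), K.cols());
        return A.llt().solve(y);
    }

protected:
    double m_tau;
};

class NystromKRRSolver : public KRRSolver
{
public:
    NystromKRRSolver(double tau, int m) : KRRSolver(tau), m_num_subsamples(m) {}

    // The override is where the cheaper Nystrom solve would go; only the
    // shape of the refactoring is shown here, the approximation itself is
    // sketched later in the thread.
    VectorXd solve_system(const MatrixXd& K, const VectorXd& y) const override
    {
        // ... sub-sample m_num_subsamples columns, build K_nm and K_mm, and
        // solve the reduced system instead of the full n x n one ...
        return KRRSolver::solve_system(K, y); // placeholder fallback
    }

private:
    int m_num_subsamples;
};
```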

@fredhallgren
Contributor

Sounds like a good plan. I'll get back in a few days with hopefully some constructive input :)

@fredhallgren
Contributor

You mean we refactor the current 'train_machine_pinv' method to put the matrix inversion in another method, which either calls LAPACK directly or carries out the Nyström approximation in our subclass?

I wrote a short example implementing KRR. Getting somewhat more familiar with the code base.

Unfortunately I'll miss the reading group tomorrow since I'm moving house :'( Hoping to attend next time :)

@karlnapf
Member Author

There is currently a PR changing the KRR internals a bit; the Nyström version would just override a single method in there... #3098
Let me know if you need help, and good luck with the move!

@karlnapf
Member Author

This is a very good reference for the Nyström method.
We should aim to reproduce the experiments (the uniform Nyström part for now) on the same datasets as in the paper. In addition, we should offer to compute the regularization path over the number of sub-samples, as in Algorithm 1 in the paper. This requires Cholesky up- and downdates, see here.
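
As a minimal illustration of the up/downdate building block (a sketch assuming Eigen's in-place LLT::rankUpdate; this only shows the generic rank-one update, not the full path computation of Algorithm 1):

```cpp
// Sketch: rank-one Cholesky up/downdate with Eigen, the basic building block
// for maintaining a factorization along a path instead of refactorizing.
#include <Eigen/Dense>
#include <iostream>

int main()
{
    const int m = 5;
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(m, m);
    A = A * A.transpose() + m * Eigen::MatrixXd::Identity(m, m); // make it SPD

    Eigen::LLT<Eigen::MatrixXd> llt(A); // A = L * L^T

    Eigen::VectorXd v = Eigen::VectorXd::Random(m);
    llt.rankUpdate(v, 1.0);  // update:   factor of A + v*v^T, O(m^2)
    llt.rankUpdate(v, -1.0); // downdate: back to a factor of A

    // The maintained factor solves systems without refactorizing from scratch.
    Eigen::VectorXd b = Eigen::VectorXd::Random(m);
    std::cout << (A * llt.solve(b) - b).norm() << std::endl; // ~0
    return 0;
}
```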

@fredhallgren
Contributor

Thanks, I'm starting to settle into my new place. I hope you had a nice Easter break.

Great, thanks for the pointer. I'll override the new "solve_krr_system" method in my code.

@karlnapf
Member Author

karlnapf commented Apr 5, 2016

Some inspiration https://github.com/LCSL/NystromCoRe

@fredhallgren
Contributor

Thanks :) I have implemented most of the class (I called it 'CKrrNystrom'; we can change that if you come up with a better name). Now I'm going to override the 'solve_krr_system' method. I'll also read in the data from the article and run some examples comparing against the original KRR implementation.

@karlnapf
Member Author

karlnapf commented Apr 8, 2016

A few test cases that are useful:

  • m = n should reproduce the exact KRR result (see the short note after this list)
  • m = 0.9n should lead to only a very small deviation from the exact result
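
A short note on why the first check should hold exactly (assuming the m sub-sampled columns are distinct, so that m = n means every column is selected):

```latex
\tilde K \;=\; K_{n,m}\, K_{m,m}^{+}\, K_{m,n}
\;\;\xrightarrow{\;m\,=\,n\;}\;\;
\tilde K \;=\; K K^{+} K \;=\; K,
\qquad\text{hence}\qquad
(\tilde K + \tau I)^{-1} y \;=\; (K + \tau I)^{-1} y .
```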

@fredhallgren
Contributor

Good idea. I could add the first one (or both) to the unit tests.

Should the example I'm writing, where I run both KRR and Nyström, be added to the 'examples' directory? I read that the example setup is about to change soon, though.

@karlnapf
Member Author

karlnapf commented Apr 9, 2016

Both, in separate unit tests, is best.
You can add an example; a meta example would be best, I think. Check the cookbook README.
Or you can modify the IPython notebook (or start a new one).

@fredhallgren
Contributor

Interesting seminar at Gatsby yesterday.

I have finished a first implementation of the solve_krr_system method. It compiles and runs.

My code does the following (a rough sketch follows after this list):

  • Add tau to the original kernel matrix (I had to make m_tau protected instead of private in the base class)
  • Create a random index vector via SGVector::random(0, n-1)
  • Sort the vector
  • Pick out these columns (and rows) from the original kernel matrix to create the K_{n,m} and K_{m,m} matrices (adjusting for the fact that the sampling is done with replacement)
  • Calculate the eigenvalues and eigenvectors of K_{m,m} using Eigen3::SelfAdjointEigenSolver
  • Adjust the eigenvectors to obtain the approximate eigendecomposition of K_{n,n}, as explained in Williams and Seeger
  • Invert the diagonal eigenvalue matrix to solve for alpha
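
A rough, self-contained sketch of these steps, assuming plain Eigen types rather than Shogun's SGMatrix/SGVector, and solving with the Woodbury identity on the approximate low-rank factors instead of directly inverting the diagonal eigenvalue matrix (all names are illustrative, not the actual CKrrNystrom code):

```cpp
// Sketch of a Nystrom-approximated KRR solve (Williams & Seeger style):
// sub-sample m columns, eigendecompose K_mm, scale to an approximate
// eigendecomposition of K, and solve (K_approx + tau*I) alpha = y via the
// Woodbury identity so that only a small system is factorized.
#include <Eigen/Dense>
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

using Eigen::MatrixXd;
using Eigen::VectorXd;

VectorXd nystrom_krr_alpha(const MatrixXd& K, const VectorXd& y,
                           double tau, int m, std::mt19937& rng)
{
    const int n = static_cast<int>(K.rows());

    // Uniformly sample m column indices (with replacement, mirroring
    // SGVector::random(0, n-1) in the thread) and sort them.
    std::uniform_int_distribution<int> dist(0, n - 1);
    std::vector<int> idx(m);
    for (int& i : idx) i = dist(rng);
    std::sort(idx.begin(), idx.end());

    // K_nm: sampled columns of K; K_mm: the corresponding principal block.
    MatrixXd K_nm(n, m), K_mm(m, m);
    for (int j = 0; j < m; ++j)
    {
        K_nm.col(j) = K.col(idx[j]);
        for (int i = 0; i < m; ++i)
            K_mm(i, j) = K(idx[i], idx[j]);
    }

    // Eigendecomposition of the small matrix: K_mm = U diag(lambda) U^T.
    Eigen::SelfAdjointEigenSolver<MatrixXd> eig(K_mm);
    const VectorXd lambda = eig.eigenvalues();
    const MatrixXd U = eig.eigenvectors();

    // Williams & Seeger scaling: eigenvalues of K are approximated by
    // (n/m) * lambda_i, eigenvectors by sqrt(m/n) * K_nm * u_i / lambda_i.
    // Near-zero eigenvalues are dropped for numerical stability.
    const double eps = 1e-10 * lambda.cwiseAbs().maxCoeff();
    std::vector<int> keep;
    for (int i = 0; i < m; ++i)
        if (lambda(i) > eps) keep.push_back(i);

    const int r = static_cast<int>(keep.size());
    MatrixXd U_tilde(n, r);  // approximate eigenvectors of K
    VectorXd lam_tilde(r);   // approximate eigenvalues of K
    for (int j = 0; j < r; ++j)
    {
        const int i = keep[j];
        U_tilde.col(j) = std::sqrt(double(m) / n) / lambda(i) * (K_nm * U.col(i));
        lam_tilde(j) = (double(n) / m) * lambda(i);
    }

    // Solve (U_tilde diag(lam_tilde) U_tilde^T + tau*I) alpha = y via the
    // Woodbury identity: only an r x r system needs a Cholesky factorization.
    MatrixXd S = lam_tilde.cwiseInverse().asDiagonal();
    S += U_tilde.transpose() * U_tilde / tau;
    VectorXd z = S.llt().solve(U_tilde.transpose() * y) / tau;
    return (y - U_tilde * z) / tau;
}
```

With m = n and all sampled indices distinct, the approximation is exact and this reduces to the plain KRR solution, which matches the first test case suggested earlier in the thread.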

Some pending items:

  • run examples on data from the article (add the data to the shogun-data repository?)
  • various cleanup and check of code
  • write unit tests
  • write documentation
  • something else I haven't thought of

Other comments:

  • Shouldn't CKernelRidgeRegression::solve_krr_system() be virtual?

Happy to come by at some point :)

@karlnapf
Member Author

Good news!

Yes, the solve_krr_system method should be virtual! :)

Could you send a PR? I am not sure I 100% get what you are doing there. I think we can do this without an (expensive) eigenvalue decomposition, just via an LLT solve involving K_mm.
What dimension does alpha have in your code? Nyström basically restricts the optimization over the regressor to the space of functions $\sum_{i=1}^{m} a_i k(x_i, \cdot)$ -- i.e. only the sub-sampled elements. See http://arxiv.org/abs/1507.04717
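
For reference, a sketch of the restricted-basis formulation from that paper, with $\tilde x_1, \dots, \tilde x_m$ the sub-sampled points and $\lambda$ the regularizer (notation is mine):

```latex
\hat f(\cdot) \;=\; \sum_{i=1}^{m} \alpha_i\, k(\tilde x_i, \cdot),
\qquad
\alpha \;=\; \bigl( K_{nm}^{\top} K_{nm} + n\lambda\, K_{mm} \bigr)^{\dagger} K_{nm}^{\top} y \;\in\; \mathbb{R}^{m}.
```

In this formulation alpha has dimension m rather than n, and the m x m system can be handled with an LLT factorization instead of an eigendecomposition.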

The apply methods also have to be adjusted, as the support vectors (the data that spans the regression function) change, etc. But we can start from the PR...

Looking forward to it!

@fredhallgren
Contributor

Thanks for your comments.

Will do some minor cleanup then send a PR :)

My alpha vector currently has dimension n.

@fredhallgren
Contributor

Hey, I made a PR (I don't know if I should connect it to this issue somehow).

To comment on your previous comment: I am attempting to create an approximation of the entire original kernel matrix through an adjusted eigendecomposition of the reduced matrix, and then solve the system with all the y's.

I haven't looked in detail yet at the other article by Rudi et al., but was mostly using Williams & Seeger.

@karlnapf
Member Author

Thanks for the patch, #3172.
Looks good; don't get scared by the number of iterations we have to do -- that is totally normal :)

@fredhallgren
Contributor

Thanks for the comments :) I'll look at it tonight.

@karlnapf
Member Author

Done for regression
