
Relevance Vector Machine (RVM) #1513

Closed
yedtoss opened this issue Jan 3, 2013 · 69 comments
Labels
module:svm Move to scikit-learn-extra This PR should be moved to the scikit-learn-extras repository Needs Decision - Close Requires decision for closing

Comments

@yedtoss

yedtoss commented Jan 3, 2013

RVM is a Bayesian framework for obtaining sparse solutions to regression and classification tasks. It uses a model of identical form to the SVM (Support Vector Machine). It addresses the following disadvantages of the SVM:
-- The number of basis functions in an SVM grows linearly with the size of the training set. In an RVM, we start with zero basis functions and incrementally add or delete basis functions until convergence.

-- SVM predictions are not probabilistic, while the RVM's are.

-- The SVM requires estimating the margin trade-off parameter 'C'; the RVM does not.

-- The SVM kernel must be positive definite. In an RVM we can use any kernel.

It is already implemented in dlib http://dlib.net/dlib/svm/rvm_abstract.h.html and there is also a Matlab implementation here http://www.vectoranomaly.com/downloads/downloads.htm. These codebases should serve as a guide.
I think it would be a good idea to add it to scikit-learn.

References:
1- Tipping, M. E. and A. C. Faul (2003). Fast marginal likelihood maximisation for sparse Bayesian models. In C. M. Bishop and B. J. Frey (Eds.), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, Jan 3-6.

2- Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1, 211–244.
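To give a feel for the algorithm, here is a minimal numpy sketch of the iterative type-II maximum-likelihood updates from reference 2 (Tipping, 2001). Note that this is the slower re-estimation loop, not the fast sequential algorithm of reference 1, and the function name and tolerances are illustrative only:

```python
import numpy as np

def rvm_regression(Phi, t, n_iter=100, prune=1e6):
    """Rough sketch of the RVM type-II ML updates (Tipping, 2001).

    Phi : (N, M) design matrix (e.g. a kernel matrix), t : (N,) targets.
    Returns the posterior mean weights and the indices of the kept
    ("relevant") basis functions.
    """
    N = Phi.shape[0]
    alpha = np.ones(Phi.shape[1])   # per-weight precision hyperparameters
    beta = 1.0                      # noise precision
    keep = np.arange(Phi.shape[1])
    for _ in range(n_iter):
        # Posterior over the currently active weights
        Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha))
        mu = beta * Sigma @ Phi.T @ t
        # MacKay-style re-estimation of the hyperparameters
        gamma = 1.0 - alpha * np.diag(Sigma)
        alpha = gamma / (mu ** 2 + 1e-12)
        beta = (N - gamma.sum()) / (np.sum((t - Phi @ mu) ** 2) + 1e-12)
        # Prune basis functions whose precision diverges (weight -> 0)
        active = alpha < prune
        Phi, alpha = Phi[:, active], alpha[active]
        mu, keep = mu[active], keep[active]
    return mu, keep

# Demo: recover a noisy line t = 1 + 2x with two basis functions
rng = np.random.RandomState(0)
x = np.linspace(0.0, 1.0, 50)
Phi = np.column_stack([np.ones_like(x), x])
t = 1.0 + 2.0 * x + 0.01 * rng.randn(50)
mu, keep = rvm_regression(Phi, t)
```

The fast sequential variant instead starts from an empty basis and adds, deletes, or re-estimates one basis function at a time based on its contribution to the marginal likelihood.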

@amueller
Member

amueller commented Jan 3, 2013

I'd have to read up on it again, but in general I think RVMs would be a good addition.
dlib is Boost-licensed, which should be compatible. It might not be so easy to wrap because of the Boost-heavy coding style, though.
Is the problem optimized using SMO? Would it make sense for us to implement SMO ourselves?

Gotta grab my Bishop.

@amueller
Member

amueller commented Jan 3, 2013

What is the relation between ARD and RVM? Is RVM just the "basis function" version of ARD?

@amueller
Member

amueller commented Jan 3, 2013

Btw is anyone ever bothered by the fact that the section Generalized Linear Models doesn't contain any generalized models?

@amueller
Member

amueller commented Jan 3, 2013

Ok, so we should use the sequential sparse learning algorithm, Bishop p. 352 and following, I guess?
Knock yourself out ;)

@amueller
Member

amueller commented Jan 3, 2013

I wonder whether there is a similar method for ARD? That would be cool as the current ARD implementation is quite slow :-/

@yedtoss
Author

yedtoss commented Jan 3, 2013

No, the implementation of RVM definitely does not use SMO. I think SMO is only used for SVM optimization.
Yes, we should use the sequential sparse learning algorithm in reference 1, page 7 (is that the one in Bishop p. 352?). This algorithm is "simple" enough that we can write it without using dlib. I was thinking about writing it in Python and then using Cython for optimization. In that case we can take full advantage of the Matlab implementation. What do you think?
Anyway, it should be possible to write it in C++, but for that we would need a good linear algebra library in C++. I am not sure scikit-learn comes with one by default.

@amueller
Member

amueller commented Jan 3, 2013

Bishop is "Pattern Recognition and Machine Learning".

The algorithm is probably not completely easy to get right. If you want to start on this, it will definitely be a bigger project. If you are interested and want to implement it anyway, go ahead.

Implementing it is also quite a bit different from getting it into scikit-learn. That also involves writing tests, documentation and a user guide - and pretty code.

For a first try you should definitely just use numpy. It uses BLAS internally and is therefore quite fast.
Speeding things up with Cython only makes sense if there is a lot of Python overhead. If all the time is spent in BLAS calls, using Cython doesn't help much.

@yedtoss
Author

yedtoss commented Jan 3, 2013

OK for Cython and numpy. I didn't know Bishop talks about RVM.
On the ARD and RVM relationship: I don't know a lot about ARD, but in reference 2 the authors say that RVM is based on ARD: "We term those training vectors associated with the remaining non-zero weights 'relevance' vectors, in deference to the principle of automatic relevance determination which motivates the presented approach" (page 3 (213), line 8).
Anyway, how does ARD work?

@amueller
Member

amueller commented Jan 5, 2013

ARD is also explained in Bishops book and in the user guide. It puts a diagonal Gaussian prior on the weights and tries to estimate the variances, which (as I understand it) is the same that RVM does. Is that correct?
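To make the connection concrete, scikit-learn's existing ARDRegression already does this for linear regression on raw features: one precision hyperparameter per weight, estimated by evidence maximization, drives irrelevant coefficients to (near) zero. A small sanity check (synthetic data; parameter values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 10)
w_true = np.zeros(10)
w_true[:3] = [1.0, -2.0, 0.5]          # only 3 of the 10 features are relevant
y = X @ w_true + 0.01 * rng.randn(200)

# The per-weight Gaussian priors prune the 7 irrelevant coefficients
ard = ARDRegression().fit(X, y)
```

The RVM applies the same machinery, but with the design matrix replaced by a kernel matrix evaluated at the training points, so pruning weights prunes training samples ("relevance vectors") rather than input features.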

@agramfort
Member

I realize that the ref mentioned:

http://books.nips.cc/papers/files/nips20/NIPS2007_0976.pdf

is not the implementation we use. I think that @vmichel implemented the Bishop approach, while this paper proposes a fixed-point approach similar to coordinate descent. This code definitely needs some love...

@amueller
Member

amueller commented Jan 6, 2013

Thanks @agramfort, I was wondering about that. I didn't go through the details, but I thought that since the paper was the only reference...

I would very much appreciate it if you could add a comment there citing the chapter of Bishop that was used, and maybe saying we should implement the other paper instead.

@amueller
Member

amueller commented Jan 6, 2013

(btw, one of the slowest things in the test suite right now is fitting ARD on the Boston dataset in the common tests)

@agramfort
Member

> Thanks @agramfort, I was wondering about that. I didn't go through the details, but I thought that since the paper was the only reference...
>
> I would very much appreciate it if you could add a comment there citing the chapter of Bishop that was used, and maybe saying we should implement the other paper instead.

see: #1530

@GaelVaroquaux
Member

> Btw is anyone ever bothered by the fact that the section Generalized Linear Models doesn't contain any generalized models?

It does: logistic regression.

@GaelVaroquaux
Member

> I wonder whether there is a similar method for ARD? That would be cool as the current ARD implementation is quite slow :-/

I believe that the most promising route to a rapid ARD solver is to implement the strategy exposed in:
http://books.nips.cc/papers/files/nips20/NIPS2007_0976.pdf

@agramfort
Member

I put up a gist with some code I wrote a while ago.

If somebody wants to work on ARD, I thought it could be useful.

https://gist.github.com/4494613

WARNING: it's not well tested and I don't guarantee correctness, but it seems to work.

@yedtoss
Author

yedtoss commented Jan 16, 2013

@amueller
I have analysed ARD in http://books.nips.cc/papers/files/nips20/NIPS2007_0976.pdf and I think RVM and ARD optimize the same objective function. The difference is in the method used to optimize that function. In RVM, the authors noticed that most of the weights will be near zero and used this to derive a "fast" algorithm.

@amueller
Member

That sounds odd. If the objective is the same, you should be able to use the same methods for optimization, right?

@yedtoss
Author

yedtoss commented Jan 16, 2013

Yes, for sure you could, but I guess the authors of RVM used a different optimization strategy to get a faster and sparser algorithm.

@amueller
Member

@yedtoss I'm pretty sure there is some other difference. As I said before, it might be that RVMs work in a feature space or with a kernel or something. Otherwise you could just replace the ARD implementation? That is regression, though, and you want classification, right?

@agramfort do you know anything about the difference between ARD and RVM?

@yedtoss
Author

yedtoss commented Jan 16, 2013

@amueller
Initially RVM is a regression technique, but the authors presented a way to use it for classification. RVM can use any kind of kernel.
What I mean is that the log likelihood of ARD (equation 2 of "A New View of Automatic Relevance Determination") and of RVM (equation 7 of "Fast marginal likelihood maximisation for sparse Bayesian models") are identical.
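For reference, the shared objective is the type-II log marginal likelihood (writing it in the notation of Tipping (2001); this matches the cited equations up to notational differences):

```latex
\mathcal{L}(\boldsymbol{\alpha})
  = -\tfrac{1}{2}\left[\, N \ln 2\pi + \ln\lvert\mathbf{C}\rvert
      + \mathbf{t}^{\mathsf{T}} \mathbf{C}^{-1} \mathbf{t} \,\right],
\qquad
\mathbf{C} = \sigma^{2}\mathbf{I}
  + \boldsymbol{\Phi}\,\mathbf{A}^{-1}\boldsymbol{\Phi}^{\mathsf{T}},
\quad
\mathbf{A} = \operatorname{diag}(\alpha_{1},\dots,\alpha_{M})
```

ARD maximises this with Φ the raw design matrix, while RVM takes Φ to be a kernel matrix evaluated at the training points; the objective itself is the same.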

@amueller
Member

I guess I'd have to read the papers to know what's going on....

@agramfort
Member

sorry guys, I am not much of a Bayesian guy... I don't know the subtleties well...

@larsmans
Member

RVMs are patented by Microsoft.

@amueller
Member

crazy.

@kalxas

kalxas commented Sep 6, 2013

@larsmans @amueller while there is a US patent on RVM, the author recommends a GPLv2 Matlab implementation on his web page, so I guess it is OK to implement it...
http://www.miketipping.com/sparsebayes.htm

Best,
Angelos

@larsmans
Member

larsmans commented Sep 6, 2013

@kalxas License and patent are quite orthogonal and GPLv2 in particular didn't address software patents. The rights you have with such an implementation are the intersection of the rights granted by the GPL and those granted by the patent holder.

That said, I found out in the meantime that support vector machines are patented by AT&T but the patent was apparently never enforced. If something similar can be proven of RVMs, I might change my mind about them.

@0x0L

0x0L commented Oct 9, 2014

@larsmans I wrote a pure numpy/Python port of the dlib implementation (awfully slow at the moment; I'll try to Cythonize it). According to the header, dlib's implementation has been around since 2008 and they seem fine with it. Would you consider changing your mind about having RVM in sklearn?

@larsmans
Member

Let's hear @GaelVaroquaux's opinion on this. The dlib implementation doesn't prove anything as long as you can't show it's widely used without a patent license.

@ZaixuCui

OK, thanks.

On Mon, Oct 19, 2015 at 8:29 AM, Gael Varoquaux notifications@github.com wrote:

> > Because I use scikit-learn to do SVR and elastic-net. So if there is an RVR implementation, I will not need to use Matlab when doing machine learning analysis.
>
> You can use the Python code doing RVR that we pointed to in the discussion.

@glouppe
Contributor

glouppe commented Oct 19, 2015

Couldn't we implement this as a very lightweight class based on our new Gaussian process implementation? As far as I understand, RVR is just the name given to a GP with a special kind of kernel.

Though it would require only minimal effort, basing the implementation of RVR on that of GP may not be the most appropriate thing to do? CC: @jmetzen

@AmazaspShumik

@amueller @GaelVaroquaux @ZaixuCui @yedtoss @jlopezpena

Hi, I implemented a fast version of the Relevance Vector Machine with a scikit-learn API, so if anybody wants to use it, feel free.

Code: https://github.com/AmazaspShumik/sklearn_bayes/blob/master/sklearn_bayes/rvm/fast_rvm.py

Examples: https://github.com/AmazaspShumik/sklearn_bayes/blob/master/ipython_notebooks_tutorials/rvm_ard/rvm_demo.ipynb

There are four classes implemented in the code:
- RegressionARD
- ClassificationARD
- RVC
- RVR

So maybe RegressionARD and ClassificationARD can be useful as well.

@liamnaka

@AmazaspShumik thank you so much for your implementation. Great work 👍

@ZaixuCui

@AmazaspShumik

Thank you for your efforts very much.
I will definitely try this package.

Wish you all the best.

Zaixu

@Blair-Young

Has anyone had trouble with @AmazaspShumik's predict_proba method?

@tendiarifin

Does anyone here have a library for RVM in PHP? I don't understand RVM; can someone explain it to me?

@tendiarifin

Does anyone have an RVM library for PHP?

@a-n-ermakov

> RVMs are patented by Microsoft.

The patent will expire soon: anticipated expiration 2019-09-04.

@a-n-ermakov

a-n-ermakov commented Apr 22, 2019

Some of the links to @AmazaspShumik's implementation pointed to in this discussion are broken, so here they are again for people who are interested, along with some other implementations:

https://github.com/AmazaspShumik/sklearn_bayes - RVM plus some other algorithms
https://github.com/JamesRitchie/scikit-rvm - simple implementations using scipy
https://github.com/siavashserver/neonrvm - C implementation with Python bindings

Also, here is a collection of relevant papers:
http://www.miketipping.com/sparsebayes.htm

@UnixJunkie
Contributor

There is a C++ implementation of RVM in there (supplementary material of a paper):
https://pubs.acs.org/doi/abs/10.1021/ci200128w
Precisely here:
https://pubs.acs.org/doi/suppl/10.1021/ci200128w/suppl_file/ci200128w_si_001.zip

@themrzmaster

Microsoft's patent has expired. Could we possibly add it to sklearn?

@amueller
Member

It easily clears the standard requirements, so I don't see why not. Maybe having some good/convincing examples would be interesting. scikit-rvm and sklearn_bayes seem unmaintained but might still be useful.
It probably mostly needs a champion now who actually wants to put the work in.

@themrzmaster

In Murphy's book he claims that RVM performance is really similar to SVMs, but with the advantage of being a truly probabilistic method, so it gives calibrated probabilities as answers. Here https://github.com/probml/pmtk3/blob/master/docs/tutorial/html/tutKernelClassif.html he compares the methods on a small dataset.

@amueller
Member

Can you provide a link to the rendered version?
Also, it's not surprising that it's on a small dataset, given that the method is likely to have scalability issues.

@amueller amueller reopened this Nov 12, 2019
@themrzmaster

I will try to work on the implementation. Any help would be highly appreciated.

@UnixJunkie
Contributor

UnixJunkie commented Nov 13, 2019

IIRC, one advantage of RVM over SVM is that you can find the optimal C parameter without doing an optimization pass.
I wonder if the original author would be willing to contribute his reference implementation.
Well, it is in Matlab, and Mike Tipping is not even on GitHub...

@PedroFerreiradaCosta

Hi @amueller and everybody! We saw this thread and decided to implement a scikit-learn-compatible version of the RVM (https://github.com/Mind-the-Pineapple/sklearn-rvm). We based a lot of what we did on the JamesRitchie implementation. It would be great if someone would be willing to take a look at it; feedback and contributions (@themrzmaster) are welcome :)
We have plans to maintain this repo, implement the fast version in version 0.2, and hope that maybe some day this code could be helpful for the scikit-learn main repository 😊 🍍

@mustuner

mustuner commented May 3, 2020

> Hi @amueller and everybody! We saw this thread and decided to implement a scikit-learn-compatible version of the RVM (https://github.com/Mind-the-Pineapple/sklearn-rvm). We based a lot of what we did on the JamesRitchie implementation. It would be great if someone would be willing to take a look at it; feedback and contributions (@themrzmaster) are welcome :)
> We have plans to maintain this repo, implement the fast version in version 0.2, and hope that maybe some day this code could be helpful for the scikit-learn main repository 😊 🍍

Hi @PedroFerreiradaCosta
I have tested this scikit-learn api and it seems it is still so slow (seems not responding yet). Do you think the reason could be it is implemented on Windows? Below is what I used:
EMRVC(kernel='rbf', gamma='scale', n_iter_posterior=10, max_iter=500, compute_score=True , verbose=True ) Thanks for your answer @themrzmaster @PedroFerreiradaCosta

@PedroFerreiradaCosta

PedroFerreiradaCosta commented May 10, 2020

Hi @mustuner ! Thank you for trying out our API!
The RVM does have a higher computational complexity than, for example, the SVM (O(M^3)), which might make it slower when you have a large number of basis functions. Either way, there are a number of things you can do to speed up the process: you can precompute the kernel matrix and feed it to our algorithm instead of X (set kernel='precomputed'), or you can decrease the alpha_threshold (set by default to 1e5). Keep in mind that this second option might lower the accuracy of the model.
Hope this helps! And feel free to open an issue for us to help further.
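For the first suggestion, precomputing the Gram matrix looks roughly like this. The EMRVC lines are commented out and merely follow the signature quoted above; only the kernel precomputation with scikit-learn's rbf_kernel is exercised here, and the gamma value is arbitrary:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X_train, X_test = rng.randn(100, 5), rng.randn(20, 5)
y_train = (X_train[:, 0] > 0).astype(int)

# Compute the RBF Gram matrices once, up front ...
K_train = rbf_kernel(X_train, X_train, gamma=0.2)   # shape (100, 100)
K_test = rbf_kernel(X_test, X_train, gamma=0.2)     # shape (20, 100)

# ... then feed them to the estimator in place of X (hypothetical usage,
# following the EMRVC call quoted in this thread):
# clf = EMRVC(kernel='precomputed').fit(K_train, y_train)
# y_pred = clf.predict(K_test)
```

This avoids re-evaluating the kernel on every internal iteration, which is often the dominant cost for medium-sized datasets.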

Best,
Pedro

@mustuner

> Hi @mustuner! Thank you for trying out our API!
> The RVM does have a higher computational complexity than, for example, the SVM (O(M^3)), which might make it slower when you have a large number of basis functions. Either way, there are a number of things you can do to speed up the process: you can precompute the kernel matrix and feed it to our algorithm instead of X (set kernel='precomputed'), or you can decrease the alpha_threshold (set by default to 1e5). Keep in mind that this second option might lower the accuracy of the model.
> Hope this helps! And feel free to open an issue for us to help further.
>
> Best,
> Pedro

Thank you @PedroFerreiradaCosta, let me try that.

@cmarmo cmarmo added Move to scikit-learn-extra This PR should be moved to the scikit-learn-extras repository Needs Decision - Close Requires decision for closing and removed New Feature labels May 17, 2022
@glemaitre
Member

Since some scikit-learn-compatible side packages are available, we can close this issue.

@glemaitre glemaitre closed this as not planned Jul 29, 2022