
Relevance Vector Machine (RVM) #1513

Closed
yedtoss opened this Issue Jan 3, 2013 · 54 comments

@yedtoss

yedtoss commented Jan 3, 2013

RVM is a Bayesian framework for obtaining sparse solutions to regression and classification tasks. It uses a model of identical form to the SVM (Support Vector Machine) and addresses the following disadvantages of the SVM:

-- The number of basis functions in an SVM grows linearly with the size of the training set. In an RVM, we start with zero basis functions and incrementally add/delete basis functions until convergence.

-- SVM predictions are not probabilistic, while the RVM's are.

-- The SVM requires estimating the margin trade-off parameter 'C', which is not the case for the RVM.

-- The SVM kernel must be positive definite; the RVM can use any kernel.

It is already implemented in dlib (http://dlib.net/dlib/svm/rvm_abstract.h.html) and there is also a MATLAB implementation at http://www.vectoranomaly.com/downloads/downloads.htm. These implementations should serve as a guide.
I think it would be a good idea to add it to scikit-learn.
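For reference, the model form shared with the SVM and the sparsity-inducing prior are roughly the following (notation loosely follows reference 2 below):

```latex
% RVM prediction has the same functional form as the SVM:
y(\mathbf{x}) = \sum_{n=1}^{N} w_n K(\mathbf{x}, \mathbf{x}_n) + b
% Sparsity comes from an independent zero-mean Gaussian prior on each weight,
% with one precision hyperparameter \alpha_n per basis function:
p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{n=1}^{N} \mathcal{N}\!\left(w_n \mid 0, \alpha_n^{-1}\right)
```

During training, most of the \alpha_n diverge to infinity, pruning the corresponding basis functions; the surviving training points are the "relevance vectors".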

References :
1- Tipping, M. E. and A. C. Faul (2003). Fast marginal likelihood maximisation for sparse Bayesian models. In C. M. Bishop and B. J. Frey (Eds.), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, Jan 3-6.

2- Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1, 211–244.

@amueller

Member

amueller commented Jan 3, 2013

I'd have to read up on it again, but in general I think RVMs would be a good addition.
dlib is Boost-licensed, which should be compatible. It might not be so easy to wrap because of the boost-heavy coding style, though.
Is the problem optimized using SMO? Would it be sensible for us to implement SMO?

Gotta grab my Bishop.

@amueller

Member

amueller commented Jan 3, 2013

What is the relation between ARD and RVM? Is RVM just the "basis function" version of ARD?

@amueller

Member

amueller commented Jan 3, 2013

Btw, is anyone ever bothered by the fact that the section Generalized Linear Models doesn't contain any generalized models?

@amueller

Member

amueller commented Jan 3, 2013

Ok, so I guess we should use the sequential sparse learning algorithm from Bishop, p. 352 and following?
Knock yourself out ;)

@amueller

Member

amueller commented Jan 3, 2013

I wonder whether there is a similar method for ARD? That would be cool, as the current ARD implementation is quite slow :-/

@yedtoss

yedtoss commented Jan 3, 2013

No, the RVM implementation definitely does not use SMO; I think SMO is only used for SVM optimization.
Yes, we should use the sequential sparse learning algorithm in reference 1, page 7 (is that the one in Bishop, p. 352?). This algorithm is "simple" enough that we can write it without using dlib. I was thinking of writing it in Python and then using Cython for optimization. In that case we can take full advantage of the MATLAB implementation. What do you think?
Anyway, it should also be possible to write it in C++, but for that we would need a good linear algebra library in C++, and I am not sure scikit-learn comes with one by default.
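To make this concrete, here is a rough numpy-only sketch of that sequential algorithm (regression case, fixed noise precision, no convergence test; all names are mine and this is illustration-grade code, not a proposed implementation):

```python
# Sketch of fast marginal likelihood maximisation (Tipping & Faul, 2003),
# regression case with fixed noise precision beta. Minimal numerical guards.
import numpy as np

def fast_rvm_regression(Phi, t, beta=100.0, n_iter=200):
    """Phi: (n_samples, n_basis) kernel matrix; t: (n_samples,) targets."""
    n_samples, n_basis = Phi.shape
    alpha = np.full(n_basis, np.inf)         # alpha = inf <=> basis pruned
    phi_norm2 = (Phi ** 2).sum(axis=0)
    Phi_t = Phi.T @ t

    # Start from the single basis function best aligned with the targets
    # (assumes it explains more variance than the noise floor 1/beta).
    i0 = np.argmax(Phi_t ** 2 / phi_norm2)
    alpha[i0] = phi_norm2[i0] / (Phi_t[i0] ** 2 / phi_norm2[i0] - 1.0 / beta)

    for it in range(n_iter):
        active = np.flatnonzero(np.isfinite(alpha))
        Phi_a = Phi[:, active]
        # Posterior covariance of the active weights.
        Sigma = np.linalg.inv(np.diag(alpha[active]) + beta * Phi_a.T @ Phi_a)
        # Sparsity (S) and quality (Q) factors for every candidate basis.
        W = beta * (Phi.T @ Phi_a)
        S = beta * phi_norm2 - np.einsum('ij,jk,ik->i', W, Sigma, W)
        Q = beta * Phi_t - W @ (Sigma @ (beta * (Phi_a.T @ t)))
        s, q = S.copy(), Q.copy()
        s[active] = alpha[active] * S[active] / (alpha[active] - S[active])
        q[active] = alpha[active] * Q[active] / (alpha[active] - S[active])

        i = it % n_basis                     # cycle through the candidates
        theta = q[i] ** 2 - s[i]
        if theta > 0:
            alpha[i] = s[i] ** 2 / theta     # add, or re-estimate alpha_i
        elif np.isfinite(alpha[i]) and active.size > 1:
            alpha[i] = np.inf                # delete basis i from the model

    active = np.flatnonzero(np.isfinite(alpha))
    Phi_a = Phi[:, active]
    Sigma = np.linalg.inv(np.diag(alpha[active]) + beta * Phi_a.T @ Phi_a)
    mu = beta * Sigma @ Phi_a.T @ t          # posterior mean of the weights
    return active, mu                        # indices of the relevance vectors
```

A real implementation would also update the noise precision, pick the candidate that maximally increases the marginal likelihood instead of cycling, and track convergence, but the add / re-estimate / delete logic above is the core of the paper.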

@amueller

Member

amueller commented Jan 3, 2013

Bishop is "Machine Learning and Pattern Recognition".

The algorithm is probably not completely easy to get right. If you want to start on this, it will definitely be a bigger project. If you are interested and want to implement it any way, go ahead.

Implementing it is also quite a bit different from getting it into scikit-learn. That also involves writing tests, documentation and a user guide - and pretty code.

For a first try you should definitely use just numpy. It uses blas internally and is therefore quite fast.
Speeding up using Cython only makes sense if there is a lot of python overhead. If all the time is spent in the BLAS calls, using Cython doesn't make much sense.

@yedtoss

yedtoss commented Jan 3, 2013

OK for Cython and numpy. I didn't know Bishop talks about RVM.
As for the ARD and RVM relationship: I don't know a lot about ARD, but in reference 2 the author says that RVM is based on ARD: "We term those training vectors associated with the remaining non-zero weights 'relevance' vectors, in deference to the principle of automatic relevance determination which motivates the presented approach" (page 3 (213), line 8).
Anyway, how does ARD work?

@amueller

Member

amueller commented Jan 5, 2013

ARD is also explained in Bishop's book and in the user guide. It puts a diagonal Gaussian prior on the weights and tries to estimate the variances, which (as I understand it) is the same thing that RVM does. Is that correct?
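For anyone following along, the implementation being discussed is sklearn.linear_model.ARDRegression. A minimal usage sketch on synthetic data, showing the pruning mechanism (the per-weight precisions in lambda_ diverge for irrelevant features, which is the same mechanism the RVM applies to kernel basis functions):

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 20)
w = np.zeros(20)
w[:3] = [1.0, -2.0, 0.5]        # only 3 informative features
y = X @ w + 0.1 * rng.randn(100)

ard = ARDRegression()
ard.fit(X, y)
print(ard.coef_)                # near-zero for the 17 irrelevant features
print(ard.lambda_)              # huge precision => effectively pruned weight
```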

@agramfort

Member

agramfort commented Jan 6, 2013

I realize that the ref mentioned:

http://books.nips.cc/papers/files/nips20/NIPS2007_0976.pdf

is not the implementation we use. I think that @vmichel implemented the Bishop approach, while this paper proposes a fixed-point approach similar to a coordinate descent approach. This code definitely needs some love...

@amueller

Member

amueller commented Jan 6, 2013

Thanks @agramfort, I was wondering about that. I didn't go through the details, but I thought, as the paper was the only reference...

I would very much appreciate it if you could add a comment there citing the chapter of Bishop that was used, and maybe saying we should implement the other paper instead.

@amueller

Member

amueller commented Jan 6, 2013

(btw one of the slowest things in the test suite right now is fitting ARD on the Boston dataset in the common tests)

@agramfort

Member

agramfort commented Jan 7, 2013

> Thanks @agramfort, I was wondering about that. I didn't go through the details, but I thought, as the paper was the only reference...
>
> I would very much appreciate it if you could add a comment there citing the chapter of Bishop that was used, and maybe saying we should implement the other paper instead.

See: #1530

@GaelVaroquaux

Member

GaelVaroquaux commented Jan 9, 2013

> Btw, is anyone ever bothered by the fact that the section Generalized Linear Models doesn't contain any generalized models?

It does: logistic regression.

@GaelVaroquaux

Member

GaelVaroquaux commented Jan 9, 2013

> I wonder whether there is a similar method for ARD? That would be cool, as the current ARD implementation is quite slow :-/

I believe the most promising fast solver for ARD would be to implement the strategy exposed in:
http://books.nips.cc/papers/files/nips20/NIPS2007_0976.pdf

@agramfort

Member

agramfort commented Jan 9, 2013

I put up a gist with some code I wrote a while ago.

If somebody wants to work on ARD, I thought it could be useful.

https://gist.github.com/4494613

WARNING: it's not well tested and I don't guarantee correctness, but it seems to work.

@yedtoss

yedtoss commented Jan 16, 2013

@amueller
I have analysed ARD in http://books.nips.cc/papers/files/nips20/NIPS2007_0976.pdf and I think RVM and ARD optimize the same objective function. The difference is in the method used to optimize this function. In the RVM, the authors noticed that most of the weights will be near zero and they use this to derive a "fast" algorithm.

@amueller

Member

amueller commented Jan 16, 2013

That sounds odd. If the objective is the same, you should be able to use the same methods for optimization, right?

@yedtoss

yedtoss commented Jan 16, 2013

Yes, for sure you should, but I guess the authors of the RVM used a different optimization strategy to get a faster and sparser algorithm.

@amueller

Member

amueller commented Jan 16, 2013

@yedtoss I'm pretty sure there is some other difference. As I said before, this might be that RVMs work in a feature space or with a kernel or something. Otherwise you could just replace the ARD implementation? That is regression, though, and you want classification, right?

@agramfort do you know anything about the difference of ARD and RVM?

@yedtoss

yedtoss commented Jan 16, 2013

@amueller
Initially, RVM is a regression technique, but the author presented a way to use it for classification as well. The RVM can use any kind of kernel.
What I mean is that the log likelihood of ARD (equation 2 of "A New View of Automatic Relevance Determination") and of the RVM (equation 7 of "Fast marginal likelihood maximisation for sparse Bayesian models") are identical.
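For reference, that shared objective is the log marginal likelihood over the hyperparameters (in the notation of reference 2, with A = diag(alpha_1, ..., alpha_N)):

```latex
\mathcal{L}(\boldsymbol{\alpha})
  = \log p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^{2})
  = -\frac{1}{2}\left[\, N \log 2\pi + \log \lvert \mathbf{C} \rvert
      + \mathbf{t}^{\top} \mathbf{C}^{-1} \mathbf{t} \,\right],
\qquad
\mathbf{C} = \sigma^{2}\mathbf{I} + \boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}^{\top}
```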

@amueller

Member

amueller commented Jan 16, 2013

I guess I'd have to read the papers to know what's going on...

@agramfort

Member

agramfort commented Jan 16, 2013

Sorry guys, I am not much of a Bayesian guy... I don't know the subtleties well...

@larsmans

Member

larsmans commented Mar 17, 2013

RVMs are patented by Microsoft.

@amueller

Member

amueller commented Mar 19, 2013

crazy.

@larsmans larsmans closed this Mar 19, 2013

@kalxas

kalxas commented Sep 6, 2013

@larsmans @amueller While there is a US patent on the RVM, the author recommends a GPLv2 MATLAB implementation on his web page, so I guess it is OK to implement it...
http://www.miketipping.com/sparsebayes.htm

Best,
Angelos

@larsmans

Member

larsmans commented Sep 6, 2013

@kalxas License and patent are quite orthogonal, and GPLv2 in particular didn't address software patents. The rights you have with such an implementation are the intersection of the rights granted by the GPL and those granted by the patent holder.

That said, I found out in the meantime that support vector machines are patented by AT&T, but the patent was apparently never enforced. If something similar can be shown for RVMs, I might change my mind about them.

@0x0L

Contributor

0x0L commented Oct 9, 2014

@larsmans I wrote a pure numpy/Python port of the dlib implementation (awfully slow at the moment; I'll try to Cythonize it). According to the header, dlib's implementation has been around since 2008 and they seem fine with it. Would you consider changing your mind about having RVM in sklearn?

@larsmans

Member

larsmans commented Oct 10, 2014

Let's hear @GaelVaroquaux's opinion on this. The dlib implementation doesn't prove anything as long as you can't show it's widely used without a patent license.

@jlopezpena

jlopezpena commented Mar 30, 2015

Are there any updates on this topic? I've been looking into RVMs lately and was wondering if there was any code around...

@amueller

Member

amueller commented Mar 30, 2015

I don't think anyone has tried for a fast implementation, and we are still unsure about the legal status.

@0x0L

Contributor

0x0L commented Apr 6, 2015

@jlopezpena Take a look at dlib; the code is pretty clear and it's header-only (templates). It should be fairly easy to build a C extension for use from numpy.

@jhallock7

jhallock7 commented Aug 7, 2015

Hi everyone,

I recently translated Mike Tipping’s freely-available SparseBayes MATLAB program, which mainly implements the RVM, into Python. It can be found here: https://github.com/jhallock7/SparseBayes-Python . I contacted Mike Tipping, and he said that Microsoft’s patent covers only the original slow algorithm, whereas the SparseBayes program uses the faster one found here: http://www.miketipping.com/papers/met-fastsbl.pdf . So it would be fine if some form of his program were folded into scikit-learn. I am relatively new to Python, so my translation can undoubtedly be improved or modified.

@amueller

Member

amueller commented Aug 8, 2015

Thanks for wanting to contribute, and also thanks for checking the patent status.
There is another question, though, which is whether this algorithm is widely useful.

I haven't looked at the algorithm and its uses since I last read Bishop's book, which is a while ago.
I think an interesting example would be to show that it either gives better uncertainty estimates than calibrating an SVM, or that it is faster (calibrating an SVM and searching over C needs a lot of cross-validation).
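A sketch of what the SVM side of that comparison would look like with existing scikit-learn tools (the RVM side is omitted since no estimator exists yet; an RVC would have to beat this on calibration quality or on total fit time):

```python
# Baseline: probabilistic SVM = grid search over C + probability calibration,
# i.e. two nested levels of cross-validation.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

search = GridSearchCV(SVC(kernel='rbf'), {'C': [0.1, 1, 10, 100]}, cv=3)
clf = CalibratedClassifierCV(search, cv=3)
clf.fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)   # calibrated class probabilities
```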

@ZaixuCui

ZaixuCui commented Sep 17, 2015

@amueller RVM (mainly relevance vector regression (RVR)) is pretty useful in neuroimaging data analysis. Lots of papers use this method rather than SVR for prediction. It would be perfect if this method could be added to the scikit-learn toolbox.

@AmazaspShumik

AmazaspShumik commented Sep 29, 2015

@amueller I implemented a slow version of the RVM which can use either EM or a fixed-point algorithm to fit the model (mainly for learning / academic purposes). The major difference between RVM and SVR that I noted from a couple of examples is sparsity, i.e. the number of 'support' vectors used in prediction. In many cases the RVM produces results comparable with SVR while using only a fraction of the support vectors that SVR uses
(here is one simple example that is also used in Tipping 2001).

@AmazaspShumik

AmazaspShumik commented Sep 29, 2015

@amueller (Adding to the previous comment) And obviously a small number of support vectors implies very fast prediction.
Another advantage of the RVM is its probabilistic treatment: with the RVM, for every data point in the test set you get not only a point estimate but also a predictive distribution.

@amueller

Member

amueller commented Oct 13, 2015

Does RVR provide probability distributions, too?

Sounds like RVR and RVM are reasonable candidates for inclusion. I'm not sure about the state-of-the-art algorithms, though. Is it http://www.miketipping.com/papers/met-fastsbl.pdf ? That seems pretty old. The SparseBayes coding style is.... interesting, and I think it would serve better as a reference than as a basis for the sklearn implementation.

@AmazaspShumik

AmazaspShumik commented Oct 15, 2015

Yes, RVR provides probability distributions; however, the variance of the predictive distribution can actually be smaller for data points outside the domain of the training set, for example.
To the best of my knowledge, the paper you mentioned is the latest version of the RVM; it also corresponds to the MATLAB implementation on Tipping's website (version 2).

@AmazaspShumik

AmazaspShumik commented Oct 15, 2015

I also found an interesting comparison of RVM and SVM speed in Kevin Murphy's book:
"RVM is also fastest to train. This is despite the fact that RVM code is in Matlab and the SVM code is in C" (Chapter 14, p. 490). However, it seems that the comparisons were made on small datasets only.

@GaelVaroquaux

Member

GaelVaroquaux commented Oct 15, 2015

> @amueller RVM (mainly relevance vector regression (RVR)) is pretty useful in neuroimaging data analysis.

I am not convinced (and I do neuroimaging). I haven't seen a good empirical comparison.

I see the way to go is to have RVMs in a separate package, with the scikit-learn API, and encourage good empirical work to show their usefulness. If they are useful, merge them in scikit-learn.

@ZaixuCui

ZaixuCui commented Oct 15, 2015

Some neuroimaging studies have used relevance vector regression (RVR) and made comparisons between RVR and SVR. To list a few:
http://www.sciencedirect.com/science/article/pii/S1053811910000108
http://www.sciencedirect.com/science/article/pii/S1053811910012644
http://www.sciencedirect.com/science/article/pii/S1053811910003459
http://www.nature.com/npp/journal/v39/n3/abs/npp2013251a.html

RVR is also implemented in a pattern recognition toolbox for neuroimaging data:
http://www.mlnl.cs.ucl.ac.uk/pronto/

I hope RVR can be incorporated in scikit-learn.

Best wishes

Zaixu


@mblondel

Member

mblondel commented Oct 15, 2015

> I see the way to go is to have RVMs in a separate package, with the scikit-learn API, and encourage good empirical work to show their usefulness. If they are useful, merge them in scikit-learn.

+1

@amueller

Member

amueller commented Oct 15, 2015

I mean, there is https://github.com/AmazaspShumik/Bayesian-Regression-Methods/blob/master/Relevance%20Vector%20Machine%20%26%20ARD/rvm.py which looks relatively compatible. It needs set_params and get_params, or to inherit from BaseEstimator.

And there is https://github.com/jhallock7/SparseBayes-Python which could be wrapped.

@ZaixuCui why do you want it to be in scikit-learn when there is a ready-to-use implementation out there?

I tend to agree with @GaelVaroquaux and @mblondel. If no one has published on these algorithms in nearly ten years, people don't seem to be very interested. [oh, the standard algorithm is from 2003 even. but then again libsvm is 2005]
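For anyone picking this up: concretely, scikit-learn compatibility just means storing the constructor arguments verbatim and inheriting the parameter plumbing. A hypothetical skeleton (class name and parameters are illustrative only):

```python
from sklearn.base import BaseEstimator, RegressorMixin

class RVR(BaseEstimator, RegressorMixin):
    def __init__(self, kernel='rbf', gamma=1.0, n_iter=300):
        # Store params verbatim; no validation or work in __init__.
        self.kernel = kernel
        self.gamma = gamma
        self.n_iter = n_iter

    def fit(self, X, y):
        # ... run fast marginal likelihood maximisation here ...
        # Learned attributes get a trailing underscore, e.g.
        # self.relevance_vectors_, self.mu_, self.alpha_
        return self

    def predict(self, X):
        # ... kernel expansion over the relevance vectors ...
        raise NotImplementedError

# get_params()/set_params() now work for free, so clone() and
# GridSearchCV do too.
```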

@ZaixuCui

ZaixuCui commented Oct 19, 2015

Because I use scikit-learn to do SVR and elastic-net, if there were an RVR implementation I would not need to use MATLAB for my machine learning analyses.

Thanks very much.
Wishing you all the best.

Zaixu


@GaelVaroquaux

Member

GaelVaroquaux commented Oct 19, 2015

> Because I use scikit-learn to do SVR and elastic-net, if there were an RVR implementation I would not need to use MATLAB for my machine learning analyses.

You can use the Python code doing RVR that we pointed to in the discussion.

@ZaixuCui

ZaixuCui commented Oct 19, 2015

OK, thanks.


@glouppe

Member

glouppe commented Oct 19, 2015

Couldn't we implement this as a very lightweight class based on our new Gaussian Process implementation? As far as I understand, RVR is only the name given to a GP with a special kind of kernel.

Though this would require only minimal effort, basing the implementation of RVR on that of the GP may not be the most appropriate thing to do? CC: @jmetzen

@AmazaspShumik

AmazaspShumik commented Feb 5, 2016

@amueller @GaelVaroquaux @ZaixuCui @yedtoss @jlopezpena

Hi, I implemented a fast version of the Relevance Vector Machine with the scikit-learn API, so if anybody wants to use it, feel free.

Code: https://github.com/AmazaspShumik/sklearn_bayes/blob/master/sklearn_bayes/rvm/fast_rvm.py

Examples: https://github.com/AmazaspShumik/sklearn_bayes/blob/master/ipython_notebooks_tutorials/rvm_ard/rvm_demo.ipynb

There are four classes implemented in the code:
- RegressionARD
- ClassificationARD
- RVC
- RVR

So maybe RegressionARD and ClassificationARD can be useful as well.

@liamnaka

liamnaka commented Feb 21, 2016

@AmazaspShumik thank you so much for your implementation. Great work 👍

@ZaixuCui

ZaixuCui commented Feb 21, 2016

@AmazaspShumik

Thank you for your efforts very much.
I will definitely try this package.

Wish you all the best.

Zaixu

@Blair-Young

Blair-Young commented Jan 31, 2017

Has anyone had trouble with @AmazaspShumik's predict_proba method?

@tendiarifin

tendiarifin commented Mar 15, 2017

Does anyone here have a library for RVM in PHP? I don't understand the RVM; can anyone explain it to me?

@tendiarifin

tendiarifin commented Apr 20, 2017

Does anyone have an RVM library for PHP?
