
Relevance Vector Machine (RVM) #1513

Closed
yedtoss opened this Issue Jan 3, 2013 · 54 comments

@yedtoss

yedtoss commented Jan 3, 2013

RVM is a Bayesian framework for obtaining sparse solutions to regression and classification tasks. It uses a model of identical form to the SVM (Support Vector Machine) and addresses the following disadvantages of the SVM:

-- The number of basis functions in an SVM grows linearly with the size of the training set. In an RVM, we start with zero basis functions and incrementally add/delete basis functions until convergence.

-- SVM predictions are not probabilistic, while the RVM's are.

-- The SVM requires estimating the margin trade-off parameter 'C', which is not the case for the RVM.

-- The SVM kernel must be positive definite; the RVM can use any kernel.

It is already implemented in dlib (http://dlib.net/dlib/svm/rvm_abstract.h.html) and there is also a MATLAB implementation at http://www.vectoranomaly.com/downloads/downloads.htm. These implementations should serve as a guide.
I think it would be a good idea to add it to scikit-learn.
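For reference, the model form shared with the SVM and the sparsity-inducing prior are roughly the following (notation loosely follows reference 2 below):

```latex
% RVM prediction has the same functional form as the SVM:
y(\mathbf{x}) = \sum_{n=1}^{N} w_n K(\mathbf{x}, \mathbf{x}_n) + b
% Sparsity comes from an independent zero-mean Gaussian prior on each weight,
% with one precision hyperparameter \alpha_n per basis function:
p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{n=1}^{N} \mathcal{N}\!\left(w_n \mid 0, \alpha_n^{-1}\right)
```

During training, most of the \alpha_n diverge to infinity, pruning the corresponding basis functions; the surviving training points are the "relevance vectors".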

References :
1- Tipping, M. E. and A. C. Faul (2003). Fast marginal likelihood maximisation for sparse Bayesian models. In C. M. Bishop and B. J. Frey (Eds.), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, Jan 3-6.

2- Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1, 211–244.

@amueller

Member

amueller commented Jan 3, 2013

I'd have to read up on it again, but in general I think RVMs would be a good addition.
dlib is Boost-licensed, which should be compatible. It might not be so easy to wrap because of the boost-heavy coding style, though.
Is the problem optimized using SMO? Would it be sensible for us to implement SMO?

Gotta grab my Bishop.

@amueller

Member

amueller commented Jan 3, 2013

What is the relation between ARD and RVM? Is RVM just the "basis function" version of ARD?

@amueller

Member

amueller commented Jan 3, 2013

Btw, is anyone ever bothered by the fact that the section Generalized Linear Models doesn't contain any generalized models?

@amueller

Member

amueller commented Jan 3, 2013

Ok, so I guess we should use the sequential sparse learning algorithm from Bishop, p. 352 and following?
Knock yourself out ;)

@amueller

Member

amueller commented Jan 3, 2013

I wonder whether there is a similar method for ARD? That would be cool, as the current ARD implementation is quite slow :-/

@yedtoss

yedtoss commented Jan 3, 2013

No, the RVM implementation definitely does not use SMO; I think SMO is only used for SVM optimization.
Yes, we should use the sequential sparse learning algorithm in reference 1, page 7 (is that the one in Bishop, p. 352?). This algorithm is "simple" enough that we can write it without using dlib. I was thinking of writing it in Python and then using Cython for optimization. In that case we can take full advantage of the MATLAB implementation. What do you think?
Anyway, it should also be possible to write it in C++, but for that we would need a good linear algebra library in C++, and I am not sure scikit-learn comes with one by default.
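To make this concrete, here is a rough numpy-only sketch of that sequential algorithm (regression case, fixed noise precision, no convergence test; all names are mine and this is illustration-grade code, not a proposed implementation):

```python
# Sketch of fast marginal likelihood maximisation (Tipping & Faul, 2003),
# regression case with fixed noise precision beta. Minimal numerical guards.
import numpy as np

def fast_rvm_regression(Phi, t, beta=100.0, n_iter=200):
    """Phi: (n_samples, n_basis) kernel matrix; t: (n_samples,) targets."""
    n_samples, n_basis = Phi.shape
    alpha = np.full(n_basis, np.inf)         # alpha = inf <=> basis pruned
    phi_norm2 = (Phi ** 2).sum(axis=0)
    Phi_t = Phi.T @ t

    # Start from the single basis function best aligned with the targets
    # (assumes it explains more variance than the noise floor 1/beta).
    i0 = np.argmax(Phi_t ** 2 / phi_norm2)
    alpha[i0] = phi_norm2[i0] / (Phi_t[i0] ** 2 / phi_norm2[i0] - 1.0 / beta)

    for it in range(n_iter):
        active = np.flatnonzero(np.isfinite(alpha))
        Phi_a = Phi[:, active]
        # Posterior covariance of the active weights.
        Sigma = np.linalg.inv(np.diag(alpha[active]) + beta * Phi_a.T @ Phi_a)
        # Sparsity (S) and quality (Q) factors for every candidate basis.
        W = beta * (Phi.T @ Phi_a)
        S = beta * phi_norm2 - np.einsum('ij,jk,ik->i', W, Sigma, W)
        Q = beta * Phi_t - W @ (Sigma @ (beta * (Phi_a.T @ t)))
        s, q = S.copy(), Q.copy()
        s[active] = alpha[active] * S[active] / (alpha[active] - S[active])
        q[active] = alpha[active] * Q[active] / (alpha[active] - S[active])

        i = it % n_basis                     # cycle through the candidates
        theta = q[i] ** 2 - s[i]
        if theta > 0:
            alpha[i] = s[i] ** 2 / theta     # add, or re-estimate alpha_i
        elif np.isfinite(alpha[i]) and active.size > 1:
            alpha[i] = np.inf                # delete basis i from the model

    active = np.flatnonzero(np.isfinite(alpha))
    Phi_a = Phi[:, active]
    Sigma = np.linalg.inv(np.diag(alpha[active]) + beta * Phi_a.T @ Phi_a)
    mu = beta * Sigma @ Phi_a.T @ t          # posterior mean of the weights
    return active, mu                        # indices of the relevance vectors
```

A real implementation would also update the noise precision, pick the candidate that maximally increases the marginal likelihood instead of cycling, and track convergence, but the add / re-estimate / delete logic above is the core of the paper.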

@amueller

Member

amueller commented Jan 3, 2013

Bishop is "Machine Learning and Pattern Recognition".

The algorithm is probably not completely easy to get right. If you want to start on this, it will definitely be a bigger project. If you are interested and want to implement it any way, go ahead.

Implementing it is also quite a bit different from getting it into scikit-learn. That also involves writing tests, documentation and a user guide - and pretty code.

For a first try you should definitely use just numpy. It uses blas internally and is therefore quite fast.
Speeding up using Cython only makes sense if there is a lot of python overhead. If all the time is spent in the BLAS calls, using Cython doesn't make much sense.

@yedtoss

yedtoss commented Jan 3, 2013

OK for Cython and numpy. I didn't know Bishop talks about RVM.
As for the ARD and RVM relationship: I don't know a lot about ARD, but in reference 2 the author says that RVM is based on ARD: "We term those training vectors associated with the remaining non-zero weights 'relevance' vectors, in deference to the principle of automatic relevance determination which motivates the presented approach" (page 3 (213), line 8).
Anyway, how does ARD work?

@amueller

Member

amueller commented Jan 5, 2013

ARD is also explained in Bishop's book and in the user guide. It puts a diagonal Gaussian prior on the weights and tries to estimate the variances, which (as I understand it) is the same thing that RVM does. Is that correct?
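For anyone following along, the implementation being discussed is sklearn.linear_model.ARDRegression. A minimal usage sketch on synthetic data, showing the pruning mechanism (the per-weight precisions in lambda_ diverge for irrelevant features, which is the same mechanism the RVM applies to kernel basis functions):

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 20)
w = np.zeros(20)
w[:3] = [1.0, -2.0, 0.5]        # only 3 informative features
y = X @ w + 0.1 * rng.randn(100)

ard = ARDRegression()
ard.fit(X, y)
print(ard.coef_)                # near-zero for the 17 irrelevant features
print(ard.lambda_)              # huge precision => effectively pruned weight
```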

@agramfort

Member

agramfort commented Jan 6, 2013

I realize that the ref mentioned:

http://books.nips.cc/papers/files/nips20/NIPS2007_0976.pdf

is not the implementation we use. I think that @vmichel implemented the Bishop approach, while this paper proposes a fixed-point approach similar to a coordinate descent approach. This code definitely needs some love...

@amueller

Member

amueller commented Jan 6, 2013

Thanks @agramfort, I was wondering about that. I didn't go through the details, but I thought, as the paper was the only reference...

I would very much appreciate it if you could add a comment there citing the chapter of Bishop that was used, and maybe saying we should implement the other paper instead.

@amueller

Member

amueller commented Jan 6, 2013

(btw one of the slowest things in the test suite right now is fitting ARD on the Boston dataset in the common tests)

@agramfort

Member

agramfort commented Jan 7, 2013

> Thanks @agramfort, I was wondering about that. I didn't go through the details, but I thought, as the paper was the only reference...
>
> I would very much appreciate it if you could add a comment there citing the chapter of Bishop that was used, and maybe saying we should implement the other paper instead.

See: #1530

@GaelVaroquaux

Member

GaelVaroquaux commented Jan 9, 2013

> Btw, is anyone ever bothered by the fact that the section Generalized Linear Models doesn't contain any generalized models?

It does: logistic regression.

@GaelVaroquaux

Member

GaelVaroquaux commented Jan 9, 2013

> I wonder whether there is a similar method for ARD? That would be cool, as the current ARD implementation is quite slow :-/

I believe the most promising fast solver for ARD would be to implement the strategy exposed in:
http://books.nips.cc/papers/files/nips20/NIPS2007_0976.pdf

@agramfort

Member

agramfort commented Jan 9, 2013

I put up a gist with some code I wrote a while ago.

If somebody wants to work on ARD, I thought it could be useful.

https://gist.github.com/4494613

WARNING: it's not well tested and I don't guarantee correctness, but it seems to work.

@yedtoss

yedtoss commented Jan 16, 2013

@amueller
I have analysed ARD in http://books.nips.cc/papers/files/nips20/NIPS2007_0976.pdf and I think RVM and ARD optimize the same objective function. The difference is in the method used to optimize this function. In the RVM, the authors noticed that most of the weights will be near zero and they use this to derive a "fast" algorithm.

@amueller

Member

amueller commented Jan 16, 2013

That sounds odd. If the objective is the same, you should be able to use the same methods for optimization, right?

@yedtoss

yedtoss commented Jan 16, 2013

Yes, for sure you should, but I guess the authors of the RVM used a different optimization strategy to get a faster and sparser algorithm.

@amueller

Member

amueller commented Jan 16, 2013

@yedtoss I'm pretty sure there is some other difference. As I said before, this might be that RVMs work in a feature space or with a kernel or something. Otherwise you could just replace the ARD implementation? That is regression, though, and you want classification, right?

@agramfort do you know anything about the difference of ARD and RVM?

@yedtoss

yedtoss commented Jan 16, 2013

@amueller
Initially, RVM is a regression technique, but the author presented a way to use it for classification as well. The RVM can use any kind of kernel.
What I mean is that the log likelihood of ARD (equation 2 of "A New View of Automatic Relevance Determination") and of the RVM (equation 7 of "Fast marginal likelihood maximisation for sparse Bayesian models") are identical.
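For reference, that shared objective is the log marginal likelihood over the hyperparameters (in the notation of reference 2, with A = diag(alpha_1, ..., alpha_N)):

```latex
\mathcal{L}(\boldsymbol{\alpha})
  = \log p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^{2})
  = -\frac{1}{2}\left[\, N \log 2\pi + \log \lvert \mathbf{C} \rvert
      + \mathbf{t}^{\top} \mathbf{C}^{-1} \mathbf{t} \,\right],
\qquad
\mathbf{C} = \sigma^{2}\mathbf{I} + \boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}^{\top}
```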

@amueller

Member

amueller commented Jan 16, 2013

I guess I'd have to read the papers to know what's going on...

@agramfort

Member

agramfort commented Jan 16, 2013

Sorry guys, I am not much of a Bayesian guy... I don't know the subtleties well...

@larsmans

Member

larsmans commented Mar 17, 2013

RVMs are patented by Microsoft.

@amueller

Member

amueller commented Mar 19, 2013

crazy.

@larsmans larsmans closed this Mar 19, 2013

@kalxas

kalxas commented Sep 6, 2013

@larsmans @amueller While there is a US patent on the RVM, the author recommends a GPLv2 MATLAB implementation on his web page, so I guess it is OK to implement it...
http://www.miketipping.com/sparsebayes.htm

Best,
Angelos

@larsmans

Member

larsmans commented Sep 6, 2013

@kalxas License and patent are quite orthogonal, and GPLv2 in particular didn't address software patents. The rights you have with such an implementation are the intersection of the rights granted by the GPL and those granted by the patent holder.

That said, I found out in the meantime that support vector machines are patented by AT&T, but the patent was apparently never enforced. If something similar can be shown for RVMs, I might change my mind about them.

@0x0L

Contributor

0x0L commented Oct 9, 2014

@larsmans I wrote a pure numpy/Python port of the dlib implementation (awfully slow at the moment; I'll try to Cythonize it). According to the header, dlib's implementation has been around since 2008 and they seem fine with it. Would you consider changing your mind about having RVM in sklearn?

@larsmans

Member

larsmans commented Oct 10, 2014

Let's hear @GaelVaroquaux's opinion on this. The dlib implementation doesn't prove anything as long as you can't show it's widely used without a patent license.

@jlopezpena

jlopezpena commented Mar 30, 2015

Are there any updates on this topic? I've been looking into RVMs lately and was wondering if there was any code around...

@amueller

Member

amueller commented Mar 30, 2015

I don't think anyone has tried for a fast implementation, and we are still unsure about the legal status.

@0x0L

Contributor

0x0L commented Apr 6, 2015

@jlopezpena Take a look at dlib; the code is pretty clear and it's header-only (templates). It should be fairly easy to build a C extension for use from numpy.

@jhallock7

jhallock7 commented Aug 7, 2015

Hi everyone,

I recently translated Mike Tipping’s freely-available SparseBayes MATLAB program, which mainly implements the RVM, into Python. It can be found here: https://github.com/jhallock7/SparseBayes-Python . I contacted Mike Tipping, and he said that Microsoft’s patent covers only the original slow algorithm, whereas the SparseBayes program uses the faster one found here: http://www.miketipping.com/papers/met-fastsbl.pdf . So it would be fine if some form of his program were folded into scikit-learn. I am relatively new to Python, so my translation can undoubtedly be improved or modified.

@amueller

Member

amueller commented Aug 8, 2015

Thanks for wanting to contribute, and also thanks for checking the patent status.
There is another question, though, which is whether this algorithm is widely useful.

I haven't looked at the algorithm and its uses since I last read Bishop's book, which is a while ago.
I think an interesting example would be to show that it either gives better uncertainty estimates than calibrating an SVM, or that it is faster (calibrating an SVM and searching over C needs a lot of cross-validation).
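A sketch of what the SVM side of that comparison would look like with existing scikit-learn tools (the RVM side is omitted since no estimator exists yet; an RVC would have to beat this on calibration quality or on total fit time):

```python
# Baseline: probabilistic SVM = grid search over C + probability calibration,
# i.e. two nested levels of cross-validation.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

search = GridSearchCV(SVC(kernel='rbf'), {'C': [0.1, 1, 10, 100]}, cv=3)
clf = CalibratedClassifierCV(search, cv=3)
clf.fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)   # calibrated class probabilities
```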

@ZaixuCui

ZaixuCui commented Sep 17, 2015

@amueller RVM (mainly relevance vector regression (RVR)) is pretty useful in neuroimaging data analysis. Lots of papers use this method rather than SVR for prediction. It would be perfect if this method could be added to the scikit-learn toolbox.

@AmazaspShumik

AmazaspShumik commented Sep 29, 2015

@amueller I implemented a slow version of the RVM which can use either EM or a fixed-point algorithm to fit the model (mainly for learning / academic purposes). The major difference between RVM and SVR that I noted from a couple of examples is sparsity, i.e. the number of 'support' vectors used in prediction. In many cases the RVM produces results comparable with SVR while using only a fraction of the support vectors that SVR uses
(here is one simple example that is also used in Tipping 2001).

@AmazaspShumik

AmazaspShumik commented Sep 29, 2015

@amueller (Adding to the previous comment) And obviously a small number of support vectors implies very fast prediction.
Another advantage of the RVM is its probabilistic treatment: with the RVM, for every data point in the test set you get not only a point estimate but also a predictive distribution.

@amueller

Member

amueller commented Oct 13, 2015

Does RVR provide probability distributions, too?

Sounds like RVR and RVM are reasonable candidates for inclusion. I'm not sure about the state-of-the-art algorithms, though. Is it http://www.miketipping.com/papers/met-fastsbl.pdf ? That seems pretty old. The SparseBayes coding style is.... interesting, and I think it would serve better as a reference than as a basis for the sklearn implementation.

@AmazaspShumik

AmazaspShumik commented Oct 15, 2015

Yes, RVR provides probability distributions; however, the variance of the predictive distribution can actually be smaller for data points outside the domain of the training set, for example.
To the best of my knowledge, the paper you mentioned is the latest version of the RVM; it also corresponds to the MATLAB implementation on Tipping's website (version 2).

@AmazaspShumik

AmazaspShumik commented Oct 15, 2015

I also found an interesting comparison of RVM and SVM speed in Kevin Murphy's book:
"RVM is also fastest to train. This is despite the fact that RVM code is in Matlab and the SVM code is in C" (Chapter 14, p. 490). However, it seems that the comparisons were made on small datasets only.

@GaelVaroquaux

Member

GaelVaroquaux commented Oct 15, 2015

> @amueller RVM (mainly relevance vector regression (RVR)) is pretty useful in neuroimaging data analysis.

I am not convinced (and I do neuroimaging). I haven't seen a good empirical comparison.

I see the way to go is to have RVMs in a separate package, with the scikit-learn API, and encourage good empirical work to show their usefulness. If they are useful, merge them in scikit-learn.

@ZaixuCui

ZaixuCui commented Oct 15, 2015

Some neuroimaging studies have used relevance vector regression (RVR) and made comparisons between RVR and SVR. To list a few:
http://www.sciencedirect.com/science/article/pii/S1053811910000108
http://www.sciencedirect.com/science/article/pii/S1053811910012644
http://www.sciencedirect.com/science/article/pii/S1053811910003459
http://www.nature.com/npp/journal/v39/n3/abs/npp2013251a.html

RVR is also implemented in a pattern recognition toolbox for neuroimaging data:
http://www.mlnl.cs.ucl.ac.uk/pronto/

I hope RVR can be incorporated in scikit-learn.

Best wishes

Zaixu


@mblondel

Member

mblondel commented Oct 15, 2015

> I see the way to go is to have RVMs in a separate package, with the scikit-learn API, and encourage good empirical work to show their usefulness. If they are useful, merge them in scikit-learn.

+1

@amueller

Member

amueller commented Oct 15, 2015

I mean, there is https://github.com/AmazaspShumik/Bayesian-Regression-Methods/blob/master/Relevance%20Vector%20Machine%20%26%20ARD/rvm.py which looks relatively compatible. It needs set_params and get_params, or to inherit from BaseEstimator.

And there is https://github.com/jhallock7/SparseBayes-Python which could be wrapped.

@ZaixuCui why do you want it to be in scikit-learn when there is a ready-to-use implementation out there?

I tend to agree with @GaelVaroquaux and @mblondel. If no one has published on these algorithms in nearly ten years, people don't seem to be very interested. [oh, the standard algorithm is from 2003 even. but then again libsvm is 2005]
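For anyone picking this up: concretely, scikit-learn compatibility just means storing the constructor arguments verbatim and inheriting the parameter plumbing. A hypothetical skeleton (class name and parameters are illustrative only):

```python
from sklearn.base import BaseEstimator, RegressorMixin

class RVR(BaseEstimator, RegressorMixin):
    def __init__(self, kernel='rbf', gamma=1.0, n_iter=300):
        # Store params verbatim; no validation or work in __init__.
        self.kernel = kernel
        self.gamma = gamma
        self.n_iter = n_iter

    def fit(self, X, y):
        # ... run fast marginal likelihood maximisation here ...
        # Learned attributes get a trailing underscore, e.g.
        # self.relevance_vectors_, self.mu_, self.alpha_
        return self

    def predict(self, X):
        # ... kernel expansion over the relevance vectors ...
        raise NotImplementedError

# get_params()/set_params() now work for free, so clone() and
# GridSearchCV do too.
```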

@ZaixuCui

ZaixuCui commented Oct 19, 2015

Because I use scikit-learn to do SVR and elastic-net, if there were an RVR implementation I would not need to use MATLAB for my machine learning analyses.

Thanks very much.
Wishing you all the best.

Zaixu


@GaelVaroquaux

Member

GaelVaroquaux commented Oct 19, 2015

> Because I use scikit-learn to do SVR and elastic-net, if there were an RVR implementation I would not need to use MATLAB for my machine learning analyses.

You can use the Python code doing RVR that we pointed to in the discussion.

@ZaixuCui

ZaixuCui commented Oct 19, 2015

OK, thanks.


@glouppe

Member

glouppe commented Oct 19, 2015

Couldn't we implement this as a very lightweight class based on our new Gaussian Process implementation? As far as I understand, RVR is only the name given to a GP with a special kind of kernel.

Though this would require only minimal effort, basing the implementation of RVR on that of the GP may not be the most appropriate thing to do? CC: @jmetzen

@AmazaspShumik

AmazaspShumik commented Feb 5, 2016

@amueller @GaelVaroquaux @ZaixuCui @yedtoss @jlopezpena

Hi, I implemented a fast version of the Relevance Vector Machine with the scikit-learn API, so if anybody wants to use it, feel free.

Code: https://github.com/AmazaspShumik/sklearn_bayes/blob/master/sklearn_bayes/rvm/fast_rvm.py

Examples: https://github.com/AmazaspShumik/sklearn_bayes/blob/master/ipython_notebooks_tutorials/rvm_ard/rvm_demo.ipynb

There are four classes implemented in the code:
- RegressionARD
- ClassificationARD
- RVC
- RVR

So maybe RegressionARD and ClassificationARD can be useful as well.

@liamnaka

liamnaka commented Feb 21, 2016

@AmazaspShumik thank you so much for your implementation. Great work 👍

@ZaixuCui

ZaixuCui commented Feb 21, 2016

@AmazaspShumik

Thank you for your efforts very much.
I will definitely try this package.

Wish you all the best.

Zaixu

@Blair-Young

Blair-Young commented Jan 31, 2017

Has anyone had trouble with @AmazaspShumik's predict_proba method?

@tendiarifin

tendiarifin commented Mar 15, 2017

Does anyone here have a library for RVM in PHP? I don't understand the RVM; can anyone explain it to me?

@tendiarifin

tendiarifin commented Apr 20, 2017

Does anyone have an RVM library for PHP?
