Suggested additional functionality to GMM #205

Closed
rcurtin opened this issue Dec 29, 2014 · 4 comments
rcurtin commented Dec 29, 2014

Reported by Adam
Within the GMM methods, I noticed that you are using the standard EM algorithm to estimate / fit the data. I think it would be extremely useful to have an additional, more flexible estimation / fitting method that allows the use of priors and / or an unknown number of clusters. The obvious choice would be the IGMM (infinite Gaussian mixture model), also known as the Dirichlet process mixture model.

More info on what I am talking about / envision here:

http://scikit-learn.sourceforge.net/dev/modules/mixture.html

Migrated-From: http://trac.research.cc.gatech.edu/fastlab/ticket/211

@rcurtin rcurtin self-assigned this Dec 29, 2014

rcurtin commented Dec 30, 2014

Commented by rcurtin
The way to do this would be to modify the GMM class to take a template class for the clustering:

class GMM {
  // ...

  template<typename ClusteringMethod = EMCluster> // or some similar name
  void Estimate(const arma::mat& observations);

  // ...
};

and then in the class,

template<typename ClusteringMethod>
void GMM::Estimate(const arma::mat& observations)
{
  // All of the hard work is done by the clustering method.
  // Maybe some other administrivia goes in this method.
  // We do have to tell the clustering method where to store its results.
  ClusteringMethod::Cluster(observations, means, covariances, weights);
}

or something similar to that. I can easily split off the EM algorithm into its own file, but I don't have time to implement the IGMM training as a separate class, unfortunately. If you are interested in doing that implementation, we'll just have to agree on a uniform API call to ClusteringMethod::Cluster() and then you'll need to write the IGMMClustering class (or whatever name might suit it better) and I'll take care of integrating it all.

There is a very old L2 loss implementation of GMM training floating around this lab somewhere; if we move forward with this I'll have to see if I can dig it out of its grave.


rcurtin commented Dec 30, 2014

Commented by Adam
As I stated in my other reply to the GMM clustering query, I am currently porting some Matlab code I have written, which implements a form of IGMM by overriding the call to the default EM algorithm; being able to do this more simply with MLPACK, and thus only have to code up the "fitting" method, would be ideal.

Not sure I would call it "Cluster"... as this may be confused with the component assignment / Classify method discussed in the other thread. Maybe just "Fitting Method".


rcurtin commented Dec 30, 2014

Commented by rcurtin
Okay; I've been working on this for a while. Now GMM is a templated class which allows any number of fitting methods. The default is the EM algorithm (which can use any initial clustering mechanism, not just the default KMeans):

template<typename FittingType = EMFit<> >
class GMM;

The FittingType class needs to implement the following two functions:

void Estimate(const arma::mat& observations,
              std::vector<arma::vec>& means,
              std::vector<arma::mat>& covariances,
              arma::vec& weights);

void Estimate(const arma::mat& observations,
              const arma::vec& probabilities,
              std::vector<arma::vec>& means,
              std::vector<arma::mat>& covariances,
              arma::vec& weights);

These functions should produce a trained GMM from the given observations (and, in the second overload, per-observation probabilities). They may modify the size of the model (by growing the mean and covariance vectors as well as the weight vector), but they should expect that these vectors are already set to the size of the GMM as specified in the constructor.

The EMFit type, in gmm/em_fit.hpp and gmm/em_fit_impl.hpp, provides an example implementation of this API.

So hopefully, if you are writing an IGMM fitting type, you can plug it right in without having to modify the GMM code at all.

If the API needs any changes, let me know. Sorry this took so long; I kept stumbling over simple compilation issues.


rcurtin commented Jan 9, 2015

Closed for inactivity.
