Suggested additional functionality to GMM #205

Closed
rcurtin opened this issue Dec 29, 2014 · 4 comments
rcurtin commented Dec 29, 2014

Reported by Adam
Within the GMM methods, I noticed that you are using the standard EM algorithm to estimate / fit the data. I think it would be extremely useful to have an additional, more flexible estimation / fitting method that allows the use of priors and / or an unknown number of clusters. The obvious choice would be the IGMM (infinite Gaussian mixture model), also known as the Dirichlet process mixture model.

More info on what I am talking about / envision here:

http://scikit-learn.sourceforge.net/dev/modules/mixture.html

Migrated-From: http://trac.research.cc.gatech.edu/fastlab/ticket/211

@rcurtin rcurtin self-assigned this Dec 29, 2014

rcurtin commented Dec 30, 2014

Commented by rcurtin
The way to do this would be to modify the GMM class to take a template class for the clustering:

class GMM {
  // ...

  template<typename ClusteringMethod = EMCluster> // or some similar name
  void Estimate(const arma::mat& observations);

  // ...
};

and then in the class,

template<typename ClusteringMethod>
void GMM::Estimate(const arma::mat& observations)
{
  // All of the hard work is done by the clustering method.
  // Maybe some other administrivia goes in this method.
  // We do have to tell the clustering method where to store its results.
  ClusteringMethod::Cluster(observations, means, covariances, weights);
}

or something similar to that. I can easily split off the EM algorithm into its own file, but I don't have time to implement the IGMM training as a separate class, unfortunately. If you are interested in doing that implementation, we'll just have to agree on a uniform API call to ClusteringMethod::Cluster() and then you'll need to write the IGMMClustering class (or whatever name might suit it better) and I'll take care of integrating it all.

There is a very old L2 loss implementation of GMM training floating around this lab somewhere; if we move forward with this I'll have to see if I can dig it out of its grave.


rcurtin commented Dec 30, 2014

Commented by Adam
As I stated in my other reply to the GMM clustering query, I am currently porting some Matlab code I have written, which implements a form of IGMM by overriding the call to the default EM algorithm; being able to do this more simply with MLPACK, and thus only have to code up the "fitting" method, would be ideal.

Not sure I would call it "Cluster"... as this may be confused with the component assignment / Classify method discussed in the other thread. Maybe just "Fitting Method".


rcurtin commented Dec 30, 2014

Commented by rcurtin
Okay; I've been working on this for a while. Now GMM is a templated class which allows any number of fitting methods. The default is the EM algorithm (which can use any initial clustering mechanism, not just the default KMeans):

template<typename FittingType = EMFit<> >
class GMM;

The FittingType class needs to implement the following two functions:

void Estimate(const arma::mat& observations,
              std::vector<arma::vec>& means,
              std::vector<arma::mat>& covariances,
              arma::vec& weights);

void Estimate(const arma::mat& observations,
              const arma::vec& probabilities,
              std::vector<arma::vec>& means,
              std::vector<arma::mat>& covariances,
              arma::vec& weights);

These functions should produce a trained GMM from the given observations (and, in the second overload, per-observation probabilities). They may modify the size of the model (by growing the mean and covariance vectors as well as the weight vector), but they should expect that these vectors are already set to the size of the GMM as specified in the constructor.

The EMFit type, in gmm/em_fit.hpp and gmm/em_fit_impl.hpp, provides an example implementation of this API.

So hopefully, if you are writing an IGMM fitting type, you can plug it right in without having to modify the GMM code at all.

If the API needs any changes, let me know. Sorry this took so long; I kept stumbling over simple compilation issues.


rcurtin commented Jan 9, 2015

Closed for inactivity.
