Suggested additional functionality to GMM #205
Comments
Commented by rcurtin on 6 Nov 42236434 16:09 UTC
and then in the class,
or something similar to that. I can easily split off the EM algorithm into its own file, but I don't have time to implement the IGMM training as a separate class, unfortunately. If you are interested in doing that implementation, we'll just have to agree on a uniform API. There is a very old L2 loss implementation of GMM training floating around this lab somewhere; if we move forward with this, I'll have to see if I can dig it out of its grave.
Commented by Adam on 27 Nov 42236520 13:16 UTC
Not sure I would call it "Cluster"... as this may be confused with the Component Assignment / Classify method discussed in the other thread. Maybe just "Fitting Method".
Commented by rcurtin on 21 Nov 42277407 13:04 UTC
The FittingType class needs to implement the following two functions:
These functions should produce a trained GMM from the given observations and probabilities. They may modify the size of the model (by increasing the size of the mean and covariance vectors as well as the weight vector). The EMFit type implements this interface for standard EM training. So hopefully, if you are writing an IGMM fitting type, you can plug it right in without having to modify the GMM code at all. If the API needs any changes, let me know. Sorry this took so long; I kept stumbling over simple compilation issues.
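The code listings for the two required functions did not survive the issue migration. As a rough, self-contained sketch of the shape such a FittingType contract could take (the class name ToyMLFit, the 1-D single-Gaussian simplification, and the output-parameter signatures are all illustrative assumptions, not mlpack's actual API):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the FittingType contract discussed in this thread:
// two Estimate() overloads, one taking raw observations and one additionally
// taking per-observation probabilities (responsibilities). To stay
// self-contained, this toy fits a single 1-D Gaussian rather than a mixture.
class ToyMLFit
{
 public:
  // Fit from raw observations: plain sample mean and (biased) variance.
  void Estimate(const std::vector<double>& observations,
                double& mean, double& variance)
  {
    const std::vector<double> p(observations.size(), 1.0);
    Estimate(observations, p, mean, variance);
  }

  // Fit from observations weighted by probabilities, as an EM M step would:
  // weighted mean and weighted variance.
  void Estimate(const std::vector<double>& observations,
                const std::vector<double>& probabilities,
                double& mean, double& variance)
  {
    double wsum = 0.0, m = 0.0;
    for (std::size_t i = 0; i < observations.size(); ++i)
    {
      wsum += probabilities[i];
      m += probabilities[i] * observations[i];
    }
    mean = m / wsum;

    double v = 0.0;
    for (std::size_t i = 0; i < observations.size(); ++i)
      v += probabilities[i] * (observations[i] - mean)
                            * (observations[i] - mean);
    variance = v / wsum;
  }
};
```

The key design point is the second overload: an EM-style caller passes per-point responsibilities, while the first overload is just the unweighted special case.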
Closed for inactivity.
Reported by Adam on 4 Dec 42211383 00:14 UTC
Within the GMM methods, I noticed that you are using the standard EM algorithm to estimate / fit the data. I think it would be extremely useful to have an additional, more flexible estimation / fitting method that allows the use of priors and/or an unknown number of clusters. The obvious choice would be the IGMM (infinite Gaussian mixture model), also known as the Dirichlet process mixture model.
More info on what I am envisioning here:
http://scikit-learn.sourceforge.net/dev/modules/mixture.html
Migrated-From: http://trac.research.cc.gatech.edu/fastlab/ticket/211
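For context on the trade-off the report describes: standard EM fixes the number of components K up front and alternates an E step (compute each point's component responsibilities) with an M step (re-estimate parameters from those responsibilities), while an IGMM / Dirichlet process mixture infers the number of components from the data. A minimal fixed-K sketch, using a hypothetical toy 1-D model with two unit-variance components (illustrative only, not mlpack code):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy fixed-K EM for a two-component 1-D mixture with unit variances:
// only the means and mixing weights are re-estimated each iteration.
struct ToyGMM
{
  double mean[2];
  double weight[2];
};

// Gaussian density N(x; mu, 1).
inline double Gauss(double x, double mu)
{
  const double kInvSqrt2Pi = 0.3989422804014327;
  return kInvSqrt2Pi * std::exp(-0.5 * (x - mu) * (x - mu));
}

ToyGMM EMFitFixedK(const std::vector<double>& x, int iters)
{
  ToyGMM g{{-1.0, 1.0}, {0.5, 0.5}};  // crude deterministic initialization
  std::vector<double> r(x.size());    // responsibility of component 0

  for (int it = 0; it < iters; ++it)
  {
    // E step: posterior responsibility of component 0 for each point.
    for (std::size_t i = 0; i < x.size(); ++i)
    {
      const double p0 = g.weight[0] * Gauss(x[i], g.mean[0]);
      const double p1 = g.weight[1] * Gauss(x[i], g.mean[1]);
      r[i] = p0 / (p0 + p1);
    }

    // M step: weighted means and mixing weights from responsibilities.
    double n0 = 0.0, n1 = 0.0, s0 = 0.0, s1 = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
      n0 += r[i];        s0 += r[i] * x[i];
      n1 += 1.0 - r[i];  s1 += (1.0 - r[i]) * x[i];
    }
    g.mean[0] = s0 / n0;
    g.mean[1] = s1 / n1;
    g.weight[0] = n0 / x.size();
    g.weight[1] = n1 / x.size();
  }
  return g;
}
```

An infinite-mixture fitter would replace the fixed two-element arrays with containers that grow as new components are instantiated, which is why the FittingType API proposed in this thread allows the fitter to change the model's size.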