hmm_train number of gaussians #479
Comments
Hi Davud,
Consider running with '-v' (--verbose) for more output. I think that one issue might be that you don't have labeled points from every state. If that was the problem, it should be fixed in 2eface7. Another possible issue would have been that invalid labels were specified; if so, it should be fixed in fa89192. If neither of those fixes your issue, if you can get me a copy of
There is no restriction on the number of Gaussians, but note that as you add more Gaussians to the GMM, if you don't have many samples, you may have stability issues with the empirical covariance matrices (the debug messages saying that the covariance matrix is not positive definite could be an indicator of this).
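To illustrate the covariance point above: a sample covariance matrix estimated from n points has rank at most n - 1, so a Gaussian that ends up with too few points for the data's dimensionality cannot have a positive definite covariance. The following is only a minimal sketch of that general fact in two dimensions; the helper names and the toy points are made up for illustration and have nothing to do with mlpack's internals.

```python
def sample_covariance_2d(points):
    """Unbiased sample covariance (2x2 nested list) of a list of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def determinant_2x2(m):
    """Determinant of a 2x2 matrix; zero means the matrix is singular."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Hypothetical toy data: two points cannot span two dimensions.
two_points = [(0.0, 0.0), (2.0, 4.0)]
# Three points that are not collinear give a full-rank covariance.
three_points = [(0.0, 0.0), (2.0, 4.0), (4.0, 2.0)]

det_two = determinant_2x2(sample_covariance_2d(two_points))
det_three = determinant_2x2(sample_covariance_2d(three_points))

print(det_two)    # 0.0  -> singular, not positive definite
print(det_three)  # 12.0 -> positive definite
```

With 13-dimensional observations the same thing happens on a larger scale: any Gaussian that is assigned only a handful of points yields a singular empirical covariance, which would be consistent with the "Adding perturbation" debug messages.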
Hi Ryan,
[INFO ] GMM::Estimate(): log-likelihood of trained GMM is 9248.34.
error: Mat::col(): index out of bounds
terminate called after throwing an instance of 'std::logic_error'
Program received signal SIGABRT, Aborted.
There is a "no such file" error, but I don't think it comes from me. I am pretty sure that I have labels for every state, and I don't get invalid label errors. (You will have to delete the .txt extension.)
How many samples are about "enough" samples? I just ran with 4000. I will get some more, but I don't think I will have more than 50k. Will that be enough to use more Gaussians?
Hey Davud,
Thanks for the backtrace; judging by its output, it looks like this bug is exactly what was fixed in #481 a few days ago. I used the dataset that you linked to, and tested with the current git master and had no problem, then tested with git master before #481 was merged (specifically, a7d8231), and had the exact same issue that you had here. So I believe the issue is solved if you update to the newest git master.
If you do try to keep increasing the number of Gaussians, though, note that you can't increase it past the number of samples in your smallest class. For instance, in your data, here is the class breakdown:
So you can't increase the number of Gaussians above 26, because class 7 only has 26 observations. If you, for instance, specify
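A quick way to check that limit for a given label set is to count the observations per class and take the minimum. This is only a sketch: the label values below are hypothetical (chosen so that class 7 has 26 observations, as mentioned above), and the function name is made up for illustration. For real data you would load the labels from labels.csv, whose exact layout is an assumption here.

```python
from collections import Counter


def smallest_class_size(labels):
    """Return (per-class counts, size of the smallest class)."""
    counts = Counter(labels)
    return counts, min(counts.values())


# Hypothetical labels: only the size of class 7 (26) comes from the thread.
labels = [0] * 100 + [1] * 40 + [7] * 26

counts, limit = smallest_class_size(labels)
for label, count in sorted(counts.items()):
    print(f"class {label}: {count} observations")
print("largest usable number of Gaussians per state:", limit)
```

Any value passed to -g above that limit leaves at least one state with fewer points than Gaussians.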
Thanks for the help and the good explanation. Now everything works fine.
Hi,
I am just playing around with hmm_train and am having some problems with the number of Gaussians. I don't have a strategy for deciding the number of Gaussians yet, so I just tried a few values.
With values around five everything seems to be fine. But with 10 and above I get:
error: Mat::col(): index out of bounds
terminate called after throwing an instance of 'std::logic_error'
I run it like this:
./hmm_train -i observationPKM.csv -n 13 -t gmm -g 10 -o hmm.xml -l labels.csv
Also, the debug output looks like this:
[DEBUG] Covariance matrix is not positive definite. Adding perturbation.
[DEBUG] Covariance matrix is not positive definite. Adding perturbation.
[DEBUG] Covariance matrix is not positive definite. Adding perturbation.
[DEBUG] Covariance matrix is not positive definite. Adding perturbation.
[DEBUG] Covariance matrix is not positive definite. Adding perturbation.
[DEBUG] Covariance matrix is not positive definite. Adding perturbation.
[DEBUG] Covariance matrix is not positive definite. Adding perturbation.
[DEBUG] Point 4 assigned to empty cluster 0.
[DEBUG] Point 5 assigned to empty cluster 2.
[DEBUG] Point 6 assigned to empty cluster 3.
...
Is this normal? And is there an upper limit on the number of Gaussians?
Greetings
Davud