I'm trying to use "mlpack_hmm_train", but something strange is happening. My test data, saved in "input.txt", looks like this:

1 11 111
2 22 222
3 33 333
4 44 444
1 11 111
2 22 222
...

I run the command

mlpack_hmm_train -i input.txt -t gaussian -n 4 -o model.txt

but no model can be trained, unless a label file like

0
1
2
3
0
1
...

is specified. Does the unsupervised training method really work?
Thanks for reporting the issue. It turns out there was indeed a problem. If your data had been normalized to the range [0, 1], training would have worked... :)
The emission distributions were being initialized to a zero-mean, unit-covariance Gaussian, but your data takes values very far away from that Gaussian. Thus, during the forward-backward algorithm, the probability of every point was numerically 0, which made the log-likelihood -inf and caused training to fail.
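To see why the probabilities underflow, here is a minimal sketch (not mlpack's actual code) computing the log-density of one row of the example data under a zero-mean, unit-covariance Gaussian; the log-density is so large and negative that exponentiating it underflows to exactly 0.0 in double precision:

```python
import math

# Log-density of a point x under a d-dimensional Gaussian with
# zero mean and identity covariance (illustrative sketch only).
def log_gaussian_density(x):
    d = len(x)
    sq = sum(v * v for v in x)  # squared distance from the mean
    return -0.5 * (d * math.log(2.0 * math.pi) + sq)

point = [4.0, 44.0, 444.0]           # one row from the example data
logp = log_gaussian_density(point)   # on the order of -1e5
prob = math.exp(logp)                # underflows to 0.0
print(logp, prob)
```

With every per-point probability equal to 0, the forward-backward recursions produce 0 everywhere and the total log-likelihood becomes -inf, which matches the failure described above.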
In 1b3f401 I committed a fix for this problem: the emission distributions are now initialized using the full dataset. You can check out the newest version of mlpack from git. Can you check whether that fixes the issue with your full dataset?
Thank you for your reply. I've tried the latest version. The good news is that I did get a model, using the same test data and command (with -o model.txt replaced by -M model.txt). But there's another problem, with the Viterbi algorithm:

mlpack_hmm_viterbi -m model.txt -i input.txt -o vtb.txt

(note that I'm using the same input.txt file that was used to train the model). The result turns out to be

0 0 0 0...

which means all points are assigned to the same state. By the way, the data has a size of 3×1056, which I think is quite enough. Again, if a label file is specified, everything is OK.
Try training with --random_initialization; this should fix the issue. Without that option, the transition probabilities are initialized to all equal values and this can mess up the optimization. When mlpack 2.1.1 is released the option will be removed because I will make random initialization the default---uniform initialization has caused too many problems to be reliable.
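For intuition, here is a small sketch of the difference between the two initializations for a k-state transition matrix: uniform initialization sets every entry to 1/k, making all states interchangeable, while random initialization samples positive values and normalizes so that each state's outgoing probabilities sum to 1 and the symmetry is broken. (Whether rows or columns are normalized is a convention; this sketch normalizes rows and is not mlpack's actual code.)

```python
import random

def uniform_transition_matrix(k):
    # Every entry 1/k: all states look identical, so EM can get stuck.
    return [[1.0 / k] * k for _ in range(k)]

def random_transition_matrix(k, seed=0):
    # Sample positive entries, then normalize each row to sum to 1
    # (row-stochastic), breaking the symmetry between states.
    rng = random.Random(seed)
    rows = []
    for _ in range(k):
        row = [rng.random() + 1e-3 for _ in range(k)]  # keep entries > 0
        s = sum(row)
        rows.append([v / s for v in row])
    return rows

U = uniform_transition_matrix(4)
T = random_transition_matrix(4)
```

With the uniform matrix, every state starts with identical parameters, so the forward-backward posteriors are identical across states and Baum-Welch has no gradient to separate them; the random start avoids that degenerate fixed point.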
Yeah, that's a really good suggestion, many thanks.
Great to hear it worked. I will release mlpack 2.1.1 with these fixes then.