Using the Gigaword LM
The Gigaword v5 learned language models are located at
To use the LM, go to the [LM:toy] section (or whatever your LM section is called), comment out everything, and add this line:
lm = /nas/data/english-gigaword-v5/processed/6gram.lm.bin
Don't erase anything in the base [LM] section or it messes up the tuning for
It would be good to test the LM on German-English translation to see whether it was generated correctly. Using the large LM should increase the BLEU score. It doesn't for Finnish, but perhaps that's because the translation model is too poor to get anywhere close to fluent English output in the first place.
If for some reason Moses looks for the LM in its lm/ directory and can't find it, you can just symlink it there from the NAS. Not sure why this happens when you specify a premade LM.
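The symlink workaround could look roughly like the sketch below. The LM path is the one from the config line above; `WORK_DIR` is a hypothetical placeholder for your experiment's working directory, and the assumption is that the lm/ directory Moses complains about lives directly under it. Adjust both to whatever the error message actually names.

```shell
# Path to the premade Gigaword LM on the NAS (taken from the lm = line above).
LM_SRC="/nas/data/english-gigaword-v5/processed/6gram.lm.bin"

# HYPOTHETICAL: your experiment's working directory. Replace with the
# directory whose lm/ subdirectory Moses is actually searching.
WORK_DIR="${WORK_DIR:-./working-dir}"

# Create the lm/ directory if it does not exist yet.
mkdir -p "$WORK_DIR/lm"

# Symlink the LM into lm/ under its original file name.
# -f replaces a stale link of the same name if one is already there.
ln -sf "$LM_SRC" "$WORK_DIR/lm/$(basename "$LM_SRC")"

# Show the link so you can confirm where it points.
ls -l "$WORK_DIR/lm/"
```

A symlink avoids copying the LM off the NAS, which matters for a file this size.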
Installing Morfessor
Run these commands:
git clone https://github.com/aalto-speech/morfessor.git
cd morfessor
python setup.py install --user
Then, add this line to your .bashrc
Make sure to source the .bashrc again.
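A quick sanity check after re-sourcing, as a minimal sketch. It assumes the install put the `morfessor` command-line script somewhere that your updated .bashrc makes visible on the PATH; if your setup differs, the check may report "missing" even though the Python package itself installed fine.

```shell
# Check whether the morfessor command-line script is visible on PATH.
# ASSUMPTION: the .bashrc change is what exposes the script to the shell.
if command -v morfessor >/dev/null 2>&1; then
    MORFESSOR_STATUS="found"
else
    MORFESSOR_STATUS="missing"
fi

echo "morfessor command: $MORFESSOR_STATUS"
```

If it reports "missing", open a fresh shell or source the .bashrc once more before rerunning the install.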
Using Morfessor Models
I have several Morfessor models trained. Let me know if you'd like access to them. One of them is trained on a huge list of Finnish words and their frequencies. The others are trained on Europarl with various parameters.