Code and LaTeX for WMT15

Using the Gigaword LM

The language models trained on English Gigaword v5 are located at:

/nas/data/english-gigaword-v5/processed/5gram.lm.bin
/nas/data/english-gigaword-v5/processed/6gram.lm.bin
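Before pointing Moses at these files, it can be worth checking that they are reachable from the machine you run on. The following is a minimal Python sketch; the helper name `lm_file_kind` is hypothetical. It only distinguishes an ARPA text file (whose first non-blank line is the `\data\` header) from everything else, which it assumes, without verifying, is a binarized LM.

```python
import os

# Paths from this README; adjust if the NAS is mounted elsewhere on your machine.
GIGAWORD_LMS = [
    "/nas/data/english-gigaword-v5/processed/5gram.lm.bin",
    "/nas/data/english-gigaword-v5/processed/6gram.lm.bin",
]


def lm_file_kind(path):
    """Return 'missing', 'arpa', or 'binary' for a language-model file."""
    if not os.path.isfile(path):
        return "missing"
    with open(path, "rb") as f:
        head = f.read(64)
    # ARPA files are plain text starting (after optional blank lines) with \data\.
    if head.lstrip().startswith(b"\\data\\"):
        return "arpa"
    # Anything else is assumed to be a binary LM; the format is not checked.
    return "binary"


if __name__ == "__main__":
    for lm in GIGAWORD_LMS:
        print(lm, lm_file_kind(lm))
```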

To use one of them, comment out everything in the [LM:toy] section (or whichever LM section your config defines) except:

lm = /nas/data/english-gigaword-v5/processed/6gram.lm.bin

Don't erase anything in the base [LM] section; doing so breaks tuning, for reasons we haven't pinned down.

It would be good to test the LMs on German-English translation to check that they were generated correctly: switching to the large LM should increase the BLEU score there. It doesn't for Finnish-English, but that may be because the translation model is too weak to produce anything close to fluent English.

If Moses looks for the LM in its lm/ directory and can't find it, you can symlink it there from the NAS. It's not clear why this happens when you specify a prebuilt LM.
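A minimal Python sketch of that workaround, assuming the link should carry the LM's own file name; the name Moses actually expects may differ, so check the path in its error message. `symlink_lm` is a hypothetical helper.

```python
import os


def symlink_lm(lm_path, lm_dir):
    """Symlink a prebuilt LM into lm_dir under its own file name.

    Creates lm_dir if needed and does nothing if an entry with that name
    already exists (including a dangling link). Returns the link path.
    """
    os.makedirs(lm_dir, exist_ok=True)
    link = os.path.join(lm_dir, os.path.basename(lm_path))
    if not os.path.lexists(link):
        os.symlink(os.path.abspath(lm_path), link)
    return link
```

For example, `symlink_lm("/nas/data/english-gigaword-v5/processed/6gram.lm.bin", "lm")`, run from the directory that contains Moses's lm/ directory.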

Installing Morfessor

Run these commands:

git clone https://github.com/aalto-speech/morfessor.git
cd morfessor
python setup.py install --user

Then add this line to your .bashrc (this assumes you cloned into your home directory; replace NETID with your user ID):

export PYTHONPATH=$PYTHONPATH:/home/NETID/morfessor

Then re-source it (source ~/.bashrc) or open a new shell so the change takes effect.
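To confirm the setup worked, you can ask Python where it would import Morfessor from. This is a small sketch using only the standard library; `module_location` is a hypothetical name. A result of None means the module is not on the import path.

```python
import importlib.util
import os


def module_location(name="morfessor"):
    """Return the file Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("PYTHONPATH =", os.environ.get("PYTHONPATH", "(unset)"))
    print("morfessor:", module_location())
```

Since both the `--user` install and the PYTHONPATH entry can provide the package, the printed path shows which copy is actually used.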

Using Morfessor Models

I have trained several Morfessor models; let me know if you'd like access to them. One is trained on a large list of Finnish words and their frequencies. The others are trained on Europarl with various parameter settings.
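For anyone using these models, here is a minimal sketch of segmenting text with one, based on the Morfessor 2.0 Python interface (`MorfessorIO.read_binary_model_file` and `viterbi_segment`); verify it against the installed version. The `+` boundary marker and the helper names are hypothetical choices, not necessarily the convention used with these models.

```python
def load_model(path):
    """Load a binary model saved by Morfessor 2.0 (import deferred until needed)."""
    import morfessor
    return morfessor.MorfessorIO().read_binary_model_file(path)


def segment_sentence(model, sentence, marker="+"):
    """Split each whitespace-separated word into morphs using the model.

    viterbi_segment returns (morphs, cost); non-final morphs get `marker`
    appended so the segmentation can be undone after translation.
    """
    out = []
    for word in sentence.split():
        morphs, _cost = model.viterbi_segment(word)
        out.extend(m + marker for m in morphs[:-1])
        out.append(morphs[-1])
    return " ".join(out)
```

Calling `.replace("+ ", "")` on the output undoes the segmentation, assuming the sequence `+ ` never occurs in the input itself.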