Recaser trainer updated to support IRSTLM as well. #3
As per discussions on the mailing list, everyone who wants to use IRSTLM instead of SRILM for training the recaser has been modifying train-recaser.perl by hand. Rather than keep doing this (and probably keep seeing messages about it on the list), I thought it would be worth updating the script.
Note that the script still uses SRILM by default, so existing scripts that call the current version of train-recaser.perl will not break.
To use IRSTLM instead of SRILM, adding "-lm irstlm" on the command line is enough.
If build-lm.sh is not in $PATH, there is also a new option, -build-lm, which lets you specify the path of the script to use (with build-lm.sh command line syntax).
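As a rough illustration of the two options above, here is a dry-run sketch that only assembles and prints the command lines rather than running anything. The helper name `recaser_cmd` and every path are hypothetical placeholders, and the `--dir`, `--corpus` and `--train-script` arguments are assumed to be the ones you already pass to train-recaser.perl; only `-lm` and `-build-lm` come from this change.

```shell
#!/bin/sh
# Dry-run sketch: prints the train-recaser.perl command instead of executing it.
# All paths are hypothetical placeholders; adjust them to your installation.
MOSES_SCRIPTS="/path/to/moses/scripts"
CORPUS="/path/to/cased-corpus.txt"
OUT_DIR="/path/to/recaser-model"

# recaser_cmd is a hypothetical helper, not part of Moses.
# $1: value for -lm (e.g. irstlm)
# $2: optional path to build-lm.sh, passed through -build-lm
recaser_cmd() {
    cmd="$MOSES_SCRIPTS/recaser/train-recaser.perl"
    cmd="$cmd --dir $OUT_DIR --corpus $CORPUS"
    cmd="$cmd --train-script $MOSES_SCRIPTS/training/train-model.perl"
    cmd="$cmd -lm $1"
    # Only add -build-lm when a path was given (build-lm.sh not in $PATH).
    if [ -n "${2:-}" ]; then
        cmd="$cmd -build-lm $2"
    fi
    printf '%s\n' "$cmd"
}

# build-lm.sh found through $PATH:
recaser_cmd irstlm

# build-lm.sh in a custom location:
recaser_cmd irstlm /path/to/irstlm/bin/build-lm.sh
```

Leaving out `-lm` entirely keeps the SRILM default, so existing invocations need no change.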
If anyone wants to add other language models (KenLM, for instance, would be great, since it is after all the default in Moses!), that will be easy using the -lm option.
Thanks.