This aligner from Berkeley is the current state of the art in terms of Alignment Error Rate (AER).
However, it is not much better than GIZA++ in terms of actual machine translation performance
(which is what word alignments are most commonly used for).
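
For anyone not familiar with the metric, here is a minimal sketch of how Alignment Error Rate is usually computed, assuming alignments are represented as sets of (source_index, target_index) pairs and that the gold standard distinguishes "sure" from "possible" links. The function and parameter names are just illustrative, not taken from the Berkeley aligner or GIZA++.

```python
def alignment_error_rate(hypothesis, sure, possible=None):
    """Compute AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    hypothesis: links proposed by the aligner (A)
    sure:       gold links annotated as sure (S)
    possible:   gold links annotated as possible (P); S is treated as a subset of P
    """
    hypothesis = set(hypothesis)
    sure = set(sure)
    # Sure links always count as possible links too.
    possible = sure if possible is None else set(possible) | sure
    if not hypothesis and not sure:
        return 0.0  # nothing to align and nothing proposed (convention assumed here)
    matched = len(hypothesis & sure) + len(hypothesis & possible)
    return 1.0 - matched / (len(hypothesis) + len(sure))


# Toy example: (source_index, target_index) pairs
gold_sure = {(0, 0), (1, 1)}
gold_possible = {(2, 1)}
proposed = {(0, 0), (2, 1), (2, 2)}
aer = alignment_error_rate(proposed, gold_sure, gold_possible)
```

Lower is better, and an alignment identical to the sure links scores 0. This also shows why AER and translation quality can diverge: the metric only counts link overlap with a hand-annotated gold standard.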
However, from personal experience, I can say that nobody would want to take on the nightmare
of porting GIZA++. So this word aligner, which works pretty well, is a better candidate for
implementation.
Java implementation and links to papers: http://nlp.cs.berkeley.edu/Main.html#WordAligner
License: GPL [so might have the same issues as we have been discussing for the Stanford Parser]
Unfortunately, the papers aren't very useful for understanding the algorithms unless you already
have a pretty good background in word alignment research, including the inversion transduction
grammar [ITG] formalism proposed by Dekai Wu back in the late 1990s. Perhaps, faster to build
It might also be a good idea to use the same Aligner interface that we have been discussing.
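
As a rough illustration only, a shared interface along these lines might look like the sketch below. Everything here is hypothetical: the `Aligner` base class, the `align` method, and its signature are assumptions for discussion, not the interface actually agreed on. `DiagonalAligner` is a toy baseline included just so the sketch runs; it is not the Berkeley algorithm.

```python
from abc import ABC, abstractmethod


class Aligner(ABC):
    """Hypothetical common interface for word aligners."""

    @abstractmethod
    def align(self, source, target):
        """Return a set of (source_index, target_index) links
        for one tokenized sentence pair."""


class DiagonalAligner(Aligner):
    """Toy baseline: link each source token to the target token
    at the proportionally scaled position (rounded down)."""

    def align(self, source, target):
        if not source or not target:
            return set()
        return {
            (i, min(len(target) - 1, i * len(target) // len(source)))
            for i in range(len(source))
        }


links = DiagonalAligner().align(["das", "Haus"], ["the", "house"])
```

A wrapper around an external aligner could then subclass the same base class, so calling code would not need to care which implementation sits behind it.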
Migrated from http://code.google.com/p/nltk/issues/detail?id=517
johnstcn said, at 2010-07-22T11:08:29.000Z:
Perhaps instead of re-implementing the whole thing from scratch, something like
Bumping this issue up: align/translate has gone through quite a few changes. Wrapping modern-day word aligners would really help improve NLTK's usefulness for machine translation.