Berkeley Word Aligner [Unsupervised as well as Supervised] #102

Open
alexrudnick opened this Issue Jan 17, 2012 · 1 comment


@alexrudnick
Natural Language Toolkit member

The Berkeley Word Aligner is the current state of the art in terms of Alignment Error Rate (AER).
However, it is not much better than GIZA++ in terms of actual machine translation performance
(which is what word alignments are most commonly used for).
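For anyone unfamiliar with the metric, here is a minimal sketch of how AER is usually computed, assuming the standard definition with "sure" (S) and "possible" (P) gold links, where S is a subset of P. The function name and the representation of an alignment as a set of `(source_index, target_index)` pairs are my own choices for illustration, not anything taken from the Berkeley code.

```python
def alignment_error_rate(hypothesis, sure, possible=None):
    """Compute AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    hypothesis: iterable of (source_index, target_index) links proposed by the aligner (A)
    sure:       iterable of gold links annotated as sure (S)
    possible:   iterable of gold links annotated as possible (P); S is always
                added to P, so passing None means P == S
    """
    hypothesis = set(hypothesis)
    sure = set(sure)
    possible = sure | set(possible or ())

    denominator = len(hypothesis) + len(sure)
    if denominator == 0:
        # Nothing proposed and nothing expected: treat as zero error.
        return 0.0

    matched = len(hypothesis & sure) + len(hypothesis & possible)
    return 1.0 - matched / denominator


if __name__ == "__main__":
    gold_sure = {(0, 0), (1, 1)}
    gold_possible = {(2, 2)}
    proposed = {(0, 0), (1, 2)}
    print(alignment_error_rate(proposed, gold_sure, gold_possible))
```

Lower is better: an aligner that proposes every sure link and nothing outside the possible set scores 0.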

From personal experience, though, I would not wish the nightmare of porting GIZA++ on anyone.
So this word aligner, which works pretty well, is a better candidate for implementation.

Java implementation and links to papers: http://nlp.cs.berkeley.edu/Main.html#WordAligner
License: GPL [so it might raise the same issues we have been discussing for the Stanford Parser]

Unfortunately, the papers aren't very useful for understanding the algorithms unless you already
have a pretty good background in word alignment research, including the inversion transduction
grammar [ITG] formalism proposed by Dekai Wu back in the late 1990s. It would perhaps be faster
to build a wrapper.

It might also be a good idea to use the same Aligner interface that we have been discussing.
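To make the wrapper idea a bit more concrete, here is a rough sketch of what a shared interface plus a thin wrapper could look like. Everything here is hypothetical: the `Aligner` base class, the `PrecomputedAligner` wrapper, and `parse_pharaoh` are names invented for illustration, not the interface under discussion. It also assumes the external tool has already been run and has written one line per sentence pair in the common `i-j` ("Pharaoh") link format; how the Java aligner is actually invoked is deliberately left out.

```python
from abc import ABC, abstractmethod


class Aligner(ABC):
    """Hypothetical common interface for word aligners."""

    @abstractmethod
    def align(self, source_tokens, target_tokens):
        """Return a set of (source_index, target_index) links."""


def parse_pharaoh(line):
    """Parse a line such as '0-0 1-2 2-1' into a set of index pairs."""
    links = set()
    for item in line.split():
        source_index, target_index = item.split("-")
        links.add((int(source_index), int(target_index)))
    return links


class PrecomputedAligner(Aligner):
    """Wrapper that serves alignments an external tool already produced.

    alignment_lines: iterable of 'i-j' lines, one per sentence pair, in the
    same order in which align() will be called (an assumption of this sketch).
    """

    def __init__(self, alignment_lines):
        self._lines = iter(alignment_lines)

    def align(self, source_tokens, target_tokens):
        links = parse_pharaoh(next(self._lines))
        # Reject links that point outside either sentence.
        for source_index, target_index in links:
            if source_index >= len(source_tokens) or target_index >= len(target_tokens):
                raise ValueError("alignment link out of range: %d-%d"
                                 % (source_index, target_index))
        return links


if __name__ == "__main__":
    aligner = PrecomputedAligner(["0-0 1-2 2-1"])
    print(sorted(aligner.align(["das", "kleine", "Haus"], ["the", "house", "small"])))
```

The point of keeping the interface this small is that a pure-Python aligner and a wrapper around an external tool would be interchangeable from the caller's side.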

Migrated from http://code.google.com/p/nltk/issues/detail?id=517


earlier comments

johnstcn said, at 2010-07-22T11:08:29.000Z:

Perhaps instead of re-implementing the whole thing from scratch, something like

@alvations

Bumping this issue up: align/translate has gone through quite a few changes. Wrapping modern-day word aligners would really help improve NLTK's usefulness for machine translation.

@alvations alvations added the SMT label Nov 30, 2015