Alex Rudnick edited this page Oct 29, 2013 · 3 revisions

There's been some interesting related work in using crowdsourcing-like techniques to build corpora.

Here's what we know about so far:

MonoTrans

http://www.cs.umd.edu/hcil/monotrans/

"... an iterative protocol in which monolingual human participants work together to improve imperfect machine translations."

Vamshi Ambati's work

http://www.cs.cmu.edu/~vamshi/Vamshi/Publications.html

"Active Learning for Machine Translation in Scarce Data Scenarios" -- uses active learning with Mechanical Turk to bootstrap an English-Spanish (en-es) corpus for training statistical machine translation (SMT).

Tradubi

http://wiki.apertium.org/wiki/Tradubi

(The project seems to have died, though.)

Traduwiki

http://traduwiki.org

This looks awesome. If we could just bring up an instance of this, that would be close to the right thing.

Tatoeba

http://tatoeba.org/eng/home

Really nice site for collecting example sentences translated into a variety of languages.

OPUS

http://opus.lingfil.uu.se/

"The open parallel corpus."

WikiBhasha

From Microsoft Research (MSR). Seems really relevant: the goal is to get content from the English Wikipedia into smaller Wikipedias. It only supports English as the source language, for some reason. It relates to Bing Translator somehow. The project seems dormant.