RelatedWork

There's been some interesting related work on using crowdsourcing-like techniques to build corpora.

Here's what we know about so far:

MonoTrans

http://www.cs.umd.edu/hcil/monotrans/

"... an iterative protocol in which monolingual human participants work together to improve imperfect machine translations."

Vamshi Ambati's work

http://www.cs.cmu.edu/~vamshi/Vamshi/Publications.html

"Active Learning for Machine Translation in Scarce Data Scenarios" -- using active learning with Mechanical Turk to bootstrap an en-es parallel corpus for training statistical machine translation (SMT).

Tradubi

http://wiki.apertium.org/wiki/Tradubi

(project seems to have died, though)

Traduwiki

http://traduwiki.org

This looks awesome. If we could just bring up our own instance of it, that would be close to what we need.

Tatoeba

http://tatoeba.org/eng/home

Really nice site for collecting example sentences translated into a variety of languages.

OPUS

http://opus.lingfil.uu.se/

"... the open parallel corpus."

WikiBhasha

From Microsoft Research (MSR). Seems really relevant: it's about getting content from the English Wikipedia into smaller Wikipedias, though for some reason only with English as the source. It relates to Bing Translator somehow. The project seems dormant.
