Alex Rudnick edited this page Oct 1, 2013 · 3 revisions

Notes from Alex's discussion with Mike about the system and the LREC paper we want to write about it.

these things have to be pluggable for different language pairs

  • input verification: is the user putting in obviously wrong text?
    • character set identification
    • unicode checks
    • langid -- maybe with langid.py by default?
  • sentence segmenters for your language: this should be pluggable.
    • check out OmegaT's segmenters and how they work.
  • Also text normalization routines. Let's write about GuaraniTextProcessing in a separate page...
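
One way the pluggable pieces above could fit together, as a rough sketch only. Every name here (`LanguagePairPlugins`, `unicode_problems`, `naive_segmenter`, `check_input`) is made up for illustration and nothing like it exists in the codebase yet. The segmenter is a deliberately naive placeholder, and NFC normalization is just an assumed default.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Callable, List, Optional


def unicode_problems(text: str) -> List[str]:
    """Return descriptions of obvious Unicode problems (empty list if none)."""
    problems = []
    if "\ufffd" in text:
        # U+FFFD usually means the text was decoded with the wrong charset.
        problems.append("contains U+FFFD replacement characters")
    for ch in text:
        # Category Cc = control characters; ordinary whitespace is allowed.
        if unicodedata.category(ch) == "Cc" and ch not in "\n\r\t":
            problems.append(f"contains control character U+{ord(ch):04X}")
            break
    return problems


def naive_segmenter(text: str) -> List[str]:
    """Placeholder segmenter: split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


@dataclass
class LanguagePairPlugins:
    """Hypothetical bundle of per-language-pair components."""
    source_lang: str
    target_lang: str
    segment: Callable[[str], List[str]] = naive_segmenter
    normalize: Callable[[str], str] = lambda s: unicodedata.normalize("NFC", s)
    # Optional language identifier: takes text, returns a language code.
    identify_language: Optional[Callable[[str], str]] = None


def check_input(text: str, plugins: LanguagePairPlugins) -> List[str]:
    """Run the input checks; an empty list means nothing looked wrong."""
    problems = unicode_problems(text)
    if plugins.identify_language is not None:
        guessed = plugins.identify_language(text)
        if guessed != plugins.source_lang:
            problems.append(f"expected {plugins.source_lang}, looks like {guessed}")
    return problems
```

If langid.py ends up as the default, its `langid.classify(text)` returns a (language, score) pair, so something like `identify_language=lambda t: langid.classify(t)[0]` could fill that slot. A Guarani setup would presumably swap in its own `normalize` and `segment`.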

how does the comment system work?

how does logging in work?

  • can we just use OAuth or something like it? How does logging in work for, e.g., AskBot?

accounts: what are they for?

  • Will they prevent spam?
    • devil's advocate: could we just include a captcha?
  • what do we want to keep track of, for each user?

thinking about what kind of content we want to allow people to post and privacy concerns

  • Can you anonymously upload documents?
  • Can you anonymously provide translations?
  • Other sites (installs of Guampa) might want to do political things. There's that site for leftist translators already -- what is it? Does Mike have the link?

documents should have some tags so we/users know what they're for

  • tag: who's the intended audience?
  • tag: what is the genre of this document?
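
A minimal sketch of how such tags might be stored and queried, assuming each document carries one value per tag kind. The `Document` class, the `find_by_tag` helper, and the example tag values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Hypothetical document record; only the fields needed for tagging."""
    title: str
    source_lang: str
    # Maps a tag kind to its value, e.g. {"audience": "students", "genre": "news"}.
    tags: Dict[str, str] = field(default_factory=dict)


def find_by_tag(docs: List[Document], kind: str, value: str) -> List[Document]:
    """Return the documents whose tag of the given kind has the given value."""
    return [d for d in docs if d.tags.get(kind) == value]
```

For example, `find_by_tag(docs, "genre", "news")` would pick out the news articles. Whether a document can have several audiences or genres is still open; that would mean a list of values per kind instead.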

interface i18n

  • maybe the interface should be in the document target language
  • this should be configurable for sure. Is there built-in support for l10n in Angular?
  • For me and Mike, for example, it would be easy to translate Spanish articles into English, and easiest for us if the interface is in English.
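
The configurable behavior could come down to a simple fallback order: the user's own setting first, then the document's target language, then a site default. This is only a sketch of that idea; the function name, parameters, and the "en" default are assumptions, and it says nothing about how Angular itself would load the translated strings.

```python
from typing import Optional, Set


def choose_interface_language(
    user_preference: Optional[str],
    document_target_lang: Optional[str],
    available: Set[str],
    default: str = "en",
) -> str:
    """Pick the first candidate that has an interface translation available."""
    for candidate in (user_preference, document_target_lang, default):
        if candidate and candidate in available:
            return candidate
    # Nothing matched: fall back to the site default regardless.
    return default
```

So two users translating Spanish articles into English with a preference of "en" would get an English interface, while a user with no preference working on a Guarani target would get Guarani if that translation of the interface exists.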

Should the owner of a document be able to lock a sentence?

  • How do we prevent (or at least channel) edit wars?

what's novel, that we can write about in the paper?

  • Well, one thing is that Guampa is FOSS, and there doesn't seem to be a good FOSS tool for this.
  • unlike tatoeba:
    • we are for TM, MT, and CAT (in the long run). But we're explicitly about MT.
    • We are for translating documents.
  • unlike traduwiki:
    • we are for MT and have proper sentence segmentation
    • traduwiki is for translating a document too.
  • deep question: what's the benefit for volunteer translators?
    • Well, we do have activists and students...
  • how is this different from Pootle?
    • Pootle's interface is kind of nonobvious, and it seems to be meant for UI strings rather than documents. It's also not meant for reading.
    • but it does have terminology come up...

think about how to demo the thing

  • canned data?
  • maybe we should use it ourselves and translate some Wikipedia documents, maybe es->en.