Code for "Lost in Interpretation: Predicting Untranslated Terminology in Simultaneous Interpretation" at NAACL 2019
Untranslated Term Annotations
Untranslated term annotations for the NAIST Simultaneous Translation Corpus will be provided upon request (via email) after confirmation that you have access to the corpus, available at https://ahcweb01.naist.jp/resource/stc/.
The feature code also requires the EIJIRO English-Japanese bilingual dictionary, which must be purchased separately: http://www.eijiro.jp/get-144.htm. Alternatively, you can construct your own English-Japanese bilingual dictionary using NLP tools and use it in place of EIJIRO.
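If you build your own dictionary instead of using EIJIRO, a lookup table along the following lines may be enough as a starting point. This is only a sketch: it assumes a hypothetical tab-separated format with one "English<TAB>Japanese" pair per line, which is not the EIJIRO format and not necessarily what the feature code in this repository expects. The function names `load_bilingual_dict` and `translations` are illustrative and do not come from this repository.

```python
from collections import defaultdict


def load_bilingual_dict(lines):
    """Build an English -> [Japanese, ...] table from 'english<TAB>japanese' lines.

    ASSUMPTION: the tab-separated layout is hypothetical, chosen for this sketch.
    """
    entries = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and lines without a tab separator.
        if not line or "\t" not in line:
            continue
        english, japanese = line.split("\t", 1)
        # Lowercase the English side so lookups are case-insensitive.
        entries[english.lower()].append(japanese)
    return dict(entries)


def translations(entries, term):
    """Return all known Japanese translations for an English term (may be empty)."""
    return entries.get(term.lower(), [])


if __name__ == "__main__":
    sample = ["Interpreter\t通訳者\n", "interpreter\t通訳\n"]
    table = load_bilingual_dict(sample)
    print(translations(table, "interpreter"))
```

For a real dictionary, pass an open file object (for example `open(path, encoding="utf-8")`) instead of the sample list.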
N-gram word frequencies are obtained from the Google Web 1T 5-gram corpus (LDC2006T13): https://catalog.ldc.upenn.edu/LDC2006T13.
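As a rough illustration of how such counts can be turned into a frequency feature, the sketch below reads n-gram counts and computes a smoothed log frequency. It assumes each input line has the form "ngram<TAB>count" with the tokens of the n-gram separated by spaces; check this against your copy of the corpus. The helper names `load_ngram_counts` and `log_frequency`, and the add-one smoothing, are assumptions made for this example and are not taken from this repository's feature code.

```python
import math


def load_ngram_counts(lines):
    """Parse 'ngram<TAB>count' lines into a {ngram: count} dict.

    ASSUMPTION: one n-gram per line, count after the last tab.
    """
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        # Split on the last tab so the n-gram text itself is kept whole.
        ngram, _, count = line.rpartition("\t")
        if not ngram:
            continue  # blank or malformed line
        counts[ngram] = int(count)
    return counts


def log_frequency(counts, term):
    """Add-one smoothed log count; unseen terms map to 0.0 (illustrative choice)."""
    return math.log(counts.get(term, 0) + 1)


if __name__ == "__main__":
    sample = ["interpretation\t100\n", "simultaneous interpretation\t7\n"]
    counts = load_ngram_counts(sample)
    print(log_frequency(counts, "interpretation"))
    print(log_frequency(counts, "unseen term"))
```

The full corpus is large, so in practice you would likely restrict loading to the terms that occur in your data instead of holding every n-gram in memory.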