
learning-to-translate-real-time.md

The authors propose a framework in which a reinforcement learning agent decides, at each step, whether to read the next input word or produce the next output word, trading off translation quality against the time delay caused by read operations. The reward function combines quality (BLEU score) with delay (various metrics, controlled by hyperparameters). The model is initialized from a pre-trained translation model and optimized with policy gradient. The authors apply the approach to WMT'15 EN-DE and EN-RU translation and show that the model increases translation quality in all settings and trades off effectively between quality and delay.
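
The read/write decision loop and the quality-versus-delay reward can be illustrated with a small sketch. This is not the authors' implementation: the names `rollout`, `average_proportion`, and `reward`, the fixed target length, the linear penalty, and the `delay_weight` value are all assumptions made for illustration, and the delay measure shown (the mean fraction of the source read before each output word) is only one plausible choice among the metrics the summary alludes to. The quality score is passed in as a plain number standing in for BLEU.

```python
from typing import Callable, List

READ, WRITE = "READ", "WRITE"

# Hypothetical policy signature: (words_read, words_written) -> action.
Policy = Callable[[int, int], str]


def rollout(source_len: int, target_len: int, policy: Policy) -> List[str]:
    """Run a policy until target_len words are written.

    Assumption: the target length is fixed in advance; a real model
    would instead stop when it emits an end-of-sentence token.
    """
    actions: List[str] = []
    read = written = 0
    while written < target_len:
        action = policy(read, written)
        if action == READ and read == source_len:
            action = WRITE  # nothing left to read, so force a write
        if action == READ:
            read += 1
        else:
            written += 1
        actions.append(action)
    return actions


def average_proportion(actions: List[str]) -> float:
    """Mean fraction of the source consumed before each output word."""
    reads = 0
    consumed_at_write: List[int] = []
    for action in actions:
        if action == READ:
            reads += 1
        else:
            consumed_at_write.append(reads)
    if reads == 0 or not consumed_at_write:
        return 0.0
    return sum(consumed_at_write) / (reads * len(consumed_at_write))


def reward(quality: float, actions: List[str], delay_weight: float = 0.5) -> float:
    """Assumed reward shape: quality minus a weighted delay penalty."""
    return quality - delay_weight * average_proportion(actions)


# Two illustrative policies: read everything first, or alternate.
def wait_for_all(read: int, written: int) -> str:
    return READ


def alternate(read: int, written: int) -> str:
    return READ if read < written + 1 else WRITE


full_context = rollout(3, 3, wait_for_all)
interleaved = rollout(3, 3, alternate)
```

In this sketch, reading the whole source before writing gives the maximum delay of 1.0, while alternating reads and writes lowers it. A policy gradient method would adjust the policy so that rollouts with higher reward become more likely; `delay_weight` plays the role of the hyperparameter that sets how strongly delay is penalized relative to quality.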