Main Authors / Organization
Kaitao Song, Xu Tan, Furong Peng, Jianfeng Lu
Nanjing University of Science and Technology
Microsoft Research
Institute of Big Data Science and Industry, Shanxi University
PDF link
https://arxiv.org/pdf/1811.00253.pdf
Hypothesis
Adding relative positional information to self-attention improves the Transformer.
Approach
Apply a directional mask and a local mask within the self-attention mechanism, then fuse the resulting representations (see the sketch below).
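A minimal NumPy sketch of the masking idea, assuming single-head scaled dot-product attention and concatenation-plus-projection as the fusion step; the function names (`directional_mask`, `local_mask`, `masked_self_attention`), the window size, and the fusion choice are illustrative assumptions, not the paper's implementation.

```python
# Sketch of directionally and locally masked self-attention (not the authors' code).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def directional_mask(n, forward=True):
    # Forward: position i attends only to j <= i (lower triangle);
    # backward is the transpose (upper triangle).
    return np.tril(np.ones((n, n))) if forward else np.triu(np.ones((n, n)))

def local_mask(n, window=2):
    # Position i attends only to positions within `window` steps of i.
    idx = np.arange(n)
    return (np.abs(idx[:, None] - idx[None, :]) <= window).astype(float)

def masked_self_attention(q, k, v, mask):
    # Scaled dot-product attention; disallowed positions get very negative logits.
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    logits = np.where(mask > 0, logits, -1e9)
    return softmax(logits) @ v

# Toy usage: run the three masked attentions on the same input, then fuse
# by concatenation and a linear projection (one plausible fusion strategy).
n, d = 5, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))
outs = [masked_self_attention(x, x, x, m)
        for m in (directional_mask(n, forward=True),
                  directional_mask(n, forward=False),
                  local_mask(n, window=2))]
fused = np.concatenate(outs, axis=-1) @ rng.normal(size=(3 * d, d))
print(fused.shape)  # (5, 8)
```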
Main Experimental Result
WMT14 De-En translation task: +0.4 BLEU
WMT17 Ch-En translation task: +1.0 BLEU