Main Authors / Organization
Kaitao Song, Xu Tan, Furong Peng, Jianfeng Lu
Nanjing University of Science and Technology
Microsoft Research
Institute of Big Data Science and Industry, Shanxi University
PDF link
https://arxiv.org/pdf/1811.00253.pdf
Hypothesis
Adding relative positional information to self-attention improves the Transformer.
Approach
Apply a directional mask and a local mask within the self-attention mechanism, then fuse the resulting representations (see the sketch below).
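A minimal NumPy sketch of the masking idea, assuming single-head scaled dot-product attention and concatenation-plus-projection as the fusion step; the function names (`directional_mask`, `local_mask`, `masked_self_attention`), the window size, and the fusion choice are illustrative assumptions, not the paper's implementation.

```python
# Sketch of directionally and locally masked self-attention (not the authors' code).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def directional_mask(n, forward=True):
    # Forward: position i attends only to j <= i (lower triangle);
    # backward is the transpose (upper triangle).
    return np.tril(np.ones((n, n))) if forward else np.triu(np.ones((n, n)))

def local_mask(n, window=2):
    # Position i attends only to positions within `window` steps of i.
    idx = np.arange(n)
    return (np.abs(idx[:, None] - idx[None, :]) <= window).astype(float)

def masked_self_attention(q, k, v, mask):
    # Scaled dot-product attention; disallowed positions get very negative logits.
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    logits = np.where(mask > 0, logits, -1e9)
    return softmax(logits) @ v

# Toy usage: run the three masked attentions on the same input, then fuse
# by concatenation and a linear projection (one plausible fusion strategy).
n, d = 5, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))
outs = [masked_self_attention(x, x, x, m)
        for m in (directional_mask(n, forward=True),
                  directional_mask(n, forward=False),
                  local_mask(n, window=2))]
fused = np.concatenate(outs, axis=-1) @ rng.normal(size=(3 * d, d))
print(fused.shape)  # (5, 8)
```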
Main Experimental Result
WMT14 De-En translation task: +0.4 BLEU
WMT17 Ch-En translation task: +1.0 BLEU