"Attention is All You Need" ,(Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017). This implementation is done in keras with tensorflow.
The paper presents a novel sequence-to-sequence framework that uses self-attention and feed-forward networks instead of a recurrent network structure, and achieves state-of-the-art performance on the WMT 2014 English-to-German translation task. (2017/06/12)
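The central operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. As a minimal illustration of that formula (a NumPy sketch, not this repository's Keras code):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)        # (batch, len_q, len_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over the keys
    return weights @ v                                        # (batch, len_q, d_v)

# Toy example with the default dimensions d_k = d_v = 64.
q = np.random.randn(2, 5, 64)
k = np.random.randn(2, 7, 64)
v = np.random.randn(2, 7, 64)
print(scaled_dot_product_attention(q, k, v).shape)            # (2, 5, 64)
```

The implementation builds the model from the following layers: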
- Multi-Head Self-Attention Layer
- Position-wise Feed-Forward Layer
- Encoder-Decoder Attention Layer
- Self-Attention Layer
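These sub-layers are combined into encoder and decoder blocks with residual connections and layer normalization. Below is a minimal sketch of one encoder block using the built-in layers of tf.keras (TensorFlow 2.x); the layer classes in this repository are custom implementations, so names and details may differ:

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(x, d_model=512, d_inner_hid=1024, n_head=8, d_k=64, dropout=0.1):
    # Multi-head self-attention sub-layer with residual connection + layer norm.
    attn_out = layers.MultiHeadAttention(num_heads=n_head, key_dim=d_k,
                                         dropout=dropout)(x, x)
    x = layers.LayerNormalization(epsilon=1e-6)(x + attn_out)
    # Position-wise feed-forward sub-layer with residual connection + layer norm.
    ffn = layers.Dense(d_inner_hid, activation="relu")(x)
    ffn = layers.Dense(d_model)(ffn)
    ffn = layers.Dropout(dropout)(ffn)
    return layers.LayerNormalization(epsilon=1e-6)(x + ffn)

inp = tf.keras.Input(shape=(None, 512))   # (batch, seq_len, d_model)
out = encoder_block(inp)
model = tf.keras.Model(inp, out)
model.summary()
```

The decoder block follows the same pattern, with an additional encoder-decoder attention sub-layer and masking in its self-attention.

The default hyperparameters match the base model from the paper: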
- batch_size=64
- d_inner_hid=1024
- d_k=64
- d_v=64
- d_model=512
- d_word_vec=512
- dropout=0.1
- embs_share_weight=False
- n_head=8
- n_layers=6
- n_warmup_steps=4000
- proj_share_weight=True
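Here, n_warmup_steps controls the learning-rate schedule from the paper, lr = d_model^-0.5 * min(step^-0.5, step * n_warmup_steps^-1.5): the rate grows linearly for the first 4000 steps and then decays with the inverse square root of the step number. A small sketch (not this repository's optimizer code):

```python
def noam_lr(step, d_model=512, n_warmup_steps=4000):
    # lr = d_model^-0.5 * min(step^-0.5, step * n_warmup_steps^-1.5)
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * n_warmup_steps ** -1.5)

# Linear warmup, then inverse-square-root decay; peaks around 7e-4 at step 4000.
print([round(noam_lr(s), 6) for s in (1, 1000, 4000, 40000)])
```

Requirements: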
- Python 3
- NumPy
- TensorFlow
- Keras