Explicit-Sparse-Transformer

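In Explicit Sparse Transformer, we propose an algorithm that sparsifies the attention weights in the Transformer according to their activations: for each query, only the top-k attention scores are kept and the rest are masked out before the softmax.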
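The top-k selection can be sketched in NumPy as below. This is a minimal illustration under our own assumptions (single head, no batching, masking applied to the scaled dot-product scores before the softmax, ties at the threshold kept). It is not the tensor2tensor or fairseq code in this repository, and the function name `topk_sparse_attention` is hypothetical.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Scaled dot-product attention keeping only the top_k scores per query.

    q: (n_q, d), k: (n_k, d), v: (n_k, d_v).
    Hypothetical sketch: scores below the top_k-th largest score in each row
    are set to -inf before the softmax, so they receive exactly zero weight.
    Scores tied with the threshold are kept.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                     # (n_q, n_k)
    top_k = min(top_k, scores.shape[-1])
    # k-th largest score per row: sort descending, take column top_k - 1
    kth = np.sort(scores, axis=-1)[:, ::-1][:, top_k - 1:top_k]  # (n_q, 1)
    masked = np.where(scores >= kth, scores, -np.inf)
    # numerically stable softmax; exp(-inf) = 0 for the dropped entries
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 3))
out, w = topk_sparse_attention(q, k, v, top_k=2)
```

With `top_k` equal to the number of keys, the function reduces to ordinary softmax attention; smaller values leave each query attending to only its `top_k` highest-scoring keys.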

2020/1/4: We uploaded code for Explicit Sparse Transformer in tensor2tensor and fairseq; see t2t_envi_est.sh and fairseq_deen_est.sh for details.

2021/1/14: We fixed an import error related to SparseActivatedMultiheadAttention.

2021/5/9: In the preprint, we showed that top-k attention is additive with the block-sparse method Transformer-XL, which has a static local attention span. Here we find that top-k attention is also additive with an adaptive local sparse attention method, "Adaptive Attention Span in Transformers" (https://arxiv.org/abs/1905.07799?context=cs.LG): the top-k method further reduces the length of the learned attention span and thus improves attention efficiency. See the 'adaptive-span' directory for the detailed implementation.

[Figure: illustration drawn from training logs]
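How the two mechanisms can be combined for a single query is sketched below. This is a hypothetical illustration, not the code in 'adaptive-span': it assumes the soft span mask m_z(x) = clip((R + z - x) / R, 0, 1) from the Adaptive Attention Span paper, applies the top-k selection to the logits first, then multiplies by the span mask and renormalises. The function names, this order of operations, and the parameters `z` (span) and `ramp` (R) are our assumptions.

```python
import numpy as np

def span_mask(distance, z, ramp):
    """Soft mask m_z(x) = clip((ramp + z - x) / ramp, 0, 1) (Adaptive Span)."""
    return np.clip((ramp + z - distance) / ramp, 0.0, 1.0)

def topk_adaptive_span_attention(scores, distance, z, ramp, top_k):
    """Hypothetical sketch: top-k selection combined with a soft span mask.

    scores:   (n_k,) attention logits of one query
    distance: (n_k,) distance between the query and each key
    Assumed order: keep the top_k logits, then weight by the span mask,
    then renormalise so the surviving weights sum to 1.
    """
    top_k = min(top_k, scores.shape[-1])
    kth = np.sort(scores)[::-1][top_k - 1]            # top_k-th largest logit
    # unnormalised softmax numerators, zeroed outside the top-k set
    exp_s = np.where(scores >= kth, np.exp(scores - scores.max()), 0.0)
    weighted = span_mask(distance, z, ramp) * exp_s
    total = weighted.sum()
    # if every top-k key lies outside the span, no key is attended to
    return weighted / total if total > 0 else weighted

rng = np.random.default_rng(1)
scores = rng.normal(size=8)
distance = np.arange(8, dtype=float)
w = topk_adaptive_span_attention(scores, distance, z=3.0, ramp=2.0, top_k=4)
```

A key dropped by either mask gets zero weight: with `z = 3` and `ramp = 2`, no key at distance 5 or more is attended to regardless of its score, and at most `top_k` keys inside the span survive.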

Repository files:

adaptive-span
fairseq_deen_ende
tensor2tensor_envi
Extremely Sparse Transformer.pdf
README.md
Top_k_Selective_Attention__NeuroComputing_.pdf
fairseq_deen_est.sh
t2t-avg-all
t2t-bleu
t2t-datagen
t2t-decoder
t2t-trainer
t2t-translate-all
t2t_envi_est.sh

Repository: lancopku/Explicit-Sparse-Transformer