This is the official GitHub repository for the paper "The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles" (arXiv:2306.01705). The paper proposes Stochastically Subsampled Self-Attention (SSA), an algorithm that reduces the computational and memory requirements of transformers while also serving as a regularization method.
- Our paper has been accepted at KDD '23. The preprint is available on arXiv. The code and implementation for the SSA algorithm will be released soon in this repository. Stay tuned for updates!
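Until the official code is released, here is a minimal, illustrative sketch of the core idea behind stochastic subsampling of attention: randomly keep a fraction of key/value positions and attend only to them, shrinking the attention matrix. The function name `ssa_attention`, the `keep_prob` parameter, and the single-head NumPy setup are our own assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ssa_attention(Q, K, V, keep_prob=0.5, rng=None):
    """Illustrative sketch of subsampled self-attention (single head).

    Randomly keeps a fraction `keep_prob` of the key/value positions
    and attends only to them, so the score matrix is (T, n_keep)
    instead of (T, T). Not the official SSA implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, d = K.shape
    n_keep = max(1, int(round(keep_prob * T)))
    idx = rng.choice(T, size=n_keep, replace=False)  # sampled positions
    scores = Q @ K[idx].T / np.sqrt(d)               # (T, n_keep)
    return softmax(scores) @ V[idx]
```

With `keep_prob=1.0` every position is kept and the result matches full self-attention; smaller values trade exactness for reduced cost, and resampling at each training step acts as a stochastic regularizer.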
If you find this work useful or refer to it in your research, please consider citing:
```bibtex
@article{hussain2023information,
  title={The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles},
  author={Hussain, Md Shamim and Zaki, Mohammed J and Subramanian, Dharmashankar},
  journal={arXiv preprint arXiv:2306.01705},
  year={2023}
}
```