This is the official GitHub repository for the paper "The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles" (arXiv:2306.01705). The paper proposes Stochastically Subsampled Self-Attention (SSA), an algorithm that reduces the computational and memory requirements of transformers while also serving as a regularization method.
- Our paper has been accepted at KDD '23. The preprint is available on arXiv. The code and implementation for the SSA algorithm will be released soon in this repository. Stay tuned for updates!
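Until the official code is released, here is a minimal, illustrative sketch of the core idea behind stochastic subsampling of attention: randomly keep a fraction of key/value positions and attend only to them, shrinking the attention matrix. The function name `ssa_attention`, the `keep_prob` parameter, and the single-head NumPy setup are our own assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ssa_attention(Q, K, V, keep_prob=0.5, rng=None):
    """Illustrative sketch of subsampled self-attention (single head).

    Randomly keeps a fraction `keep_prob` of the key/value positions
    and attends only to them, so the score matrix is (T, n_keep)
    instead of (T, T). Not the official SSA implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, d = K.shape
    n_keep = max(1, int(round(keep_prob * T)))
    idx = rng.choice(T, size=n_keep, replace=False)  # sampled positions
    scores = Q @ K[idx].T / np.sqrt(d)               # (T, n_keep)
    return softmax(scores) @ V[idx]
```

With `keep_prob=1.0` every position is kept and the result matches full self-attention; smaller values trade exactness for reduced cost, and resampling at each training step acts as a stochastic regularizer.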
If you find this work useful or refer to it in your research, please consider citing:
```bibtex
@article{hussain2023information,
  title={The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles},
  author={Hussain, Md Shamim and Zaki, Mohammed J and Subramanian, Dharmashankar},
  journal={arXiv preprint arXiv:2306.01705},
  year={2023}
}
```