1) Add support for positional embeddings in attention.
2) Add support for sqrt-scaling in attention for the identity nonlinearity.
3) Add support for various nonlinearities in 1/d attention.
4) Improve numerical accuracy of the attention layer by manually fusing multiplicative constants, i.e. computing (c1 * c2) * (A @ B) instead of (c1 * A) @ (c2 * B).
5) Minor tweaks to docstrings.

Co-authored-by: Jiri Hron <jirihron@google.com>
PiperOrigin-RevId: 321938062
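The constant-fusing change in (4) can be illustrated with a small NumPy sketch. The matrices and constants below are hypothetical stand-ins, not the actual covariance tensors the attention layer manipulates; the point is only that scaling once after the matmul incurs fewer rounded float32 multiplications than scaling each operand first:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)).astype(np.float32)
B = rng.standard_normal((4, 4)).astype(np.float32)

# Hypothetical multiplicative constants of very different magnitudes.
c1, c2 = np.float32(1e-4), np.float32(1e4)

# Fused form: one scalar product, one scaling of the matmul result.
fused = (c1 * c2) * (A @ B)

# Unfused form: every entry of A and B is scaled (and rounded) before
# the matmul, accumulating extra float32 rounding error.
unfused = (c1 * A) @ (c2 * B)

# Both equal c1*c2*(A @ B) in exact arithmetic; in float32 they differ slightly.
print(np.max(np.abs(fused - unfused)))
```

The two results agree up to float32 rounding, but the fused form stays closer to the exact product, which matters when such scalings are applied repeatedly inside a kernel computation.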