Attention improvements:
1) Add support for positional embeddings in attention.
2) Add support for sqrt-scaling in attention for the identity nonlinearity.
3) Add support for various nonlinearities in 1/d attention (see the first sketch after this list).
4) Improve numerical accuracy of the attention layer by manually fusing multiplicative constants, i.e. computing (c1 * c2) * (A @ B) instead of (c1 * A) @ (c2 * B) (see the second sketch after this list).
5) Minor tweaks to docstrings.
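
For items 2 and 3: "sqrt-scaling" and "1/d attention" refer to whether the attention logits are scaled by d^(-1/2) or by d^(-1), where d is the query/key feature dimension, before a nonlinearity is applied. The toy finite-width sketch below is only an illustration under that reading; it is not the neural-tangents kernel code, and every name, shape, and helper in it is made up.

```python
# Toy, finite-width illustration of the two logit-scaling conventions.
# NOT the neural-tangents implementation; all names here are hypothetical.
import numpy as np

def toy_attention(x, d_qk, scale_exponent, nonlin, rng):
  """Attention weights nonlin(d_qk**scale_exponent * Q @ K^T) for one head."""
  n_features = x.shape[-1]
  w_q = rng.standard_normal((n_features, d_qk)) / np.sqrt(n_features)
  w_k = rng.standard_normal((n_features, d_qk)) / np.sqrt(n_features)
  logits = (x @ w_q) @ (x @ w_k).T          # (n_tokens, n_tokens)
  return nonlin(d_qk ** scale_exponent * logits)

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 32))            # 6 tokens, 32 features

identity = lambda t: t
softmax = lambda t: np.exp(t) / np.exp(t).sum(axis=-1, keepdims=True)

# Item 2: d^(-1/2) ("sqrt") scaling with the identity nonlinearity on the logits.
a_sqrt = toy_attention(x, d_qk=128, scale_exponent=-0.5, nonlin=identity, rng=rng)
# Item 3: d^(-1) scaling ("1/d attention"), now with other nonlinearities, e.g. softmax.
a_lin = toy_attention(x, d_qk=128, scale_exponent=-1.0, nonlin=softmax, rng=rng)
```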

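For item 4: both orderings compute the same quantity exactly, but scaling each operand first rounds (and, for extreme constants, can underflow or overflow) every matrix entry before the product, whereas the fused form multiplies the product once by a single combined constant. A minimal NumPy sketch, with made-up shapes and deliberately extreme constants so the effect is visible in float32 (not the library's actual attention code):

```python
# Sketch only: exaggerated constants chosen so the unfused order visibly
# loses precision in float32.
import numpy as np

rng = np.random.default_rng(0)
A = (rng.standard_normal((4, 8)) * 1e-25).astype(np.float32)
B = rng.standard_normal((8, 4)).astype(np.float32)

c1 = np.float32(1e-20)   # stand-in for one normalization constant
c2 = np.float32(1e+20)   # stand-in for another; the fused product c1 * c2 is moderate

unfused = (c1 * A) @ (c2 * B)   # c1 * A (~1e-45) underflows in float32
fused = (c1 * c2) * (A @ B)     # operands untouched; one final scaling

reference = (np.float64(c1) * np.float64(c2)) * (A.astype(np.float64) @ B.astype(np.float64))
print(np.abs(unfused - reference).max() / np.abs(reference).max())  # large relative error
print(np.abs(fused - reference).max() / np.abs(reference).max())    # small relative error (~1e-7)
```
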
Co-authored-by: Jiri Hron <jirihron@google.com>
PiperOrigin-RevId: 321938062
romanngg and Jiri Hron committed Jul 23, 2020
1 parent 841a33a commit 253ddea
Showing 4 changed files with 813 additions and 289 deletions.
