minGPT with Synthesizer Attention

This is Andrej Karpathy's minGPT model, built in PyTorch. Two types of attention are available for use in the model: standard masked multi-headed self-attention or Synthesizer self-attention. The attention classes are found in attention.py.

Synthesizer attention: $Y_{i} = \mathrm{softmax}(\mathrm{ReLU}(XA_{i} + b_{1})B_{i} + b_{2})(XV_{i})$

The Synthesizer variant eschews the pairwise dot products between queries and keys. Instead, it directly computes the ℓ × ℓ matrix of attention scores by mapping, for each head, each d-dimensional vector in X to an ℓ-dimensional vector of unnormalized attention weights.
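The formula above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the code in attention.py: the class name SynthesizerAttentionSketch, the parameter names (w1, w2, b2, value), the initialization range, and the constructor arguments are all assumptions made for this example. It also assumes a causal mask, as in a GPT-style model, and leaves out dropout and any output projection.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SynthesizerAttentionSketch(nn.Module):
    """Illustrative Synthesizer self-attention with a causal mask (hypothetical names)."""

    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        assert n_embd % n_head == 0, "embedding size must divide evenly across heads"
        self.n_head = n_head
        self.head_dim = n_embd // n_head
        # X A_i + b_1, computed for all heads in one linear layer
        self.w1 = nn.Linear(n_embd, n_embd)
        # B_i: one (head_dim x block_size) matrix per head; b_2: shared bias over positions
        self.w2 = nn.Parameter(
            torch.empty(n_head, self.head_dim, block_size).uniform_(-1e-3, 1e-3)
        )
        self.b2 = nn.Parameter(torch.zeros(block_size))
        # X V_i, computed for all heads in one linear layer
        self.value = nn.Linear(n_embd, n_embd)
        # Lower-triangular mask so position t only attends to positions <= t
        mask = torch.tril(torch.ones(block_size, block_size, dtype=torch.bool))
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape  # batch, sequence length (<= block_size), embedding size
        # (B, T, C) -> (B, n_head, T, head_dim)
        hidden = F.relu(self.w1(x)).view(B, T, self.n_head, self.head_dim).transpose(1, 2)
        v = self.value(x).view(B, T, self.n_head, self.head_dim).transpose(1, 2)
        # Scores come from each token alone, with no query-key dot products:
        # (B, n_head, T, head_dim) @ (n_head, head_dim, T) -> (B, n_head, T, T)
        scores = hidden @ self.w2[:, :, :T] + self.b2[:T]
        scores = scores.masked_fill(~self.mask[:T, :T], float("-inf"))
        att = F.softmax(scores, dim=-1)
        y = att @ v  # (B, n_head, T, head_dim)
        # Concatenate the heads back into (B, T, C)
        return y.transpose(1, 2).contiguous().view(B, T, C)
```

Because B_i maps to a fixed number of positions, a layer like this is tied to a maximum sequence length (block_size here); shorter inputs use only the first T columns of each B_i.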

rgivhan/minGPT-with-Synthesizer-Attention
