
Transformer with Normalized Attention

A Transformer whose sole non-linearity is normalization, as proposed in the paper Normalized Attention Without Probability Cage. This repository builds on the paper's contributions and attempts to make the approach work for the auto-regressive case.

Update - It works. You can have an entire language model built on only matrix multiplies and normalization.
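
To make the idea concrete, here is a minimal sketch of attention where the softmax over the score matrix is replaced by layer normalization, so the block is nothing but matrix multiplies and normalization. This is an illustration of the technique, not this repository's exact code: the class name, the use of a parameter-free `F.layer_norm`, and the mask-normalize-remask ordering are assumptions; the actual implementation here differs in details such as learned norm parameters.

```python
import torch
import torch.nn.functional as F
from torch import nn

class NormalizedAttention(nn.Module):
    def __init__(self, dim, heads = 8):
        super().__init__()
        assert dim % heads == 0, 'dim must be divisible by heads'
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias = False)
        self.to_out = nn.Linear(dim, dim, bias = False)

    def forward(self, x):
        b, n, d, h = *x.shape, self.heads
        q, k, v = (t.reshape(b, n, h, -1).transpose(1, 2)
                   for t in self.to_qkv(x).chunk(3, dim = -1))

        sim = (q @ k.transpose(-2, -1)) * self.scale            # (b, h, n, n)

        # causal mask for the auto-regressive case
        causal = torch.ones(n, n, dtype = torch.bool, device = x.device).triu(1)
        sim = sim.masked_fill(causal, 0.)

        # layer-normalize each row of the score matrix in place of softmax
        attn = F.layer_norm(sim, sim.shape[-1:])

        # re-mask, so normalized future positions contribute nothing
        attn = attn.masked_fill(causal, 0.)

        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)
```

For example, `NormalizedAttention(dim = 512)(torch.randn(1, 1024, 512))` returns a tensor of the same shape. Masking the scores to zero before normalizing keeps the row statistics independent of future content, and re-masking afterwards zeroes the weights on future positions.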

Prerequisites

$ pip install -r requirements.txt

Train

$ python train_enwik8.py
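
The script trains a character-level language model on the enwik8 dataset. The sketch below shows the auto-regressive objective being optimized, next-byte prediction with cross-entropy; the toy model is a stand-in for illustration only, where the real script trains the normalized-attention transformer defined in this repository.

```python
import torch
import torch.nn.functional as F
from torch import nn

NUM_TOKENS = 256                        # enwik8 is modeled at the byte level
model = nn.Sequential(                  # stand-in for the actual transformer
    nn.Embedding(NUM_TOKENS, 128),
    nn.Linear(128, NUM_TOKENS),
)
optim = torch.optim.Adam(model.parameters(), lr = 3e-4)

seq = torch.randint(0, NUM_TOKENS, (4, 513))   # toy batch of byte sequences
inp, target = seq[:, :-1], seq[:, 1:]          # shift by one for next-byte targets

optim.zero_grad()
logits = model(inp)                            # (batch, seq, num_tokens)
loss = F.cross_entropy(logits.transpose(1, 2), target)
loss.backward()
optim.step()
```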

Citations

@misc{richter2020normalized,
    title={Normalized Attention Without Probability Cage},
    author={Oliver Richter and Roger Wattenhofer},
    year={2020},
    eprint={2005.09561},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
