Note in the original README that the code is annotated and should be easy to understand
Try replacing gating with head-wise cosine similarity -- this may have better backward gradient flow and be more explainable than simple gating. That would mean having two "key" vectors, so think about renaming them so the math makes sense
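A minimal sketch of what the head-wise score could look like (the function name, argument names, and shapes here are assumptions for illustration, not anything in the repo yet):

```python
import torch.nn.functional as F

def headwise_cosine_scores(a, b, scale=1.0):
    """Cosine similarity per head between two projected vectors.

    a, b: (batch, heads, seq_len, head_dim) -- the two "key"-like
    vectors mentioned above (placeholder names; rename so the math
    makes sense). Returns (batch, heads, seq_len) scores in [-1, 1],
    which stay bounded and keep gradients flowing through *both*
    inputs, unlike a sigmoid gate that can saturate.
    """
    a = F.normalize(a, dim=-1)  # unit-normalize along head_dim
    b = F.normalize(b, dim=-1)
    return scale * (a * b).sum(dim=-1)
```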
Make a dedicated LEAP attention block that people can use, and document it in the new README when you have time (for now it will be decoder-only, but it can easily be made bidirectional by reversing the token order and running it through again)
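A rough sketch of the reverse-and-rerun idea, assuming some causal block module exists (the wrapper class and the combine-by-sum choice are hypothetical):

```python
import copy
import torch.nn as nn

class BidirectionalLEAP(nn.Module):
    """Run a causal (decoder-only) block forward, then again on the
    time-reversed sequence, and combine the two directions."""

    def __init__(self, causal_block: nn.Module):
        super().__init__()
        self.fwd = causal_block
        # second copy for the reversed pass (could also share weights)
        self.bwd = copy.deepcopy(causal_block)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        out_fwd = self.fwd(x)
        # reverse along the time axis, attend causally, reverse back
        out_bwd = self.bwd(x.flip(dims=[1])).flip(dims=[1])
        return out_fwd + out_bwd  # sum; concat + project also works
```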
Don't have separate linear layers to generate qkv or qffv; just have one layer whose output is chunked, and write a comment about it
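For illustration, the fused projection could look like this (class name made up; the point is one matmul chunked into three, same math as three separate layers but fewer kernel launches):

```python
import torch.nn as nn

class FusedQKV(nn.Module):
    """One linear projection whose output is chunked into q, k, v."""

    def __init__(self, d_model):
        super().__init__()
        # single weight matrix instead of three separate nn.Linear layers
        self.qkv = nn.Linear(d_model, 3 * d_model)

    def forward(self, x):
        # chunk the one projection into the three attention inputs
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        return q, k, v
```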
Make a short section on STRONG scaling for both READMEs
Write a new README with the new math and the benefit/development sections from the old README, also mentioning the legacy fastformerLM that is still in the repo. Make sure to note that, for now, the focus is on masked attention for decoders.
Put in a section about development/contributing, making sure to tell people about `pip install -e .`
Still working on the following changes to finish