Note in the original README that the code is annotated and should be easy to understand
Try replacing gating with head-wise cosine similarity -- this may have better backward gradient flow and be more explainable than simple gating. That would mean having two "key" vectors, so think about renaming them so the math makes sense
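A minimal sketch of what the head-wise score could look like (the function name, argument names, and shapes here are assumptions for illustration, not anything in the repo yet):

```python
import torch.nn.functional as F

def headwise_cosine_scores(a, b, scale=1.0):
    """Cosine similarity per head between two projected vectors.

    a, b: (batch, heads, seq_len, head_dim) -- the two "key"-like
    vectors mentioned above (placeholder names; rename so the math
    makes sense). Returns (batch, heads, seq_len) scores in [-1, 1],
    which stay bounded and keep gradients flowing through *both*
    inputs, unlike a sigmoid gate that can saturate.
    """
    a = F.normalize(a, dim=-1)  # unit-normalize along head_dim
    b = F.normalize(b, dim=-1)
    return scale * (a * b).sum(dim=-1)
```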
Make a dedicated LEAP attention block that people can use, and document it in the new README when you have time (for now it will be decoder-only, but it can easily be made bidirectional by reversing the token order and running it through again)
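A rough sketch of the reverse-and-rerun idea, assuming some causal block module exists (the wrapper class and the combine-by-sum choice are hypothetical):

```python
import copy
import torch.nn as nn

class BidirectionalLEAP(nn.Module):
    """Run a causal (decoder-only) block forward, then again on the
    time-reversed sequence, and combine the two directions."""

    def __init__(self, causal_block: nn.Module):
        super().__init__()
        self.fwd = causal_block
        # second copy for the reversed pass (could also share weights)
        self.bwd = copy.deepcopy(causal_block)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        out_fwd = self.fwd(x)
        # reverse along the time axis, attend causally, reverse back
        out_bwd = self.bwd(x.flip(dims=[1])).flip(dims=[1])
        return out_fwd + out_bwd  # sum; concat + project also works
```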
Don't have separate linear layers to generate qkv or qffv; just have one layer whose output is chunked, and write a comment about it
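For illustration, the fused projection could look like this (class name made up; the point is one matmul chunked into three, same math as three separate layers but fewer kernel launches):

```python
import torch.nn as nn

class FusedQKV(nn.Module):
    """One linear projection whose output is chunked into q, k, v."""

    def __init__(self, d_model):
        super().__init__()
        # single weight matrix instead of three separate nn.Linear layers
        self.qkv = nn.Linear(d_model, 3 * d_model)

    def forward(self, x):
        # chunk the one projection into the three attention inputs
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        return q, k, v
```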
Make a short section on STRONG scaling for both READMEs
Write a new README with the new math and the benefit/development sections from the old README, also mentioning the legacy fastformerLM that is still in the repo. Make sure to note that, for now, the focus is on masked attention for decoders.
Put in a section about development/contributing, making sure to tell people about `pip install -e .`
Still working on the following changes to finish