Maybe scale is wrong #3
Comments
ohh no, that is actually the learned temperature from a variant of attention (cosine-similarity attention): https://github.com/lucidrains/x-transformers#query-key-normalization. The temperature is kept in log space and exponentiated here: https://github.com/lucidrains/memorizing-transformers-pytorch/blob/main/memorizing_transformers_pytorch/memorizing_transformers_pytorch.py#L235
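A minimal sketch of the idea being described, assuming the mechanism from the linked QK-normalization section: queries and keys are unit-normalized so their dot product is a cosine similarity, and a learned temperature stored in log space is exponentiated before scaling the logits. Names here are illustrative (numpy stands in for torch so the sketch is self-contained), not the repository's code.

```python
import numpy as np

def l2norm(x):
    # unit-normalize the last dimension
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def cosine_sim_attention(q, k, v, log_scale):
    # after normalization, q @ k.T is a cosine similarity in [-1, 1]
    q, k = l2norm(q), l2norm(k)
    # the learned parameter lives in log space; exponentiating keeps the
    # effective temperature positive throughout training
    scale = np.exp(log_scale)
    logits = (q @ k.T) * scale
    # numerically stable softmax over keys
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
out = cosine_sim_attention(q, k, v, log_scale=np.log(10.0))
print(out.shape)  # (4, 8)
```

Storing the scale in log space means an unconstrained parameter always maps to a positive temperature, which is why the repo exponentiates it at the line linked above.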
@denadai2 ohh, if you were looking for the sigmoid gating, I removed that, since it was not working well for me and another researcher (we thought that was one of the weak parts of the paper). I went with the other researcher's suggestion of attending across the similarities, local and distant (a softmax across the concatenated attention logits)
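A hedged sketch of the alternative described above (variable names are assumptions, not the repo's): instead of running two separate softmaxes and gating them with a sigmoid, the local and retrieved-memory logits are concatenated and a single softmax is taken across both, so local and distant keys compete directly for the same probability mass.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def combined_attention(local_logits, mem_logits, local_v, mem_v):
    n_local = local_logits.shape[-1]
    # one softmax across the concatenation of local and distant logits
    attn = softmax(np.concatenate([local_logits, mem_logits], axis=-1))
    # split the joint distribution back into its two segments
    local_attn, mem_attn = attn[..., :n_local], attn[..., n_local:]
    # each segment weights its own values; weights already sum to 1 jointly
    return local_attn @ local_v + mem_attn @ mem_v

rng = np.random.default_rng(1)
local_logits, mem_logits = rng.normal(size=(4, 6)), rng.normal(size=(4, 3))
local_v, mem_v = rng.normal(size=(6, 8)), rng.normal(size=(3, 8))
out = combined_attention(local_logits, mem_logits, local_v, mem_v)
print(out.shape)  # (4, 8)
```

The trade-off noted in the next comment is visible here: the mixing weight is no longer a single learned gate but emerges from the relative magnitudes of all the logits.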
thanks for the prompt answer! I saw it now :) btw this increases the complexity, I'd say... it makes sense though
Original question (code embed: memorizing-transformers-pytorch/memorizing_transformers_pytorch/memorizing_transformers_pytorch.py, line 237 in 83fa147):
Shouldn't this be (1-scale)?
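For context, a hedged sketch of the paper-style sigmoid gate the question presumes (which, per the maintainer's reply, was removed from this repo): a gate g in (0, 1) convexly combines the two branch outputs, so whatever weight the local branch gets, the memory branch gets the complement (1 - g). All names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_combine(local_out, mem_out, gate_logit):
    # a single learned gate; could also be per-head in practice
    g = sigmoid(gate_logit)
    # convex combination: the two weights sum to 1 by construction
    return g * local_out + (1.0 - g) * mem_out

local_out = np.ones((4, 8))
mem_out = np.zeros((4, 8))
out = gated_combine(local_out, mem_out, gate_logit=0.0)  # g = 0.5
print(out[0, 0])  # 0.5
```

With an independent multiplicative scale instead of (1 - g), the combined weights would not sum to one, which is what the question is pointing at; in this repo the issue is moot because the line in question is a temperature, not a gate.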