
Maybe scale is wrong #3

Open
denadai2 opened this issue May 20, 2022 · 3 comments

denadai2 commented May 20, 2022

```python
sim = einsum('b h i d, b j d -> b h i j', q, k) * scale
```

Shouldn't this be (1-scale)?

denadai2 changed the title from "Maybe sim is wrong" to "Maybe scale is wrong" on May 20, 2022
lucidrains (Owner) commented

Ohh no, that is actually the learned temperature from a variant of attention (cosine-similarity attention): https://github.com/lucidrains/x-transformers#query-key-normalization. The temperature is kept in log space and exponentiated here: https://github.com/lucidrains/memorizing-transformers-pytorch/blob/main/memorizing_transformers_pytorch/memorizing_transformers_pytorch.py#L235
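For context, here is a minimal sketch of what that line is doing under the cosine-similarity-attention variant described above. The shapes, tensor names, and temperature initialization are illustrative only, not the repo's exact code:

```python
import math
import torch
import torch.nn.functional as F
from torch import einsum

# illustrative shapes: q is (batch, heads, seq, dim_head); k is shared across heads
b, h, n, d = 2, 8, 128, 64
q = torch.randn(b, h, n, d)
k = torch.randn(b, n, d)

# learned per-head temperature, stored in log space (initial value is hypothetical)
log_scale = torch.full((h, 1, 1), math.log(10.), requires_grad = True)

# l2-normalize queries and keys so their dot product is a cosine similarity in [-1, 1]
q, k = F.normalize(q, dim = -1), F.normalize(k, dim = -1)

# exponentiate the log-space temperature, then use it to scale the similarities,
# which is where the `* scale` in the quoted line comes from
scale = log_scale.exp()
sim = einsum('b h i d, b j d -> b h i j', q, k) * scale

attn = sim.softmax(dim = -1)
```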


lucidrains commented May 20, 2022

@denadai2 Ohh, if you were looking for the sigmoid gating, I removed that, since it was not working well for me or for another researcher (we thought that was one of the weaker parts of the paper). I went with the other researcher's suggestion of attending across both sets of similarities, local and distant (a softmax across the concatenated attention logits).
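For anyone following along, here is a rough sketch of that idea: a single softmax over the concatenated local and memory attention logits instead of a learned sigmoid gate. Shapes and tensor names are made up for illustration, causal masking is omitted, and this is not the exact code from this repo:

```python
import torch
from torch import einsum

# illustrative shapes: n local positions, m retrieved memories per query
b, h, n, m, d = 2, 8, 128, 32, 64
q       = torch.randn(b, h, n, d)
k_local = torch.randn(b, n, d)
v_local = torch.randn(b, n, d)
k_mem   = torch.randn(b, h, n, m, d)   # per-query keys retrieved from the kNN memory
v_mem   = torch.randn(b, h, n, m, d)

scale = d ** -0.5

# attention logits to the local context and to the retrieved (distant) memories
sim_local = einsum('b h i d, b j d -> b h i j', q, k_local) * scale      # (b, h, n, n)
sim_mem   = einsum('b h i d, b h i j d -> b h i j', q, k_mem) * scale    # (b, h, n, m)

# instead of a learned sigmoid gate, concat the logits and softmax across both at once
sim  = torch.cat((sim_mem, sim_local), dim = -1)                         # (b, h, n, m + n)
attn = sim.softmax(dim = -1)
attn_mem, attn_local = attn[..., :m], attn[..., m:]

# aggregate values from both sources with the jointly-normalized attention weights
out = einsum('b h i j, b h i j d -> b h i d', attn_mem, v_mem) \
    + einsum('b h i j, b j d -> b h i d', attn_local, v_local)
```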


denadai2 commented May 20, 2022

Thanks for the prompt answer! I see it now :)

By the way, I'd say this increases the complexity... it makes sense, though.
