
Why are there so many position embeddings? #34

Closed
jasperhyp opened this issue Dec 6, 2022 · 5 comments

Comments

@jasperhyp

Hi! Thanks for your great work, it's very helpful for my project! I was just curious why there are so many position embeddings. Essentially, it looks like an absolute (1 to n) positional embedding is added to the sequence up front in the RETRO class, and then rotary embeddings are applied again in each attention module. I thought the two in the Attention and CCA modules would be quite enough. Thanks in advance!

@lucidrains
Owner

one is an absolute positional embedding, the other is a relative positional embedding (you need the relative positional embeddings for the CCA to work well)

rotary embeddings are one of the strongest relative positional embeddings out there
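For reference, here is a minimal sketch (in PyTorch, not the repository's exact code; the shapes and names are illustrative) of the two mechanisms being discussed: a learned absolute positional embedding added once to the token embeddings, and rotary embeddings applied to the queries and keys inside attention, which encode relative positions.

```python
import torch
from torch import nn

def rotary_freqs(seq_len, dim_head):
    # standard rotary frequencies: theta_i = 10000^(-2i / d)
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim_head, 2).float() / dim_head))
    pos = torch.arange(seq_len).float()
    freqs = torch.einsum('n,d->nd', pos, inv_freq)   # (seq_len, dim_head / 2)
    return torch.cat((freqs, freqs), dim=-1)         # (seq_len, dim_head)

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary(x, freqs):
    # rotates channel pairs by a position-dependent angle; attention scores
    # then depend only on the relative offset between positions
    return x * freqs.cos() + rotate_half(x) * freqs.sin()

seq_len, dim, dim_head, vocab = 128, 512, 64, 10000

# absolute positional embedding, added once to the token embeddings
tok_emb = nn.Embedding(vocab, dim)
abs_pos_emb = nn.Embedding(seq_len, dim)
tokens = torch.randint(0, vocab, (1, seq_len))
x = tok_emb(tokens) + abs_pos_emb(torch.arange(seq_len))

# inside each attention module, rotary embeddings are applied to q and k
q = torch.randn(1, seq_len, dim_head)
k = torch.randn(1, seq_len, dim_head)
freqs = rotary_freqs(seq_len, dim_head)
q, k = apply_rotary(q, freqs), apply_rotary(k, freqs)
```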

@jasperhyp
Author

Makes sense, thank you! Also, only the sequence being modeled gets the absolute positional embedding (the retrieved context does not). Is that also deliberate?

@jasperhyp
Author

Also, an unrelated question just to confirm: the retrieval is done before training (both the retrieval corpus and the training sequences are encoded by a frozen BERT), is this correct?

@lucidrains
Owner

@jasperhyp yup, that is correct

the retrieved content undergoes relative positional embedding during cross attention iirc

yes, the retrieval is done prior to training for efficiency
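For completeness, a rough sketch of that pre-training retrieval step (this is not the repository's API; the `bert_embed` helper, the model name, and the faiss index type are assumptions for illustration): the corpus and training chunks are embedded once with a frozen BERT, nearest neighbours are looked up and stored, and the training loop never runs BERT.

```python
import numpy as np
import faiss                      # nearest-neighbour index
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
bert = BertModel.from_pretrained('bert-base-uncased').eval()   # frozen, no grads

@torch.no_grad()
def bert_embed(texts):
    # hypothetical helper: encode chunks with the frozen BERT, mean-pool over tokens
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
    hidden = bert(**batch).last_hidden_state               # (b, n, 768)
    mask = batch.attention_mask.unsqueeze(-1).float()
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return pooled.numpy()

# 1. embed the retrieval corpus once and build the index (before training)
corpus_chunks = ["first retrieval chunk ...", "second retrieval chunk ..."]
index = faiss.IndexFlatIP(768)
index.add(np.ascontiguousarray(bert_embed(corpus_chunks)))

# 2. embed each training chunk and fetch its nearest neighbours (also before training)
train_chunks = ["a chunk of a training sequence ..."]
_, neighbour_ids = index.search(np.ascontiguousarray(bert_embed(train_chunks)), 2)

# neighbour_ids would be stored on disk; during training the model only reads the
# pre-fetched neighbour chunks, so the frozen BERT never runs in the training loop
print(neighbour_ids)
```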

@jasperhyp
Author

Thank you!
