Retnet parameter dimension #57

Closed
allanj opened this issue Aug 14, 2023 · 2 comments

allanj commented Aug 14, 2023

I wonder why $\mathbf{W}_V$ needs twice the dimension.

image

Yuxin-CV commented Aug 14, 2023

Please note that the MSR block includes an additional swish gate compared to the MHSA block in the vanilla Transformer. If we do not double the dimension of v, the MSR block will have 5d^2 parameters, while the MHSA block in the vanilla Transformer has only 4d^2. That makes it hard to choose the width and depth of a RetNet for a fair comparison with a vanilla Transformer baseline of the same size. Therefore, the authors double the output dimension of W_v and halve d_ffn, so that each RetNet block still has 12d^2 parameters in total.
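The counts above can be checked with a short script. This is only a sketch under stated assumptions, not code from the repository: it ignores biases and normalization parameters, assumes W_q and W_k are d x d, assumes the gate and output projections are sized to match d_v, and assumes the vanilla baseline uses d_ffn = 4d. The function names are made up for illustration.

```python
def msr_params(d, d_v):
    # Assumed layout: W_q, W_k are (d x d); W_v and the swish gate W_g are (d x d_v);
    # the output projection W_o is (d_v x d). Biases and norms are ignored.
    return 2 * d * d + 3 * d * d_v


def mhsa_params(d):
    # Vanilla multi-head self-attention: W_q, W_k, W_v, W_o, each (d x d).
    return 4 * d * d


def ffn_params(d, d_ffn):
    # Two-layer MLP: (d x d_ffn) followed by (d_ffn x d).
    return 2 * d * d_ffn


d = 1024  # arbitrary example width
vanilla = mhsa_params(d) + ffn_params(d, 4 * d)      # 4d^2 + 8d^2
retnet = msr_params(d, 2 * d) + ffn_params(d, 2 * d)  # 8d^2 + 4d^2
undoubled = msr_params(d, d)                          # MSR alone with d_v = d

# Report each count in units of d^2.
print(vanilla / d**2, retnet / d**2, undoubled / d**2)  # 12.0 12.0 5.0
```

Under these assumptions, doubling d_v and halving d_ffn moves 4d^2 parameters from the FFN into the MSR block while keeping the block total unchanged.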

Another option is to keep W_v the same size as W_k and set d_ffn to 3.5d. However, a wider swish gate is preferable to a wider MLP as the FFN. For more details, please refer to https://arxiv.org/abs/2202.10447. I believe it would be even better to use only the MSR block and set d_v = 3.33d.
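Both alternative settings follow from fixing the per-block budget at 12d^2. The derivation below is a sketch that assumes the MSR block holds $2d^2 + 3\,d\,d_v$ weights (query and key projections of size $d \times d$; value, gate, and output projections scaled by $d_v$) and the FFN holds $2\,d\,d_{\text{ffn}}$, with biases and norms ignored.

$$
\begin{aligned}
\text{Keep } d_v = d:&\quad 5d^2 + 2\,d\,d_{\text{ffn}} = 12d^2 \;\Rightarrow\; d_{\text{ffn}} = 3.5d \\
\text{MSR only (no FFN)}:&\quad 2d^2 + 3\,d\,d_v = 12d^2 \;\Rightarrow\; d_v = \tfrac{10}{3}d \approx 3.33d
\end{aligned}
$$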


allanj commented Aug 14, 2023

Cool, pretty much makes sense to me. Thanks for the thorough explanation.

@allanj allanj closed this as completed Aug 14, 2023
@donglixp donglixp self-assigned this Aug 14, 2023