
RMSnorm Implementation #101

Closed
gdevos010 opened this issue Aug 7, 2022 · 6 comments


gdevos010 commented Aug 7, 2022

Hi lucidrains,
I was looking at adding the ScaleNorm and RMSNorm to another repo, and the implementations look almost identical. I have linked to the official implementation below. Am I missing something about the implementation? Thanks for all the great work.

https://github.com/bzhangGo/rmsnorm

@lucidrains (Owner)

@gdevos010 Hi Greg

It is a bit subtle, but the only difference is that ScaleNorm has a single shared gamma multiplier across the entire feature dimension, while RMSNorm has a gamma with the same dimension as the model dimension: https://github.com/lucidrains/x-transformers/blob/main/x_transformers/x_transformers.py#L352 vs https://github.com/lucidrains/x-transformers/blob/main/x_transformers/x_transformers.py#L363
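
To make the distinction concrete, here is a minimal PyTorch sketch of the two layers as described above (the eps value, clamping, and the `dim ** -0.5` factor are illustrative assumptions, not code copied from x-transformers); the only structural difference is the shape of the learned gain `g`:

```python
import torch
from torch import nn

class ScaleNorm(nn.Module):
    """Normalize by the RMS of the features, with a single shared learned gain."""
    def __init__(self, dim, eps=1e-8):
        super().__init__()
        self.scale = dim ** -0.5
        self.eps = eps
        self.g = nn.Parameter(torch.ones(1))    # one scalar gamma shared by all features

    def forward(self, x):
        norm = torch.norm(x, dim=-1, keepdim=True) * self.scale
        return x / norm.clamp(min=self.eps) * self.g

class RMSNorm(nn.Module):
    """Same normalization, but with a per-feature learned gain (shape = model dim)."""
    def __init__(self, dim, eps=1e-8):
        super().__init__()
        self.scale = dim ** -0.5
        self.eps = eps
        self.g = nn.Parameter(torch.ones(dim))  # one gamma per feature dimension

    def forward(self, x):
        norm = torch.norm(x, dim=-1, keepdim=True) * self.scale
        return x / norm.clamp(min=self.eps) * self.g
```

So for a 512-dimensional model, ScaleNorm learns a single gain parameter while RMSNorm learns 512 of them; the normalization itself is identical.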

@lucidrains (Owner)

@gdevos010 i would recommend rms norm, as it has been proven in a number of large language models out of deepmind


hrzn commented Aug 10, 2022

Thanks, that makes sense. However, looking at the ScaleNorm paper, I'm wondering whether this scaling is needed; it seems to be 1 in the paper (referring to Eq. (5) there), but I might be missing something, of course.
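
(If I'm reading the paper right, ScaleNorm there is simply

$$\mathrm{ScaleNorm}(x; g) = g \cdot \frac{x}{\lVert x \rVert_2},$$

with a single learned scalar $g$ and no extra per-feature factor.)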

@lucidrains (Owner)

@hrzn ohh actually yes that appears to be an error on my part! thank you for catching that!


lucidrains commented Aug 10, 2022

@hrzn @gdevos010 here is a paper that does some head-to-head runs of the different types of normalizations (https://arxiv.org/abs/2102.11972); it may be informative for you two


hrzn commented Aug 10, 2022

> @hrzn @gdevos010 here is a paper that does some head-to-head runs of the different types of normalizations (https://arxiv.org/abs/2102.11972); it may be informative for you two

Oh nice, thanks. That's a very welcome paper!
