Conversation
Something is off. The function is named Llama4TextL2Norm, but it computes RMS norm. Which one is correct for Llama 4: RMS or L2? Depending on the answer, the function should either be renamed (if RMS) or modified (if L2).
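To make the distinction concrete, here is a minimal dependency-free sketch of the two candidates. The helper names, the placement of `eps` inside the square root, and the sample vector are assumptions made for illustration; this is not the code from the transformers repository.

```python
import math


def l2_normalize(x, eps=1e-6):
    """Divide by the Euclidean (L2) norm: x / sqrt(sum(x^2) + eps)."""
    scale = math.sqrt(sum(v * v for v in x) + eps)
    return [v / scale for v in x]


def rms_normalize(x, eps=1e-6):
    """Divide by the root mean square: x / sqrt(mean(x^2) + eps)."""
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]


def l2_length(x):
    """Euclidean length of a vector, used to compare the two outputs."""
    return math.sqrt(sum(v * v for v in x))


x = [1.0, -2.0, 3.0, -4.0]  # hypothetical 4-dimensional input
l2_out = l2_normalize(x)
rms_out = rms_normalize(x)

# The L2-normalized vector has length close to 1, while the
# RMS-normalized vector has length close to sqrt(len(x)).
print(l2_length(l2_out), l2_length(rms_out), math.sqrt(len(x)))
```

The two outputs point in the same direction and differ only by a constant factor of about `sqrt(d)`, where `d` is the size of the normalized dimension, so the choice changes the scale of the normalized queries and keys rather than their direction.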
Good point, this function is probably wrong! The main reason it wasn't visible as a bug is that

    if self.config.use_qk_norm and self.use_rope:
        self.qk_norm = Llama4TextL2Norm(config.rms_norm_eps)

I believe
@Rocketknight1 in the config.json for Llama 4 Scout, use_qk_norm is set to true. The TextAttention code also requires use_rope to be true (line 310), but use_rope is always true, because no_rope_layers is an empty list in config.json. Therefore, qk_norm is always computed for Llama 4 Scout. My question remains: is the norm supposed to be an actual L2 norm or an RMS norm?
Hi @alfredo-etched, I didn't realize that If |
[For maintainers] Suggested jobs to run (before merge) run-slow: llama4 |
I was hoping to get confirmation of which normalization is correct from someone at Meta who helped write this code, rather than relying on guesswork. Precisely because the difference is small, the models might behave ALMOST the same under either normalization even though only one is correct, and it could be hard to determine which by testing the model's inference for correctness.
Hi @alfredo-etched, although the difference is small in terms of code changes, it's very large numerically! The difference between ``sqrt(sum()) |
Feel free to test it, though - change |
What does this PR do?
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.