[Modular Dependencies] Fixup qwen rms norms #43772
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
zucchini-nlp
left a comment
Super nice, happy to get rid of bad modular patterns!
```diff
 @use_kernel_forward_from_hub("RMSNorm")
 class Dots1RMSNorm(nn.Module):
-    def __init__(self, hidden_size, eps: float = 1e-6) -> None:
+    def __init__(self, hidden_size, eps=1e-6):
```
The type hints were pretty, can we add them in llama so they're copied to all models?
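For context, a minimal sketch of what that suggestion would look like: put the hints on the llama definition that modular copies from, so every generated RMSNorm picks them up. The class body below is assumed from the standard transformers RMSNorm implementation and is not part of this PR's diff.

```python
# Sketch only: typed signature on the llama RMSNorm so modular propagates it.
# Body assumed from the standard transformers implementation, not this PR.
import torch
from torch import nn


class LlamaRMSNorm(nn.Module):
    def __init__(self, hidden_size: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Compute the norm in float32 for numerical stability, then cast back.
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states.to(input_dtype)
```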
```diff
-        self.norm1 = Qwen2RMSNorm(config.hidden_size, eps=1e-6)
-        self.norm2 = Qwen2RMSNorm(config.hidden_size, eps=1e-6)
+        self.norm1 = Qwen2_5OmniRMSNorm(config.hidden_size, eps=1e-6)
+        self.norm2 = Qwen2_5OmniRMSNorm(config.hidden_size, eps=1e-6)
```
finally, have been annoyed by this as well!
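For readers unfamiliar with the modular setup, a rough sketch of the pattern behind this change (file name and exact contents assumed, not taken from the PR): the Omni modular file declares its own norm by inheriting from the Qwen2 one, and the modular converter then emits a full standalone `Qwen2_5OmniRMSNorm` into the generated modeling file, so the modeling file no longer imports anything from `qwen2` directly.

```python
# modular_qwen2_5_omni.py (sketch; exact file layout assumed, not from this PR)
from transformers.models.qwen2.modeling_qwen2 import Qwen2RMSNorm


class Qwen2_5OmniRMSNorm(Qwen2RMSNorm):
    # Nothing to override: the modular converter expands this into a
    # self-contained class definition inside modeling_qwen2_5_omni.py.
    pass
```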
ArthurZucker
left a comment
we should still keep the alias IMO
| "Qwen2PreTrainedModel", | ||
| "Qwen2Model", | ||
| "Qwen2ForCausalLM", | ||
| "Qwen2RMSNorm", |
this is breaking haha
Fair enough, it shouldn't have been this way in the beginning, but it's not worth breaking it now.
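A sketch of the non-breaking resolution being agreed on here, assuming `Qwen2RMSNorm` stays defined (or aliased) in `modeling_qwen2.py` and therefore remains a public export:

```python
# modeling_qwen2.py (sketch; exact placement assumed)
# Keeping the name in __all__ means `from transformers import Qwen2RMSNorm`
# and direct module imports keep working, even though other models no longer
# rely on the cross-model import internally.
__all__ = [
    "Qwen2PreTrainedModel",
    "Qwen2Model",
    "Qwen2ForCausalLM",
    "Qwen2RMSNorm",
]
```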
[For maintainers] Suggested jobs to run (before merge)
run-slow: afmoe, aimv2, apertus, arcee, aria, bamba, bitnet, blt, chameleon, clvp, csm, cwm, deepseek_v2, deepseek_v3, dia, diffllama
* fix
* type hints
As per the title: this led to weird dependencies where modeling files used direct imports from other models.