fix(models, testing): Fix Llama4 vision rotary meta tensor initialization and MyT5 get_tokenizer signature#44581
Conversation
The failing test
Hi @harshaljanjani, I don't like this fix, unfortunately! The standard way of handling meta-device initialization is to do param initialization in `_init_weights`.
Thanks for your time @Rocketknight1. If I understand your point correctly, you're referring to this pattern, among others, right? I'll remove the forward meta-device check and have `_init_weights` handle the computation:
```python
elif isinstance(module, Llama4VisionRotaryEmbedding):
    module.freqs_ci = module._compute_freqs_ci(module.config)
```
Sorry, I should have been clearer. The problem is that in the `__init__` you register a buffer, but this line clobbers the `freqs_ci` attribute with a totally new tensor, which makes the `__init__()` line pointless. What you probably want to do is `module.freqs_ci.copy_(module._compute_freqs_ci(module.config))`, which will preserve the tensor object and simply initialize the right values for it, which is how weight init is supposed to work!
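To illustrate why preserving the tensor object matters, here is a minimal, torch-free analogy: plain Python lists stand in for tensors, and `alias` stands in for any external reference to the registered buffer (e.g. one held by the device-placement machinery). This is a sketch of the aliasing behavior, not the actual `transformers` code.

```python
# Buffer created at __init__ time; something else (device hooks,
# state_dict machinery) may already hold a reference to this exact object.
buf = [0.0, 0.0, 0.0, 0.0]
alias = buf

# Reassignment (the clobbering pattern): a brand-new object is created,
# so the original buffer object never receives the computed values.
buf = [1.0, 2.0, 3.0, 4.0]
print(alias)  # still [0.0, 0.0, 0.0, 0.0]

# In-place copy (the analog of tensor.copy_): same object, new values.
buf = alias
buf[:] = [1.0, 2.0, 3.0, 4.0]
print(alias is buf, alias)  # True [1.0, 2.0, 3.0, 4.0]
```

The same distinction holds for `torch.Tensor`: `module.freqs_ci = new_tensor` replaces the object, while `module.freqs_ci.copy_(new_tensor)` writes values into the existing buffer.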
Ahh gotcha, thanks for taking the time! Hopefully this should be better 🤗
[For maintainers] Suggested jobs to run (before merge): `run-slow: llama4, myt5`
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
What does this PR do?
The following issues were identified and fixed in this PR:
→ **Llama-4 Vision**: `freqs_ci` is stored as a plain attribute on `Llama4VisionRotaryEmbedding`. When `from_pretrained` initializes the model with `device_map="auto"`, all tensors become meta tensors, but `freqs_ci` is not registered as a buffer and is never materialized to a real device, which raises an error when copying out of the meta tensor. Fixed by registering it as a buffer and adding a meta-device recompute guard in `forward`.

→ **MyT5**: 05c0e1d ("rm slow tokenizers") refactored `TokenizerTesterMixin.get_tokenizer` to accept `pretrained_name` as a positional argument and changed the call sites in the base-class tests accordingly, but `MyT5TokenizationTest.get_tokenizer` was never updated to match; this PR fixes that, following the canonical pattern used by other models (BARTpho, CANINE, CLVP, etc.).

cc: @Rocketknight1 @itazap
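To make the MyT5 signature mismatch concrete, here is a minimal, hypothetical sketch; class names, the tuple return value, and `tmpdirname` are simplified stand-ins, not the actual `transformers` test code.

```python
class TokenizerTesterMixin:
    """Simplified stand-in for the real tokenizer test mixin."""

    tmpdirname = "/tmp/dummy-tokenizer"

    def get_tokenizer(self, pretrained_name=None, **kwargs):
        # After the "rm slow tokenizers" refactor, base-class tests call
        # self.get_tokenizer(pretrained_name, ...) positionally.
        return ("tokenizer-from", pretrained_name or self.tmpdirname)


class BrokenMyT5Test(TokenizerTesterMixin):
    # The pre-fix override did not accept pretrained_name, so positional
    # calls from the base-class tests raise TypeError.
    def get_tokenizer(self, **kwargs):
        return super().get_tokenizer(**kwargs)


class FixedMyT5Test(TokenizerTesterMixin):
    # The fix mirrors the canonical pattern: accept pretrained_name
    # positionally and forward it to the mixin.
    def get_tokenizer(self, pretrained_name=None, **kwargs):
        return super().get_tokenizer(pretrained_name, **kwargs)


try:
    BrokenMyT5Test().get_tokenizer("myt5-checkpoint")
except TypeError as exc:
    print("broken:", exc)

print("fixed:", FixedMyT5Test().get_tokenizer("myt5-checkpoint"))
```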
CI Failures
Before the fixes (feel free to cross-check; these errors are reproducible):
After the fixes (feel free to cross-check):
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.