🚨 Generalize get_decoder() for multimodal and delete redundant code 🔪
#42156
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
model = GPT2LMHeadModel(cfg)
dec = model.get_decoder()

assert dec is model, f"GPT2 get_decoder() should return self (fallback), got {type(dec)}"
```
prev helper didn't cover all edge cases! This should be the base model, if we compare with other LLMs (e.g. llama)
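For context, a minimal sketch of the base-model behavior the comment refers to, using a Llama-style model (the tiny config values are chosen only so the example runs without downloading weights):

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny, illustrative config so the model can be instantiated locally.
cfg = LlamaConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=2,
    num_key_value_heads=2,
    vocab_size=128,
)
model = LlamaForCausalLM(cfg)

# For Llama-style LLMs, get_decoder() returns the inner base model rather than the
# wrapper carrying the LM head, which is the behavior being asked for here.
assert model.get_decoder() is model.model
```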
molbap
left a comment
Very nice unbloating 🔪
OK for me, it would just be cool to add this to the make style / ruff rules / quality check to reduce cognitive load
```python
    Symmetric setter. Mirrors the lookup logic used in `get_encoder`.
    """

# NOTE: new models need to use existing names for layers if possible, so this list doesn't grow infinitely
```
To note, this should be enforced in the code-consistency part of make fixup to save ourselves the hassle
hmm, isn't it going to be a huge limitation for contributors if we force it and auto-rename with fix-copies? Imo we need to communicate it when reviewing and explain why it's important. It's only a few ppl reviewing VLMs currently, so it might be easier
I was thinking the updated make fixup message (or rather the code-quality check on the CI, same thing) would be informative enough, saying "decoder layer names should be part of this list: ..." rather than auto-renaming. Could be a ruff warning if we think it's too restrictive as an error?
Hmm, lemme see where I can fit this in a non-disruptive way. Not sure if users actually read the warnings, we should be more strict in review process in any case imo 😆
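For illustration only, a hypothetical sketch of the kind of code-consistency check discussed in this thread; the attribute list, function name, and regex are made up and are not part of make fixup:

```python
import re
from pathlib import Path

# Illustrative allow-list; the real list would live next to the get_decoder() helper.
ALLOWED_DECODER_ATTRS = {"model", "language_model", "text_model", "decoder"}

def check_decoder_attr_names(path: Path) -> list[str]:
    """Report modeling files that store a decoder-like module under a non-standard name."""
    problems = []
    src = path.read_text()
    # Naive, illustrative pattern: `self.<name> = SomethingTextModel(...)` style assignments.
    for match in re.finditer(r"self\.(\w+)\s*=\s*\w*(?:Text|Language)Model\(", src):
        attr = match.group(1)
        if attr not in ALLOWED_DECODER_ATTRS:
            problems.append(
                f"{path}: decoder stored as `self.{attr}`; expected one of {sorted(ALLOWED_DECODER_ATTRS)}"
            )
    return problems
```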
Merge conflicts after a big refactor 😢
[For maintainers] Suggested jobs to run (before merge): run-slow: aria, autoformer, aya_vision, bart, bigbird_pegasus, blenderbot, blenderbot_small, blip_2, cohere2_vision, colqwen2, conditional_detr, d_fine, dab_detr, deformable_detr, detr, dia
hey @jackzhxng, I remember you requested this feature for

```python
multimodal_model.get_decoder()                   # returns the decoding LM
multimodal_model.get_encoder()                   # returns the encoding LM if any
multimodal_model.get_encoder(modality="image")   # returns the encoding vision tower if any
multimodal_model.get_encoder(modality="audio")   # returns the encoding audio tower if any
```

also cc @hmellor, we also discussed it re vLLM
Hi @zucchini-nlp, this PR causes an issue with PEFT, as (at least some) decoder models now have a `get_encoder` method:

```python
from transformers import AutoModelForCausalLM

model_id = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_id)
assert not hasattr(model, "get_encoder")
# after this PR, model.get_encoder() returns model.model
```

This works with the previous commit (…).
@zucchini-nlp this is amazing thank you!
@BenjaminBossan ideally it should return `self` in that case, I'll open a PR to fix it. All models will have `get_encoder` from now on.
Great, please let me know when the PR is there.
We can modify PEFT to take this into account. But at least to me, this API feels a bit strange to be honest.
@BenjaminBossan yeah, it is because we have …
When using mixed adapter batches (i.e. using different LoRA adapters in the same batch), users have to pass `adapter_names`. When simultaneously using beam search, these adapter names have to be extended by the number of beams. For encoder-decoder models, even when applying beam search, the encoder part of the model should, however, not use the extended `adapter_names`. This is because the encoder still uses the original, non-extended samples.

The need for this used to be checked by calling `model.get_encoder()`. However, with transformers v5, every `PretrainedModel` will have a `get_encoder` method. The new convention is that it will return `self` if there is no encoder. This is now what's being checked. huggingface/transformers#42156

Note that said PR contains a small bug that leads to `self` not always being returned. Therefore, for the full fix of the issue on transformers main, we also need to await this PR: huggingface/transformers#42295
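A minimal sketch of what that check can look like under the new convention described above; the helper name is illustrative, not PEFT's actual code:

```python
def model_uses_separate_encoder(model) -> bool:
    # Under the new convention, get_encoder() exists on every PretrainedModel and
    # returns `self` when there is no separate encoder, so an identity check suffices.
    return model.get_encoder() is not model
```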
What does this PR do?
As per the title, blocked by #41589 for VLMs! We should be able to use `get_decoder()` to get the LM part of any model after this and have much less duplicate code. Same goes for `get_encoder()` to get the encoder if the model has a separate encoding module. In comparison to the decoder, we can have a specific encoder per modality, so the helper will accept `modality` as an argument.

A universal helper first of all reduces duplicate code, nudges us to use standardized names for major modules, and can be used by third-party libraries. Right now we have 5 ways to name a vision encoder!
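A minimal sketch of the kind of generalized lookup this describes; the attribute names and priority order are illustrative, not the exact list in the PR:

```python
# Illustrative list of standardized attribute names, probed in priority order.
POSSIBLE_DECODER_NAMES = ("language_model", "text_model", "model", "decoder")

def get_decoder(model):
    """Return the LM part of `model`, falling back to `model` itself when no
    standardized submodule name is found."""
    for name in POSSIBLE_DECODER_NAMES:
        submodule = getattr(model, name, None)
        if submodule is not None:
            return submodule
    return model
```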
🚨 Breaking changes (I guess we can break helpers for v5):
- `self.language_model` is no longer available directly from the task model and users will need to call `self.get_decoder()`
- Deleted `get_text_encoder` and `get_audio_encoder` in some audio models because the functionality is now covered by `get_encoder()`