[Weight Converter] More fine-grained mappings on classes, scoping for every transform (including weight converter) #45661
Conversation
ArthurZucker
left a comment
Makes a lot of sense to have a class-based mapping!
for module_name, submodule in model.named_modules():
    if not isinstance(submodule, PreTrainedModel):
        continue
this is something we use a lot across the repo, let's store it once at init time please! 🤗
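As a rough illustration of that suggestion (the helper name and where the result is stored are hypothetical, not the repo's actual API), the scan could be done once and reused:

```python
from transformers import PreTrainedModel

# Hypothetical helper: collect the PreTrainedModel sub-modules a single time
# (e.g. at init / post_init) instead of re-walking named_modules() on every call.
def collect_pretrained_submodules(model: PreTrainedModel) -> dict[str, PreTrainedModel]:
    return {
        name: module
        for name, module in model.named_modules()
        if isinstance(module, PreTrainedModel)
    }

# Later callers would iterate the cached dict rather than rescanning:
# for module_name, submodule in model._pretrained_submodules.items(): ...
```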
source_key: str,
weight_renamings: list[WeightRenaming],
weight_converters: list[WeightConverter],
weight_transforms: list[WeightTransform],
I think vllm needed a separation from both
I don't see any call to rename_source_key in vllm? Not sure if I understood the comment.
Transforms are applied in their natural interleaved order (the order they appear in the list).
When a ``WeightConverter`` matches, it is recorded as the source pattern and remaining
``WeightRenaming`` transforms continue to run, which is required when a scoped
``WeightConverter`` must fire *before* a renaming that strips the scope prefix.
This is extremely important, can you give an example in the doc? This is a departure from what we had before, which was somewhat clearer.
Semi-related, I am refactoring a bit.
Cyrilvallez
left a comment
Just a few thoughts, before checking the exact mappings (where we need to be EXTREMELY careful)!
if model_type is not None:
    model_specific_conversions = get_checkpoint_conversion_mapping(model_type)
    # In this case, add the prefix to `PrefixChange` instances, in order to know where to add/remove the prefix
    if model_specific_conversions is not None and model_prefix != "":
        for i, conversion in enumerate(model_specific_conversions):
            # In this case, add the prefix, as otherwise we don't know where we need to re-add it exactly in the module name chain
            if isinstance(conversion, PrefixChange):
                model_specific_conversions[i] = conversion.with_submodel_prefix(model_prefix)
    return model_specific_conversions
return None
Why remove the prefix when using the sub-models registry? For now it was only added for PrefixChange, but logically it would make sense for all Transforms.
is_root_model = module_name == ""
if not is_root_model:
    # Scope each transform so it only matches keys under this sub-module's prefix.
    for transform in conversions:
        transform.scope_prefix = module_name
weight_conversions.extend(conversions)
Ha yes ok, see my previous comment as well. I think it was cleaner with the with_submodel_prefix (that we can maybe rename to from_submodel_prefix?), instead of adding the attribute like this. I was already thinking about adding it for all Transforms before
# When scoped, only process keys under the prefix; patterns operate on the bare suffix.
prefix_dot = None
key_to_match = source_key
if self.scope_prefix is not None:
    prefix_dot = self.scope_prefix + "."
    if not source_key.startswith(prefix_dot):
        return source_key, None
    key_to_match = source_key[len(prefix_dot) :]
See previous comment, I think rebuilding the Transform directly with the prefix is cleaner
It adds a lot of complexity though, as adding a prefix to a pattern can break the pattern (e.g. if a child source pattern anchors on the start of the key with ^, prepending a prefix breaks the match).
Removing the prefix before the search is more robust imo.
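A minimal standalone illustration of this point (plain `re`, not the actual Transform code): a child pattern anchored with `^` stops matching once the prefix is glued onto the pattern, whereas stripping the prefix from the key keeps the original pattern valid.

```python
import re

child_pattern = r"^q_proj"          # written against the child model's own key space
scope_prefix = "model.vlm"
full_key = "model.vlm.q_proj.weight"

# Rebuilding the pattern with the prefix leaves "^" stranded mid-pattern: it never matches.
print(re.search(re.escape(scope_prefix + ".") + child_pattern, full_key))  # None

# Stripping the prefix from the key first, then matching the original pattern, works.
suffix = full_key[len(scope_prefix) + 1:]
print(re.search(child_pattern, suffix))  # matches 'q_proj'
```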
for transform in weight_transforms:
    if isinstance(transform, WeightConverter):
        if source_pattern is not None:
            # Already matched a converter; skip subsequent converters.
            continue
        renamed_key, sp = transform.rename_source_key(renamed_key)
        if sp is not None:
            source_pattern = sp
    else:
        renamed_key, _ = transform.rename_source_key(renamed_key)
WeightRenaming and WeightConverter are, and should be, independent of each other. If a WeightConverter matches, it is responsible for the renaming as well, so any following WeightRenaming in the list should NOT match.
If you have a counter-example, I'm happy to see it, but it probably means the conversion is ill-written, or the logic collapses somewhere.
I don't think we can really enforce this independence for nested models: you can end up with a root rename and a scoped converter chained on the same key. E.g. a composite model that maps old_prefix → model.vlm at the root, and whose model.vlm sub-module has a q_proj → qkv_proj converter:
# load: old_prefix.q_proj →[rename]→ model.vlm.q_proj →[converter]→ model.vlm.qkv_proj
# save (list inverted):
# model.vlm.qkv_proj →[rev converter]→ model.vlm.q_proj →[rev rename]→ old_prefix.q_proj

Any fixed two-phase ordering breaks one direction: "all renames first" works on load, but on save the reversed rename fires before the reversed converter sees model.vlm.*, so the converter misses. "All converters first" is the mirror problem. Interleaved list order is the only way both directions are correct.
Btw I have pretty much this exact scenario in the RF DETR PR
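A toy sketch of that scenario (hand-rolled names, not the PR's actual classes), just to show that walking one interleaved list forward for loading, and inverted plus reversed for saving, keeps both directions consistent:

```python
# Each entry is (kind, source_fragment, target_fragment); "kind" is only illustrative.
transforms = [
    ("rename", "old_prefix", "model.vlm"),   # root-level rename
    ("convert", "q_proj", "qkv_proj"),       # converter scoped under model.vlm
]

def map_key(key: str, invert: bool = False) -> str:
    steps = [(dst, src) if invert else (src, dst) for _, src, dst in transforms]
    if invert:
        steps = steps[::-1]                  # saving walks the inverted list in reverse
    for src, dst in steps:
        key = key.replace(src, dst)
    return key

print(map_key("old_prefix.q_proj"))                 # load: model.vlm.qkv_proj
print(map_key("model.vlm.qkv_proj", invert=True))   # save: old_prefix.q_proj
```

With a fixed "all renames first" order, the save direction would strip model.vlm back to old_prefix before the reversed converter ever sees a model.vlm.* key.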
Cc @zucchini-nlp for viz: this PR allows removing a lot of test_reverse_mapping overrides, and also fixes loading/saving weights for all the base VLM models (e.g. LlavaModel, currently broken on main).
run-slow: aya_vision, colpali, colqwen2, conditional_detr, detr, emu3, fuyu, gemma3, got_ocr2, internvl, llava, llava_next, llava_next_video, llava_onevision, maskformer, mistral3, mllama, paligemma, qwen2_5_vl, qwen2_vl, shieldgemma2, video_llava, vipllava, pp_chart2table
What does this PR do?
This PR aims to solve three different issues with the existing conversion mapping system.
1. The registry only accepted `model_type` strings. This made it impossible to have different conversions for e.g. `DetrForSegmentation` vs `DetrForObjectDetection`: both map to "detr", so the segmentation-specific mask-head and bbox-attention renames ended up polluting the shared entry and running on all DETR variants. The registry now accepts class names too, with the class name taking priority over `model_type` during lookup (a rough sketch of the lookup order is shown after this list).
2. Sub-model transforms weren't scoped, so a child model's mapping could leak into its parent's: a `WeightRenaming` registered for a child `ViTModel` loaded with `AutoModel` would be applied to the full key space of whatever model contained it. `PrefixChange` had a partial fix (`with_submodel_prefix`) but plain renames were left global. This PR adds a `scope_prefix` field on `WeightTransform`: when set, `rename_source_key` strips the prefix before matching and re-attaches it after. All sub-module transforms get their `scope_prefix` set in `get_model_conversion_mapping`.
3. `rename_source_key` previously applied all renamings first, then converters as a separate phase, regardless of their registered order. The function now takes a single `weight_transforms` list and processes it in order, so renames and converters naturally interleave as intended.

Mappings with regex anchored on the beginning of the string (`^`) would be ignored in a child model, which is not what we want imo. This issue was hiding some other inconsistencies in the current mappings for VLMs (which still cause issues on the main branch; for example, we get missing weights when loading llava weights in `LlavaModel` on main). This is now fixed in `conversion_mappings`, with class-specific mappings to distinguish base model mappings and `ForConditionalGeneration` mappings in VLMs.
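As a rough sketch of the new lookup priority from point 1 (the registry contents and the helper name `lookup_conversion_mapping` are illustrative, not the actual implementation):

```python
# Illustrative registry: a shared model_type entry plus a class-specific one that
# carries the segmentation-only renames without polluting the "detr" entry.
CONVERSION_REGISTRY = {
    "detr": ["<shared detr transforms>"],
    "DetrForSegmentation": ["<shared detr transforms>", "<mask-head / bbox-attention renames>"],
}

def lookup_conversion_mapping(model_type: str | None, class_name: str | None = None):
    # Class name takes priority over model_type during lookup.
    if class_name is not None and class_name in CONVERSION_REGISTRY:
        return CONVERSION_REGISTRY[class_name]
    if model_type is not None:
        return CONVERSION_REGISTRY.get(model_type)
    return None

print(lookup_conversion_mapping("detr", "DetrForSegmentation"))    # class-specific entry
print(lookup_conversion_mapping("detr", "DetrForObjectDetection")) # falls back to "detr"
```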