[Weight Converter] More fine-grained mappings on classes, scoping for every transforms (including weight converter) #45661

Open
yonigozlan wants to merge 10 commits into huggingface:main from yonigozlan:improve-weight-converter

Conversation

@yonigozlan
Member

@yonigozlan yonigozlan commented Apr 27, 2026

What does this PR do?

This PR aims to solve four different issues with the existing conversion mapping system.

  • The registry only accepted model_type strings. This made it impossible to have different conversions for e.g. DetrForSegmentation vs DetrForObjectDetection, since both map to "detr": the segmentation-specific mask-head and bbox-attention renames ended up polluting the shared entry and running on all DETR variants. The registry now accepts class names too, with the class name taking priority over model_type during lookup.

  • Sub-model transforms weren't scoped, so a child model's mapping could leak into its parent's: a WeightRenaming registered for a child ViTModel loaded with AutoModel would be applied to the full key space of whatever model contained it. PrefixChange had a partial fix (with_submodel_prefix), but plain renames were left global. This PR adds a scope_prefix field on WeightTransform: when set, rename_source_key strips the prefix before matching and re-attaches it afterwards (see the sketch after this list). All sub-module transforms get their scope_prefix set in get_model_conversion_mapping.

  • rename_source_key previously applied all renamings first, then converters as a separate phase, regardless of their registered order. The function now takes a single weight_transforms list and processes it in order, so renames and converters naturally interleave as intended.

  • Mappings with regexes anchored to the beginning of the string ("^") would be ignored in a child model, which is not what we want imo. This issue was hiding other inconsistencies in the current VLM mappings (which still cause issues on the main branch; for example, we get missing weights when loading Llava weights into LlavaModel on main). This is now fixed in conversion_mappings, with class-specific mappings to distinguish base-model mappings from ForConditionalGeneration mappings in VLMs.
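
Here is a minimal sketch of the scoping behavior described above. `ScopedRenaming` and its single-string `rename_source_key` are illustrative stand-ins for the PR's `WeightTransform.scope_prefix` handling, not the actual class or signature:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScopedRenaming:
    """Illustrative stand-in for a scoped WeightRenaming (simplified signature)."""

    source_pattern: str                   # regex matched against the (possibly stripped) key
    target_pattern: str                   # replacement used when the pattern matches
    scope_prefix: Optional[str] = None    # set for sub-module transforms in the real mapping

    def rename_source_key(self, source_key: str) -> str:
        key_to_match = source_key
        prefix_dot = None
        if self.scope_prefix is not None:
            prefix_dot = self.scope_prefix + "."
            # Keys outside the scoped sub-module are left untouched.
            if not source_key.startswith(prefix_dot):
                return source_key
            key_to_match = source_key[len(prefix_dot):]
        renamed = re.sub(self.source_pattern, self.target_pattern, key_to_match)
        # Re-attach the prefix after matching on the bare suffix.
        return prefix_dot + renamed if prefix_dot is not None else renamed


# A "^"-anchored rename registered for a child vision model only fires under its prefix:
rename = ScopedRenaming(r"^encoder\.", "vit_encoder.", scope_prefix="vision_tower")
print(rename.rename_source_key("vision_tower.encoder.layer.0.weight"))
# -> vision_tower.vit_encoder.layer.0.weight
print(rename.rename_source_key("language_model.encoder.layer.0.weight"))
# -> unchanged: the key is outside the "vision_tower" scope
```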

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yonigozlan yonigozlan changed the title [Weight Converter] More fine-grained mappings on classes, scoping for every transforms [Weight Converter] More fine-grained mappings on classes, scoping for every transforms (including weight converter) Apr 27, 2026
Collaborator

@ArthurZucker ArthurZucker left a comment

Makes a lot of sense to have a class-based mapping!

Comment thread: src/transformers/conversion_mapping.py (Outdated)
Comment on lines +681 to +683
for module_name, submodule in model.named_modules():
if not isinstance(submodule, PreTrainedModel):
continue
Collaborator

this is something we use a lot across the repo, let's store it once at init time please! 🤗

source_key: str,
weight_renamings: list[WeightRenaming],
weight_converters: list[WeightConverter],
weight_transforms: list[WeightTransform],
Collaborator

I think vLLM needed a separation between the two

Member Author

I don't see any call to rename_source_key in vLLM? Not sure I understood the comment.

Comment thread: src/transformers/core_model_loading.py (Outdated)
Comment on lines +1134 to +1137
Transforms are applied in their natural interleaved order (the order they appear in the list).
When a ``WeightConverter`` matches, it is recorded as the source pattern and remaining
``WeightRenaming`` transforms continue to run, which is required when a scoped
``WeightConverter`` must fire *before* a renaming that strips the scope prefix.
Collaborator

This is extremely important, can you give an example in the doc? This is a departure from what we had before, and the old behavior was kinda clearer.

Member Author

Done!

@tarekziade
Collaborator

Semi-related: I am refactoring WeightConverter a bit, see #45635.

Member

@Cyrilvallez Cyrilvallez left a comment

Just a few thoughts, before checking the exact mappings (where we need to be EXTREMELY careful)!

Comment on lines -638 to -647
if model_type is not None:
model_specific_conversions = get_checkpoint_conversion_mapping(model_type)
# In this case, add the prefix to `PrefixChange` instances, in order to know where to add/remove the prefix
if model_specific_conversions is not None and model_prefix != "":
for i, conversion in enumerate(model_specific_conversions):
# In this case, add the prefix, as otherwise we don't know where we need to re-add it exactly in the module name chain
if isinstance(conversion, PrefixChange):
model_specific_conversions[i] = conversion.with_submodel_prefix(model_prefix)
return model_specific_conversions
return None
Member

Why remove the prefix when using the submodels registry? For now it was only added for PrefixChange, but logically it would make sense for all Transforms.

Comment on lines +724 to +729
is_root_model = module_name == ""
if not is_root_model:
# Scope each transform so it only matches keys under this sub-module's prefix.
for transform in conversions:
transform.scope_prefix = module_name
weight_conversions.extend(conversions)
Member

Ha yes ok, see my previous comment as well. I think it was cleaner with with_submodel_prefix (which we could maybe rename to from_submodel_prefix?), instead of adding the attribute like this. I was already thinking about adding it for all Transforms before.

Comment thread: src/transformers/core_model_loading.py (Outdated)
Comment on lines +687 to +694
# When scoped, only process keys under the prefix; patterns operate on the bare suffix.
prefix_dot = None
key_to_match = source_key
if self.scope_prefix is not None:
prefix_dot = self.scope_prefix + "."
if not source_key.startswith(prefix_dot):
return source_key, None
key_to_match = source_key[len(prefix_dot) :]
Member

See previous comment, I think rebuilding the Transform directly with the prefix is cleaner

Member Author

@yonigozlan yonigozlan Apr 28, 2026

It adds a lot of complexity though, as adding a prefix to a pattern can break the pattern (e.g. if we match only the start of a key with ^ in a child source pattern, adding a prefix breaks the match).
Removing the prefix before the search is more robust imo.
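
A tiny illustration of that failure mode (the keys and prefix here are hypothetical):

```python
import re

pattern = r"^q_proj"                 # child-model pattern, anchored to the start of the key
key = "model.vlm.q_proj.weight"      # same key as seen from the parent model

# Naively rebuilding the pattern with the prefix breaks the "^" anchor; making it work
# would require rewriting each pattern depending on how it is anchored:
prefixed_pattern = "model.vlm." + pattern       # "model.vlm.^q_proj"
print(re.search(prefixed_pattern, key))         # -> None, the anchor can no longer match

# Stripping the prefix from the key before matching keeps the child pattern intact:
stripped_key = key[len("model.vlm."):]          # "q_proj.weight"
print(re.search(pattern, stripped_key))         # -> matches
```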

Comment on lines +1142 to +1151
for transform in weight_transforms:
if isinstance(transform, WeightConverter):
if source_pattern is not None:
# Already matched a converter; skip subsequent converters.
continue
renamed_key, sp = transform.rename_source_key(renamed_key)
if sp is not None:
source_pattern = sp
else:
renamed_key, _ = transform.rename_source_key(renamed_key)
Member

WeightRenaming and WeightConverter are, and should be, independent from each other. If a WeightConverter matches, it is responsible for the renaming as well, so any following WeightRenaming in the list should NOT match.
If you have a counter-example, happy to see it, but it probably means the conversion is ill-written, or the logic collapses somewhere.

Member Author

@yonigozlan yonigozlan Apr 28, 2026

I don't think we can really enforce this independence for nested models.
For nested models, you can end up with a root rename and a scoped converter chained on the same key. E.g. a composite model that maps old_prefix → model.vlm at the root, and whose model.vlm sub-module has a q_proj → qkv_proj converter:

# load:  old_prefix.q_proj    →[rename]→         model.vlm.q_proj  →[converter]→   model.vlm.qkv_proj
# save (list inverted):
#        model.vlm.qkv_proj  →[rev converter]→  model.vlm.q_proj  →[rev rename]→  old_prefix.q_proj

Any fixed two-phase ordering breaks one direction: "all renames first" works on load but on save the reversed rename fires before the reversed converter sees model.vlm.*, so the converter misses. "All converters first" is the mirror problem. Interleaved list order is the only way both directions are correct.
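
Here is a self-contained toy version of that argument. `KeyMap` and its `reverse()` method are hypothetical simplifications standing in for a rename/converter pair, not the real transform API:

```python
from dataclasses import dataclass


@dataclass
class KeyMap:
    """Toy key transform: rewrites a literal substring, invertible for the save path."""

    source: str
    target: str

    def apply(self, key: str) -> str:
        return key.replace(self.source, self.target)

    def reverse(self) -> "KeyMap":
        return KeyMap(self.target, self.source)


# Registration order: root-level rename first, then the sub-model "converter".
transforms = [
    KeyMap("old_prefix.", "model.vlm."),                  # root rename
    KeyMap("model.vlm.q_proj", "model.vlm.qkv_proj"),     # stands in for the scoped converter
]


def run(key: str, ordered_transforms) -> str:
    for transform in ordered_transforms:
        key = transform.apply(key)
    return key


# Load: process the list in order.
assert run("old_prefix.q_proj", transforms) == "model.vlm.qkv_proj"

# Save: invert each transform and walk the list in reverse order.
inverted = [t.reverse() for t in reversed(transforms)]
assert run("model.vlm.qkv_proj", inverted) == "old_prefix.q_proj"

# A fixed "all renames first" phase ordering on save applies the reversed root rename
# before the reversed converter ever sees a model.vlm.* key, so the converter misses:
assert run("model.vlm.qkv_proj", list(reversed(inverted))) == "old_prefix.qkv_proj"
```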

Btw I have pretty much this exact scenario in the RF DETR PR

@yonigozlan
Member Author

Cc @zucchini-nlp for viz, this PR makes it possible to remove a lot of test_reverse_mapping overrides, and also fixes loading/saving weights with all the base VLM models (e.g. LlavaModel, currently broken on main)

@yonigozlan
Member Author

run-slow: aya_vision, colpali, colqwen2, conditional_detr, detr, emu3, fuyu, gemma3, got_ocr2, internvl, llava, llava_next, llava_next_video, llava_onevision, maskformer, mistral3, mllama, paligemma, qwen2_5_vl, qwen2_vl, shieldgemma2, video_llava, vipllava, pp_chart2table

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: aya_vision, colpali, colqwen2, emu3, fuyu, gemma3, got_ocr2, internvl, llava, llava_next, llava_next_video, llava_onevision, mistral3, mllama, paligemma, qwen2_5_vl

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/aya_vision", "models/colpali", "models/colqwen2", "models/conditional_detr", "models/detr", "models/emu3", "models/fuyu", "models/gemma3", "models/got_ocr2", "models/internvl", "models/llava", "models/llava_next", "models/llava_next_video", "models/llava_onevision", "models/maskformer", "models/mistral3", "models/mllama", "models/paligemma", "models/pp_chart2table", "models/qwen2_5_vl", "models/qwen2_vl", "models/shieldgemma2", "models/video_llava", "models/vipllava"]
quantizations: []

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45661&sha=f0eeb8
