[Weight Converter] More fine-grained mappings on classes, scoping for every transform (including weight converter) #45661
Conversation
ArthurZucker
left a comment
Makes a lot of sense to have a class-based mapping!
for module_name, submodule in model.named_modules():
    if not isinstance(submodule, PreTrainedModel):
        continue
this is something we use a lot across the repo, let's store it once at init time please! 🤗
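As a rough illustration of that suggestion (the helper name and where the result is stored are hypothetical, not the repo's actual API), the scan could be done once and reused:

```python
from transformers import PreTrainedModel

# Hypothetical helper: collect the PreTrainedModel sub-modules a single time
# (e.g. at init / post_init) instead of re-walking named_modules() on every call.
def collect_pretrained_submodules(model: PreTrainedModel) -> dict[str, PreTrainedModel]:
    return {
        name: module
        for name, module in model.named_modules()
        if isinstance(module, PreTrainedModel)
    }

# Later callers would iterate the cached dict rather than rescanning:
# for module_name, submodule in model._pretrained_submodules.items(): ...
```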
source_key: str,
weight_renamings: list[WeightRenaming],
weight_converters: list[WeightConverter],
weight_transforms: list[WeightTransform],
I think vllm needed a separation from both
I don't see any call to rename_source_key in vllm? Not sure if I understood the comment.
Transforms are applied in their natural interleaved order (the order they appear in the list).
When a ``WeightConverter`` matches, it is recorded as the source pattern and remaining
``WeightRenaming`` transforms continue to run, which is required when a scoped
``WeightConverter`` must fire *before* a renaming that strips the scope prefix.
This is extremely important, can you give an example in the doc? This is a departure from what we had before, which was somewhat clearer.
Semi-related, I am refactoring a bit.
Cyrilvallez
left a comment
Just a few thoughts, before checking the exact mappings (where we need to be EXTREMELY careful)!
if model_type is not None:
    model_specific_conversions = get_checkpoint_conversion_mapping(model_type)
    # In this case, add the prefix to `PrefixChange` instances, in order to know where to add/remove the prefix
    if model_specific_conversions is not None and model_prefix != "":
        for i, conversion in enumerate(model_specific_conversions):
            # In this case, add the prefix, as otherwise we don't know where we need to re-add it exactly in the module name chain
            if isinstance(conversion, PrefixChange):
                model_specific_conversions[i] = conversion.with_submodel_prefix(model_prefix)
    return model_specific_conversions
return None
Why remove the prefix when using the sub-models registry? For now it was only added for PrefixChange, but logically it would make sense for all Transforms.
is_root_model = module_name == ""
if not is_root_model:
    # Scope each transform so it only matches keys under this sub-module's prefix.
    for transform in conversions:
        transform.scope_prefix = module_name
weight_conversions.extend(conversions)
Ha yes ok, see my previous comment as well. I think it was cleaner with the with_submodel_prefix (that we can maybe rename to from_submodel_prefix?), instead of adding the attribute like this. I was already thinking about adding it for all Transforms before
# When scoped, only process keys under the prefix; patterns operate on the bare suffix.
prefix_dot = None
key_to_match = source_key
if self.scope_prefix is not None:
    prefix_dot = self.scope_prefix + "."
    if not source_key.startswith(prefix_dot):
        return source_key, None
    key_to_match = source_key[len(prefix_dot) :]
See previous comment, I think rebuilding the Transform directly with the prefix is cleaner
It adds a lot of complexity though, as adding a prefix to a pattern can break the pattern (e.g. if a child source pattern anchors on the start of the key with ^, prepending a prefix breaks the match).
Removing the prefix before the search is more robust imo.
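A minimal standalone illustration of this point (plain `re`, not the actual Transform code): a child pattern anchored with `^` stops matching once the prefix is glued onto the pattern, whereas stripping the prefix from the key keeps the original pattern valid.

```python
import re

child_pattern = r"^q_proj"          # written against the child model's own key space
scope_prefix = "model.vlm"
full_key = "model.vlm.q_proj.weight"

# Rebuilding the pattern with the prefix leaves "^" stranded mid-pattern: it never matches.
print(re.search(re.escape(scope_prefix + ".") + child_pattern, full_key))  # None

# Stripping the prefix from the key first, then matching the original pattern, works.
suffix = full_key[len(scope_prefix) + 1:]
print(re.search(child_pattern, suffix))  # matches 'q_proj'
```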
for transform in weight_transforms:
    if isinstance(transform, WeightConverter):
        if source_pattern is not None:
            # Already matched a converter; skip subsequent converters.
            continue
        renamed_key, sp = transform.rename_source_key(renamed_key)
        if sp is not None:
            source_pattern = sp
    else:
        renamed_key, _ = transform.rename_source_key(renamed_key)
WeightRenaming and WeightConverter are, and should be, independent of each other. If a WeightConverter matches, it is responsible for the renaming as well, so any following WeightRenaming in the list should NOT match.
If you have a counter-example, I'm happy to see it, but it probably means the conversion is ill-written, or the logic collapses somewhere.
I don't think we can really enforce this independence for nested models: you can end up with a root rename and a scoped converter chained on the same key. E.g. a composite model that maps old_prefix → model.vlm at the root, and whose model.vlm sub-module has a q_proj → qkv_proj converter:
# load: old_prefix.q_proj →[rename]→ model.vlm.q_proj →[converter]→ model.vlm.qkv_proj
# save (list inverted):
# model.vlm.qkv_proj →[rev converter]→ model.vlm.q_proj →[rev rename]→ old_prefix.q_proj

Any fixed two-phase ordering breaks one direction: "all renames first" works on load, but on save the reversed rename fires before the reversed converter sees model.vlm.*, so the converter misses. "All converters first" is the mirror problem. Interleaved list order is the only way both directions are correct.
Btw I have pretty much this exact scenario in the RF DETR PR
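A toy sketch of that scenario (hand-rolled names, not the PR's actual classes), just to show that walking one interleaved list forward for loading, and inverted plus reversed for saving, keeps both directions consistent:

```python
# Each entry is (kind, source_fragment, target_fragment); "kind" is only illustrative.
transforms = [
    ("rename", "old_prefix", "model.vlm"),   # root-level rename
    ("convert", "q_proj", "qkv_proj"),       # converter scoped under model.vlm
]

def map_key(key: str, invert: bool = False) -> str:
    steps = [(dst, src) if invert else (src, dst) for _, src, dst in transforms]
    if invert:
        steps = steps[::-1]                  # saving walks the inverted list in reverse
    for src, dst in steps:
        key = key.replace(src, dst)
    return key

print(map_key("old_prefix.q_proj"))                 # load: model.vlm.qkv_proj
print(map_key("model.vlm.qkv_proj", invert=True))   # save: old_prefix.q_proj
```

With a fixed "all renames first" order, the save direction would strip model.vlm back to old_prefix before the reversed converter ever sees a model.vlm.* key.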
Cc @zucchini-nlp for viz: this PR allows removing a lot of test_reverse_mapping overrides, and also fixes loading/saving weights for all the base VLM models (e.g. LlavaModel, currently broken on main).
run-slow: aya_vision, colpali, colqwen2, conditional_detr, detr, emu3, fuyu, gemma3, got_ocr2, internvl, llava, llava_next, llava_next_video, llava_onevision, maskformer, mistral3, mllama, paligemma, qwen2_5_vl, qwen2_vl, shieldgemma2, video_llava, vipllava, pp_chart2table
What does this PR do?
This PR aims to solve three different issues with the existing conversion mapping system.
1. The registry only accepted `model_type` strings. This made it impossible to have different conversions for e.g. `DetrForSegmentation` vs `DetrForObjectDetection`: both map to "detr", so the segmentation-specific mask-head and bbox-attention renames ended up polluting the shared entry and running on all DETR variants. The registry now accepts class names too, with the class name taking priority over `model_type` during lookup (a rough sketch of the lookup order is shown after this list).
2. Sub-model transforms weren't scoped, so a child model's mapping could leak into its parent's: a `WeightRenaming` registered for a child `ViTModel` loaded with `AutoModel` would be applied to the full key space of whatever model contained it. `PrefixChange` had a partial fix (`with_submodel_prefix`) but plain renames were left global. This PR adds a `scope_prefix` field on `WeightTransform`: when set, `rename_source_key` strips the prefix before matching and re-attaches it after. All sub-module transforms get their `scope_prefix` set in `get_model_conversion_mapping`.
3. `rename_source_key` previously applied all renamings first, then converters as a separate phase, regardless of their registered order. The function now takes a single `weight_transforms` list and processes it in order, so renames and converters naturally interleave as intended.

Mappings with regex anchored on the beginning of the string (`^`) would be ignored in a child model, which is not what we want imo. This issue was hiding some other inconsistencies in the current mappings for VLMs (which still cause issues on the main branch; for example, we get missing weights when loading llava weights in `LlavaModel` on main). This is now fixed in `conversion_mappings`, with class-specific mappings to distinguish base model mappings and `ForConditionalGeneration` mappings in VLMs.
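As a rough sketch of the new lookup priority from point 1 (the registry contents and the helper name `lookup_conversion_mapping` are illustrative, not the actual implementation):

```python
# Illustrative registry: a shared model_type entry plus a class-specific one that
# carries the segmentation-only renames without polluting the "detr" entry.
CONVERSION_REGISTRY = {
    "detr": ["<shared detr transforms>"],
    "DetrForSegmentation": ["<shared detr transforms>", "<mask-head / bbox-attention renames>"],
}

def lookup_conversion_mapping(model_type: str | None, class_name: str | None = None):
    # Class name takes priority over model_type during lookup.
    if class_name is not None and class_name in CONVERSION_REGISTRY:
        return CONVERSION_REGISTRY[class_name]
    if model_type is not None:
        return CONVERSION_REGISTRY.get(model_type)
    return None

print(lookup_conversion_mapping("detr", "DetrForSegmentation"))    # class-specific entry
print(lookup_conversion_mapping("detr", "DetrForObjectDetection")) # falls back to "detr"
```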