[loading] Fix Transpose Operation, and qwen3_vl_moe mapping #43307
Conversation
```python
def get_target_pattern(
    self, input_dict: dict[str, torch.Tensor], source_patterns: list[str], target_patterns: list[str]
) -> str:
    if len(input_dict) != 1:
        raise ValueError("Undefined Operation encountered!")
    if len(target_patterns) > 1:
        # Here it's the first operation of a chain, so return the source
        if len(source_patterns) == 1:
            return source_patterns[0]
        else:
            raise ValueError("Undefined Operation encountered!")
    # Here it's the only operation, or the last operation in a chain, so we return the target
    else:
        return target_patterns[0]
```
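For illustration, here is a minimal, standalone recreation of this resolution logic with hypothetical weight patterns (none of the pattern strings below come from the PR), showing when the source vs. the target pattern is returned:

```python
import torch

def resolve_pattern(input_dict, source_patterns, target_patterns):
    # Standalone copy of the logic above, for illustration only.
    if len(input_dict) != 1:
        raise ValueError("Undefined Operation encountered!")
    if len(target_patterns) > 1:
        # First operation of a chain: the single source pattern names the tensor.
        if len(source_patterns) == 1:
            return source_patterns[0]
        raise ValueError("Undefined Operation encountered!")
    # Only operation, or last operation of a chain: the single target pattern names the tensor.
    return target_patterns[0]

weights = {"weight": torch.zeros(2, 2)}

# Only (or last) op in a chain: one target -> resolves to the target pattern.
print(resolve_pattern(weights, ["model.attn.qkv"], ["model.self_attn.q_proj"]))

# First op of a chain fanning out into several targets -> resolves to the source pattern.
print(resolve_pattern(weights, ["model.attn.qkv"], ["q_proj", "k_proj", "v_proj"]))
```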
So it has to either be the first or the last in a chain of ops? I can see the dequantization ops breaking this as they extend the chain 🥲 Correct me if I'm wrong.
Nope, should be alright with quantization! They do not change the targets/sources!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ArthurZucker left a comment:
Sounds good but concerned about TP + Transpose, we need a more abstract solution like updating the shard dim based on op at init of ops?
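As a rough sketch of that idea (the helper below is hypothetical and not part of the PR), the shard dim would simply follow the transposed dims:

```python
def remap_shard_dim(shard_dim: int, dim0: int, dim1: int) -> int:
    # If the sharded dim is one of the transposed dims, it ends up on the other one.
    if shard_dim == dim0:
        return dim1
    if shard_dim == dim1:
        return dim0
    return shard_dim

# A weight sharded on dim 0 that goes through Transpose(0, 1) would need its
# TP plan updated to shard on dim 1 instead.
assert remap_shard_dim(0, 0, 1) == 1
assert remap_shard_dim(2, 0, 1) == 2  # untouched dims keep their shard placement
```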
```python
def __init__(self, dim0: int = 0, dim1: int = 1):
    self.dim0 = dim0
    self.dim1 = dim1
```
should we pass the full converter to change the shard dim at init?
Hmm, but aren't you assuming that we need to change the shard dim based on the transpose? At least the transposes that do use it make it so that they are aligned with other models, i.e. they will need to be sharded the same way as intended.
There might be a few models that shard differently, though I would consider this not part of the conversion op; otherwise we will mix in our assumptions 🤔
It's due to the op order: sharding happens before Transpose, so if you transpose the dim that was sharded on, you've got an issue.
Ah I messed up the order, I thought it was transpose then shard
Life would be too nice... 😆
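To make the ordering issue above concrete, here is a hedged, self-contained sketch (the chunk-based sharding below is only a stand-in for the actual TP machinery): a tensor sharded on dim 0 and then transposed per rank no longer matches what sharding the transposed tensor on dim 0 would give.

```python
import torch

full = torch.arange(24).reshape(4, 6)

# Sharding first (as in the current op order): each of 2 ranks holds a (2, 6) row slice.
local_shard = torch.chunk(full, chunks=2, dim=0)[0]
# The Transpose op then runs on the local shard...
sharded_then_transposed = local_shard.transpose(0, 1)  # shape (6, 2)

# ...but transposing the full tensor first and sharding on dim 0 gives a different slice.
transposed_then_sharded = torch.chunk(full.transpose(0, 1), chunks=2, dim=0)[0]  # shape (3, 4)

print(sharded_then_transposed.shape, transposed_then_sharded.shape)  # shapes no longer agree
```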
Actually I checked and the params on which …
Can we add a comment at least, a TODO/FIXME or similar? This is quite important (although not used with TP atm).
Done. I think it's easier this way, as it's not as straightforward as I thought it would be to always switch the shard dim when required.
vasqu left a comment:
Discussed internally; we will need to double-check many of the ops we already have.
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43307&sha=9a937a
What does this PR do?
As per the title: the Transpose operation needs to be more general so that it can be used in reverse mode in a chain of several operations.
Supersedes #43201 as it needed more work on the Transpose Operation!
Will also help #43227
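As a rough sketch of what "reverse mode in a chain" means here (class and method names below are illustrative, not the PR's actual API), reverting a chain applies each op's inverse in reverse order, and a transpose of the same two dims is its own inverse:

```python
import torch

class Transpose:
    """Illustrative transpose op: swaps two dims on convert, swaps them back on revert."""

    def __init__(self, dim0: int = 0, dim1: int = 1):
        self.dim0, self.dim1 = dim0, dim1

    def convert(self, t: torch.Tensor) -> torch.Tensor:
        return t.transpose(self.dim0, self.dim1).contiguous()

    def revert(self, t: torch.Tensor) -> torch.Tensor:
        # transpose(dim0, dim1) is its own inverse
        return t.transpose(self.dim0, self.dim1).contiguous()

class Chain:
    """Illustrative chain of conversion ops."""

    def __init__(self, ops):
        self.ops = ops

    def convert(self, t: torch.Tensor) -> torch.Tensor:
        for op in self.ops:
            t = op.convert(t)
        return t

    def revert(self, t: torch.Tensor) -> torch.Tensor:
        # Reverse mode: undo each op, walking the chain backwards.
        for op in reversed(self.ops):
            t = op.revert(t)
        return t

x = torch.randn(3, 5)
chain = Chain([Transpose(0, 1)])
assert torch.equal(chain.revert(chain.convert(x)), x)
```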