Add a dim check mechanism in Transpose and fix qwen3_vl_moe weight mapping #44037

Merged
Cyrilvallez merged 6 commits into main from qwen3-v2 on Feb 16, 2026
@Cyrilvallez (Member) commented Feb 16, 2026

What does this PR do?

As per the title. Supersedes #43913

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment

IMO much better!

# In this case, check the shapes before transposing
else:
    # NOTE: this relies on the first param name, so it cannot be used for many-to-one operations
    expected_shape = kwargs["model"].get_parameter(kwargs["full_layer_name"]).shape
Collaborator

Good. Should we make the TP op overwrite the shape? Or is the shape you check actually never TPed?

Member Author

TP is still not defined for a standalone Transpose anyway... IMO we should wait for support before making a decision on this.
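
For readers following the thread, here is a minimal sketch of what a shape check before transposing could look like. It is not the code merged in this PR: the helper names `transposed_shape` and `should_transpose` are hypothetical, and the sketch works on plain shape tuples instead of real tensors or the `kwargs["model"]` lookup shown in the excerpt above.

```python
from typing import Sequence, Tuple


def transposed_shape(shape: Sequence[int], dim0: int, dim1: int) -> Tuple[int, ...]:
    """Return the shape obtained by swapping two dimensions (hypothetical helper)."""
    out = list(shape)
    out[dim0], out[dim1] = out[dim1], out[dim0]
    return tuple(out)


def should_transpose(
    actual_shape: Sequence[int],
    expected_shape: Sequence[int],
    dim0: int,
    dim1: int,
) -> bool:
    """Decide whether a checkpoint weight needs transposing (hypothetical helper).

    actual_shape: shape of the tensor found in the checkpoint.
    expected_shape: shape of the matching parameter in the instantiated model.
    """
    actual = tuple(actual_shape)
    expected = tuple(expected_shape)
    if actual == expected:
        # Already in the layout the model expects: leave it alone.
        return False
    if transposed_shape(actual, dim0, dim1) == expected:
        # Swapping the two dims yields the expected layout.
        return True
    raise ValueError(
        f"Shape {actual} matches {expected} neither as-is nor after swapping "
        f"dims {dim0} and {dim1}"
    )


# Example: a (4, 8) checkpoint weight for a parameter the model declares as (8, 4).
needs_swap = should_transpose((4, 8), (8, 4), 0, 1)
```

One limitation of any check of this kind: when the two swapped dimensions have the same size, the shapes match both before and after transposing, so the shape alone cannot tell the layouts apart. The sketch above simply leaves such a tensor untouched.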

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_vl_moe

@Cyrilvallez Cyrilvallez changed the title from "Add a sentinel mechanism in Transpose and fix qwen3_vl_moe weight mapping" to "Add a dim check mechanism in Transpose and fix qwen3_vl_moe weight mapping" on Feb 16, 2026
@Cyrilvallez Cyrilvallez merged commit 2546978 into main Feb 16, 2026
24 of 26 checks passed
@Cyrilvallez Cyrilvallez deleted the qwen3-v2 branch February 16, 2026 16:01
aman-coder03 pushed a commit to aman-coder03/transformers that referenced this pull request Feb 17, 2026
…pping (huggingface#44037)

* start

* test

* betetr

* fix

* change name

* cannot revert