
Fix some MoE routers #43445

Merged
vasqu merged 8 commits into main from fix-broken-eager-experts
Jan 27, 2026

Conversation

@IlyasMoutawwakil
Member

@IlyasMoutawwakil IlyasMoutawwakil commented Jan 23, 2026

What does this PR do?

Same fix as #43288, applied to more MoE routers.
Also fixes phimoe and its integration tests.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

)
return selected_experts, routing_weights.to(hidden_states.dtype)

return selected_experts, routing_weights.to(hidden_states.dtype)
Member Author

it's interesting that this dead code is not caught by styling

router_logits,
jitter_eps=self.router_jitter_noise,
training=self.training,
router_logits, jitter_eps=self.router_jitter_noise, training=self.training, top_k=self.top_k
Member Author

top_k was not passed here, so it silently defaulted to 2
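
For context, a minimal sketch (hypothetical, not the actual phimoe routing function) of why forwarding top_k matters:

```python
import torch
import torch.nn.functional as F


def route_tokens(router_logits: torch.Tensor, top_k: int = 2):
    # If the caller forgets to pass top_k, the default of 2 is silently used,
    # even when the config asks for a different number of experts per token.
    routing_weights = F.softmax(router_logits, dim=-1, dtype=torch.float)
    routing_weights, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
    routing_weights = routing_weights / routing_weights.sum(dim=-1, keepdim=True)
    return selected_experts, routing_weights


router_logits = torch.randn(4, 8)  # (num_tokens, num_experts)

# Buggy call: top_k is not forwarded, so every token is routed to 2 experts.
experts_buggy, _ = route_tokens(router_logits)

# Fixed call: forward the configured value explicitly, as the diff above does.
experts_fixed, _ = route_tokens(router_logits, top_k=4)
```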

@IlyasMoutawwakil
Member Author

run-slow: hunyuan_v1_moe, phimoe

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/hunyuan_v1_moe", "models/phimoe"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@IlyasMoutawwakil
Member Author

Both models' integration tests pass locally now.

Comment on lines +554 to +558
# Phimoe uses nn.LayerNorm
self.input_layernorm = nn.LayerNorm(config.hidden_size, eps=config.rms_norm_eps, elementwise_affine=True)
self.post_attention_layernorm = nn.LayerNorm(
config.hidden_size, eps=config.rms_norm_eps, elementwise_affine=True
)
Member Author

Phimoe uses nn.LayerNorm with bias
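
To illustrate (a simplified sketch, not the exact phimoe modules): LayerNorm subtracts the mean and carries a bias term, while an RMSNorm-style norm does neither, so running a LayerNorm checkpoint through RMSNorm silently changes every layer's output.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    # Minimal RMSNorm for comparison: no mean subtraction, no bias.
    def __init__(self, hidden_size, eps=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x):
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)


hidden_size = 8
x = torch.randn(2, hidden_size)
layer_norm = nn.LayerNorm(hidden_size, eps=1e-5, elementwise_affine=True)  # what the checkpoint expects
rms_norm = RMSNorm(hidden_size)

print(torch.allclose(layer_norm(x), rms_norm(x)))  # typically False: the two norms disagree
```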

Contributor

@vasqu vasqu left a comment

Thanks for checking another set of models, I guess we have them all fixed with this?

The phimoe one is an interesting case 😅

expert_ids = top_k_index.reshape(-1)
token_idx = torch.arange(num_tokens, device=device).unsqueeze(1).expand(-1, num_top_k).reshape(-1)

# Resolve routing weights per selected sample, allowing top_k_weights to be either:
Contributor

Glad to remove this 🙏

Member Author

yeah much cleaner !

Comment on lines +336 to +340
# Phimoe uses nn.LayerNorm
self.input_layernorm = nn.LayerNorm(config.hidden_size, eps=config.rms_norm_eps, elementwise_affine=True)
self.post_attention_layernorm = nn.LayerNorm(
config.hidden_size, eps=config.rms_norm_eps, elementwise_affine=True
)
Contributor

Oh wow, that's an insane find - this model must have been broken for a long time

Member Author

yeah, people must've continued to use the remote-code version instead

Comment on lines 90 to -92
def test_model_generation(self):
# we will compele this when model file change over
# pass
Contributor

I have a PR here, #43411, which fixes some wrong RoPE init - will this still work with that fix?

Member Author

I guess so, here I only removed a comment. I might have misunderstood, but the initialization added in that PR is already in the class's __init__, so why is it necessary in _init_weights as well?

Contributor

Apparently it no longer matters what you initialize in __init__: _init_weights will overwrite it in any case, meaning that custom logic in __init__ is not applied. I want to refactor this so that it is no longer the case, or rather so that we depend on a separate init function for RoPE that lets users initialize it however they want.
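
A toy illustration of that overwrite pattern (hypothetical names, not the actual transformers init machinery):

```python
import torch
import torch.nn as nn


class ToyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4, bias=False)
        # Custom init done in __init__ ...
        nn.init.eye_(self.proj.weight)


def _init_weights(module):
    # ... is clobbered by a later generic init pass, so the identity
    # initialization above does not survive model construction.
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, std=0.02)


model = ToyBlock()
model.apply(_init_weights)
print(torch.allclose(model.proj.weight, torch.eye(4)))  # False: custom init was overwritten
```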

Comment on lines 147 to 148
[-3.4844, -2.4688, -1.1719, 0.5703, -0.4902, -0.0942, 0.7773, -0.2539, 0.3223, -1.0234],
[-0.9805, 0.0811, -0.5273, 2.3438, 0.6914, 3.0781, 0.3164, 0.2197, 0.5312, -2.1094],
Contributor

I guess that's the GPU diff between T4 and A10.

Member Author

Ah yes, rerunning with the eager experts implementation to see if the grouped one is also contributing to the diff.

Member Author

Hmm, grouped and eager are equivalent in terms of logits here (on A100). Should I revert this change, or maybe use the expectations class?

Contributor

We should check against the A10 runners - let's revert for now and check with run-slow first. If it needs a change, I can update it.
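
For reference, a hypothetical sketch of device-keyed expectations (placeholder values, not real outputs; the actual test helper may differ):

```python
import torch

# Key expected logit slices by GPU compute capability so the same test can
# pass on both T4 (sm_75) and A10/A100 (sm_8x) runners.
EXPECTED_SLICES = {
    7: torch.tensor([0.1, 0.2, 0.3]),  # placeholder values for T4-class GPUs
    8: torch.tensor([0.1, 0.2, 0.3]),  # placeholder values for A10/A100-class GPUs
}

major, _minor = torch.cuda.get_device_capability()
expected = EXPECTED_SLICES[major]
# torch.testing.assert_close(logits[0, 0, :3].float().cpu(), expected, atol=2e-2, rtol=2e-2)
```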

@IlyasMoutawwakil
Member Author

run-slow: hunyuan_v1_moe, phimoe

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/hunyuan_v1_moe", "models/phimoe"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@IlyasMoutawwakil
Member Author

The CI is struggling with model loading again 😭, even though device_map="auto" is set.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: hunyuan_v1_moe, phimoe

sample_weights = sample_weights.reshape(-1, 1) # (S, 1)

# Reshape for easier indexing
# S is the number of selected tokens-experts pairs (S = num_tokens * num_top_k)
Member Author

clarification taken from #43439
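
For readers following along, a hedged sketch of the eager experts dispatch these shapes describe (assuming `experts` is an nn.ModuleList of per-expert MLPs; not the exact library code):

```python
import torch


def eager_moe(hidden_states, experts, top_k_index, top_k_weights):
    # hidden_states: (num_tokens, hidden_dim)
    # top_k_index, top_k_weights: (num_tokens, num_top_k)
    num_tokens, num_top_k = top_k_index.shape
    device = hidden_states.device

    # Flatten to S = num_tokens * num_top_k selected (token, expert) pairs.
    expert_ids = top_k_index.reshape(-1)                                               # (S,)
    token_idx = torch.arange(num_tokens, device=device).repeat_interleave(num_top_k)   # (S,)
    sample_weights = top_k_weights.reshape(-1, 1)                                      # (S, 1)

    out = torch.zeros_like(hidden_states)
    for expert_id in expert_ids.unique():
        mask = expert_ids == expert_id
        tokens = token_idx[mask]
        # Run each expert on its assigned tokens and scale by the per-sample weight.
        expert_out = experts[int(expert_id)](hidden_states[tokens]) * sample_weights[mask]
        out.index_add_(0, tokens, expert_out)
    return out
```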

@vasqu vasqu merged commit 0dbb56e into main Jan 27, 2026
21 of 26 checks passed
@vasqu vasqu deleted the fix-broken-eager-experts branch January 27, 2026 13:33