Support more MoE expert routing load imbalance patterns by XZman · Pull Request #6 · platformxlab/NeuSim

XZman · 2026-06-22T20:37:17Z

No description provided.

Port the MoE expert load-imbalance features into NeuSim's DeepSeek path:

- MoELLMConfig: add expert_load_imbalance_factor (-1.0 sentinel -> worst-case E/K), all_to_all_load_imbalance_aware (default off), and num_worst_case_experts, plus get_effective_expert_tokens(), the effective-factor property, and a model validator.
- create_all_to_all_op: add receiver_skew to scale the bandwidth-bound ICI time; _all_to_all_receiver_skew() derives the dispatch/combine incast skew under expert load imbalance.
- create_ffn_deepseek_moe: replace the per-expert loop with a worst-case-device model (W hot experts + remaining experts) and apply the all-to-all skew to the dispatch/combine exchanges.

Diverge intentionally from the trace_util source by computing the skew ratio and the remaining-expert token split in real (un-floored) units, applying the >=1 floor only to the matmul seqlen. This fixes a decode/small-token over-inflation (balanced load reported skew 32x at T=1; decode FFN modeled every expert active instead of the real routed count). Prefill / large-T behavior is unchanged.

Default config (flags off) leaves the all-to-all latency identical; only the DeepSeek expert-compute model changes.

Regression vs HEAD: DeepSeek decode ~-9% (de-inflation), prefill ~flat; all non-DeepSeek experiments byte-identical.

Adds 22 tests (15 in test_moe_routing.py, 7 MoELLMConfig tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y8Ju7ry5zebVoCNLAKBPT8
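The sentinel resolution and the "floor only the matmul seqlen" point in the commit above can be illustrated with a small sketch. This is not the NeuSim code: the function names, the balanced-share form T*K/E, and the values E=256, K=8 are assumptions chosen for illustration. They happen to reproduce the 32x figure quoted for balanced load at T=1.

```python
import math

SENTINEL = -1.0  # per the commit message, -1.0 stands for the worst case E/K


def effective_imbalance_factor(factor: float, num_experts: int, top_k: int) -> float:
    """Resolve the sentinel to E/K; pass any explicit factor through unchanged."""
    return num_experts / top_k if factor == SENTINEL else factor


def hot_expert_tokens(num_tokens: int, num_experts: int, top_k: int, factor: float) -> float:
    """Real-valued tokens on a hot expert (assumed form: factor * T*K/E)."""
    balanced_share = num_tokens * top_k / num_experts
    return effective_imbalance_factor(factor, num_experts, top_k) * balanced_share


def matmul_seqlen(real_tokens: float) -> int:
    """Apply the >=1 floor only where an integer sequence length is required."""
    return max(1, math.ceil(real_tokens))


# Illustrative values (hypothetical, not taken from the PR).
E, K, T = 256, 8, 1
balanced = T * K / E                              # average tokens per expert
hot = hot_expert_tokens(T, E, K, factor=1.0)      # balanced load

ratio_real = hot / balanced                       # ratio taken in real units
ratio_floored = matmul_seqlen(hot) / balanced     # ratio taken after flooring

print(ratio_real, ratio_floored)
```

Under these assumptions the real-unit ratio stays at 1.0 for a balanced load, while taking the ratio after the >=1 floor inflates it to E/K at T=1.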

Four small, behavior-preserving fixes from the xhigh review (no shipped-config output changes; regression vs the migration commit matches across all 7 experiments):

1. MoELLMConfig.__hash__: include expert_load_imbalance_factor, all_to_all_load_imbalance_aware, and num_worst_case_experts so configs that generate different op graphs no longer collide on the config hash.
2. Unify the worst-case-device expert count between the skew model and the compute model via _num_experts_on_worst_case_device() = ceil(E/EP). Previously the skew used E/EP (float) and the compute used E//EP (floor): they disagreed when EP did not divide E, dropped the remainder experts, and modeled ZERO MoE compute when EP > E. ceil is also the correct count for the busiest device.
3. _validate_expert_load_imbalance_factor: guard num_routed_experts <= 0 and num_activated_routed_experts_per_token <= 0 with a clear ValueError before the E/K division, instead of a bare ZeroDivisionError at construction.
4. Clarify the num_worst_case_experts docstring: W only sets how many experts are hot; the per-hot-expert load is governed by expert_load_imbalance_factor, so W=K yields the documented absolute worst case only when f is also at E/K.

Adds 6 tests (hash distinctness, K=0/E=0 guard, ceil helper edges incl. EP>E, and an EP-indivisible skew anchor).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y8Ju7ry5zebVoCNLAKBPT8
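Fixes 2 and 3 above come down to simple arithmetic, sketched below. The helper names `ceil_experts_per_device` and `worst_case_factor` are hypothetical stand-ins, not the functions in the PR, and the error messages are invented for illustration.

```python
def ceil_experts_per_device(num_experts: int, ep: int) -> int:
    """ceil(E / EP) in integer arithmetic: the expert count on the busiest device."""
    return -(-num_experts // ep)


def worst_case_factor(num_routed_experts: int, top_k: int) -> float:
    """Return E/K, rejecting non-positive inputs before the division happens."""
    if num_routed_experts <= 0:
        raise ValueError(f"num_routed_experts must be > 0, got {num_routed_experts}")
    if top_k <= 0:
        raise ValueError(f"experts per token must be > 0, got {top_k}")
    return num_routed_experts / top_k


# EP does not divide E: floor drops the remainder experts, ceil keeps them.
print(256 // 3, ceil_experts_per_device(256, 3))

# EP > E: floor models zero experts on every device, ceil models one.
print(4 // 8, ceil_experts_per_device(4, 8))

try:
    worst_case_factor(256, 0)
except ValueError as err:
    print("rejected:", err)
```

Integer ceiling is used here rather than `math.ceil(E / EP)` only to avoid going through a float; either form gives the same count for realistic expert numbers.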

Remove the all_to_all_load_imbalance_aware opt-in flag and always apply the dispatch/combine receiver skew for MoE. The skew still degrades to 1.0 when expert parallelism is off or the load is balanced (expert_load_imbalance_factor = 1.0); the default factor sentinel (-1.0) resolves to the E/K worst case, so the all-to-all path now matches the compute path's default instead of silently assuming a balanced exchange.

- MoELLMConfig: drop the all_to_all_load_imbalance_aware field (and its hash entry); _all_to_all_receiver_skew no longer gates on it.
- Update docstrings/comments and tests accordingly.

Regression vs the prior branch HEAD: change is confined to DeepSeek. Prefill (bandwidth-bound) dispatch/combine all-to-all rises with the now-applied skew (EP=2 ~1.16x, EP=4 ~1.48x) -> TTFT ~+1.5%; decode (latency-bound) is unchanged; all non-DeepSeek experiments are byte-identical.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Y8Ju7ry5zebVoCNLAKBPT8
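A rough sketch of why the skew moves prefill but not decode follows. The commit message does not give the skew formula or the all-to-all cost model, so both are assumptions here: the skew under imbalance is passed in as a plain number (1.48 is the approximate EP=4 value quoted above), and the cost is taken as the larger of a fixed latency term and a skew-scaled bandwidth term. All names and timings are hypothetical.

```python
import math


def receiver_skew_or_one(ep: int, imbalance_factor: float, imbalanced_skew: float) -> float:
    """Degrade to 1.0 when expert parallelism is off or the load is balanced."""
    if ep <= 1 or imbalance_factor == 1.0:
        return 1.0
    return imbalanced_skew


def all_to_all_time(latency_s: float, bandwidth_s: float, skew: float) -> float:
    """Assumed cost model: only the bandwidth-bound term is scaled by the skew."""
    return max(latency_s, bandwidth_s * skew)


skew = receiver_skew_or_one(ep=4, imbalance_factor=32.0, imbalanced_skew=1.48)

# Prefill-like exchange: bandwidth term dominates, so the time scales with skew.
prefill = all_to_all_time(latency_s=1e-6, bandwidth_s=100e-6, skew=skew)

# Decode-like exchange: latency term dominates, so the skew has no effect.
decode = all_to_all_time(latency_s=10e-6, bandwidth_s=1e-6, skew=skew)

print(prefill, decode)
```

With these made-up timings the prefill-like exchange grows by the skew factor while the decode-like one stays at its latency floor, which is consistent with the regression note (prefill up, decode unchanged) but is not a reproduction of NeuSim's numbers.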

XZman and others added 3 commits June 19, 2026 21:04

XZman marked this pull request as ready for review June 22, 2026 22:35

XZman merged commit 58fbd3f into main Jun 22, 2026
6 checks passed

XZman deleted the moe-expert-routing-load-imbalance branch June 22, 2026 22:39

XZman merged 3 commits into main from moe-expert-routing-load-imbalance
