Fix incorrect attribute mapping relationships in GLM MoE DSA Config #46338

Merged: ArthurZucker merged 4 commits into huggingface:main from Dovis01:fix-glm-moe-dsa on Jun 2, 2026

@Dovis01 Dovis01 commented Jun 2, 2026

Fix GlmMoeDsaConfig legacy head_dim overwriting qk_rope_head_dim

Summary

Loading the same GLM-5 config.json yields different attention dimensions between transformers v5.3.0 and v5.4.0. v5.4.0 silently corrupts qk_rope_head_dim due to a new attribute_map entry.

v5.3.0 vs v5.4.0 and after

For example, given a checkpoint config.json containing:

"head_dim": 192,
"qk_nope_head_dim": 192,
"qk_rope_head_dim": 64

| | v5.3.0 | v5.4.0 |
|---|---|---|
| Config class | Custom `__init__` | `@strict` dataclass, inherits `Glm4MoeLiteConfig` |
| `attribute_map` | No `head_dim` mapping | `"head_dim": "qk_rope_head_dim"` (inherited) |
| `qk_rope_head_dim` after load | 64 | 192 ✗ (overwritten by legacy `head_dim`) |
| `head_dim` in config output | 192 (kept as separate legacy field) | absent (aliased into `qk_rope_head_dim`) |
| `qk_head_dim` | 256 (192+64) ✓ | 384 (192+192) ✗ |
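
The qk_head_dim values follow from adding the non-rotary and rotary per-head dimensions. A minimal sketch of that arithmetic, using the values from the config.json above; the helper name `qk_head_dim` is illustrative and is not a transformers function:

```python
def qk_head_dim(qk_nope_head_dim: int, qk_rope_head_dim: int) -> int:
    """Total query/key head dimension: non-rotary part plus rotary part."""
    return qk_nope_head_dim + qk_rope_head_dim


# Values from the checkpoint: qk_nope_head_dim=192, qk_rope_head_dim=64.
expected = qk_head_dim(192, 64)

# After the legacy head_dim=192 overwrites qk_rope_head_dim.
corrupted = qk_head_dim(192, 192)

print(expected, corrupted)  # prints "256 384"
```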

Why v5.3.0 works

Custom __init__ sets qk_rope_head_dim from the explicit JSON field first; legacy head_dim=192 is stored separately and does not overwrite qk_rope_head_dim.

Why v5.4.0 breaks

  1. qk_rope_head_dim=64 is set as a dataclass field
  2. Legacy head_dim=192 arrives later in __post_init__
  3. attribute_map redirects head_dim → qk_rope_head_dim, overwriting 64 with 192
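
The three steps above can be reproduced with a toy class. This is only a sketch of the aliasing mechanism under stated assumptions: `ToyConfig` and its `__setattr__` redirect are hypothetical stand-ins, not the transformers implementation, and the real load order is taken from the description rather than from the library source.

```python
class ToyConfig:
    """Hypothetical stand-in for a config class with an alias table."""

    # Alias table: a write to the key is redirected to the value.
    attribute_map = {"head_dim": "qk_rope_head_dim"}

    def __init__(self, **kwargs):
        # Step 1: the explicit fields are set first.
        self.qk_nope_head_dim = kwargs.pop("qk_nope_head_dim")
        self.qk_rope_head_dim = kwargs.pop("qk_rope_head_dim")
        # Step 2: leftover (legacy) keys are applied afterwards.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __setattr__(self, name, value):
        # Step 3: the alias redirects the write to the mapped attribute.
        target = self.attribute_map.get(name, name)
        super().__setattr__(target, value)


checkpoint = {"head_dim": 192, "qk_nope_head_dim": 192, "qk_rope_head_dim": 64}
config = ToyConfig(**checkpoint)
print(config.qk_rope_head_dim)  # prints 192: the explicit 64 was overwritten
```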

Downstream impact (v5.4.0 only)

  • MLA attention projection shapes use wrong rope dim (192 vs 64)
  • Indexer Q/K split along rope dim is wrong
  • RoPE inv_freq length is wrong (getattr(config, "head_dim") also resolves to 192)
  • Inference engines reading config.qk_rope_head_dim (e.g. SGLang NSA/MLA backends) get incorrect KV cache / RoPE layout
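
To make the RoPE point concrete, here is a sketch of the standard inverse-frequency computation, which yields one entry per pair of rotary dimensions. The function name `rope_inv_freq` and the base of 10000.0 are assumptions for illustration; the model's actual RoPE base is not stated in this PR.

```python
def rope_inv_freq(dim: int, base: float = 10000.0) -> list[float]:
    """Standard RoPE inverse frequencies: one entry per pair of rotary dims."""
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


correct = rope_inv_freq(64)     # intended rope dim
corrupted = rope_inv_freq(192)  # rope dim after the overwrite

print(len(correct), len(corrupted))  # prints "32 96"
```

A table built from the corrupted value is three times as long as intended, so it no longer matches tensors laid out for a 64-wide rotary slice.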

Fix

  1. Override attribute_map in modular_glm_moe_dsa.py — drop "head_dim": "qk_rope_head_dim" for GlmMoeDsaConfig
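
The idea of the fix can be sketched with toy classes. The actual diff is not shown in this conversation, so everything below is an assumption for illustration: `ToyParentConfig` and `ToyFixedConfig` are hypothetical names, and the `__setattr__` redirect only mimics alias behaviour.

```python
class ToyParentConfig:
    """Hypothetical parent whose alias table is inherited by subclasses."""

    attribute_map = {"head_dim": "qk_rope_head_dim"}

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __setattr__(self, name, value):
        # Redirect aliased names to their mapped attribute.
        super().__setattr__(self.attribute_map.get(name, name), value)


class ToyFixedConfig(ToyParentConfig):
    # Override the inherited table, dropping only the head_dim alias.
    attribute_map = {
        key: value
        for key, value in ToyParentConfig.attribute_map.items()
        if key != "head_dim"
    }


# The legacy key is listed last so that it is applied after the explicit field.
checkpoint = {"qk_nope_head_dim": 192, "qk_rope_head_dim": 64, "head_dim": 192}

broken = ToyParentConfig(**checkpoint)
fixed = ToyFixedConfig(**checkpoint)

print(broken.qk_rope_head_dim)                 # prints 192
print(fixed.qk_rope_head_dim, fixed.head_dim)  # prints "64 192"
```

Overriding the class attribute in the subclass leaves the parent's mapping untouched, so other configs that inherit from the parent keep their existing behaviour.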

Signed-off-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com>

@ArthurZucker ArthurZucker left a comment


39f751a is what changed it, we might have to fix a test maybe? otherwise makes sense

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Dovis01 added 3 commits June 2, 2026 10:45
Signed-off-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com>

github-actions Bot commented Jun 2, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: glm_moe_dsa


@ArthurZucker ArthurZucker left a comment


As it's urgent, fine by me!

@ArthurZucker ArthurZucker merged commit 3163718 into huggingface:main Jun 2, 2026
17 checks passed