Fix incorrect attribute mapping relationships in GLM MoE DSA Config #46338
Merged
Signed-off-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com>
ArthurZucker
approved these changes
Jun 2, 2026
Collaborator
ArthurZucker
left a comment
39f751a is what changed it, we might have to fix a test maybe? otherwise makes sense
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Contributor
[For maintainers] Suggested jobs to run (before merge): run-slow: glm_moe_dsa
ArthurZucker
approved these changes
Jun 2, 2026
Collaborator
ArthurZucker
left a comment
As it's urgent, fine by me!
Fix GlmMoeDsaConfig legacy head_dim overwriting qk_rope_head_dim

Summary
Loading the same GLM-5 config.json yields different attention dimensions between transformers v5.3.0 and v5.4.0. v5.4.0 silently corrupts qk_rope_head_dim due to a new attribute_map entry.

v5.3.0 vs v5.4.0 and after
For example: Checkpoint config.json:

- Config class: custom __init__ in v5.3.0; @strict dataclass, inherits Glm4MoeLiteConfig, in v5.4.0
- attribute_map head_dim mapping: "head_dim": "qk_rope_head_dim" (inherited) in v5.4.0
- qk_rope_head_dim after load: 64 in v5.3.0; 192 in v5.4.0 (overwritten by head_dim)
- head_dim in config output: … (… qk_rope_head_dim)
- qk_head_dim: …

Why v5.3.0 works
Custom __init__ sets qk_rope_head_dim from the explicit JSON field first; legacy head_dim=192 is stored separately and does not overwrite qk_rope_head_dim.

Why v5.4.0 breaks
1. qk_rope_head_dim=64 is set as a dataclass field
2. head_dim=192 arrives later in __post_init__
3. attribute_map redirects head_dim → qk_rope_head_dim, overwriting 64 with 192

Downstream impact (v5.4.0 only)
- inv_freq length is wrong (getattr(config, "head_dim") also resolves to 192)
- Consumers of config.qk_rope_head_dim (e.g. SGLang NSA/MLA backends) get incorrect KV cache / RoPE layout

Fix
attribute_map in modular_glm_moe_dsa.py: drop "head_dim": "qk_rope_head_dim" for GlmMoeDsaConfig
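
The overwrite described above can be reproduced without transformers. The following is a minimal sketch, not the library's implementation: AliasedConfig, BrokenConfig, FixedConfig and rope_inv_freq_len are hypothetical names introduced here, the alias handling is a simplified stand-in for how an attribute_map redirects attribute access, and the checkpoint dict is an assumed two-field excerpt using the 64 and 192 values from the description. The inv_freq length follows the standard RoPE construction of one frequency per pair of dimensions.

```python
class AliasedConfig:
    """Toy config with attribute_map-style aliasing (illustrative only)."""

    attribute_map: dict[str, str] = {}

    def __init__(self, **kwargs):
        # Fields are applied in order, so a later alias can clobber an earlier field.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __setattr__(self, key, value):
        # Writes to an aliased name are redirected to the target name.
        key = type(self).attribute_map.get(key, key)
        super().__setattr__(key, value)

    def __getattr__(self, key):
        # Only reached when normal lookup fails; resolve reads through the alias.
        target = type(self).attribute_map.get(key)
        if target is None:
            raise AttributeError(key)
        return getattr(self, target)


class BrokenConfig(AliasedConfig):
    # Mirrors the inherited mapping described for v5.4.0.
    attribute_map = {"head_dim": "qk_rope_head_dim"}


class FixedConfig(AliasedConfig):
    # Mirrors the fix: the head_dim alias is dropped.
    attribute_map = {}


def rope_inv_freq_len(rope_dim: int) -> int:
    # Standard RoPE keeps one inverse frequency per pair of rotary dimensions.
    return len(range(0, rope_dim, 2))


# Assumed excerpt of a checkpoint config: explicit field first, legacy field later.
checkpoint = {"qk_rope_head_dim": 64, "head_dim": 192}

broken = BrokenConfig(**checkpoint)
fixed = FixedConfig(**checkpoint)

print(broken.qk_rope_head_dim, rope_inv_freq_len(broken.qk_rope_head_dim))  # 192 96
print(fixed.qk_rope_head_dim, rope_inv_freq_len(fixed.qk_rope_head_dim))    # 64 32
print(fixed.head_dim)                                                       # 192
```

With the alias in place, the legacy head_dim write lands on qk_rope_head_dim and every later read of either name sees 192. Without it, the two fields stay independent, which is the behavior the fix restores.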