
Fix GlmMoeDsaConfig default mlp_layer_types in modular conversion #43876

Merged
zucchini-nlp merged 3 commits into huggingface:main from OiPunk:codex/transformers-43864-glm-moe-config-default
Feb 10, 2026

Conversation

@OiPunk (Contributor) commented Feb 10, 2026

Summary

This PR fixes #43864 by preserving the GlmMoeDsaConfig default mlp_layer_types from the modular source.

GlmMoeDsaConfig should default to dense MLP layers for the first 3 layers and sparse MoE layers afterward. During modular conversion, the parent __init__ body was being inlined, which overwrote that default with the parent pattern (["dense"] + ["sparse"] * ...).
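For illustration, a minimal sketch of the two patterns side by side (assuming num_hidden_layers=8, the value used in the regression test below):

```python
# Minimal sketch of the two layer-type patterns, assuming num_hidden_layers=8.
num_hidden_layers = 8

# Intended GlmMoeDsaConfig default: first 3 layers dense, the rest sparse.
intended = ["dense"] * 3 + ["sparse"] * (num_hidden_layers - 3)

# Parent pattern that the inlined init produced: only the first layer dense.
parent = ["dense"] + ["sparse"] * (num_hidden_layers - 1)

print(intended)  # ['dense', 'dense', 'dense', 'sparse', ..., 'sparse']
print(parent)    # ['dense', 'sparse', 'sparse', 'sparse', ..., 'sparse']
```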

Changes

  • In modular_glm_moe_dsa.py, call PreTrainedConfig.__init__(self, **kwargs) instead of super().__init__(**kwargs) so the parent init logic is not inlined (see the sketch after this list).
  • Regenerated configuration_glm_moe_dsa.py via modular converter, which removes the duplicated parent default block.
  • Added a regression test in tests/models/glm_moe_dsa/test_configuration_glm_moe_dsa.py to assert the expected default pattern for num_hidden_layers=8.
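A rough sketch of the shape of the change in modular_glm_moe_dsa.py, not the actual file: the signature and attributes other than mlp_layer_types are simplified, and the base-class import is assumed to match the modular source.

```python
# Rough sketch only: simplified GlmMoeDsaConfig __init__ showing why the base
# __init__ is called explicitly instead of via super().
from transformers import PreTrainedConfig  # spelled PretrainedConfig in older releases


class GlmMoeDsaConfig(PreTrainedConfig):
    model_type = "glm_moe_dsa"

    def __init__(self, num_hidden_layers=8, mlp_layer_types=None, **kwargs):
        self.num_hidden_layers = num_hidden_layers
        if mlp_layer_types is None:
            # Intended default: 3 dense layers, then sparse MoE layers.
            mlp_layer_types = ["dense"] * 3 + ["sparse"] * (num_hidden_layers - 3)
        self.mlp_layer_types = mlp_layer_types
        # Calling the base __init__ directly keeps the modular converter from
        # inlining the parent's __init__ body, which would re-derive
        # mlp_layer_types with the parent's ["dense"] + ["sparse"] * ... pattern.
        PreTrainedConfig.__init__(self, **kwargs)
```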

Validation

  • PYTHONPATH=src python3 utils/modular_model_converter.py glm_moe_dsa
  • PYTHONPATH=src python3 utils/check_modular_conversion.py --files src/transformers/models/glm_moe_dsa/modular_glm_moe_dsa.py
  • PYTHONPATH=src python3 -m pytest tests/models/glm_moe_dsa/test_configuration_glm_moe_dsa.py -q
  • PYTHONPATH=src python3 -m trace --count --summary --module unittest tests.models.glm_moe_dsa.test_configuration_glm_moe_dsa | grep -E "configuration_glm_moe_dsa|test_configuration_glm_moe_dsa"
    • output: configuration_glm_moe_dsa ... 100%

@OiPunk (Contributor, Author) commented Feb 10, 2026

Thanks for the detailed review. I pushed commit a10f430 to address the requested changes.

What I changed:

  • Removed duplicate attribute assignments in modular_glm_moe_dsa.py so each config field is assigned once.
  • Regenerated configuration_glm_moe_dsa.py from the modular source to keep generated output in sync.
  • Moved the default mlp_layer_types regression test into test_modeling_glm_moe_dsa.py::GlmMoeDsaModelTest and removed the standalone configuration test file (a sketch of the test follows this list).
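A hypothetical, standalone sketch of that regression check; in the PR it lives inside GlmMoeDsaModelTest in test_modeling_glm_moe_dsa.py, and the top-level GlmMoeDsaConfig import plus the asserted values are assumptions based on the PR summary.

```python
# Standalone sketch of the regression check described above.
import unittest

from transformers import GlmMoeDsaConfig  # assumed to be exported at the top level


class DefaultMlpLayerTypesTest(unittest.TestCase):
    def test_default_mlp_layer_types(self):
        config = GlmMoeDsaConfig(num_hidden_layers=8)
        # First 3 layers dense, remaining 5 sparse.
        self.assertEqual(config.mlp_layer_types, ["dense"] * 3 + ["sparse"] * 5)


if __name__ == "__main__":
    unittest.main()
```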

Validation run locally:

  • PYTHONPATH=src python3 utils/modular_model_converter.py glm_moe_dsa
  • PYTHONPATH=src python3 utils/check_modular_conversion.py --files src/transformers/models/glm_moe_dsa/modular_glm_moe_dsa.py
  • PYTHONPATH=src python3 -m ruff check src/transformers/models/glm_moe_dsa/modular_glm_moe_dsa.py src/transformers/models/glm_moe_dsa/configuration_glm_moe_dsa.py tests/models/glm_moe_dsa/test_modeling_glm_moe_dsa.py
  • PYTHONPATH=src pytest -q tests/models/glm_moe_dsa/test_modeling_glm_moe_dsa.py -k default_mlp_layer_types

I also verified both mlp_layer_types paths (None and explicit list) execute in the config initializer.
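For reference, the two paths exercised, sketched under the same assumptions about GlmMoeDsaConfig as above:

```python
# Quick check of both initializer paths (illustrative, not the PR's test code).
from transformers import GlmMoeDsaConfig

# Default path: mlp_layer_types=None derives the 3-dense / rest-sparse pattern.
default_cfg = GlmMoeDsaConfig(num_hidden_layers=8)
assert default_cfg.mlp_layer_types == ["dense"] * 3 + ["sparse"] * 5

# Explicit path: a user-supplied list is stored unchanged.
explicit_cfg = GlmMoeDsaConfig(
    num_hidden_layers=4,
    mlp_layer_types=["dense", "sparse", "sparse", "sparse"],
)
assert explicit_cfg.mlp_layer_types == ["dense", "sparse", "sparse", "sparse"]
```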

@zucchini-nlp (Member)

run-slow: glm_moe_dsa

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ["models/glm_moe_dsa"]
quantizations: []

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions (Contributor)

CI Results

Workflow Run ⚙️

Commit Info

Context  Commit    Description
RUN      2415753c  merge commit
PR       a10f4303  branch commit
main     884749a1  base commit
✅ No failing test specific to this PR 🎉 👏 !

@zucchini-nlp (Member)

@bot /style

@github-actions (Contributor)

github-actions bot commented Feb 10, 2026

Style fix bot fixed some files and pushed the changes.

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: glm_moe_dsa

@zucchini-nlp (Member) left a comment

Thanks

@zucchini-nlp zucchini-nlp merged commit 476600a into huggingface:main Feb 10, 2026
19 checks passed
jiosephlee pushed a commit to jiosephlee/transformers_latest that referenced this pull request Feb 11, 2026
…ggingface#43876)

* Fix GlmMoeDsaConfig default mlp layer pattern

* fix(glm-moe-dsa): dedupe config init and colocate test

* Apply repo consistency fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Successfully merging this pull request may close these issues.

GlmMoeDsaConfig: mlp_layer_types default overwritten by inlined parent init
