[Configs] Fix layer type validation to include its mlp counterpart#46220

Merged
vasqu merged 5 commits into main from fix-layer-types-validation
May 27, 2026

Contributor

@vasqu vasqu commented May 26, 2026

Slight rewrite so that layer type validation also covers the MLP layer types. There are two separate lists: one for attention (e.g. full vs. sliding window attention) and one for the MLP (dense vs. sparse). The MLP check probably got lost in the refactor for strict validation.
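A minimal sketch of the check described above, assuming a config-like object that exposes `num_hidden_layers` plus optional `layer_types` and `mlp_layer_types` lists. The helper name `validate_layer_types`, the `SimpleNamespace` stand-in config, and the trimmed-down allow-list are illustrative assumptions, not the actual code in `configuration_utils.py`.

```python
from types import SimpleNamespace

# Trimmed-down, illustrative allow-list shared by both attributes (assumption).
ALLOWED_LAYER_TYPES = ("full_attention", "sliding_attention", "dense", "sparse")


def validate_layer_types(config) -> None:
    """Hypothetical helper: check both per-layer lists on a config-like object."""
    for attr in ("layer_types", "mlp_layer_types"):
        layers = getattr(config, attr, None)
        if layers is None:
            continue  # optional attribute, nothing to validate
        if len(layers) != config.num_hidden_layers:
            raise ValueError(
                f"`num_hidden_layers` ({config.num_hidden_layers}) must be equal to "
                f"the number of entries in `{attr}` ({len(layers)})"
            )
        if not all(layer_type in ALLOWED_LAYER_TYPES for layer_type in layers):
            # Naming the attribute tells the user which of the two lists is wrong.
            raise ValueError(f"The `{attr}` entries must be in {ALLOWED_LAYER_TYPES} but got {layers}")


config = SimpleNamespace(
    num_hidden_layers=2,
    layer_types=["full_attention", "sliding_attention"],
    mlp_layer_types=["dense", "not_a_layer_type"],
)
try:
    validate_layer_types(config)
except ValueError as err:
    print(err)  # the message names `mlp_layer_types`
```

A validator that only looped over `layer_types` would accept this config silently, which is the gap the PR description points at.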

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@Cyrilvallez Cyrilvallez left a comment

Nice, thanks! It would probably be nice to separate MLP and attention layer types in the future, though!

Comment thread src/transformers/configuration_utils.py Outdated
Comment on lines +487 to +488
elif not all(layer_type in ALLOWED_LAYER_TYPES for layer_type in layers):
    raise ValueError(f"The `(mlp)_layer_types` entries must be in {ALLOWED_LAYER_TYPES} but got {layers}")

Member

Let's raise a more precise error, it doesn't cost much haha

Suggested change
-elif not all(layer_type in ALLOWED_LAYER_TYPES for layer_type in layers):
-    raise ValueError(f"The `(mlp)_layer_types` entries must be in {ALLOWED_LAYER_TYPES} but got {layers}")
+elif not all(layer_type in ALLOWED_LAYER_TYPES for layer_type in layers):
+    raise ValueError(f"The `{layer_types}` entries must be in {ALLOWED_LAYER_TYPES} but got {layers}")

Contributor Author

Done, also fixed the message below

        f"`num_hidden_layers` ({self.num_hidden_layers}) must be equal to the number of layer types "
        f"({len(self.layer_types)})"
    )
for layer_types in ["layer_types", "mlp_layer_types"]:

Member

MLP layer types have a different set of acceptable names, no? I'd prefer adding a separate self.validate_mlp_layer_types in this PR before merging, since it's not a huge change.
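The separate-validator idea could look roughly like the sketch below: each attribute is checked against its own allow-list, so an attention name is rejected in `mlp_layer_types` and an MLP name is rejected in `layer_types`. The split into `ATTENTION_LAYER_TYPES` and `MLP_LAYER_TYPES`, which names go into each set, and the bodies of both validators are hypothetical; per the author's reply, the library currently shares one tuple between both keys.

```python
from types import SimpleNamespace

# Hypothetical, disjoint allow-lists (illustrative subsets, not the library's).
ATTENTION_LAYER_TYPES = ("full_attention", "sliding_attention", "chunked_attention", "linear_attention")
MLP_LAYER_TYPES = ("dense", "sparse")


def _check_entries(config, attr: str, allowed: tuple) -> None:
    """Raise if any entry of `config.<attr>` is outside `allowed`; skip if unset."""
    layers = getattr(config, attr, None)
    if layers is None:
        return
    invalid = [layer_type for layer_type in layers if layer_type not in allowed]
    if invalid:
        raise ValueError(f"The `{attr}` entries must be in {allowed} but got invalid entries {invalid}")


def validate_layer_types(config) -> None:
    _check_entries(config, "layer_types", ATTENTION_LAYER_TYPES)


def validate_mlp_layer_types(config) -> None:
    _check_entries(config, "mlp_layer_types", MLP_LAYER_TYPES)


config = SimpleNamespace(layer_types=["full_attention", "dense"], mlp_layer_types=["dense", "sparse"])
validate_mlp_layer_types(config)  # passes: both entries are MLP types
try:
    validate_layer_types(config)  # "dense" is not an attention type
except ValueError as err:
    print(err)
```

Because the sets are disjoint, tightening the check this way would reject configs that pass today, which is why the thread treats it as a likely breaking change.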

Contributor Author

It used to be different sets, but apparently they got merged into one 🤷‍♂️

Contributor Author

ALLOWED_LAYER_TYPES = (
    "full_attention",
    "sliding_attention",
    "chunked_attention",
    "compressed_sparse_attention",  # CSA, used in deepseek_v4
    "heavily_compressed_attention",  # HCA, used in deepseek_v4
    "linear_attention",  # used in minimax
    "conv",  # used in LFMv2
    "mamba",
    "attention",
    "sparse",
    "dense",
    "hybrid",  # for layers that have both mamba and attention in zamba and zamba2
    "moe",  # for nemotron_h, which uses either attention, mamba or moe
)
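With one shared tuple, the membership check cannot tell the two lists apart. The snippet below copies the tuple above and shows that attention names pass as MLP layer types and MLP names pass as attention layer types; the helper `is_allowed` is illustrative only.

```python
ALLOWED_LAYER_TYPES = (
    "full_attention", "sliding_attention", "chunked_attention",
    "compressed_sparse_attention", "heavily_compressed_attention",
    "linear_attention", "conv", "mamba", "attention",
    "sparse", "dense", "hybrid", "moe",
)


def is_allowed(layers) -> bool:
    # Same membership test applied to both `layer_types` and `mlp_layer_types`.
    return all(layer_type in ALLOWED_LAYER_TYPES for layer_type in layers)


# Both mixed-up lists pass, because one tuple serves both attributes.
print(is_allowed(["sliding_attention", "full_attention"]))  # used as mlp_layer_types -> True
print(is_allowed(["dense", "sparse"]))  # used as layer_types -> True
```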

Member

😢

Contributor Author

Should I still merge, or do we want to make this a breaking change? 😓 It's kind of a weird situation.

Member

Fine with merging, I assumed there were already non-overlapping sets for the two keys. We can create an issue and fix it later for the sake of "correctness".

Contributor Author

vasqu commented May 27, 2026

Opened #46245 to keep track; this will likely be breaking.

@vasqu vasqu added this pull request to the merge queue May 27, 2026
Merged via the queue into main with commit 70257e9 May 27, 2026
31 checks passed
@vasqu vasqu deleted the fix-layer-types-validation branch May 27, 2026 16:17
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request May 28, 2026
…uggingface#46220)

* fix to include mlp layer types as well

* slightly adjust the err msg

* specify the message a bit
kashif pushed a commit to kashif/transformers that referenced this pull request Jun 1, 2026
…uggingface#46220)

* fix to include mlp layer types as well

* slightly adjust the err msg

* specify the message a bit
4 participants