🚨 Validate config attributes#41250

Merged
zucchini-nlp merged 114 commits into huggingface:main from zucchini-nlp:config-validation
Mar 16, 2026
Conversation

@zucchini-nlp (Member) commented Oct 1, 2025

What does this PR do?

As per title. Continues from #40793 and supersedes #36534

NOTE: config classes can't accept positional args anymore! I don't think anyone uses positional args anyway, but marking the PR as breaking


Note

High Risk
Refactors PreTrainedConfig and many model config classes to @dataclass + huggingface_hub @strict validation, which can change initialization/serialization behavior and reject previously-accepted configs. Also enforces save-time validation and updates defaults/deprecations (e.g., use_return_dict), risking backward-compatibility across model loading and downstream integrations.

Overview
Adds strict config validation. PreTrainedConfig is converted to a @dataclass with huggingface_hub’s @strict, introduces built-in validators (architecture consistency, special token id ranges, layer type checks, output_attentions vs attn_implementation), and runs validate() automatically on save_pretrained.
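The built-in validators described above can be sketched with the stdlib alone. The actual PR applies huggingface_hub's @strict decorator to PreTrainedConfig; the class, field names, and checks below are illustrative assumptions, not the library's code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToyConfig:
    vocab_size: int = 32000
    pad_token_id: Optional[int] = None
    output_attentions: bool = False
    attn_implementation: str = "eager"

    def validate(self) -> None:
        # Special token ids must fall inside the vocabulary range.
        if self.pad_token_id is not None and not 0 <= self.pad_token_id < self.vocab_size:
            raise ValueError(
                f"pad_token_id={self.pad_token_id} is outside [0, {self.vocab_size})"
            )
        # Returning attention weights requires the eager attention path.
        if self.output_attentions and self.attn_implementation != "eager":
            raise ValueError(
                "output_attentions=True is only supported with attn_implementation='eager'"
            )

    def save_pretrained_sketch(self) -> None:
        # Per the PR description, validate() runs automatically on save_pretrained.
        self.validate()

ToyConfig(pad_token_id=1).validate()  # a consistent config passes
```

A config with `pad_token_id=50000` against `vocab_size=32000`, or with `output_attentions=True` under a fused attention implementation, is rejected at validation time instead of failing later at runtime.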

Modernizes and standardizes model configs. Many model configuration classes are migrated from custom __init__ logic to dataclass fields + __post_init__, moving compatibility logic (e.g., defaulting sub-configs, key/value casting for JSON) into post-init and adding model-specific validate_architecture where needed.
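The __post_init__ migration pattern can be illustrated with the JSON key-casting case mentioned above. The class name and defaults here are assumptions for illustration, not the library's actual code.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class ClassifierConfig:
    id2label: Optional[Dict[int, str]] = None

    def __post_init__(self) -> None:
        if self.id2label is None:
            # Default label map, mirroring a num_labels=2 fallback.
            self.id2label = {0: "LABEL_0", 1: "LABEL_1"}
        else:
            # JSON object keys are always strings, so a map loaded from
            # config.json must have its ids cast back to int.
            self.id2label = {int(k): v for k, v in self.id2label.items()}

# A round trip through JSON turns the int keys into strings...
serialized = json.dumps({0: "cat", 1: "dog"})
# ...and __post_init__ restores them on load.
restored = ClassifierConfig(id2label=json.loads(serialized))
```

Keeping this compatibility logic in __post_init__ rather than a hand-written __init__ is what lets the fields themselves be plain dataclass declarations.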

API/behavior tweaks. Deprecates use_return_dict in favor of return_dict (and updates multiple model forward paths accordingly), adjusts RoPE validation ignore-key handling, narrows AutoTokenizer fallback exception handling, and bumps the minimum huggingface-hub requirement to >=1.5.0.
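A common shape for the use_return_dict deprecation is a warning property that forwards to the new attribute; only the attribute names come from the PR description, the property mechanics below are an assumption.

```python
import warnings

class ConfigLike:
    def __init__(self, return_dict: bool = True) -> None:
        self.return_dict = return_dict

    @property
    def use_return_dict(self) -> bool:
        # Reads of the deprecated attribute warn and forward to the new one.
        warnings.warn(
            "`use_return_dict` is deprecated, use `return_dict` instead.",
            FutureWarning,
        )
        return self.return_dict
```

Model forward paths that previously read `config.use_return_dict` can then be switched to `config.return_dict` while downstream code that still uses the old name keeps working, with a warning.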

Written by Cursor Bugbot for commit 07095f3.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member Author)

Blocked by #41541 (comment) for now

@zucchini-nlp (Member Author)

Time to revive this branch

@zucchini-nlp (Member Author)

Nice, much better and easier to maintain BC with remote code now!

@ArthurZucker (Collaborator) left a comment

Very very nice!

Comment on lines +196 to +205

# Keys are always strings in JSON so convert ids to int here for id2label and pruned_heads
if self.id2label is None:
    self._create_id_label_maps(kwargs.get("num_labels", 2))
else:
    if kwargs.get("num_labels") is not None and len(self.id2label) != kwargs.get("num_labels"):
        logger.warning(
            f"You passed `num_labels={kwargs.get('num_labels')}` which is incompatible to "
            f"the `id2label` map of length `{len(self.id2label)}`."
        )
    self.id2label = {int(key): value for key, value in self.id2label.items()}
Collaborator

Is it a good time to get rid of these general attributes and only have them for models that actually require them?

@zucchini-nlp (Member Author)

@bot /repo


github-actions bot commented Mar 16, 2026

Repo consistency bot fixed some files and pushed the changes.

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: afmoe, aimv2, albert, align, altclip, apertus, arcee, aria, audio_spectrogram_transformer, audioflamingo3, auto, autoformer, aya_vision, bamba, bark, bart

@zucchini-nlp zucchini-nlp enabled auto-merge March 16, 2026 13:10
@zucchini-nlp zucchini-nlp added this pull request to the merge queue Mar 16, 2026
Merged via the queue into huggingface:main with commit 39f751a Mar 16, 2026
15 of 16 checks passed
@zucchini-nlp zucchini-nlp deleted the config-validation branch March 16, 2026 13:49
@zucchini-nlp zucchini-nlp restored the config-validation branch March 16, 2026 19:08
michaelzhang-ai added a commit to michaelzhang-ai/sglang that referenced this pull request Mar 17, 2026
…GLM-5 nightly tests

Transformers PR huggingface/transformers#41250 (merged Mar 16) converts
PretrainedConfig subclasses to @dataclass via __init_subclass__, which
breaks sglang's DeepseekVL2Config (non-default field ordering) and
prevents the server from starting at all.

Remove `pip install git+https://github.com/huggingface/transformers.git`
from all Qwen 3.5 and GLM-5 CI jobs (MI30x, MI35x, ROCm 7.0 and 7.2).
Use the stable transformers shipped in the docker image instead, matching
all other nightly jobs (Grok2, DeepSeek-V3.2, etc.).

Keep mistral-common and lm-eval[api] for Qwen 3.5 tests that need them.
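The non-default field ordering breakage described in this commit message can be reproduced in a few lines with plain stdlib dataclasses; the class and field names below are illustrative, not sglang's actual config.

```python
from dataclasses import dataclass

@dataclass
class ParentConfig:
    hidden_size: int = 768  # defaulted field in the (now-dataclass) parent

try:
    @dataclass
    class BrokenChildConfig(ParentConfig):
        projector_type: str  # non-default field after an inherited default

except TypeError as err:
    # dataclasses rejects the class definition itself, before any instance
    # is created -- which is why the server fails to start at all.
    print(f"TypeError: {err}")
```

Once a parent dataclass defines defaulted fields, every field a subclass adds must also carry a default, so downstream configs written against the old non-dataclass parent can break at import time.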
aashay-sarvam added a commit to aashay-sarvam/transformers that referenced this pull request Mar 17, 2026
- Remove torch_dtype="auto" from docs (now default)
- Simplify modular_sarvam_mla.py to only override defaults that differ
  from DeepseekV3Config (no __init__, no workarounds)
- Add @strict(accept_kwargs=True) for config validation (huggingface#41250)
- Regenerate configuration_sarvam_mla.py with dataclass fields and
  __post_init__ pattern
- Hub config.json changes needed: remove head_dim/q_head_dim, change
  rope_scaling.type to "yarn", update architectures

Made-with: Cursor
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Mar 17, 2026
Resolves the failing tests on transformers main branch.

After the change in
huggingface/transformers#41250, the
num_hidden_layers attribute is no longer part of the model config when
serialized to a dict. The _prepare_prompt_learning_config function was
using this attribute. Therefore, we now pass the config before
converting it into a dict and extract the attribute from it.
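The fix described above amounts to reading the attribute off the config object before serializing, since default-valued fields may be omitted from the serialized dict. SerializableConfig and to_diff_dict below are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass, fields

@dataclass
class SerializableConfig:
    num_hidden_layers: int = 12
    hidden_size: int = 768

    def to_diff_dict(self) -> dict:
        # Sketch: serialize only fields that differ from their defaults,
        # mimicking why num_hidden_layers can vanish from the dict.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) != f.default
        }

cfg = SerializableConfig(hidden_size=1024)
num_layers = cfg.num_hidden_layers  # safe: attribute access on the object
assert "num_hidden_layers" not in cfg.to_diff_dict()  # absent from the dict
```

Code that needs such values should therefore take the config object itself and fall back to the dict only for serialization, as the peft change does.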
michaelzhang-ai added a commit to sgl-project/sglang that referenced this pull request Mar 17, 2026
Transformers PR huggingface/transformers#41250 (merged Mar 16) converts
PretrainedConfig subclasses to @dataclass via __init_subclass__, which
breaks sglang's DeepseekVL2Config and prevents the server from starting.

For Qwen 3.5: remove git+transformers entirely — stable version in the
docker image is sufficient (verified passing).

For GLM-5: pin to commit 96f807a33b75 (last commit before the breaking
change) since GLM-5 needs the glm_moe_dsa model type which is only in
the transformers dev branch, not in stable releases yet.