🚨 Validate config attributes#41250

Merged
zucchini-nlp merged 114 commits into huggingface:main from zucchini-nlp:config-validation
Mar 16, 2026
Conversation

@zucchini-nlp (Member) commented Oct 1, 2025

What does this PR do?

As per title. Continues from #40793 and supersedes #36534

NOTE: config classes can't accept positional args anymore! I don't think anyone uses positional args anyway, but marking the PR as breaking


Note

High Risk
Refactors PreTrainedConfig and many model config classes to @dataclass + huggingface_hub @strict validation, which can change initialization/serialization behavior and reject previously-accepted configs. Also enforces save-time validation and updates defaults/deprecations (e.g., use_return_dict), risking backward-compatibility across model loading and downstream integrations.

Overview
Adds strict config validation. PreTrainedConfig is converted to a @dataclass with huggingface_hub’s @strict, introduces built-in validators (architecture consistency, special token id ranges, layer type checks, output_attentions vs attn_implementation), and runs validate() automatically on save_pretrained.
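The built-in validators described above can be sketched with the stdlib alone. The actual PR applies huggingface_hub's @strict decorator to PreTrainedConfig; the class, field names, and checks below are illustrative assumptions, not the library's code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToyConfig:
    vocab_size: int = 32000
    pad_token_id: Optional[int] = None
    output_attentions: bool = False
    attn_implementation: str = "eager"

    def validate(self) -> None:
        # Special token ids must fall inside the vocabulary range.
        if self.pad_token_id is not None and not 0 <= self.pad_token_id < self.vocab_size:
            raise ValueError(
                f"pad_token_id={self.pad_token_id} is outside [0, {self.vocab_size})"
            )
        # Returning attention weights requires the eager attention path.
        if self.output_attentions and self.attn_implementation != "eager":
            raise ValueError(
                "output_attentions=True is only supported with attn_implementation='eager'"
            )

    def save_pretrained_sketch(self) -> None:
        # Per the PR description, validate() runs automatically on save_pretrained.
        self.validate()

ToyConfig(pad_token_id=1).validate()  # a consistent config passes
```

A config with `pad_token_id=50000` against `vocab_size=32000`, or with `output_attentions=True` under a fused attention implementation, is rejected at validation time instead of failing later at runtime.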

Modernizes and standardizes model configs. Many model configuration classes are migrated from custom __init__ logic to dataclass fields + __post_init__, moving compatibility logic (e.g., defaulting sub-configs, key/value casting for JSON) into post-init and adding model-specific validate_architecture where needed.
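The __post_init__ migration pattern can be illustrated with the JSON key-casting case mentioned above. The class name and defaults here are assumptions for illustration, not the library's actual code.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class ClassifierConfig:
    id2label: Optional[Dict[int, str]] = None

    def __post_init__(self) -> None:
        if self.id2label is None:
            # Default label map, mirroring a num_labels=2 fallback.
            self.id2label = {0: "LABEL_0", 1: "LABEL_1"}
        else:
            # JSON object keys are always strings, so a map loaded from
            # config.json must have its ids cast back to int.
            self.id2label = {int(k): v for k, v in self.id2label.items()}

# A round trip through JSON turns the int keys into strings...
serialized = json.dumps({0: "cat", 1: "dog"})
# ...and __post_init__ restores them on load.
restored = ClassifierConfig(id2label=json.loads(serialized))
```

Keeping this compatibility logic in __post_init__ rather than a hand-written __init__ is what lets the fields themselves be plain dataclass declarations.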

API/behavior tweaks. Deprecates use_return_dict in favor of return_dict (and updates multiple model forward paths accordingly), adjusts RoPE validation ignore-key handling, narrows AutoTokenizer fallback exception handling, and bumps the minimum huggingface-hub requirement to >=1.5.0.
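A common shape for the use_return_dict deprecation is a warning property that forwards to the new attribute; only the attribute names come from the PR description, the property mechanics below are an assumption.

```python
import warnings

class ConfigLike:
    def __init__(self, return_dict: bool = True) -> None:
        self.return_dict = return_dict

    @property
    def use_return_dict(self) -> bool:
        # Reads of the deprecated attribute warn and forward to the new one.
        warnings.warn(
            "`use_return_dict` is deprecated, use `return_dict` instead.",
            FutureWarning,
        )
        return self.return_dict
```

Model forward paths that previously read `config.use_return_dict` can then be switched to `config.return_dict` while downstream code that still uses the old name keeps working, with a warning.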

Written by Cursor Bugbot for commit 07095f3.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member Author)

Blocked by #41541 (comment) for now

@zucchini-nlp (Member Author)

Time to revive this branch

@zucchini-nlp (Member Author)

Nice, much better and easier to maintain BC with remote code now!

@ArthurZucker (Collaborator) left a comment

Very very nice!

Comment on lines +196 to +205

# Keys are always strings in JSON so convert ids to int here for id2label and pruned_heads
if self.id2label is None:
    self._create_id_label_maps(kwargs.get("num_labels", 2))
else:
    if kwargs.get("num_labels") is not None and len(self.id2label) != kwargs.get("num_labels"):
        logger.warning(
            f"You passed `num_labels={kwargs.get('num_labels')}` which is incompatible to "
            f"the `id2label` map of length `{len(self.id2label)}`."
        )
    self.id2label = {int(key): value for key, value in self.id2label.items()}
Collaborator

Is it a good time to get rid of these general attributes and only have them for models that actually require them?

@zucchini-nlp (Member Author)

@bot /repo


github-actions bot commented Mar 16, 2026

Repo consistency bot fixed some files and pushed the changes.

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: afmoe, aimv2, albert, align, altclip, apertus, arcee, aria, audio_spectrogram_transformer, audioflamingo3, auto, autoformer, aya_vision, bamba, bark, bart

@zucchini-nlp zucchini-nlp enabled auto-merge March 16, 2026 13:10
@zucchini-nlp zucchini-nlp added this pull request to the merge queue Mar 16, 2026
Merged via the queue into huggingface:main with commit 39f751a Mar 16, 2026
15 of 16 checks passed
@zucchini-nlp zucchini-nlp deleted the config-validation branch March 16, 2026 13:49
@zucchini-nlp zucchini-nlp restored the config-validation branch March 16, 2026 19:08
michaelzhang-ai added a commit to michaelzhang-ai/sglang that referenced this pull request Mar 17, 2026
…GLM-5 nightly tests

Transformers PR huggingface/transformers#41250 (merged Mar 16) converts
PretrainedConfig subclasses to @dataclass via __init_subclass__, which
breaks sglang's DeepseekVL2Config (non-default field ordering) and
prevents the server from starting at all.

Remove `pip install git+https://github.com/huggingface/transformers.git`
from all Qwen 3.5 and GLM-5 CI jobs (MI30x, MI35x, ROCm 7.0 and 7.2).
Use the stable transformers shipped in the docker image instead, matching
all other nightly jobs (Grok2, DeepSeek-V3.2, etc.).

Keep mistral-common and lm-eval[api] for Qwen 3.5 tests that need them.
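The non-default field ordering breakage described in this commit message can be reproduced in a few lines with plain stdlib dataclasses; the class and field names below are illustrative, not sglang's actual config.

```python
from dataclasses import dataclass

@dataclass
class ParentConfig:
    hidden_size: int = 768  # defaulted field in the (now-dataclass) parent

try:
    @dataclass
    class BrokenChildConfig(ParentConfig):
        projector_type: str  # non-default field after an inherited default

except TypeError as err:
    # dataclasses rejects the class definition itself, before any instance
    # is created -- which is why the server fails to start at all.
    print(f"TypeError: {err}")
```

Once a parent dataclass defines defaulted fields, every field a subclass adds must also carry a default, so downstream configs written against the old non-dataclass parent can break at import time.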
aashay-sarvam added a commit to aashay-sarvam/transformers that referenced this pull request Mar 17, 2026
- Remove torch_dtype="auto" from docs (now default)
- Simplify modular_sarvam_mla.py to only override defaults that differ
  from DeepseekV3Config (no __init__, no workarounds)
- Add @strict(accept_kwargs=True) for config validation (huggingface#41250)
- Regenerate configuration_sarvam_mla.py with dataclass fields and
  __post_init__ pattern
- Hub config.json changes needed: remove head_dim/q_head_dim, change
  rope_scaling.type to "yarn", update architectures

Made-with: Cursor
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Mar 17, 2026
Resolves the failing tests on transformers main branch.

After the change in
huggingface/transformers#41250, the
num_hidden_layers attribute is no longer part of the model config when
serialized to a dict. The _prepare_prompt_learning_config function was
using this attribute. Therefore, we now pass the config before
converting it into a dict and extract the attribute from it.
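The fix described above amounts to reading the attribute off the config object before serializing, since default-valued fields may be omitted from the serialized dict. SerializableConfig and to_diff_dict below are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass, fields

@dataclass
class SerializableConfig:
    num_hidden_layers: int = 12
    hidden_size: int = 768

    def to_diff_dict(self) -> dict:
        # Sketch: serialize only fields that differ from their defaults,
        # mimicking why num_hidden_layers can vanish from the dict.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) != f.default
        }

cfg = SerializableConfig(hidden_size=1024)
num_layers = cfg.num_hidden_layers  # safe: attribute access on the object
assert "num_hidden_layers" not in cfg.to_diff_dict()  # absent from the dict
```

Code that needs such values should therefore take the config object itself and fall back to the dict only for serialization, as the peft change does.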
michaelzhang-ai added a commit to sgl-project/sglang that referenced this pull request Mar 17, 2026
Transformers PR huggingface/transformers#41250 (merged Mar 16) converts
PretrainedConfig subclasses to @dataclass via __init_subclass__, which
breaks sglang's DeepseekVL2Config and prevents the server from starting.

For Qwen 3.5: remove git+transformers entirely — stable version in the
docker image is sufficient (verified passing).

For GLM-5: pin to commit 96f807a33b75 (last commit before the breaking
change) since GLM-5 needs the glm_moe_dsa model type which is only in
the transformers dev branch, not in stable releases yet.