Support Modular (!!) + Configs in check_auto_docstrings #44803
yonigozlan merged 12 commits into huggingface:main
ArthurZucker left a comment

Ty, IDK if it's best to fix the generated file then modular, it looks weird and slow!
| """After fixing docstrings in a generated file, propagate the same fixes to the | ||
| corresponding modular_*.py source file. | ||
|
|
is it not simpler to fix modular first?

or is it because it's too complicated since modular does not have full scope?

> or is it because it's too complicated since modular does not have full scope?

Yes exactly, with super kwargs, inherited docstrings, and inherited attributes (in configs and other dataclasses), it's very difficult to fix modular first; it's much simpler and correct that way.
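The generated-first flow discussed in this thread could look roughly like this. This is a hypothetical sketch, not the actual check_auto_docstrings code; `propagate_docstring` and its regex matching are illustrative assumptions:

```python
import re

def propagate_docstring(modular_source: str, class_name: str, fixed_doc: str) -> str:
    """Copy a docstring that was fixed in the generated file into the
    corresponding class of the modular_*.py source, matched by class name.
    Returns the source unchanged if the class or its docstring is not found."""
    pattern = re.compile(
        rf'(class {class_name}\b[^\n]*:\n\s*)("""(?:.|\n)*?""")'
    )
    # use a lambda so backslashes in fixed_doc are not treated as group references
    return pattern.sub(lambda m: m.group(1) + f'"""{fixed_doc}"""', modular_source, count=1)
```

The real tool has to handle far more (methods, decorators, partially overridden docstrings), but the direction of propagation — generated file first, modular second — is the point.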
    output_docstring_indent = 4
    source_args_doc = [ModelOutputArgs]
elif item.is_config:
    # Config class (PreTrainedConfig subclass) - args are class-level type annotations,
For the slow part at least, it's really not a problem IMO since this is a tool for contributors to run a few times before merging, not something executed at every import like auto_docstring. Also it's not that slow 😁
zucchini-nlp left a comment

Super nice, thanks a lot for working on modular as well. I had it in mind for a long time but got caught up with model reviews 😓
I checked the first few models to see if the auto-generation is correct. I think it is deleting Example code in some cases, and I am not sure if we should explicitly document those args that are in auto_docstring's ConfigArgs.
from transformers import logging
from transformers.utils import direct_transformers_import
from transformers.utils.auto_docstring import (
    ConfigArgs,
oh nice, I noticed this too later and assumed all checks happen in the auto_docstring.py file
utils/check_docstrings.py (outdated)
| """Return the modular_*.py path for any generated model file, or None. | ||
|
|
||
| Handles modeling_*, configuration_*, processing_*, image_processing_* (including | ||
| the _fast suffix variant). |
ultra nit: _pil suffix after your refactor 😄 Also in code
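A minimal sketch of the mapping that docstring describes. The function name and prefix list here are my assumptions, not the real utility:

```python
from pathlib import Path

# prefixes of generated model files that may originate from a modular_*.py source
GENERATED_PREFIXES = ("image_processing_", "configuration_", "processing_", "modeling_")

def modular_path_for(generated_file: str):
    """Return the modular_*.py path for a generated model file, or None."""
    path = Path(generated_file)
    for prefix in GENERATED_PREFIXES:
        if path.name.startswith(prefix):
            # strip the prefix, the extension, and a possible _fast suffix
            model = path.name[len(prefix):].removesuffix(".py").removesuffix("_fast")
            return path.with_name(f"modular_{model}.py")
    return None
```

For example, `image_processing_bit_fast.py` and `configuration_bit.py` would both map to `modular_bit.py` in the same directory.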
| r""" | ||
| ```python | ||
| >>> from transformers import BitNetModel, BitNetConfig | ||
|
|
||
| >>> # Initializing a BitNet style configuration |
I think that was accidentally deleted during one of the earlier commits adding config support in check_auto_docstrings, which didn't fully work. We shouldn't have this problem with the latest version, but I'll restore all the deleted examples and run it again to see if everything works as expected. Thanks for catching that!
text_config (`dict`, *optional*):
    Dictionary of configuration options used to initialize [`CLIPTextConfig`].
vision_config (`dict`, *optional*):
    Dictionary of configuration options used to initialize [`CLIPVisionConfig`].
btw, does the check delete args that are in auto_docstring? IIRC all three args here are auto-documented

It should, if the docstrings are exactly the same. If the docstrings of args existing in auto_docstring are overridden with a different description, they won't be deleted by check_auto_docstrings.
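The rule in that reply can be sketched like so. This is a hypothetical helper that normalizes whitespace before comparing, not the real implementation:

```python
def prune_duplicated_args(manual_args: dict, auto_args: dict) -> dict:
    """Drop manually documented args whose description is identical to the
    auto_docstring default; overridden descriptions are kept as real overrides."""
    def norm(text: str) -> str:
        # collapse whitespace so reflowed-but-identical descriptions still match
        return " ".join(text.split())
    return {
        name: desc
        for name, desc in manual_args.items()
        if name not in auto_args or norm(desc) != norm(auto_args[name])
    }
```

Anything not matching the auto-documented text survives, which is exactly why an overridden `text_config` description would not be deleted.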
| r""" | ||
| ```python | ||
| >>> from transformers import CwmModel, CwmConfig | ||
|
|
||
| >>> # Initializing a Cwm cwm-7b style configuration | ||
| >>> configuration = CwmConfig() | ||
|
|
||
| >>> # Initializing a model from the cwm-7b style configuration | ||
| >>> model = CwmModel(configuration) | ||
|
|
||
| >>> # Accessing the model configuration | ||
| >>> configuration = model.config |
same here, I think the check is not working correctly with bare "Example" docs.
edit: Actually no, I also saw the whole doc deleted with args and the Example, so no idea what might have gone wrong
readout_type (`str`, *optional*, defaults to `"project"`):
    The readout type to use when processing the readout token (CLS token) of the intermediate hidden states of
    the ViT backbone. Can be one of [`"ignore"`, `"add"`, `"project"`].
    - "ignore" simply ignores the CLS token.
    - "add" passes the information from the CLS token to all other tokens by adding the representations.
    - "project" passes information to the other tokens by concatenating the readout to all other tokens before
      projecting the representation to the original feature dimension D using a linear layer followed by a GELU non-linearity.
I am not sure if the checker handles huge blocks of descriptions well. I had issues when they used a colon inside a description 😅
In this case, readout_type is part of the attrs and is deleted. Can you check if something went wrong when auto-generating?

Looks like it was just moved above to fit the order of the attributes in the config, no?
| r""" | ||
| Example: | ||
|
|
||
| ```python | ||
| >>> from transformers import GlmImageVisionConfig, GlmImageVisionModel | ||
|
|
||
| >>> # Initializing a GlmImageVisionConfig GLM-4.1V-9B style configuration | ||
| >>> configuration = GlmImageVisionConfig() | ||
|
|
||
| >>> # Initializing a model (with random weights) from the GLM-4.1V-9B configuration | ||
| >>> model = GlmImageVisionModel(configuration) | ||
|
|
nice, if we can auto-add examples based on what we have. I wonder where the checkpoint comes from though, because the auto_docstring has "zai-org/GLM-Image"

I copied this from Glm4vVisionConfig, as this is a bit of an edge case where the inherited docstring contains an arg that we remove here (out_hidden_size). But agreed, we can automatically add examples (we already do for models mapping to a pipeline)
```python
>>> from transformers import GraniteModel, GraniteConfig

>>> # Initializing a Granite granite-3b style configuration
>>> configuration = GraniteConfig()
I'd prefer to keep the docs for "Example" after deleting the unnecessary args
  num_hidden_layers: int = 2
  hidden_act: str = "gelu"
- image_token_embed_dim = 2048
+ image_token_embed_dim: int = 2048
is this auto-generated based on the default value's type? 🤯

No unfortunately 😅, I fixed this manually
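For what it's worth, inferring the annotation from a simple scalar default would be straightforward. This is a hypothetical helper showing how such annotations *could* be generated, not something the tool does (the PR fixed them by hand):

```python
def annotation_for_default(value) -> str:
    """Guess a type annotation string from a default value."""
    if value is None:
        return "Optional"  # the real type cannot be inferred from None alone
    if isinstance(value, bool):
        return "bool"  # check bool before int: bool is an int subclass
    return type(value).__name__
```

It breaks down for `None` defaults and container types whose element type matters, which is presumably why a manual fix was the safer choice.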
Oh, and we could add one new line in https://huggingface.co/docs/transformers/en/auto_docstring at the end, saying that config subclasses are also to be decorated.
@zucchini-nlp Thanks for the review! Everything should be good now :)
zucchini-nlp left a comment

Niice, I just looked at the auto-generated config, and for the rest I'll trust you.
Btw, can you rebase on main? Yesterday I merged a PR changing the annotations for architectures from ClassVar to a normal field. I had to manually skip auto-doc complaining about it and left a TODO.
docs/source/en/auto_docstring.md (outdated)
</hfoption>
<hfoption id="config classes">

Place `@auto_docstring` directly above a `PreTrainedConfig` subclass, alongside `@strict(accept_kwargs=True)` from `huggingface_hub.dataclasses`. Config parameters are declared as **class-level annotations** (not as `__init__` arguments) — the `@strict` dataclass pattern used throughout Transformers. The class docstring documents model-specific parameters and optionally a usage example.
Thanks for adding the strict one in the docs as well.
btw, we are removing the (accept_kwargs=True) and instead always wrapping subclasses. I'm merging the PR today, so let's remove it from the docs.
docs/source/en/auto_docstring.md (outdated)
from ...utils import auto_docstring

@auto_docstring(checkpoint="org/my-model-checkpoint")
@strict(accept_kwargs=True)
| <hfoption id="processor classes"> | ||
|
|
||
| **Multimodal processors** (`ProcessorMixin` subclasses, `processing_*.py`) always use bare `@auto_docstring`. The class intro is auto-generated. Only document `__init__` parameters not already covered by `ProcessorArgs` (`image_processor`, `tokenizer`, `chat_template`, etc.) — omit the docstring entirely if all parameters are standard. Decorate `__call__` with `@auto_docstring` as well; its body docstring contains only a `Returns:` section plus any extra model-specific call arguments. `return_tensors` is automatically appended. |
docs/source/en/auto_docstring.md (outdated)
| attributes = ["image_processor", "tokenizer"] | ||
| image_processor_class = "AutoImageProcessor" | ||
| tokenizer_class = "AutoTokenizer" |
do we want to define these? iirc you deprecated it after v5 🤔

Ah no, thanks for catching that, I was too quick in rereading the Claude-generated docs :)
text_config (`dict`, *optional*):
    Dictionary of configuration options used to initialize [`CLIPTextConfig`].
vision_config (`dict`, *optional*):
    Dictionary of configuration options used to initialize [`CLIPVisionConfig`].
I think it's already up to date with this PR, but now the logic is to go through the MRO to get the attributes to document (while still excluding ClassVar), and stop when we reach PreTrainedConfig, so it should fix this issue as well (I'm not getting any warning or error with auto_docstring).
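The MRO walk described above can be sketched as follows. Names here are illustrative stand-ins, not the real transformers internals:

```python
import typing

class PreTrainedConfig:  # stand-in for the real shared base class
    model_type: typing.ClassVar[str] = ""

def documentable_attributes(config_cls):
    """Collect class-level annotated attributes along the MRO, skipping
    ClassVar annotations and stopping before PreTrainedConfig."""
    attrs = []
    for klass in config_cls.__mro__:
        if klass is PreTrainedConfig:
            break  # attributes shared by every config are not documented per model
        for name, annotation in getattr(klass, "__annotations__", {}).items():
            if typing.get_origin(annotation) is typing.ClassVar:
                continue  # e.g. `architectures` before it became a normal field
            if name not in attrs:
                attrs.append(name)
    return attrs

# illustrative config hierarchy
class BaseModelConfig(PreTrainedConfig):
    hidden_size: int = 2048
    architectures: typing.ClassVar[list] = []

class MyModelConfig(BaseModelConfig):
    num_hidden_layers: int = 2
```

Walking `MyModelConfig.__mro__` this way yields the model's own attribute first, then inherited ones, and never anything defined on `PreTrainedConfig` itself.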
[For maintainers] Suggested jobs to run (before merge): run-slow: afmoe, albert, align, bamba, bigbird_pegasus, bit, bitnet, blip, blt, bridgetower, chameleon, chmv2, clap, clip, clvp, cohere
What does this PR do?
(Finally) add support for checking and fixing both generated files and modular files in check_auto_docstrings.
auto_docstring was also recently added to configs, and this PR updates check_auto_docstrings to support configs. Currently, auto_docstring documents all config attributes in the docs, which results in illegible docs. This PR also excludes attributes inherited from PreTrainedConfig from the docs (as it was without auto_docstring).
This means that we shouldn't have PRs merged with docstring-related errors/warnings anymore, nor inconsistencies between modular and check_auto_docstrings overwriting each other.
Cc @zucchini-nlp for configs