
Test models instantiate from default config on meta + fix failing models #46355

Open
ArthurZucker wants to merge 1 commit into `main` from `model-default-config-meta-init`

@ArthurZucker
Collaborator

Adds `test_can_be_initialized_on_meta_with_default_config` (`ModelTesterMixin`): every architecture must build from its default config on `meta`. Fixes ~35 models that failed it; the causes were eager kernel loading at init and config-default bugs.

Tested: `pytest tests/models/ -k test_can_be_initialized_on_meta_with_default_config` → all pass. AI-assisted; every line reviewed.

Add `test_can_be_initialized_on_meta_with_default_config` to `ModelTesterMixin`:
every architecture must build from its *default* (release-scale) config on the
`meta` device. The existing meta test only used the hand-tuned tiny config, which
hides config-default bugs.
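A minimal, dependency-free sketch of why the default config matters. The real test builds each model under PyTorch's `meta` device; here that is approximated by a hypothetical `build_param_shapes` that only computes sizes and allocates no weights, and `ToyMoeConfig` is a made-up config whose MoE size defaults to `None` (one of the bug classes listed below). The tiny config passes because it sets every field explicitly; the default config fails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMoeConfig:
    # Hypothetical config: release-scale defaults, but the MoE size is never filled in.
    hidden_size: int = 4096
    num_experts: int = 8
    moe_intermediate_size: Optional[int] = None  # config-default bug


def build_param_shapes(config):
    """Shape-only construction, standing in for building on the meta device:
    sizes are computed, but no weight data is allocated."""
    # None * int raises TypeError, just as a real model __init__ would fail.
    expert_numel = config.num_experts * config.moe_intermediate_size * config.hidden_size
    return {"router": (config.num_experts, config.hidden_size), "experts_numel": expert_numel}


def can_build(config) -> bool:
    """Return True if the model 'builds' from this config, False otherwise."""
    try:
        build_param_shapes(config)
        return True
    except Exception:
        return False


# A hand-tuned tiny test config sets every field, so it hides the bug...
tiny = ToyMoeConfig(hidden_size=32, num_experts=2, moe_intermediate_size=16)
print(can_build(tiny))            # True
# ...while the default config exposes the missing default.
print(can_build(ToyMoeConfig()))  # False
```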

Fixes for models that failed the new test:
- `lazy_load_kernel`: degrade to the Python path on *any* hub-resolution failure
  (previously only `FileNotFoundError`/`AssertionError`), so model init never needs
  the network. Fixes the whole mamba/SSM-hybrid family on `meta`/offline.
- ~35 config-default fixes: `num_key_value_heads`/`head_dim` defaulting, `None`
  MoE sizes, `vocab_size`, special-token vs vocab size, `list`+`tuple` concat,
  `None` sub-config guards, `prediction_length`, default backbones, layer-type
  schedules, missing config fields/attributes, etc.
- Two test overrides for irreducible shared-config conflicts (one config serving
  mutually-exclusive heads): t5gemma (enc-dec vs encoder-only) and beit (backbone
  contract vs semantic-segmentation `out_indices`).
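The `lazy_load_kernel` change follows a common "try the optimized kernel, otherwise fall back" pattern. The sketch below is hypothetical: `lazy_load_kernel_sketch`, `resolve_from_hub` and `python_fallback` are illustrative names, not the actual transformers signature. It catches `Exception` rather than a fixed list of error types, so an offline `ConnectionError` no longer escapes model init, while `KeyboardInterrupt` (a `BaseException`) still propagates.

```python
import logging

logger = logging.getLogger(__name__)


def lazy_load_kernel_sketch(resolve_from_hub, python_fallback):
    """Hypothetical helper: try to fetch a hub-hosted kernel, otherwise use pure Python."""
    try:
        return resolve_from_hub()
    except Exception as exc:  # any resolution failure, not just FileNotFoundError/AssertionError
        logger.warning("Kernel unavailable (%s); using the Python path.", type(exc).__name__)
        return python_fallback


def offline_resolver():
    # Simulates hub resolution with no network access.
    raise ConnectionError("no network")


def python_selective_scan(x):
    # Placeholder for a slow-but-portable reference implementation.
    return x


kernel = lazy_load_kernel_sketch(offline_resolver, python_selective_scan)
print(kernel is python_selective_scan)  # True
```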
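Two of the config-default fix categories above can be sketched on a made-up `ToyAttentionConfig` (hypothetical, not a real transformers config): deriving `num_key_value_heads` and `head_dim` when they are left unset, and normalizing a tuple default before concatenating it with a list. This is one plausible form of those fixes, assumed for illustration; the actual per-model changes may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ToyAttentionConfig:
    hidden_size: int = 4096
    num_attention_heads: int = 32
    num_key_value_heads: Optional[int] = None
    head_dim: Optional[int] = None
    out_features: Sequence[str] = ("stage1", "stage2")  # tuple default (immutable, so allowed)

    def __post_init__(self):
        # Fall back to standard multi-head attention when GQA is not configured.
        if self.num_key_value_heads is None:
            self.num_key_value_heads = self.num_attention_heads
        # Derive head_dim from the hidden size when it is not given explicitly.
        if self.head_dim is None:
            self.head_dim = self.hidden_size // self.num_attention_heads


def stage_names(config):
    # ["stem"] + ("stage1", "stage2") raises TypeError; convert to a list first.
    return ["stem"] + list(config.out_features)


cfg = ToyAttentionConfig()
print(cfg.num_key_value_heads, cfg.head_dim)  # 32 128
print(stage_names(cfg))                       # ['stem', 'stage1', 'stage2']
```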
@github-actions
Contributor

github-actions Bot commented Jun 2, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: aimv2, autoformer, aya_vision, beit, bridgetower, chameleon, convnext, convnextv2, dbrx, deepseek_ocr2, dots1, emu3, esm, focalnet, granite4_vision, hiera

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
