[meta issue] Systematic model/pipeline review findings / tracking
Commit tested: 0f1abc4ae8b0eb2a3b40e82a310507281144c423
Review performed against the repository review rules.
Summary
- Reviewed 76 model/pipeline/shared/infrastructure targets
- Aggregated 498 issue-level findings into recurring cross-family patterns
- Findings suggest systemic inconsistencies rather than isolated bugs
These patterns are already generating duplicate low-effort PRs (often agent-generated) for the same underlying issues, increasing maintainer review load without addressing root causes.
Duplicate Check
Searches for broad/meta tracking issues or PRs did not find an existing systematic tracker. Some individual patterns are partially known through targeted issues/PRs, for example #11762, #9371, #8989, #12533, and PR #13532, but those do not address the recurring root causes across families.
Pattern 1: Batch and Conditioning Expansion Drift
Description:
Many pipelines accept batched prompts, images, masks, latents, or num_images_per_prompt / num_videos_per_prompt, but only expand part of the conditioning state.
Root cause:
Batch construction is duplicated per pipeline instead of enforced by a shared invariant after prompt/image/control/mask preparation.
Impact:
Incorrect conditioning, crashes, silently ignored extra outputs, and non-reproducible batched generation.
Representative examples:
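The concrete findings are tracked in the per-family issues; as a generic illustration of the drift, here is a minimal self-contained sketch with hypothetical tensor shapes and helper names (not repository code):

```python
import torch

def prepare_conditioning_buggy(prompt_embeds, image_embeds, num_images_per_prompt):
    # Prompt embeds are expanded along the batch dimension...
    prompt_embeds = prompt_embeds.repeat_interleave(num_images_per_prompt, dim=0)
    # ...but image embeds are forgotten, so batch sizes silently diverge downstream.
    return prompt_embeds, image_embeds

def prepare_conditioning_fixed(prompt_embeds, image_embeds, num_images_per_prompt):
    # Shared invariant: every per-sample conditioning tensor is expanded identically.
    expand = lambda t: t.repeat_interleave(num_images_per_prompt, dim=0)
    return expand(prompt_embeds), expand(image_embeds)

prompt_embeds = torch.randn(2, 77, 64)
image_embeds = torch.randn(2, 64)
p, i = prepare_conditioning_buggy(prompt_embeds, image_embeds, num_images_per_prompt=3)
assert p.shape[0] == 6 and i.shape[0] == 2   # mismatched batch dims reach the model
p, i = prepare_conditioning_fixed(prompt_embeds, image_embeds, num_images_per_prompt=3)
assert p.shape[0] == i.shape[0] == 6
```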
Pattern 2: Public Arguments Accepted but Ignored
Description:
Several public APIs validate or document arguments such as latents, attention_kwargs, cross_attention_kwargs, max_sequence_length, timesteps, num_frames, masks, or callbacks, but do not actually consume them.
Root cause:
Signatures and validation are often copied from related pipelines without shared checks that accepted inputs affect execution.
Impact:
Silent no-op behavior is worse than an explicit error because users believe they controlled generation when they did not.
Representative examples:
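A minimal sketch of the no-op shape, using made-up class names rather than actual pipeline code:

```python
class BuggyPipeline:
    def __call__(self, x, cross_attention_kwargs=None):
        # The argument is accepted (and may even be validated) but never forwarded:
        return self.denoise(x)

    def denoise(self, x, cross_attention_kwargs=None):
        scale = (cross_attention_kwargs or {}).get("scale", 1.0)
        return x * scale

class FixedPipeline(BuggyPipeline):
    def __call__(self, x, cross_attention_kwargs=None):
        # Forward everything the public API accepts:
        return self.denoise(x, cross_attention_kwargs=cross_attention_kwargs)

assert BuggyPipeline()(2.0, {"scale": 0.5}) == 2.0   # silently ignored
assert FixedPipeline()(2.0, {"scale": 0.5}) == 1.0   # actually consumed
```

A shared check that every accepted public argument is consumed somewhere would catch this class mechanically instead of per review.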
Pattern 3: Mask Handling Is Inconsistent Across Layers
Description:
attention_mask, prompt masks, VAE masks, IP-Adapter masks, and padding masks are frequently accepted but dropped, duplicated in the wrong order, or passed into attention code with incompatible shapes.
Root cause:
Mask semantics are not centralized. Pipeline encoders, model forwards, and custom attention processors each implement partial conventions.
Impact:
Padded tokens can affect outputs, regional conditioning can silently fail, and valid shorter masks can crash.
Representative examples:
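To make the boundary problem concrete, a small sketch assuming a 1 = keep / 0 = padded convention (hypothetical attention helper, not the library's processors):

```python
import torch

def attend(q, k, v, attention_mask=None):
    scores = q @ k.transpose(-1, -2) / k.shape[-1] ** 0.5
    if attention_mask is not None:
        # The convention must be shared across layers: here 1 = keep, 0 = padded.
        scores = scores.masked_fill(attention_mask[:, None, :] == 0, float("-inf"))
    return scores.softmax(dim=-1) @ v

q = torch.randn(1, 4, 8)
kv = torch.randn(1, 6, 8)
mask = torch.tensor([[1, 1, 1, 1, 0, 0]])      # last two tokens are padding
out_dropped = attend(q, kv, kv)                # mask never passed: padding attended
out_masked = attend(q, kv, kv, attention_mask=mask)
assert not torch.allclose(out_dropped, out_masked)   # padded tokens changed the output
```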
Pattern 4: Optional Parameters Are Not Actually Optional
Description:
Calling with documented defaults such as None, with optional dependencies absent, or with default constructor values often crashes before the fallback logic runs.
Root cause:
Validation order and kwargs.pop(...) patterns assume loader or caller internals rather than the public API contract.
Impact:
Public APIs fail on documented paths, offline/local-only workflows can unexpectedly hit the network, and dependency errors become confusing Python exceptions.
Representative examples:
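A sketch of the ordering bug, with hypothetical loader/processor names (one instance of the validation-order problem; the kwargs.pop(...) variants fail the same way):

```python
class DefaultProcessor:
    name = "default"

class BuggyLoader:
    def load(self, processor=None):
        name = processor.name                 # AttributeError before the fallback below
        processor = processor or DefaultProcessor()
        return processor

class FixedLoader:
    def load(self, processor=None):
        processor = processor or DefaultProcessor()   # honor the documented default first
        return processor

assert FixedLoader().load().name == "default"
try:
    BuggyLoader().load()
except AttributeError:
    pass   # the documented `None` path never reaches its own fallback
```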
Pattern 5: Dtype, Device, and Config Assumptions Leak
Description:
Provided tensors are often not moved/cast to execution dtype, helpers create float64/float32 tensors unconditionally, and pipelines hardcode VAE scale factors or latent channel counts.
Root cause:
Low-level model/config invariants are not enforced at pipeline boundaries, and shared dtype/device helpers are used unevenly.
Impact:
Mixed precision, NPU/MPS, CPU offload, device_map, and reproducibility paths fail or produce inconsistent behavior.
Representative examples:
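A minimal sketch of one such leak and the boundary-normalization fix (hypothetical helper names):

```python
import torch

def prepare_latents_buggy(latents):
    # A new tensor is created with the default dtype/device, ignoring the caller's:
    noise = torch.randn(latents.shape)        # float32 on CPU, always
    return latents + noise                    # promotes (or raises) under fp16 / on device

def prepare_latents_fixed(latents, generator=None):
    # Match the execution dtype/device of the provided tensor at the boundary:
    noise = torch.randn(latents.shape, generator=generator,
                        dtype=latents.dtype, device=latents.device)
    return latents + noise

latents = torch.zeros(1, 4, 8, 8, dtype=torch.float16)
assert prepare_latents_buggy(latents).dtype == torch.float32   # silent upcast
assert prepare_latents_fixed(latents).dtype == torch.float16
```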
Pattern 6: Output and Cleanup Contracts Diverge
Description:
output_type="latent", return_dict=False, output class exports, lazy imports, watermarking, and maybe_free_model_hooks() are handled differently across related families.
Root cause:
Finalization branches are duplicated and often return early before shared cleanup/output wrapping.
Impact:
Offload hooks can leak, return types become non-standard, imports fail, and downstream code cannot rely on pipeline output contracts.
Representative examples:
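A sketch of the early-return shape; maybe_free_model_hooks is stubbed here rather than the real implementation:

```python
class Pipeline:
    def __init__(self):
        self.hooks_freed = False

    def maybe_free_model_hooks(self):
        self.hooks_freed = True

    def call_buggy(self, output_type="pil"):
        image = object()
        if output_type == "latent":
            return image                      # early return: offload hooks leak
        self.maybe_free_model_hooks()
        return image

    def call_fixed(self, output_type="pil"):
        image = object()
        self.maybe_free_model_hooks()         # single exit path: cleanup always runs
        return image

p = Pipeline(); p.call_buggy(output_type="latent"); assert not p.hooks_freed
p = Pipeline(); p.call_fixed(output_type="latent"); assert p.hooks_freed
```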
Pattern 7: Validation Does Not Match Runtime Requirements
Description:
Input validation accepts dimensions, scheduler paths, image types, or tensor/list combinations that later fail in patchification, latent packing, scheduler stepping, or preprocessing.
Root cause:
Validation is copied from neighboring pipelines instead of derived from actual transformer patch size, VAE scale factor, scheduler requirements, and supported input processors.
Impact:
Users get late runtime failures, silent truncation, or invalid generation states instead of actionable input errors.
Representative examples:
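A sketch with made-up divisibility constants showing why validation should be derived from model config rather than copied from a neighbor:

```python
VAE_SCALE_FACTOR = 8   # hypothetical values; the real ones come from the loaded models
PATCH_SIZE = 2

def check_inputs_buggy(height, width):
    if height % 8 or width % 8:
        raise ValueError("height and width must be divisible by 8")

def check_inputs_fixed(height, width):
    multiple = VAE_SCALE_FACTOR * PATCH_SIZE   # derived from the actual components
    if height % multiple or width % multiple:
        raise ValueError(f"height and width must be divisible by {multiple}")

def patchify(height, width):
    latent_h = height // VAE_SCALE_FACTOR
    assert latent_h % PATCH_SIZE == 0, "late, non-actionable failure"

check_inputs_buggy(136, 136)       # passes validation...
try:
    patchify(136, 136)             # ...then fails deep inside the model
except AssertionError:
    pass
try:
    check_inputs_fixed(136, 136)   # fails early with an actionable error
except ValueError:
    pass
```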
Pattern 8: Copy-Paste Divergence and Hidden Coupling
Description:
Variant pipelines drift from base pipelines, modular pipelines import classic pipeline internals, and generated docs or TODO placeholders remain in user-facing artifacts.
Root cause:
Families evolve through parallel copies rather than shared helpers or parity tests. Modular and classic implementations are not cleanly separated.
Impact:
Fixes land in one variant but not another, refactors create hidden breakage, and docs/tests stop reflecting actual public APIs.
Representative examples:
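One mechanical way to catch this class of drift is a signature-parity test across variants; a sketch with hypothetical pipeline classes:

```python
import inspect

class BasePipeline:
    def __call__(self, prompt, num_inference_steps=50, guidance_scale=7.5): ...

class VariantPipeline:
    # Drifted copy: lost `guidance_scale` during a refactor.
    def __call__(self, prompt, num_inference_steps=50): ...

def public_params(cls):
    return set(inspect.signature(cls.__call__).parameters) - {"self"}

missing = public_params(BasePipeline) - public_params(VariantPipeline)
assert missing == {"guidance_scale"}   # a parity test would fail here, surfacing the drift
```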
Pattern 9: Shared Infrastructure Invariants Are Weak
Description:
Shared model/pipeline APIs assume attention processors, cache contexts, offload hooks, QKV fuse/unfuse state, lazy exports, and _no_split_modules metadata are implemented consistently.
Root cause:
Mixins expose common public APIs, but custom model families can bypass required integration points without a shared compliance test.
Impact:
Optimization APIs become unreliable across families, and failures show up only when users enable attention backends, offload, parallelism, or device maps.
Representative examples:
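A sketch of what a shared compliance test could assert, using a toy mixin rather than the real attention-processor API:

```python
class AttentionMixin:
    def set_processor(self, processor):
        self._processor = processor

    def get_processor(self):
        return getattr(self, "_processor", None)

class CompliantModel(AttentionMixin):
    pass

class BypassingModel(AttentionMixin):
    def set_processor(self, processor):
        pass   # silently ignores the shared API: the toggle "succeeds" but is a no-op

def check_processor_roundtrip(model_cls):
    model, sentinel = model_cls(), object()
    model.set_processor(sentinel)
    return model.get_processor() is sentinel

assert check_processor_roundtrip(CompliantModel)
assert not check_processor_roundtrip(BypassingModel)   # a compliance suite would flag this
```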
Pattern 10: Slow and Integration Coverage Is Uneven
Description:
Fast tests often exist, but many are dummy-only, skipped, placeholder-based, nightly-only, or absent for public variants. Slow tests are missing for many real checkpoint paths.
Root cause:
Coverage is family-local and variant-local; there is no enforced matrix for exported public pipelines/models, real checkpoint smoke tests, output contracts, dtype/device paths, and batch/CFG behavior.
Impact:
Bugs survive in exactly the paths users exercise: real tokenizers, real schedulers, offload, mixed precision, latent outputs, batched generation, and model loading.
Representative examples:
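A sketch of an enforced coverage matrix (hypothetical registry and contract names; assumes pytest). The point is that a missing variant becomes a collected-but-failing test instead of a silent gap:

```python
import itertools
import pytest

EXPORTED_PIPELINES = ["text2img", "img2img", "inpaint"]    # assumption: from public exports
CONTRACTS = ["batched", "cfg", "output_latent", "fp16", "offload"]
CONTRACT_RUNNERS = {}   # per-family suites register runners here; empty cells fail loudly

@pytest.mark.parametrize("pipeline,contract",
                         itertools.product(EXPORTED_PIPELINES, CONTRACTS))
def test_contract_matrix(pipeline, contract):
    run = CONTRACT_RUNNERS.get((pipeline, contract))
    if run is None:
        pytest.fail(f"no coverage for {pipeline} x {contract}")
    run()
```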
Many of these issues can be addressed at the shared/infrastructure layer (e.g. batch construction, mask propagation, dtype/device normalization) rather than per-pipeline. Fixing them centrally would eliminate repeated PRs and prevent reintroduction across families.
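A sketch of the kind of shared post-preparation invariant this implies (hypothetical helper, not an existing API): after prompt/image/control/mask preparation, every conditioning tensor must agree on batch size, dtype, and device before denoising starts.

```python
import torch

def enforce_conditioning_invariant(execution_dtype, execution_device, **tensors):
    batch_sizes = {k: t.shape[0] for k, t in tensors.items() if t is not None}
    if len(set(batch_sizes.values())) > 1:
        raise ValueError(f"batch size drift across conditioning tensors: {batch_sizes}")
    # Normalize dtype/device centrally instead of in each pipeline:
    return {k: (t.to(dtype=execution_dtype, device=execution_device)
                if t is not None else None)
            for k, t in tensors.items()}

cond = enforce_conditioning_invariant(
    torch.float16, "cpu",
    prompt_embeds=torch.randn(2, 77, 64),
    image_embeds=torch.randn(2, 64),
    mask=None,
)
assert all(t is None or t.dtype == torch.float16 for t in cond.values())
```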
Cross-Layer Connections
- Mask bugs repeatedly cross the pipeline/model boundary: pipelines build masks, but model forwards or attention processors drop or reshape them inconsistently.
- Dtype/device bugs appear both in pipeline inputs and shared model helpers, suggesting shared casting/config enforcement should happen before family-specific code runs.
- Attention backend issues are model-level omissions that surface as pipeline API failures because public backend toggles appear to succeed.
- Modular pipeline issues connect generated docs, block IO contracts, classic-pipeline imports, and infrastructure selection logic.
Test Coverage Analysis
Fast tests are present for many families, but they often cover tiny happy paths and do not exercise real checkpoint loading, public variant exports, mixed precision, CPU offload, callback mutation, or batch/CFG edge cases.
Slow/integration gaps correlate strongly with discovered bugs. Families with missing or weak slow coverage repeatedly contain failures in num_images_per_prompt, num_videos_per_prompt, output_type="latent", precomputed embeddings, and real tokenizer/scheduler behavior.
Explicit skipped TODO slow tests were called out for:
Other weak-test patterns include placeholder assertions in consisid, random/placeholder expected outputs in mochi, passing TODO stubs in hunyuandit, skipped offload/batch paths in shap_e, and non-meaningful decode coverage in allegro.
Suggested Prioritization
- Batch/conditioning invariants (Pattern 1)
- Ignored public arguments (Pattern 2)
- Mask propagation (Pattern 3)
- Dtype/device normalization (Pattern 5)
- Optional parameter handling (Pattern 4)
- Shared infrastructure invariants (Pattern 9)
- Validation/runtime alignment (Pattern 7)
- Output/cleanup consistency (Pattern 6)
- Copy-paste divergence (Pattern 8)
- Test coverage (Pattern 10)
Tracking
Per-family review issues:
- ernie-image model/pipeline review #13577
- wan model/pipeline review #13578
- flux2 model/pipeline review #13579
- longcat_audio_dit model/pipeline review #13580
- qwenimage model/pipeline review #13581
- hunyuan_video1_5 model/pipeline review #13582
- model_transformers_shared model/pipeline review #13651
- model_autoencoders_shared model/pipeline review #13652
- pipeline_infrastructure model/pipeline review #13653
- model_unets_shared model/pipeline review #13654
- model_infrastructure model/pipeline review #13655
This issue is intended as a tracking and coordination layer for already identified problems. Individual issues contain reproductions and fixes and can be addressed incrementally.