- Add .ai/pipelines.md: pipeline conventions and gotchas (config-derived values, no_grad discipline, reinventing scheduler logic, subclassing variants, `# Copied from` annotations).
- models.md: add Attention masks subsection inside the Attention pattern; fold the reference-implementations skim into conventions; consolidate `__init__.py` / `_import_structure` gotchas; trim gotchas covered by AGENTS.md (silent fallbacks, config serialization gap) or pipelines.md (no_grad, guider/scheduler reuse).
- review-rules.md: collapse to a short reviewer checklist that points into the AGENTS / models / pipelines / modular gotchas; only the LLM-specific pattern (ephemeral context) lives here directly.
- AGENTS.md: collapse defensive-code / unused-params / backwards-compat / deprecation rules into one umbrella bullet; replace the inline pipeline bullet list with a pointer to pipelines.md.
- SKILL.md (model-integration): trim the pre-PR self-review to a one-line pointer.

Sourced from the ACE-Step PR (#13095) review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yiyixuxu
commented
Apr 27, 2026
Co-authored-by: YiYi Xu <yixu310@gmail.com>
yiyixuxu
commented
Apr 27, 2026
> ## Common model conventions
>
> - Models use `ModelMixin` with `register_to_config` for config serialization
> When adding a new transformer (or reviewing one), skim `transformer_flux.py`, `transformer_flux2.py`, `transformer_qwenimage.py`, `transformer_wan.py` first to establish the pattern. Most conventions (mixin set, file structure, naming, gradient-checkpointing implementation, `_no_split_modules` / `_supports_cache_class` settings, etc.) are easiest to internalize by comparison rather than from a fixed list.
I think it works better this way: we provide the list of canonical references and explicitly ask them to derive the common conventions themselves.
yiyixuxu
commented
Apr 27, 2026
> @@ -0,0 +1,62 @@
> # Pipeline conventions and rules
Added a pipeline doc for the agent.
sayakpaul
reviewed
Apr 27, 2026
> - Support `generator` parameter for reproducibility
> - Use `self.progress_bar(timesteps)` for progress tracking
> - Don't subclass an existing pipeline for a variant: do not subclass one pipeline class (e.g., `FluxPipeline`) to implement another (e.g., `FluxImg2ImgPipeline`) that will live in the core codebase (`src`)
> - See [pipelines.md](pipelines.md) for pipeline conventions, patterns, and gotchas.
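For illustration, a minimal toy sketch of how the `generator` and `progress_bar` conventions quoted above typically thread through a pipeline's `__call__` (all names and shapes here are made up; this is not diffusers code):

```python
import torch

class ToyPipeline:
    """Toy stand-in for a DiffusionPipeline; `progress_bar` is stubbed here
    because real pipelines inherit it from DiffusionPipeline."""

    def progress_bar(self, iterable):
        return iterable  # the real implementation wraps tqdm

    def __call__(self, num_inference_steps=4, generator=None):
        # Every stochastic op takes `generator` so runs are reproducible.
        latents = torch.randn(1, 4, 8, 8, generator=generator)
        timesteps = range(num_inference_steps)
        # Iterate timesteps through self.progress_bar for progress tracking.
        for _t in self.progress_bar(timesteps):
            latents = latents * 0.9  # stand-in for a denoising step
        return latents

pipe = ToyPipeline()
out = pipe(generator=torch.Generator().manual_seed(0))
```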
> ## Common model conventions
>
> - Models use `ModelMixin` with `register_to_config` for config serialization
> When adding a new transformer (or reviewing one), skim `transformer_flux.py`, `transformer_flux2.py`, `transformer_qwenimage.py`, `transformer_wan.py` first to establish the pattern. Most conventions (mixin set, file structure, naming, gradient-checkpointing implementation, `_no_split_modules` / `_supports_cache_class` settings, etc.) are easiest to internalize by comparison rather than from a fixed list.
Suggested change:
- When adding a new transformer (or reviewing one), skim `transformer_flux.py`, `transformer_flux2.py`, `transformer_qwenimage.py`, `transformer_wan.py` first to establish the pattern. Most conventions (mixin set, file structure, naming, gradient-checkpointing implementation, `_no_split_modules` / `_supports_cache_class` settings, etc.) are easiest to internalize by comparison rather than from a fixed list.
+ * Models use `ModelMixin` with `register_to_config` for config serialization.
+ * When adding a new transformer (or reviewing one), skim `src/diffusers/models/transformers/transformer_flux.py`, `src/diffusers/models/transformers/transformer_flux2.py`, `src/diffusers/models/transformers/transformer_qwenimage.py`, and `src/diffusers/models/transformers/transformer_wan.py` first to establish the pattern. Most conventions (mixin set, file structure, naming, gradient-checkpointing implementation, `_no_split_modules` settings, etc.) are easiest to internalize by comparison rather than from a fixed list.
I removed _supports_cache_class because the attribute doesn't seem to exist on ModelMixin.
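For context, the `ModelMixin` / `register_to_config` convention the restored bullet refers to looks roughly like this (a minimal sketch; real transformers stack several more mixins and parameters):

```python
import torch.nn as nn
from diffusers import ModelMixin
from diffusers.configuration_utils import ConfigMixin, register_to_config

class ToyTransformer(ModelMixin, ConfigMixin):
    @register_to_config
    def __init__(self, hidden_dim: int = 64, num_layers: int = 2):
        super().__init__()
        # Every argument above is captured into self.config by the decorator
        # and round-trips through save_pretrained / from_pretrained.
        self.blocks = nn.ModuleList(
            nn.Linear(hidden_dim, hidden_dim) for _ in range(num_layers)
        )
```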
> - **No mask needed → pass `None`, not an all-zero tensor.** A dense 4D additive float mask of all `0.0` does no math but still hard-raises on `flash` / `_flash_3` / `_sage` (see `attention_dispatch.py:2328, 2544, 3266`). Only materialize a mask when it carries information. This is the Flux / Flux2 / Wan pattern: no mask, works on every backend, relies on the model having been trained to tolerate consistent padding.
> - **Padding mask → bool `(B, L)` or `(B, 1, 1, L)`.** Stays compatible with the `*_varlen` kernels via `_normalize_attn_mask` (`attention_dispatch.py:639`), which reduces bool masks to `cu_seqlens`. Dense additive-float masks *cannot* be reduced this way and so lose the varlen path. This is the Qwen pattern (`transformer_qwenimage.py:951`).
> - **Structural mask (causal, sliding-window, band-diagonal) → dense `(1, 1, L, L)` is unavoidable.** Row-varying patterns can't be expressed as `(B, L)`. Expect SDPA/Flex-only for these layers; consider Flex's `sliding_window_mask_mod` or FA3's native `window_size=` kwarg if backend flexibility matters.
Suggested change:
- - **Structural mask (causal, sliding-window, band-diagonal) → dense `(1, 1, L, L)` is unavoidable.** Row-varying patterns can't be expressed as `(B, L)`. Expect SDPA/Flex-only for these layers; consider Flex's `sliding_window_mask_mod` or FA3's native `window_size=` kwarg if backend flexibility matters.
+ - **Structural mask (causal, sliding-window, band-diagonal) → dense `(1, 1, L, L)` is unavoidable.** Row-varying patterns can't be expressed as `(B, L)`. Expect SDPA/Flex-only for these layers; consider Flex's `sliding_window_mask_mod` or FA3's native `window_size=` kwarg if backend flexibility matters. Consult `src/models/transformers/transformer_kandinsky.py` as a reference.
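To make the three mask cases concrete, a small SDPA-only sketch (the shapes and per-sample lengths are made up, and it calls `torch.nn.functional.scaled_dot_product_attention` directly rather than going through `attention_dispatch.py`):

```python
import torch
import torch.nn.functional as F

B, H, L, D = 2, 4, 16, 32
q = k = v = torch.randn(B, H, L, D)

# Case 1: no mask -> pass None, never an all-zero tensor.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=None)

# Case 2: padding mask -> bool (B, L), True = attend. The bool form is what
# lets varlen kernels recover cu_seqlens; broadcast to (B, 1, 1, L) for SDPA.
lengths = torch.tensor([[12], [16]])          # per-sample valid lengths
keep = torch.arange(L)[None, :] < lengths     # (B, L) boolean
out = F.scaled_dot_product_attention(q, k, v, attn_mask=keep[:, None, None, :])

# Case 3: structural mask (here: causal) -> row-varying, so a dense
# (1, 1, L, L) boolean mask is unavoidable.
causal = torch.ones(L, L, dtype=torch.bool).tril()[None, None]
out = F.scaled_dot_product_attention(q, k, v, attn_mask=causal)
```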
> 6. **Config serialization gaps.** Every `__init__` parameter in a `ModelMixin` subclass must be captured by `register_to_config`. If you add a new param but forget to register it, `from_pretrained` will silently use the default instead of the saved value.
>
> 7. **Forgetting to update `_import_structure` and `_lazy_modules`.** The top-level `src/diffusers/__init__.py` has both; missing either one causes partial import failures.

> 4. **Capability flags without matching implementation.** Class attributes like `_supports_cache_class`, `_no_split_modules`, `_supports_gradient_checkpointing`, `_supports_quantization` advertise a contract. Setting one without the corresponding logic (e.g. `_supports_gradient_checkpointing = True` with no `_set_gradient_checkpointing` method, or no `if gradient_checkpointing` branches in `forward`) means training code silently no-ops or errors deep inside `torch.utils.checkpoint`. For `_supports_cache_class` / `_no_split_modules` specifically, wrong values cause silent correctness bugs or OOM. Copy from a similar model, verify the corresponding logic is in place, and for inference-only ports just drop the flag.
_supports_quantization -- we don't have this attribute.
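To make the config-serialization gotcha (item 6 in the hunk above) concrete, a toy round-trip sketch (illustrative only; the `Toy` class and the `/tmp/toy` path are made up):

```python
import torch.nn as nn
from diffusers import ModelMixin
from diffusers.configuration_utils import ConfigMixin, register_to_config

class Toy(ModelMixin, ConfigMixin):
    @register_to_config
    def __init__(self, dim: int = 8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

m = Toy(dim=32)
m.save_pretrained("/tmp/toy")

# Registered params round-trip; a param added to __init__ but hidden from
# register_to_config would silently come back as its default here.
m2 = Toy.from_pretrained("/tmp/toy")
assert m2.config.dim == 32
```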