run-slow: afmoe, cohere2, cwm, dots1, gemma2, gemma3, gemma3n, gpt_oss, minimax, ministral, olmo3, olmo_hybrid, qwen2, qwen2_5_omni, qwen2_5_vl, qwen2_vl

This comment contains models: ["models/afmoe", "models/cohere2", "models/cwm", "models/dots1", "models/gemma2", "models/gemma3", "models/gemma3n", "models/gpt_oss", "models/minimax", "models/ministral", "models/olmo3", "models/olmo_hybrid", "models/qwen2", "models/qwen2_5_omni", "models/qwen2_5_vl", "models/qwen2_vl"]

Can we merge #44699 first?

[For maintainers] Suggested jobs to run (before merge)
run-slow: qwen3_5, qwen3_5_moe, t5gemma, t5gemma2

What does this PR do?
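Adds Rule 11: forward() must not access non-nn.Module attributes on submodules. Doing so breaks pipeline parallelism, where a submodule may be replaced with nn.Identity and the attribute no longer exists.
Inside forward(), metadata should be read from the config (or from elsewhere that does not depend on the submodule), never from an attribute stored on a submodule.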
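
To illustrate the failure mode this rule targets, here is a minimal sketch. It uses plain-Python stand-ins for nn.Module and nn.Identity so that it runs without PyTorch; the class names, the `num_heads` field, and the way the submodule is swapped are all hypothetical and are not taken from this PR or from the library.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical config; the field name is illustrative only.
    num_heads: int = 4


class Identity:
    """Stand-in for nn.Identity: returns its input and carries no metadata."""

    def __call__(self, x):
        return x


class Attention:
    """Stand-in for a real attention submodule."""

    def __init__(self, config):
        # Plain Python attribute (not a module) stored on the submodule.
        self.num_heads = config.num_heads

    def __call__(self, x):
        return x


class BadBlock:
    def __init__(self, config):
        self.attn = Attention(config)

    def forward(self, x):
        # Violates the rule: reads a non-module attribute from a submodule.
        return self.attn(x), self.attn.num_heads


class GoodBlock:
    def __init__(self, config):
        self.config = config
        self.attn = Attention(config)

    def forward(self, x):
        # Follows the rule: metadata comes from the config instead.
        return self.attn(x), self.config.num_heads


config = Config()
bad, good = BadBlock(config), GoodBlock(config)

# Simulate a pipeline stage that does not own the attention layer.
bad.attn = Identity()
good.attn = Identity()

try:
    bad.forward([1.0, 2.0])
    bad_failed = False
except AttributeError:
    # Identity has no `num_heads`, so the attribute lookup fails.
    bad_failed = True

good_out, good_heads = good.forward([1.0, 2.0])
```

In this sketch `BadBlock.forward` raises `AttributeError` once the submodule is swapped out, while `GoodBlock.forward` keeps working because it only calls the submodule and reads its metadata from the config.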