feat: add Qwen Image 2512 txt2img support #132

Open
lstein wants to merge 12 commits into feat/qwen-image-edit-2511 from feat/qwen-image-2512

Conversation


@lstein (Owner) commented Mar 27, 2026

Summary

Adds Qwen Image 2512 text-to-image support by reusing the existing Qwen Image Edit infrastructure. Both models share the same base type (qwen-image-edit) since they use identical architecture (transformer, VAE, text encoder, scheduler).

Depends on: #131 (Qwen Image Edit 2511)

Changes

  • Text encoder: Auto-selects prompt template based on whether reference images are provided. Edit mode uses the image-editing system prompt (drop_idx=64); generate mode uses the "describe the image" prompt (drop_idx=34).
  • Denoise: Detects zero_cond_t on the transformer to decide whether to concatenate reference latents. Txt2img models (zero_cond_t=False) pass only noisy patches with a single-entry img_shapes.
  • Model config: Accepts QwenImagePipeline in addition to QwenImageEditPlusPipeline for Diffusers model detection.
  • LoRA: Handles the "transformer." key prefix used by some training frameworks; updates config detection accordingly.
  • Starter models: Qwen-Image-2512 full Diffusers + 4 GGUF variants (Q2_K, Q4_K_M, Q6_K, Q8_0) + Lightning V2.0 LoRAs (4-step, 8-step bf16), all added to the Qwen Image Edit bundle.
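The text-encoder behavior in the first bullet can be sketched as follows. This is an illustrative reduction, not the actual implementation: `select_drop_idx` is a hypothetical name, and `drop_idx` is the number of template tokens dropped before the user prompt is encoded.

```python
# Drop indices from the PR description: the edit template's system prompt
# occupies 64 tokens, the "describe the image" template occupies 34.
EDIT_TEMPLATE_DROP_IDX = 64
GENERATE_TEMPLATE_DROP_IDX = 34


def select_drop_idx(reference_images: list) -> int:
    """Auto-select the prompt template based on reference images."""
    if reference_images:
        # Edit mode: reference images provided, use the image-editing prompt.
        return EDIT_TEMPLATE_DROP_IDX
    # Generate (txt2img) mode: no reference images.
    return GENERATE_TEMPLATE_DROP_IDX
```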

Testing

  1. Install "Qwen Image 2512" from Starter Models (or a GGUF variant + the Diffusers model as Component Source)
  2. Enter a text prompt and generate — no reference image needed
  3. Test with Lightning LoRA: Steps=4, CFG=1, Shift Override=3
  4. Verify the Qwen Image Edit model still works correctly with reference images

🤖 Generated with Claude Code

Shares the QwenImageEdit base type and infrastructure with the edit model.
Key changes:

- Text encoder: auto-selects prompt template based on reference images —
  edit template (drop_idx=64) when images present, generate template
  (drop_idx=34) when absent
- Denoise: detects zero_cond_t to determine whether to concatenate
  reference latents; txt2img models pass only noisy patches with a
  single-entry img_shapes
- Model config: accept QwenImagePipeline in addition to
  QwenImageEditPlusPipeline
- LoRA: handle "transformer." key prefix from some training frameworks,
  add to config detection
- Starter models: Qwen-Image-2512 full + 4 GGUF variants + Lightning
  V2.0 LoRAs (4-step, 8-step), all added to the Qwen Image Edit bundle
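The denoise change above can be illustrated with a small sketch of how `img_shapes` might be assembled (function and parameter names are assumptions; shapes are shown as plain tuples rather than tensors):

```python
def build_img_shapes(noisy_shape: tuple,
                     ref_shapes: list[tuple],
                     zero_cond_t: bool) -> list[tuple]:
    """Assemble the img_shapes list passed to the transformer."""
    if zero_cond_t and ref_shapes:
        # Edit model: noisy patches followed by reference latents.
        return [noisy_shape, *ref_shapes]
    # Txt2img model (zero_cond_t=False): only the noisy patches,
    # so img_shapes has a single entry.
    return [noisy_shape]
```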

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@lstein force-pushed the feat/qwen-image-2512 branch from dfe597f to 2f10d83 on March 28, 2026 at 02:53
lstein and others added 11 commits March 27, 2026 22:57
…geEditMainModelConfig)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The variant field with a default value was appended to the discriminator
tag (e.g. main.gguf_quantized.qwen-image.generate), breaking model
detection for GGUF and Diffusers models. Making variant optional with
default=None restores the correct tags (main.gguf_quantized.qwen-image).

The variant is still set during Diffusers model probing via
_get_qwen_image_variant() and can be manually set for GGUF models.
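The tag-building behavior described in this commit can be sketched as a minimal helper (a simplification, not the project's actual discriminator machinery; `build_tag` is a hypothetical name): a field with a value gets appended to the tag, so a variant that defaults to `None` is omitted and the shorter tag is restored.

```python
from typing import Optional


def build_tag(type_: str, format_: str, base: str,
              variant: Optional[str] = None) -> str:
    """Join the discriminator components into a dotted tag."""
    parts = [type_, format_, base]
    if variant is not None:
        # Only append the variant when it was explicitly set
        # (e.g. during Diffusers model probing).
        parts.append(variant)
    return ".".join(parts)
```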

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The rename from qwen_image_edit -> qwen_image caused variable name
collisions with the txt2img starter models. Give edit models the
qwen_image_edit_* prefix to distinguish from qwen_image_* (txt2img).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…URLs

The global rename sed changed 'qwen-image-edit-2511' to 'qwen-image-2511'
inside the HuggingFace URLs, but the actual files on HF still have 'edit'
in their names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When switching from an edit model to a generate model, reference images
remain in state but the panel is hidden. Prevent them from being passed
to the text encoder and VAE encoder by checking the model variant.
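A minimal sketch of that guard, with assumed names (`effective_reference_images`, a `"edit"` variant string): stale reference images left in state are dropped before they reach the encoders when a generate model is selected.

```python
def effective_reference_images(reference_images: list, model_variant: str) -> list:
    """Return the reference images that should actually be encoded."""
    if model_variant != "edit":
        # Generate model: the reference-image panel is hidden, so any
        # images lingering in state must be ignored.
        return []
    return list(reference_images)
```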

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The txt2img model doesn't use zero_cond_t — setting it causes the
transformer to double the timestep batch and create modulation indices
for non-existent reference patches, producing noise output. Now checks
the config variant before enabling it.
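In sketch form (names assumed), the check reduces to gating `zero_cond_t` on the config variant rather than setting it unconditionally:

```python
def should_use_zero_cond_t(config_variant: str) -> bool:
    """Enable zero_cond_t only for edit-variant models.

    Enabling it on a txt2img model doubles the timestep batch and
    creates modulation indices for non-existent reference patches.
    """
    return config_variant == "edit"
```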

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n, shift)

- Save qwen_image_component_source, qwen_image_quantization, and
  qwen_image_shift in generation metadata
- Add metadata recall handlers so remix/recall restores these settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flux PEFT LoRAs use transformer.single_transformer_blocks.* keys which
contain "transformer_blocks." as a substring, falsely matching the
Qwen Image LoRA detection. Add single_transformer_blocks to the Flux
exclusion set.
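The substring pitfall this commit fixes can be shown with a small sketch (the function name and exclusion-set contents here are illustrative, not the project's actual detection code):

```python
# Flux PEFT keys like transformer.single_transformer_blocks.* contain
# "transformer_blocks." as a substring, so they must be excluded before
# a naive substring match attributes them to Qwen Image.
FLUX_EXCLUSIONS = ("single_transformer_blocks.",)


def looks_like_qwen_image_lora(key: str) -> bool:
    """Heuristic check for a Qwen Image LoRA state-dict key."""
    if any(excl in key for excl in FLUX_EXCLUSIONS):
        return False  # Flux PEFT key, not Qwen Image
    # Also matches keys carrying the "transformer." prefix.
    return "transformer_blocks." in key
```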

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>