feat: add Qwen Image 2512 txt2img support #132

Open
lstein wants to merge 12 commits into feat/qwen-image-edit-2511 from feat/qwen-image-2512

Conversation


@lstein (Owner) commented Mar 27, 2026

Summary

Adds Qwen Image 2512 text-to-image support by reusing the existing Qwen Image Edit infrastructure. Both models share the same base type (qwen-image-edit) since they use identical architecture (transformer, VAE, text encoder, scheduler).

Depends on: #131 (Qwen Image Edit 2511)

Changes

  • Text encoder: Auto-selects prompt template based on whether reference images are provided. Edit mode uses the image-editing system prompt (drop_idx=64); generate mode uses the "describe the image" prompt (drop_idx=34).
  • Denoise: Detects zero_cond_t on the transformer to decide whether to concatenate reference latents. Txt2img models (zero_cond_t=False) pass only noisy patches with a single-entry img_shapes.
  • Model config: Accepts QwenImagePipeline in addition to QwenImageEditPlusPipeline for Diffusers model detection.
  • LoRA: Handles the "transformer." key prefix used by some training frameworks; updates config detection accordingly.
  • Starter models: Qwen-Image-2512 full Diffusers + 4 GGUF variants (Q2_K, Q4_K_M, Q6_K, Q8_0) + Lightning V2.0 LoRAs (4-step, 8-step bf16), all added to the Qwen Image Edit bundle.
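The text-encoder behavior in the first bullet can be sketched as follows. This is an illustrative reduction, not the actual implementation: `select_drop_idx` is a hypothetical name, and `drop_idx` is the number of template tokens dropped before the user prompt is encoded.

```python
# Drop indices from the PR description: the edit template's system prompt
# occupies 64 tokens, the "describe the image" template occupies 34.
EDIT_TEMPLATE_DROP_IDX = 64
GENERATE_TEMPLATE_DROP_IDX = 34


def select_drop_idx(reference_images: list) -> int:
    """Auto-select the prompt template based on reference images."""
    if reference_images:
        # Edit mode: reference images provided, use the image-editing prompt.
        return EDIT_TEMPLATE_DROP_IDX
    # Generate (txt2img) mode: no reference images.
    return GENERATE_TEMPLATE_DROP_IDX
```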

Testing

  1. Install "Qwen Image 2512" from Starter Models (or a GGUF variant + the Diffusers model as Component Source)
  2. Enter a text prompt and generate — no reference image needed
  3. Test with Lightning LoRA: Steps=4, CFG=1, Shift Override=3
  4. Verify the Qwen Image Edit model still works correctly with reference images

🤖 Generated with Claude Code

Shares the QwenImageEdit base type and infrastructure with the edit model.
Key changes:

- Text encoder: auto-selects prompt template based on reference images —
  edit template (drop_idx=64) when images present, generate template
  (drop_idx=34) when absent
- Denoise: detects zero_cond_t to determine whether to concatenate
  reference latents; txt2img models pass only noisy patches with a
  single-entry img_shapes
- Model config: accept QwenImagePipeline in addition to
  QwenImageEditPlusPipeline
- LoRA: handle "transformer." key prefix from some training frameworks,
  add to config detection
- Starter models: Qwen-Image-2512 full + 4 GGUF variants + Lightning
  V2.0 LoRAs (4-step, 8-step), all added to the Qwen Image Edit bundle
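The denoise change above can be illustrated with a small sketch of how `img_shapes` might be assembled (function and parameter names are assumptions; shapes are shown as plain tuples rather than tensors):

```python
def build_img_shapes(noisy_shape: tuple,
                     ref_shapes: list[tuple],
                     zero_cond_t: bool) -> list[tuple]:
    """Assemble the img_shapes list passed to the transformer."""
    if zero_cond_t and ref_shapes:
        # Edit model: noisy patches followed by reference latents.
        return [noisy_shape, *ref_shapes]
    # Txt2img model (zero_cond_t=False): only the noisy patches,
    # so img_shapes has a single entry.
    return [noisy_shape]
```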

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@lstein force-pushed the feat/qwen-image-2512 branch from dfe597f to 2f10d83 on March 28, 2026 at 02:53
lstein and others added 11 commits March 27, 2026 22:57
…geEditMainModelConfig)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The variant field with a default value was appended to the discriminator
tag (e.g. main.gguf_quantized.qwen-image.generate), breaking model
detection for GGUF and Diffusers models. Making variant optional with
default=None restores the correct tags (main.gguf_quantized.qwen-image).

The variant is still set during Diffusers model probing via
_get_qwen_image_variant() and can be manually set for GGUF models.
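The tag-building behavior described in this commit can be sketched as a minimal helper (a simplification, not the project's actual discriminator machinery; `build_tag` is a hypothetical name): a field with a value gets appended to the tag, so a variant that defaults to `None` is omitted and the shorter tag is restored.

```python
from typing import Optional


def build_tag(type_: str, format_: str, base: str,
              variant: Optional[str] = None) -> str:
    """Join the discriminator components into a dotted tag."""
    parts = [type_, format_, base]
    if variant is not None:
        # Only append the variant when it was explicitly set
        # (e.g. during Diffusers model probing).
        parts.append(variant)
    return ".".join(parts)
```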

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The rename from qwen_image_edit -> qwen_image caused variable name
collisions with the txt2img starter models. Give edit models the
qwen_image_edit_* prefix to distinguish from qwen_image_* (txt2img).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…URLs

The global rename sed changed 'qwen-image-edit-2511' to 'qwen-image-2511'
inside the HuggingFace URLs, but the actual files on HF still have 'edit'
in their names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When switching from an edit model to a generate model, reference images
remain in state but the panel is hidden. Prevent them from being passed
to the text encoder and VAE encoder by checking the model variant.
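A minimal sketch of that guard, with assumed names (`effective_reference_images`, a `"edit"` variant string): stale reference images left in state are dropped before they reach the encoders when a generate model is selected.

```python
def effective_reference_images(reference_images: list, model_variant: str) -> list:
    """Return the reference images that should actually be encoded."""
    if model_variant != "edit":
        # Generate model: the reference-image panel is hidden, so any
        # images lingering in state must be ignored.
        return []
    return list(reference_images)
```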

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The txt2img model doesn't use zero_cond_t — setting it causes the
transformer to double the timestep batch and create modulation indices
for non-existent reference patches, producing noise output. Now checks
the config variant before enabling it.
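In sketch form (names assumed), the check reduces to gating `zero_cond_t` on the config variant rather than setting it unconditionally:

```python
def should_use_zero_cond_t(config_variant: str) -> bool:
    """Enable zero_cond_t only for edit-variant models.

    Enabling it on a txt2img model doubles the timestep batch and
    creates modulation indices for non-existent reference patches.
    """
    return config_variant == "edit"
```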

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n, shift)

- Save qwen_image_component_source, qwen_image_quantization, and
  qwen_image_shift in generation metadata
- Add metadata recall handlers so remix/recall restores these settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flux PEFT LoRAs use transformer.single_transformer_blocks.* keys which
contain "transformer_blocks." as a substring, falsely matching the
Qwen Image LoRA detection. Add single_transformer_blocks to the Flux
exclusion set.
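The substring pitfall this commit fixes can be shown with a small sketch (the function name and exclusion-set contents here are illustrative, not the project's actual detection code):

```python
# Flux PEFT keys like transformer.single_transformer_blocks.* contain
# "transformer_blocks." as a substring, so they must be excluded before
# a naive substring match attributes them to Qwen Image.
FLUX_EXCLUSIONS = ("single_transformer_blocks.",)


def looks_like_qwen_image_lora(key: str) -> bool:
    """Heuristic check for a Qwen Image LoRA state-dict key."""
    if any(excl in key for excl in FLUX_EXCLUSIONS):
        return False  # Flux PEFT key, not Qwen Image
    # Also matches keys carrying the "transformer." prefix.
    return "transformer_blocks." in key
```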

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>