fix(qwen): fix ghosting artifacts in Qwen Image Edit #9155
Conversation
Qwen Image Edit was applying identical RoPE positions to the noisy and reference latent segments (both packed at the noisy latent's dimensions), so cross-attention couldn't disentangle them: reference content bled into the generation as a faintly offset ghost across the whole frame, outside the masked edit region.

The denoise now keeps reference latents at their own (H, W) and uses those dims in the reference segment of `img_shapes`, matching diffusers' `QwenImageEditPipeline` / `QwenImageEditPlusPipeline`. The reference image fed to `qwen_image_i2l` is resized to ~1024² area preserving aspect ratio (matching diffusers' `VAE_IMAGE_SIZE`) so the reference token sequence stays in the distribution the model was trained on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
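A minimal sketch of the fix's core idea, assuming hypothetical helper and variable names (the real invocation code differs): each latent segment gets its own `(frames, H/2, W/2)` entry in `img_shapes`, so RoPE assigns distinct spatial positions to noisy vs. reference tokens.

```python
def build_img_shapes(noisy_hw: tuple[int, int], ref_hw: tuple[int, int]) -> list[tuple[int, int, int]]:
    """Build per-segment (frames, H/2, W/2) tuples for the RoPE module.

    Sketch only: diffusers' Qwen edit pipelines pass one entry per latent
    segment, each at that segment's own packed dims, so spatial RoPE
    frequencies differ between the noisy and reference segments.
    """
    noisy_h, noisy_w = noisy_hw
    ref_h, ref_w = ref_hw
    return [
        (1, noisy_h // 2, noisy_w // 2),  # noisy latent segment
        (1, ref_h // 2, ref_w // 2),      # reference segment keeps its own dims
    ]

# Buggy behavior: both entries used the noisy dims, e.g.
# [(1, 64, 64), (1, 64, 64)] -> identical RoPE positions -> ghosting.
```

For example, a 128×128 noisy latent with a 96×128 reference latent should yield `[(1, 64, 64), (1, 48, 64)]` rather than two identical tuples.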
One code issue to be resolved prior to functional testing:
The frontend resizes the reference image to ~1024² area before VAE encoding, but direct API callers and older graph JSON can wire qwen_image_i2l → qwen_image_denoise without explicit width/height, sending a native-resolution reference latent into the transformer. Without the clamp, the model receives an out-of-distribution sequence length (artifacts return, VRAM spikes). Mirror diffusers' QwenImageEdit(Plus) `VAE_IMAGE_SIZE` behavior in latent space: bilinear-downscale the reference latent to `calculate_dimensions(1024², aspect_ratio)`, snapped to multiples of 32 in pixel space (= multiples of 4 in latent space, so always packable). In-budget latents pass through untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
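The suggested clamp could look roughly like this sketch, assuming PyTorch and a `(B, C, H, W)` latent at 1/8 pixel scale; `calculate_dimensions` approximates the diffusers helper, and `clamp_ref_latent` is a hypothetical name:

```python
import math

import torch
import torch.nn.functional as F


def calculate_dimensions(target_area: int, ratio: float) -> tuple[int, int]:
    # Approximates diffusers' helper: pick W x H with ~target_area and the
    # given aspect ratio (W/H), snapped to multiples of 32 in pixel space.
    width = math.sqrt(target_area * ratio)
    height = width / ratio
    return round(width / 32) * 32, round(height / 32) * 32


def clamp_ref_latent(latent: torch.Tensor, target_area: int = 1024 * 1024) -> torch.Tensor:
    """Sketch: downscale an out-of-budget reference latent in latent space.

    Pixel area is 64x the latent area (8x scale per axis); multiples of 32 px
    become multiples of 4 in latent space, so the result is always
    2x2-packable.
    """
    _, _, h, w = latent.shape
    if h * w * 64 <= target_area:  # already within the pixel-area budget
        return latent  # in-budget latents pass through untouched
    px_w, px_h = calculate_dimensions(target_area, w / h)
    return F.interpolate(latent, size=(px_h // 8, px_w // 8), mode="bilinear", align_corners=False)
```

A 256×256 reference latent (2048×2048 px, 4× over budget) would come out at 128×128, while anything at or under ~1024² pixel area is returned unchanged.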
Good catch, addressed in eaf116e: added 7 new unit tests. I didn't add a full I2L→denoise integration test because the i2l step requires loading the Qwen VAE model, which is heavy for CI. The unit tests assert the exact same invariant the integration test would: given a too-large reference latent, the dims fed to packing/…
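The invariant those unit tests pin down can be sketched like this (the helper name `_align_ref_latent_dims` comes from the test plan below, but its internals are assumed here, including the exact rounding):

```python
import math


def _align_ref_latent_dims(h: int, w: int, max_area_px: int = 1024 * 1024) -> tuple[int, int]:
    # Hypothetical stand-in for the denoise helper under test: clamp an
    # oversized reference latent's dims to a ~1024^2-pixel-area equivalent,
    # snapped to multiples of 32 in pixel space (= 4 in latent space).
    if h * w * 64 <= max_area_px:
        return h, w  # in-budget latents pass through untouched
    ratio = w / h
    px_w = round(math.sqrt(max_area_px * ratio) / 32) * 32
    px_h = round(math.sqrt(max_area_px / ratio) / 32) * 32
    return px_h // 8, px_w // 8


def test_oversized_ref_latent_is_clamped():
    h, w = _align_ref_latent_dims(256, 256)  # 2048x2048 px input
    assert (h, w) == (128, 128)              # back inside the ~1024^2 px budget
    assert h % 2 == 0 and w % 2 == 0         # always 2x2-packable


def test_in_budget_latent_passes_through():
    assert _align_ref_latent_dims(128, 96) == (128, 96)
```

The point is that the unit tests exercise the same dims-level contract the heavier integration test would, without loading the VAE.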
JPPhoto
left a comment
I was able to test the quantized edit model with and without the Lightning LoRA, and it worked with a source image whose aspect ratio differed from the output's. I think this change is good for the next RC and public testing!
Summary
- Previously, `qwen_image_denoise.py` bilinear-resized the reference latents to the noisy latent's dimensions and used identical `img_shapes` tuples for both segments. `QwenEmbedRope` varies its spatial RoPE frequencies by each segment's H/W, so identical dims gave both segments the same spatial positions and cross-attention couldn't disentangle them.
- The denoise now keeps reference latents at their own `(H, W)`, packs them at those dims, and places `(1, ref_h // 2, ref_w // 2)` in the reference segment of `img_shapes`, matching diffusers' `QwenImageEditPipeline` (`pipeline_qwenimage_edit.py:755-760`) and `QwenImageEditPlusPipeline` (`pipeline_qwenimage_edit_plus.py:743-751`).
- The reference image fed to `qwen_image_i2l` is now resized to ~1024² area preserving aspect ratio (matching diffusers' `VAE_IMAGE_SIZE`) so the reference token sequence stays in the distribution the model was trained on.
- `qwen_image_image_to_latents.py` bumps `multiple_of` from 8 to 16 so VAE-encoded latents always have even spatial dims (required by the 2×2 patch packing); a `ValueError` is raised if a directly-wired reference latent has spatial dims that align to less than 2.
Test plan
- `uv run pytest tests/app/invocations/test_qwen_image_denoise.py`: 15 pass (8 new: `_align_ref_latent_dims` validation incl. zero-dim guard, `_build_img_shapes` distinct-dims regression)
- `pnpm test:no-watch` `buildQwenImageGraph`: 30 pass (6 new: `calculateQwenImageEditRefDimensions` matches diffusers' `calculate_dimensions` for square/landscape/portrait/extreme-ratio inputs; verifies computed dims land on the ref i2l node)
- `pnpm lint:eslint`, `pnpm lint:prettier`, `pnpm lint:knip`, `pnpm lint:tsc`: clean
- `uv run ruff check` / `uv run ruff format --check` on changed files: clean
- The `multiple_of=16` i2l change is a no-op for canvas-composited paths but worth a smoke test.

🤖 Generated with Claude Code