Wan 2.2 stack: txt2img, img2img, inpaint, outpaint, LoRA support#9158

Open
lstein wants to merge 13 commits into
invoke-ai:mainfrom
lstein:lstein/feature/wan-image-2-2

Conversation


@lstein lstein commented May 12, 2026

Summary

Adds end-to-end support for the Wan 2.2 family of image-generation models to InvokeAI's linear view and canvas, covering all three variants (T2V A14B, I2V A14B, TI2V-5B) in both Diffusers and GGUF formats.

Background Info

Wan 2.2 is the last version of the Wan family to be released with open weights. It is primarily designed for video generation, but can be coerced into generating (pretty nice) images by requesting one video frame. The model comes in three flavors:

  • TI2V-5B -- a 5B-parameter model that renders quickly but has middling aesthetics; it supports both text2video and image2video (where a provided reference image becomes the first frame of the video)
  • T2V A14B -- a 14B-parameter model designed for text2video
  • I2V A14B -- a 14B-parameter model designed for image2video

Both A14B models ship a pair of denoiser transformers, similar to the SDXL main model/refiner model split. The "high noise" transformer handles the first set of steps and is then replaced by a "low noise" transformer for the remaining steps, adding focus, texture, and detail. You can run with just the high-noise transformer, but the resulting image will lack detail. When running the full Diffusers models, the generation pipeline silently switches from the high- to the low-noise transformer.
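
A minimal sketch of that expert handoff, assuming timesteps that count down and a boundary_ratio read from the model config (the 0.875 default below is purely illustrative):

```python
# Sketch of the high->low noise handoff. The boundary_ratio default here is
# an illustrative assumption; the real value comes from model_index.json.
def select_expert(t: float, boundary_ratio: float = 0.875, num_train_timesteps: int = 1000) -> str:
    """Early (noisy) steps go to the high-noise expert; later refinement
    steps, below the boundary, go to the low-noise expert."""
    return "high_noise" if t >= boundary_ratio * num_train_timesteps else "low_noise"
```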

Both the 5B and A14B models have quant variants that can be downloaded from the QuantStack repo on HuggingFace. The standalone A14B GGUF transformers come in "high noise" and "low noise" pairs. For best results, you must install and run both members of the pair. The starter models system makes sure that if you download the high noise transformer, you will get the low noise transformer as well, along with a compatible standalone VAE and encoder. Q5_M quants are installed by the Wan starter bundle, and Q8 quants are available as non-bundled starter models.

Typical settings for Wan 2.2 models are:

  • Steps: 30-40
  • CFG (high noise): 4
  • CFG (low noise): 3

These are very large models that render slowly. However, you can install a pair of Lightning LoRAs that reduce the number of steps to 4. You will need two of them, one for high noise and the other for low noise. The LoRAs can be found in the lightx2v repo on HuggingFace. They are installed by default in the starter bundle.

When using the Lightning LoRAs:

  • Steps: 4
  • CFG (high noise): 1
  • CFG (low noise): 1

Details

Generation modes

  • txt2img, img2img, inpaint, and outpaint for T2V A14B and TI2V-5B (canvas + linear view).
  • I2V A14B via the global Reference Images panel. See the note on I2V below.

Model format support

  • Diffusers Wan mains (T2V A14B, I2V A14B, TI2V-5B).
  • GGUF transformer mains for all three variants. A14B GGUFs come as expert pairs (high-noise + low-noise); the linear view wires both automatically. The low-noise expert is optional — if omitted, the loader runs the high-noise expert for the full schedule.
  • Standalone Wan VAE and standalone UMT5-XXL T5 encoder selectable in the Advanced accordion (Qwen pattern). Backend loader priority: standalone > main (if Diffusers) > Component Source.
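
The backend loader priority can be sketched as a simple resolution function (the function and argument names are hypothetical, not InvokeAI's actual API):

```python
# Sketch of the standalone > main (if Diffusers) > Component Source priority.
def resolve_submodel(standalone, main, main_is_diffusers, component_source):
    """Pick which model supplies the VAE / T5 encoder."""
    if standalone is not None:
        return standalone          # explicit standalone model wins
    if main is not None and main_is_diffusers:
        return main                # a full Diffusers main ships its own VAE/encoder
    return component_source        # single-file (GGUF) mains need a Component Source
```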

LoRAs

  • Wan LoRA collection loader with dual-expert routing — LoRAs tagged `high` / `low` by filename heuristic land on the matching expert; untagged LoRAs apply to both.
  • Variant-aware filtering at both the LoRA picker and graph-build time: A14B (inner_dim=5120) and TI2V-5B (inner_dim=3072) LoRAs are not interchangeable, so the wrong-variant entries are hidden and any that slip through are dropped with a warning rather than crashing the layer patcher.
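
The dual-expert routing can be sketched like this; the exact filename patterns InvokeAI matches are assumptions, but untagged LoRAs applying to both experts follows the description above:

```python
import re

# Sketch of the filename heuristic + routing. Patterns are assumptions.
def detect_lora_expert(filename: str):
    name = filename.lower()
    if re.search(r"high[-_]?noise", name):
        return "high"
    if re.search(r"low[-_]?noise", name):
        return "low"
    return None  # untagged: applies to both experts

def route_loras(filenames):
    """Partition LoRA filenames into (high-noise list, low-noise list)."""
    high, low = [], []
    for name in filenames:
        expert = detect_lora_expert(name)
        if expert in (None, "high"):
            high.append(name)
        if expert in (None, "low"):
            low.append(name)
    return high, low
```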

UX polish

  • Reference Images panel surfaces only for the I2V A14B variant (T2V and TI2V-5B hide it since they don't consume ref images).
  • Auto-defaults the Component Source, standalone VAE, and standalone T5 encoder when the user picks a GGUF Wan main, matching the Component Source by variant family (A14B vs TI2V-5B) so the VAE channel count is correct.
  • Readiness check blocks enqueue when a GGUF main is selected without a VAE/encoder source (standalone or Component Source).
  • Metadata recall handlers for the Wan-specific fields (low-noise transformer, Component Source, standalone VAE / T5 encoder, low-noise CFG).

Starter pack

  • Wan 2.2 bundle: T2V A14B Q4_K_M + Q8_0 expert pairs, Lightning V1.1 LoRA pair, standalone A14B VAE, standalone UMT5-XXL encoder.
  • Additional browsable starter models: full Diffusers T2V/I2V A14B + TI2V-5B; I2V A14B Q4_K_M + Q8_0 expert pairs + Lightning V1 LoRA pair; TI2V-5B Q4_K_M + Q8_0; standalone TI2V-5B VAE.

Docs

  • Adds Wan 2.2 rows to the hardware requirements table.

Note on I2V

The I2V model accepts a single reference image and a prompt and modifies the image accordingly. However, it was designed to generate videos in which the first frame matches the reference image and the prompt is applied in subsequent frames. Unfortunately, since InvokeAI requests a single frame, I2V always returns the unmodified reference image. This is not very useful at the moment, but I am leaving support for the model in place in case we start generating video at some point.

Test plan

  • Linear view: T2V A14B Diffusers — txt2img produces an image.
  • Linear view: T2V A14B GGUF (Q4_K_M expert pair) with Lightning LoRA pair, 4 steps, CFG=1 — produces an image quickly.
  • Linear view: TI2V-5B Diffusers — txt2img produces an image.
  • Linear view: TI2V-5B GGUF (Q4_K_M) — txt2img produces an image with the auto-filled Component Source.
  • Linear view: I2V A14B Diffusers + Reference Image panel — produces an image conditioned on the ref image.
  • Canvas: T2V A14B img2img / inpaint / outpaint round-trip correctly with multiple-of-16 dimensions.
  • Switching A14B GGUF → TI2V-5B GGUF re-points the Component Source to a TI2V-5B Diffusers if one is installed (variant-family match).
  • LoRA picker hides A14B LoRAs when a TI2V-5B main is selected and vice versa.
  • Readiness fails with a clear reason when a GGUF main has no Component Source AND no standalone VAE/encoder selected.
  • Recall from a Wan-generated image populates main, low-noise transformer, Component Source, standalone VAE, standalone T5 encoder, low-noise CFG, LoRAs.
  • Starter pack: installing the Wan 2.2 bundle pulls in all 8 entries with their dependencies resolved.
  • Backend: all `tests/backend/model_manager` config tests pass (138 tests).

🤖 Generated with Claude Code

lstein and others added 9 commits May 9, 2026 10:33
Foundation + TI2V-5B MVP + A14B dual-expert MoE for Wan 2.2 image
generation. Wan was trained on video but is competitive with leading
open-source image models when run at num_frames=1; this commit wires
that path into InvokeAI.

Phase 0 — Foundation:
- BaseModelType.Wan + WanVariantType {T2V_A14B, TI2V_5B}
- SubModelType.Transformer2 for the dual-expert MoE
- MainModelDefaultSettings per variant
- step_callback Wan branch (16-channel preview; 48-channel TI2V-5B
  falls back to slicing first 16 channels until proper factors land)
- Frontend enums + node colour

Phase 1 — TI2V-5B Diffusers MVP:
- Main_Diffusers_Wan_Config probe (variant from transformer_2/ +
  vae/config.json::z_dim, with filename heuristic fallback)
- WanDiffusersModel loader (subclasses GenericDiffusersLoader)
- WanT5EncoderField, WanTransformerField (with dual-expert slots),
  WanConditioningField, WanConditioningInfo
- New invocations: wan_model_loader, wan_text_encoder, wan_denoise,
  wan_image_to_latents, wan_latents_to_image
- FlowMatchEulerDiscreteScheduler integration with on-disk config load
- RectifiedFlowInpaintExtension reused for inpaint
- 5D <-> 4D shape juggling: latents stay 4D in InvokeAI's pipeline,
  re-add T=1 only inside the transformer call / VAE encode-decode

Phase 2 — A14B dual-expert MoE:
- Probe reads boundary_ratio from model_index.json
- Loader emits both transformer (high-noise) and transformer_low_noise
  (low-noise expert at transformer_2/) for A14B
- _ExpertSwapper in wan_denoise drives GPU residency between experts:
  high-noise for t >= boundary_ratio * num_train_timesteps, low-noise
  below. Only one expert locked at a time so the cache can evict the
  other - relies on existing CachedModelWithPartialLoad to handle
  oversized models on lower-VRAM GPUs.
- guidance_scale_low_noise field for separate low-noise CFG override

Tests:
- 24 passing tests covering probe variant detection, default settings,
  noise sampling, end-to-end denoise on a synthetic transformer (CPU),
  dual-expert boundary swap, CFG branch
- 1 heavy-test placeholder gated by INVOKEAI_HEAVY_TESTS=1 for the
  real-weights smoke test

Phase 3+ deferred: standalone VAE/encoder configs, GGUF, LoRA,
ControlNet, ref image, inpaint UI, frontend wiring, starter models.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 adds standalone VAE and UMT5-XXL encoder configs so users can run
GGUF-quantized Wan transformers (Phase 4) without installing the full
~30 GB Diffusers pipeline.

VAE configs:
- VAE_Checkpoint_Wan_Config + VAE_Diffusers_Wan_Config (16-channel A14B
  vs 48-channel TI2V-5B, distinguished by decoder.conv_in z_dim).
- 16-channel files share the AutoencoderKLWan architecture with Qwen
  Image; disambiguated via filename heuristic ("wan" in name -> Wan,
  otherwise -> Qwen Image). Mirror exclusion in QwenImage's probe.
- VAELoader gets a Wan branch that builds AutoencoderKLWan(z_dim=...)
  via init_empty_weights, mirroring the QwenImage single-file pattern.
- Existing standard VAE probe excludes both QwenImage- and Wan-style
  state dicts.
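
The filename disambiguation amounts to a one-liner (function name and return labels are hypothetical):

```python
# Sketch of the 16-channel VAE disambiguation described above:
# "wan" in the filename -> Wan, otherwise -> Qwen Image.
def classify_16ch_vae(filename: str) -> str:
    return "wan" if "wan" in filename.lower() else "qwen_image"
```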

UMT5-XXL encoder:
- New ModelType.WanT5Encoder + ModelFormat.WanT5Encoder.
- WanT5Encoder_WanT5Encoder_Config probes the diffusers folder layout
  (text_encoder/config.json with model_type=umt5, or flat layout with
  config.json at root). Refuses full Wan pipelines.
- WanT5EncoderLoader handles both layouts and loads UMT5EncoderModel +
  AutoTokenizer.

Component-source plumbing:
- WanModelLoaderInvocation now exposes wan_t5_encoder_model and
  component_source pickers (mirrors QwenImage pattern). Resolution
  order: standalone > main (if Diffusers) > component_source. Required
  when the main model is a single-file format in Phase 4.

Bug fix in wan_text_encoder:
- Tokenizer was loading via AutoTokenizer.from_pretrained(<root>)
  directly, which fails for nested layouts where files live in
  <root>/tokenizer/. Now routed through the model cache so the
  registered loaders handle layout differences correctly.

Frontend:
- New type guards (isWanVAEModelConfig, isWanT5EncoderModelConfig,
  isWanMainModelConfig, isWanDiffusersMainModelConfig) and hooks/
  selectors (useWanVAEModels, useWanT5EncoderModels,
  useWanDiffusersModels). New zSubModelType / zModelType / zModelFormat
  enum entries for transformer_2 and wan_t5_encoder.

Tests:
- 16 new tests covering z_dim detection, VAE checkpoint/diffusers
  probes, the bidirectional Qwen-vs-Wan filename deferral, and the
  UMT5 encoder probe (nested + flat + T5 + full-pipeline rejection).
- Total Wan test count: 41 passing, 1 heavy-test placeholder skipped.
- Full config test suite (63 tests) still passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): unbreak frontend lint after Wan additions

Five issues turned up running `make frontend-lint`:

1. wan_denoise.py used `from __future__ import annotations`, which made
   the `invoke()` return annotation a string ('LatentsOutput'). The
   InvocationRegistry's `get_output_annotation()` returns the raw
   annotation, so OpenAPI generation crashed with
   `'str' object has no attribute '__name__'`. Removed the future-import
   and added `Any` to the typing imports.

2. ModelRecordChanges.variant didn't list WanVariantType, so the
   generated schema's install/update endpoints rejected `t2v_a14b` and
   `ti2v_5b`. Added it.

3. Regenerated frontend/web/src/services/api/schema.ts from the live
   backend so it now includes BaseModelType.wan, ModelType.wan_t5_encoder,
   SubModelType.transformer_2, ModelFormat.wan_t5_encoder, the Wan
   variants, all Wan invocation types and their conditioning/transformer
   field types.

4. modelManagerV2/models.ts: added `wan_t5_encoder` to the category map,
   `wan` to the base color/long-name/short-name maps, the two Wan
   variants to the variant-name map, and `wan_t5_encoder` to the
   format-name map.

5. ModelManagerPanel/ModelFormatBadge.tsx: added `wan_t5_encoder` to
   FORMAT_NAME_MAP and FORMAT_COLOR_MAP.

`make frontend-lint` now passes cleanly (tsc, dpdm, eslint, prettier).
All 41 Wan Python tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore(wan): drop unused FE exports flagged by knip

These were forward-compatibility wiring for Phase 9 (the FE graph
builder) that has no consumers yet; knip rightly flagged them. Removed
or de-exported. They'll come back when the graph builder lands and
needs them.

- common.ts: zWanVariantType drops `export` (still used internally by
  zAnyModelVariant).
- types.ts: drop isWanMainModelConfig, isWanDiffusersMainModelConfig,
  isWanVAEModelConfig (no callers). The remaining
  isWanT5EncoderModelConfig is used by models.ts. WanT5EncoderModelConfig
  type drops `export` (still used as the type guard's narrowing target).
- modelsByType.ts: drop the six unused useWan*/selectWan* hooks +
  selectors and their type-guard imports.

`make frontend-lint` (tsc + dpdm + eslint + prettier + knip) now green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs(wan): use *-Diffusers HF repo names in plan

The Wan-AI org publishes two flavours of each release:
  * Wan-AI/Wan2.2-{TI2V-5B,T2V-A14B,I2V-A14B}            ← upstream native
  * Wan-AI/Wan2.2-{TI2V-5B,T2V-A14B,I2V-A14B}-Diffusers  ← convertible

The native release has _class_name=WanModel in config.json and ships
weights flat at the repo root with no transformer/, vae/, text_encoder/
subdirs. It is not loadable by Diffusers' WanPipeline.from_pretrained.

Update plan doc to reference the -Diffusers repos throughout (probe
notes, starter-model entries) so the plumbing path matches what the
Diffusers loader actually expects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): accept 0 as 'unset' sentinel for guidance_scale_low_noise

The frontend renders Optional[float] inputs with default 0 in the
numeric input rather than passing null/unset. Combined with ge=1.0,
this caused every wan_denoise invocation to fail Pydantic validation
with "Input should be greater than or equal to 1" until the user
manually entered a value (or knew to leave the field disconnected).

The validation error was rejected before invocation logging, so it
never showed up in the server log either - making the failure hard to
diagnose.

Relax the constraint to ge=0.0 and treat values below 1.0 as the
"fall back to primary Guidance Scale" sentinel. The user's natural FE
default (0) now works as expected.
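
The sentinel behaviour can be sketched with a plain dataclass; the field names follow the commit, but the class itself is a stand-in, not the real Pydantic invocation model:

```python
from dataclasses import dataclass

# Sketch of the 0-as-unset sentinel. Values below 1.0 fall back to the
# primary guidance scale, so the frontend's default of 0 is valid input.
@dataclass
class WanDenoiseParams:
    guidance_scale: float = 4.0
    guidance_scale_low_noise: float = 0.0  # < 1.0 means "unset"

    def effective_low_noise_cfg(self) -> float:
        if self.guidance_scale_low_noise < 1.0:
            return self.guidance_scale
        return self.guidance_scale_low_noise
```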

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): correct preview dimensions and colors for TI2V-5B

Two bugs in the Wan branch of the diffusion step callback:

1. Wrong dimensions. The reported preview size hardcoded `* 8` for the
   spatial downscale ratio, but TI2V-5B's Wan2.2-VAE uses 16x. A
   1024x1024 target was being announced to the FE as 512x512.

2. Wrong colors. The previous fallback for 48-channel TI2V-5B latents
   sliced the first 16 channels and applied the standard 16-channel
   Wan-VAE projection. Those channel layouts are unrelated, so the
   projection produced meaningless colors.

Add the proper Wan2.2-VAE 48-channel RGB projection matrix (and
bias) from ComfyUI's Wan22 latent format, and select the right
matrix + spatial scale by latent channel count: 16 → A14B (Wan VAE,
8x), 48 → TI2V-5B (Wan2.2-VAE, 16x).
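
The channel-count dispatch can be sketched as follows; the projection matrices themselves are omitted, and the labels are illustrative:

```python
# Sketch of selecting preview parameters by latent channel count:
# 16 channels -> A14B Wan VAE (8x spatial downscale),
# 48 channels -> TI2V-5B Wan2.2-VAE (16x spatial downscale).
def preview_params(latent_channels: int):
    if latent_channels == 16:
        return ("wan_vae_a14b", 8)
    if latent_channels == 48:
        return ("wan22_vae_ti2v_5b", 16)
    raise ValueError(f"Unexpected Wan latent channel count: {latent_channels}")
```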

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): honor model's _class_name when building scheduler

TI2V-5B's scheduler_config.json declares _class_name=UniPCMultistepScheduler
with flow_shift=5.0. The previous code hardcoded
FlowMatchEulerDiscreteScheduler.from_pretrained(...), which silently
constructed a default-config FlowMatch instead of the UniPC the model
expects. The mismatched noise schedule manifests as soft / under-denoised
faces and global graininess in the final images.

Now: read scheduler_config.json, look up the named class on the diffusers
module, and instantiate that class via from_pretrained. UniPC and
FlowMatch share the same step()/set_timesteps()/sigmas/num_train_timesteps
interfaces, so the denoise loop works transparently for either.

A14B continues to use FlowMatchEulerDiscreteScheduler when its scheduler
config says so (its reference is FlowMatchEuler with shift=8.0). Falls
back to FlowMatchEulerDiscreteScheduler defaults when no on-disk config
is available.
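
The class lookup can be sketched generically; a namespace stands in for the diffusers module here, and the helper name is hypothetical:

```python
from types import SimpleNamespace

# Sketch of the _class_name lookup: resolve the named scheduler class on a
# module, falling back to FlowMatch defaults when the config is missing or
# the name is unknown.
def resolve_scheduler_class(module, config, default_name="FlowMatchEulerDiscreteScheduler"):
    name = (config or {}).get("_class_name") or default_name
    return getattr(module, name, getattr(module, default_name, None))

# Stand-in for the diffusers module, mapping class names to markers.
fake_diffusers = SimpleNamespace(
    UniPCMultistepScheduler="UniPC", FlowMatchEulerDiscreteScheduler="FlowMatch"
)
```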

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): match diffusers WanPipeline tokenizer length and latent dtype

Two divergences from the Diffusers reference that were hurting image
quality (soft / grainy / distorted faces at default settings):

1. Tokenizer max_sequence_length was 226 in wan_text_encoder, but the
   model was trained with 512-token sequences. The upstream native
   config.json has text_len: 512, and Diffusers' WanPipeline.__call__
   default is 512 (overriding _get_t5_prompt_embeds's stale 226 default).
   Wan's cross-attention sees padded zeros past the prompt's actual
   length but expects to be looking at a 512-position context window.

2. Latents were stored in bf16 throughout the denoise loop. Diffusers'
   WanPipeline.prepare_latents explicitly uses dtype=torch.float32 and
   only casts to the transformer's dtype right at the forward call:
       latent_model_input = latents.to(transformer_dtype)
   Storing in bf16 between steps accumulates ~40 steps of bf16
   quantization on the scheduler's small per-step deltas. Now
   latent_dtype = torch.float32 throughout, with a per-step cast for
   the transformer forward pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore(wan): add diffusers reference comparison script

scripts/wan_diffusers_reference.py runs a Diffusers-format Wan 2.2
checkpoint directly via WanPipeline.from_pretrained, with the same
arguments InvokeAI's wan_denoise uses. Use it to A/B against InvokeAI
output when image quality is in question.

Defaults to enable_model_cpu_offload so the script fits on 16 GB cards
where the full pipeline (transformer + UMT5-XXL + VAE) would otherwise
OOM. --offload {model,sequential,none} controls the strategy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds single-file GGUF support for Wan 2.2 transformers, the path that
makes A14B usable on consumer GPUs (~7 GB/expert at Q4_K_M instead of
~28 GB at bf16).

Probe (configs/main.py):
- New helpers: _has_wan_keys (Wan vs Qwen/FLUX/Z-Image fingerprint via
  condition_embedder.text_embedder.linear_1 + patch_embedding);
  _detect_wan_gguf_variant (16ch -> A14B, 48ch -> TI2V-5B from
  patch_embedding.weight.shape[1]); _detect_wan_gguf_expert (filename
  heuristic for high_noise / low_noise / none).
- Main_GGUF_Wan_Config(base=Wan, format=GGUFQuantized, variant, expert).
  Tolerates the ComfyUI 'model.diffusion_model.' / 'diffusion_model.'
  prefixes via _has_wan_keys' multi-prefix scan.
- Registered in factory.py.
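
The variant detection can be sketched from the tensor shapes alone (the 36-channel I2V case is added by a later commit in this stack; the labels are illustrative strings):

```python
# Sketch of _detect_wan_gguf_variant: read in_channels from
# patch_embedding.weight.shape[1] (a Conv3d weight is [out, in, kT, kH, kW]).
def detect_wan_gguf_variant(shapes: dict):
    in_channels = shapes["patch_embedding.weight"][1]
    return {16: "t2v_a14b", 36: "i2v_a14b", 48: "ti2v_5b"}.get(in_channels)
```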

Loader (model_loaders/wan.py):
- WanGGUFCheckpointModel mirrors the QwenImage GGUF pattern:
  gguf_sd_loader -> strip ComfyUI prefix -> auto-detect arch from state
  dict shapes (num_layers, inner_dim, ffn_dim, text_dim, in_channels,
  num_heads = inner_dim/128) -> init_empty_weights +
  load_state_dict(strict=False, assign=True).

Loader invocation (wan_model_loader.py):
- New 'Transformer (Low Noise)' picker: optional second GGUF for the
  A14B dual-expert MoE. Auto-swaps if the user wired the experts in
  the wrong order. Warns when an A14B GGUF is loaded without a paired
  low-noise expert (single-expert run, degraded quality).
- GGUF mains require either a standalone VAE+encoder or a Diffusers
  Component Source (which can also supply boundary_ratio).
- Diffusers main path unchanged (still pulls both experts from
  transformer/ + transformer_2/).

Tests (tests/.../test_wan_gguf_config.py):
- 14 tests across key fingerprint, variant detection, expert filename
  heuristic, and the full probe (A14B high/low, TI2V-5B, GGUF rejection,
  unrecognised state-dict rejection, explicit override).

Total Wan tests: 55 passing (no regressions). FE lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): support QuantStack-style GGUFs and standalone Diffusers VAE

The city96 Wan 2.2 GGUF repos have been removed from Hugging Face,
leaving QuantStack as the surviving distributor. QuantStack ships the
native upstream Wan key layout (text_embedding.0/2, self_attn/cross_attn,
ffn.0/2, head.head, head.modulation, ...) rather than the diffusers
naming city96 used; biases are stored as F16 rather than BF16; and the
standalone Wan VAE installs as a flat AutoencoderKLWan folder which the
generic loader rejects. Three fixes:

1. Probe now recognises both diffusers and native key layouts via a new
   _is_native_wan_layout helper; _has_wan_keys accepts either text-proj
   fingerprint.

2. GGUF loader converts native -> diffusers keys (mirroring diffusers'
   convert_wan_transformer_to_diffusers) and unwraps non-quantized
   GGMLTensors to plain tensors at compute_dtype. The unwrap is needed
   because conv3d isn't in GGMLTensor's dispatch table, so the F16
   patch_embedding bias would otherwise hit conv3d against bf16 latents.

3. VAELoader gains a VAE_Diffusers_Wan_Config branch that loads
   AutoencoderKLWan directly; the generic path can't handle a flat
   single-class folder when a submodel_type is provided.

Adds 12 tests covering the native layout (probe + converter + unwrap).
Verified end-to-end against Wan2.2-T2V-A14B-Q4_K_M from QuantStack:
1095 tensors round-trip key-for-key against WanTransformer3DModel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Probe + config (LoRA_LyCORIS_Wan_Config):
  - Detects Wan LoRAs in three layouts: diffusers PEFT, native upstream PEFT
    (ComfyUI), and Kohya (both naming variants).
  - Anti-pattern guards prevent collisions with Anima (Cosmos DiT q_proj
    convention), QwenImage (transformer_blocks), Flux (double/single blocks),
    and Z-Image (diffusion_model.layers).
  - Optional ``expert: "high" | "low" | None`` field; auto-detected from
    filename (high_noise / low_noise / hyphenated / concatenated variants).

Key conversion (wan_lora_conversion_utils):
  - Native upstream keys (self_attn/cross_attn, ffn.0/2) -> diffusers
    (attn1/attn2, ffn.net.0.proj / ffn.net.2).
  - Strips ``transformer.``, ``diffusion_model.``, ``base_model.model.transformer.``
    prefixes from PEFT-style keys.
  - Kohya layer names mapped through an explicit longest-match table.
  - Output paths use diffusers naming so the LayerPatcher can resolve them
    against WanTransformer3DModel parameter paths.

Loader integration:
  - Adds BaseModelType.Wan branch to LoRALoader._load_model.

Invocation nodes (wan_lora_loader.py):
  - WanLoRALoaderInvocation: single LoRA with auto/both/high/low target field.
  - WanLoRACollectionLoader: list of LoRAs, auto-routed by each LoRA's
    recorded expert tag.
  - Output WanLoRALoaderOutput carries the WanTransformerField with updated
    ``loras`` / ``loras_low_noise`` lists.

Denoise integration:
  - _ExpertSwapper now manages both the model_on_device context and the
    LayerPatcher.apply_smart_model_patches context per expert. LoRA patches
    are entered after device load and exited before device release, with
    fresh iterators per swap.
  - GGUF (quantized) experts request sidecar patching so GGMLTensor weights
    aren't touched directly.
  - Low-noise expert falls back to the primary loras list when
    ``loras_low_noise`` is empty (matches WanTransformerField semantics).

Tests: 81 new tests covering probe accept/reject across formats, anti-pattern
guards on competing architectures, converter round-trips for all three
layouts, invocation target resolution + routing + duplicate guards, and the
_ExpertSwapper lifecycle (lora context opens/closes in the right order
around the device swap, quantized flag forwards, no-LoRA path skips the
patch context, re-entering the same label is a no-op).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): probe Wan LoRA before Anima in the config union

Native-PEFT Wan LoRAs (lightx2v's Lightning, most ComfyUI-trained Wan
LoRAs) carry keys like ``diffusion_model.blocks.X.cross_attn.k.lora_A.weight``.
Anima's probe matches on the bare ``cross_attn``/``self_attn`` substring —
it does not require the Anima-specific ``_proj`` suffix nor any of the
``mlp``/``adaln_modulation`` Cosmos DiT markers — so these Wan LoRAs were
classified as ``BaseModelType.Anima`` because Anima happened to run first.

Reorder the LyCORIS section of ``AnyModelConfig`` so Wan probes first.
Wan's probe is strictly more restrictive (it rejects Anima's ``_proj``
attention suffix via the anti-pattern guard added in the previous commit),
so Anima LoRAs are still correctly classified after this reorder.

Existing users with mis-tagged installs need to delete the affected LoRA
records and reinstall.

Adds two regression tests: a union-ordering assertion, and a sanity check
that demonstrates Anima's probe *would* match Wan native keys if asked
directly — pinning the constraint that motivates the ordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore(i18n): add Wan2.2 T5 Encoder model-manager label

The frontend source already references ``modelManager.wanT5Encoder``;
the locale key was added with a casing typo (``want5Encoder``). Fix
the key so the Wan T5 Encoder model type renders its display name
correctly in the model manager UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-implementation after the first attempt — which used CLIP-vision
conditioning — was reverted. Wan 2.2 I2V-A14B does NOT use a CLIP-vision
encoder (the Diffusers repo ships ``image_encoder: [null, null]`` in
``model_index.json``); instead it conditions on a reference image by
VAE-encoding it and concatenating the resulting latents (plus a
first-frame mask) to the noise latents along the channel dim. The I2V
transformer therefore has ``in_channels=36`` (16 noise + 16 ref-image
latents + 4 mask) vs ``in_channels=16`` for T2V.

Taxonomy:
  - Re-adds ``WanVariantType.I2V_A14B``.

Probes:
  - Diffusers: ``_detect_wan_variant`` reads ``transformer/config.json::in_channels``;
    36 → I2V_A14B, 16 → T2V_A14B (both share the dual-expert layout).
  - GGUF: ``_detect_wan_gguf_variant`` recognises ``in_channels=36`` from the
    patch_embedding tensor shape and emits I2V_A14B.

Backend extension (``backend/wan/extensions/wan_ref_image_extension.py``):
  - ``preprocess_reference_image`` resizes + normalises to a 5D pixel tensor.
  - ``encode_reference_image_to_condition`` VAE-encodes the image and stacks
    a 4-channel first-frame mask on top, producing the
    ``[1, 20, 1, H/8, W/8]`` condition tensor the denoise loop consumes.
  - Mirrors diffusers ``WanImageToVideoPipeline.prepare_latents`` with
    ``num_frames=1`` and ``expand_timesteps=False``.

Invocation node (``wan_ref_image_encoder.py``):
  - "Reference Image - Wan 2.2": image + VAE + width/height pickers.
  - Output ``WanRefImageConditioningField`` carries the condition tensor
    name plus the dimensions used (so the denoise step can validate dim
    parity).

Denoise integration:
  - ``WanDenoiseInvocation`` gains an optional ``ref_image`` field.
  - Variant gate: rejects ref_image on T2V_A14B and TI2V-5B with a clear
    error before doing any work.
  - Dimension gate: rejects ref-image width/height mismatch vs denoise.
  - At every transformer call, concatenates the 20-channel condition
    tensor to the 16-channel noise latents along the channel dim before
    passing to the transformer (giving the 36-channel input I2V expects).

Tests: 14 new across the probe, the extension, and the denoise loop.
The synthetic ``_ZeroTransformer`` test stand-in now mirrors the real
I2V transformer's ``in_channels=36, out_channels=16`` asymmetry by
slicing its zero output back to 16 channels when the input is 36-wide.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): derive GGUF out_channels from proj_out shape (I2V support)

The GGUF loader was setting ``out_channels = in_channels`` which is wrong for
Wan 2.2 I2V-A14B: that variant has ``in_channels=36`` (16 noise + 16 ref-image
latents + 4 first-frame mask, concatenated by the denoise loop) but
``out_channels=16`` since the transformer only predicts the noise component
back. Loading an I2V GGUF would build a transformer with the wrong proj_out
shape and crash:

  RuntimeError: Error(s) in loading state_dict for WanTransformer3DModel:
    size mismatch for proj_out.weight: copying a param with shape
    torch.Size([64, 5120]) from checkpoint, the shape in current model is
    torch.Size([144, 5120]).

(144 = 36 * 4, 64 = 16 * 4 — patch_size=(1, 2, 2) → prod=4)

Read out_channels directly from the ``proj_out.weight`` shape in the state
dict. This is correct for all three Wan 2.2 variants without needing to know
the variant in advance.

Also tighten the num_layers fallback: T2V_A14B and I2V_A14B share 40 layers;
only TI2V-5B has 30. The fallback is rarely hit in practice (the per-block
count comes from the state dict scan), but the previous code would have
defaulted I2V_A14B to 30 layers.
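
The arithmetic in the commit reduces to one helper (the name is hypothetical):

```python
import math

# proj_out.weight has out_channels * prod(patch_size) rows;
# with patch_size=(1, 2, 2), that's a factor of 4 (so 64 -> 16, 144 -> 36).
def derive_out_channels(proj_out_rows: int, patch_size=(1, 2, 2)) -> int:
    return proj_out_rows // math.prod(patch_size)
```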

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(model): make Anima LoRA probe mutually exclusive with Wan

InvokeAI's ``Config_Base.CONFIG_CLASSES`` is a Python ``set``, so iteration
order during model probing is non-deterministic across process restarts.
First-match-wins ordering in ``AnyModelConfig`` is documentation only — it
has no effect on which config is iterated first.

Anima's previous probe accepted any state dict containing the substring
``cross_attn`` or ``self_attn``, which collides with Wan's native LoRA key
layout (``diffusion_model.blocks.X.cross_attn.q.lora_down.weight``). Both
probes accepted Wan native LoRAs (including lightx2v's Lightning T2V and I2V
distillations), and the ``matches.sort_key`` tiebreaker only disambiguates
by ModelType, not within LoRA configs. So which config "won" depended on
dict hash order — sometimes Wan, sometimes Anima.

The previous mitigation reordered the AnyModelConfig union to put Wan
before Anima. That worked by luck and was inherently fragile.

Tighten Anima's probe to require Cosmos-DiT-exclusive subcomponents:
``mlp``, ``adaln_modulation``, or ``_proj``-suffixed attention names
(``q_proj``/``k_proj``/``v_proj``/``output_proj``) — none of which appear
in any Wan LoRA. Wan native uses bare ``.q``/``.k``/``.v``/``.o`` on
``self_attn``/``cross_attn``, and ``ffn.N``/``ffn.net.N`` instead of ``mlp``.

The new strict detectors live alongside the original loose ones so the
Anima conversion utility (which runs after probing) still works.

Regression tests in ``test_wan_lora_probe_independence.py`` cover:
- I2V Lightning V1 (the bug-triggering LoRA), T2V Lightning V2, Wan Kohya
  and Wan diffusers PEFT layouts — Wan probe accepts, Anima probe rejects.
- Anima PEFT and Kohya layouts — Anima accepts, Wan rejects.
- A meta-test that runs every LoRA config in CONFIG_CLASSES against the
  Lightning state dicts and asserts exactly one accepts — this catches
  ANY future probe collision, not just Wan vs Anima.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): defer expert model loading in _ExpertSwapper to avoid cache thrash

The swapper used to take pre-loaded ``LoadedModel`` handles at construction:

    high_info = context.models.load(self.transformer.transformer)
    low_info  = context.models.load(self.transformer.transformer_low_noise)
    swapper = _ExpertSwapper(high_info=high_info, low_info=low_info, ...)

With dual ~9 GB A14B GGUF experts plus the ~10 GB UMT5-XXL encoder competing
for the same RAM cache, the LRU policy frequently dropped one expert by the
time the denoise loop swapped into it. The model manager then emitted

    [MODEL CACHE] Locking model cache entry ... but it has already been
    dropped from the RAM cache. This is a sign that the model loading
    order is non-optimal in the invocation code (See ... invoke-ai#7513).

and reloaded the weights from disk (~1.2s extra per swap).

Refactor the swapper to take the ``ModelIdentifierField`` plus the
``InvocationContext`` and call ``context.models.load(model_id)`` lazily
inside ``get()``. Each swap obtains a fresh handle, the LRU window is
small, and the warning goes away.

Config metadata (used to compute ``is_quantized``) is read upfront via
``context.models.get_config()`` — that's metadata, not weights, so it
doesn't put pressure on the cache.

Tests: existing swapper lifecycle tests refactored to use a fake context
whose ``models.load`` is logged. A new ``test_lazy_load_per_swap_not_upfront``
pins the regression — it asserts ``models.load`` is NOT called at swapper
construction, only at first get() per expert.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The denoise_mask wiring + RectifiedFlowInpaintExtension integration in
wan_denoise.py was put in place during Phase 2/3 alongside the rest of
the denoise loop. Phase 8 of the plan verifies that this path works and
locks it in with tests.


Three new tests under TestWanDenoiseInpaint:

1. test_preserved_region_matches_init_exactly: builds a half/half mask
   (left = preserve, right = regenerate in user-side convention), runs
   full denoise with the synthetic zero-output transformer, and asserts
   the preserved half of the final latents equals the init exactly while
   the regenerated half does not. Pins the mask-inversion + per-step
   merge behavior.

2. test_inpaint_requires_init_latents: a mask without init latents must
   raise a clear ValueError — the merge has nothing to weld back to.

3. test_no_mask_path_is_unchanged: regression that adding the inpaint
   extension didn't perturb the non-inpaint codepath (with init latents
   + denoising_start=0.5 but no mask, the loop just runs img2img).
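The per-step merge that test 1 pins can be sketched in pure Python (a stand-in for the tensor op; the function name is hypothetical):

```python
def merge_step(init_latents, denoised, keep_mask):
    """Where keep_mask is 1 the init latent is welded back in; where it is
    0 the freshly denoised value survives. In the real extension the init
    is first noised to the current timestep before merging; plain values
    are used here to keep the sketch dependency-free."""
    return [
        k * init + (1 - k) * den
        for init, den, k in zip(init_latents, denoised, keep_mask)
    ]
```

Running this with a half/half mask reproduces the shape of test 1: the preserved half equals the init exactly, the regenerated half does not.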

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(frontend): add I2V_A14B to Wan variant zod enum + manager label

Phase 7 added the I2V_A14B backend variant. The frontend's zod enum
(features/nodes/types/common.ts:zWanVariantType) and the model manager's
variant-label map (features/modelManagerV2/models.ts) were still on the
two-variant list, so:

  - ModelIdentifierField inputs with ui_model_variant filters on Wan
    couldn't list I2V models.
  - The model manager UI showed a raw 'i2v_a14b' string instead of the
    human label.

Phase 9 (full linear-view wiring — type guards, hooks, params slice,
graph builder, tab UI) is in progress on a follow-up commit; this lands
the two small enum fixes first so the I2V probe / install paths work
correctly end-to-end with the existing FE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the minimum frontend wiring needed to generate Wan 2.2 images from
the linear view:

  - buildWanGraph.ts (new): text-to-image graph (model_loader →
    text_encoder × 2 → denoise → l2i). Diffusers main model only —
    transformer, VAE and UMT5 encoder all resolve from the same repo, so
    no Wan-specific params slice fields are required yet. CFG-skip
    branch when guidance_scale ≤ 1.0.
  - useEnqueueGenerate / useEnqueueCanvas dispatchers: route
    base === 'wan' to buildWanGraph.
  - graph/types.ts: add wan_l2i / wan_i2l / wan_denoise / wan_model_loader
    to the relevant node-type unions.
  - addTextToImage / addImageToImage: include wan_denoise / wan_l2i so
    width/height are wired correctly and the txt2img helper accepts the
    Wan l2i node.
  - isMainModelWithoutUnet: include wan_model_loader (Wan has no UNet,
    same as the other modern bases).
  - metadata.py: add wan_txt2img / wan_img2img / wan_inpaint to the
    generation_mode enum (img2img / inpaint pieces land next).
  - schema.ts: regenerated to pick up the metadata enum + new
    Wan invocations.

Pieces left in Phase 9: params slice (standalone VAE / T5 / GGUF
low-noise / LoRA / ref-image fields + selectors), img2img + I2V + inpaint
branches in the graph builder, and Wan-specific UI components.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(wan): Phase 9 piece #2 - GGUF support and CFG-Low control in linear view

Adds the three Wan-specific params + UI controls that gate GGUF workflows
plus a separate low-noise CFG slider for A14B users.

Params slice:
  - wanTransformerLowNoise (the second-expert GGUF for A14B)
  - wanComponentSource (Diffusers Wan model providing VAE + UMT5-XXL
    when the main is a GGUF)
  - wanGuidanceScaleLowNoise (optional separate CFG for the low-noise
    expert; null = fall back to the primary CFG)

Plus a `selectIsWan` selector for accordion gating.
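The null-means-auto fallback rule for the low-noise CFG reduces to one line; a hedged Python sketch of the rule (the helper name is illustrative — the real logic lives in the TS params slice / graph builder):

```python
def effective_low_noise_cfg(primary_cfg: float, low_noise_cfg=None) -> float:
    # None (null in the slice) means "auto": fall back to the primary CFG.
    return low_noise_cfg if low_noise_cfg is not None else primary_cfg
```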

UI components:
  - ParamWanModelSelects.tsx (Advanced accordion): two model pickers —
    Transformer (Low Noise) filtered to Wan GGUF mains, and VAE/Encoder
    Source filtered to Wan Diffusers mains. Mirrors the
    ParamQwenImageComponentSourceSelect structure.
  - ParamWanGuidanceScaleLowNoise.tsx (Generation accordion): slider +
    number input with an "auto" indicator when cleared. Default 3.5
    matches the diffusers reference 4.0 / 3.0 split.

Wiring:
  - Generation accordion: ParamWanGuidanceScaleLowNoise shown when base
    is wan, scheduler excluded for wan (same pattern as Anima/Qwen).
  - Advanced accordion: ParamWanModelSelects shown when base is wan, and
    Wan excluded from the SD-family VAE/CFG-rescale blocks.
  - buildWanGraph.ts: forwards the three new params to the model loader
    and denoise nodes (transformer_low_noise_model, component_source,
    guidance_scale_low_noise) and adds them to the graph metadata.

Hooks/types:
  - useWanDiffusersModels + useWanGGUFModels in modelsByType.ts.
  - isWanDiffusersMainModelConfig + isWanGGUFMainModelConfig type guards.
  - Three new locale strings (wanComponentSource, wanTransformerLowNoise,
    wanGuidanceScaleLowNoise[Auto]).

GGUF workflow now works end-to-end in the linear view: pick a Wan GGUF
main, set Transformer (Low Noise) to the paired second-expert GGUF, set
VAE/Encoder Source to any Diffusers Wan repo (TI2V-5B is convenient at
~12 GB) — generate produces an image.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): UX polish on the Wan linear-view controls

Bundles five small fixes applied during a usability review of the Wan
linear-view section (piece #2):

1. **Filter Main vs Transformer (Low Noise) dropdowns by expert tag.**
   The Wan GGUF probe records each file's ``expert`` field
   (``"high"`` / ``"low"`` / ``"none"``) via filename heuristic.
   - ``MainModelPicker``: hides ``expert === 'low'`` Wan GGUFs so users
     can't accidentally wire a low-noise expert as the primary main.
   - Transformer (Low Noise) picker (``useWanGGUFLowNoiseModels``):
     shows ``expert === 'low'`` Wan GGUFs only.

   Diffusers Wan mains and TI2V-5B aren't affected — they don't carry
   the ``expert`` field on their config schema. The backend's auto-swap
   safety net stays in place.

2. **Match the primary CFG slider's range.** The Wan low-noise CFG
   slider was constrained to 1–10 while the primary CFG spans 1–20.
   With the diffusers reference 4/3 split, the low-noise slider thumb
   sat noticeably further right than the primary — visually misleading.
   Both sliders now share the 1–20 range with marks at [1, 10, 20].

3. **Label fits the form column.** "CFG (Low Noise)" → "CFG (Low)" so
   the slider fits cleanly next to its label instead of overlapping.

4. **Indicator state for the low-noise CFG slider.** Replaced the inline
   "(auto)" / "(same as cfg)" text — which kept overlapping the slider
   regardless of how short the label got — with an X-only reset button
   that's only visible when the user has set an explicit value. Absence
   of the X conveys auto/fallback state without any text overhang.

5. **Friendlier Transformer (Low Noise) placeholder.** "Second-expert
   GGUF for A14B (pair with the high-noise main)" → "Add for full
   detail" — concise nudge for users who haven't paired the second
   expert yet.
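The filename heuristic from item 1 can be sketched as a small classifier. The naming patterns (QuantStack's HighNoise/LowNoise, lightx2v's high_noise_model/low_noise_model) are taken from this PR; the function itself is an illustrative stand-in for the probe code:

```python
import re

def expert_from_filename(filename: str) -> str:
    # "high" / "low" for A14B expert pairs; "none" for anything else
    # (e.g. TI2V-5B single-expert files).
    name = filename.lower()
    if re.search(r"high[_-]?noise", name):
        return "high"
    if re.search(r"low[_-]?noise", name):
        return "low"
    return "none"
```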

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(wan): Phase 9 piece #3 - linear-view img2img branch

Adds Wan 2.2 image-to-image to the linear view, mirroring the Qwen Image
pattern. The mode switches on the canvas state — pure-prompt runs go
through addTextToImage as before; canvas runs with an init image go
through addImageToImage which wires a fresh wan_i2l (Image to Latents -
Wan 2.2) node between the init image and the denoise's `latents` input,
honoring the existing denoise_start slider.

buildWanGraph:
  - Drops the txt2img-only guard, branches on generationMode.
  - img2img: spins up a wan_i2l node and hands it to addImageToImage
    alongside the existing denoise / l2i / modelLoader (as vaeSource).
  - inpaint / outpaint still fail loudly — pieces #4-#6.

graphBuilderUtils.getDenoisingStartAndEnd:
  - Adds 'wan' to the simple-linear case (denoising_start = 1 -
    denoisingStrength). Note: Wan's flow-matching schedule is "sticky"
    on the init compared to SDXL — users will likely need denoisingStrength
    ≥ 0.7 to see substantial change, matching the user-found 0.15-0.3
    denoising_start sweet spot from earlier img2img testing. We may
    revisit this with an exponent rescale (like FLUX uses) if the
    response curve feels off.

addImageToImage:
  - Adds 'wan_i2l' to the i2l-node-type union so the Wan i2l can be
    threaded through the shared helper.
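The simple-linear mapping described above is a one-liner; sketching it makes the "sticky" observation concrete — strengths of 0.7–0.85 land exactly in the 0.15–0.3 denoising_start sweet spot:

```python
def denoising_start_for(strength: float) -> float:
    # Simple-linear case: denoising_start = 1 - denoisingStrength.
    # (FLUX instead applies an exponent rescale to reshape the curve.)
    return 1.0 - strength
```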

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): add wan_denoise to addImageToImage/addInpaint/addOutpaint type checks

Three sibling graph-helper utilities had the same modern-base list as
addTextToImage did, and the buildWanGraph img2img branch tripped one of
them at canvas-Generate time:

    error  [generation]: Failed to build graph
    {name: 'Error', message: 'Wrong assertion encountered'}

The else-branch in each helper assumes 'denoise_latents' (the SD1.5/SDXL
legacy path) and asserts that — failing for any modern base not listed
above the branch. addTextToImage was already updated in Phase 9 piece #1;
this catches the parallel cases that the img2img/inpaint/outpaint flows
go through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(wan): Phase 9 piece #4 - linear-view inpaint and outpaint branches

Wires Wan 2.2 inpaint and outpaint through the existing addInpaint /
addOutpaint helpers. The backend's RectifiedFlowInpaintExtension was
plumbed into wan_denoise.py back in Phase 8 (commit ab54617); this
just connects the FE.

buildWanGraph:
  - generationMode === 'inpaint' → spin up a wan_i2l, call addInpaint
    with denoise + l2i + modelLoader (used as both vaeSource and
    modelLoader since the Wan model loader carries the VAE).
  - generationMode === 'outpaint' → parallel branch with addOutpaint.

addInpaint:
  - i2l-node-type union now includes 'wan_i2l' (the addImageToImage and
    addOutpaint type unions already do — different union shapes).

metadata.py:
  - generation_mode literal adds "wan_outpaint" alongside the existing
    wan_txt2img / wan_img2img / wan_inpaint entries.

isMainModelWithoutUnet already includes wan_model_loader (Phase 9 piece
#1), so the outpaint helper wires create_gradient_mask correctly when
Wan is the main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(wan): Phase 9 piece #5 - linear-view I2V branch (raster as reference image)

Wan 2.2 I2V-A14B models condition on a reference image whose VAE-encoded
latents are concatenated to the noise along the channel dim each step
(in_channels=36 on the I2V transformer). In the linear view this maps
cleanly onto the existing canvas raster layer: pick an I2V model, drag
an image to raster, generate.

buildWanGraph:
  - Fetch the modelConfig early so the variant gate (i2v_a14b vs the
    rest) can drive the branch shape instead of being a post-hoc check.
  - I2V + txt2img: fail loudly ("Switch to the canvas tab and drag an
    image to the raster layer"). I2V models won't produce useful output
    without a reference, and the backend would crash trying to
    concatenate a missing condition tensor.
  - I2V + img2img: pull the raster image via the canvas compositor,
    wire it through a wan_ref_image_encoder (which VAE-encodes it and
    builds the 4-mask + 16-latent condition tensor backend-side), then
    feed the result into denoise.ref_image. Denoise runs from fresh
    noise (denoising_start=0, no init_latents) — the ref image is
    cross-attention/concat conditioning, not a noise-trajectory anchor.
  - I2V + inpaint/outpaint: fail clearly. Combining ref-image
    conditioning with a denoise mask is conceptually possible but the
    backend interaction hasn't been validated end-to-end.

metadata.py:
  - Adds "wan_i2v" to the generation_mode literal so the metadata field
    on I2V renders correctly.

T2V flows (txt2img / img2img / inpaint / outpaint) are unchanged for
non-I2V Wan variants (T2V-A14B and TI2V-5B).
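The channel arithmetic behind `in_channels=36` follows from the figures above — 16 noise-latent channels plus the 4-mask + 16-latent condition tensor. A trivial sketch (helper name is illustrative):

```python
def i2v_in_channels(latent_channels: int = 16, mask_channels: int = 4) -> int:
    # noise latents + mask channels + VAE-encoded reference latents,
    # concatenated along the channel dim each step.
    return latent_channels + mask_channels + latent_channels
```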

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): enforce multiple-of-16 dimensions to match transformer patch grid

Wan 2.2's transformer has ``patch_size=(1, 2, 2)``: it patch-embeds with
stride 2 then un-patches by 2. Combined with the VAE's 8x spatial scale,
canvas H/W must be a multiple of ``8 * 2 = 16`` — not just 8 — for the
patch round-trip to land exactly. Otherwise the latents and noise
prediction disagree by one in the spatial dim and the scheduler step
fails:

    RuntimeError: The size of tensor a (147) must match the size of
    tensor b (146) at non-singleton dimension 3

(here latent_w=147 → patch_w=73 → un-patched_w=146 ≠ 147)

This was silent for T2V at 1024x1024 (already a multiple of 16) but
fired for I2V at non-multiple-of-16 canvas sizes.

Fixes:

- ``optimalDimension.getGridSize``: Wan moves from the default 8 case to
  the multiple-of-16 case (alongside flux / sd-3 / qwen-image / z-image
  which have the same patch arithmetic). The canvas bbox UI now snaps
  Wan dimensions to multiples of 16.

- ``wan_denoise.py`` and ``wan_ref_image_encoder.py``: bump width/height
  ``multiple_of`` from 8 to 16. Defense-in-depth — workflow-editor
  users won't be able to send a non-16-aligned dim either.

Existing backend tests (23 passing) still hold — 1024 is divisible by 16
so the test fixtures didn't exercise the off-by-one path.
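The off-by-one can be reproduced with the commit's own numbers (a pure-arithmetic sketch; the helper is illustrative):

```python
def patch_round_trip(canvas_w: int, vae_scale: int = 8, patch: int = 2):
    """Latent width after the 8x VAE, then the width after the stride-2
    patch-embed / un-patch round trip (floor by the stride, multiply back).
    A mismatch between the two is the RuntimeError above."""
    latent_w = canvas_w // vae_scale
    round_trip_w = (latent_w // patch) * patch
    return latent_w, round_trip_w
```

A canvas width of 1176 (not a multiple of 16) gives latent 147 vs un-patched 146 — the failing case; 1024 round-trips exactly.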

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): show negative prompt box in Wan linear-view

Wan was missing from SUPPORTS_NEGATIVE_PROMPT_BASE_MODELS, so the
linear-view negative-prompt input was hidden even though the Wan denoise
node already wires negative conditioning when CFG > 1
(buildWanGraph.ts:67-75). Adds 'wan' to the list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(wan): Phase 9 piece #6 - Wan LoRA collection in linear view

Adds Wan LoRA wiring to buildWanGraph, mirroring the Qwen Image pattern.
The shared LoRASelect / LoRAList UI in the linear view already filters
LoRAs by the selected main model's base, so Wan LoRAs surface
automatically when a Wan main is picked — no UI changes needed.

addWanLoRAs (new):
  - Filters state.loras.loras to enabled Wan LoRAs.
  - For each LoRA: spawns a ``lora_selector`` node and threads it
    through a single ``collect`` collector.
  - Routes the collector into a ``wan_lora_collection_loader`` which
    sits between modelLoader and denoise — modelLoader.transformer →
    loader, then loader.transformer → denoise (rerouting the original
    modelLoader → denoise edge).
  - Emits per-LoRA metadata so PNG metadata + workflow restore work.

The dual-expert routing (high-noise vs low-noise vs untagged) is
handled entirely on the backend by ``WanLoRACollectionLoader`` based on
each LoRA's recorded ``expert`` tag (set by the probe from the filename
heuristic in piece #5 of Phase 5). The FE just hands over the bag of
LoRAs; no per-list FE plumbing needed.

buildWanGraph:
  - Calls addWanLoRAs(state, g, denoise, modelLoader) after the base
    transformer edge is in place. The helper is a no-op when no Wan
    LoRAs are enabled, so it's safe to call unconditionally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(wan): detect LoRA variant and filter by main model

Wan 2.2 A14B (inner_dim=5120) and TI2V-5B (inner_dim=3072) LoRAs are not
interchangeable — applying one against the wrong main model crashes the
layer patcher with a tensor-shape error (e.g. A14B Lightning on TI2V-5B
mains produced ``shape '[3072, 3072]' is invalid for input of size 26214400``).

Probe Wan LoRAs' inner-dim at install time and record the family on a new
``variant`` field (``a14b`` / ``5b`` / null). The LoRA picker in the linear
view hides incompatible variants when the user selects a main, and the
graph builder filters any still-enabled mismatches at submit time with a
warning. Untagged LoRAs (probe couldn't identify) pass through so they
aren't silently hidden.
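The inner-dim probe can be sketched with shapes as plain tuples (dependency-free stand-in for tensor shapes; the key layout and helper name are illustrative):

```python
def wan_lora_variant(state_dict):
    """Infer the Wan family from a LoRA up-weight's output dim:
    A14B has inner_dim=5120, TI2V-5B has 3072. Returns None when the
    probe can't identify the family (untagged LoRAs pass through)."""
    for key, shape in state_dict.items():
        if key.endswith("lora_up.weight"):
            inner_dim = shape[0]  # (out_features, rank)
            if inner_dim == 5120:
                return "a14b"
            if inner_dim == 3072:
                return "5b"
    return None
```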

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(wan): ref-image panel, GGUF readiness, and auto-default sources

Wan 2.2 I2V now uses the global Reference Images panel (same UX as Qwen
Image Edit and FLUX.2 Klein) instead of pulling the conditioning image
from a canvas raster layer. Adds:

  - WanReferenceImageConfig zod type + isWanReferenceImageConfig guard;
    integrated into the ref-image discriminated union, settings panel,
    layer hooks, and validators.
  - 'wan' added to SUPPORTS_REF_IMAGES_BASE_MODELS, but the panel only
    shows for the i2v_a14b variant (T2V and TI2V-5B don't consume ref
    images, so the panel is hidden for them).
  - buildWanGraph I2V branch reads the first enabled wan_reference_image
    from refImagesSlice; the canvas-raster-as-ref path is removed. I2V
    now only supports txt2img mode (canvas img2img/inpaint/outpaint
    assert with a clear message).

GGUF Wan readiness check: GGUF mains carry only the transformer, so the
loader needs a Diffusers Component Source (or standalone VAE + UMT5-XXL
encoder) to resolve the VAE and text encoder. Without one, enqueue is
now blocked with a clear reason. The low-noise A14B partner expert
remains optional (loader falls back to the high-noise expert when it's
missing).

Adds standalone Wan VAE and Wan T5 Encoder selectors to the Advanced
accordion (Qwen pattern). Wires them as vae_model / wan_t5_encoder_model
on the wan_model_loader node — backend priority is standalone > diffusers
main > component source.

Auto-default on Wan selection (so GGUF users don't have to fiddle with
Advanced): when the new main is a Wan GGUF, fill the Component Source,
standalone VAE, and standalone T5 encoder with first available matches
if not already set. Component Source is matched by variant family
(A14B GGUF prefers an A14B Diffusers; TI2V-5B prefers a TI2V-5B
Diffusers) since the two families use different VAE channel counts
(16 vs 48); within A14B, T2V and I2V share VAE/encoder so they're
interchangeable as a source. Runs on every Wan selection (including
Diffusers -> GGUF switches), only fills empty slots.
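The family-matching rule for the auto-default can be sketched as follows — the model records and field names are illustrative, not InvokeAI's config schema:

```python
def pick_component_source(main_variant: str, available):
    """Match a Diffusers source to the GGUF main's family: A14B (16-ch VAE)
    and TI2V-5B (48-ch VAE) are incompatible, but within A14B, T2V and I2V
    share VAE/encoder and are interchangeable as a source."""
    main_is_5b = "5b" in main_variant
    for model in available:
        if model["format"] == "diffusers" and ("5b" in model["variant"]) == main_is_5b:
            return model
    return None
```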

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Wan 2.2 starter pack (installed when the user picks the Wan 2.2
bundle) provides the minimal-cost path to running A14B T2V end-to-end:

  - Standalone UMT5-XXL encoder and A14B VAE (so GGUF mains don't need
    a full Diffusers download for their VAE/encoder sources).
  - T2V A14B Q4_K_M and Q8_0 GGUF expert pairs (high + low noise).
  - T2V Lightning V1.1 Seko rank-64 LoRA pair (4-step inference).

Additional Wan 2.2 starter models browseable from the model manager:

  - Full Diffusers T2V A14B, I2V A14B, and TI2V-5B.
  - I2V A14B Q4_K_M and Q8_0 GGUF expert pairs + Lightning V1 LoRA pair.
  - TI2V-5B Q4_K_M and Q8_0 GGUFs + the 48-channel TI2V-5B VAE.

Each "high noise" GGUF lists its low-noise partner plus the shared VAE
and UMT5-XXL encoder as dependencies, so installing one of them pulls
in everything the loader needs. QuantStack's HighNoise/LowNoise file
naming and lightx2v's high_noise_model/low_noise_model.safetensors are
both picked up by the existing filename heuristic in the GGUF probe.
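The dependency expansion can be sketched as a transitive closure over illustrative entries — the names and structure below are hypothetical, not the actual starter-model schema:

```python
# Installing the high-noise GGUF pulls in its low-noise partner plus the
# shared VAE and UMT5-XXL encoder (entry names are illustrative).
STARTER = {
    "wan2.2-t2v-a14b-highnoise-q4_k_m": {
        "dependencies": [
            "wan2.2-t2v-a14b-lownoise-q4_k_m",
            "wan2.2-vae-a14b",
            "umt5-xxl-encoder",
        ],
    },
    "wan2.2-t2v-a14b-lownoise-q4_k_m": {"dependencies": []},
    "wan2.2-vae-a14b": {"dependencies": []},
    "umt5-xxl-encoder": {"dependencies": []},
}

def install_closure(name: str, seen=None):
    """Everything that gets installed when the user picks `name`."""
    if seen is None:
        seen = {name}
    for dep in STARTER[name]["dependencies"]:
        if dep not in seen:
            seen.add(dep)
            install_closure(dep, seen)
    return seen
```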

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs(wan): add Wan 2.2 hardware requirements

Adds Wan 2.2 A14B (T2V/I2V) and TI2V-5B rows to the hardware
requirements table with rough VRAM/RAM guidance per quantization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…one VAE/T5

Wan-specific metadata fields embedded by the graph builder
(wan_transformer_low_noise, wan_component_source, wan_vae_model,
wan_t5_encoder_model, wan_guidance_scale_low_noise) had no recall
handlers in features/metadata/parsing.tsx, so recalling an image's
parameters would leave these fields empty. Adds a handler for each
that dispatches the matching paramsSlice action and renders a row in
the metadata viewer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added api python PRs that change python files Root invocations PRs that change invocations backend PRs that change backend files services PRs that change app services frontend PRs that change frontend files python-tests PRs that change python tests docs PRs that change docs labels May 12, 2026
@lstein lstein added the 6.14.x label May 12, 2026
@lstein lstein moved this to 6.14.x Theme: LIBRARY UPDATES in Invoke - Community Roadmap May 12, 2026
lstein and others added 3 commits May 11, 2026 22:28
Ships two default workflows in the library, tagged so they appear in
"Browse Workflows" under the wan2.2 / text to image / image to image
tags:

  - Text to Image - Wan 2.2: full T2V/TI2V-5B graph (model loader,
    positive + negative encoders, denoise, l2i). Exposes the five
    model slots, prompts, steps, dual CFG, and dimensions.
  - Image to Image - Wan 2.2: I2V A14B graph that adds a
    wan_ref_image_encoder. Exposes the reference image input plus
    the standard fields.

Both follow default-workflow rules: IDs prefixed with default_,
meta.category = "default", and no references to user-installed
resources.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lstein lstein mentioned this pull request May 13, 2026
