Feat: taeltx2_3_wide support#1535

Merged: leejet merged 1 commit into leejet:master from stduhpf:taeltx on May 21, 2026
@stduhpf commented May 20, 2026

Summary

The standard TAEHV-based taeltx2_3 decoder produces blurry outputs. taeltx2_3_wide is a larger variant of that decoder, trained to produce sharper results.

Download: https://github.com/madebyollin/taehv/blob/2026_03_11_taeltx23_wide/safetensors/taeltx2_3_wide.safetensors

Related Issue / Discussion

https://github.com/madebyollin/taehv:

  • The standard TAE decoder for LTX2.3 has blurry outputs (see thread) so I also trained a larger, less-blurry taeltx2_3_wide variant. Those taeltx2_3_wide weights are here and ComfyUI-bleh has taeltx2_3_wide-compatible previewing code here.

madebyollin/taehv@32ac014

@leejet leejet merged commit 47d8198 into leejet:master May 21, 2026
15 checks passed
fszontagh added a commit to fszontagh/stable-diffusion.cpp that referenced this pull request May 22, 2026
25 new upstream commits since the previous sync. Highlights:

  3a8788c refactor: unify extra argument parsing (leejet#1540)
  449165c feat: stream LTX VAE temporal tile decoding (leejet#1539)
  adaa599 Feat: Temporal tile custom size with overlap (leejet#1510)
  2e35146 perf: run LTX audio VAE decode in one ggml graph (leejet#1538)
  47d8198 feat: add taeltx2_3_wide support (leejet#1535)
  ef92a00 feat: add graph cut markers for LTXAV transformer (leejet#1534)
  b3374e6 feat: add LTX spatial latent upscale hires support (leejet#1533)
  bdd937f feat: add taeltx2/taeltx2.3 support (leejet#1531)
  c51ec7c fix: always load runtime lora params on runtime backend (leejet#1532)
  e7eb92f feat: add Gradient Estimation sampler (leejet#1484)
  50134e5 refactor: split guidance composition (leejet#1506)
  e43b24c feat: add ltx2.3 flf2v support (leejet#1505)
  b706d68 fix: restore singleton dims for LLM outputs (leejet#1518)
  b758b7d fix: only enable TAE after successful load (leejet#1517)
  f683c88 feat: make negative max_vram control the amount of spare vram (leejet#1503)
  baf7eda refactor: minify vocab files (leejet#1509)
  22c8c40 sync: update ggml (leejet#1520)
  plus 8 CI / docs / docker fixes.

Conflict resolution:

src/stable-diffusion.cpp had a single conflict in the video-generation
post-sampling block. Our HEAD had the smart-offload-for-VAE-decode
hook (move diffusion model to CPU when free_params_immediately is
false and VRAM is tight). Upstream added the LTX spatial latent
upscale hires path that runs a second sampler invocation. Both pieces
are needed and they're complementary: smart offload is video-agnostic
and runs only on the non-upscale code path; the upscale block manages
its own params lifecycle through its own sampler+free invocation.

Resolution: upstream's `if (latent_upscale_enabled)` block kept as-is,
and our smart-offload + free_params_immediately handling moved into
the matching `else` branch. No semantic change to either feature.
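
The resolved control flow can be sketched roughly as follows. This is a
minimal illustration only, not code from src/stable-diffusion.cpp: the
struct, the function, and the step names are all hypothetical, and only
`latent_upscale_enabled` and `free_params_immediately` are identifiers
mentioned above. What the free_params_immediately handling does when the
flag is true is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the state consulted after sampling.
// Only the two flag names below appear in the commit message;
// vram_is_tight is an illustrative placeholder for the VRAM check.
struct PostSamplingState {
    bool latent_upscale_enabled;
    bool free_params_immediately;
    bool vram_is_tight;
};

// Returns the ordered steps the post-sampling block would take.
// Step names are descriptive labels, not real function names.
std::vector<std::string> post_sampling_steps(const PostSamplingState& s) {
    std::vector<std::string> steps;
    if (s.latent_upscale_enabled) {
        // Upstream path, kept as-is: it runs a second sampler
        // invocation and handles freeing params itself.
        steps.push_back("upscale_latents");
        steps.push_back("second_sampler_pass");
        steps.push_back("free_params_after_second_pass");
    } else {
        // Fork-side handling, moved into the matching else branch.
        if (s.free_params_immediately) {
            // Assumption: params are released right after sampling.
            steps.push_back("free_diffusion_params");
        } else if (s.vram_is_tight) {
            // Smart offload: move the diffusion model to CPU
            // before the VAE decode.
            steps.push_back("offload_diffusion_model_to_cpu");
        }
    }
    steps.push_back("vae_decode");
    return steps;
}
```

With the upscale flag set, the smart-offload step never appears in the
returned list, which matches the claim that neither feature changes the
other's behavior.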

All other touched files (include/stable-diffusion.h, src/llm.hpp,
src/ggml_extend.hpp, src/diffusion_model.hpp, examples/common/...)
auto-merged cleanly. Our additions (friend declaration in ggml_extend
for the streaming executor, forward_layer_block / forward_final_norm
helpers on LLM::TextModel, offload_config field on sd_ctx_params_t)
all interoperate with the upstream changes. Build is clean.

Smoke test: Z-Image-Turbo Q8 generates a valid cat image at 512x512
after the merge (220s wallclock). The host CUDA driver currently
reports an NVML version mismatch, so a driver reload is required to
re-validate expected timings.