Fix: always load runtime lora params on runtime backend by stduhpf · Pull Request #1532 · leejet/stable-diffusion.cpp

stduhpf · 2026-05-20T13:52:30Z

Summary

When LoRAs are applied at runtime, the LoRA params have to be loaded from the params backend to the runtime backend for every linear or conv operation, at every step. This overhead can cause a very significant slowdown if the params are offloaded to a different device.

This PR ensures the LoRA weights are never offloaded, fixing the performance issue at the minimal cost of a constant amount of extra VRAM.
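To make the overhead concrete, here is a minimal toy model of the transfer count, not the actual stable-diffusion.cpp code. All names (`TransferStats`, `simulate_lora_transfers`, `lora_resident`) and the example sizes are hypothetical, and the model assumes that each LoRA-patched op needs one copy of a fixed size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model; none of these names come from stable-diffusion.cpp.
struct TransferStats {
    std::size_t copies = 0;  // number of params-backend -> runtime-backend copies
    std::size_t bytes  = 0;  // total bytes moved across backends
};

// Count the cross-backend copies needed over a whole sampling run.
//   lora_resident == false: LoRA params stay on the params backend and are
//     copied again for every patched linear/conv op at every step.
//   lora_resident == true: LoRA params are copied once and then stay on the
//     runtime backend, so the cost no longer depends on the step count.
inline TransferStats simulate_lora_transfers(std::size_t steps,
                                             std::size_t patched_ops,
                                             std::size_t bytes_per_op,
                                             bool lora_resident) {
    TransferStats stats;
    if (lora_resident) {
        stats.copies = patched_ops;
        stats.bytes  = patched_ops * bytes_per_op;
        return stats;
    }
    for (std::size_t step = 0; step < steps; ++step) {
        for (std::size_t op = 0; op < patched_ops; ++op) {
            stats.copies += 1;
            stats.bytes  += bytes_per_op;
        }
    }
    return stats;
}
```

Under these assumptions, calling `simulate_lora_transfers(8, 100, 1024, false)` counts 800 copies, while the resident variant counts 100 regardless of the number of steps. The real slowdown also depends on the bandwidth between the two devices, which this sketch does not model.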

Related Issue / Discussion

#1463 (hidden comment)

Additional Information

In some extreme cases, this change helped me go from ~190s/it to 7.3s/it for LTX2.3 with distill LoRA.

Checklist

I have read and confirmed this PR follows the contribution guidelines.

25 new upstream commits since the previous sync. Highlights:

3a8788c refactor: unify extra argument parsing (leejet#1540)
449165c feat: stream LTX VAE temporal tile decoding (leejet#1539)
adaa599 Feat: Temporal tile custom size with overlap (leejet#1510)
2e35146 perf: run LTX audio VAE decode in one ggml graph (leejet#1538)
47d8198 feat: add taeltx2_3_wide support (leejet#1535)
ef92a00 feat: add graph cut markers for LTXAV transformer (leejet#1534)
b3374e6 feat: add LTX spatial latent upscale hires support (leejet#1533)
bdd937f feat: add taeltx2/taeltx2.3 support (leejet#1531)
c51ec7c fix: always load runtime lora params on runtime backend (leejet#1532)
e7eb92f feat: add Gradient Estimation sampler (leejet#1484)
50134e5 refactor: split guidance composition (leejet#1506)
e43b24c feat: add ltx2.3 flf2v support (leejet#1505)
b706d68 fix: restore singleton dims for LLM outputs (leejet#1518)
b758b7d fix: only enable TAE after successful load (leejet#1517)
f683c88 feat: make negative max_vram control the amount of spare vram (leejet#1503)
baf7eda refactor: minify vocab files (leejet#1509)
22c8c40 sync: update ggml (leejet#1520)
plus 8 CI / docs / docker fixes.

Conflict resolution:

src/stable-diffusion.cpp had a single conflict in the video-generation post-sampling block. Our HEAD had the smart-offload-for-VAE-decode hook (move diffusion model to CPU when free_params_immediately is false and VRAM is tight). Upstream added the LTX spatial latent upscale hires path that runs a second sampler invocation. Both pieces are needed and they're complementary: smart offload is video-agnostic and runs only on the non-upscale code path; the upscale block manages its own params lifecycle through its own sampler+free invocation.

Resolution: upstream's `if (latent_upscale_enabled)` block kept as-is, and our smart-offload + free_params_immediately handling moved into the matching `else` branch. No semantic change to either feature.

All other touched files (include/stable-diffusion.h, src/llm.hpp, src/ggml_extend.hpp, src/diffusion_model.hpp, examples/common/...) auto-merged cleanly. Our additions (friend declaration in ggml_extend for the streaming executor, forward_layer_block / forward_final_norm helpers on LLM::TextModel, offload_config field on sd_ctx_params_t) all interoperate with the upstream changes. Build is clean.

Smoke test: Z-Image-Turbo Q8 generates a valid cat image at 512x512 after the merge. Host CUDA driver currently shows NVML version mismatch (220s wallclock); requires driver reload to re-validate expected timings.
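The conflict resolution described in this commit message can be pictured with a rough control-flow sketch. This is not the merged code: only `latent_upscale_enabled` and `free_params_immediately` are names taken from the message, while `PostSamplingConfig`, `vram_is_tight`, `post_sampling_actions`, and the action strings are hypothetical placeholders. Placing the VAE decode after both branches is also an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the state consulted after sampling.
struct PostSamplingConfig {
    bool latent_upscale_enabled;   // name taken from the commit message
    bool free_params_immediately;  // name taken from the commit message
    bool vram_is_tight;            // hypothetical flag for "VRAM is tight"
};

// Returns the ordered list of actions the post-sampling block would take.
// The strings are illustrative labels, not real function names.
inline std::vector<std::string> post_sampling_actions(const PostSamplingConfig& cfg) {
    std::vector<std::string> actions;
    if (cfg.latent_upscale_enabled) {
        // Upstream path, kept as-is: a second sampler invocation that
        // manages its own params lifecycle (sampler + free).
        actions.push_back("run_upscale_sampler");
        actions.push_back("free_upscale_params");
    } else {
        // Fork path, moved into the else branch.
        if (cfg.free_params_immediately) {
            actions.push_back("free_diffusion_params");
        } else if (cfg.vram_is_tight) {
            // Smart offload: move the diffusion model to CPU before decoding.
            actions.push_back("offload_diffusion_model_to_cpu");
        }
    }
    actions.push_back("vae_decode");  // assumed to follow both branches
    return actions;
}
```

The point of the layout is that the two features never run on the same path, which is why the message can claim no semantic change to either one.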

always load runtimle lora params on runtime backend

e9998fb

stduhpf changed the title ~~Fix: always load runtimle lora params on runtime backend~~ Fix: always load runtime lora params on runtime backend May 20, 2026

leejet merged commit c51ec7c into leejet:master May 20, 2026
14 checks passed

leejet merged 1 commit into leejet:master from stduhpf:lora-backend
