feat: add max-vram based segmented param offload #1476

Merged
leejet merged 1 commit into master from max-vram-segmented-param-offload on May 6, 2026
Conversation

@leejet (Owner) commented May 6, 2026

  • Added support for --max-vram to limit runtime GPU memory usage.
  • When a VRAM budget is set, the runner splits execution into segments and, for each segment, temporarily offloads into VRAM only the parameters needed by the current sub-graph, instead of loading all parameters into GPU memory at once.
  • This allows larger models to run on devices with limited VRAM and keeps peak VRAM usage close to the specified budget. If --max-vram is unset or set to 0, the original full-graph execution behavior is preserved.

leejet merged commit 90e87bc into master on May 6, 2026
12 checks passed
fszontagh added a commit to fszontagh/stable-diffusion.cpp that referenced this pull request May 6, 2026
The destructor previously released runtime_params_buffer but missed
partial_runtime_params_buffer (the buffer used by the segmented param
offload path added in leejet#1476). On runner destruction with --max-vram
active, that GPU memory leaked.

Same class of leak as the existing runtime_params_buffer fix.