feat: add max-vram based segmented param offload #1476

Merged
leejet merged 1 commit into master from max-vram-segmented-param-offload on May 6, 2026
Conversation

@leejet (Owner) commented May 6, 2026

  • Added support for --max-vram to limit runtime GPU memory usage.
  • When a VRAM budget is set, the runner splits execution into segments and, for each segment, temporarily offloads into VRAM only the parameters needed by the current sub-graph, instead of loading all parameters into GPU memory at once.
  • This allows larger models to run on devices with limited VRAM and keeps peak VRAM usage close to the specified budget. If --max-vram is unset or set to 0, the original full-graph execution behavior is preserved.

leejet merged commit 90e87bc into master on May 6, 2026
12 checks passed
fszontagh added a commit to fszontagh/stable-diffusion.cpp that referenced this pull request May 6, 2026
The destructor previously released runtime_params_buffer but missed
partial_runtime_params_buffer (the buffer used by the segmented param
offload path added in leejet#1476). On runner destruction with --max-vram
active, that GPU memory leaked.

Same class of leak as the existing runtime_params_buffer fix.