[Bug] flux compute buffer size much bigger after ggml update #907

@evcharger

Description

Git commit

b25785b

Operating System & Version

Ubuntu 24.04

GGML backends

Vulkan

Command-line arguments used

./sd -r /home/user/shirt8.png --diffusion-model /home/user/sd.cpp-webui/models/unet/flux1-kontext-dev-Q8_0.gguf -W 736 -H 1024 --lora-model-dir /home/user/sd.cpp-webui/models/loras/ --vae /home/user/sd.cpp-webui/models/vae/ae.safetensors --clip_l /home/user/sd.cpp-webui/models/clip/clip_l.safetensors --t5xxl /home/user/sd.cpp-webui/models/clip/t5xxl_fp16.safetensors -p "lora:tryanything_flux_kontext_lora:1 try on this outfit, man, copy shirt" --cfg-scale 1.0 --sampling-method euler -v --clip-on-cpu -o /home/user/sd.cpp-webui/outputs/imgedit/2.png --vae-on-cpu

Steps to reproduce

Before commit b25785b (which syncs ggml), running the command above allocated the following amount of VRAM for the compute buffer: [DEBUG] ggml_extend.hpp:1550 - flux compute buffer size: 2955.44 MB(VRAM).
After that commit, the allocation jumps to [DEBUG] ggml_extend.hpp:1579 - flux compute buffer size: 7822.92 MB(VRAM), which overflows into GTT memory and makes generation much slower.
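To compare the two builds programmatically, the buffer size can be pulled out of the verbose (-v) log. A minimal sketch, assuming the log-line format shown above (the sample line is copied from this report, not generated):

```shell
# Extract the flux compute buffer size (in MB) from an sd.cpp verbose log line.
log_line='[DEBUG] ggml_extend.hpp:1579 - flux compute buffer size: 7822.92 MB(VRAM)'
size_mb=$(printf '%s\n' "$log_line" | sed -n 's/.*compute buffer size: \([0-9.]*\) MB.*/\1/p')
echo "$size_mb"   # 7822.92
```

Piping the full run output through the same sed expression before and after checking out b25785b makes the regression easy to spot without reading the whole log.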

What you expected to happen

VRAM usage to stay below 16 GB, with [DEBUG] ggml_extend.hpp:1550 - flux compute buffer size: 2955.44 MB(VRAM).

What actually happened

VRAM usage spiked to 21 GB with [DEBUG] ggml_extend.hpp:1579 - flux compute buffer size: 7822.92 MB(VRAM) on the same command.

Logs / error messages / stack trace

No response

Additional context / environment details

No response
