[Bug] Bad performance on Windows: reallocating buffers automatically every step #1473

@silvertuanzi

Description

Git commit

3d6064b

Operating System & Version

Windows 11 25H2

GGML backends

CUDA

Command-line arguments used

sd-cli.exe --diffusion-model "D:\AI\anima\split_files\diffusion_models\anima-preview3-base.safetensors" --vae "D:\AI\anima\split_files\vae\qwen_image_vae.safetensors" --llm "D:\AI\anima\split_files\text_encoders\qwen_3_06b_base.safetensors" -p "a lovely cat holding a sign says 'anima.cpp'" --cfg-scale 4.5 --fa -H 1024 -W 1024 --steps 20 --sampling-method euler_a --scheduler sgm_uniform -v

Steps to reproduce

Clone the repository, build with -DSD_CUDA=ON, and run the command above.

What you expected to happen

Inference should run with high GPU utilization.
The same machine also has a Linux installation. I built sd.cpp there and ran the same command.
On Linux, GPU usage stays above 80% until inference finishes, at about 4 s/it by default or 1 s/it with --type f16.

What actually happened

On Windows, every step logs "graph has different number of nodes" and the buffers are reallocated automatically.
As a result, the GPU runs at about 90% for roughly 1 second, then sits idle for about 3 seconds waiting for the reallocation.
The first step takes 2 s/it, the next 6 s/it, and later steps exceed 10 s/it.
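The following is a toy model, not ggml's actual code, of the pattern visible in the log below. It assumes only what the log message states: the allocator keeps one allocation plan and redoes it whenever the incoming graph's node count differs from the graph it last planned for. It further assumes, hypothetically, that each sampling step evaluates two graphs (for example, the conditional and unconditional passes used with --cfg-scale) whose node counts differ. The class name, function name, and node counts are all made up for illustration.

```python
class ToyGraphAllocator:
    """Toy stand-in (not ggml's code) for an allocator that keeps one plan.

    It models only the check named in the log: the plan is redone whenever the
    incoming graph's node count differs from the graph it was last planned for.
    """

    def __init__(self, reserved_nodes):
        # Node count of the graph used for the initial reservation.
        self.planned_nodes = reserved_nodes
        self.replans = 0

    def alloc_graph(self, n_nodes):
        if n_nodes != self.planned_nodes:
            # Corresponds to "graph has different number of nodes" followed by
            # "reallocating buffers automatically" in the log.
            self.replans += 1
            self.planned_nodes = n_nodes


def count_replans(reserved_nodes, node_counts):
    """Feed a sequence of graph sizes to the toy allocator and count re-plans."""
    alloc = ToyGraphAllocator(reserved_nodes)
    for n in node_counts:
        alloc.alloc_graph(n)
    return alloc.replans


# Hypothetical node counts: two graph evaluations per step, 3 steps.
print(count_replans(1000, [1000, 1010] * 3))  # 5: once in step 1, then twice per step
print(count_replans(1000, [1000] * 6))        # 0: a stable graph reuses the plan
```

Under these assumptions, alternating graph sizes trigger a re-plan on every evaluation after the first. That would match the log: one mismatch message before step 1 completes, then two per step.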

Logs / error messages / stack trace

[DEBUG] ggml_extend.hpp:1883 - anima compute buffer size: 206.05 MB(VRAM)
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_reserve_n_impl: reallocating CUDA0 buffer from size 206.05 MiB to 206.06 MiB
  |==>                                               | 1/20 - 2.07s/it
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
  |=====>                                            | 2/20 - 6.88s/it
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
  |=======>                                          | 3/20 - 8.73s/it
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
  |==========>                                       | 4/20 - 9.64s/it
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
  |============>                                     | 5/20 - 10.16s/it
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
  |===============>                                  | 6/20 - 10.49s/it
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
  |=================>                                | 7/20 - 10.72s/it
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
  |====================>                             | 8/20 - 11.01s/it
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
  |======================>                           | 9/20 - 11.15s/it
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
  |=========================>                        | 10/20 - 11.21s/it
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
  |===========================>                      | 11/20 - 11.30s/it
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
^C
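To see how the per-step time tracks the node-count messages, both can be pulled out of a verbose log like the one above. This is a minimal Python sketch, assuming the log format shown here: progress fragments of the form "| N/20 - X.XXs/it" and the ggml_gallocr_needs_realloc message. The names summarize and sample are hypothetical, and sample is an abridged excerpt of the log above with shortened progress bars.

```python
import re

# Progress-bar fragment such as "| 3/20 - 8.73s/it" (step, total, seconds per step).
PROGRESS = r"\|\s*(\d+)/(\d+)\s*-\s*([\d.]+)s/it"
# Message emitted each time the allocator sees a graph with a new node count.
MISMATCH = re.escape("graph has different number of nodes")
TOKEN_RE = re.compile(PROGRESS + "|" + MISMATCH)


def summarize(log_text):
    """Return (step, seconds_per_it, mismatches) tuples from an sd-cli -v log.

    `mismatches` counts the node-count messages logged since the previous
    progress update, i.e. while that step was being computed.
    """
    rows = []
    pending = 0
    # Scan tokens in order of appearance: the progress bar is often glued to
    # the following [DEBUG] line, so splitting on newlines is not reliable.
    for m in TOKEN_RE.finditer(log_text):
        if m.group(1) is None:  # matched the mismatch message
            pending += 1
        else:  # matched a progress-bar fragment
            rows.append((int(m.group(1)), float(m.group(3)), pending))
            pending = 0
    return rows


sample = """\
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
  |==>     | 1/20 - 2.07s/it[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58   - ggml_gallocr_alloc_graph: reallocating buffers automatically
  |=====>  | 2/20 - 6.88s/it
"""

for step, secs, mismatches in summarize(sample):
    print(f"step {step}: {secs:.2f} s/it, {mismatches} node-count mismatch(es)")
```

On this excerpt it prints "step 1: 2.07 s/it, 1 node-count mismatch(es)" and "step 2: 6.88 s/it, 2 node-count mismatch(es)". Scanning tokens with a single combined regex, rather than line by line, is what lets it cope with the progress bar being fused onto the next [DEBUG] line.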

Additional context / environment details

CPU: Intel Ivy Bridge, which lacks AVX2 (the pre-built binary crashes on it)
GPU: NVIDIA RTX 2080 Ti with 22 GB VRAM
The behavior is the same with no quantization, f16, or q8_0.
Enabling or disabling --fa makes no difference.
Compiled with CUDA 13.2, VS Build Tools 2026, CMake 4.3.2, and Ninja 1.13.2.

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
