Vulkan: is it possible to work around maxBufferSize? #673

@drHuangMHT

Description

Hi! Thanks in advance for your great work on sd and Vulkan support!

I have an MI50 32GB GPU, but despite the ample VRAM, I cannot run VAE decode on it:

System Info:
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:174  - Using Vulkan backend
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Pro VII (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 32768 | matrix cores: none
// model info omitted
[INFO ] stable-diffusion.cpp:244  - Version: SDXL
[INFO ] stable-diffusion.cpp:277  - Weight type:                 f32
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     f32
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: f32
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             f32
[DEBUG] stable-diffusion.cpp:282  - ggml tensor size = 400 bytes
// FP16 VAE warning on SDXL omitted
[DEBUG] ggml_extend.hpp:1174 - clip params backend buffer size =  469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1174 - clip params backend buffer size =  2649.92 MB(VRAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1174 - unet params backend buffer size =  9099.29 MB(VRAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:1174 - vae params backend buffer size =  94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:419  - loading weights
[DEBUG] model.cpp:1727 - loading tensors
  |==================================================| 2641/2641 - 200.00it/s
[INFO ] stable-diffusion.cpp:503  - total params memory size = 12313.11MB (VRAM 12313.11MB, RAM 0.00MB): clip 3119.36MB(VRAM), unet 9099.29MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:522  - loading model from 'D:\hassakuXLIllustrious_v2.safetensors' completed, taking 13.68s
[INFO ] stable-diffusion.cpp:556  - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:600  - finished loaded file
// logs related to image generation omitted
[DEBUG] ggml_extend.hpp:1126 - unet compute buffer size: 830.86 MB(VRAM)
  |==================================================| 20/20 - 3.95s/it
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 80.33s
[INFO ] stable-diffusion.cpp:1486 - generating 1 latent images completed, taking 80.59s
[INFO ] stable-diffusion.cpp:1489 - decoding 1 latents
ggml_vulkan: Device memory allocation of size 6979321856 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 6979321856 // ~6.5 GiB
[ERROR] ggml_extend.hpp:1119 - vae: failed to allocate the compute buffer

I believe it is caused by maxBufferSize being 2,147,483,648 bytes (2 GiB, out of 32 GB total VRAM) in the GPU's Vulkan 1.3 core properties. The same limit applies to the unet compute buffer. For reference, my Titan V reports a maxBufferSize of 1,099,511,627,776 bytes (1 TiB, far larger than its 12 GB of VRAM).
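
For anyone wanting to reproduce the number without vulkaninfo, here is a minimal sketch (untested) that prints the Vulkan 1.3 core maxBufferSize property for every device. It assumes a loader and driver exposing Vulkan 1.3; on older drivers the same limit is reported via VK_KHR_maintenance4 instead:

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    VkApplicationInfo app = {VK_STRUCTURE_TYPE_APPLICATION_INFO};
    app.apiVersion = VK_API_VERSION_1_3;

    VkInstanceCreateInfo ici = {VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO};
    ici.pApplicationInfo = &app;

    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(instance, &count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        // Chain the Vulkan 1.3 properties struct into the generic query.
        VkPhysicalDeviceVulkan13Properties props13 = {
            VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VULKAN_1_3_PROPERTIES};
        VkPhysicalDeviceProperties2 props2 = {
            VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2};
        props2.pNext = &props13;
        vkGetPhysicalDeviceProperties2(dev, &props2);

        std::printf("%s: maxBufferSize = %llu bytes\n",
                    props2.properties.deviceName,
                    (unsigned long long)props13.maxBufferSize);
    }

    vkDestroyInstance(instance, nullptr);
    return 0;
}
```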

However, using DirectML, all of the VRAM can be allocated successfully (at least according to Task Manager), with similar generation speed.

From what I can tell, the chances of solving this at the software level are slim, but I'm asking anyway because someone surely understands this better and can answer it once and for all.
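
For what it's worth, the splitting itself seems conceptually simple; the hypothetical sketch below (not anything ggml actually does, as far as I know) just plans chunk sizes under the limit. The hard part would be making every compute shader and descriptor binding address the right chunk, which is presumably why this isn't already done:

```cpp
// Hypothetical sketch only: plan how a logical allocation larger than
// maxBufferSize could be split into per-VkBuffer chunks. The shader-side
// addressing across chunks is the hard part and is not shown here.
#include <cstdint>
#include <vector>

std::vector<uint64_t> plan_chunks(uint64_t total, uint64_t max_buffer_size) {
    std::vector<uint64_t> chunks; // one backing VkBuffer per entry
    while (total > 0) {
        uint64_t n = total < max_buffer_size ? total : max_buffer_size;
        chunks.push_back(n);
        total -= n;
    }
    return chunks;
}

// e.g. plan_chunks(6979321856, 2147483648) -> {2 GiB, 2 GiB, 2 GiB, 0.5 GiB}
```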
