Hi! Thanks in advance for your great work on sd and its Vulkan support!
I have an MI50 32GB GPU, but despite the ample VRAM, I cannot run VAE decode on it:
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:174 - Using Vulkan backend
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Pro VII (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 32768 | matrix cores: none
// model info omitted
[INFO ] stable-diffusion.cpp:244 - Version: SDXL
[INFO ] stable-diffusion.cpp:277 - Weight type: f32
[INFO ] stable-diffusion.cpp:278 - Conditioner weight type: f32
[INFO ] stable-diffusion.cpp:279 - Diffusion model weight type: f32
[INFO ] stable-diffusion.cpp:280 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:282 - ggml tensor size = 400 bytes
// FP16 VAE warning on SDXL omitted
[DEBUG] ggml_extend.hpp:1174 - clip params backend buffer size = 469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1174 - clip params backend buffer size = 2649.92 MB(VRAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1174 - unet params backend buffer size = 9099.29 MB(VRAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:1174 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:419 - loading weights
[DEBUG] model.cpp:1727 - loading tensors
|==================================================| 2641/2641 - 200.00it/s
[INFO ] stable-diffusion.cpp:503 - total params memory size = 12313.11MB (VRAM 12313.11MB, RAM 0.00MB): clip 3119.36MB(VRAM), unet 9099.29MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:522 - loading model from 'D:\hassakuXLIllustrious_v2.safetensors' completed, taking 13.68s
[INFO ] stable-diffusion.cpp:556 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:600 - finished loaded file
// logs related to image generation omitted
[DEBUG] ggml_extend.hpp:1126 - unet compute buffer size: 830.86 MB(VRAM)
|==================================================| 20/20 - 3.95s/it
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 80.33s
[INFO ] stable-diffusion.cpp:1486 - generating 1 latent images completed, taking 80.59s
[INFO ] stable-diffusion.cpp:1489 - decoding 1 latents
ggml_vulkan: Device memory allocation of size 6979321856 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 6979321856 // ~6GB
[ERROR] ggml_extend.hpp:1119 - vae: failed to allocate the compute buffer
I believe it is caused by maxBufferSize being 2 147 483 648 (exactly 2GB, out of 32GB total VRAM) in the GPU's Vulkan 1.3 core properties (the maintenance4 limit, promoted to core in 1.3). The requested VAE compute buffer of 6 979 321 856 bytes (~6.5GB) is more than three times that limit, and the same constraint applies to the unet compute buffer. For reference, my Titan V reports maxBufferSize being 1 099 511 627 776 (1 TiB, far above its 12GB of VRAM).
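For anyone who wants to reproduce the reading, here is a minimal sketch of how that limit can be queried (my own illustration using plain Vulkan 1.3 calls, not code from this repo):

```cpp
// Standalone query of the Vulkan maintenance4 maxBufferSize limit
// (the limit the failed ggml allocation above runs into).
// Build sketch, not from this repo: g++ query_limit.cpp -lvulkan
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    VkApplicationInfo app_info = {VK_STRUCTURE_TYPE_APPLICATION_INFO};
    app_info.apiVersion = VK_API_VERSION_1_3;

    VkInstanceCreateInfo create_info = {VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO};
    create_info.pApplicationInfo = &app_info;

    VkInstance instance = VK_NULL_HANDLE;
    if (vkCreateInstance(&create_info, nullptr, &instance) != VK_SUCCESS) {
        fprintf(stderr, "failed to create Vulkan instance\n");
        return 1;
    }

    uint32_t device_count = 0;
    vkEnumeratePhysicalDevices(instance, &device_count, nullptr);
    std::vector<VkPhysicalDevice> devices(device_count);
    vkEnumeratePhysicalDevices(instance, &device_count, devices.data());

    for (VkPhysicalDevice device : devices) {
        // maxBufferSize lives in the maintenance4 properties struct,
        // chained into VkPhysicalDeviceProperties2 through pNext.
        VkPhysicalDeviceMaintenance4Properties maint4 = {
            VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MAINTENANCE_4_PROPERTIES};
        VkPhysicalDeviceProperties2 props2 = {
            VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2, &maint4};
        vkGetPhysicalDeviceProperties2(device, &props2);

        printf("%s: maxBufferSize = %llu bytes\n",
               props2.properties.deviceName,
               (unsigned long long)maint4.maxBufferSize);
    }

    vkDestroyInstance(instance, nullptr);
    return 0;
}
```

On the MI50 this should print the 2 147 483 648 figure above; vulkaninfo should show the same value under VkPhysicalDeviceMaintenance4Properties.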
However, using DirectML, all of the VRAM can be allocated successfully (at least, that is what Task Manager showed), with similar generation speed.
From what I can tell, the chances of solving this at the software level are slim, but I'm asking anyway because there is surely someone who understands this better and can answer it once and for all.
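That said, if there is a stopgap at the application level, it is probably to keep any single Vulkan buffer under the limit. Assuming the usual sd.cpp CLI flags apply here (I have not verified that either one dodges this specific allocation):

```
# decode the latents in tiles so each chunk's compute buffer stays small
sd -m D:\hassakuXLIllustrious_v2.safetensors -p "<prompt>" --vae-tiling
# or keep the whole VAE on the CPU backend instead of Vulkan
sd -m D:\hassakuXLIllustrious_v2.safetensors -p "<prompt>" --vae-on-cpu
```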