@leejet
Owner

@leejet leejet commented Oct 22, 2025

.\bin\Release\sd.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\Chroma1-Radiance-v0.4-Q8_0.gguf --t5xxl ..\..\ComfyUI\models\clip\t5xxl_fp16.safetensors  -p "a lovely cat holding a sign says 'chroma  radiance cpp'" --cfg-scale 4.0 --sampling-method euler -v
output

@leejet leejet mentioned this pull request Oct 22, 2025

@stduhpf
Contributor

stduhpf commented Oct 22, 2025

The Vulkan backend shows a deep-frying issue of its own that doesn't happen on ROCm:
image

(same prompt and settings as the example)

It looks kinda cool, but that's obviously not the expected result.

But at least it doesn't crash at higher resolutions:
image

Edit: running it with previews, the first couple of steps look fine, but then as the denoising progresses, the image gets more and more deep fried at every step.

@Green-Sky
Contributor

Even when using the Q4_K_S quant (v0.3), I had to close every other process to make it fit into 8 GB of VRAM.

Same settings as the OP, but with clip-on-cpu + offload:

chroma_radiance_a1

and with smoothstep schedule

chroma_radiance_ss1

(so 20 steps, CFG 4.0, and euler)
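
For readers unfamiliar with the term, here is a minimal sketch of the cubic smoothstep curve that the schedule is named after. How sd.cpp maps this curve onto actual sigma values is not shown in this thread, so `smoothstep_progress` is a hypothetical helper that only illustrates the shape: a normalized noise level going from 1.0 to 0.0 over 20 steps.

```python
def smoothstep(x: float) -> float:
    """Classic cubic smoothstep on [0, 1]: 3x^2 - 2x^3."""
    x = min(max(x, 0.0), 1.0)  # clamp to the unit interval
    return x * x * (3.0 - 2.0 * x)


def smoothstep_progress(steps: int) -> list[float]:
    """Hypothetical helper: normalized noise level from 1.0 down to 0.0.

    This is NOT sd.cpp's scheduler; it only shows the curve's shape.
    """
    return [smoothstep(1.0 - i / steps) for i in range(steps + 1)]


levels = smoothstep_progress(20)
for i, level in enumerate(levels):
    print(f"step {i:2d}: {level:.4f}")
```

Compared with a linear ramp, the curve changes slowly near both ends and fastest in the middle.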

@leejet
Owner Author

leejet commented Oct 23, 2025

.\bin\Release\sd.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\Chroma1-Radiance-v0.4-Q8_0.gguf --t5xxl ..\..\ComfyUI\models\clip\t5xxl_fp16.safetensors  -p "a lovely cat holding a sign says 'chroma  radiance cpp'" --cfg-scale 4.0 --sampling-method euler -v -H 1024 -W 1024 --diffusion-fa --chroma-disable-dit-mask
output

@leejet
Owner Author

leejet commented Oct 23, 2025

I implemented a simple workaround, and now large images can be generated correctly.

@leejet
Owner Author

leejet commented Oct 23, 2025

This workaround is temporary and can be removed after the PR ggml-org/llama.cpp#16744 is merged.

@leejet
Owner Author

leejet commented Oct 23, 2025

.\bin\Release\sd.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\Chroma1-Radiance-v0.4-Q8_0.gguf --t5xxl ..\..\ComfyUI\models\clip\t5xxl_fp16.safetensors  -p "a lovely cat holding a sign says 'chroma  radiance cpp'" --cfg-scale 4.0 --sampling-method euler -v -H 1024 -W 1024 --diffusion-fa --chroma-disable-dit-maskoma  radiance cpp'" --cfg-scale 4.0 --sampling-method euler -v -H 1024 -W 1024 --diffusion-fa --chroma-disable-dit-mask

Paste error in the command.

Fixed.

@leejet
Owner Author

leejet commented Oct 23, 2025

I can generate images using the Vulkan backend that are similar to those produced by the CUDA backend. @stduhpf could you try again? Use the latest code.

@stduhpf
Contributor

stduhpf commented Oct 23, 2025

@leejet I still have the exact same issue on Vulkan as before. Maybe it's driver-related?

@leejet
Owner Author

leejet commented Oct 24, 2025

> @leejet I still have the exact same issue on Vulkan as before. Maybe it's driver-related?

@stduhpf I wonder if you’ve tried updating the Vulkan SDK or your graphics driver to the latest version?

@stduhpf
Contributor

stduhpf commented Oct 24, 2025

> @stduhpf I wonder if you’ve tried updating the Vulkan SDK or your graphics driver to the latest version?

I haven't.

> vulkaninfo --summary
WARNING: [Loader Message] Code 0 : Layer VK_LAYER_OBS_HOOK uses API version 1.3 which is older than the application specified API version of 1.4. May cause issues.
WARNING: [Loader Message] Code 0 : Layer VK_LAYER_RTSS uses API version 1.3 which is older than the application specified API version of 1.4. May cause issues.
==========
VULKANINFO
==========

Vulkan Instance Version: 1.4.304


Instance Extensions: count = 13
-------------------------------
VK_EXT_debug_report                    : extension revision 10
VK_EXT_debug_utils                     : extension revision 2
VK_EXT_swapchain_colorspace            : extension revision 5
VK_KHR_device_group_creation           : extension revision 1
VK_KHR_external_fence_capabilities     : extension revision 1
VK_KHR_external_memory_capabilities    : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2       : extension revision 1
VK_KHR_portability_enumeration         : extension revision 1
VK_KHR_surface                         : extension revision 25
VK_KHR_win32_surface                   : extension revision 6
VK_LUNARG_direct_driver_loading        : extension revision 1

Instance Layers: count = 17
---------------------------
VK_LAYER_AMD_switchable_graphics    AMD switchable graphics layer                 1.4.308  version 1
VK_LAYER_EOS_Overlay                Vulkan overlay layer for Epic Online Services 1.2.136  version 1
VK_LAYER_EOS_Overlay                Vulkan overlay layer for Epic Online Services 1.2.136  version 1
VK_LAYER_KHRONOS_profiles           Khronos Profiles layer                        1.3.283  version 1
VK_LAYER_KHRONOS_shader_object      Khronos Shader object layer                   1.3.283  version 1
VK_LAYER_KHRONOS_synchronization2   Khronos Synchronization2 layer                1.3.283  version 1
VK_LAYER_KHRONOS_validation         Khronos Validation Layer                      1.3.283  version 1
VK_LAYER_LUNARG_api_dump            LunarG API dump layer                         1.3.283  version 2
VK_LAYER_LUNARG_gfxreconstruct      GFXReconstruct Capture Layer Version 1.0.4    1.3.283  version 4194308
VK_LAYER_LUNARG_monitor             Execution Monitoring Layer                    1.3.283  version 1
VK_LAYER_LUNARG_screenshot          LunarG image capture layer                    1.3.283  version 1
VK_LAYER_OBS_HOOK                   Open Broadcaster Software hook                1.3.216  version 1
VK_LAYER_RENDERDOC_Capture          Debugging capture layer for RenderDoc         1.2.131  version 17
VK_LAYER_ROCKSTAR_GAMES_social_club Rockstar Games Social Club Layer              1.0.70   version 1
VK_LAYER_RTSS                       RTSS overlay hook bootstrap                   1.3.224  version 1
VK_LAYER_VALVE_steam_fossilize      Steam Pipeline Caching Layer                  1.4.303  version 1
VK_LAYER_VALVE_steam_overlay        Steam Overlay Layer                           1.3.207  version 1

Devices:
========
GPU0:
        apiVersion         = 1.4.308
        driverVersion      = 2.0.342
        vendorID           = 0x1002
        deviceID           = 0x731f
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = AMD Radeon RX 5700 XT
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 25.6.1 (AMD proprietary shader compiler)
        conformanceVersion = 1.4.0.0
        deviceUUID         = 00000000-2700-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000
GPU1:
        apiVersion         = 1.4.308
        driverVersion      = 2.0.342
        vendorID           = 0x1002
        deviceID           = 0x73bf
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = AMD Radeon RX 6800
        driverID           = DRIVER_ID_AMD_PROPRIETARY
        driverName         = AMD proprietary driver
        driverInfo         = 25.6.1 (AMD proprietary shader compiler)
        conformanceVersion = 1.4.0.0
        deviceUUID         = 00000000-2a00-0000-0000-000000000000
        driverUUID         = 414d442d-5749-4e2d-4452-560000000000

I know my drivers are a few months out of date, but I don't think that should matter too much?
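
When comparing setups across machines, it can help to pull just the device and driver fields out of a dump like the one above. The following is a minimal sketch that assumes the `GPUn:` / `key = value` layout shown in this output; `parse_vulkaninfo_summary` is a hypothetical helper, not part of any Vulkan tooling, and the sample text is a trimmed copy of the dump.

```python
import re


def parse_vulkaninfo_summary(text: str) -> list[dict[str, str]]:
    """Collect the `key = value` pairs listed under each `GPUn:` heading."""
    devices: list[dict[str, str]] = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"GPU\d+:", stripped):
            # Start a new device record, e.g. {"id": "GPU0"}.
            current = {"id": stripped.rstrip(":")}
            devices.append(current)
        elif current is not None and "=" in stripped:
            key, _, value = stripped.partition("=")
            current[key.strip()] = value.strip()
    return devices


# Trimmed copy of the summary above, used as sample input.
sample = """\
Devices:
========
GPU0:
        apiVersion         = 1.4.308
        deviceName         = AMD Radeon RX 5700 XT
        driverInfo         = 25.6.1 (AMD proprietary shader compiler)
GPU1:
        apiVersion         = 1.4.308
        deviceName         = AMD Radeon RX 6800
        driverInfo         = 25.6.1 (AMD proprietary shader compiler)
"""

devices = parse_vulkaninfo_summary(sample)
for dev in devices:
    print(dev["id"], dev["deviceName"], dev["driverInfo"], sep=" | ")
```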

@leejet
Owner Author

leejet commented Oct 24, 2025

I'm not quite sure because I didn't have any issues when testing Vulkan on my end.

@stduhpf
Contributor

stduhpf commented Oct 24, 2025

> test-backend-ops.exe | Select-String -pattern "FAIL"
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon RX 6800 (AMD proprietary driver) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none

[SSM_SCAN] NMSE = 0.006895084 > 0.000000100
SSM_SCAN(type=f32,d_state=128,head_dim=64,n_head=16,n_group=2,n_seq_tokens=32,n_seqs=4): FAIL
[SSM_SCAN] NMSE = 0.018997427 > 0.000000100
SSM_SCAN(type=f32,d_state=256,head_dim=64,n_head=8,n_group=2,n_seq_tokens=32,n_seqs=4): FAIL
  Backend Vulkan0: FAIL
[SSM_SCAN] NMSE = 0.014833752 > 0.000000100
SSM_SCAN(type=f32,d_state=128,head_dim=64,n_head=16,n_group=2,n_seq_tokens=32,n_seqs=4): FAIL
[SSM_SCAN] NMSE = 0.018720127 > 0.000000100
SSM_SCAN(type=f32,d_state=256,head_dim=64,n_head=8,n_group=2,n_seq_tokens=32,n_seqs=4): FAIL
  Backend Vulkan1: FAIL
FAIL

I don't think sd.cpp uses SSM_SCAN anywhere?
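
For context on the `NMSE = ... > 0.000000100` lines in the log: the check compares a backend's output with a reference and fails when the normalized mean squared error exceeds a threshold. The sketch below assumes the common definition (squared error divided by the squared magnitude of the reference); the exact formula used by test-backend-ops is not shown in this thread, and the sample values are made up.

```python
def nmse(reference: list[float], result: list[float]) -> float:
    """Normalized mean squared error: sum((ref - out)^2) / sum(ref^2)."""
    if len(reference) != len(result):
        raise ValueError("inputs must have the same length")
    err = sum((r - o) ** 2 for r, o in zip(reference, result))
    norm = sum(r * r for r in reference)
    if norm == 0.0:
        raise ValueError("reference is all zeros; NMSE is undefined")
    return err / norm


MAX_NMSE = 1e-7  # threshold that appears in the log above

# Made-up data: one element of the backend output is slightly off.
reference = [1.0, -2.0, 3.0, 0.5]
backend_out = [1.0, -2.1, 3.0, 0.5]

error = nmse(reference, backend_out)
print(f"NMSE = {error:.9f} -> {'FAIL' if error > MAX_NMSE else 'OK'}")
```

Because the error is normalized by the reference magnitude, the same threshold can be applied to tensors of very different scales.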

@leejet
Owner Author

leejet commented Oct 25, 2025

Yes, ssm_scan isn’t used in sd.cpp. Since I can’t reproduce your issue and both Vulkan and CUDA work fine in my tests, I think this PR can be merged.

@stduhpf
Contributor

stduhpf commented Oct 25, 2025

@leejet I just noticed a somewhat similar (but less obvious) issue with Qwen-Image on Vulkan.

(images: ROCm output on the left, Vulkan output on the right)

I'm thinking it could be the ggml_chunk operation that is behaving strangely on my end. As far as I know, only Chroma Radiance, Qwen Image and Wan use that op, right?

Edit: There is a (slight) difference between the ROCm and Vulkan builds on most models I've tested so far. Only for Qwen, and especially Chroma Radiance, does the image look significantly worse on Vulkan.
Wan 14B is the only one so far where the images were absolutely identical (tested sd1.x, Flux Dev, Wan 14B and 5B, Qwen Image, Chroma HD, and Chroma Radiance).
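
To put numbers on "slight difference" versus "absolutely identical", one option is a per-pixel comparison of the two outputs. This is a minimal sketch that works on raw 8-bit pixel buffers; decoding the PNG files into bytes is left out because it needs an image library, and `compare_pixels` plus the sample data are hypothetical.

```python
from typing import NamedTuple


class PixelDiff(NamedTuple):
    identical: bool
    max_abs_diff: int
    mean_abs_diff: float


def compare_pixels(a: bytes, b: bytes) -> PixelDiff:
    """Compare two equally sized buffers of 8-bit pixel values."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same size")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if not diffs:
        return PixelDiff(True, 0, 0.0)
    return PixelDiff(a == b, max(diffs), sum(diffs) / len(diffs))


# Made-up sample data standing in for decoded ROCm and Vulkan outputs.
rocm_pixels = bytes([10, 20, 30, 40])
vulkan_pixels = bytes([10, 22, 30, 37])

diff = compare_pixels(rocm_pixels, vulkan_pixels)
print(diff)
```

A small maximum difference would point to ordinary floating-point divergence between backends, while a large one would be more consistent with an actual bug.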

@leejet
Owner Author

leejet commented Oct 25, 2025

> As far as I know, only Chroma Radiance, Qwen Image and Wan use that op, right?

Yes. If the issue is really caused by ggml_chunk, I’m a bit suspicious that it might actually be a buffer management problem with Vulkan on certain devices.
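
To illustrate why a chunk operation would be sensitive to buffer handling, here is a minimal sketch in plain Python. It assumes that chunking yields non-copying views into the parent buffer, which this thread does not confirm for `ggml_chunk`; `chunk_views` is a hypothetical stand-in, not the sd.cpp or ggml implementation.

```python
def chunk_views(buf: memoryview, num_chunks: int) -> list[memoryview]:
    """Split a 1-D buffer into equal, non-copying views."""
    if num_chunks <= 0 or len(buf) % num_chunks != 0:
        raise ValueError("buffer length must be divisible by num_chunks")
    size = len(buf) // num_chunks
    # Each slice of a memoryview is itself a view: same memory, new offset.
    return [buf[i * size:(i + 1) * size] for i in range(num_chunks)]


data = bytearray(range(12))
first, second, third = chunk_views(memoryview(data), 3)

# The views alias the parent: a write to the parent shows up in the chunk.
data[4] = 99
print(list(first), list(second), list(third))
```

Under that assumption, each chunk is only as correct as its offset and the lifetime of the parent buffer, so a backend that mishandles either would corrupt every model that relies on the op.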

@leejet leejet merged commit 9e28be6 into master Oct 25, 2025
8 checks passed
@stduhpf
Contributor

stduhpf commented Oct 25, 2025

At least, whatever causes it, the error is deterministic. I get the same broken image every time.

@leejet
Owner Author

leejet commented Oct 25, 2025

I can't reproduce this issue on my device, which makes it very difficult for me to locate and fix this bug.

@stduhpf
Contributor

stduhpf commented Oct 25, 2025

I'm mostly using ROCm anyway; I only tried Vulkan because I wanted to test the model at higher resolutions before you implemented the workaround. I'm going to check how it behaves on my "GPU0", but that will take a while.
Edit: I won't be able to test it before Monday, I'm away for the weekend and forgot to set up my PC for remote access.
