Vulkan backend fails to compile a number of shaders on Adreno #6395

woachk · 2024-03-30T08:34:20Z

Hello,

I tried to run llama.cpp with the Vulkan backend on an Adreno 690 (Snapdragon 8cx Gen 3) under Windows 11 version 24H2, and this is what I get:

ggml_vk_create_pipeline(matmul_q4_k_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128)
Thread 0, Frame 0:
vkCreateShaderModule(device, pCreateInfo, pAllocator, pShaderModule) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 00000174669B6F50
    pCreateInfo:                    const VkShaderModuleCreateInfo* = 0000004041920B80:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO (16)
        pNext:                          const void* = NULL
        flags:                          VkShaderModuleCreateFlags = 0
        codeSize:                       size_t = 12072
        pCode:                          const uint32_t* = SHADER DATA
    pAllocator:                     const VkAllocationCallbacks* = NULL
    pShaderModule:                  VkShaderModule* = 0000017466C7FBC0

Thread 0, Frame 0:
vkCreateDescriptorSetLayout(device, pCreateInfo, pAllocator, pSetLayout) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 00000174669B6F50
    pCreateInfo:                    const VkDescriptorSetLayoutCreateInfo* = 0000004041920C48:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO (32)
        pNext:                          const void* = VkDescriptorSetLayoutBindingFlagsCreateInfo
        flags:                          VkDescriptorSetLayoutCreateFlags = 0
        bindingCount:                   uint32_t = 3
        pBindings:                      const VkDescriptorSetLayoutBinding* = 000001745EDD4BD0
            pBindings[0]:                   const VkDescriptorSetLayoutBinding = 000001745EDD4BD0:
                binding:                        uint32_t = 0
                descriptorType:                 VkDescriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER (7)
                descriptorCount:                uint32_t = 1
                stageFlags:                     VkShaderStageFlags = 32 (VK_SHADER_STAGE_COMPUTE_BIT)
                pImmutableSamplers:             const VkSampler* = UNUSED
            pBindings[1]:                   const VkDescriptorSetLayoutBinding = 000001745EDD4BE8:
                binding:                        uint32_t = 1
                descriptorType:                 VkDescriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER (7)
                descriptorCount:                uint32_t = 1
                stageFlags:                     VkShaderStageFlags = 32 (VK_SHADER_STAGE_COMPUTE_BIT)
                pImmutableSamplers:             const VkSampler* = UNUSED
            pBindings[2]:                   const VkDescriptorSetLayoutBinding = 000001745EDD4C00:
                binding:                        uint32_t = 2
                descriptorType:                 VkDescriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER (7)
                descriptorCount:                uint32_t = 1
                stageFlags:                     VkShaderStageFlags = 32 (VK_SHADER_STAGE_COMPUTE_BIT)
                pImmutableSamplers:             const VkSampler* = UNUSED
        pNext:                          VkDescriptorSetLayoutBindingFlagsCreateInfo = 0000004041920C08:
            sType:                          VkStructureType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_BINDING_FLAGS_CREATE_INFO (1000161000)
            pNext:                          const void* = NULL
            bindingCount:                   uint32_t = 3
            pBindingFlags:                  const VkDescriptorBindingFlags* = 00000174667DE6F0
                pBindingFlags[0]:               const VkDescriptorBindingFlags = 0
                pBindingFlags[1]:               const VkDescriptorBindingFlags = 0
                pBindingFlags[2]:               const VkDescriptorBindingFlags = 0
    pAllocator:                     const VkAllocationCallbacks* = NULL
    pSetLayout:                     VkDescriptorSetLayout* = 0000017466A468F0

Thread 0, Frame 0:
vkCreatePipelineLayout(device, pCreateInfo, pAllocator, pPipelineLayout) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 00000174669B6F50
    pCreateInfo:                    const VkPipelineLayoutCreateInfo* = 0000004041920D90:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO (30)
        pNext:                          const void* = NULL
        flags:                          VkPipelineLayoutCreateFlags = 0
        setLayoutCount:                 uint32_t = 1
        pSetLayouts:                    const VkDescriptorSetLayout* = 0000017466B13170
            pSetLayouts[0]:                 const VkDescriptorSetLayout = 0000017466A468F0
        pushConstantRangeCount:         uint32_t = 1
        pPushConstantRanges:            const VkPushConstantRange* = 0000004041920C30
            pPushConstantRanges[0]:         const VkPushConstantRange = 0000004041920C30:
                stageFlags:                     VkShaderStageFlags = 32 (VK_SHADER_STAGE_COMPUTE_BIT)
                offset:                         uint32_t = 0
                size:                           uint32_t = 56
    pAllocator:                     const VkAllocationCallbacks* = NULL
    pPipelineLayout:                VkPipelineLayout* = 0000017472C5FC60

Thread 0, Frame 0:
vkCreateComputePipelines(device, pipelineCache, createInfoCount, pCreateInfos, pAllocator, pPipelines) returns VkResult VK_ERROR_UNKNOWN (-13):
    device:                         VkDevice = 00000174669B6F50
    pipelineCache:                  VkPipelineCache = 0000000000000000
    createInfoCount:                uint32_t = 1
    pCreateInfos:                   const VkComputePipelineCreateInfo* = 0000004041920E60
        pCreateInfos[0]:                const VkComputePipelineCreateInfo = 0000004041920E60:
            sType:                          VkStructureType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO (29)
            pNext:                          const void* = NULL
            flags:                          VkPipelineCreateFlags = 0
            stage:                          VkPipelineShaderStageCreateInfo = 0000004041920E78:
                sType:                          VkStructureType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO (18)
                pNext:                          const void* = NULL
                flags:                          VkPipelineShaderStageCreateFlags = 0
                stage:                          VkShaderStageFlagBits = 32 (VK_SHADER_STAGE_COMPUTE_BIT)
                module:                         VkShaderModule = 0000017466C7FBC0
                pName:                          const char* = "main"
                pSpecializationInfo:            const VkSpecializationInfo* = 0000004041920E00:
                    mapEntryCount:                  uint32_t = 10
                    pMapEntries:                    const VkSpecializationMapEntry* = 000001746699DEF0
                        pMapEntries[0]:                 const VkSpecializationMapEntry = 000001746699DEF0:
                            constantID:                     uint32_t = 0
                            offset:                         uint32_t = 0
                            size:                           size_t = 4
                        pMapEntries[1]:                 const VkSpecializationMapEntry = 000001746699DF00:
                            constantID:                     uint32_t = 1
                            offset:                         uint32_t = 4
                            size:                           size_t = 4
                        pMapEntries[2]:                 const VkSpecializationMapEntry = 000001746699DF10:
                            constantID:                     uint32_t = 2
                            offset:                         uint32_t = 8
                            size:                           size_t = 4
                        pMapEntries[3]:                 const VkSpecializationMapEntry = 000001746699DF20:
                            constantID:                     uint32_t = 3
                            offset:                         uint32_t = 12
                            size:                           size_t = 4
                        pMapEntries[4]:                 const VkSpecializationMapEntry = 000001746699DF30:
                            constantID:                     uint32_t = 4
                            offset:                         uint32_t = 16
                            size:                           size_t = 4
                        pMapEntries[5]:                 const VkSpecializationMapEntry = 000001746699DF40:
                            constantID:                     uint32_t = 5
                            offset:                         uint32_t = 20
                            size:                           size_t = 4
                        pMapEntries[6]:                 const VkSpecializationMapEntry = 000001746699DF50:
                            constantID:                     uint32_t = 6
                            offset:                         uint32_t = 24
                            size:                           size_t = 4
                        pMapEntries[7]:                 const VkSpecializationMapEntry = 000001746699DF60:
                            constantID:                     uint32_t = 7
                            offset:                         uint32_t = 28
                            size:                           size_t = 4
                        pMapEntries[8]:                 const VkSpecializationMapEntry = 000001746699DF70:
                            constantID:                     uint32_t = 8
                            offset:                         uint32_t = 32
                            size:                           size_t = 4
                        pMapEntries[9]:                 const VkSpecializationMapEntry = 000001746699DF80:
                            constantID:                     uint32_t = 9
                            offset:                         uint32_t = 36
                            size:                           size_t = 4
                    dataSize:                       size_t = 40
                    pData:                          const void* = 000001746531B410
            layout:                         VkPipelineLayout = 0000017472C5FC60
            basePipelineHandle:             VkPipeline = 0000000000000000
            basePipelineIndex:              int32_t = 0
    pAllocator:                     const VkAllocationCallbacks* = NULL
    pPipelines:                     VkPipeline* = 0000004041920AC8
        pPipelines[0]:                  VkPipeline = 0000000000000000

When uncommenting shaders, I found that the failing ones also include dequant_q4_0, among others.

This is bug #5739 on Android.


woachk · 2024-03-30T08:46:03Z

The validation layers also report shader validation errors for other shaders, which compiled without errors despite them:

VUID-RuntimeSpirv-storageBuffer8BitAccess-06328(ERROR / SPEC): msgNum: -1143895426 - Validation Error: [ VUID-RuntimeSpirv-storageBuffer8BitAccess-06328 ] | MessageID = 0xbbd18a7e | vkCreateShaderModule():  SPIR-V contains an 8-bit OpVariable with StorageBuffer Storage Class, but storageBuffer8BitAccess was not enabled.
%264 = OpVariable %263 12
. The Vulkan spec states: If storageBuffer8BitAccess is VK_FALSE, then objects containing an 8-bit integer element must not have Storage Class of StorageBuffer, ShaderRecordBufferKHR, or PhysicalStorageBuffer (https://vulkan.lunarg.com/doc/view/1.3.280.0/windows/1.3-extensions/vkspec.html#VUID-RuntimeSpirv-storageBuffer8BitAccess-06328)

and

VUID-VkShaderModuleCreateInfo-pCode-08737(ERROR / SPEC): msgNum: -1520283006 - Validation Error: [ VUID-VkShaderModuleCreateInfo-pCode-08737 ] | MessageID = 0xa5625282 | vkCreateShaderModule(): pCreateInfo->pCode (spirv-val produced an error):
Invalid SPIR-V binary version 1.5 for target environment SPIR-V 1.3 (under Vulkan 1.1 semantics). The Vulkan spec states: If pCode is a pointer to SPIR-V code, pCode must adhere to the validation rules described by the Validation Rules within a Module section of the SPIR-V Environment appendix (https://vulkan.lunarg.com/doc/view/1.3.280.0/windows/1.3-extensions/vkspec.html#VUID-VkShaderModuleCreateInfo-pCode-08737)

storageBuffer8BitAccess is not exposed on this particular Adreno GPU.
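The second error says the shader modules were compiled as SPIR-V 1.5, while the validation layer treats the device as Vulkan 1.1, whose core spec only guarantees SPIR-V up to 1.3. The C++ sketch below encodes that mapping from core Vulkan version to the newest SPIR-V version a driver must accept. The struct and function names are hypothetical (not part of ggml), and extensions such as VK_KHR_spirv_1_4, which lets a Vulkan 1.1 device accept SPIR-V 1.4, are deliberately ignored.

```cpp
#include <cassert>
#include <cstdint>

// A SPIR-V version as (major, minor), e.g. {1, 5}.
struct SpirvVersion {
    uint32_t major;
    uint32_t minor;
};

// Highest SPIR-V version that core Vulkan <vk_major>.<vk_minor> guarantees.
// Extensions such as VK_KHR_spirv_1_4 are not considered here.
SpirvVersion max_core_spirv(uint32_t vk_major, uint32_t vk_minor) {
    assert(vk_major == 1 && "only Vulkan 1.x is handled in this sketch");
    switch (vk_minor) {
        case 0:  return {1, 0};
        case 1:  return {1, 3};
        case 2:  return {1, 5};
        default: return {1, 6};  // Vulkan 1.3 and later accept SPIR-V 1.6
    }
}

// True if a module of the given SPIR-V version is within the core limit
// for the given Vulkan version.
bool spirv_supported(SpirvVersion module, uint32_t vk_major, uint32_t vk_minor) {
    const SpirvVersion max = max_core_spirv(vk_major, vk_minor);
    return module.major < max.major ||
           (module.major == max.major && module.minor <= max.minor);
}
```

With the values from the report, spirv_supported({1, 5}, 1, 1) returns false, which matches the spirv-val error.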

And others:

ggml_vk_create_pipeline(matmul_q8_0_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128)
VUID-RuntimeSpirv-Workgroup-06530(ERROR / SPEC): msgNum: -1405964136 - Validation Error: [ VUID-RuntimeSpirv-Workgroup-06530 ] Object 0: handle = 0x95ff2600000000b7, type = VK_OBJECT_TYPE_SHADER_MODULE; | MessageID = 0xac32b098 | vkCreateComputePipelines(): pCreateInfos[0].stage SPIR-V uses 33792 bytes of shared memory, which is more than maxComputeSharedMemorySize (32768). The Vulkan spec states: The sum of size in bytes for variables and padding in the Workgroup Storage Class in the GLCompute Execution Model must be less than or equal to maxComputeSharedMemorySize (https://vulkan.lunarg.com/doc/view/1.3.280.0/windows/1.3-extensions/vkspec.html#VUID-RuntimeSpirv-Workgroup-06530)
    Objects: 1
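The shared memory error is plain arithmetic: the matmul_q8_0_f32_aligned_l pipeline needs 33792 bytes of Workgroup memory, 1024 bytes more than this device's maxComputeSharedMemorySize of 32768. Below is a minimal sketch of the check that VUID-RuntimeSpirv-Workgroup-06530 describes, with hypothetical helper names. In a real backend the limit would come from VkPhysicalDeviceLimits::maxComputeSharedMemorySize and the required size from the shader's tile configuration, which this sketch does not model.

```cpp
#include <cassert>
#include <cstdint>

// VUID-RuntimeSpirv-Workgroup-06530: the total Workgroup (shared) memory used
// by a compute shader must not exceed maxComputeSharedMemorySize.
bool fits_shared_memory(uint32_t required_bytes, uint32_t max_compute_shared_memory_size) {
    return required_bytes <= max_compute_shared_memory_size;
}

// Number of bytes a shader would have to shed to fit (0 if it already fits).
uint32_t shared_memory_overshoot(uint32_t required_bytes, uint32_t max_compute_shared_memory_size) {
    return fits_shared_memory(required_bytes, max_compute_shared_memory_size)
               ? 0u
               : required_bytes - max_compute_shared_memory_size;
}
```

For the reported pipeline, shared_memory_overshoot(33792, 32768) is 1024, so a variant with smaller tiles (or a simpler shader) would be needed on this GPU.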

woachk · 2024-03-30T21:51:23Z

I get much further with the Kompute backend, using some hacks to skip feature probing and the check that the device supports Vulkan 1.2 or above.

Using:
main.exe -ngl 999 -m D:\Downloads\ggml-model-f16.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e

llama_print_timings:        load time =    4777.78 ms
llama_print_timings:      sample time =     338.93 ms /   400 runs   (    0.85 ms per token,  1180.20 tokens per second)
llama_print_timings: prompt eval time =    1207.33 ms /    19 tokens (   63.54 ms per token,    15.74 tokens per second)
llama_print_timings:        eval time =   52685.29 ms /   399 runs   (  132.04 ms per token,     7.57 tokens per second)
llama_print_timings:       total time =   57399.03 ms /   418 tokens
Log end

However, only f16 works there, not q4_0 (tested with https://huggingface.co/ggml-org/models/blob/main/tinyllama-1.1b/ggml-model-f16.gguf).
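The Kompute workaround involved bypassing a check that the device reports Vulkan 1.2 or above; the validation output earlier in the thread suggests this Adreno driver is treated as Vulkan 1.1. Such a gate typically decodes VkPhysicalDeviceProperties::apiVersion. The sketch below reproduces the bit layout of the VK_MAKE_API_VERSION, VK_API_VERSION_MAJOR and VK_API_VERSION_MINOR macros from vulkan_core.h without depending on the Vulkan headers. The function names are hypothetical and this is not Kompute's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Bit layout of a packed Vulkan version: variant in bits 29-31,
// major in bits 22-28, minor in bits 12-21, patch in bits 0-11.
constexpr uint32_t api_major(uint32_t v) { return (v >> 22) & 0x7Fu; }
constexpr uint32_t api_minor(uint32_t v) { return (v >> 12) & 0x3FFu; }

constexpr uint32_t make_api_version(uint32_t variant, uint32_t major,
                                    uint32_t minor, uint32_t patch) {
    return (variant << 29) | (major << 22) | (minor << 12) | patch;
}

// Hypothetical stand-in for a "Vulkan 1.2 or above" gate applied to
// the apiVersion reported by the physical device.
constexpr bool at_least_vulkan_1_2(uint32_t api_version) {
    return api_major(api_version) > 1 ||
           (api_major(api_version) == 1 && api_minor(api_version) >= 2);
}
```

A device reporting make_api_version(0, 1, 1, 0) fails this gate, which is why it had to be skipped to get Kompute running.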

0cc4m · 2024-04-01T07:11:13Z

While I don't have the time to figure out Adreno support myself, I'm happy to assist if someone wants to take it on. The lack of storageBuffer8BitAccess would need a fallback to int16_t and some bitshifting, I guess. I think the mobile GPUs also don't like my big matrix-matrix multiplication shader and would need a simpler version of it.
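The suggested fallback can be illustrated on the host: if a shader may only address the buffer as 16-bit words, each 8-bit value is recovered by loading the containing word and shifting. The C++ sketch below mimics what such shader code would do. It assumes little-endian packing and that 16-bit storage access (storageBuffer16BitAccess, itself an optional feature whose availability on this Adreno the thread does not confirm) is usable. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read byte `i` from a buffer that can only be addressed as 16-bit words.
// Assumes little-endian packing: byte 2k is the low half of word k and
// byte 2k+1 is the high half.
uint8_t load_u8(const uint16_t* words, size_t i) {
    const uint16_t w = words[i >> 1];
    const unsigned shift = (i & 1u) * 8u;
    return static_cast<uint8_t>((w >> shift) & 0xFFu);
}

// Signed variant (e.g. for q8_0 quants): sign-extend the extracted byte.
int8_t load_i8(const uint16_t* words, size_t i) {
    const int v = load_u8(words, i);
    return static_cast<int8_t>(v >= 128 ? v - 256 : v);
}

// 4-bit quants (e.g. q4_0) pack two values per byte;
// j selects the low (0) or high (1) nibble of byte `byte_index`.
uint8_t load_nibble(const uint16_t* words, size_t byte_index, unsigned j) {
    const uint8_t b = load_u8(words, byte_index);
    return j == 0 ? static_cast<uint8_t>(b & 0x0Fu) : static_cast<uint8_t>(b >> 4);
}
```

For the bytes {0x12, 0x34, 0x80, 0xFF} stored as the words {0x3412, 0xFF80}, load_i8 yields 18, 52, -128 and -1. Packing into 32-bit words instead would avoid needing any optional storage feature, at the cost of one more shift case.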

github-actions · 2024-05-16T01:06:38Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

woachk added the bug-unconfirmed label Mar 30, 2024

github-actions bot added the stale label May 2, 2024

Jeximo mentioned this issue May 2, 2024

Tidy Android Instructions README.md #7016

Merged

github-actions bot closed this as completed May 16, 2024

Jeximo mentioned this issue May 20, 2024

[Android/Termux] Significantly higher RAM usage with Vulkan compared to CPU only #7351

Open
