Feature Request: Vulkan: Implement CPY op for quantized types #11127

@stduhpf

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

This is mostly related to ggml, but I was advised to report the issue here.

Basically, this would require implementing quantization shaders for Vulkan (that's the easy part) and wiring them into the C++ backend code.

Motivation

With stable-diffusion.cpp compiled with the Vulkan backend, attempting to load a LoRA on a quantized model (any non-float type) makes the program print "Missing CPY op for types: f32 q8_0" (for example) and crash at this line.

Having more ops implemented is a good thing, especially if it fixes a crash downstream.

Possible Implementation

I'm guessing something like this for the shaders (q8_0):

#version 450

#include "quant_head.comp" // does not exist yet; assumed to declare block_q8_0, the push constants `p`, and the required extensions

layout(local_size_x = 256, local_size_y = 1, local_size_z = 1) in;

layout (binding = 0) readonly buffer A {float data_a[];};
layout (binding = 1) writeonly buffer D {block_q8_0 data_b[];};

void main() {
    // One invocation quantizes one whole block of 32 floats, so no two
    // invocations write to the same block_q8_0.
    const uint ib = gl_GlobalInvocationID.x;
    if (ib >= p.nel / 32) {
        return;
    }

    // Index of the first source float of this block.
    const uint a_idx = 32 * ib;

    // The scale is derived from the largest magnitude in the block.
    float absmax = 0.0;
    [[unroll]] for (uint j = 0; j < 32; ++j) {
        absmax = max(absmax, abs(data_a[a_idx + j]));
    }

    const float d = absmax / 127.0;
    const float id = d != 0.0 ? 1.0 / d : 0.0;
    data_b[ib].d = float16_t(d);
    // qs is signed: round to nearest and store as int8_t. |x * id| <= 127,
    // so no clamp is needed.
    [[unroll]] for (uint j = 0; j < 32; ++j) {
        data_b[ib].qs[j] = int8_t(round(data_a[a_idx + j] * id));
    }
}
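
To check the output of a shader like the one above, a CPU-side reference could be useful. The following is only a minimal sketch of q8_0-style quantization as described here (absmax / 127 scale, round to nearest, 32 values per block). `BlockQ8_0Ref` and `quantize_q8_0_ref` are hypothetical names made up for this example, not ggml's API, and the scale is kept as a plain `float` instead of fp16 to keep the sketch self-contained.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a q8_0 block; the real format stores d as fp16.
struct BlockQ8_0Ref {
    float  d;       // per-block scale
    int8_t qs[32];  // quantized values
};

// Quantize n floats (n is assumed to be a multiple of 32), one block at a time.
std::vector<BlockQ8_0Ref> quantize_q8_0_ref(const float * x, size_t n) {
    std::vector<BlockQ8_0Ref> out(n / 32);
    for (size_t ib = 0; ib < out.size(); ++ib) {
        const float * src = x + 32 * ib;

        // Largest magnitude in the block determines the scale.
        float absmax = 0.0f;
        for (int j = 0; j < 32; ++j) {
            absmax = std::max(absmax, std::fabs(src[j]));
        }

        const float d  = absmax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;

        out[ib].d = d;
        for (int j = 0; j < 32; ++j) {
            // Round to nearest; |src[j] * id| <= 127, so the cast cannot overflow.
            out[ib].qs[j] = (int8_t) std::round(src[j] * id);
        }
    }
    return out;
}
```

Comparing the `qs` values from the GPU against this reference, with a small tolerance on `d` for the fp16 conversion, would catch indexing mistakes such as two invocations writing to the same block.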

I don't know how to proceed further in the implementation.

Labels: enhancement (New feature or request)
