Add a new x8-packq microkernel that packs and per-row dynamically quantizes fp32 to qp8. #6424

Merged
copybara-service[bot] merged 1 commit into master from test_633914713 on Jun 14, 2024

Conversation

copybara-service[bot] (Contributor)

Add a new x8-packq microkernel that packs and per-row dynamically quantizes fp32 to qp8.

The microkernels themselves are just wrappers for the corresponding KleidiAI kernels.
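
For context, "per-row dynamic quantization" here means that each row of the fp32 left-hand operand gets its own scale and zero point, computed from that row's observed range at pack time, before the values are mapped to int8. The following is a minimal standalone sketch of that idea only; the actual microkernel additionally interleaves the output into the KleidiAI qp8 packed layout, and the function name and range handling below are illustrative assumptions, not the PR's code.

#include <math.h>
#include <stddef.h>
#include <stdint.h>

// Illustrative sketch of per-row dynamic quantization: derive a scale and a
// nudged zero point from the row's min/max, then scale, round, offset, and
// clamp each value to int8. Assumes k >= 1. The real x8-packq kernel also
// packs the result into the qp8 layout expected by the KleidiAI GEMM kernels.
static void quantize_row_sketch(const float* src, size_t k, int8_t* dst,
                                float* out_recip_scale, int32_t* out_zero_point) {
  float rmin = src[0];
  float rmax = src[0];
  for (size_t i = 1; i < k; ++i) {
    rmin = fminf(rmin, src[i]);
    rmax = fmaxf(rmax, src[i]);
  }
  // Map the observed range onto the 256 representable int8 values.
  const float scale = (rmax - rmin) / 255.0f;
  const float recip_scale = scale != 0.0f ? 1.0f / scale : 0.0f;
  const int32_t nudged_zero_point =
      (int32_t)roundf((float)INT8_MIN - rmin * recip_scale);
  for (size_t i = 0; i < k; ++i) {
    int32_t v = (int32_t)roundf(src[i] * recip_scale) + nudged_zero_point;
    v = v < INT8_MIN ? INT8_MIN : v;
    v = v > INT8_MAX ? INT8_MAX : v;
    dst[i] = (int8_t)v;
  }
  *out_recip_scale = recip_scale;
  *out_zero_point = nudged_zero_point;
}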

const size_t k_block_len = kr / sr;

for (size_t row_idx = 0; row_idx < num_rows; ++row_idx) {
float max0 = -FLT_MAX;
Contributor:

Use params, or initialize to the first value, like rmax.
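
One way to read that suggestion (a short sketch, not the merged code; num_cols is an assumed name and the row is assumed non-empty): seed the running maximum from the row's first element instead of -FLT_MAX, the same way rmax is seeded.

// Sketch of the reviewer's suggestion: initialize from the first element
// rather than -FLT_MAX (assumes the row has at least one element).
float max0 = src_ptr[0];
for (size_t k_idx = 1; k_idx < num_cols; ++k_idx) {
  max0 = fmaxf(max0, src_ptr[k_idx]);
}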

// Scale the values.
int32_t v0_s32 = (int32_t)(round(src0_0 * scale0));

v0_s32 = v0_s32 + nudged_zero_point0;
Contributor:

Can we make this adjustable for VNNI to support +128?
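
A possible shape for that, sketched with a hypothetical params field (output_offset is invented for illustration and is not in this PR): thread an extra offset through the kernel params so VNNI-oriented variants can shift the quantized value into unsigned range.

// Hypothetical sketch: params->output_offset would be 0 for signed int8
// output and +128 when a VNNI kernel wants the values shifted into uint8.
v0_s32 = v0_s32 + nudged_zero_point0 + params->output_offset;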

const float src0_0 = *(src_ptr + k_idx + k_block_idx);

// Scale the values.
int32_t v0_s32 = (int32_t)(round(src0_0 * scale0));
Contributor:

roundf?
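
The point being that round() takes and returns double, so the float product is promoted and converted back; the single-precision variant keeps the computation in float (sketch):

// roundf() stays in single precision, avoiding the float -> double -> float trip.
int32_t v0_s32 = (int32_t)roundf(src0_0 * scale0);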

v0_s32 = v0_s32 + nudged_zero_point0;
v0_s32 = fmaxf(v0_s32, INT8_MIN);
v0_s32 = fminf(v0_s32, INT8_MAX);
*((int8_t*)(dst_ptr)) = (int8_t)v0_s32;
Contributor:

Can this be made to do 8 values from src0 and 8 values from src1 for i8mm?
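
A sketch of the interleaving the reviewer is describing (the loop structure and the quantize_value helper are hypothetical; the actual packed layout is dictated by the KleidiAI routine): emit 8 quantized values from row 0 followed by 8 from row 1, so an i8mm GEMM can load each pair of rows as 2x8 int8 tiles.

// Hypothetical sketch: pack two rows in interleaved blocks of 8 so the output
// matches the 2x8 int8 tiles consumed by Arm i8mm (SMMLA) GEMM kernels.
// quantize_value() stands in for scale + roundf + zero-point + clamp to int8;
// assumes k is a multiple of 8 and dst_ptr is an int8_t*.
for (size_t k_idx = 0; k_idx < k; k_idx += 8) {
  for (size_t i = 0; i < 8; ++i) {
    *dst_ptr++ = quantize_value(src0[k_idx + i], scale0, nudged_zero_point0);
  }
  for (size_t i = 0; i < 8; ++i) {
    *dst_ptr++ = quantize_value(src1[k_idx + i], scale1, nudged_zero_point1);
  }
}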

size_t m_idx_start, // Starting index in `lhs_packed`.
const float* XNN_RESTRICT lhs, // Left-hand operator to pack.
size_t lhs_stride, // Stride in bytes between the rows of `lhs`.
void* XNN_RESTRICT lhs_packed // The quantized and packed output.
Contributor:

Can the lhs output be standard GEMM qd8 int8 with a stride, and the quantization params (row sum) at a different pointer?
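
What the reviewer seems to be asking for is the layout the existing qd8 path uses: plain strided int8 rows, with the per-row quantization parameters (and row sums) written through a separate pointer instead of being interleaved into the packed buffer. A hypothetical signature sketch (all names invented for illustration, not XNNPACK's actual API):

// Hypothetical alternative layout: standard row-major int8 output plus a
// separate per-row parameter buffer, rather than a packed/interleaved format.
struct row_quant_params {
  float scale;        // per-row dequantization scale
  int32_t zero_point; // per-row (nudged) zero point
  int32_t row_sum;    // sum of the quantized row, for zero-point correction
};

void pack_lhs_qd8_style_sketch(
    size_t m, size_t k,
    const float* lhs, size_t lhs_stride,        // fp32 input rows
    int8_t* lhs_quantized, size_t dst_stride,   // plain strided int8 output
    struct row_quant_params* params);           // one entry per output row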

copybara-service[bot] force-pushed the test_633914713 branch 7 times, most recently from ba3ec60 to 9d5e952 on May 27, 2024 12:03
copybara-service[bot] force-pushed the test_633914713 branch 8 times, most recently from 3d53157 to 46834e5 on June 14, 2024 12:07
Add a new x8-packq microkernel that packs and per-row dynamically quantizes `fp32` to `qp8`.

The microkernels themselves are just wrappers for the corresponding KleidiAI kernels.

PiperOrigin-RevId: 643324022
copybara-service[bot] merged commit bffaa6f into master on Jun 14, 2024
copybara-service[bot] deleted the test_633914713 branch on June 14, 2024 12:53