hexagon: optimize HMX matmul operations #21071
chraac wants to merge 17 commits into ggml-org:master
…ales initialization
    TIMER_START(total);

    HAP_compute_res_hmx_lock(ctx->vtcm_rctx);
    hmx_set_output_scales(vtcm_scales);
Key change: move scale and bias initialization out of the loop instead of reinitializing the same scale on each iteration, which reduces HMX register setup.
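A minimal sketch of the hoisting pattern described in this comment, assuming a setup call whose argument does not change between iterations. The names `set_output_scales`, `process_tile`, `run_setup_in_loop`, and `run_setup_hoisted` are hypothetical stand-ins for illustration and are not taken from `hmx-matmul-ops.c`; the counter only models how often the setup would be issued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an expensive accelerator setup call such as
 * hmx_set_output_scales(); here it only records the pointer and counts calls. */
static size_t g_setup_calls = 0;
static const float *g_active_scales = NULL;

static void set_output_scales(const float *scales) {
    assert(scales != NULL);
    g_active_scales = scales;
    g_setup_calls++;
}

/* Hypothetical per-tile work: applies the currently active scale to one value. */
static float process_tile(float x) {
    return x * g_active_scales[0];
}

/* Before: the same scales are programmed again on every iteration. */
static float run_setup_in_loop(const float *scales, const float *in, size_t n_tiles) {
    float acc = 0.0f;
    for (size_t i = 0; i < n_tiles; ++i) {
        set_output_scales(scales);   /* loop-invariant, repeated n_tiles times */
        acc += process_tile(in[i]);
    }
    return acc;
}

/* After: the loop-invariant setup is issued once, before the loop. */
static float run_setup_hoisted(const float *scales, const float *in, size_t n_tiles) {
    float acc = 0.0f;
    set_output_scales(scales);       /* issued exactly once */
    for (size_t i = 0; i < n_tiles; ++i) {
        acc += process_tile(in[i]);
    }
    return acc;
}
```

Both variants compute the same result; the hoisted one issues the setup once instead of once per tile. Hoisting is only valid as long as nothing inside the loop changes the scales.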
Potential follow-up: reuse this scale for quant-block scaling to avoid the dequantization vmpy, at the cost of extra VTCM cache for scale storage...
# Conflicts: # ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
@chraac
Np, let's wait for your PR to be merged first, then redo this one based on that.
Overview
This pull request refactors several matrix multiplication and data handling routines in `ggml-hexagon/htp/hmx-matmul-ops.c` to improve type safety, consistency, and code clarity. The main changes standardize loop counters and size-related variables on `size_t` instead of `int`, update function signatures accordingly, and simplify tile indexing logic. Additionally, the initialization of column scales is made more consistent, and some redundant or legacy code paths are removed.
Type safety and consistency improvements:
- Changed loop counters and size-related variables from `int` to `size_t` across multiple functions (e.g., `core_dot_chunk_fp16`, `core_mma_chunk_fp16`, `transfer_output_chunk_fp16_to_fp32`) and updated related calculations and function signatures for better type safety and to prevent integer overflow issues.
- Used `const size_t` for sizes and counts, improving code clarity and reducing potential bugs from type mismatches.
Additional information
Tested with Qwen3.5-2b-q4, works well
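To illustrate the overflow concern behind the `int` to `size_t` change described in the Overview, here is a small sketch of a row-major offset computation. The helpers `flat_offset` and `offset_fits_in_int` are hypothetical and do not appear in the PR. The check is done in 64-bit arithmetic because overflowing a signed `int` is undefined behavior in C and cannot be demonstrated safely by executing it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: flat element offset of (row, col) in a row-major
 * matrix with n_cols columns, computed entirely in size_t. */
static size_t flat_offset(size_t row, size_t n_cols, size_t col) {
    assert(col < n_cols);
    return row * n_cols + col;
}

/* Hypothetical check: returns 1 if the same offset would still be
 * representable in a signed int, 0 otherwise. The product is formed in
 * 64-bit unsigned arithmetic so the check itself cannot overflow for
 * 32-bit inputs. */
static int offset_fits_in_int(size_t row, size_t n_cols, size_t col) {
    uint64_t wide = (uint64_t) row * (uint64_t) n_cols + (uint64_t) col;
    return wide <= (uint64_t) INT_MAX;
}
```

For a 50000 x 50000 matrix, the offset of row 50000 would be 2,500,000,000, which exceeds a 32-bit `INT_MAX` (2,147,483,647) but still fits in a 32-bit unsigned `size_t`. Note that `size_t` widens the usable range and makes wraparound well defined; it does not remove the need to keep sizes within the type's limits.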
Requirements