
hexagon: optimize HMX matmul operations#21071

Open
chraac wants to merge 17 commits into ggml-org:master from chraac:dev-hmx-opt

Conversation

@chraac
Contributor

@chraac chraac commented Mar 27, 2026

Overview

This pull request refactors several matrix multiplication and data handling routines in ggml-hexagon/htp/hmx-matmul-ops.c to improve type safety, consistency, and code clarity. The main changes involve standardizing loop counters and size-related variables to use size_t instead of int, updating function signatures accordingly, and simplifying tile indexing logic. Additionally, the initialization of column scales is made more consistent, and some redundant or legacy code paths are removed.

Type safety and consistency improvements:

  • Changed loop counters and size-related variables from int to size_t across multiple functions (e.g., core_dot_chunk_fp16, core_mma_chunk_fp16, transfer_output_chunk_fp16_to_fp32) and updated related calculations and function signatures for better type safety and to prevent integer overflow issues. [1] [2] [3] [4] [5]
  • Updated function signatures and local variable declarations to consistently use const size_t for sizes and counts, improving code clarity and reducing potential bugs from type mismatches. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]
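The int → size_t change described above can be sketched as follows. This is a minimal illustration under assumed names (`scale_rows_fp32`, `n_rows`, `row_size` are hypothetical), not the actual hmx-matmul-ops.c routines:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal sketch of the type-safety change: loop counters and size
 * parameters use size_t instead of int, so index math like
 * r * row_size cannot overflow a signed int on large tensors.
 * All names here are illustrative, not the real hmx-matmul-ops.c code. */
static void scale_rows_fp32(float *dst, const float *src,
                            const size_t n_rows, const size_t row_size,
                            const float scale) {
    for (size_t r = 0; r < n_rows; r++) {
        const size_t base = r * row_size; /* size_t arithmetic throughout */
        for (size_t c = 0; c < row_size; c++) {
            dst[base + c] = src[base + c] * scale;
        }
    }
}
```

Declaring the sizes `const size_t` in the signature also matches the PR's convention and lets the compiler flag accidental signed/unsigned mixing at call sites.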

Additional information

Tested with Qwen3.5-2b-q4; works well.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES, for commit log and PR descriptions

@chraac chraac requested a review from a team as a code owner March 27, 2026 15:05
@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Hexagon labels Mar 27, 2026
TIMER_START(total);

HAP_compute_res_hmx_lock(ctx->vtcm_rctx);
hmx_set_output_scales(vtcm_scales);
Contributor Author


Key change: move scale and bias initialization out of the loop instead of reinitializing the same scales each iteration, reducing HMX register setup overhead.
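A toy model of that hoisting, counting how often the setup runs; the names below are illustrative stand-ins for the HMX calls in the snippet (`set_output_scales` stands in for `hmx_set_output_scales`), not the real API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the optimization: the (comparatively expensive)
 * output-scale setup is issued once before the tile loop rather
 * than once per tile. Names are hypothetical stand-ins. */
static int g_scale_setups = 0;

static void set_output_scales(void) { g_scale_setups++; }

static void matmul_all_tiles(const size_t n_tiles) {
    set_output_scales(); /* hoisted: the same scales apply to every tile */
    for (size_t t = 0; t < n_tiles; t++) {
        /* per-tile HMX matmul work would go here; previously the
         * scale setup above was re-issued on every iteration */
        (void) t;
    }
}
```

Since the scales are loop-invariant, the per-iteration setup was pure overhead; hoisting it is safe as long as nothing inside the loop clobbers the HMX scale registers.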

Contributor Author


Potential follow-up: reuse this scale for quant-block scaling to avoid the dequantization vmpy, at the cost of extra VTCM cache for scale storage...

@max-krasnyansky
Member

@chraac
Sorry for the delay on this. We have a couple of decent-sized changes coming in this area (op request batching/buffer management and HMX optimizations).
We'll probably merge that stuff first (hopefully by end of this week), then I'm going to work with you to rebase/update/merge things on top.

@chraac
Contributor Author

chraac commented Apr 2, 2026

> We'll probably merge that stuff first (hopefully by end of this week), then I'm going to work with you to rebase/update/merge things on top.

Np, let's wait for your PR to be merged first, then redo this one on top of it.

@max-krasnyansky
Member

@chraac here is that PR that I mentioned #21705
There are more HMX specific updates coming on top shortly.

