
hexagon: optimize HMX matmul operations#21071

Open
chraac wants to merge 17 commits into ggml-org:master from chraac:dev-hmx-opt

Conversation

@chraac
Contributor

@chraac chraac commented Mar 27, 2026

Overview

This pull request refactors several matrix multiplication and data handling routines in ggml-hexagon/htp/hmx-matmul-ops.c to improve type safety, consistency, and code clarity. The main changes involve standardizing loop counters and size-related variables to use size_t instead of int, updating function signatures accordingly, and simplifying tile indexing logic. Additionally, the initialization of column scales is made more consistent, and some redundant or legacy code paths are removed.

Type safety and consistency improvements:

  • Changed loop counters and size-related variables from int to size_t across multiple functions (e.g., core_dot_chunk_fp16, core_mma_chunk_fp16, transfer_output_chunk_fp16_to_fp32) and updated related calculations and function signatures for better type safety and to prevent integer overflow issues. [1] [2] [3] [4] [5]
  • Updated function signatures and local variable declarations to consistently use const size_t for sizes and counts, improving code clarity and reducing potential bugs from type mismatches. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]
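The int → size_t change described above can be sketched as follows. This is a minimal illustration under assumed names (`scale_rows_fp32`, `n_rows`, `row_size` are hypothetical), not the actual hmx-matmul-ops.c routines:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal sketch of the type-safety change: loop counters and size
 * parameters use size_t instead of int, so index math like
 * r * row_size cannot overflow a signed int on large tensors.
 * All names here are illustrative, not the real hmx-matmul-ops.c code. */
static void scale_rows_fp32(float *dst, const float *src,
                            const size_t n_rows, const size_t row_size,
                            const float scale) {
    for (size_t r = 0; r < n_rows; r++) {
        const size_t base = r * row_size; /* size_t arithmetic throughout */
        for (size_t c = 0; c < row_size; c++) {
            dst[base + c] = src[base + c] * scale;
        }
    }
}
```

Declaring the sizes `const size_t` in the signature also matches the PR's convention and lets the compiler flag accidental signed/unsigned mixing at call sites.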

Additional information

Tested with Qwen3.5-2b-q4; works well.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES, for commit log and PR descriptions

@chraac chraac requested a review from a team as a code owner March 27, 2026 15:05
@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Hexagon labels Mar 27, 2026
TIMER_START(total);

HAP_compute_res_hmx_lock(ctx->vtcm_rctx);
hmx_set_output_scales(vtcm_scales);
Contributor Author


Key change: move scale and bias initialization out of the loop instead of reinitializing the same scales each iteration, reducing HMX register setup overhead.
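A toy model of that hoisting, counting how often the setup runs; the names below are illustrative stand-ins for the HMX calls in the snippet (`set_output_scales` stands in for `hmx_set_output_scales`), not the real API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the optimization: the (comparatively expensive)
 * output-scale setup is issued once before the tile loop rather
 * than once per tile. Names are hypothetical stand-ins. */
static int g_scale_setups = 0;

static void set_output_scales(void) { g_scale_setups++; }

static void matmul_all_tiles(const size_t n_tiles) {
    set_output_scales(); /* hoisted: the same scales apply to every tile */
    for (size_t t = 0; t < n_tiles; t++) {
        /* per-tile HMX matmul work would go here; previously the
         * scale setup above was re-issued on every iteration */
        (void) t;
    }
}
```

Since the scales are loop-invariant, the per-iteration setup was pure overhead; hoisting it is safe as long as nothing inside the loop clobbers the HMX scale registers.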

Contributor Author


Potential follow-up: reuse this scale for quant-block scaling to avoid the dequantization vmpy, at the cost of extra VTCM cache for scale storage...

@max-krasnyansky
Member

@chraac
Sorry for the delay on this. We have a couple of decent-sized changes coming in this area (op request batching/buffer management and HMX optimizations).
We'll probably merge that stuff first (hopefully by end of this week), then I'm going to work with you to rebase/update/merge things on top.

@chraac
Contributor Author

chraac commented Apr 2, 2026

> We'll probably merge that stuff first (hopefully by end of this week), then I'm going to work with you to rebase/update/merge things on top.

Np, let's wait for your PR to be merged first, then redo this one on top of it.

@max-krasnyansky
Member

@chraac here is that PR that I mentioned #21705
There are more HMX specific updates coming on top shortly.

