@lhez lhez commented Nov 24, 2025

This PR adds new ops (sqr, sqrt, mean, and ssm_conv) needed by models such as gemma3n and lfm2.
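For readers unfamiliar with these ops, here is a minimal pure-Python sketch of what each one is generally understood to compute. It is not the OpenCL kernel code from this PR. The function names (`ref_sqr`, `ref_sqrt`, `ref_mean`, `ref_ssm_conv`) and the list-based data layout are hypothetical, and the ssm_conv semantics (a per-channel sliding-window dot product over `d_conv` samples) are an assumption based on the usual ggml reference behavior.

```python
import math

def ref_sqr(row):
    # Element-wise square.
    return [v * v for v in row]

def ref_sqrt(row):
    # Element-wise square root.
    return [math.sqrt(v) for v in row]

def ref_mean(rows):
    # One mean per row (assumed: reduction along the innermost dimension).
    return [sum(r) / len(r) for r in rows]

def ref_ssm_conv(x, w):
    # Assumed layout (hypothetical, for illustration only):
    #   x[c] holds d_conv - 1 + n_tokens samples for channel c
    #   w[c] holds d_conv weights for channel c
    # Returns out[t][c]: the dot product of w[c] with the window of x[c]
    # that starts at token t.
    d_conv = len(w[0])
    n_tokens = len(x[0]) - d_conv + 1
    return [
        [sum(x[c][t + k] * w[c][k] for k in range(d_conv)) for c in range(len(x))]
        for t in range(n_tokens)
    ]

if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
    w = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
    print(ref_ssm_conv(x, w))
```

The window slides one sample per token, so an input with `d_conv - 1 + n_tokens` samples per channel yields `n_tokens` outputs per channel.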

CISC commented Nov 24, 2025

@lhez It's time to update ops.md and OpenCL.csv too. :)

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (Issues specific to the OpenCL backend) labels Nov 24, 2025
@lhez lhez force-pushed the lh/sqr-sqrt-mean-ssm-conv branch from 253dd70 to 338d9ac Compare November 25, 2025 06:41
@lhez lhez marked this pull request as ready for review November 25, 2025 18:15
@max-krasnyansky max-krasnyansky left a comment

Very nice!

SSM_CONV in particular gives a nice perf boost for the LFM2 models.

LFM2-1.2B-Q4_0.gguf

Before:
common_perf_print: prompt eval time =     286.12 ms /   212 tokens (    1.35 ms per token,   740.95 tokens per second)
common_perf_print:        eval time =     966.67 ms /    50 runs   (   19.33 ms per token,    51.72 tokens per second)

After:
common_perf_print: prompt eval time =     250.62 ms /   212 tokens (    1.18 ms per token,   845.91 tokens per second)
common_perf_print:        eval time =    1126.55 ms /    63 runs   (   17.88 ms per token,    55.92 tokens per second)

I checked the SCHED_DEBUG output and we now claim all the SSM_CONV ops in LFM2.
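To put a number on the perf boost, the throughput figures can be pulled from the two logs above and compared. The following is a rough sketch, not part of the PR: the helper name `tokens_per_second` is hypothetical, and the regular expression assumes the `common_perf_print` line format shown in this comment. Note that the two eval runs generated different token counts (50 vs. 63), so the eval comparison is only approximate.

```python
import re

BEFORE = """\
common_perf_print: prompt eval time =     286.12 ms /   212 tokens (    1.35 ms per token,   740.95 tokens per second)
common_perf_print:        eval time =     966.67 ms /    50 runs   (   19.33 ms per token,    51.72 tokens per second)
"""

AFTER = """\
common_perf_print: prompt eval time =     250.62 ms /   212 tokens (    1.18 ms per token,   845.91 tokens per second)
common_perf_print:        eval time =    1126.55 ms /    63 runs   (   17.88 ms per token,    55.92 tokens per second)
"""

def tokens_per_second(log):
    # Map "prompt eval" / "eval" to the tokens-per-second figure on that line.
    rates = {}
    for line in log.splitlines():
        m = re.search(r"(prompt eval|eval) time =.*?([\d.]+) tokens per second", line)
        if m:
            rates[m.group(1)] = float(m.group(2))
    return rates

before = tokens_per_second(BEFORE)
after = tokens_per_second(AFTER)

# Relative throughput gain in percent for each phase.
gains = {phase: (after[phase] / before[phase] - 1.0) * 100.0 for phase in before}

for phase, gain in gains.items():
    print(f"{phase}: {before[phase]:.2f} -> {after[phase]:.2f} t/s ({gain:+.1f}%)")
```

On these logs the prompt-processing throughput improves by roughly 14% and the generation throughput by roughly 8%.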

@max-krasnyansky max-krasnyansky merged commit 7cba58b into ggml-org:master Nov 26, 2025
65 of 74 checks passed