cern1710
Contributor

@cern1710 cern1710 commented Oct 12, 2025

This PR adds Metal backend for the OPT_STEP_SGD operator.

Implementation

  • Added kernel_opt_step_sgd_f32 kernel in ggml-metal.metal
  • Implemented argument struct ggml_metal_kargs_opt_step_sgd
  • Storing np in the struct as a passed constant & args
  • Threadgroup size calculation, similar to ggml_metal_op_opt_step_sgd
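
For readers unfamiliar with the operator, here is a rough CPU-side sketch in plain C of what a per-element SGD step with weight decay and a ceil-divided threadgroup count might look like. It is not the Metal kernel from this PR. The struct sgd_kargs, the function names, and the parameter layout pars = [alpha, wd] are assumptions made for illustration; only the np field is taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel argument struct.
 * Only np (number of elements to update) comes from the PR description. */
typedef struct {
    int64_t np;
} sgd_kargs;

/* Work done by one emulated GPU thread with global index gid.
 * Assumed layout: pars[0] = learning rate (alpha), pars[1] = weight decay (wd). */
static void sgd_step_thread(const sgd_kargs *args, float *x, const float *g,
                            const float *pars, int64_t gid) {
    if (gid >= args->np) {
        return; /* out-of-range threads in the last threadgroup do nothing */
    }
    const float alpha = pars[0];
    const float wd    = pars[1];
    /* decay the weight, then take a step against the gradient */
    x[gid] = x[gid] * (1.0f - alpha * wd) - alpha * g[gid];
}

/* Number of threadgroups needed to cover np elements with nth threads each. */
static int64_t sgd_num_threadgroups(int64_t np, int64_t nth) {
    assert(nth > 0);
    return (np + nth - 1) / nth; /* ceil(np / nth) */
}

/* Serial emulation of a 1D dispatch: every thread of every threadgroup runs once. */
static void sgd_step_dispatch(const sgd_kargs *args, float *x, const float *g,
                              const float *pars, int64_t nth) {
    const int64_t ntg = sgd_num_threadgroups(args->np, nth);
    for (int64_t tg = 0; tg < ntg; ++tg) {
        for (int64_t t = 0; t < nth; ++t) {
            sgd_step_thread(args, x, g, pars, tg * nth + t);
        }
    }
}
```

Keeping np in the argument struct lets the bounds check live inside the per-thread function, so the dispatch can round the thread count up to a whole number of threadgroups without writing past the end of the tensor.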

I ran ./test-opt and ./test-backend-ops to verify this implementation.

Note: The implementation is mostly identical to #16529, with the only major difference being the kernel itself.

@cern1710 cern1710 requested a review from ggerganov as a code owner October 12, 2025 19:19
@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels Oct 12, 2025
@ggerganov ggerganov merged commit 3f750f8 into ggml-org:master Oct 13, 2025
67 checks passed
@cern1710 cern1710 deleted the metal-opt-step-sgd branch October 13, 2025 08:27
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Oct 13, 2025
* origin/master: (32 commits)
metal : FA support F32 K and V and head size = 32 (ggml-org#16531)
graph : support cacheless embeddings with FA and iSWA (ggml-org#16528)
opencl: fix build targeting CL 2 (ggml-org#16554)
CUDA: fix numerical issues in tile FA kernel (ggml-org#16540)
ggml : fix build broken with -march=armv9-a on MacOS (ggml-org#16520)
CANN: fix CPU memory leak in CANN backend (ggml-org#16549)
fix: add remark plugin to render raw HTML as literal text (ggml-org#16505)
metal: add support for opt_step_sgd (ggml-org#16539)
ggml : fix scalar path for computing norm (ggml-org#16558)
CANN: Update several operators to support FP16 data format (ggml-org#16251)
metal : add opt_step_adamw and op_sum (ggml-org#16529)
webui: remove client-side context pre-check and rely on backend for limits (ggml-org#16506)
[SYCL] fix UT fault cases: count-equal, argsort, pad OPs (ggml-org#16521)
ci : add Vulkan on Ubuntu with default packages build (ggml-org#16532)
common : handle unicode during partial json parsing (ggml-org#16526)
common : update presets (ggml-org#16504)
ggml : Fix FP16 ELU positive branch (ggml-org#16519)
hparams : add check for layer index in is_recurrent (ggml-org#16511)
ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (ggml-org#16518)
CUDA: faster tile FA, add oob checks, more HSs (ggml-org#16492)
...