@jan-service-account

Updates dev branch with latest release (b4966) from ggml-org/llama.cpp

CISC and others added 8 commits March 25, 2025 23:03
* Fix Mistral3/Gemma3 model hparams init

* set positional args correctly

* use existing hparams if passed
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
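
The three bullets above describe a small but easy-to-miss pattern in the conversion scripts. A minimal sketch of that pattern, with hypothetical class and helper names rather than the actual convert_hf_to_gguf.py code:

```python
# Hypothetical sketch: forward positional args unchanged and reuse hparams
# when the caller already supplies them, instead of reloading from disk.
import json
from pathlib import Path
from typing import Any


def load_hparams(dir_model: Path) -> dict[str, Any]:
    # Stand-in for reading the model's config.json.
    with open(dir_model / "config.json", encoding="utf-8") as f:
        return json.load(f)


class TextModel:
    def __init__(self, dir_model: Path, *args: Any,
                 hparams: dict[str, Any] | None = None, **kwargs: Any):
        # Only hit the disk when no hparams were passed in.
        self.hparams = hparams if hparams is not None else load_hparams(dir_model)
        self.dir_model = dir_model


class Mistral3Model(TextModel):
    def __init__(self, dir_model: Path, *args: Any, **kwargs: Any):
        hparams = kwargs.pop("hparams", None) or load_hparams(dir_model)
        # A multimodal config may nest the text settings; flatten them before
        # forwarding, and pass the positional args through in order.
        if "text_config" in hparams:
            hparams = {**hparams, **hparams["text_config"]}
        super().__init__(dir_model, *args, hparams=hparams, **kwargs)
```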
* ggml : fix MUL_MAT_ID repack with Q8_K

ggml-ci

* ggml : improve repack templates

ggml-ci
* convert : fix squeeze for ssm_conv tensors

* convert : match ssm_conv tensors by type

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
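
As context for the two convert fixes above, a hedged sketch of squeezing ssm_conv weights by their mapped tensor type rather than by raw name; the enum, the mapping helper, and the (d_inner, 1, d_conv) layout are assumptions for illustration, not the actual converter code:

```python
# Hypothetical sketch: decide whether to squeeze based on what the tensor
# maps to, not on fragile substring checks against the source name.
from enum import Enum

import numpy as np


class TensorType(Enum):
    SSM_CONV1D = "ssm_conv1d"
    OTHER = "other"


def map_tensor_type(name: str) -> TensorType:
    # Stand-in for the converter's source-name -> tensor-type mapping.
    return TensorType.SSM_CONV1D if "conv1d.weight" in name else TensorType.OTHER


def maybe_squeeze(name: str, data: np.ndarray) -> np.ndarray:
    # Assumed layout: conv kernels arrive as (d_inner, 1, d_conv); drop the
    # singleton middle axis so downstream code sees a 2D tensor.
    if (map_tensor_type(name) is TensorType.SSM_CONV1D
            and data.ndim == 3 and data.shape[1] == 1):
        data = np.squeeze(data, axis=1)
    return data
```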
… backend (ggml-org#12566)

* [Fix] Building clip-quantize-cli and running it in a CUDA environment causes ggml_fp16_to_fp32 to fail when it tries to access video memory; quantization has to run on the CPU backend instead.
After the fix, quantize automatically runs on the CPU backend and is no longer bound to CUDA.

* [Fix] Roll back the signature and implementation of clip_model_load, and change the call in clip_model_quantize to clip_init.
* metal : refactor mat-vec code

ggml-ci

* metal : rename all_sum -> sum_all

ggml-ci

* metal : fix comments [no ci]

* metal : fix nr constant [no ci]

* metal : mv q6_K support nr0 > 1

ggml-ci

* metal : reduce register pressure

ggml-ci

* metal : fix typo [no ci]

* metal : reduce register pressure

ggml-ci
jan-service-account merged commit fcf9298 into dev on Mar 27, 2025
16 checks passed
jan-service-account deleted the update-dev-from-master-2025-03-27-00-08 branch on March 27, 2025 at 00:18