forked from ggml-org/llama.cpp
Sync master with upstream release b6907 #310
Merged: jan-service-account merged 16 commits into dev from update-dev-from-master-2025-11-01-00-37 on Nov 1, 2025.
Conversation
* Update requirements-convert_legacy_llama.txt
  - Updated requirements to support Qwen3-VL with transformers 4.57.1
  - Update requirements/requirements-convert_legacy_llama.txt
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* … (ggml-org#16836)
  - Respect input size when getting/setting tensor data; allows partial repacking/copying when the requested size is smaller than the actual tensor
  - Removed duplicate repack_mxfp4_mxfp4x4x2 function
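As a rough picture of the size-aware behavior, ggml's public backend API already takes an explicit byte count; below is a minimal sketch of a partial copy (the helper name and the 2D-slice scenario are illustrative, not from the patch):

```cpp
#include "ggml.h"
#include "ggml-backend.h"
#include <cstdint>
#include <vector>

// Copy only the first n_rows rows of a 2D tensor to host memory and back.
// With size-aware get/set, the backend honors `part` instead of assuming a
// full-tensor (ggml_nbytes) copy.
static void copy_first_rows(struct ggml_tensor * t, int64_t n_rows) {
    const size_t part = (size_t) n_rows * t->nb[1]; // nb[1] = row stride in bytes
    std::vector<uint8_t> host(part);
    ggml_backend_tensor_get(t, host.data(), 0, part); // read only `part` bytes
    // ... transform the slice on the host ...
    ggml_backend_tensor_set(t, host.data(), 0, part); // write the slice back
}
```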
* vulkan: fix shmem overrun in mmq id shader
  - metal: fix mul_mm_id
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* … (ggml-org#16796)
  - Experimental crash fix
  - Added assert for aborting and fixed a comment
  - Changed to check whether a pipeline is empty
  - Moved function into the class definition
  - Replaced with is_empty
  - Modified is_empty to check only unaligned pipelines
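The pattern these notes describe — assert on an unbuilt pipeline rather than crashing when it is used — might look like the sketch below; every name here is a hypothetical stand-in, not the Vulkan backend's actual types:

```cpp
#include <cassert>
#include <vector>

struct pipeline { /* hypothetical stand-in for a backend pipeline handle */ };

struct pipeline_set {
    std::vector<pipeline> aligned;
    std::vector<pipeline> unaligned;
    // per the commit notes, emptiness is checked only for unaligned pipelines
    bool is_empty() const { return unaligned.empty(); }
};

static void dispatch(const pipeline_set & p) {
    assert(!p.is_empty() && "pipeline not initialized"); // abort early instead of crashing later
    // ... record and submit work ...
}
```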
* CUDA: add expert reduce kernel
  - Contiguity checks, better formatting, use std::vector instead of a raw array
  - Use vector empty() instead of size()
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
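For context on what an "expert reduce" computes in a mixture-of-experts model — a per-token weighted sum of the selected experts' outputs — here is a CPU reference of the semantics. This is a sketch of the operation being fused, not the CUDA kernel from the commit; shapes and names are illustrative:

```cpp
#include <vector>

// For each token, sum the outputs of its n_expert_used selected experts,
// scaled by the routing weights.
static void expert_reduce_ref(const std::vector<float> & expert_out, // [n_tokens][n_expert_used][n_embd]
                              const std::vector<float> & weights,    // [n_tokens][n_expert_used]
                              std::vector<float>       & out,        // [n_tokens][n_embd]
                              int n_tokens, int n_expert_used, int n_embd) {
    for (int t = 0; t < n_tokens; ++t) {
        for (int d = 0; d < n_embd; ++d) {
            float acc = 0.0f;
            for (int e = 0; e < n_expert_used; ++e) {
                acc += weights[t * n_expert_used + e]
                     * expert_out[(t * n_expert_used + e) * n_embd + d];
            }
            out[t * n_embd + d] = acc;
        }
    }
}
```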
* CUDA: Volta tensor core support for MMF
  - More generic checks for hardware support
  - Update ggml/src/ggml-cuda/mmf.cuh
  Co-authored-by: Aman Gupta <amangupta052@gmail.com>
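Volta (compute capability 7.0) was the first NVIDIA architecture to ship tensor cores, so a "more generic" hardware check plausibly gates on the compute capability rather than on specific architecture names; a hedged sketch (not the actual mmf.cuh code):

```cpp
// Tensor cores are available from Volta (compute capability 7.0, i.e. 700
// in the common "major*100 + minor*10" convention) onward.
static bool cc_supports_tensor_cores(int cc) {
    return cc >= 700; // Volta and newer
}
```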
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* Model: Minimax M2
  - Cleanup (pt. 1-3)
  - Update convert_hf_to_gguf_update.py: merge catch blocks
  - Remove vocab models and test
  - Remove all redundant hparam settings covered by TextModel
  - Move super to start, don't set block_count
  - Update src/llama-model.cpp
  - Update gguf-py/gguf/constants.py
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Squashed: llama-model.cpp refactoring
  - Fix formatting of attn / ffn / ffn_moe calls
  - Fix import regression / unify spacing in models.h
  - Totally DID NOT miss those!
  - Add missing qwen3vl(moe) models
  - Add missing new .cpp files to the build
  - Remove extra semicolons
  - Editor checker
  - Update src/models/models.h
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Updates the dev branch with the latest release (b6907) from ggml-org/llama.cpp.