Pull requests: ggml-org/llama.cpp
Pull requests list
CUDA: size routed MoE MMQ N-tiles from typical expert width on RDNA3
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#24546
opened Jun 12, 2026 by
ravel7524
Contributor
docs: add eagle3 to speculative doc
documentation
Improvements or additions to documentation
#24540
opened Jun 12, 2026 by
LiaXLiang
spec: add spec metrics mean acceptance length and acceptance rate per position
examples
server
#24536
opened Jun 12, 2026 by
ruixiang63
Contributor
ggml cpu: Improve dequantize vectorisation for q4, q5
ggml
changes relating to the ggml tensor library for machine learning
#24535
opened Jun 12, 2026 by
mkj
CUDA: only support F32/F16 for GGML_OP_REPEAT
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
testing
Everything test related
#24533
opened Jun 12, 2026 by
leonardHONG
Contributor
ggml-webgpu: improve i-quants mul_mat performance and speed up prefill
ggml
changes relating to the ggml tensor library for machine learning
WebGPU
#24530
opened Jun 12, 2026 by
yomaytk
Contributor
fix sycl links in release notes
devops
improvements to build systems and github actions
#24527
opened Jun 12, 2026 by
muhammad-salem
cuda : fix flash attention crash for d_head=512 with gqa_ratio=2
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#24526
opened Jun 12, 2026 by
nhs000
Contributor
Preliminary MiniMax-M3 support
model
Model specific
python
python script changes
testing
Everything test related
#24523
opened Jun 12, 2026 by
danielhanchen
Contributor
Draft
fit : wrap llama_device_memory_data
examples
server
#24522
opened Jun 12, 2026 by
ggerganov
Member
bench : add --offline
examples
python
python script changes
server
#24511
opened Jun 12, 2026 by
angt
Member
opencl: optimize mul_mat_f16_f32 for decode
ggml
changes relating to the ggml tensor library for machine learning
OpenCL
Issues specific to the OpenCL backend
openvino: OV 2026.2, context-shift, Q5_1 support, gemma4 dense/embedding, and -fa off
devops
improvements to build systems and github actions
documentation
Improvements or additions to documentation
ggml
changes relating to the ggml tensor library for machine learning
OpenVINO
#24503
opened Jun 12, 2026 by
wine99
Contributor
1 task done
UI : fix SSE transport detection and routing through CORS proxy. Assi…
examples
server/ui
#24500
opened Jun 12, 2026 by
hrpnr
server : reset fit_params_target to baseline on model (re)load
examples
server
#24498
opened Jun 12, 2026 by
liminfei-amd
1 task done
ggml: Update VMM Pool allocation ggml-cuda.cu - Turing P2P access fix (fixes #24489)
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#24491
opened Jun 11, 2026 by
VexxieCode
common: update logging to enforce max_capacity and optimize queue resizing
#24490
opened Jun 11, 2026 by
max-krasnyansky
Member
CUDA: Fuse MMVQ post-scale for NVFP4
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
testing
Everything test related
#24481
opened Jun 11, 2026 by
ORippler
Collaborator
[SYCL] add dev2dev memcpy by SYCL API
documentation
Improvements or additions to documentation
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#24476
opened Jun 11, 2026 by
arthw
Contributor
fix(hexagon): use padded stride for ssm-conv weights
ggml
changes relating to the ggml tensor library for machine learning
Hexagon
#24470
opened Jun 11, 2026 by
BiReRa
docs(server): add llama-server WebUI settings complete guide
documentation
Improvements or additions to documentation
#24467
opened Jun 11, 2026 by
xusk999
docs : fix typos in CUDA-FEDORA.md and grammars/README.md
documentation
Improvements or additions to documentation
#24459
opened Jun 11, 2026 by
m-atharkhan
ggml: improve RVV q4_0 GEMM prefill locality
ggml
changes relating to the ggml tensor library for machine learning
#24456
opened Jun 11, 2026 by
ZephyrLi-pro