Pull requests: ggml-org/llama.cpp

Pull requests list

CUDA: size routed MoE MMQ N-tiles from typical expert width on RDNA3 [labels: ggml, Nvidia GPU]
#24546 opened Jun 12, 2026 by ravel7524 (Contributor)
docs: add eagle3 to speculative doc [labels: documentation]
#24540 opened Jun 12, 2026 by LiaXLiang
ggml cpu: Improve dequantize vectorisation for q4, q5 [labels: ggml]
#24535 opened Jun 12, 2026 by mkj
CUDA: only support F32/F16 for GGML_OP_REPEAT [labels: ggml, Nvidia GPU, testing]
#24533 opened Jun 12, 2026 by leonardHONG (Contributor)
ggml-webgpu: improve i-quants mul_mat performance and speed up prefill [labels: ggml, WebGPU]
#24530 opened Jun 12, 2026 by yomaytk (Contributor)
fix sycl links in release notes [labels: devops]
#24527 opened Jun 12, 2026 by muhammad-salem
cuda : fix flash attention crash for d_head=512 with gqa_ratio=2 [labels: ggml, Nvidia GPU]
#24526 opened Jun 12, 2026 by nhs000 (Contributor)
Preliminary MiniMax-M3 support [labels: model, python, testing]
#24523 opened Jun 12, 2026 by danielhanchen (Contributor), Draft
fit : wrap llama_device_memory_data [labels: examples, server]
#24522 opened Jun 12, 2026 by ggerganov (Member)
bench : add --offline [labels: examples, python, server]
#24511 opened Jun 12, 2026 by angt (Member)
opencl: optimize mul_mat_f16_f32 for decode [labels: ggml, OpenCL]
#24504 opened Jun 12, 2026 by lhez (Contributor), Draft
openvino: OV 2026.2, context-shift, Q5_1 support, gemma4 dense/embedding, and -fa off [labels: devops, documentation, ggml, OpenVINO]
#24503 opened Jun 12, 2026 by wine99 (Contributor)
1 task done
common: add sampling env vars [labels: examples, server]
#24494 opened Jun 11, 2026 by Abrosimov-a-a
ggml: Update VMM Pool allocation ggml-cuda.cu - Turing P2P access fix (fixes #24489) [labels: ggml, Nvidia GPU]
#24491 opened Jun 11, 2026 by VexxieCode
CUDA: Fuse MMVQ post-scale for NVFP4 [labels: ggml, Nvidia GPU, testing]
#24481 opened Jun 11, 2026 by ORippler (Collaborator)
[SYCL] add dev2dev memcpy by SYCL API [labels: documentation, ggml, SYCL]
#24476 opened Jun 11, 2026 by arthw (Contributor)
fix(hexagon): use padded stride for ssm-conv weights [labels: ggml, Hexagon]
#24470 opened Jun 11, 2026 by BiReRa
docs(server): add llama-server WebUI settings complete guide [labels: documentation]
#24467 opened Jun 11, 2026 by xusk999
docs : fix typos in CUDA-FEDORA.md and grammars/README.md [labels: documentation]
#24459 opened Jun 11, 2026 by m-atharkhan
ggml: improve RVV q4_0 GEMM prefill locality [labels: ggml]
#24456 opened Jun 11, 2026 by ZephyrLi-pro