Name and Version
vulkan:
version: 6719 (aa4711d)
built with cc (GCC) 15.2.1 20250813 for x86_64-pc-linux-gnu
sycl:
version: 6719 (aa4711d)
built with Intel(R) oneAPI DPC++/C++ Compiler 2025.0.4 (2025.0.4.20241205) for x86_64-unknown-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
libllama (core library)
Command line
vulkan:
llama-server --threads 12 --prio 2 --ctx-size 12288 --gpu-layers 100 --model xxxx --host 0.0.0.0 --port 9091 --no-webui --props --no-slots
sycl:
ONEAPI_DEVICE_SELECTOR="level_zero:0" ZES_ENABLE_SYSMAN=1 llama-server --threads 12 --prio 2 --ctx-size 12288 --gpu-layers 100 --model xxxx --host 0.0.0.0 --port 9091 --no-webui --props --no-slots
Problem description & steps to reproduce
1:
Compared to Vulkan, SYCL fits less context given the exact same model and parameters:
SYCL can handle ~9200 tokens before crashing with an OOM error, while Vulkan can handle the full 12288 tokens.
CUDA on an NVIDIA card with the same amount of VRAM can also handle the full 12288 tokens.
That's a difference of ~3000 tokens, even though all three runs used 12GB cards with no desktop environment or other GPU-using programs running (a rough way to probe the limit is sketched below).
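For reference, a rough way to probe where the SYCL build falls over, assuming the SYCL server from the command line above is already running on port 9091; the repeated word is only a crude stand-in for a prompt of roughly that many tokens, and /completion is the standard llama-server endpoint:
# build a long throwaway prompt (~9000 repetitions of "word ")
PROMPT=$(printf 'word %.0s' $(seq 1 9000))
# send it with a single predicted token and watch the server log for the OOM crash
curl -s -H "Content-Type: application/json" http://127.0.0.1:9091/completion -d "{\"prompt\": \"$PROMPT\", \"n_predict\": 1}"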
2:
"ext_intel_free_memory is not supported" is printed about four times, suggesting to set ZES_ENABLE_SYSMAN, even though ZES_ENABLE_SYSMAN=1 is already set (a quick check is sketched below).
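A quick double-check, assuming log-sycl.txt (attached below) is the captured server output from the SYCL run above:
# confirm the variable really reaches the environment the server is started with
ZES_ENABLE_SYSMAN=1 env | grep ZES_ENABLE_SYSMAN
# count how often the warning appears in the attached SYCL log
grep -c "ext_intel_free_memory is not supported" log-sycl.txt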
log-vulkan.txt
log-sycl.txt
build-commands.txt
os.txt
hw.txt
First Bad Commit
No response
Relevant log output
logs attached as files above