
Misc. bug: Various memory related issues with SYCL #16516

@AaronBeier

Description


Name and Version

vulkan:
version: 6719 (aa4711d)
built with cc (GCC) 15.2.1 20250813 for x86_64-pc-linux-gnu

sycl:
version: 6719 (aa4711d)
built with Intel(R) oneAPI DPC++/C++ Compiler 2025.0.4 (2025.0.4.20241205) for x86_64-unknown-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

libllama (core library)

Command line

vulkan:
llama-server --threads 12 --prio 2 --ctx-size 12288 --gpu-layers 100 --model xxxx --host 0.0.0.0 --port 9091 --no-webui --props --no-slots

sycl:
ONEAPI_DEVICE_SELECTOR="level_zero:0" ZES_ENABLE_SYSMAN=1 llama-server --threads 12 --prio 2 --ctx-size 12288 --gpu-layers 100 --model xxxx --host 0.0.0.0 --port 9091 --no-webui --props --no-slots

Problem description & steps to reproduce

1:
Compared to Vulkan, SYCL fits less context given the exact same model and parameters:
SYCL can handle only ~9200 tokens before crashing with an OOM error, while Vulkan handles the full 12288 tokens.
CUDA on an NVIDIA card with the same amount of VRAM also handles the full 12288 tokens.
That's a difference of roughly 3000 tokens, even though all three tests ran on 12 GB cards with no desktop environment or other GPU-using programs active (see the sizing sketch below).
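For scale, here is a minimal back-of-envelope sketch of what that gap costs in KV-cache memory. The model dimensions below are assumptions for illustration (the actual model path is redacted above), not measured values:

```cpp
// Back-of-envelope KV-cache sizing. All model dimensions are assumptions
// for illustration, not the (redacted) model from this report.
#include <cstdio>

int main() {
    const long long n_layers   = 32;   // assumed transformer depth
    const long long n_kv_heads = 8;    // assumed KV heads (GQA)
    const long long head_dim   = 128;  // assumed head dimension
    const long long bytes_el   = 2;    // f16 K/V cache

    // K and V tensors per cached token, summed across all layers
    const long long per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_el;
    const long long gap       = 12288 - 9200;  // token shortfall from this report

    std::printf("KV cache per token: %lld KiB\n", per_token / 1024);
    std::printf("cost of ~%lld-token gap: %.0f MiB\n",
                gap, (double)(gap * per_token) / (1024.0 * 1024.0));
    // -> ~128 KiB per token, ~386 MiB for the gap under these assumptions,
    //    i.e. the SYCL backend appears to lose a few hundred MiB somewhere.
    return 0;
}
```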

2:
The message "ext_intel_free_memory is not supported" gets printed about four times, each time suggesting to set ZES_ENABLE_SYSMAN, even though ZES_ENABLE_SYSMAN=1 is already set.
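For reference, that warning corresponds to a device-aspect check from Intel's sycl_ext_intel_device_info extension. Below is a minimal standalone probe (a hypothetical diagnostic, not part of llama.cpp) that shows whether the driver exposes free-memory queries to SYCL at all:

```cpp
// Hypothetical diagnostic: reports whether each GPU exposes the
// ext_intel_free_memory aspect (sycl_ext_intel_device_info extension).
// Build with: icpx -fsycl check_free_mem.cpp -o check_free_mem
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    for (const auto & dev : sycl::device::get_devices(sycl::info::device_type::gpu)) {
        std::cout << dev.get_info<sycl::info::device::name>() << ": ";
        if (dev.has(sycl::aspect::ext_intel_free_memory)) {
            // Free memory in bytes, as reported by the extension.
            std::cout << dev.get_info<sycl::ext::intel::info::device::free_memory>()
                      << " bytes free\n";
        } else {
            // This is the condition behind the llama.cpp warning,
            // regardless of whether ZES_ENABLE_SYSMAN is set.
            std::cout << "ext_intel_free_memory not supported\n";
        }
    }
    return 0;
}
```

If this prints "not supported" even with ZES_ENABLE_SYSMAN=1 exported, the variable is presumably not reaching the Level Zero driver before it initializes, which would explain the repeated warning.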

log-vulkan.txt
log-sycl.txt
build-commands.txt
os.txt
hw.txt

First Bad Commit

No response

Relevant log output

Logs are attached as files above.
