SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations by PMZFX · Pull Request #21597 · ggml-org/llama.cpp

PMZFX · 2026-04-08T01:06:20Z

Summary

Replace sycl::malloc_device with zeMemAllocDevice for GPU memory allocation in the SYCL backend
Replace sycl::free with zeMemFree for corresponding deallocations
Replace host-staged dev2dev_memcpy with direct Level Zero cross-device copy
Link against ze_loader for Level Zero API access
All changes include automatic fallback to original SYCL path if Level Zero is unavailable

Problem

On Intel multi-GPU systems, sycl::malloc_device triggers the xe kernel driver's DMA-buf/TTM export path (xe_gem_prime_export -> ttm_pool_alloc_page), which creates a 1:1 mirror of every VRAM allocation in system RAM. This causes system RAM to scale linearly with total VRAM allocated across GPUs, leading to OOM crashes during multi-GPU inference even when models fit entirely in VRAM.

Measured on dual Intel Arc Pro B70 (32GB each, 64GB total VRAM) with 64GB system RAM:

sycl::malloc_device 4 GiB = +4,112 MiB system RAM (1:1 mirror)
zeMemAllocDevice 4 GiB = +8 MiB system RAM (no mirror)
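As a quick check on the "1:1" wording, here are the measured increases divided by the 4 GiB (4,096 MiB) allocation size:

$$\frac{4112\ \text{MiB}}{4096\ \text{MiB}} \approx 1.004 \qquad \text{vs.} \qquad \frac{8\ \text{MiB}}{4096\ \text{MiB}} \approx 0.002$$

So the sycl::malloc_device path adds roughly one byte of system RAM per byte of VRAM (plus about 16 MiB), while the zeMemAllocDevice path adds a negligible amount.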

A 15.6 GiB Q4_K_M model consumed 60 GiB of system RAM during dual-GPU inference with sycl::malloc_device, causing repeated OOM crashes.

Solution

zeMemAllocDevice allocates GPU memory through Level Zero's SVM/P2P path instead of the DMA-buf/TTM path, avoiding host memory staging entirely. SYCL kernels can read zeMemAllocDevice pointers with full interop and no compatibility issues.

Changes:

New ggml_sycl_malloc_device() / ggml_sycl_free_device() helpers that try Level Zero first, fall back to SYCL
Replaced 3 allocation sites: single-device buffer, split buffer, memory pool
Replaced 3 deallocation sites: buffer destructor, pool destructor, pool overflow
Updated dpct_malloc helper with same Level Zero path
Updated release_extra_gpu with zeMemFree
Updated dev2dev_memcpy to use zeCommandListAppendMemoryCopy for direct cross-device transfers

Test results

Dual Intel Arc Pro B70 (32GB each), AMD Ryzen 5 9600X, 64GB DDR5, Ubuntu 26.04, kernel 7.0, compute-runtime 26.09. Model: Qwen3.5-27B.

Q4_K_M, 48K context, dual GPU (-sm layer):

Metric	Before	After
Peak system RAM	60,034 MiB (100%), OOM crash	~6.7 GiB (10%), flat
pp48000	OOM crash	782 t/s
pp512	348 t/s	359 t/s
tg128	17.92 t/s	17.82 t/s

Q8_0, 32K context, dual GPU: 915 t/s, system RAM flat.

Single GPU: No regression. 467 t/s pp512, 17.12 t/s tg128.

Correctness: Output is byte-for-byte identical between single and dual GPU with same seed (verified Q4_K_M, Q6_K).

Test plan

Single GPU inference (no regression)
Dual GPU pp512/tg128 (Q4_K_M, Q6_K, Q8_0)
Dual GPU large context (48K Q4_K_M, 48K Q6_K, 32K Q8_0)
System RAM stays flat during all dual-GPU tests
Correctness: single vs dual GPU output matches with fixed seed
Clean exit (no crash during cleanup/teardown)
Fallback path: builds and works without Level Zero

…ions Replace sycl::malloc_device with zeMemAllocDevice for GPU memory allocation in the SYCL backend. sycl::malloc_device triggers the xe kernel driver's DMA-buf/TTM path which mirrors every VRAM allocation 1:1 in system RAM. zeMemAllocDevice uses the SVM/P2P path with no host staging. On a dual Intel Arc Pro B70 system (64GB VRAM, 64GB RAM), a 15.6 GiB model consumed 60 GiB of system RAM via sycl::malloc_device, causing OOM crashes. With zeMemAllocDevice, the same workload uses ~6.7 GiB of system RAM with no performance regression. All Level Zero calls include automatic fallback to the original SYCL allocation path if Level Zero interop is unavailable.

arthw

Don't use try...catch in the malloc/free memory functions.
It adds overhead.
Just check the return value and call the fallback function.
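A minimal sketch of the pattern suggested here, assuming an allocator that signals failure by returning nullptr rather than throwing. alloc_with_fallback, always_fails, and host_malloc are hypothetical names used only for illustration; std::malloc stands in for both the Level Zero and the SYCL allocation calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator signature: reports failure by returning nullptr
// instead of throwing, so no try/catch is needed on the allocation path.
using alloc_fn = void * (*)(std::size_t size);

// Try the preferred allocator first; if it returns nullptr, call the fallback.
// used_fallback (optional) records which allocator produced the pointer, which
// the caller needs in order to free it with the matching call.
void * alloc_with_fallback(std::size_t size, alloc_fn primary, alloc_fn fallback,
                           bool * used_fallback) {
    assert(primary != nullptr && fallback != nullptr);
    void * ptr = primary(size);
    const bool fell_back = (ptr == nullptr);
    if (fell_back) {
        ptr = fallback(size);
    }
    if (used_fallback != nullptr) {
        *used_fallback = fell_back;
    }
    return ptr;
}

// Stand-ins for the demo: one allocator that always fails, one that uses the heap.
void * always_fails(std::size_t) { return nullptr; }
void * host_malloc(std::size_t size) { return std::malloc(size); }
```

One consequence of this shape is that the caller has to remember which allocator produced each pointer so it can be released with the matching free call; the per-session design proposed later in this thread avoids that bookkeeping.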

arthw

I will test it on Windows and report back the results.

Thank you!

arthw · 2026-04-08T02:48:07Z

ggml/src/ggml-sycl/common.cpp

-            SYCL_CHECK(
-                CHECK_TRY_ERROR(sycl::free(extra->data_device[i], *(streams[i]))));
+            bool freed = false;
+            try {


Use a new function to replace the duplicated memory-freeing code.
Handle the result with SYCL_CHECK(CHECK_TRY_ERROR()), which prints out stack info.

arthw · 2026-04-08T02:50:58Z

ggml/src/ggml-sycl/ggml-sycl.cpp

+
+static void ggml_sycl_free_device(void *ptr, sycl::queue &q) {
+    if (!ptr) return;
+    try {


Remove try...catch.

arthw · 2026-04-08T02:53:15Z

ggml/src/ggml-sycl/ggml-sycl.cpp

+
 static void dev2dev_memcpy(sycl::queue &q_dst, sycl::queue &q_src, void *ptr_dst,
                    const void *ptr_src, size_t size) {
+    try {


The legacy code supports memcpy between an iGPU and a dGPU.
The system API only supports copies between dGPUs.
So check the device type before calling the ze API,
and use the new code only for dGPU-to-dGPU transfers.

Remove the try...catch, which is expensive.
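The dispatch rule described above can be sketched as follows. This is a hedged, self-contained model rather than backend code: device_desc, copy_path, and dev2dev_copy_sketch are hypothetical names, and std::memcpy stands in for both the direct Level Zero copy and the two hops of the host-staged path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical device descriptor. In the backend this would be derived from the
// SYCL device (ideally cached at init time rather than queried per copy).
struct device_desc {
    bool is_dgpu;
};

enum class copy_path { direct, host_staged };

// Only dGPU-to-dGPU copies take the direct path; any copy involving an iGPU keeps
// the legacy host-staged path. std::memcpy stands in for the real device copies.
copy_path dev2dev_copy_sketch(const device_desc & dst_dev, void * dst,
                              const device_desc & src_dev, const void * src,
                              std::size_t size) {
    assert(size == 0 || (dst != nullptr && src != nullptr));
    const bool direct = dst_dev.is_dgpu && src_dev.is_dgpu;
    if (size == 0) {
        return direct ? copy_path::direct : copy_path::host_staged;
    }
    if (direct) {
        std::memcpy(dst, src, size);          // single device-to-device copy
        return copy_path::direct;
    }
    std::vector<unsigned char> host(size);    // staging buffer in host memory
    std::memcpy(host.data(), src, size);      // device -> host
    std::memcpy(dst, host.data(), size);      // host -> device
    return copy_path::host_staged;
}
```

Keeping the type check as a plain branch (no try/catch) means the common dGPU-to-dGPU case pays only for one comparison.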

arthw · 2026-04-08T02:53:41Z

ggml/src/ggml-sycl/ggml-sycl.cpp

-    void * dev_ptr;
-    SYCL_CHECK(CHECK_TRY_ERROR(dev_ptr = (void *)sycl::malloc_device(
-                                    size, *stream)));
+    void * dev_ptr = ggml_sycl_malloc_device(size, *stream);


Add SYCL_CHECK(CHECK_TRY_ERROR()).

Still need to add SYCL_CHECK(CHECK_TRY_ERROR()) to print out the call stack on a crash.
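For context on what the requested wrapper buys, below is a simplified, hypothetical analog of wrapping a call site in SYCL_CHECK(CHECK_TRY_ERROR(...)). SKETCH_CHECK_TRY is an illustrative name, not a ggml macro, and the real ggml-sycl macros differ in detail. It evaluates the expression inside a lambda, turns an exception into a non-zero status, and prints the failing expression with its file and line, which is the kind of location information the reviewer wants preserved at each malloc/free call site.

```cpp
#include <cassert>
#include <cstdio>
#include <exception>
#include <stdexcept>

// Simplified, hypothetical analog of SYCL_CHECK(CHECK_TRY_ERROR(expr)) at a call
// site: evaluate expr inside a lambda, convert an exception into a non-zero
// status, and print where it happened.
#define SKETCH_CHECK_TRY(expr)                                          \
    [&]() -> int {                                                      \
        try {                                                           \
            expr;                                                       \
            return 0;                                                   \
        } catch (const std::exception & e) {                            \
            std::fprintf(stderr, "%s:%d: '%s' threw: %s\n",             \
                         __FILE__, __LINE__, #expr, e.what());          \
            return 1;                                                   \
        }                                                               \
    }()
```

Because __FILE__ and __LINE__ expand at the call site, the report points at the allocation or free that failed rather than at a shared helper.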

arthw · 2026-04-08T02:54:02Z

ggml/src/ggml-sycl/ggml-sycl.cpp

-        */
-        SYCL_CHECK(CHECK_TRY_ERROR(buf = (char *)sycl::malloc_device(
-                                        size, *stream)));
+        char * buf = (char *)ggml_sycl_malloc_device(size, *stream);


Add SYCL_CHECK(CHECK_TRY_ERROR()).

Still need SYCL_CHECK(CHECK_TRY_ERROR()).

arthw · 2026-04-08T06:35:44Z

@PMZFX
This PR does not support the Windows build.
Please use the following patch to add Windows build support.
With it, the Windows build works well, but I can't test the performance.
Maybe someone else can test the performance on Windows.

diff --git a/ggml/src/ggml-sycl/CMakeLists.txt b/ggml/src/ggml-sycl/CMakeLists.txt
index f87835b3c..90a416505 100644
--- a/ggml/src/ggml-sycl/CMakeLists.txt
+++ b/ggml/src/ggml-sycl/CMakeLists.txt
@@ -39,6 +39,19 @@ if (WIN32)
         set(CMAKE_CXX_COMPILER "icx")
         set(CMAKE_CXX_COMPILER_ID "IntelLLVM")
     endif()
+    if(DEFINED ENV{LEVEL_ZERO_V1_SDK_PATH})
+        message(STATUS "LEVEL_ZERO_V1_SDK_PATH is set to: $ENV{LEVEL_ZERO_V1_SDK_PATH}")
+        set(LEVEL_ZERO_V1_SDK_PATH $ENV{LEVEL_ZERO_V1_SDK_PATH})
+        if(EXISTS "${LEVEL_ZERO_V1_SDK_PATH}")
+            target_include_directories(ggml-sycl PRIVATE "${LEVEL_ZERO_V1_SDK_PATH}/include")
+            set(LEVEL_ZERO_V1_SDK_LIB_PATH $ENV{LEVEL_ZERO_V1_SDK_PATH}/lib)
+        else()
+            message(FATAL_ERROR "Miss to detect folder ${LEVEL_ZERO_V1_SDK_PATH}, please install the Intel GPU Driver.")
+        endif()
+     else()
+        message(WARNING "LEVEL_ZERO_V1_SDK_PATH is NOT set")
+        message(FATAL_ERROR "Miss to detect ENV LEVEL_ZERO_V1_SDK_PATH, please install the Intel GPU Driver.")
+     endif()
 endif()
 
 macro(detect_and_find_package package_name)
@@ -96,7 +109,7 @@ target_compile_options(ggml-sycl PRIVATE "-Wno-narrowing")
 # Link against Level Zero loader for direct device memory allocation.
 # Avoids sycl::malloc_device triggering DMA-buf/TTM system RAM staging
 # in the xe kernel driver during multi-GPU inference.
-find_library(ZE_LOADER_LIB ze_loader HINTS ${ONEAPI_ROOT}/lib ENV LD_LIBRARY_PATH)
+find_library(ZE_LOADER_LIB ze_loader HINTS ${ONEAPI_ROOT}/lib ${LEVEL_ZERO_V1_SDK_LIB_PATH} ENV LD_LIBRARY_PATH)
 if(ZE_LOADER_LIB)
     target_link_libraries(ggml-sycl PRIVATE ${ZE_LOADER_LIB})
     message(STATUS "Level Zero loader found: ${ZE_LOADER_LIB}")

@arthw

… deduplicate
- Remove try/catch from malloc/free/memcpy helpers; check backend and device type upfront instead (ggml_sycl_is_level_zero, ggml_sycl_is_dgpu)
- Move shared helpers (is_level_zero, is_dgpu, free_device) to common.cpp and declare them in common.hpp to eliminate code duplication
- Use SYCL_CHECK(CHECK_TRY_ERROR()) for fallback sycl::free calls
- Guard the dev2dev_memcpy L0 path to dGPU-to-dGPU only, preserving the host-staged path for iGPU-to-dGPU transfers
- Add Windows Level Zero SDK path detection (LEVEL_ZERO_V1_SDK_PATH) in CMakeLists.txt (co-authored with @arthw)

PMZFX · 2026-04-08T08:45:39Z

@arthw Thanks for the thorough review. I've pushed a follow-up commit addressing your feedback:

Removed all try/catch, replaced with upfront backend/device type checks (ggml_sycl_is_level_zero, ggml_sycl_is_dgpu)
Moved shared helpers to common.cpp/common.hpp to eliminate duplication
Added SYCL_CHECK(CHECK_TRY_ERROR()) for fallback free calls
Guarded dev2dev_memcpy L0 path to dGPU-to-dGPU only
Incorporated your Windows Level Zero SDK path patch in CMakeLists.txt

Let me know if anything else needs attention.

arthw

Because this PR is the first to call the Level Zero API directly, there are more issues to consider:

Building against the Level Zero API requires installing the GPU driver (Level Zero runtime) on the build server. In some CI setups, the build server is a pure CPU (Xeon) machine, which would break the Level Zero build.
Some SYCL memory features, such as SYCL graph and SVM, are on the way. These features still need the SYCL memory API.
The SYCL memory API is built on the Level Zero memory API. Bypassing SYCL to call Level Zero directly loses some of the benefits of the SYCL code.

Suggestion:

Define a build parameter, GGML_SYCL_SUPPORT_LEVEL_ZERO, in ggml/CMakeLists.txt
(refer to GGML_SYCL_GRAPH),
with a default value of "ON".
In code, use this macro (GGML_SYCL_SUPPORT_LEVEL_ZERO) to guard all Level Zero code and includes, so that when it is OFF, the code can be built without installing the Level Zero library and headers.
Define an environment variable, GGML_SYCL_ENABLE_LEVEL_ZERO, in ggml-sycl.cpp, like GGML_SYCL_DISABLE_GRAPH. It controls the behavior at run time.
The SYCL backend memory APIs then contain two sub-paths: SYCL and Level Zero.
If GGML_SYCL_SUPPORT_LEVEL_ZERO = ON, the code includes two branches, SYCL and Level Zero, and GGML_SYCL_ENABLE_LEVEL_ZERO selects the branch at run time.
If GGML_SYCL_SUPPORT_LEVEL_ZERO = OFF, the code includes only one branch: SYCL.
As a result, SYCL and Level Zero memory APIs are never mixed within a session: only one style of API is used. If malloc fails, the code does not switch to the other API.
SYCL.md should be updated to document the new parameters and the dependency on the Intel GPU driver installation when building with the Level Zero API.
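A minimal sketch of the proposed two-level switch, assuming a hand-defined SKETCH_SUPPORT_LEVEL_ZERO macro in place of the CMake-generated GGML_SYCL_SUPPORT_LEVEL_ZERO, and assuming the runtime variable is read as "enabled unless set to 0". mem_api, select_mem_api, session_mem_api, and the allocator stubs are hypothetical names; the stubs just call std::malloc. The point is that the API family is chosen once per session and a failed allocation is not retried through the other API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Stand-in for the CMake option GGML_SYCL_SUPPORT_LEVEL_ZERO; a real build
// would get this definition from the build system.
#ifndef SKETCH_SUPPORT_LEVEL_ZERO
#define SKETCH_SUPPORT_LEVEL_ZERO 1
#endif

enum class mem_api { sycl, level_zero };

// Decide which API family to use. Treating an unset variable as "enabled" and
// only "0" as "disabled" is an assumption of this sketch.
mem_api select_mem_api(const char * env_value) {
#if SKETCH_SUPPORT_LEVEL_ZERO
    if (env_value == nullptr || std::strcmp(env_value, "0") != 0) {
        return mem_api::level_zero;
    }
#else
    (void) env_value;  // Level Zero compiled out: only the SYCL branch exists
#endif
    return mem_api::sycl;
}

// Read the environment once; every later call sees the same session-wide choice.
mem_api session_mem_api() {
    static const mem_api api = select_mem_api(std::getenv("GGML_SYCL_ENABLE_LEVEL_ZERO"));
    return api;
}

// Placeholders for the two real allocation paths.
void * level_zero_alloc_stub(std::size_t size) { return std::malloc(size); }
void * sycl_alloc_stub(std::size_t size)       { return std::malloc(size); }

// Allocate through the chosen API only. A nullptr result is returned to the
// caller as-is instead of being retried through the other API, so pointers from
// the two APIs are never mixed within a session.
void * session_alloc(mem_api api, std::size_t size) {
    assert(size > 0);
    return api == mem_api::level_zero ? level_zero_alloc_stub(size) : sycl_alloc_stub(size);
}
```

With SKETCH_SUPPORT_LEVEL_ZERO defined as 0, select_mem_api always returns mem_api::sycl, mirroring an OFF build that contains only the SYCL branch.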

What do you think?

Thank you!

arthw · 2026-04-08T10:08:09Z

ggml/src/ggml-sycl/dpct/helper.hpp


        static inline void *dpct_malloc(size_t size, sycl::queue &q)
        {
+            try {


Remove try...catch.

This code duplicates code in ggml-sycl.cpp. I suggest defining a new function for ze memory.

arthw · 2026-04-08T10:09:18Z

ggml/src/ggml-sycl/common.cpp

  return sycl_down_blk_size;
 }

+bool ggml_sycl_is_level_zero(sycl::queue &q) {


The SYCL backend is designed to run on Level Zero only.
There is no need to check for the Level Zero runtime here.

arthw · 2026-04-08T10:19:05Z

ggml/src/ggml-sycl/common.cpp

+    return q.get_backend() == sycl::backend::ext_oneapi_level_zero;
+}
+
+bool ggml_sycl_is_dgpu(sycl::queue &q) {


I suggest saving the hardware info during initialization.
For reference:
ggml-sycl.cpp:

info.devices[i].smpbo = prop.get_local_mem_size();

common.hpp:

struct sycl_device_info { size_t smpbo; ... }
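One way to follow this suggestion, sketched with hypothetical types: read each device's properties once at startup and store a derived is_dgpu flag next to smpbo, so hot paths such as dev2dev_memcpy can consult a cached table instead of querying the device. raw_device_props, device_info_sketch, init_device_info, and cached_is_dgpu are illustrative names, and inferring "integrated" from a unified-memory flag is an assumption, not necessarily how ggml-sycl would detect it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-device properties as they might be read once at startup.
// In the real backend these would come from sycl::device queries.
struct raw_device_props {
    bool        host_unified_memory;  // treated as "integrated GPU" in this sketch
    std::size_t local_mem_size;
};

// Cached info in the spirit of sycl_device_info { size_t smpbo; ... }.
// The is_dgpu field is an illustrative addition, not an existing ggml member.
struct device_info_sketch {
    std::size_t smpbo;
    bool        is_dgpu;
};

// Build the cache once, during backend initialization.
std::vector<device_info_sketch> init_device_info(const std::vector<raw_device_props> & props) {
    std::vector<device_info_sketch> info;
    info.reserve(props.size());
    for (const auto & p : props) {
        info.push_back({p.local_mem_size, !p.host_unified_memory});
    }
    return info;
}

// Hot-path lookup: no device query, just an index into the cached table.
bool cached_is_dgpu(const std::vector<device_info_sketch> & info, std::size_t device_index) {
    assert(device_index < info.size());
    return info[device_index].is_dgpu;
}
```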

arthw · 2026-04-08T10:23:21Z

ggml/src/ggml-sycl/ggml-sycl.cpp

+// via xe_gem_prime_export, consuming system RAM equal to VRAM allocated.
+// zeMemAllocDevice uses the SVM/P2P path with no host staging.
+static void * ggml_sycl_malloc_device(size_t size, sycl::queue &q) {
+    if (ggml_sycl_is_level_zero(q) && ggml_sycl_is_dgpu(q)) {


Define the ze-based memory malloc/free in new functions.

arthw · 2026-04-08T10:25:02Z

ggml/src/ggml-sycl/ggml-sycl.cpp

                    const void *ptr_src, size_t size) {
+    // Use Level Zero direct copy for dGPU-to-dGPU transfers.
+    // The legacy host-staged path supports iGPU-to-dGPU copies.
+    if (ggml_sycl_is_level_zero(q_dst) && ggml_sycl_is_dgpu(q_dst) && ggml_sycl_is_dgpu(q_src)) {


No need to check for Level Zero here.

arthw · 2026-04-08T10:25:30Z

ggml/src/ggml-sycl/ggml-sycl.cpp

-    void * dev_ptr;
-    SYCL_CHECK(CHECK_TRY_ERROR(dev_ptr = (void *)sycl::malloc_device(
-                                    size, *stream)));
+    void * dev_ptr = ggml_sycl_malloc_device(size, *stream);


Still need to add SYCL_CHECK(CHECK_TRY_ERROR()) to print out the call stack on a crash.

arthw · 2026-04-08T10:25:44Z

ggml/src/ggml-sycl/ggml-sycl.cpp

-        */
-        SYCL_CHECK(CHECK_TRY_ERROR(buf = (char *)sycl::malloc_device(
-                                        size, *stream)));
+        char * buf = (char *)ggml_sycl_malloc_device(size, *stream);


Still need SYCL_CHECK(CHECK_TRY_ERROR()).

@arthw

Implements the architecture suggested by @arthw: compile-time and runtime flags to cleanly separate Level Zero and SYCL memory API paths.
- Add GGML_SYCL_SUPPORT_LEVEL_ZERO cmake option (default ON). All Level Zero code is wrapped in #ifdef so the build works on systems without the Level Zero SDK installed (e.g. CPU-only CI servers). Both the loader library and headers are checked before enabling.
- Add GGML_SYCL_ENABLE_LEVEL_ZERO runtime env var (default 1). Controls whether Level Zero or SYCL memory APIs are used. Only one API style is used per session, no mixing. If Level Zero is enabled but the devices don't support the Level Zero backend, it auto-disables with a warning.
- Remove Level Zero code from dpct_malloc. It was unused (dpct::device_memory is not called anywhere in the backend) and used try/catch for flow control.
- Update SYCL.md with documentation for both new parameters.
Tested on Intel Arc Pro B70 (32GB), single-GPU and dual-GPU, with both GGML_SYCL_SUPPORT_LEVEL_ZERO=ON and OFF builds. AI-assisted development (Claude). Code reviewed and tested on my hardware.

PMZFX · 2026-04-08T12:51:12Z

@arthw Thanks for the additional suggestions on the build/runtime flag architecture. Pushed a new commit implementing your approach:

Added GGML_SYCL_SUPPORT_LEVEL_ZERO cmake option (default ON), all L0 code/includes wrapped in #ifdef. Checks for both the loader library and headers before enabling, so it degrades cleanly on systems without the L0 SDK.
Added GGML_SYCL_ENABLE_LEVEL_ZERO runtime env var (default 1) to control which API is used. No mixing of L0 and SYCL memory APIs within a session.
Added a startup check that verifies devices actually use the Level Zero backend before enabling L0 APIs. Auto-disables with a warning if they don't.
Removed the L0 code from dpct_malloc (it was dead code and still had the try/catch issue).
Updated SYCL.md with both new parameters.

Tested with both GGML_SYCL_SUPPORT_LEVEL_ZERO=ON and OFF builds, and with the runtime flag toggled both ways. Let me know what you think.

HumerousGorgon · 2026-04-09T00:02:37Z

Will this need a docs update with the new build variable?

arthw · 2026-04-09T06:47:02Z

ggml/src/ggml-sycl/ggml-sycl.cpp

+#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
+    dev_ptr = ggml_sycl_malloc_device(size, *stream);
+#else
+    SYCL_CHECK(CHECK_TRY_ERROR(dev_ptr = (void *)sycl::malloc_device(size, *stream)));
+#endif


Suggested change

-#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
-    dev_ptr = ggml_sycl_malloc_device(size, *stream);
-#else
-    SYCL_CHECK(CHECK_TRY_ERROR(dev_ptr = (void *)sycl::malloc_device(size, *stream)));
-#endif
+    SYCL_CHECK(CHECK_TRY_ERROR(dev_ptr = (void *)ggml_sycl_malloc_device(size, *stream)));

arthw · 2026-04-09T06:47:37Z

ggml/src/ggml-sycl/ggml-sycl.cpp

+#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
+        buf = (char *)ggml_sycl_malloc_device(size, *stream);
+#else
+        SYCL_CHECK(CHECK_TRY_ERROR(buf = (char *)sycl::malloc_device(size, *stream)));
+#endif


Suggested change

-#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
-        buf = (char *)ggml_sycl_malloc_device(size, *stream);
-#else
-        SYCL_CHECK(CHECK_TRY_ERROR(buf = (char *)sycl::malloc_device(size, *stream)));
-#endif
+        SYCL_CHECK(CHECK_TRY_ERROR(buf = (char *)ggml_sycl_malloc_device(size, *stream)));

arthw · 2026-04-09T06:48:29Z

ggml/src/ggml-sycl/ggml-sycl.cpp

+#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
+        ptr = ggml_sycl_malloc_device(look_ahead_size, *qptr);
+#else
+        SYCL_CHECK(CHECK_TRY_ERROR(ptr = (void *)sycl::malloc_device(look_ahead_size, *qptr)));
+#endif


Suggested change

-#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
-        ptr = ggml_sycl_malloc_device(look_ahead_size, *qptr);
-#else
-        SYCL_CHECK(CHECK_TRY_ERROR(ptr = (void *)sycl::malloc_device(look_ahead_size, *qptr)));
-#endif
+        SYCL_CHECK(CHECK_TRY_ERROR(ptr = ggml_sycl_malloc_device(look_ahead_size, *qptr)));

arthw · 2026-04-09T06:49:06Z

ggml/src/ggml-sycl/ggml-sycl.cpp

+#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
+        ggml_sycl_free_device(ptr, *qptr);
+#else
        SYCL_CHECK(CHECK_TRY_ERROR(sycl::free(ptr, *qptr)));
+#endif


Suggested change

-#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
-        ggml_sycl_free_device(ptr, *qptr);
-#else
-        SYCL_CHECK(CHECK_TRY_ERROR(sycl::free(ptr, *qptr)));
-#endif
+        SYCL_CHECK(CHECK_TRY_ERROR(ggml_sycl_free_device(ptr, *qptr)));

arthw · 2026-04-09T06:49:41Z

ggml/CMakeLists.txt

 option(GGML_SYCL                            "ggml: use SYCL"                                  OFF)
 option(GGML_SYCL_F16                        "ggml: use 16 bit floats for sycl calculations"   OFF)
 option(GGML_SYCL_GRAPH                      "ggml: enable graphs in the SYCL backend"         ON)
+option(GGML_SYCL_SUPPORT_LEVEL_ZERO         "ggml: use Level Zero for device memory in SYCL"  ON)


Suggested change

-option(GGML_SYCL_SUPPORT_LEVEL_ZERO "ggml: use Level Zero for device memory in SYCL" ON)
+option(GGML_SYCL_SUPPORT_LEVEL_ZERO "ggml: use Level Zero API in SYCL backend" ON)

arthw · 2026-04-09T06:58:07Z

ggml/src/ggml-sycl/common.hpp

+#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
+extern int g_ggml_sycl_enable_level_zero;
+void ggml_sycl_free_device(void *ptr, sycl::queue &q);
+#endif


Suggested change

-#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
-extern int g_ggml_sycl_enable_level_zero;
-void ggml_sycl_free_device(void *ptr, sycl::queue &q);
-#endif
+extern int g_ggml_sycl_enable_level_zero;
+void ggml_sycl_free_device(void *ptr, sycl::queue &q);

arthw · 2026-04-09T06:59:00Z

ggml/src/ggml-sycl/common.cpp

+#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
+            ggml_sycl_free_device(extra->data_device[i], *(streams[i]));
+#else
+            SYCL_CHECK(CHECK_TRY_ERROR(sycl::free(extra->data_device[i], *(streams[i]))));
+#endif


Suggested change

-#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
-            ggml_sycl_free_device(extra->data_device[i], *(streams[i]));
-#else
-            SYCL_CHECK(CHECK_TRY_ERROR(sycl::free(extra->data_device[i], *(streams[i]))));
-#endif
+            SYCL_CHECK(CHECK_TRY_ERROR(ggml_sycl_free_device(extra->data_device[i], *(streams[i]))));

arthw · 2026-04-09T07:01:39Z

ggml/src/ggml-sycl/common.cpp

+#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
+void ggml_sycl_free_device(void *ptr, sycl::queue &q) {
+    if (!ptr) return;
+    if (g_ggml_sycl_enable_level_zero) {
+        auto ze_ctx = sycl::get_native<sycl::backend::ext_oneapi_level_zero>(q.get_context());
+        zeMemFree(ze_ctx, ptr);
+        return;
+    }
+    SYCL_CHECK(CHECK_TRY_ERROR(sycl::free(ptr, q)));
+}
+#endif
+


Suggested change

-#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
-void ggml_sycl_free_device(void *ptr, sycl::queue &q) {
-    if (!ptr) return;
-    if (g_ggml_sycl_enable_level_zero) {
-        auto ze_ctx = sycl::get_native<sycl::backend::ext_oneapi_level_zero>(q.get_context());
-        zeMemFree(ze_ctx, ptr);
-        return;
-    }
-    SYCL_CHECK(CHECK_TRY_ERROR(sycl::free(ptr, q)));
-}
-#endif
+void ggml_sycl_free_device(void *ptr, sycl::queue &q) {
+    if (!ptr) return;
+#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
+    if (g_ggml_sycl_enable_level_zero) {
+        auto ze_ctx = sycl::get_native<sycl::backend::ext_oneapi_level_zero>(q.get_context());
+        zeMemFree(ze_ctx, ptr);
+        return;
+    }
+#endif
+    SYCL_CHECK(CHECK_TRY_ERROR(sycl::free(ptr, q)));
+    return;
+}

arthw · 2026-04-09T07:06:06Z

ggml/src/ggml-sycl/common.cpp

+#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
+#include <sycl/backend.hpp>
+#include <level_zero/ze_api.h>
+#endif


Suggested change

-#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
-#include <sycl/backend.hpp>
-#include <level_zero/ze_api.h>
-#endif
+#include <sycl/backend.hpp>
+#ifdef GGML_SYCL_SUPPORT_LEVEL_ZERO
+#include <level_zero/ze_api.h>
+#endif

arthw · 2026-04-09T07:07:30Z

ggml/src/ggml-sycl/CMakeLists.txt

+if (GGML_SYCL_SUPPORT_LEVEL_ZERO)
+    message(STATUS "GGML_SYCL_SUPPORT_LEVEL_ZERO enabled")


Suggested change

-if (GGML_SYCL_SUPPORT_LEVEL_ZERO)
-    message(STATUS "GGML_SYCL_SUPPORT_LEVEL_ZERO enabled")
+message(STATUS "GGML_SYCL_SUPPORT_LEVEL_ZERO ${GGML_SYCL_SUPPORT_LEVEL_ZERO}")
+if (GGML_SYCL_SUPPORT_LEVEL_ZERO)

Move ggml_sycl_malloc_device to common.cpp alongside ggml_sycl_free_device. Both functions are now unconditionally available — Level Zero code is #ifdef'd inside the functions, not at call sites. All call sites use uniform SYCL_CHECK(CHECK_TRY_ERROR()) wrapping with no #ifdef blocks. Addresses arthw's review: wrap all malloc/free in SYCL_CHECK for stack traces on failure, eliminate duplicated #ifdef/else patterns at 6 call sites (-29 lines net). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

NeoZhangJianyu · 2026-04-10T15:43:00Z

Replying to "Will this need a docs update with the new build variable?":

Yes, SYCL.md has been updated to add the description.

arthw

Because the Level Zero library is a mandatory part of the build by default,
the SYCL CI (compile) jobs need to install the Level Zero library on Windows and Ubuntu.
Please update .github/workflows/build.yml accordingly.
Refer to the Intel GPU driver installation instructions for Windows/Ubuntu.

Here is example code for reference:
Ubuntu:

    wget -qO - https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | sudo gpg --dearmor --output /usr/share/keyrings/oneapi-archive-keyring.gpg
    echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
    sudo apt-get update
    sudo apt-get install -y level-zero level-zero-devel intel-level-zero-gpu

Windows:

    $release = Invoke-RestMethod -Uri "https://api.github.com/repos/oneapi-src/level-zero/releases/latest"
    $asset = $release.assets | Where-Object { $_.name -like "level-zero-win-sdk*.zip" } | Select-Object -First 1

    Invoke-WebRequest -Uri $asset.browser_download_url -OutFile "level-zero-win-sdk.zip"

    Expand-Archive -Path "level-zero-win-sdk.zip" -DestinationPath "C:\level-zero-sdk" -Force

    # Set environment variables for the build (MSVC / CMake)
    echo "LEVEL_ZERO_INCLUDE_DIR=C:\level-zero-sdk\include" | Out-File -FilePath $env:GITHUB_ENV -Append
    echo "LEVEL_ZERO_LIBRARY_DIR=C:\level-zero-sdk\lib" | Out-File -FilePath $env:GITHUB_ENV -Append
    echo "C:\level-zero-sdk\lib" | Out-File -FilePath $env:GITHUB_PATH -Append   # if needed for runtime DLL

PMZFX · 2026-04-11T20:20:08Z

Thanks for the guidance and the Windows CI examples, I'll get that updated!

Add Level Zero SDK installation to Ubuntu and Windows SYCL CI jobs so the Level Zero code path is compiled and tested in CI.
Fix two bugs found during extended dual-GPU testing (no ONEAPI_DEVICE_SELECTOR set):
- The Level Zero backend check was iterating all SYCL devices including CPU. The OpenCL CPU device caused Level Zero to be disabled for the GPUs, defeating the fix on multi-GPU systems. Added is_gpu() filter so only GPU devices are checked.
- sycl_ext_malloc_device/sycl_ext_free (tensor reorder temp buffers) were still calling sycl::malloc/sycl::free directly, bypassing the Level Zero path. Routed through ggml_sycl_malloc_device/free_device for consistency with the other device memory call sites.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PMZFX · 2026-04-13T12:58:35Z

Pushed the CI update for Level Zero SDK installation (Ubuntu and Windows), and two additional fixes found during extended dual-GPU testing (no ONEAPI_DEVICE_SELECTOR set):

The Level Zero backend check now skips non-GPU devices. Without this, the OpenCL CPU device was causing Level Zero to be disabled for the GPUs, which defeats the fix on systems that don't set ONEAPI_DEVICE_SELECTOR.
Routed sycl_ext_malloc_device/sycl_ext_free (tensor reorder temp buffers) through the Level Zero allocation path for consistency with the other device memory call sites.

Tested all configurations on dual B70: L0 on (single and dual GPU), L0 off via env var, and GGML_SYCL_SUPPORT_LEVEL_ZERO=OFF build. All clean.
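A minimal sketch of the corrected startup check, assuming a simplified device record in place of sycl::device (its is_gpu and backend fields stand in for sycl::device::is_gpu() and get_backend()). device_record, backend_kind, and level_zero_usable are hypothetical names, and returning false when no GPU is present is an assumption of this sketch, not something stated in the PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical device record; in the backend these would be sycl::device
// objects queried with is_gpu() and get_backend().
enum class backend_kind { level_zero, opencl, other };

struct device_record {
    bool         is_gpu;
    backend_kind backend;
};

// Level Zero memory APIs stay enabled only if every GPU is on the Level Zero
// backend. Non-GPU devices (such as an OpenCL CPU device) are skipped; without
// that filter, the CPU device alone would disable Level Zero for all GPUs.
bool level_zero_usable(const std::vector<device_record> & devices) {
    bool saw_gpu = false;
    for (const auto & d : devices) {
        if (!d.is_gpu) {
            continue;  // the is_gpu() filter described in the fix
        }
        saw_gpu = true;
        if (d.backend != backend_kind::level_zero) {
            return false;
        }
    }
    return saw_gpu;
}
```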

PMZFX requested a review from a team as a code owner April 8, 2026 01:06

github-actions bot added ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language labels Apr 8, 2026

arthw reviewed Apr 8, 2026

arthw reviewed Apr 9, 2026

github-actions bot added the documentation Improvements or additions to documentation label Apr 9, 2026

PMZFX mentioned this pull request Apr 9, 2026: [SYCL] Use native subgroup size for K-quant DMMV kernels on Intel #21700 (open)

arthw reviewed Apr 10, 2026

Conradzz mentioned this pull request Apr 11, 2026: [XPU] Use Level Zero zeMemAllocDevice to avoid host memory shadowing pytorch/pytorch#180145 (closed)

NeoZhangJianyu mentioned this pull request Apr 13, 2026: Feature Request: Avoid memcpy for mmap-ed weights on Unified Memory architectures (Intel Lunar Lake) #21827 (open)

PMZFX force-pushed the sycl-fix-multigpu-ram branch from 3c0a1da to d145fc5 Compare April 13, 2026 12:44

PMZFX requested a review from a team as a code owner April 13, 2026 12:44

PMZFX force-pushed the sycl-fix-multigpu-ram branch from d145fc5 to c474bba Compare April 13, 2026 12:58

github-actions bot added the devops improvements to build systems and github actions label Apr 13, 2026
